Kprobe placed on the kretprobe_trampoline() during boot time can be
optimized, since the instruction at probe point is a 'nop'.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Current infrastructure of kprobe uses the unconditional trap instruction
to probe a running kernel. Optprobe allows kprobe to replace the trap
with a branch instruction to a detour buffer. Detour buffer contains
instructions to create an in memory pt_regs. Detour buffer also has a
call to optimized_callback() which in turn call the pre_handler(). After
the execution of the pre-handler, a call is made for instruction
emulation. The NIP is determined in advanced through dummy instruction
emulation and a branch instruction is created to the NIP at the end of
the trampoline.
To address the limitation of branch instruction in POWER architecture,
detour buffer slot is allocated from a reserved area. For the time
being, 64KB is reserved in memory for this purpose.
Instructions which can be emulated using analyse_instr() are the
candidates for optimization. Before optimization ensure that the address
range between the detour buffer allocated and the instruction being
probed is within +/- 32MB.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
- one, having to do with a missing include for linux/module.h for
MODULE_NAME_LEN -- this didn't show up previously since the only
users of kprobe_lookup_name were in kprobes.c, which included
linux/module.h through other headers, and
- two, with a missing const qualifier for a local variable which ends
up referring a string literal. Again, this is unique to how
kprobe_lookup_name is being invoked in optprobes.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To permit the use of relative branch instruction in powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (24 bit filed) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.
This patch verifies whether the target address is within +/- 32MB
range or not.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.
This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed. This tries to always keep the HPT at
a reasonable size for our current memory size.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.
If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expends to that maximum
size.
This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.
The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().
This also adds a debugfs file which can be used to manually control
HPT resizing or testing purposes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The other instances of this structure got removed along with the MDIO
device change, but this one was left behind and needs to be removed
as well:
arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
Fixes: 575e93f7b5 ("ARM: orion: Register DSA switch as a MDIO device")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM fixes from Russell King:
"A couple more fixes for 4.10:
- fix addressing the short regset write issue (Dave Martin)
- fix for LPAE systems which leave a pending imprecise data abort
before entering the kernel (Alexander Sverdlin)"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
ARM: 8642/1: LPAE: catch pending imprecise abort on unmask
The SPE architecture requires each exception level to enable access
to the SPE controls for the exception level below it, since additional
context-switch logic may be required to handle the buffer safely.
This patch allows EL1 (host) access to the SPE controls when entered at
EL2.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The newly added hypercall doesn't work on x86-32:
arch/x86/kvm/x86.c: In function 'kvm_pv_clock_pairing':
arch/x86/kvm/x86.c:6163:6: error: implicit declaration of function 'kvm_get_walltime_and_clockread';did you mean 'kvm_get_time_scale'? [-Werror=implicit-function-declaration]
This adds an #ifdef around it, matching the one around the related
functions that are also only implemented on 64-bit systems.
Fixes: 55dd00a73a ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To is_vmalloc_addr() to check if an address is a vmalloc address
instead of checking VMALLOC_START and VMALLOC_END manually.
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Delete an assignment for the local variable "ret" in an if branch
because it was initialised by the same value.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Mac interrupt code has been debugged. The Penguin deficiencies that
still cause unhandled interrupts aren't fixable here. Besides,
interrupts are fast and frequent and these printk statements
were never really useful IMO. Remove them.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
In macints.c there is some startup code which disables the SONIC interrupt
in an attempt to avoid an unhandled slot interrupt, which would be fatal.
This only works on those machines where the SONIC device is on-board.
When the mac_sonic driver is built-in, there's little point in doing this,
because the device will be initialized a few seconds later anyway. But
when mac_sonic is a module, the window for an unhandled interrupt is
longer.
Either way, we've already run the gauntlet for 5 or 10 seconds by the time
we get around to disabling this particular device. It's only by sheer luck
that we got this far.
Really, this is too little too late. The general problem of unhandled
early interrupts also affects other devices on other models. There are
better ways to resolve this problem.
1) When using the Penguin bootloader, boot Mac OS with extensions disabled
(by holding down the shift key at startup or by use of the Extensions
Manager control panel). The Penguin docs already contain this advice,
as it is always effective.
2) Have the Penguin bootloader disable the device. It already attempts
to disable slot interrupts. But since some hardware cannot mask slot
interrupts, Penguin should probably close the relevant device
drivers.
3) Use Emile instead of Penguin. AFAIK the boot ROM never enables network
device interrupts and hence they don't need to be disabled.
Remove this hack. It requires maintenance and it doesn't solve the
problem. It improves the odds for a few models, but so does setting
CONFIG_MAC_SONIC=y.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
mac_nmi_handler() is useless in its present form and locks up my PowerBook
180. Let's throw out the dead code and make it do something useful: print
a register dump and a stack trace.
mac_debug_handler() is also dead code. Remove it along with its static
data.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
arch/m68k/mac/misc.c does not use any miscdevice so the
inclusion of linux/miscdevice.h is unnecessary.
This patch remove this unnecessary include.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fix to return error code -ENOMEM from the memory alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.
These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".
It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Otherwise KVM will fail to pass them through to the host
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The IPIs come in as HVI not EE, so we need to test the appropriate
SRR1 bits. The encoding is such that it won't have false positives
on P7 and P8 so we can just test it like that. We also need to handle
the icp-opal variant of the flush.
Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Three tiny changes to the ERAT flushing logic: First don't make
it depend on DD1. It hasn't been decided yet but we might run
DD2 in a mode that also requires explicit flushes for performance
reasons so make it unconditional. We also add a missing isync, and
finally remove the flush from _tlbiel_va as it is only necessary
for congruence-class invalidations (PID, LPID and full TLB), not
targetted invalidations.
Fixes: 96ed1fe511 ("powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit 020eb3daab.
Gabriel C reports that it causes his machine to not boot, and we haven't
tracked down the reason for it yet. Since the bug it fixes has been
around for a longish time, we're better off reverting the fix for now.
Gabriel says:
"It hangs early and freezes with a lot RCU warnings.
I bisected it down to :
> Ruslan Ruslichenko (1):
> x86/ioapic: Restore IO-APIC irq_chip retrigger callback
Reverting this one fixes the problem for me..
The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed"
and Ruslan and Thomas are currently stumped.
Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com>
Cc: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # for the backport of the original commit
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enable all applicable CPUfreq options.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Turn on CPU_SUPPORTS_CPUFREQ and MIPS_EXTERNAL_TIMER for BMIPS.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ran "make savedefconfig" to bring bmips_stb_defconfig up to date.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Recent versions of OPAL can provide names for the various OPAL interrupts,
so let's use them. This also modernises the code that fetches the
interrupt array to use the helpers provided by the generic code instead
of hand-parsing the property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Free irqs on error, check allocation of names, consolidate error
handling, whitespace.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CAPP errors we see console messages that prints unknown HMIs for
which CAPI recovery is in progress. This patch fixes this by printing
correct error info for HMI generated due to CAPP recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These are common on bare metal machines, so put them in the defconfig.
This adds 216KB to the vmlinux size
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently in arm64's copy_{to,from}_user, we only check the
source/destination object size if access_ok() tells us the user access
is permissible.
However, in copy_from_user() we'll subsequently zero any remainder on
the destination object. If we failed the access_ok() check, that applies
to the whole object size, which we didn't check.
To ensure that we catch that case, this patch hoists check_object_size()
to the start of copy_from_user(), matching __copy_from_user() and
__copy_to_user(). To make all of our uaccess copy primitives consistent,
the same is done to copy_to_user().
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- A relatively large patch restores booting on i.MX platforms that
failed to boot after a cleanup was merged for v4.10.
- A quirk for USB needs to be enabled on the STi platform
- On the Meson platform, we saw memory corruption with part of
the memory used by the secure monitor, so we have to stay out
of that area.
- The same platform also has a problem with ethernet under load,
which is fixed by disabling EEE negotiation.
- imx6dl has an incorrect pin configuration, which prevents SPI
from working.
- Two maintainers have lost their access to their email addresses, so
we should update the MAINTAINERS file before the release
- Renaming one of the orion5x linkstation models to help simplify
the debian install.
- A couple of fixes for build warnings that were introduced during
v4.10-rc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAWJtPYGCrR//JCVInAQIrqxAA4XlWN83jn0+zDt1A7OmrXUhtycD8eJFM
cP0Lc965pa1OVVkcIRRrgvYE9dnILuK6qPit7HbQtQcCSRxb5P2OQ0AyNBjAt6C0
exBD/R13aImPLWWuGl65/WqBaCWaZ9KnVqzNHDXGt9d51NBsqreM9TdwLMvFMQBl
tyvoRNK4TbIMGpOtrnLMTwHLkh4yZXik7srkuwSV0jeIVh7HrQUd2eawqWssuX7A
idkZWBheDhQt2s1tI5wkRf4TFEI6muWpaNaU3NGi9qmQdHpJWc0ivYZtHlE29Fli
T/nXDWmPptRIhOSIney6TwLdgN1Lg4ztRdaowHEpYXnfieUx+P86QJTXhxxo/3eT
be30IhWX4WKAWiQkQHAsVCt6zIYRfXE5N8An6S5MfsC3n1dYAvCCf/qpToGUnoc/
nyZQcbasHaSB3r5YMUmgH1oDowT9FsE/iaOzCr5xymiXgxR/p3gTVxcLn9jgp3Zq
m1hSNfCACGmGLNcQBR/fz63/1b6sanXV6JOSEEB2TfpcQ0Mi0AaeZjAR9JAfQbNR
hG/r8LC2Q0cG10zwOe1Iv60Ery7UsfCtdzEU6D3/gwtlDssSOs351cBoIKWtQxnX
SPcUag9ZHZS1iuaZaSISmOWMyhK7CeCjk12TDFJPBrLolJOhIuHUOW/5cFwydWgp
DLbhSBwqQKU=
=zlTb
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
- A relatively large patch restores booting on i.MX platforms that
failed to boot after a cleanup was merged for v4.10.
- A quirk for USB needs to be enabled on the STi platform
- On the Meson platform, we saw memory corruption with part of the
memory used by the secure monitor, so we have to stay out of that
area.
- The same platform also has a problem with ethernet under load, which
is fixed by disabling EEE negotiation.
- imx6dl has an incorrect pin configuration, which prevents SPI from
working.
- Two maintainers have lost their access to their email addresses, so
we should update the MAINTAINERS file before the release
- Renaming one of the orion5x linkstation models to help simplify the
debian install.
- A couple of fixes for build warnings that were introduced during
v4.10-rc.
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: defconfigs: make NF_CT_PROTO_SCTP and NF_CT_PROTO_UDPLITE built-in
MAINTAINERS: socfpga: update email for Dinh Nguyen
ARM: orion5x: fix Makefile for linkstation-lschl.dtb
ARM: dts: orion5x-lschl: More consistent naming on linkstation series
ARM: dts: orion5x-lschl: Fix model name
MAINTAINERS: change email address from atmel to microchip
MAINTAINERS: at91: change email address
ARM64: dts: meson-gx: Add firmware reserved memory zones
ARM64: dts: meson-gxbb-odroidc2: fix GbE tx link breakage
ARM: dts: STiH407-family: set snps,dis_u3_susphy_quirk
ARM: dts: imx: Pass 'chosen' and 'memory' nodes
ARM: dts: imx6dl: fix GPIO4 range
ARM: imx: hide unused variable in #ifdef
Emulate read and write operations to CNTP_TVAL, CNTP_CVAL and CNTP_CTL.
Now VMs are able to use the EL1 physical timer.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM traps on the EL1 phys timer accesses from VMs, but it doesn't handle
those traps. This results in terminating VMs. Instead, set a handler for
the EL1 phys timer access, and inject an undefined exception as an
intermediate step.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When scheduling a background timer, consider both of the virtual and
physical timer and pick the earliest expiration time.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Initialize the emulated EL1 physical timer with the default irq number.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a separate structure for timer context, make functions
generic so that they can work with any timer context, not just the
virtual timer context. This does not change the virtual timer
functionality.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make cntvoff per each timer context. This is helpful to abstract kvm
timer functions to work with timer context without considering timer
types (e.g. physical timer or virtual timer).
This also would pave the way for ever doing adjustments of the cntvoff
on a per-CPU basis if that should ever make sense.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The sysfs cpu_capacity entry for each CPU has nothing to do with
PROC_FS, nor it's in /proc/sys path.
Remove such ifdef.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-and-suggested-by: Sudeep Holla <sudeep.holla@arm.com>
Fixes: be8f185d8a ('arm64: add sysfs cpu_capacity attribute')
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bit 0x100 of a page table, segment table of region table entry
can be used to disallow code execution for the virtual addresses
associated with the entry.
There is one tricky bit, the system call to return from a signal
is part of the signal frame written to the user stack. With a
non-executable stack this would stop working. To avoid breaking
things the protection fault handler checks the opcode that caused
the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn)
and injects a system call. This is preferable to the alternative
solution with a stub function in the vdso because it works for
vdso=off and statically linked binaries as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add hardware capability bits and feature tags to /proc/cpuinfo for
the "Vector Packed Decimal Facility" (tag "vxd" / hwcap bit 2^12)
and the "Vector Enhancements Facility 1" (tag "vxe" / hwcap bit 2^13).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current implementation of setup_randomness uses the stack address
and therefore the pointer to the SYSIB 3.2.2 block as input data
address. Furthermore the length of the input data is the number of
virtual-machine description blocks which is typically one.
This means that typically a single zero byte is fed to
add_device_randomness.
Fix both of these and use the address of the first virtual machine
description block as input data address and also use the correct
length.
Fixes: bcfcbb6bae ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The early vt220 sclp printk code added an extra new line to each
printed multi-line text. If used for the early sclp console this will
lead to numerous extra new lines. Therefore get rid of this semantic
and require that each to be printed string contains a line feed
character if a new line is wanted.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch
- unifies the old sclp early code and the sclp early printk code, so
they can use common functions
- makes sure all sclp early functions and variables have the same
"sclp_early" prefix
- converts the sclp early printk code into readable code by using
existing data structures instead of hard coded magic arrays
- splits the early sclp code into two files: sclp_early.c and
sclp_early_core.c. The core file contains everything that is
required by the kernel decompressor and may not call functions not
contained within the core file. Otherwise the result would be a
link error.
- changes interrupt handling to be completely synchronous. The old
early sclp code had a small window which allowed to receive several
interrupts instead of exactly the single expected interrupt. This
did hide a subtle potential bug, which is fixed with this large
rework.
- contains a couple of small cleanups.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the early sclp printk code to the drivers folder where also the
rest of the sclp code can be found. This way it is possible to use the
sclp private header files for further cleanups.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When autonuma (Automatic NUMA balancing) marks a PTE inaccessible it
clears all the protection bits but leave the PTE valid.
With the Radix MMU, an attempt at executing from such a PTE will
take a fault with bit 35 of SRR1 set "SRR1_ISI_N_OR_G".
It is thus incorrect to treat all such faults as errors. We should
pass them to handle_mm_fault() for autonuma to deal with. The case
of pages that are really not executable is handled by the existing
test for VM_EXEC further down.
That leaves us with catching the kernel attempts at executing user
pages. We can catch that earlier, even before we do find_vma.
It is never valid on powerpc for the kernel to take an exec fault
to begin with. So fold that test with the existing test for the
kernel faulting on kernel addresses to bail out early.
Fixes: 1d18ad0268 ("powerpc/mm: Detect instruction fetch denied and report")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix rebase breakage from commit 55dd00a73a ("KVM: x86: add
KVM_HC_CLOCK_PAIRING hypercall", 2017-01-24), courtesy of the
"I could have sworn I had pushed the right branch" department.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This merges in a fix which touches both PPC and KVM code,
which was therefore put into a topic branch in the powerpc
tree.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Due to the reference clock comes from 26M oscillator directly
on mt8173, and it is a fixed-clock in DTS which always turned
on, we ignore it before. But on some platforms, it comes
from PLL, and need be controlled, so here add it, no matter
it is a fixed-clock or not.
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we have code inline in the arch timer probe path to cater for
Freescale erratum A-008585, complete with ifdeffery. This is a little
ugly, and will get worse as we try to add more errata handling.
This patch refactors the handling of Freescale erratum A-008585. Now the
erratum is described in a generic arch_timer_erratum_workaround
structure, and the probe path can iterate over these to detect errata
and enable workarounds.
This will simplify the addition and maintenance of code handling
Hisilicon erratum 161010101.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[Mark: split patch, correct Kconfig, reword commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Both of these options are poorly named. The features they provide are
necessary for system security and should not be considered debug only.
Change the names to CONFIG_STRICT_KERNEL_RWX and
CONFIG_STRICT_MODULE_RWX to better describe what these options do.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
There are multiple architectures that support CONFIG_DEBUG_RODATA and
CONFIG_SET_MODULE_RONX. These options also now have the ability to be
turned off at runtime. Move these to an architecture independent
location and make these options def_bool y for almost all of those
arches.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
This patch adds a OSTM driver for the Renesas architecture.
The OS Timer (OSTM) has independent channels that can be
used as a freerun or interval times.
This driver uses the first probed device as a clocksource
and then any additional devices as clock events.
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
- enable some simd extensions for guests
- enable nx for guests
- debug log for cpu model
- PER fixes
- remove bitwise annotation from ar_t
- detect guests in operation exception program check loops
- fix potential null-pointer dereference for ucontrol guests
- also contains merge for fix that went into 4.10 to avoid conflicts
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJYmGFiAAoJEBF7vIC1phx8wzYP/2xrpknbKSLhG7Vnn6R7Mmkq
joUl4bfqKoYDVkJ2yYLkN3WfCyIx/MA7t406EYjt/6INQcGWOWR4dG8/LU4sw5xk
1EXckD2YIR1iGIasTCsyDjCpABqsldIUofaKPsNeWFtnnKfR7EK9FJowFOkdRAwk
BUaBG5drayOnLySo02E7BrN3EAqkIuDdZinpM8e25h6nU9dLjS5o8nxX5iIIItgZ
VyHDTfWAGxWMqC5s4MnKsxC01NFA8JJa1KQful199D1jZ2nsC66OobNPr3vpaLFS
Nbolls9AF6jKDLPSJQopkkWcr3BuFFYwZSYsNYDRmuUc2Ellvuf0Ug/X2yQEN4lx
VnCRo9mDNRIryWVg1h003EQSVT7rgi3pBH+T7U+N9JPwdN7RgkDOvOgbMjWO0I3m
glZhJ/l0MIfEfgtam6cu3/k5r/ZKPDoE+kGXZMvOJ4MLI534ErD9yVgEarPYzEM4
fWnOuznUHRUKARhf6zR3DCIyp39UwD+QAfoTPgyvvnUFjWaPaWsuktZ4P4e1KPTT
XDfPTqQQJScBQtwphHYVvDGyfPRv6/taQXFyQAu95O31b8OBeQvlunyivBo7E+dr
ocL0f7pqjKkaw/0qICyLbUL0rBHGNuf3E2ODyyDl3HEI3+NL8c42+M8KaMps4cas
QyhSoeENYV9Kw6UpOBn6
=0x3e
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and features for 4.11 (via kvm/next)
- enable some simd extensions for guests
- enable nx for guests
- debug log for cpu model
- PER fixes
- remove bitwise annotation from ar_t
- detect guests in operation exception program check loops
- fix potential null-pointer dereference for ucontrol guests
- also contains merge for fix that went into 4.10 to avoid conflicts
Numerous MIPS KVM fixes, improvements, and features for 4.11, many of
which continue to pave the way for VZ support, the most interesting of
which are:
- Add GVA->HPA page tables for T&E, to cache GVA mappings.
- Generate fast-path TLB refill exception handler which loads host TLB
entries from GVA page table, avoiding repeated guest memory
translation and guest TLB lookups.
- Use uaccess macros when T&E needs to access guest memory, which with
GVA page tables and the Linux TLB refill handler improves robustness
against TLB faults and fixes EVA hosts.
- Use BadInstr/BadInstrP registers when available to obtain instruction
encodings after a synchronous trap.
- Add GPA->HPA page tables to replace the inflexible linear array,
allowing for multiple sparsely arranged memory regions.
- Properly implement dirty page logging.
- Add KVM_CAP_SYNC_MMU support so that changes in GPA mappings become
effective in guests even if they are already running, allowing for
copy-on-write, KSM, idle page tracking, swapping, and guest memory
ballooning.
- Add KVM_CAP_READONLY_MEM support, so writes to specified memory
regions are treated as MMIO.
- Implement proper CP0_EBase support in T&E.
- Expose a few more missing CP0 registers to userland.
- Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS support, and allow up to 8
VCPUs to be created in a VM.
- Various cleanups and dropping of dead and duplicated code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJYlKZeAAoJEGwLaZPeOHZ6ghsQAL4HRU32rA6XKE6gKgRiIPYC
n8iHv2EuUCSKnZyM1a9o1QdJU02bBBB5TPYw+NUsqNClaiRLHsaNZR0TSze4gmcF
NVdnEIOmKluSPnbSXNstqpihJ/p6vzJuh+2eh/EpRuyunzmQ01B6ffjUalKzPtBx
EfgFv1mBnteqLgYZwlJQwj8ogX+Y92TrfdzzazJAom6MFx/lMnigPnUeiaXEG8u6
VrOr/c6Q6lMz4Yfh0xyskJWN4B4zI6PW8/G3SKvKhl8YIRQdtFAv1OfFPaSbdTko
ZdEsFO9UOr0KQu13f10pHAdwRruF7OMQ+3nRDYttdYKzWUYC6pTm77yOG/3+MNdv
KALwaQqJBglaShjuzM8WBI09sDeKgaJ8LYZOttm9Mb+ltwfKsJZPDvba67Kv5266
jkzroKuZeQC6SvAHAlQ7qKgdQr1wrqF3WwjNMmeqNR4Fiw2C3ni/8N39MY/qi7RX
NXQv/fJ6XqM37RC0XMlyu5O+zVWPf0IZ0VdCcl3kkqbvCyXq8B8u6dC+L9+wGb5r
07ZMwnYC93CFeYfrjHu/GsqKqONfAL5Pz13Y/YUlgX0phaLB+yrkq70cFLB4sA8K
KgBwDuD0Qbmdvtcd97qFvp96GMuspIcOkMEiqbD/XrblYnYddehP81Aojdw/GDdp
C62jTny1c/n95ylfbpkC
=lMjW
-----END PGP SIGNATURE-----
Merge tag 'kvm_mips_4.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips into HEAD
KVM: MIPS: GVA/GPA page tables, dirty logging, SYNC_MMU etc
Numerous MIPS KVM fixes, improvements, and features for 4.11, many of
which continue to pave the way for VZ support, the most interesting of
which are:
- Add GVA->HPA page tables for T&E, to cache GVA mappings.
- Generate fast-path TLB refill exception handler which loads host TLB
entries from GVA page table, avoiding repeated guest memory
translation and guest TLB lookups.
- Use uaccess macros when T&E needs to access guest memory, which with
GVA page tables and the Linux TLB refill handler improves robustness
against TLB faults and fixes EVA hosts.
- Use BadInstr/BadInstrP registers when available to obtain instruction
encodings after a synchronous trap.
- Add GPA->HPA page tables to replace the inflexible linear array,
allowing for multiple sparsely arranged memory regions.
- Properly implement dirty page logging.
- Add KVM_CAP_SYNC_MMU support so that changes in GPA mappings become
effective in guests even if they are already running, allowing for
copy-on-write, KSM, idle page tracking, swapping, and guest memory
ballooning.
- Add KVM_CAP_READONLY_MEM support, so writes to specified memory
regions are treated as MMIO.
- Implement proper CP0_EBase support in T&E.
- Expose a few more missing CP0 registers to userland.
- Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS support, and allow up to 8
VCPUs to be created in a VM.
- Various cleanups and dropping of dead and duplicated code.
The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest. This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree. Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.
Other notable changes include:
* Add the ability to change the size of the hashed page table,
from David Gibson.
* XICS (interrupt controller) emulation fixes and improvements,
from Li Zhong.
* Bug fixes from myself and Thomas Huth.
These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
Add a hypercall to retrieve the host realtime clock and the TSC value
used to calculate that clock read.
Used to implement clock synchronization between host and guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_complete_nested_posted_interrupt() can't fail, let's turn it into
a void function.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kmap() can't fail, therefore it will always return a valid pointer. Let's
just get rid of the unnecessary checks.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Utilize the ability to pass board specific MDIO bus information towards a
particular MDIO device thus allowing us to provide the per-port switch layout
to the Marvell 88E6XXX switch driver.
Since we would end-up with conflicting registration paths, do not register the
"dsa" platform device anymore.
Note that the MDIO devices registered by code in net/dsa/dsa2.c does not
parse a dsa_platform_data, but directly take a dsa_chip_data (specific
to a single switch chip), so we update the different call sites to pass
this structure down to orion_ge00_switch_init().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using native_machine_emergency_restart (called during reboot) will
lead PVH guests to machine_real_restart() where we try to use
real_mode_header which is not initialized.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Since we are not using PIC and (at least currently) don't have IOAPIC
we want to make sure that acpi_irq_model doesn't stay set to
ACPI_IRQ_MODEL_PIC (which is the default value). If we allowed it to
stay then acpi_os_install_interrupt_handler() would try (and fail) to
request_irq() for PIC.
Instead we set the model to ACPI_IRQ_MODEL_PLATFORM which will prevent
this from happening.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Start PVH guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
page, initialize boot_params, enable early page tables.
Since this stub is executed before kernel entry point we cannot use
variables in .bss which is cleared by kernel. We explicitly place
variables that are initialized here into .data.
While adjusting xen_hvm_init_shared_info() make it use cpuid_e?x()
instead of cpuid() (wherever possible).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
We are replacing existing PVH guests with new implementation.
We are keeping xen_pvh_domain() macro (for now set to zero) because
when we introduce new PVH implementation later in this series we will
reuse current PVH-specific code (xen_pvh_gnttab_setup()), and that
code is conditioned by 'if (xen_pvh_domain())'. (We will also need
a noop xen_pvh_domain() for !CONFIG_XEN_PVH).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The new Xen PVH entry point requires page tables to be setup by the
kernel since it is entered with paging disabled.
Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be
invoked from both the new Xen entry point and the existing startup_32()
code.
Convert resulting common code to C.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: matt@codeblueprint.co.uk
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We may or may not have all possible CPUs in MADT on boot but in any
case we're overwriting x86_cpu_to_acpiid mapping with U32_MAX when
acpi_register_lapic() is called again on the CPU hotplug path:
acpi_processor_hotadd_init()
-> acpi_map_cpu()
-> acpi_register_lapic()
As we have the required acpi_id information in acpi_processor_hotadd_init()
propagate it to acpi_map_cpu() to always keep x86_cpu_to_acpiid
mapping valid.
Reported-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Print the secure boot status in the x86 setup_arch() function, but otherwise do
nothing more for now. More functionality will be added later, but this at
least allows for testing.
Signed-off-by: David Howells <dhowells@redhat.com>
[ Use efi_enabled() instead of IS_ENABLED(CONFIG_EFI). ]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-7-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Get the firmware's secure-boot status in the kernel boot wrapper and stash
it somewhere that the main kernel image can find.
The efi_get_secureboot() function is extracted from the ARM stub and (a)
generalised so that it can be called from x86 and (b) made to use
efi_call_runtime() so that it can be run in mixed-mode.
For x86, it is stored in boot_params and can be overridden by the boot
loader or kexec. This allows secure-boot mode to be passed on to a new
kernel.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
efi_call_runtime() is provided for x86 to be able abstract mixed mode
support. Provide this for ARM also so that common code work in mixed mode
also.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-3-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide the ability to perform mixed-mode runtime service calls for x86 in
the same way the following commit provided the ability to invoke for boot
services:
0a637ee612 ("x86/efi: Allow invocation of arbitrary boot services")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1486380166-31868-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There used to be a restriction with KASLR and hibernation, but this is no
longer true, and since commit:
65fe935dd2 ("x86/KASLR, x86/power: Remove x86 hibernation restrictions")
the parameter "kaslr" does no longer exist.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklass@axis.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1486399429-23078-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The debug features currently uses absolute TOD time stamps for the
debug events. Given that the TOD clock can jump forward and backward
due to STP sync checks the order of debug events can get obfuscated.
Replace the absolute TOD time stamps with a delta to the IPL time
stamp. On a STP sync check the TOD clock correction is added to
the IPL time stamp as well to make the deltas unaffected by STP
sync check.
The readout of the debug feature entries will convert the deltas
back to absolute time stamps based on the Unix epoch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The data stored by the STSI instruction can be up to a page in size
but the memblock_virt_alloc allocation for tl_info only specifies
16 bytes. The memory after the short allocation is overwritten
every time arch_update_cpu_topology is called.
Fixes: 8c91058022 "s390/numa: establish cpu to node mapping early"
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit bcfcbb6bae ("s390: add system information as device
randomness") intended to add some virtual machine specific information
to the randomness pool.
Unfortunately it uses the page allocator before it is ready to use. In
result the page allocator always returns NULL and the setup_randomness
function never adds anything to the randomness pool.
To fix this use memblock_alloc and memblock_free instead.
Fixes: bcfcbb6bae ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the device probe test in the via-cuda.c driver so it will load on
Egret-based machines too. Remove the now redundant via-maciisi.c driver.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All entry points already read the MSR so they can easily do
the right thing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Without this we will always find the feature disabled.
Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull crypto fixes from Herbert Xu:
- use-after-free in algif_aead
- modular aesni regression when pcbc is modular but absent
- bug causing IO page faults in ccp
- double list add in ccp
- NULL pointer dereference in qat (two patches)
- panic in chcr
- NULL pointer dereference in chcr
- out-of-bound access in chcr
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: chcr - Fix key length for RFC4106
crypto: algif_aead - Fix kernel panic on list_del
crypto: aesni - Fix failure when pcbc module is absent
crypto: ccp - Fix double add when creating new DMA command
crypto: ccp - Fix DMA operations when IOMMU is enabled
crypto: chcr - Check device is allocated before use
crypto: chcr - Fix panic on dma_unmap_sg
crypto: qat - zero esram only for DH85x devices
crypto: qat - fix bar discovery for c62x
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Atomic operation function symbols are exported,when
CONFIG_ARM64_LSE_ATOMICS is defined. Prefix them with notrace, so that
an user can not trace these functions. Tracing these functions causes
kernel crash.
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The NUMA code may get confused by the presence of NOMAP regions within
zones, resulting in spurious BUG() checks where the node id deviates
from the containing zone's node id.
Since the kernel has no business reasoning about node ids of pages it
does not own in the first place, enable CONFIG_HOLES_IN_ZONE to ensure
that such pages are disregarded.
Acked-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A typical SMP system expects cache coherency. Initial NPS platform
support was slated to be SMP w/o cache coherency.
However it seems the platform now selects that option, so there is no
point in keeping it around.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
No need for specifying a list of interrupts in the declaration
of IDU interrupt controller anymore since the kernel can obtain
a number of supported interrupts from the build register.
Also delete support of the second parameter for devices which
are connected to IDU because it is not used anywhere.
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This enhancement is needed to allow masking all available common interrupts
in IDU interrupt controller in boot time since the kernel can
discover a number of them from the build register. Also now there
is no need to specify in device tree a list of used core interrupts
by IDU. E.g. before:
idu_intc: idu-interrupt-controller {
compatible = "snps,archs-idu-intc";
interrupt-controller;
interrupt-parent = <&core_intc>;
#interrupt-cells = <2>;
interrupts = <24 25 26 27 28 29 30 31>;
};
and after:
idu_intc: idu-interrupt-controller {
compatible = "snps,archs-idu-intc";
interrupt-controller;
interrupt-parent = <&core_intc>;
#interrupt-cells = <2>;
};
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
After reset all interrupts in the core interrupt controller has
the highest priority P0. If the platform supports Fast IRQs and
has more than 1 banks of registers then CPU automatically switch
banks of registers when P0 interrupt comes.
The problem is that the kernel expects that by default switching
of banks is not used by all interrupts. It is necessary to set a
default nonzero priority for all available interrupts to avoid
undefined behaviour.
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Currently Kconfig knob ARC_NUMBER_OF_INTERRUPTS is used as indicator of
hard irq count. But it is flawed that it doesn't affect
- NR_IRQS : for number of virtual interrupts
- NR_CPU_IRQS : for number of hardware interrupts
Moreover the actual hardware irq count might still not be same as
ARC_NUMBER_OF_INTERRUPTS. So use the information availble in the
Build Configuration Registers and get rid of the Kconfig option.
We still need "some" build time info about irq count to set up
sufficient number of vector table entries. This is done with a
sufficiently large NR_CPU_IRQS which will eventually be used soley for
that purpose (subsequent patches will remove its usage elsewhere)
So to summarize what this patch does:
* NR_CPU_IRQS defines a maximum number of hardware interrupts.
* Remove ARC_NUMBER_OF_INTERRUPTS option and create interrupts
table for all possible hardware interrupts.
* Increase a maximum number of virtual IRQs to 512. ARCv2 can
support 240 interrupts in the core interrupts controllers
and 128 interrupts in IDU. Thus 512 virtual IRQs must be
enough for most configurations of boards.
This patch leads to NR_CPU_IRQS in 2 places, to reduce the overall
churn. The next patch will remove the 2nd definition anyways.
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked the changelog a bit]
It is better to use it instead of magic numbers.
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC VDK for EVSS uses UIO for communication with Embedded Vision
Subsystem.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
In recent VDKs ARC cores are configured as "run on reset"
which made existing kernel configuration outdated to effect that
slave cores never start execution of the code keeping only master
online.
With that fix we're again in sync with VDK platform.
And while at it we regenerate defconfig via savedefconfig so default
options are now excluded.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The symbols can no longer be used as loadable modules, leading to a harmless Kconfig
warning:
arch/arm/configs/imote2_defconfig:60:warning: symbol value 'm' invalid for NF_CT_PROTO_UDPLITE
arch/arm/configs/imote2_defconfig:59:warning: symbol value 'm' invalid for NF_CT_PROTO_SCTP
arch/arm/configs/ezx_defconfig:68:warning: symbol value 'm' invalid for NF_CT_PROTO_UDPLITE
arch/arm/configs/ezx_defconfig:67:warning: symbol value 'm' invalid for NF_CT_PROTO_SCTP
Let's make them built-in.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
More consistent naming for some orion5x based boards helping the
switch to device tree for debian users.
-----BEGIN PGP SIGNATURE-----
iIEEABECAEEWIQQYqXDMF3cvSLY+g9cLBhiOFHI71QUCWJNvuSMcZ3JlZ29yeS5j
bGVtZW50QGZyZWUtZWxlY3Ryb25zLmNvbQAKCRALBhiOFHI71bUAAJ9F2ae0LEvo
4Fu44238w1Kr6hC6LQCeO16PL46cLMTZj2E/hOy/9eytbM0=
=IcBG
-----END PGP SIGNATURE-----
Merge tag 'mvebu-fixes-4.10-1' of git://git.infradead.org/linux-mvebu into fixes
Pull "mvebu fixes for 4.10 (part 1)" from Gregory CLEMENT:
More consistent naming for some orion5x based boards helping the
switch to device tree for debian users.
* tag 'mvebu-fixes-4.10-1' of git://git.infradead.org/linux-mvebu:
ARM: orion5x: fix Makefile for linkstation-lschl.dtb
ARM: dts: orion5x-lschl: More consistent naming on linkstation series
ARM: dts: orion5x-lschl: Fix model name
Back when this was first written, dma_supported() was somewhat of a
murky mess, with subtly different interpretations being relied upon in
various places. The "does device X support DMA to address range Y?"
uses assuming Y to be physical addresses, which motivated the current
iommu_dma_supported() implementation and are alluded to in the comment
therein, have since been cleaned up, leaving only the far less ambiguous
"can device X drive address bits Y" usage internal to DMA API mask
setting. As such, there is no reason to keep a slightly misleading
callback which does nothing but duplicate the current default behaviour;
we already constrain IOVA allocations to the iommu_domain aperture where
necessary, so let's leave DMA mask business to architecture-specific
code where it belongs.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Sometimes (e.g. early boot) a guest is broken in such ways that it loops
100% delivering operation exceptions (illegal operation) but the pgm new
PSW is not set properly. This will result in code being read from
address zero, which usually contains another illegal op. Let's detect
this case and return to userspace. Instead of only detecting
this for address zero apply a heuristic that will work for any program
check new psw.
We do not want guest problem state to be able to trigger a guest panic,
e.g. by faulting on an address that is the same as the program check
new PSW, so we check for the problem state bit being off.
With proper handling in userspace we
a: get rid of CPU consumption of such broken guests
b: keep the program old PSW. This allows to find out the original illegal
operation - making debugging such early boot issues much easier than
with single stepping
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
User controlled KVM guests do not support the dirty log, as they have
no single gmap that we can check for changes.
As they have no single gmap, kvm->arch.gmap is NULL and all further
referencing to it for dirty checking will result in a NULL
dereference.
Let's return -EINVAL if a caller tries to sync dirty logs for a
UCONTROL guest.
Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Cc: <stable@vger.kernel.org> # 3.16+
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use hlist_for_each_entry() in the first loop in the kretprobe
trampoline_handler() function, because it doesn't change the hlist.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/148637493309.19245.12546866092052500584.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove the 'HAVE_KPROBES' dependency from the HAVE_KRETPROBES line,
since HAVE_KPROBES is already selected unconditionally in the Kconfig
line above this one.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David A. Long <dave.long@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/148637486369.19245.316601692744886725.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hardware flow-control capability must be specified at a platform
level in order to inform the ASC driver that the platform is capable
(i.e. are the lines wired up, etc). STiH4{07,10} devices are indeed
capable, so let's provide the property.
Acked-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Having just defined some new Pinctrl groups for when HW flow-control
is {en,dis}abled, let's reference them for use within the driver.
Acked-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each serial port which supports HW flow-control should have 2 Pinctrl
groups. One for when HW flow-control is in progress, where the IP
will take over controlling the lines and another group which enables
the lines to be toggled using GPIO mechanisms.
Acked-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When hardware flow-control is disabled, manual toggling of the UART's
reset line (RTS) using userland applications (e.g. stty) is not
possible, since the ASC IP does not provide this functionality in the
same was as some other IPs do. Thus, we have to do this manually.
This patch configures the UART RTS line as a GPIO for manipulation
within the UART driver when HW flow-control is not enabled.
Acked-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This adds AUX vectors for the L1I,D, L2 and L3 cache levels
providing for each cache level the size of the cache in bytes
and the geometry (line size and number of ways).
We chose to not use the existing alpha/sh definition which
packs all the information in a single entry per cache level as
it is too restricted to represent some of the geometries used
on POWER.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All shipping firmware versions have it wrong in the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Retrieved from device-tree when available
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have two set of identical struct members for the I and D sides
and mostly identical bunches of code to parse the device-tree to
populate them. Instead make a ppc_cache_info structure with one
copy for I and one for D
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It will be used to calculate the associativity
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture, means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).
We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't patch instructions based on the cache lines or block
sizes these days.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The variables are defined twice in setup_32.c and setup_64.c, do it
once in setup-common.c instead
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's an kernel private macro, it doesn't belong there
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After:
a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")
our SMT scheduling topology for Fam17h systems is broken, because
the ThreadId is included in the ApicId when SMT is enabled.
So, without further decoding cpu_core_id is unique for each thread
rather than the same for threads on the same core. This didn't affect
systems with SMT disabled. Make cpu_core_id be what it is defined to be.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")
restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.
Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.
That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.
When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.
Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi codenamed Knights Mill. We
can't guarantee that this (KNM) will be the last CPU model that needs this
hack. But, we do recognize that this is far from optimal, and there is an
effort to ensure we don't keep doing extending this hack forever.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-6-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull irq fixes from Thomas Gleixner:
- Prevent double activation of interrupt lines, which causes problems
on certain interrupt controllers
- Handle the fallout of the above because x86 (ab)uses the activation
function to reconfigure interrupts under the hood.
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Make irq activate operations symmetric
irqdomain: Avoid activating interrupts more than once
Fix a regression that prevented migration between hosts with different
XSAVE features even if the missing features were not used by the guest
(for stable).
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJYlf83AAoJEED/6hsPKofoQI8H/2Y9v5FkIMUeLVPf5nskcomw
pV/IqqMJEQ0sEp0+fkGhk15nykrVpXfOdqgGD8FI9Xk8rlkTEcUSGMGvfXrIk0ir
fzX27ASWrHvyjso+6XZzarSUhMFiBljU+NDcqWgjAeYEA1H+fxtxcomx+KiC1D1H
Q3kYMWTDQ0q/QU0q/4ohVM0gfVIunmVjoJaMK3tlrPP+w4MgMu2WALi0BlZKyugZ
fcVxzgGxPKoxAfXoFHohS7jKhLX9rF8MJoSH2NxInguajpMtf76Jw+YOr10yWtR2
ESY/5JXb4KLE94cwM3XiDghYg2ak/zphTFxBbPHmSxY3nim7QahRyuiMQFr3VN8=
=0UcD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Radim Krčmář:
"Fix a regression that prevented migration between hosts with different
XSAVE features even if the missing features were not used by the guest
(for stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: do not save guest-unsupported XSAVE state
To make the code clearer, use rb_entry() instead of open coding it
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/974a91cd4ed2d04c92e4faa4765077e38f248d6b.1482157956.git.geliangtang@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Exception handlers which may run on IST stack call ist_enter() at the start
of execution and ist_exit() in the end. ist_enter() disables preemption
unconditionally and ist_exit() enables it.
So the extra preempt_disable/enable() pairs nested inside the
ist_enter/exit() regions are pointless and can be removed.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161128075057.7724-1-kuleshovmail@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 8fd524b355 ("x86: Kill bad_dma_address variable") has killed
bad_dma_address variable and used instead of macro DMA_ERROR_CODE
which is always zero. Since dma_addr is unsigned, the statement
dma_addr >= DMA_ERROR_CODE
is always true, and not needed.
arch/x86/kernel/pci-calgary_64.c: In function ‘iommu_free’:
arch/x86/kernel/pci-calgary_64.c:299:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
if (unlikely((dma_addr >= DMA_ERROR_CODE) && (dma_addr < badend))) {
Fixes: 8fd524b355 ("x86: Kill bad_dma_address variable")
Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz>
Cc: iommu@lists.linux-foundation.org
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Link: http://lkml.kernel.org/r/7612c0f9dd7c1290407dbf8e809def922006920b.1479161177.git.npajkovsky@suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights
Landing.
Presence of this feature cannot be detected automatically (by reading any
other MSR) therefore it is required to explicitly check for the family and
model of the CPU before attempting to enable it.
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the
ring 3 MONITOR/MWAIT.
HWCAP variables contain bitmasks which can be used by userspace
applications to detect which instruction sets are supported by CPU. On x86
architecture information about CPU capabilities can be checked via CPUID
instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot
be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is
set to CPUID[1].EDX which means that all bits are reserved there.
HWCAP2 approach was chosen because it reuses existing solution present
in other architectures, so only minor modifications are required to the
kernel and userspace applications. When ELF_HWCAP2 is defined
kernel maps it to AT_HWCAP2 during the start of the application.
This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval()
API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent
with x86 ELF_HWCAP type.
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Define new MSR MISC_FEATURE_ENABLES (0x140).
On supported CPUs if bit 1 of this MSR is set, then calling MONITOR and
MWAIT instructions outside of ring 0 will not cause invalid-opcode
exception.
The MSR MISC_FEATURE_ENABLES is not yet documented in the SDM. Here is the
relevant documentation:
Hex Dec Name Scope
140H 320 MISC_FEATURE_ENABLES Thread
0 Reserved
1 If set to 1, the MONITOR and MWAIT instructions do not
cause invalid-opcode exceptions when executed with CPL > 0
or in virtual-8086 mode. If MWAIT is executed when CPL > 0
or in virtual-8086 mode, and if EAX indicates a C-state
other than C0 or C1, the instruction operates as if EAX
indicated the C-state C1.
63:2 Reserved
Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-2-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This doesn't have any benefit apart from saving a small amount of memory
when it is disabled. The ifdef hackery in the code makes it dirty
unnecessarily.
Clean it up by removing the Kconfig option completely. Few defconfigs
are also updated and CONFIG_CPU_FREQ_STAT_DETAILS is replaced with
CONFIG_CPU_FREQ_STAT now in them, as users wanted stats to be enabled.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The main change is we're reverting the initial stack protector support we
merged this cycle. It turns out to not work on toolchains built with libc
support, and fixing it will be need to wait for another release.
And the rest are all fairly minor:
- Some pasemi machines were not booting due to a missing error check in
prom_find_boot_cpu().
- In EEH we were checking a pointer rather than the bool it pointed to.
- The clang build was broken by a BUILD_BUG_ON() we added.
- The radix (Power9 only) version of map_kernel_page() was broken if our
memory size was a multiple of 2MB, which it generally isn't.
Thanks to:
Darren Stevens, Gavin Shan, Reza Arbab.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYlGHSAAoJEFHr6jzI4aWAVy4QALSpVLv0900hMtn+5MBhhgad
GxHI3ViJUV+K3gDpagl+Ya++WiPd9ffiK9FqS4Xe/r1r2pgSoNU8SC736Jomb1wL
cCx34E73SLMTfnhm/lVq+a4e+nRnk633C8gdsU4Py+FFOfz7FkeQX/gBGbv+ffng
oVO6Susqo/dKirVo6+BCgVjSLdYezzxw31SVNGYUYz4MFxWUHK+bi/l0FIDQGhjD
mTzV2cPhfjYTAHQtY43in6plQ91Pd2CR5HxL3Bw9DnFhS7zFMBnWPYYWh+kdT/vf
wNS1cFJoGHjXCaXt/eXsk3GPC696bDWGSwpenQewTQdb8yJ5MWP7Jv9KeBN8603x
LSE6o9/RATv7+pJJ9UPP+vLwSXtx2bi1FJlOU4Nbxi5YhfXZncCltMANCNuVix1l
2x5Qhs8GJNt7nqxhv5GwvAmEaKZet1Ucgo+p/9D08bfKk8loN+evRx6fjT5CymE7
s1INIbkzxppebTBjawDA7w6kWXjmgrskWvqjrPUfu91Ro8KmeBVS4t0pXY9D/i8N
hCEMbXNL5qrWTRScOxP3k5cQsOPnQEz6F7AkecYjWfwGG9ZvdcztuhTj1qYRjVaF
9Xk7OWu52/EamFdZ5MGuiVIEp8Vbqy8/0IJ/bdTOEMgjsaziD6LyRNZ6Qoyxg+bf
kFEqmzuKgmz3D4aTklnh
=XUAi
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The main change is we're reverting the initial stack protector support
we merged this cycle. It turns out to not work on toolchains built
with libc support, and fixing it will be need to wait for another
release.
And the rest are all fairly minor:
- Some pasemi machines were not booting due to a missing error check
in prom_find_boot_cpu()
- In EEH we were checking a pointer rather than the bool it pointed
to
- The clang build was broken by a BUILD_BUG_ON() we added.
- The radix (Power9 only) version of map_kernel_page() was broken if
our memory size was a multiple of 2MB, which it generally isn't
Thanks to: Darren Stevens, Gavin Shan, Reza Arbab"
* tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Use the correct pointer when setting a 2MB pte
powerpc: Fix build failure with clang due to BUILD_BUG_ON()
powerpc: Revert the initial stack protector support
powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe()
powerpc: Add missing error check to prom_find_boot_cpu()
Use builtin_platform_driver() helper to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds a Qualcomm specific quirk to the arm_smccc_smc call.
On Qualcomm ARM64 platforms, the SMC call can return before it has
completed. If this occurs, the call can be restarted, but it requires
using the returned session ID value from the interrupted SMC call.
The quirk stores off the session ID from the interrupted call in the
quirk structure so that it can be used by the caller.
This patch folds in a fix given by Sricharan R:
https://lkml.org/lkml/2016/9/28/272
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds a quirk parameter to the arm_smccc_(smc/hvc) calls.
The quirk structure allows for specialized SMC operations due to SoC
specific requirements. The current arm_smccc_(smc/hvc) is renamed and
macros are used instead to specify the standard arm_smccc_(smc/hvc) or
the arm_smccc_(smc/hvc)_quirk function.
This patch and partial implementation was suggested by Will Deacon.
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Saving unsupported state prevents migration when the new host does not
support a XSAVE feature of the original host, even if the feature is not
exposed to the guest.
We've masked host features with guest-visible features before, with
4344ee981e ("KVM: x86: only copy XSAVE state for the supported
features") and dropped it when implementing XSAVES. Do it again.
Fixes: df1daba7d1 ("KVM: x86: support XSAVES usage in the host")
Cc: stable@vger.kernel.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.
This has a couple of downsides:
- Given that the CRCs are treated as memory addresses, we waste 4 bytes
for each CRC on 64 bit architectures,
- On architectures that support runtime relocation, a R_<arch>_RELATIVE
relocation entry is emitted for each CRC value, which identifies it
as a quantity that requires fixing up based on the actual runtime
load offset of the kernel. This results in corrupted CRCs unless we
explicitly undo the fixup (and this is currently being handled in the
core module code)
- Such runtime relocation entries take up 24 bytes of __init space
each, resulting in a x8 overhead in [uncompressed] kernel size for
CRCs.
Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset. Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant. Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.
So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff). To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.
Note that this mostly reverts commit d4703aefdb ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building with debugging symbols, take the absolute path to the
vmlinux binary and add it to the special PE/COFF debug table entry.
This allows a debug EFI build to find the vmlinux binary, which is
very helpful in debugging, given that the offset where the Image is
first loaded by EFI is highly unpredictable.
On implementations of UEFI that choose to implement it, this
information is exposed via the EFI debug support table, which is a UEFI
configuration table that is accessible both by the firmware at boot time
and by the OS at runtime, and lists all PE/COFF images loaded by the
system.
The format of the NB10 Codeview entry is based on the definition used
by EDK2, which is our primary reference when it comes to the use of
PE/COFF in the context of UEFI firmware.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: use realpath instead of shell invocation, as discussed on list]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Increase the maximum number of MIPS KVM VCPUs to 8, and implement the
KVM_CAP_NR_VCPUS and KVM_CAP_MAX_CPUS capabilities which expose the
recommended and maximum number of VCPUs to userland. The previous
maximum of 1 didn't allow for any form of SMP guests.
We calculate the values similarly to ARM, recommending as many VCPUs as
there are CPUs online in the system. This will allow userland to know
how many VCPUs it is possible to create.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Expose the CP0_IntCtl register through the KVM register access API,
which is a required register since MIPS32r2. It is currently read-only
since the VS field isn't implemented due to lack of Config3.VInt or
Config3.VEIC.
It is implemented in trap_emul.c so that a VZ implementation can allow
writes.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Expose the CP0_EntryLo0 and CP0_EntryLo1 registers through the KVM
register access API. This is fairly straightforward for trap & emulate
since we don't support the RI and XI bits. For the sake of future
proofing (particularly for VZ) it is explicitly specified that the API
always exposes the 64-bit version of these registers (i.e. with the RI
and XI bits in bit positions 63 and 62 respectively), and they are
implemented in trap_emul.c rather than mips.c to allow them to be
implemented differently for VZ.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Set the default VCPU state closer to the architectural reset state, with
PC pointing at the reset vector (uncached PA 0x1fc00000, which for KVM
T&E is VA 0x5fc00000), and with CP0_Status.BEV and CP0_Status.ERL to 1.
Although QEMU at least will overwrite this state, it makes sense to do
this now that CP0_EBase is properly implemented to check BEV, and now
that we support a sparse GPA layout potentially with a boot ROM at GPA
0x1fc00000.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The CP0_EBase register is a standard feature of MIPS32r2, so we should
always have been implementing it properly. However the register value
was ignored and wasn't exposed to userland.
Fix the emulation of exceptions and interrupts to use the value stored
in guest CP0_EBase, and fix the masks so that the top 3 bits (rather
than the standard 2) are fixed, so that it is always in the guest KSeg0
segment.
Also add CP0_EBASE to the KVM one_reg interface so it can be accessed by
userland, also allowing the CPU number field to be written (which isn't
permitted by the guest).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Access to various CP0 registers via the KVM register access API needs to
be implementation specific to allow restrictions to be made on changes,
for example when VZ guest registers aren't present, so move them all
into trap_emul.c in preparation for VZ.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that load/store faults due to read only memory regions are treated
as MMIO accesses it is safe to claim support for read only memory
regions (KVM_CAP_READONLY_MEM).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the SYNC_MMU capability for KVM MIPS, allowing changes in the
underlying user host virtual address (HVA) mappings to be promptly
reflected in the corresponding guest physical address (GPA) mappings.
This allows for several features to work with guest RAM which require
mappings to be altered or protected, such as copy-on-write, KSM (Kernel
Samepage Merging), idle page tracking, memory swapping, and guest memory
ballooning.
There are two main aspects of this change, described below.
The KVM MMU notifier architecture callbacks are implemented so we can be
notified of changes in the HVA mappings. These arrange for the guest
physical address (GPA) page tables to be modified and possibly for
derived mappings (GVA page tables and TLBs) to be flushed.
- kvm_unmap_hva[_range]() - These deal with HVA mappings being removed,
for example before a copy-on-write takes place, which requires the
corresponding GPA page table mappings to be removed too.
- kvm_set_spte_hva() - These update a GPA page table entry to match the
new HVA entry, but must be careful to respect KVM specific
configuration such as not dirtying a clean guest page which is dirty
to the host, and write protecting writable pages in read only
memslots (which will soon be supported).
- kvm[_test]_age_hva() - These update GPA page table entries to be old
(invalid) so that access can be tracked, making them young again.
The GPA page fault handling (kvm_mips_map_page) is updated to use
gfn_to_pfn_prot() (which may provide read-only pages), to handle
asynchronous page table invalidation from MMU notifier callbacks, and to
handle more cases in the fast path.
- mmu_notifier_seq is used to detect asynchronous page table
invalidations while we're holding a pfn from gfn_to_pfn_prot()
outside of kvm->mmu_lock, retrying if invalidations have taken place,
e.g. a COW or a KSM page merge.
- The fast path (_kvm_mips_map_page_fast) now handles marking old pages
as young / accessed, and disallowing dirtying of clean pages that
aren't actually writable (e.g. shared pages that should COW, and
read-only memory regions when they are enabled in a future patch).
- Due to the use of MMU notifications we no longer need to keep the
page references after we've updated the GPA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Propagate the GPA PTE protection bits on to the GVA PTEs on a mapped
fault (except _PAGE_WRITE, and filtered by the guest TLB entry), rather
than always overriding the protection. This allows dirty page tracking
to work in mapped guest segments as a clear dirty bit in the GPA PTE
will propagate to the GVA PTEs even when the guest TLB has the dirty bit
set.
Since the filtering of protection bits is now abstracted, if the buddy
GVA PTE is also valid, we obtain the corresponding GPA PTE using a
simple non-allocating walk and load that into the GVA PTE similarly
(which may itself be invalid).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Propagate the GPA PTE protection bits on to the GVA PTEs on a KSeg0
fault (except _PAGE_WRITE), rather than always overriding the
protection. This allows dirty page tracking to work in KSeg0 as a clear
dirty bit in the GPA PTE will propagate to the GVA PTEs.
This makes it simpler to use a single kvm_mips_map_page() to obtain both
the main GPA PTE and its buddy (which may be invalid), which also allows
memory regions to be fully accessible when they don't start and end on a
2*PAGE_SIZE boundary.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Update kvm_mips_map_page() to handle logging of dirty guest physical
pages. Upcoming patches will propagate the dirty bit to the GVA page
tables.
A fast path is added for handling protection bits that can be resolved
without calling into KVM, currently just dirtying of clean pages being
written to.
The slow path marks the GPA page table entry writable only on writes,
and at the same time marks the page dirty in the dirty page logging
bitmask.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When an existing memory region has dirty page logging enabled, make the
entire slot clean (read only) so that writes will immediately start
logging dirty pages (once the dirty bit is transferred from GPA to GVA
page tables in an upcoming patch).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS hasn't up to this point properly supported dirty page logging, as
pages in slots with dirty logging enabled aren't made clean, and tlbmod
exceptions from writes to clean pages have been assumed to be due to
guest TLB protection and unconditionally passed to the guest.
Use the generic dirty logging helper kvm_get_dirty_log_protect() to
properly implement kvm_vm_ioctl_get_dirty_log(), similar to how ARM
does. This uses xchg to clear the dirty bits when reading them, rather
than wiping them out afterwards with a memset, which would potentially
wipe recently set bits that weren't caught by kvm_get_dirty_log(). It
also makes the pages clean again using the
kvm_arch_mmu_enable_log_dirty_pt_masked() architecture callback so that
further writes after the shadow memslot is flushed will trigger tlbmod
exceptions and dirty handling.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add a helper function to make a range of guest physical address (GPA)
mappings in the GPA page table clean so that writes can be caught. This
will be used in a few places to manage dirty page logging.
Note that until the dirty bit is transferred from GPA page table entries
to GVA page table entries in an upcoming patch this won't trigger a TLB
modified exception on write.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Rewrite TLB modified exception handling to handle read only GPA memory
regions, instead of unconditionally passing the exception to the guest.
If the guest TLB is not the cause of the exception we call into the
normal TLB fault handling depending on the memory segment, which will
soon attempt to remap the physical page to be writable (handling dirty
page tracking or copy on write in the process).
Failing that we fall back to treating it as MMIO, due to a read only
memory region. Once the capability is enabled, this will allow read only
memory regions (such as the Malta boot flash as emulated by QEMU) to
have writes treated as MMIO, while still allowing reads to run
untrapped.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Treat unhandled accesses to guest KSeg0 as MMIO, rather than only host
KSeg0 addresses. This will allow read only memory regions (such as the
Malta boot flash as emulated by QEMU) to have writes (before reads)
treated as MMIO, and unallocated physical addresses to have all accesses
treated as MMIO.
The MMIO emulation uses the gva_to_gpa callback, so this is also updated
for trap & emulate to handle guest KSeg0 addresses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Abstract the handling of bad guest loads and stores which may need to
trigger an MMIO, so that the same code can be used in a later patch for
guest KSeg0 addresses (TLB exception handling) as well as for host KSeg1
addresses (existing address error exception and TLB exception handling).
We now use kvm_mips_emulate_store() and kvm_mips_emulate_load() directly
rather than the more generic kvm_mips_emulate_inst(), as there is no
need to expose emulation of any other instructions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
kvm_mips_map_page() will need to know whether the fault was due to a
read or a write in order to support dirty page tracking,
KVM_CAP_SYNC_MMU, and read only memory regions, so get that information
passed down to it via new bool write_fault arguments to various
functions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Ignore userland writes to CP0_Config7 rather than reporting an error,
since we do allow reads of this register and it is claimed to exist in
the ioctl API.
This allows userland to blindly save and restore KVM registers without
having to special case certain registers as not being writable, for
example during live migration once dirty page logging is fixed.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the kvm_arch_flush_shadow_all() and
kvm_arch_flush_shadow_memslot() KVM functions for MIPS to allow guest
physical mappings to be safely changed.
The general MIPS KVM code takes care of flushing of GPA page table
entries. kvm_arch_flush_shadow_all() flushes the whole GPA page table,
and is always called on the cleanup path so there is no need to acquire
the kvm->mmu_lock. kvm_arch_flush_shadow_memslot() flushes only the
range of mappings in the GPA page table corresponding to the slot being
flushed, and happens when memory regions are moved or deleted.
MIPS KVM implementation callbacks are added for handling the
implementation specific flushing of mappings derived from the GPA page
tables. These are implemented for trap_emul.c using
kvm_flush_remote_tlbs() which should now be functional, and will flush
the per-VCPU GVA page tables and ASIDS synchronously (before next
entering guest mode or directly accessing GVA space).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the reading of guest
instructions for emulation. This will allow it to handle asynchronous
TLB flushes when they are implemented.
This is a little more complicated than the other two cases (get_inst()
and dynamic translation) due to the need to emulate the appropriate
guest TLB exception when the address isn't present or isn't valid in the
guest TLB.
Since there are several protected cache ops that may need to be
performed safely, this is abstracted by kvm_mips_guest_cache_op() which
is passed a protected cache op function pointer and takes care of the
lockless operation and fault handling / retry if the op should fail,
taking advantage of the new errors which the protected cache ops can now
return. This allows the existing advance fault handling which relied on
host TLB lookups to be removed, along with the now unused
kvm_mips_host_tlb_lookup(),
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the reading of guest
instructions for emulation. This will allow it to handle asynchronous
TLB flushes when they are implemented.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the dynamic translation of
guest instructions. This will allow it to handle asynchronous TLB
flushes when they are implemented.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add helpers to allow for lockless direct access to the GVA space, by
changing the VCPU mode to READING_SHADOW_PAGE_TABLES for the duration of
the access. This allows asynchronous TLB flush requests in future
patches to safely trigger either a TLB flush before the direct GVA space
access, or a delay until the in-progress lockless direct access is
complete.
The kvm_trap_emul_gva_lockless_begin() and
kvm_trap_emul_gva_lockless_end() helpers take care of guarding the
direct GVA accesses, and kvm_trap_emul_gva_fault() tries to handle a
uaccess fault resulting from a flush having taken place.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The stale ASID checks taking place on VCPU load can be reduced:
- Now that we check for a stale ASID on guest re-entry, there is no need
to do so when loading the VCPU outside of guest context, since it will
happen before entering the guest. Note that a lot of KVM VCPU ioctls
will cause the VCPU to be loaded but guest context won't be entered.
- There is no need to check for a stale kernel_mm ASID when the guest is
in user mode and vice versa. In fact doing so can potentially be
problematic since the user_mm ASID regeneration may trigger a new ASID
cycle, which would cause the kern_mm ASID to become stale after it has
been checked for staleness.
Therefore only check the ASID for the mm corresponding to the current
guest mode, and only if we're already in guest context. We drop some of
the related kvm_debug() calls here too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add handling of TLB invalidation requests before entering guest mode.
This will allow asynchonous invalidation of the VCPU mappings when
physical memory regions are altered. Should the CPU running the VCPU
already be in guest mode an IPI will be sent to trigger a guest exit.
The reload_asid path will be used in a future patch for when GVA is
about to be directly accessed by KVM.
In the process, the stale user ASID check in the re-entry path (for lazy
user GVA flushing) is generalised to check the ASID for the current
guest mode, in case a TLB invalidation request was handled. This has the
side effect of making the ASID checks on vcpu_load too conservative,
which will be addressed in a later patch.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Keep the vcpu->mode and vcpu->cpu variables up to date so that
kvm_make_all_cpus_request() has a chance of functioning correctly. This
will soon need to be used for kvm_flush_remote_tlbs().
We can easily update vcpu->cpu when the VCPU context is loaded or saved,
which will happen when accessing guest context and when the guest is
scheduled in and out.
We need to be a little careful with vcpu->mode though, as we will in
future be checking for outstanding VCPU requests, and this must be done
after the value of IN_GUEST_MODE in vcpu->mode is visible to other CPUs.
Otherwise the other CPU could fail to trigger an IPI to wait for
completion dispite the VCPU request not being seen.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Current guest physical memory is mapped to host physical addresses using
a single linear array (guest_pmap of length guest_pmap_npages). This was
only really meant to be temporary, and isn't sparse, so its wasteful of
memory. A small amount of RAM at GPA 0 and a small boot exception vector
at GPA 0x1fc00000 cannot be represented without a full 128KiB guest_pmap
allocation (MIPS32 with 16KiB pages), which is one reason why QEMU
currently runs its boot code at the top of RAM instead of the usual boot
exception vector address.
Instead use the existing infrastructure for host virtual page table
management to allocate a page table for guest physical memory too. This
should be sufficient for now, assuming the size of physical memory
doesn't exceed the size of virtual memory. It may need extending in
future to handle XPA (eXtended Physical Addressing) in 32-bit guests, as
supported by VZ guests on P5600.
Some of this code is based loosely on Cavium's VZ KVM implementation.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When exiting from the guest, store the values of the CP0_BadInstr and
CP0_BadInstrP registers if they exist, which contain the encodings of
the instructions which caused the last synchronous exception.
When the instruction is needed for emulation, kvm_get_badinstr() and
kvm_get_badinstrp() are used instead of calling kvm_get_inst() directly,
to decide whether to read the saved CP0_BadInstr/CP0_BadInstrP registers
(if they exist), or read the instruction from memory (if not).
The use of these registers should be more robust than using
kvm_get_inst(), as it actually gives the instruction encoding seen by
the hardware rather than relying on user accessors after the fact, which
can be fooled by incoherent icache or a racing code modification. It
will also work with VZ, where the guest virtual memory isn't directly
accessible by the host with user accessors.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Currently kvm_get_inst() returns KVM_INVALID_INST in the event of a
fault reading the guest instruction. This has the rather arbitrary magic
value 0xdeadbeef. This API isn't very robust, and in fact 0xdeadbeef is
a valid MIPS64 instruction encoding, namely "ld t1,-16657(s5)".
Therefore change the kvm_get_inst() API to return 0 or -EFAULT, and to
return the instruction via a u32 *out argument. We can then drop the
KVM_INVALID_INST definition entirely.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
In order to make use of the CP0_BadInstr & CP0_BadInstrP registers we
need to be a bit more careful not to treat code fetch faults as MMIO,
lest we hit an UNPREDICTABLE register value when we try to emulate the
MMIO load instruction but there was no valid instruction word available
to the hardware.
Add a kvm_is_ifetch_fault() helper to try to figure out whether a load
fault was due to a code fetch, and prevent MMIO instruction emulation in
that case.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS KVM uses its own variation of get_new_mmu_context() which takes an
extra vcpu pointer (unused) and does exactly the same thing.
Switch to just using get_new_mmu_context() directly and drop KVM's
version of it as it doesn't really serve any purpose.
The nearby declarations of kvm_mips_alloc_new_mmu_context(),
kvm_mips_vcpu_load() and kvm_mips_vcpu_put() are also removed from
kvm_host.h, as no definitions or users exist.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When exceptions are injected into the MIPS KVM guest, the whole host TLB
is flushed (except any entries in the guest KSeg0 range). This is
certainly not mandated by the architecture when exceptions are taken
(userland can't directly change TLB mappings anyway), and is a pretty
heavyweight operation:
- There may be hundreds of TLB entries especially when a 512 entry FTLB
is present. These are walked and read and conditionally invalidated,
so the TLBINV feature can't be used either.
- It'll indiscriminately wipe out entries belonging to other memory
spaces. A simple ASID regeneration would be much faster to perform,
although it'd wipe out the guest KSeg0 mappings too.
My suspicion is that this was simply to plaster over the fact that
kvm_mips_host_tlb_inv() incorrectly only invalidated TLB entries in the
ASID for guest usermode, and not the ASID for guest kernelmode.
Now that the recent commit "KVM: MIPS/TLB: Flush host TLB entry in
kernel ASID" fixes kvm_mips_host_tlb_inv() to flush TLB entries in the
kernelmode ASID when the guest TLB changes, lets drop these calls and
the otherwise unused kvm_mips_flush_host_tlb().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that KVM no longer uses wired entries we can safely use
local_flush_tlb_all() when we need to flush the entire TLB (on the start
of a new ASID cycle). This doesn't flush wired entries, which allows
other code to use them without KVM clobbering them all the time. It also
is more up to date, knowing about the tlbinv architectural feature,
flushing of micro TLB on cores where that is necessary (Loongson I
believe), and knows to stop the HTW while doing so.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use protected_writeback_dcache_line() instead of flush_dcache_line(),
and protected_flush_icache_line() instead of flush_icache_line(), so
that CACHEE (the EVA variant) is used on EVA host kernels.
Without this, guest floating point branch delay slot emulation via a
trampoline on the user stack fails on EVA host kernels due to failure of
the icache sync, resulting in the break instruction getting skipped and
execution from the stack.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that we have GVA page tables, use standard user accesses with page
faults disabled to read & modify guest instructions. This should be more
robust (than the rather dodgy method of accessing guest mapped segments
by just directly addressing them) and will also work with Enhanced
Virtual Addressing (EVA) host kernel configurations where dedicated
instructions are needed for accessing user mode memory.
For simplicity and speed we do this regardless of the guest segment the
address resides in, rather than handling guest KSeg0 specially with
kmap_atomic() as before.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that the commpage doesn't use wired TLB entries, the per-CPU
vm_init() callback is the only work done by kvm_mips_init_vm_percpu().
The trap & emulate implementation doesn't actually need to do anything
from vm_init(), and the future VZ implementation would be better served
by a kvm_arch_hardware_enable callback anyway.
Therefore drop the vm_init() callback entirely, allowing the
kvm_mips_init_vm_percpu() function to also be dropped, along with the
kvm_mips_instance atomic counter.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that we have GVA page tables and an optimised TLB refill handler in
place, convert the handling of commpage faults from the guest kernel to
fill the GVA page table and invalidate the TLB entry, rather than
filling the wired TLB entry directly.
For simplicity we no longer use a wired entry for the commpage (refill
should be much cheaper with the fast-path handler anyway). Since we
don't need to manipulate the TLB directly any longer, move the function
from tlb.c to mmu.c. This puts it closer to the similar functions
handling KSeg0 and TLB mapped page faults from the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that we have GVA page tables and an optimised TLB refill handler in
place, convert the handling of page faults in TLB mapped segment from
the guest to fill a single GVA page table entry and invalidate the TLB
entry, rather than filling a TLB entry pair directly.
Also remove the now unused kvm_mips_get_{kernel,user}_asid() functions
in mmu.c and kvm_mips_host_tlb_write() in tlb.c.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that we have GVA page tables and an optimised TLB refill handler in
place, convert the handling of KSeg0 page faults from the guest to fill
the GVA page tables and invalidate the TLB entry, rather than filling a
TLB entry directly.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement invalidation of specific pairs of GVA page table entries in
one or both of the GVA page tables. This is used when existing mappings
are replaced in the guest TLB by emulated TLBWI/TLBWR instructions. Due
to the sharing of page tables in the host kernel range, we should be
careful not to allow host pages to be invalidated.
Add a helper kvm_mips_walk_pgd() which can be used when walking of
either GPA (future patches) or GVA page tables is needed, optionally
with allocation of page tables along the way when they don't exist.
GPA page table walking will need to be protected by the kvm->mmu_lock,
so we also add a small MMU page cache in each KVM VCPU, like that found
for other architectures but smaller. This allows enough pages to be
pre-allocated to handle a single fault without holding the lock,
allowing the helper to run with the lock held without having to handle
allocation failures.
Using the same mechanism for GVA allows the same code to be used, and
allows it to use the same cache of allocated pages if the GPA walk
didn't need to allocate any new tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement invalidation of large ranges of virtual addresses from GVA
page tables in response to a guest ASID change (immediately for guest
kernel page table, lazily for guest user page table).
We iterate through a range of page tables invalidating entries and
freeing fully invalidated tables. To minimise overhead the exact ranges
invalidated depends on the flags argument to kvm_mips_flush_gva_pt(),
which also allows it to be used in future KVM_CAP_SYNC_MMU patches in
response to GPA changes, which unlike guest TLB mapping changes affects
guest KSeg0 mappings.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Refactor kvm_mips_host_tlb_inv() to also be able to invalidate any
matching TLB entry in the kernel ASID rather than assuming only the TLB
entries in the user ASID can change. Two new bool user/kernel arguments
allow the caller to indicate whether the mapping should affect each of
the ASIDs for guest user/kernel mode.
- kvm_mips_invalidate_guest_tlb() (used by TLBWI/TLBWR emulation) can
now invalidate any corresponding TLB entry in both the kernel ASID
(guest kernel may have accessed any guest mapping), and the user ASID
if the entry being replaced is in guest USeg (where guest user may
also have accessed it).
- The tlbmod fault handler (and the KSeg0 / TLB mapped / commpage fault
handlers in later patches) can now invalidate the corresponding TLB
entry in whichever ASID is currently active, since only a single page
table will have been updated anyway.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
kvm_mips_host_tlb_inv() uses the TLBP instruction to probe the host TLB
for an entry matching the given guest virtual address, and determines
whether a match was found based on whether CP0_Index > 0. This is
technically incorrect as an index of 0 (with the high bit clear) is a
perfectly valid TLB index.
This is harmless at the moment due to the use of at least 1 wired TLB
entry for the KVM commpage, however we will soon be ridding ourselves of
that particular wired entry so lets fix the condition in case the entry
needing invalidation does land at TLB index 0.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use functions from the general MIPS TLB exception vector generation code
(tlbex.c) to construct a fast path TLB refill handler similar to the
general one, but cut down and capable of preserving K0 and K1.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
tlbex.c uses the implementation dependent $22 CP0 register group on
NetLogic cores, with the help of the c0_kscratch() helper. Allow these
registers to be allocated by the KVM entry code too instead of assuming
KScratch registers are all $31, which will also allow pgd_reg to be
handled since it is allocated that way.
We also drop the masking of kscratch_mask with 0xfc, as it is redundant
for the standard KScratch registers (Config4.KScrExist won't have the
low 2 bits set anyway), and apparently not necessary for NetLogic.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Activate the GVA page tables when in guest context. This will allow the
normal Linux TLB refill handler to fill from it when guest memory is
read, as well as preventing accidental reading from user memory.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Allocate GVA -> HPA page tables for guest kernel and guest user mode on
each VCPU, to allow for fast path TLB refill handling to be added later.
In the process kvm_arch_vcpu_init() needs updating to pass on any error
from the vcpu_init() callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Wire up a vcpu uninit implementation callback. This will be used for the
clean up of GVA->HPA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Set init_mm as the active_mm and update mm_cpumask(current->mm) to
reflect that it isn't active when in guest context. This prevents cache
management code from attempting cache flushes on host virtual addresses
while in guest context, for example due to a cache management IPIs or
later when writing of dynamically translated code hits copy on write.
We do this using helpers in static kernel code to avoid having to export
init_mm to modules.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
We only need the guest ASID loaded while in guest context, i.e. while
running guest code and while handling guest exits. We load the guest
ASID when entering the guest, however we restore the host ASID later
than necessary, when the VCPU state is saved i.e. vcpu_put() or slightly
earlier if preempted after returning to the host.
This mismatch is both unpleasant and causes redundant host ASID restores
in kvm_trap_emul_vcpu_put(). Lets explicitly restore the host ASID when
returning to the host, and don't bother restoring the host ASID on
context switch in unless we're already in guest context.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add implementation callbacks for entering the guest (vcpu_run()) and
reentering the guest (vcpu_reenter()), allowing implementation specific
operations to be performed before entering the guest or after returning
to the host without cluttering kvm_arch_vcpu_ioctl_run().
This allows the T&E specific lazy user GVA flush to be moved into
trap_emul.c, along with disabling of the HTW. We also move
kvm_mips_deliver_interrupts() as VZ will need to restore the guest timer
state prior to delivering interrupts.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The kvm_vcpu_arch structure contains both mm_structs for allocating MMU
contexts (primarily the ASID) but it also copies the resulting ASIDs
into guest_{user,kernel}_asid[] arrays which are referenced from uasm
generated code.
This duplication doesn't seem to serve any purpose, and it gets in the
way of generalising the ASID handling across guest kernel/user modes, so
lets just extract the ASID straight out of the mm_struct on demand, and
in fact there are convenient cpu_context() and cpu_asid() macros for
doing so.
To reduce the verbosity of this code we do also add kern_mm and user_mm
local variables where the kernel and user mm_structs are used.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The MIPS KVM host and guest GVA ASIDs may need regenerating when
scheduling a process in guest context, which is done from the
kvm_arch_vcpu_load() / kvm_arch_vcpu_put() functions in mmu.c.
However this is a fairly implementation specific detail. VZ for example
may use GuestIDs instead of normal ASIDs to distinguish mappings
belonging to different guests, and even on VZ without GuestID the root
TLB will be used differently to trap & emulate.
Trap & emulate GVA ASIDs only relate to the user part of the full
address space, so can be left active during guest exit handling (guest
context) to allow guest instructions to be easily read and translated.
VZ root ASIDs however are for GPA mappings so can't be left active
during normal kernel code. They also aren't useful for accessing guest
virtual memory, and we should have CP0_BadInstr[P] registers available
to provide encodings of trapping guest instructions anyway.
Therefore move the ASID preemption handling into the implementation
callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Convert the get_regs() and set_regs() callbacks to vcpu_load() and
vcpu_put(), which provide a cpu argument and more closely match the
kvm_arch_vcpu_load() / kvm_arch_vcpu_put() that they are called by.
This is in preparation for moving ASID management into the
implementations.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
KVM T&E uses an ASID for guest kernel mode and an ASID for guest user
mode. The current ASID is saved when the guest is scheduled out, and
restored when scheduling back in, with checks for whether the ASID needs
to be regenerated.
This isn't really necessary as the ASID can be easily determined by the
current guest mode, so lets simplify it to just read the required ASID
from guest_kernel_asid or guest_user_asid even if the ASID hasn't been
regenerated.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS incompletely implements the KVM_NMI ioctl to supposedly perform a
CPU reset, but all it actually does is invalidate the ASIDs. It doesn't
expose the KVM_CAP_USER_NMI capability which is supposed to indicate the
presence of the KVM_NMI ioctl, and no user software actually uses it on
MIPS.
Since this is dead code that would technically need updating for GVA
page table handling in upcoming patches, remove it now. If we wanted to
implement NMI injection later it can always be done properly along with
the KVM_CAP_USER_NMI capability, and if we wanted to implement a proper
CPU reset it would be better done with a separate ioctl.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Merge in MIPS prerequisites from GVA page tables and GPA page tables
series. The same branch can also merge into the MIPS tree.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
The protected cache ops contain no out of line fixup code to return an
error code in the event of a fault, with the cache op being skipped in
that case. For KVM however we'd like to detect this case as page
faulting will be disabled so it could happen during normal operation if
the GVA page tables were flushed, and need to be handled by the caller.
Add the out-of-line fixup code to load the error value -EFAULT into the
return variable, and adapt the protected cache line functions to pass
the error back to the caller.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Export to TLB exception code generating functions so that KVM can
construct a fast TLB refill handler for guest context without
reinventing the wheel quite so much.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add include guards in asm/uasm.h to allow it to be safely used by a new
header asm/tlbex.h in the next patch to expose TLB exception building
functions for KVM to use.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Export pmd_init(), invalid_pmd_table and tlbmiss_handler_setup_pgd to
GPL kernel modules so that MIPS KVM can use the inline page table
management functions and switch between page tables:
- pmd_init() will be used directly by KVM to initialise newly allocated
pmd tables with invalid lower level table pointers.
- invalid_pmd_table is used by pud_present(), pud_none(), and
pud_clear(), which KVM will use to test and clear pud entries.
- tlbmiss_handler_setup_pgd() will be called by KVM entry code to switch
to the appropriate GVA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Enable support for GCC plugins on powerpc.
Add an additional version check in gcc-plugins-check to advise users to
upgrade to gcc 5.2+ on powerpc to avoid issues with header files (gcc <=
4.6) or missing copies of rs6000-cpus.def (4.8 to 5.1 on 64-bit
targets).
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 38addce8b6 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover
prom_init.o, but ended up saying init.o (which doesn't exist) instead.
Fix the typo.
Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load. However, the pcbc module may not
be present for reasons such as its absence on initramfs. This patch
allows the aesni to function even if the pcbc module is enabled but
not present.
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two microcode loader fixes
- two FPU xstate handling fixes
- an MCE timer handling related crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make timer handling more robust
x86/microcode: Do not access the initrd after it has been freed
x86/fpu/xstate: Fix xcomp_bv in XSAVES header
x86/fpu: Set the xcomp_bv when we fake up a XSAVES area
x86/microcode/intel: Drop stashed AP patch pointer optimization
Pull perf fixes from Ingo Molnar:
"Five kernel fixes:
- an mmap tracing ABI fix for certain mappings
- a use-after-free fix, found via KASAN
- three CPU hotplug related x86 PMU driver fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Make package handling more robust
perf/x86/intel/uncore: Clean up hotplug conversion fallout
perf/x86/intel/rapl: Make package handling more robust
perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory
perf/core: Fix use-after-free bug
Pull EFI fixes from Ingo Molnar:
"Two EFI boot fixes, one for arm64 and one for x86 systems with certain
firmware versions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/fdt: Avoid FDT manipulation after ExitBootServices()
x86/efi: Always map the first physical page into the EFI pagetables
- fix noMMU build on cores with MMU.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYk4VkAAoJEFH5zJH4P6BEdxUP/RQRkWmkoJCf5UI7MQOiU1LS
HuT9U4trDyUUPGRYi1oUgSZMjM2IqZZvmUqSQOq7i2Uo52UffTWMBCrniye+CPk9
XLEiAoiD5JCObuDKYQTtpG5BQUqpGsCdV1ZGtpcJofZIef8sxOVvBuyzD+Tr4ok+
OvA4/X20nEX3fp4R2EyEMZg9re+GeJ8vgY/71jAH3bw4SwJ6znsRgNDM2hP7ZaYq
yZfqcwpqzfj8Jrx6Nfq9ibM37iVEam3OoIGJbYFpvF+uLsOyRPMABc7d6oJk2cTo
hFKfu9TlH7vNZOhh812KLqcJSeLf4TxNAUZqKLrCsZ860QnCZoI63911FxwZX3sU
gXYzu4K+c+GhtB48w5UlRlKlHRTqu55MOC+5Pk/lYpm7DXXmgS0cjfRpBvYDFVfH
QIcZ6isGht0TKbLcE79nxnLf1miVajQtSYhtVjND93TwB7ryURLlMdriI7nSYF2M
RksL8m9di1fsZyIHQTaBOlNc9taxnxWJr+2oSJAzJ7f9QSYC4FdYTRwDujx4mPoe
u3w6ED466zPLZzuZeg5qZbZp4s4DGkh9IKHCwZfadmCVtneMlDRV3JRnPvhE740M
+syyJMR8HtLfRMto7fkrkUcK7+pV2jTRTwKk06QIoRlTIwR4uKjO6kVdizNwXQPD
AqmAMqT0YBntVmDgM6sR
=qIDo
-----END PGP SIGNATURE-----
Merge tag 'xtensa-20170202' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fix from Max Filippov:
"A for an Xtensa build error introduced in reset code refactoring
series in v4.9:
- fix noMMU build on cores with MMU"
* tag 'xtensa-20170202' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix noMMU build on cores with MMU
We recently discovered that __raw_read_system_reg() erroneously mapped
sysreg IDs to the wrong registers.
To ensure that we don't get hit by a similar issue in future, this patch
makes __raw_read_system_reg() use a macro for each case statement,
ensuring that each case reads the correct register.
To ensure that this patch hasn't introduced an issue, I've binary-diffed
the object files before and after this patch. No code or data sections
differ (though some debug section differ due to line numbering
changing).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since it was introduced in commit da8d02d19f ("arm64/capabilities:
Make use of system wide safe value"), __raw_read_system_reg() has
erroneously mapped some sysreg IDs to other registers.
For the fields in ID_ISAR5_EL1, our local feature detection will be
erroneous. We may spuriously detect that a feature is uniformly
supported, or may fail to detect when it actually is, meaning some
compat hwcaps may be erroneous (or not enforced upon hotplug).
This patch corrects the erroneous entries.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: da8d02d19f ("arm64/capabilities: Make use of system wide safe value")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPE buffer is virtually addressed, using the page tables of the CPU
MMU. Unusually, this means that the EL0/1 page table may be live whilst
we're executing at EL2 on non-VHE configurations. When VHE is in use,
we can use the same property to profile the guest behind its back.
This patch adds the relevant disabling and flushing code to KVM so that
the host can make use of SPE without corrupting guest memory, and any
attempts by a guest to use SPE will result in a trap.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Instead of open-coding the loop, let's use canned macro.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The rename of orion5x-lschl.dts needs to be reflected in the Makefile:
make[3]: *** No rule to make target 'arch/arm/boot/dts/orion5x-lschl.dtb', needed by '__build'.
Fixes: 6cfd3cd8d8 ("ARM: dts: orion5x-lschl: More consistent naming on linkstation series")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
pgd_alloc() references init_mm which is not exported to modules. In
order for KVM to be able to use pgd_alloc() to allocate GVA page tables,
move pgd_alloc() into a new pgtable.c file and export it to modules.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
* Return directly after a call of the function "copy_from_user" failed
in a case block.
* Delete the jump label "out" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
As we add the ability to do DLPAR of additional devices through
the sysfs interface we need to know which devices are supported.
This adds the reporting of supported devices with a comma separated
list reported in the existing /sys/kernel/dlpar.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating.
For cpus, dynamic affinity updating already appears to exist in the
kernel in the form of arch_update_cpu_topology(). For memory, we must
place a READD operation on the hotplug queue for any phandle included in
the PRRN event that is determined to be an LMB.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First
this will ensure that LMBs removed during a remove by count operation
are all available if a error occurs and we need to add them back. By
first removeing all the LMBs from the kernel before releasing their
DRCs the LMBs are available to add back should an error occur.
Also, this will allow for faster re-add operations of memory for
PRRN event handling since we can skip the unneeded step of having
to release the DRC and the acquire it back.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_PPC_PTDUMP currently selects CONFIG_DEBUG_FS. But CONFIG_DEBUG_FS
is user-selectable, so we shouldn't select it. Instead depend on it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit db9112173b ("powerpc: Turn on BPF_JIT in ppc64_defconfig")
only added BPF_JIT to the ppc64 defconfig. Add it to our powernv
and pseries defconfigs too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We added support for HAVE_CONTEXT_TRACKING, but placed the option inside
PPC_PSERIES.
This has the undesirable effect that NO_HZ_FULL can be enabled on a
kernel with both powernv and pseries support, but cannot on a kernel
with powernv only support.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_nosleep, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nosleep macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_nocheck, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nocheck macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_check, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_check macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
opal_lpc_init() is called from an __init routine, and calls other __init
routines, so should also be __init, init?
Fixes: 023b13a501 ("powerpc/powernv: Add support for direct mapped LPC on POWER9")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull crypto fixes from Herbert Xu:
"This fixes a bug in CBC/CTR on ARM64 that breaks chaining as well as a
bug in the core API that causes registration failures when a driver
unloads and then reloads an algorithm"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
During a TLB invalidate sequence targeting the inner shareable domain,
Falkor may prematurely complete the DSB before all loads and stores using
the old translation are observed. Instruction fetches are not subject to
the conditions of this erratum. If the original code sequence includes
multiple TLB invalidate instructions followed by a single DSB, onle one of
the TLB instructions needs to be repeated to work around this erratum.
While the erratum only applies to cases in which the TLBI specifies the
inner-shareable domain (*IS form of TLBI) and the DSB is ISH form or
stronger (OSH, SYS), this changes applies the workaround overabundantly--
to local TLBI, DSB NSH sequences as well--for simplicity.
Based on work by Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit:
8ee83b2ab3 ("perf/x86/intel/pt: Add support for PTWRITE and power event tracing")
forgot to add format strings to the PT driver. So one could enable these features
by setting corresponding bits in the event config, but not by their mnemonic names.
This patch adds the format strings.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Fixes: 8ee83b2ab3 ("perf/x86/intel/pt: Add support for PTWRITE...")
Link: http://lkml.kernel.org/r/20170127151644.8585-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the check to whether this is a UV system that needs initialization
from is_uv_system() to the internal uv_system_init() function. This is
because on a UV system without a HUB the is_uv_system() returns false.
But we still need some specific UV system initialization. See the
uv_system_init() for change to a quick check if UV is applicable. This
change should not increase overhead since is_uv_system() also called
into this same area.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Russ Anderson <rja@hpe.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170125163518.256403963@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The initialize PCH NMI I/O function is separate and may be moved to BIOS
for security reasons. This function detects whether the PCH NMI config
has already been done and if not, it will then initialize the PCH here.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Russ Anderson <rja@hpe.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170125163518.089387859@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Verify that the NMI action being set is valid. The default NMI action
changes from the non-standard 'kdb' to the more standard 'dump'.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Alex Thorlton <athorlton@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170125163517.922751779@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>