Without this we will always find the feature disabled.
Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds AUX vectors for the L1I,D, L2 and L3 cache levels
providing for each cache level the size of the cache in bytes
and the geometry (line size and number of ways).
We chose to not use the existing alpha/sh definition which
packs all the information in a single entry per cache level as
it is too restricted to represent some of the geometries used
on POWER.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All shipping firmware versions have it wrong in the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Retrieved from device-tree when available
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have two set of identical struct members for the I and D sides
and mostly identical bunches of code to parse the device-tree to
populate them. Instead make a ppc_cache_info structure with one
copy for I and one for D
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It will be used to calculate the associativity
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture, means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).
We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't patch instructions based on the cache lines or block
sizes these days.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The variables are defined twice in setup_32.c and setup_64.c, do it
once in setup-common.c instead
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's an kernel private macro, it doesn't belong there
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Enable support for GCC plugins on powerpc.
Add an additional version check in gcc-plugins-check to advise users to
upgrade to gcc 5.2+ on powerpc to avoid issues with header files (gcc <=
4.6) or missing copies of rs6000-cpus.def (4.8 to 5.1 on 64-bit
targets).
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 38addce8b6 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover
prom_init.o, but ended up saying init.o (which doesn't exist) instead.
Fix the typo.
Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As we add the ability to do DLPAR of additional devices through
the sysfs interface we need to know which devices are supported.
This adds the reporting of supported devices with a comma separated
list reported in the existing /sys/kernel/dlpar.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating.
For cpus, dynamic affinity updating already appears to exist in the
kernel in the form of arch_update_cpu_topology(). For memory, we must
place a READD operation on the hotplug queue for any phandle included in
the PRRN event that is determined to be an LMB.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First
this will ensure that LMBs removed during a remove by count operation
are all available if a error occurs and we need to add them back. By
first removeing all the LMBs from the kernel before releasing their
DRCs the LMBs are available to add back should an error occur.
Also, this will allow for faster re-add operations of memory for
PRRN event handling since we can skip the unneeded step of having
to release the DRC and the acquire it back.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_PPC_PTDUMP currently selects CONFIG_DEBUG_FS. But CONFIG_DEBUG_FS
is user-selectable, so we shouldn't select it. Instead depend on it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit db9112173b ("powerpc: Turn on BPF_JIT in ppc64_defconfig")
only added BPF_JIT to the ppc64 defconfig. Add it to our powernv
and pseries defconfigs too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We added support for HAVE_CONTEXT_TRACKING, but placed the option inside
PPC_PSERIES.
This has the undesirable effect that NO_HZ_FULL can be enabled on a
kernel with both powernv and pseries support, but cannot on a kernel
with powernv only support.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_nosleep, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nosleep macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_nocheck, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nocheck macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __get_user_check, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_check macro.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
opal_lpc_init() is called from an __init routine, and calls other __init
routines, so should also be __init, init?
Fixes: 023b13a501 ("powerpc/powernv: Add support for direct mapped LPC on POWER9")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds a few last pieces of the support for radix guests:
* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
KVM_PPC_GET_RMMU_INFO ioctls for radix guests
* On POWER9, allow secondary threads to be on/off-lined while guests
are running.
* Set up LPCR and the partition table entry for radix guests.
* Don't allocate the rmap array in the kvm_memory_slot structure
on radix.
* Don't try to initialize the HPT for radix guests, since they don't
have an HPT.
* Take out the code that prevents the HV KVM module from
initializing on radix hosts.
At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode. Thus a guest cannot switch from one mode to the other,
which enables some simplifications.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On POWER9 DD1, we need to invalidate the ERAT (effective to real
address translation cache) when changing the PIDR register, which
we do as part of guest entry and exit.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we allow LPCR[AIL] to be set for radix guests, then interrupts from
the guest to the host can be delivered by the hardware with relocation
on, and thus the code path starting at kvmppc_interrupt_hv can be
executed in virtual mode (MMU on) for radix guests (previously it was
only ever executed in real mode).
Most of the code is indifferent to whether the MMU is on or off, but
the calls to OPAL that use the real-mode OPAL entry code need to
be switched to use the virtual-mode code instead. The affected
calls are the calls to the OPAL XICS emulation functions in
kvmppc_read_one_intr() and related functions. We test the MSR[IR]
bit to detect whether we are in real or virtual mode, and call the
opal_rm_* or opal_* function as appropriate.
The other place that depends on the MMU being off is the optimization
where the guest exit code jumps to the external interrupt vector or
hypervisor doorbell interrupt vector, or returns to its caller (which
is __kvmppc_vcore_entry). If the MMU is on and we are returning to
the caller, then we don't need to use an rfid instruction since the
MMU is already on; a simple blr suffices. If there is an external
or hypervisor doorbell interrupt to handle, we branch to the
relocation-on version of the interrupt vector.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions. Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu. However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another. Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly. The usual symptom of this
is random segfaults in userspace programs in the guest.
To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed. If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest. This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush. The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:
CPU 0 CPU 1 CPU 4
(core 0) (core 0) (core 1)
VCPU 0 runs task X VCPU 1 runs
core 0 TLB gets
entries from task X
VCPU 0 moves to CPU 4
VCPU 0 runs task X
Unmap pages of task X
tlbiel
(still VCPU 1) task X moves to VCPU 1
task X runs
task X sees stale TLB
entries
That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.
Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core. To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it. This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error. This adds checks to make them
return H_FUNCTION ("function not supported").
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests. We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out). This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix. These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU. Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.
As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions. For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.
This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables. There is no support for 1GB huge pages
yet.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds code to branch around the parts that radix guests don't
need - clearing and loading the SLB with the guest SLB contents,
saving the guest SLB contents on exit, and restoring the host SLB
contents.
Since the host is now using radix, we need to save and restore the
host value for the PID register.
On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 adds a register called ASDR (Access Segment Descriptor
Register), which is set by hypervisor data/instruction storage
interrupts to contain the segment descriptor for the address
being accessed, assuming the guest is using HPT translation.
(For radix guests, it contains the guest real address of the
access.)
Thus, for HPT guests on POWER9, we can use this register rather
than looking up the SLB with the slbfee. instruction.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9. With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).
Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.
We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When changing a partition table entry on POWER9, we do a particular
form of the tlbie instruction which flushes all TLBs and caches of
the partition table for a given logical partition ID (LPID).
This instruction has a field in the instruction word, labelled R
(radix), which should be 1 if the partition was previously a radix
partition and 0 if it was a HPT partition. This implements that
logic.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This exports the pgtable_cache array and the pgtable_cache_add
function so that HV KVM can use them for allocating radix page
tables for guests.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture call first that we support POWER9 and
architecture v3.00, and that we can do either radix or hash and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).
Then we need to check whether the hypervisor agreed to us using
radix. We need to do this very early on in the kernel boot process
before any of the MMU initialization is done. If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.
Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes the byte index values for some of the option bits in
the "ibm,architectur-vec-5" property. The "platform facilities options"
bits are in byte 17 not byte 14, so the upper 8 bits of their
definitions need to be 0x11 not 0x0E. The "sub processor support" option
is in byte 21 not byte 15.
Note none of these options are actually looked up in
"ibm,architecture-vec-5" at this time, so there is no bug.
When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it will try to use the radix MMU even though it doesn't have
the necessary code to use radix under a hypervisor (it doesn't negotiate
use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
result is that the guest kernel will crash when it tries to turn on the
MMU.
This fixes it by looking for the /chosen/ibm,architecture-vec-5
property, and if it exists, clears the radix MMU feature bit, before we
decide whether to initialize for radix or HPT. This property is created
by the hypervisor as a result of the guest calling the
ibm,client-architecture-support method to indicate its capabilities, so
it will indicate whether the hypervisor agreed to us using radix.
Systems without a hypervisor may have this property also (for example,
skiboot creates it), so we check the HV bit in the MSR to see whether we
are running as a guest or not. If we are in hypervisor mode, then we can
do whatever we like including using the radix MMU.
The reason for using this property is that in future, when we have
support for using radix under a hypervisor, we will need to check this
property to see whether the hypervisor agreed to us using radix.
Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
64-bit Book3S exception handlers must find the dynamic kernel base
to add to the target address when branching beyond __end_interrupts,
in order to support kernel running at non-0 physical address.
Support this in KVM by branching with CTR, similarly to regular
interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and
restored after the branch.
Without this, the host kernel hangs and crashes randomly when it is
running at a non-0 address and a KVM guest is started.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use remove_pagetable() and friends for radix vmemmap removal.
We do not require the special-case handling of vmemmap done in the x86
versions of these functions. This is because vmemmap_free() has already
freed the mapped pages, and calls us with an aligned address range.
So, add a few failsafe WARNs, but otherwise the code to remove physical
mappings is already sufficient for vmemmap.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tear down and free the four-level page tables of physical mappings
during memory hotremove.
Borrow the basic structure of remove_pagetable() and friends from the
identically-named x86 functions. Reduce the frequency of tlb flushes and
page_table_lock spinlocks by only doing them in the outermost function.
There was some question as to whether the locking is needed at all.
Leave it for now, but we could consider dropping it.
Memory must be offline to be removed, thus not in use. So there
shouldn't be the sort of concurrent page walking activity here that
might prompt us to use RCU.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wire up memory hotplug page mapping for radix. Share the mapping
function already used by radix_init_pgtable().
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the page mapping code in radix_init_pgtable() into a separate
function that will also be used for memory hotplug.
The current goto loop progressively decreases its mapping size as it
covers the tail of a range whose end is unaligned. Change this to a for
loop which can do the same for both ends of the range.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the new non-PCI ISA bridge support to expose the POWER9
LPC bus as direct mapped via the ISA IO port range. This
enables direct access via drivers such as 8250
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The POWER9 chip supports an LPC bus that isn't hanging
off a PCI bus, so let's add support for that, mapping it
to the reserved space at ISA_IO_BASE
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We'll be adding non-PCI isa bridge support so let's not
have all the definition in pci-bridge.h
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The power9_idle_stop method currently takes only the requested stop
level as a parameter and picks up the rest of the PSSCR bits from a
hand-coded macro. This is not a very flexible design, especially when
the firmware has the capability to communicate the psscr value and the
mask associated with a particular stop state via device tree.
This patch modifies the power9_idle_stop API to take as parameters the
PSSCR value and the PSSCR mask corresponding to the stop state that
needs to be set. These PSSCR value and mask are respectively obtained
by parsing the "ibm,cpu-idle-state-psscr" and
"ibm,cpu-idle-state-psscr-mask" fields from the device tree.
In addition to this, the patch adds support for handling stop states
for which ESL and EC bits in the PSSCR are zero. As per the
architecture, a wakeup from these stop states resumes execution from
the subsequent instruction as opposed to waking up at the System
Vector.
The older firmware sets only the Requested Level (RL) field in the
psscr and psscr-mask exposed in the device tree. For older firmware
where psscr-mask=0xf, this patch will set the default sane values that
the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and
TR). For the new firmware, the patch will validate that the invariants
required by the ISA for the psscr values are maintained by the
firmware.
This skiboot patch that exports fully populated PSSCR values and the
mask for all the stop states can be found here:
https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html
[Optimize the number of instructions before entering STOP with
ESL=EC=0, validate the PSSCR values provided by the firimware
maintains the invariants required as per the ISA suggested by Balbir
Singh]
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Balbir pointed out that the name of the function pnv_arch300_idle_init
was inconsistent with the names of the variables and functions
pertaining to POWER9 features in book3s_idle.S.
This patch renames pnv_arch300_idle_init to pnv_power9_idle_init.
This patch does not change any behaviour.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently all the low-power idle states are expected to wake up
at reset vector 0x100. Which is why the macro IDLE_STATE_ENTER_SEQ
that puts the CPU to an idle state and never returns.
On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU
is expected to wake up at the next instruction of the idle
instruction.
This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the
no-return variant and reuses the name IDLE_STATE_ENTER_SEQ
for a variant that allows resuming operation at the instruction next
to the idle-instruction.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add detection of NPU2 PHBs. NPU2/NVLink2 has a different register
layout for the TCE kill register therefore TCE invalidation should be
done via the OPAL call rather than using the register directly as it
is for PHB3 and NVLink1. This changes TCE invalidation to use the OPAL
call in the case of a NPU2 PHB model.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 contains an off core mmu called the nest mmu (NMMU). This is
used by other hardware units on the chip to translate virtual
addresses into real addresses. The unit attempting an address
translation provides the majority of the context required for the
translation request except for the base address of the partition table
(ie. the PTCR) which needs to be programmed into the NMMU.
This patch adds a call to OPAL to set the PTCR for the nest mmu in
opal_init().
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Relax the check preventing us from hotplugging into an offline node.
This limitation was added in commit 482ec7c403 ("[PATCH] powerpc numa:
Support sparse online node map") to prevent adding resources to an
uninitialized node.
These days, there is no harm in doing so. The addition will actually
cause the node to be initialized and onlined; add_memory_resource()
calls hotadd_new_pgdat() (if necessary) and node_set_online().
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The flow of the main loop in parse_numa_properties() is overly
complicated. Simplify it to be less confusing and easier to read.
No functional change.
The end of the main loop in parse_numa_properties() looks like this:
for_each_node_by_type(...) {
...
if (!condition) {
if (--ranges)
goto new_range;
else
continue;
}
statement();
if (--ranges)
goto new_range;
/* else
* continue; <- implicit, this is the end of the loop
*/
}
The only effect of !condition is to skip execution of statement(). This
can be rewritten in a simpler way:
for_each_node_by_type(...) {
...
if (condition)
statement();
if (--ranges)
goto new_range;
}
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are chances that multiple CPUs can call crash_fadump() simultaneously
and would start duplicating same info to vmcoreinfo ELF note section. This
causes makedumpfile to fail during kdump capture. One example is,
triggering dumprestart from HMC which sends system reset to all the CPUs at
once.
makedumpfile --dump-dmesg /proc/vmcore
read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfoyjgxlL: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971
makedumpfile Failed.
Running makedumpfile --dump-dmesg /proc/vmcore failed (1).
makedumpfile -d 31 -l /proc/vmcore
read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfo1mmVdO: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971
makedumpfile Failed.
Running makedumpfile -d 31 -l /proc/vmcore failed (1).
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The proto VSID is built using both the MMU context id and effective
segment ID (ESID). We should not have overlapping bits between those.
That could result in us having a VSID collision. With the current code
we missed masking the top bits of the ESID. This implies for kernel
address we ended up using the top 4 bits of the ESID as part of the
proto VSID, which is wrong.
The current code use the top 4 context values (0x7fffc - 0x7ffff) for
the kernel. With those context IDs used for the kernel, we don't run
into VSID collisions because we get the same proto VSID irrespective of
whether we mask the ESID bits or not. eg:
ea = 0xf000000000000000
context = 0x7ffff
w/out masking:
proto_vsid = (0x7ffff << 6 | 0xf000000000000000 >> 40)
= (0x1ffffc0 | 0xf00000)
= 0x1ffffc0
with masking:
proto_vsid = (0x7ffff << 6 | ((0xf000000000000000 >> 40) & 0x3f))
= (0x1ffffc0 | (0xf00000 & 0x3f))
= 0x1ffffc0 | 0)
= 0x1ffffc0
So although there is no bug, the code is still overly subtle, so fix it
to save ourselves pain in future.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A subsequent patch to make KVM handlers relocation-safe makes them
unusable from within alt section "else" cases (due to the way fixed
addresses are taken from within fixed section head code).
Stop open-coding the KVM handlers, and add them both as normal. A more
optimal fix may be to allow some level of alternate feature patching in
the exception macros themselves, but for now this will do.
The TRAMP_KVM handlers must be moved to the "virt" fixed section area
(name is arbitrary) in order to be closer to .text and avoid the dreaded
"relocation truncated to fit" error.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the calling convention to put the trap number together with
CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
handler.
The 64-bit PR handler entry translates the calling convention back
to match the previous call convention (i.e., shared with 32-bit), for
simplicity.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
the rest of the space with illegal instructions to mitigate BPF spraying
attacks, while having the actual JIT'ed BPF program at a random location
within the allocated space. Under this scenario, it would be better to
flush the entire allocated buffer rather than just the part containing
the actual program. We already flush the buffer from start to the end of
the BPF program. Extend this to include the illegal instructions after
the BPF program.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have a check earlier to ensure we don't proceed if image is NULL. As
such, the redundant check can be removed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[Added similar changes for classic BPF JIT]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This function already has multiple exit points, so there's no harm
adding another. Although it looks odd to return directly in a function
which takes a lock, we've actually just dropped the mmap_sem in this
code, so there's really no reason to go via a label. And it means we can
drop the unhelpfully named out2 label.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Setting err and going to ldst_done just returns 0, without using err, so
just return 0 directly. We already do that for other call sites in this
function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The RTAS device-tree node's refcount has been increased by one in
the function call of_find_node_by_name(), but it's missed to be
decreased by one in the error path. It leads to unbalanced refcount
on RTAS device-tree node.
This fixes above issue by decreasing RTAS device-tree node's refcount
in error path.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This uses of_property_read_u32() in rtas_initialize() so that we
needn't explicitly care the CPU's endian.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This removes the unnecessary nested if statements in function
rtas_initialize(), to simplify the code. No functional changes
introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use kmalloc_array(), which checks for overflow of the multiplication,
rather than doing it by hand.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit a4b349540a ("powerpc/mm: Cleanup LPCR defines") we updated
LPCR_VRMASD wrongly as below.
-#define LPCR_VRMASD (0x1ful << (63-16))
+#define LPCR_VRMASD_SH 47
+#define LPCR_VRMASD (ASM_CONST(1) << LPCR_VRMASD_SH)
We initialize the VRMA bits in LPCR to 0x00 in kvm. Hence using a
different mask value as above while updating lpcr should not have any
impact.
This patch updates it to the correct value.
Fixes: a4b349540a ("powerpc/mm: Cleanup LPCR defines")
Reported-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jia He <hejianet@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we have optimized hand-coded assembly checksum routines for
big-endian 64-bit systems, but for little-endian we use the generic C
routines. This modifies the optimized routines to work for
little-endian. With this, we no longer need to enable
CONFIG_GENERIC_CSUM. This also fixes a couple of comments in
checksum_64.S so they accurately reflect what the associated instruction
does.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use the more common __BIG_ENDIAN__]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These functions compute an IP checksum by computing a 64-bit sum and
folding it to 32 bits (the "nofold" in their names refers to folding
down to 16 bits). However, doing (u32) (s + (s >> 32)) is not
sufficient to fold a 64-bit sum to 32 bits correctly. The addition
can produce a carry out from bit 31, which needs to be added in to
the sum to produce the correct result.
To fix this, we copy the from64to32() function from lib/checksum.c
and use that.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The OPAL memory console is reported to be size zero, as we do not
initialise the struct attr with any size information due to the size
being variable. This leads users to think that the console is empty.
Instead report the maximum size.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We now support THP with both 64k and 4K page size configuration
for radix. (hash only support THP with 64K page size). Hence we
will have CONFIG_TRANSPARENT_HUGEPAGE enabled for both PPC_64K
and PPC_4K config. Since we only need large pmd page table
with hash configuration (to store the slot information
in the second half of the table) restrict the large pmd page table
to THP and 64K configs.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't do this for other page table entries. So lets keep this simple
and always return false for hugepd check on a 64K page size config.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull x86 fix from Thomas Gleixner:
"Restore the retrigger callbacks in the IO APIC irq chips. That
addresses a long standing regression which got introduced with the
rewrite of the x86 irq subsystem two years ago and went unnoticed so
far"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Restore IO-APIC irq_chip retrigger callback
- More intc updates [Yuriv]
- Fix module build when unwinder is turned off
- IO Coherency Programming model updates
- Other miscll
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYg9jlAAoJEGnX8d3iisJe0cEP/0o28bGo4cE/WrJnb4CKc3Q5
0BZSng0C3HpXzC3+ecKMa2T4GmyDG+CvZd/mcdTrn89KS/AkiR8KYK618sIB3E3u
xth8TqZtjTW5E8nHEtsVIWha3zHvwO3Zd4/9VcAe9hwwGUiLBhgS1m4yadSrS2QP
phiOrK8nzT22zi6iwJZ3tyGY7CYp4nYdhgBpgdNIE9mm5SCPal/Aj7tBCZW5HjsI
6eJxaEUeuvYsaJqLuqSGjMI7695iVdWaVkfEL9hNdeVzjULeO5jeoixoPAtiLHwS
4v2gMdWtCROZGuuVgJIGuwGUkOE9qkMvptKUNQfXwGWhjfQ1HfqZCzLRK1d0kc4A
iOUUJWk6peIy759OY9qinCqETekF1iEGdIpMjcDGhUKNbqXFil6fX0e12tpMnXuv
y2YGA1jIxYaSeloikBCSlLkF6XZV5cSdGclL2hJBnOqBW0dh/x/EJt5k85LEWYXF
cmuhkBv8UWtai4ZJXuRxX6DQD68Pn6POiGZrAx+Ud4LppXKLVJ0p6rUHVNzD5ulk
JBGdSALFgu7J4XQBIka8Msk9RdfY3UfxuHQrPnPn9p7f9df0GoqvPz14/xSbX0N2
9P/qAYskB+URd6j1c8iPl41VnmhO6fq69usEN+OyVzuv4L0CCHcJMBuvHWL6NoFy
u2MBqArM8omUtBlUGu0/
=pU0g
-----END PGP SIGNATURE-----
Merge tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- more intc updates [Yuriv]
- fix module build when unwinder is turned off
- IO Coherency Programming model updates
- other miscellaneous
* tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"
ARC: mm: split arc_cache_init to allow __init reaping of bulk
ARCv2: IOC: Use actual memory size to setup aperture size
ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
ARCv2: IOC: refactor the IOC and SLC operations into own functions
ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds
ARCv2: save r30 on kernel entry as gcc uses it for code-gen
ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
ARC: IRQ: Use hwirq instead of virq in mask/unmask
ARC: mmu: clarify the MMUv3 programming model
Two fixes for fallout from the hugetlb changes we merged this cycle.
Ten other fixes, four only affect Power9, and the rest are a bit of a mixture
though nothing terrible.
Thanks to:
Aneesh Kumar K.V, Anton Blanchard, Benjamin Herrenschmidt, Dave Martin, Gavin
Shan, Madhavan Srinivasan, Nicholas Piggin, Reza Arbab.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYg0dNAAoJEFHr6jzI4aWAq9QP/0wB4jL8nx4J3iazHS+abTHr
jV0TGLi3EqVWNjRAaYTDG8Dti9kAs8WhP/ueiEfZdNCE3+0eqnAxFkL1JsI3OoQU
a42m+Cl5XO3iWLfRvADWnZagqgRmOZNQuUBrfpK6YuZrmshgRhJq6n50mJmr7c4z
yhAI1CTRekZAfSwptMradsbYEBpKrKRdphdE0EuodbOZbmYwXeRX8Ut9kH83w+OS
2AQTtXq8aqgTxlaLYzuRQIz8yxMGLQj/isjvsG9FNXtMrV6n7RKqq0LJPKlZoC5o
RyZu6nS1dZQUNmvr7rIB3lPg/9dhQdVYr65+cOYcKYLurDKmz7/CVT2/UMvDXltk
pfYUB1p34658TjjrdeTTKafvpdcVHEh+Qzwgvk3edfPI7v3Ct1VV1Yk5x9zGxKRh
3O2xxx5oWJWwKm5ksJmgeNxjG4TKSLJXRCEuGg49wOMZxqlBB6z0i631Rmv6wOjA
6R9SFZhiG0niNLHMXy8uOnZs4mLS0LkumNlOYunzGap8zpaz/CxW5Ct5FLjVon4O
mG+uslRVQjKFUo2Dudssj3AIC7p1wVMXTkp5fp5cMGeMu+ipvDLxLTH1JozEHJMR
3zdaFn8qOJfUkW/aNuOHNkZd+BKQxBpmIRhBpOpnaEG6EF6e0V67YUJRq1Zdf50x
R1DTmvNsyxQTTJPRLx6Z
=xOFu
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Two fixes for fallout from the hugetlb changes we merged this cycle.
Ten other fixes, four only affect Power9, and the rest are a bit of a
mixture though nothing terrible.
Thanks to: Aneesh Kumar K.V, Anton Blanchard, Benjamin Herrenschmidt,
Dave Martin, Gavin Shan, Madhavan Srinivasan, Nicholas Piggin, Reza
Arbab"
* tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Ignore reserved field in DCSR and PVR reads and writes
powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
powerpc/perf: Use MSR to report privilege level on P9 DD1
selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
powerpc/eeh: Enable IO path on permanent error
powerpc/perf: Fix PM_BRU_CMPL event code for power9
powerpc/mm: Fix little-endian 4K hugetlb
powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
powerpc: Fix pgtable pmd cache init
powerpc/icp-opal: Fix missing KVM case and harden replay
powerpc/mm: Fix memory hotplug BUG() on radix
ARM:
- Fix for timer setup on VHE machines
- Drop spurious warning when the timer races against the vcpu running
again
- Prevent a vgic deadlock when the initialization fails (for stable)
s390:
- Fix a kernel memory exposure (for stable)
x86:
- Fix exception injection when hypercall instruction cannot be patched
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJYglwIAAoJEED/6hsPKofoZp0H+gLLEeKP0Mu+olXiOWjB/KFp
WBDAR1872xIjvEcOl9l6AZgdmp2hk7KW1t+kJj5npgu237v6fHBO9ybqrAfhfU4l
PH23zOebL15HINcwCK6OcxOTiOtgae5Nui1cnLJBHDQgPTC/VmIE8NgV/qrMyo2r
Vth+K/cBLKiWG9JhyQvxmrfupNJUknLSH7CTnlO/fC8GEJzDfMpUl7B1Ui0TGK53
ExVgVLg3F28SErj9bUU8y4VJhMrwDAf2Kx2BNHqDbzXMzTdp0LrGRymFLl2/Gxez
zLtZDfGYYzEhPp1NuDydlxLb8ymnsQNB7K6Kau0w9JoAvOYwfUYfDt+GaTegwYM=
=dPtS
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix for timer setup on VHE machines
- Drop spurious warning when the timer races against the vcpu running
again
- Prevent a vgic deadlock when the initialization fails (for stable)
s390:
- Fix a kernel memory exposure (for stable)
x86:
- Fix exception injection when hypercall instruction cannot be
patched"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: do not expose random data via facility bitmap
KVM: x86: fix fixing of hypercalls
KVM: arm/arm64: vgic: Fix deadlock on error handling
KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
KVM: arm/arm64: Fix occasional warning from the timer work function
uninitialised variables
- SWIOTLB DMA API fall-back allocation fix when the SWIOTLB buffer is
not initialised (all RAM is suitable for 32-bit DMA masks)
- Fix the bad_mode function returning for unhandled exceptions coming
from user space
- Fix name clash in __page_to_voff()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYghTJAAoJEGvWsS0AyF7xCTsP/jKcUyfpDVwegS/Qf66pTfnQ
THDDgNvadnFuU+qUJc+97ZV0V4q1l2azxbelK8tG0i68jRQtf41gY13yXtxyvUzl
u4stMhlIZX7VSIUhfti+cfER+VObAvA4f5XK7taXgDFUqgaFOLapjyzzKC40djim
OLAo8PtxB9n3AgV8M5uDvdHIvxVkot84k0vKlQO1wBYQowMDMTkHw9HLbGx2pHnm
58xFB/aSwEYOy4wJcPISQu1pq02T8LwCnOU7tE4tNkcQSIopEsbqX3+TXMktlvc8
f9W8J0knLRGwp0nGw3+qnmDu1r5juFkrE6U/0jxTxLGnH6voPemlWmzuZQMTqJTW
uPvALhkU5qd8S4FaOxGMZb01F7xisvBgD984Ej2uYyTBCS5Q0iyB/Z/szDMFZh4C
1v2W2eDGUvJgt5f8b83/s9j637OxT2M3P/swZYo4lqQ18srzMzgN2/RkoJuBcjmW
mLkm4qcswWpmItmwGgW3yRaBTZfbO1ab/fuXFe+AtwPhNyClMU/88L90Sn7zgOET
rCcm6gJe0RwaFJ2tjUWpx6Ygejn/eOsDXCo3DRnyOEQq26KOjjimMct0uR4z2INV
ODrLYZjRMIwNAMdD3vcFvIBfHrlKqK0lq8T+lc0wk+6Pggu2X5yDqPP8ckNImfRv
aTD7Rsbk7/P5IgtKwwjV
=IArX
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- avoid potential stack information leak via the ptrace ABI caused by
uninitialised variables
- SWIOTLB DMA API fall-back allocation fix when the SWIOTLB buffer is
not initialised (all RAM is suitable for 32-bit DMA masks)
- fix the bad_mode function returning for unhandled exceptions coming
from user space
- fix name clash in __page_to_voff()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid returning from bad_mode
arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64: mm: avoid name clash in __page_to_voff()
arm64: Fix swiotlb fallback allocation
kvm_s390_get_machine() populates the facility bitmap by copying bytes
from the host results that are stored in a 256 byte array in the prefix
page. The KVM code does use the size of the target buffer (2k), thus
copying and exposing unrelated kernel memory (mostly machine check
related logout data).
Let's use the size of the source buffer instead. This is ok, as the
target buffer will always be greater or equal than the source buffer as
the KVM internal buffers (and thus S390_ARCH_FAC_LIST_SIZE_BYTE) cover
the maximum possible size that is allowed by STFLE, which is 256
doublewords. All structures are zero allocated so we can leave bytes
256-2047 unchanged.
Add a similar fix for kvm_arch_init_vm().
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[found with smatch]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
IBM bit 31 (for the rest of us - bit 0) is a reserved field in the
instruction definition of mtspr and mfspr. Hardware is encouraged to
(and does) ignore it.
As a result, if userspace executes an mtspr DSCR with the reserved bit
set, we get a DSCR facility unavailable exception. The kernel fails to
match against the expected value/mask, and we silently return to
userspace to try and re-execute the same mtspr DSCR instruction. We
loop forever until the process is killed.
We should do something here, and it seems mirroring what hardware does
is the better option vs killing the process. While here, relax the
matching of mfspr PVR too.
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the check pointed registers, the thread's old check pointed
registers are preserved.
Fixes: 9d3918f7c0 ("powerpc/ptrace: Enable support for NT_PPC_CVSX")
Fixes: 19cbcbf75a ("powerpc/ptrace: Enable support for NT_PPC_CFPR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Fixes: c6e6771b87 ("powerpc: Introduce VSX thread_struct and CONFIG_VSX")
Cc: stable@vger.kernel.org # v2.6.27+
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We've been sitting on fixes for a while, and they keep trickling in at a low
rate. Nothing in here comes across as particularly scary or noteworthy, for
the most part it's a large collection of small DT tweaks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYgVpgAAoJEIwa5zzehBx3umkP/A2082k9R6srk+/bdeDf6F+w
YiiMJdu37jWb/W71gMlr95NC3zmk4c+U0eHj8iOdsuOYyzSZ6uUQcm8Etg7N2JwR
cupsySXvlRJ9Hq28SPe6vRnNFqGiDGmGrcfNlwnfHd/CausaJBdcocbyTsVt+omO
WGMDPy5miN8TIbYQiu6jF2sXkuuVHwXlQyBi52xW5w7Uy0iZDZdsW6GuziK0zpDH
k0QktKkVx/Q8Riy3b9Vj7kKwvaGXF2JBMsGpORhs4+JcdZl9u+GBJnmehpXmABTn
8mXEU5zu6gnBHMXxExaK/ZlFDk0yHNxGfapoRQwYecPeBZQGXWu9vyUa3/38npLr
egyMDzBgJJyHXbs7BXy6weiysn8adsNS3juhniL7mLuTp2hGZHNK6IcH2tV4Z/kD
hq/VTK/BzmKAY/GP3psoQXVavIUifh498ymCkgoZtUx8Eqq9ZrFA9hkm86F/9eOJ
vfGNTdVuPI51tdKrmqMXglI4iBc35oSyOQUlUL0DXVlKrzzaVPbQwnIZclgI+VGn
qos8l8vPxDzQ7lRgMzXsmq8D6pDSfUZQqvUr8gld/zNvd8+LsdmVYHdrvbIQu+sN
TsI0TyfU4JphSdCPtFKasu7aDBmwfB8npDXUcOazAyG8UyHZpikR9WmmIQ63BwRL
DYVre9JMhrfL6ZY2kaKC
=cc1U
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We've been sitting on fixes for a while, and they keep trickling in at
a low rate. Nothing in here comes across as particularly scary or
noteworthy, for the most part it's a large collection of small DT
tweaks"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (24 commits)
ARM: dts: da850-evm: fix read access to SPI flash
ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV
ARM64: dts: meson-gxbb-odroidc2: Disable SCPI DVFS
ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available.
ARM: dts: NSP: Fix DT ranges error
ARM: multi_v7_defconfig: set bcm47xx watchdog
ARM: multi_v7_defconfig: fix config typo
ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node
soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()
ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
ARM: dts: sun8i: Support DTB build for NanoPi M1
ARM: dts: sun6i: hummingbird: Enable display engine again
ARM: dts: sun6i: Disable display pipeline by default
ARM, ARM64: dts: drop "arm,amba-bus" in favor of "simple-bus" part 3
ARM: dts: imx6qdl-nitrogen6_som2: fix sgtl5000 pinctrl init
ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init
ARM: OMAP1: DMA: Correct the number of logical channels
ARM: dts: am335x-icev2: Remove the duplicated pinmux setting
ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
...
Read access to the SPI flash are broken on da850-evm, i.e. the data
read is not what is actually programmed on the flash.
According to the datasheet for the M25P64 part present on the da850-evm,
if the SPI frequency is higher than 20MHz then the READ command is not
usable anymore and only the FAST_READ command can be used to read data.
This commit specifies in the DTS that we should use FAST_READ command
instead of the READ command.
Cc: stable@vger.kernel.org
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Fabien Parent <fparent@baylibre.com>
[nsekhar@ti.com: subject line adjustment]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYgMnBAAoJEFmIoMA60/r8+PAQAKwSfmjn7y0cOabzrSOShrTA
DutYzp1idgXlj8nmNIy04O/aQfK2GeXJlmWX3ye+D6c4Yn+m5CGpbCpx6WbWvvvX
9qgJmxGp5yq9iy5gi45iAyXp7kfBUvEbPd7pFRg3Rr3g73uGm3whd9ZcNUs7onBL
B+p7q4Sq4/Hgy0yzbMkYe6s7ogXKa3lHt15WkETmaYaFayRlDIL1SAtFOddmi67r
ooV4qm3QZm4JgCPxN0YHrA8ffUC1V9n9esPg11+UNUFxG9u5GZykQ8nedm+54HjT
BVE7v9SqChf7lZArgTXM/d+L/mmK9Hmx6mfrgnZav+GiG8OZ27nzv/X7eabQ/bcu
C/coO2BQhkGRcQ2yMa8JtQp2+BMPuc0io2i+U18TXAt+x7DzlW4nC1WOywb/Xuu3
aJhIEH8SFNnLoM5H+sXLWXsSYG86M4lKHw3ufzH/TOV85J301N/KH6OUdaYaEt+/
nta3xsz8qA+vDWmyYxpKzZGWQEqRDaBEJxd+bO+kSRcNfnFMUpQ9PkCLW19DVRWM
YsLn81LYlLwH9z7pQ+y9okqZPViGs+Ta3fRLLeIlxDSJ6B2PAmoZdfa5LGKlrz6b
nCT26YEPwK++nS3dGvh93k7FiTZE0LWJkfs734Wu9Jnz2C4wATqWwyCij5a2MXLn
lilujaUV2xNhQPfZZ3Jk
=4X8Q
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- recognize that a PCI-to-PCIe bridge originates a PCIe hierarchy, so
we enumerate that hierarchy correctly
- X-Gene: fix a change merged for v4.10 that broke MSI
- Keystone: avoid reading undefined registers, which can cause
asynchronous external aborts
- Supermicro X8DTH-i/6/iF/6F: ignore broken _CRS that caused us to
change (and break) existing I/O port assignments
* tag 'pci-v4.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling
PCI: Enumerate switches below PCI-to-PCIe bridges
x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
PCI: designware: Check for iATU unroll only on platforms that use ATU
Pull two s390 bug fixes from Martin Schwidefsky:
"Two changes, the first is a fix to add a missing memory clobber to the
inline assembly to load control registers. This has not caused any
issues so far, but who knows what code gcc will generate in future
versions.
The second change is an update for the default configurations. This
includes CONFIG_BUG_ON_DATA_CORRUPTION=y, we want this to be enabled
for s390. The usual approach to debug problems on production systems
is to use crash on a system dump and for us avoiding data corruptions
is priority one"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
s390/ctl_reg: make __ctl_load a full memory barrier
Generally, taking an unexpected exception should be a fatal event, and
bad_mode is intended to cater for this. However, it should be possible
to contain unexpected synchronous exceptions from EL0 without bringing
the kernel down, by sending a SIGILL to the task.
We tried to apply this approach in commit 9955ac47f4 ("arm64:
don't kill the kernel on a bad esr from el0"), by sending a signal for
any bad_mode call resulting from an EL0 exception.
However, this also applies to other unexpected exceptions, such as
SError and FIQ. The entry paths for these exceptions branch to bad_mode
without configuring the link register, and have no kernel_exit. Thus, if
we take one of these exceptions from EL0, bad_mode will eventually
return to the original user link register value.
This patch fixes this by introducing a new bad_el0_sync handler to cater
for the recoverable case, and restoring bad_mode to its original state,
whereby it calls panic() and never returns. The recoverable case
branches to bad_el0_sync with a bl, and returns to userspace via the
usual ret_to_user mechanism.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9955ac47f4 ("arm64: don't kill the kernel on a bad esr from el0")
Reported-by: Mark Salter <msalter@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The programming model has been fixed with prev patches so re-enable it
by default
This reverts commit 23cb1f6440.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
arc_cache_init() is called for each core so can't be tagged __init.
However bulk of it is only executed by master core and thus is candidate
for __init reaping.
So split it up to allow that.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
omap1, and then a handful of trivial fixes for boards and devices to
work:
- Fixes TI wilink bluetooth strange platform data baud rate
- Remove duplicate pinmux line for am335x-icev2
- Fix omap1 dma regression
- Fix uninitialized return value for wkup_m3_ipc_probe()
- Fix Ethernet PHY binding typo for dra72-evm
- Fix init for omap5 and dra7 sata ports
- Fix mmc card detect pin for Logic PD SOM-LV
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAlh+lYcRHHRvbnlAYXRv
bWlkZS5jb20ACgkQG9Q+yVyrpXMYFg//TvDU9XADwlzHt4IbP8fFilVcmB1R3LeI
+PYJvPiXqlN8T+CnQ1R2ZiI9GQoP8YLwF5UFcIsAVZMxg8l2XpJa0mjd1d/yK7o3
zuYcUx+rEGRqW75SDYqw5VPtjtNHrumjYjipaC2GFIu3Wr9UfFDxcKU8h5mcARQ4
JlyqO/Dn7/defQq0Yg2v5CcHjMz+5d/A9iAVF3kYelXLwX+wMnlQxv5H0FKLodz8
nZ14yR5hyANhGbVyyoapKQWj+uEh205FoBoZW8ws+VCRHsswepA3m05w53LaElBs
rCdA7HkRjdp3XkelXssclMdRpB42/LieouhBbtyaBNhaVPfUH6EBGcVB2cIbBmWU
6jauifjyvEXl0hni+85eB1A1sTIsofEcvhHBBIwlZ1tyeVoPEVT/3Ito+k4W2iQk
K4nSi/YuvLBBGKYUoFSovaEXj1uydG0PrNZ3cYGCstjwOvwbCuu2AMWP62jPlhD2
SDHqrEaojyRynvNi9Bus+LJGREf3dcnLCytEyErsQsRglkiOigr6bgGoSyDxkFuk
1HtYgArspeRB4IdOSX8PXP93K3sARoyxTWwu2f8HZxoqHraOjPZiwfZ3KS0MnZnh
bMIlCTx7YKOI4dBHrgDoLJZRH9P9Io9sCnaJ+Jkz4dXUOJEsf4QA0wjCFHhNQa7h
u4ZAQVlLgpU=
=VJaM
-----END PGP SIGNATURE-----
Merge tag 'omap-for-v4.10/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omaps for v4.10-rc cycle. Mostly a DMA regression fix for
omap1, and then a handful of trivial fixes for boards and devices to
work:
- Fixes TI wilink bluetooth strange platform data baud rate
- Remove duplicate pinmux line for am335x-icev2
- Fix omap1 dma regression
- Fix uninitialized return value for wkup_m3_ipc_probe()
- Fix Ethernet PHY binding typo for dra72-evm
- Fix init for omap5 and dra7 sata ports
- Fix mmc card detect pin for Logic PD SOM-LV
* tag 'omap-for-v4.10/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV
ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available.
ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node
soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()
ARM: OMAP1: DMA: Correct the number of logical channels
ARM: dts: am335x-icev2: Remove the duplicated pinmux setting
ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
Signed-off-by: Olof Johansson <olof@lixom.net>
vs. fixed 512M before.
But this still assumes that all of memory is under IOC which may not be
true for the SoC. Improve that later when this becomes a real issue, by
specifying this from DT.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
On AXS103 release bitfiles, DMA data corruptions were seen because IOC
setup was not following the recommended way in documentation.
Flipping IOC on when caches are enabled or coherency transactions are in
flight, might cause some of the memory operations to not observe
coherency as expected.
So strictly follow the programming model recommendations as documented
in comment header above arc_ioc_setup()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>