Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.
This should be functionally equivalent to the code before, but
cleaner and faster.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We need to search linux "pte" to get "pte" attributes for setting TLB in KVM.
This patch defines a lookup_linux_ptep() function which returns pte pointer.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On booke, "struct tlbe_ref" contains host tlb mapping information
(pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
for a guest tlb entry. So when a guest creates a TLB entry then
"struct tlbe_ref" is set to point to valid "pfn" and set attributes in
"flags" field of the above said structure. When a guest TLB entry is
invalidated then flags field of corresponding "struct tlbe_ref" is
updated to point that this is no more valid, also we selectively clear
some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.
Ideally we should clear complete "flags" as this entry is invalid and does not
have anything to re-used. The other part of the problem is that when we use
the same entry again then also we do not clear (started doing or-ing etc).
So far it was working because the selectively clearing mentioned above
actually clears "flags" what was set during TLB mapping. But the problem
starts coming when we add more attributes to this then we need to selectively
clear them and which is not needed.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic
FP/VSX and VMX load/store functions instead of open-coding the
FP/VSX/VMX load/store instructions. Since kvmppc_load/save_fp don't
follow C calling conventions, we make them private symbols within
book3s_hv_rmhandlers.S.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have the vcpu floating-point and vector state stored in
the same type of struct as the main kernel uses, we can load that
state directly from the vcpu struct instead of having extra copies
to/from the thread_struct. Similarly, when the guest state needs to
be saved, we can have it saved it directly to the vcpu struct by
setting the current->thread.fp_save_area and current->thread.vr_save_area
pointers. That also means that we don't need to back up and restore
userspace's FP/vector state. This all makes the code simpler and
faster.
Note that it's not necessary to save or modify current->thread.fpexc_mode,
since nothing in KVM uses or is affected by its value. Nor is it
necessary to touch used_vr or used_vsr.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The load_up_fpu and load_up_altivec functions were never intended to
be called from C, and do things like modifying the MSR value in their
callers' stack frames, which are assumed to be interrupt frames. In
addition, on 32-bit Book S they require the MMU to be off.
This makes KVM use the new load_fp_state() and load_vr_state() functions
instead of load_up_fpu/altivec. This means we can remove the assembler
glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
where load_up_fpu was called directly from C.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
kvm_hypercall0() and friends have nothing KVM specific so moved to
epapr_hypercall0() and friends. Also they are moved from
arch/powerpc/include/asm/kvm_para.h to arch/powerpc/include/asm/epapr_hcalls.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
kvm_hypercall() have nothing KVM specific, so renamed to epapr_hypercall().
Also this in moved to arch/powerpc/include/asm/epapr_hcalls.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Systems that support automatic loading of kernel modules through
device aliases should try and automatically load kvm when /dev/kvm
gets opened.
Add code to support that magic for all PPC kvm targets, even the
ones that don't support modules yet.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch fixed several typo in printk from various
part of kernel source.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull powerpc fixes from Ben Herrenschmidt:
"A bit more endian problems found during testing of 3.13 and a few
other simple fixes and regressions fixes"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix alignment of secondary cpu spin vars
powerpc: Align p_end
powernv/eeh: Add buffer for P7IOC hub error data
powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag()
powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
powerpc: Make unaligned accesses endian-safe for powerpc
powerpc: Fix bad stack check in exception entry
powerpc/512x: dts: disable MPC5125 usb module
powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
Commit 5c0484e25e ('powerpc: Endian safe trampoline') resulted in
losing proper alignment of the spinlock variables used when booting
secondary CPUs, causing some quite odd issues with failing to boot on
PA Semi-based systems.
This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
so it had gone unnoticed when I initially tested the LE patch set.
Fix is to add explicit alignment instead of relying on good luck. :)
[ It appears that there is a different issue with PA Semi systems
however this fix is definitely correct so applying anyway -- BenH
]
Fixes: 5c0484e25e ('powerpc: Endian safe trampoline')
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
p_end is an 8 byte value embedded in the text section. This means it
is only 4 byte aligned when it should be 8 byte aligned. Fix this
by adding an explicit alignment.
This fixes an issue where POWER7 little endian builds with
CONFIG_RELOCATABLE=y fail to boot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Prevent ioda_eeh_hub_diag() from clobbering itself when called by supplying
a per-PHB buffer for P7IOC hub diagnostic data. Take care to inform OPAL of
the correct size for the buffer.
[Small style change to the use of sizeof -- BenH]
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PHB diagnostic buffer may be smaller than PAGE_SIZE, especially when
PAGE_SIZE > 4KB.
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations. However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.
This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.
[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The generic put_unaligned/get_unaligned macros were made endian-safe by
calling the appropriate endian dependent macros based on the endian type
of the powerpc processor.
Signed-off-by: Rajesh B Prathipati <rprathip@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel. If it's not valid, we die but
with a nice oops message.
Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative. Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.
This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.
Kudos to Paulus for finding this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These interfaces:
pcibios_resource_to_bus(struct pci_dev *dev, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_dev *dev, *resource, *bus_region)
took a pci_dev, but they really depend only on the pci_bus. And we want to
use them in resource allocation paths where we have the bus but not a
device, so this patch converts them to take the pci_bus instead of the
pci_dev:
pcibios_resource_to_bus(struct pci_bus *bus, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_bus *bus, *resource, *bus_region)
In fact, with standard PCI-PCI bridges, they only depend on the host
bridge, because that's the only place address translation occurs, but
we aren't going that far yet.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
At the moment the USB controller's pin muxing is not setup
correctly and causes a kernel panic upon system startup, so
disable the USB1 device tree node in the MPC5125 tower board
dts file.
The USB controller is connected to an USB3320 ULPI transceiver
and the device tree should receive an update to reflect correct
dependencies and required initialization data before the USB1
node can get re-enabled.
Signed-off-by: Matteo Facchinetti <matteo.facchinetti@sirius-es.it>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Correct spelling typo in various part of kernel
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
the 'soc' node in the MPC5125 "tower" board .dts has an '#interrupt-cells'
property although this node is not an interrupt controller
remove this erroneously placed property because starting with v3.13-rc1
lookup and resolution of 'interrupts' specs for peripherals gets misled
(tries to use the 'soc' as the interrupt parent which fails), emits
'no irq domain found' WARN() messages and breaks the boot process
[ best viewed with 'git diff -U5' to have DT node names in the context ]
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Commit caaa4c804f ("KVM: PPC: Book3S HV: Fix physical address
calculations") unfortunately resulted in some low-order address bits
getting dropped in the case where the guest is creating a 4k HPTE
and the host page size is 64k. By getting the low-order bits from
hva rather than gpa we miss out on bits 12 - 15 in this case, since
hva is at page granularity. This puts the missing bits back in.
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR
KVM mode when both HV and PR are enabled in the kernel. Without this we
get the below crash
(qemu)
Unable to handle kernel paging request for data at address 0xffffffffffff8310
Faulting instruction address: 0xc00000000001d5a4
cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0]
pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0
lr: c00000000001d760: .vtime_account_system+0x20/0x60
sp: c0000001dc53b170
msr: 8000000000009032
dar: ffffffffffff8310
dsisr: 40000000
current = 0xc0000001d76c62d0
paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01
pid = 4472, comm = qemu-system-ppc
enter ? for help
[c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60
[c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50
[c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4
[c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0
[c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40
[c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730
[c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770
[c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0
[c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98
Signed-off-by: Alexander Graf <agraf@suse.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz
i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9
3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc
K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK
4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL
+2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0=
=lI2r
-----END PGP SIGNATURE-----
Merge tag 'v3.13-rc4' into core/locking
Merge Linux 3.13-rc4, to refresh this rather old tree with the latest fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The powerpc lock acquisition sequence is as follows:
lwarx; cmpwi; bne; stwcx.; lwsync;
Lock release is as follows:
lwsync; stw;
If CPU 0 does a store (say, x=1) then a lock release, and CPU 1
does a lock acquisition then a load (say, r1=y), then there is
no guarantee of a full memory barrier between the store to 'x'
and the load from 'y'. To see this, suppose that CPUs 0 and 1
are hardware threads in the same core that share a store buffer,
and that CPU 2 is in some other core, and that CPU 2 does the
following:
y = 1; sync; r2 = x;
If 'x' and 'y' are both initially zero, then the lock
acquisition and release sequences above can result in r1 and r2
both being equal to zero, which could not happen if unlock+lock
was a full barrier.
This commit therefore makes powerpc's
smp_mb__after_unlock_lock() be a full barrier.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-8-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers
to make kvm preemptible"), the remaining stuff in this function is a
simple cond_resched() call with an extra need_resched() check which was
there to avoid dropping VCPUs unnecessarily. Now it is meaningless.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are passing pointers to the firmware for reads, we need to properly
convert the result as OPAL is always BE.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
opal_xscom_read uses a pointer to return the data so we need
to byteswap it on LE builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A couple more device tree properties that need byte swapping.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The MSI code is miscalculating quotas in little endian mode.
Add required byteswaps to fix this.
Before we claimed a quota of 65536, after the patch we
see the correct value of 256.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to byteswap ibm,pcie-link-speed-stats.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The NVRAM code has a number of endian issues. I noticed a very
confused error log count:
RTAS: 100663330 -------- RTAS event begin --------
100663330 == 0x06000022. 0x6 LE error logs and 0x22 BE error logs.
The pstore code has similar issues - if we write an oops in one
endian and attempt to read it in another we get junk.
Make both of these formats big endian, and byteswap as required.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cpu_to_core_id() is missing a byteswap:
cat /sys/devices/system/cpu/cpu63/topology/core_id
201326592
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During on LE boot we see:
Partition configured for 1073741824 cpus, operating system maximum is 2048.
Clearly missing a byteswap here.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a bug in using ptrace to access FPRs via PTRACE_PEEKUSR /
PTRACE_POKEUSR. In effect, trying to access any of the FPRs always
really accesses FPR0, which does seriously break debugging :-)
The problem seems to have been introduced by commit 3ad26e5c44
(Merge branch 'for-kvm' into next).
[ It is indeed a merge conflict between Paul's FPU/VSX state rework
and my LE patches - Anton ]
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit ce11e48b7f ("KVM: PPC: E500: Add
userspace debug stub support") added "struct thread_struct" to the
stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build,
compared to 48 bytes for the recently-introduced "struct debug_reg".
Use the latter instead.
This fixes the following error:
cc1: warnings being treated as errors
arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes
make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make[1]: *** Waiting for unfinished jobs....
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The current logic sets the kdump base to min of 2G or ppc64_rma_size/2.
On PowerNV kernel the first memory block 'memory@0' can be very large,
equal to the DIMM size with ppc64_rma_size value capped to 1G. Hence on
PowerNV, kdump base is set to 512M resulting kdump to fail while allocating
paca array. This is because, paca need its memory from RMA region capped
at 256M (see allocate_pacas()).
This patch lowers the kdump base cap to 128M so that kdump kernel can
successfully get memory below 256M for paca allocation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have recently found out that no iommu_groups could be found under
/sys/ on a P8. That prevents PCI passthrough from working.
During my investigation, I found out there seems to be a missing
iommu_register_group for PHB3. The following patch seems to fix the
problem. After applying it, I see iommu_groups under
/sys/kernel/iommu_groups/, and can also bind vfio-pci to an adapter,
which gives me a device at /dev/vfio/.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The bestcomm driver has been moved to drivers/dma, so to select
this driver by default additionally CONFIG_DMADEVICES has to be
enabled. Currently it is not enabled in the config despite existing
CONFIG_PPC_BESTCOMM=y in the config files. Fix it.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At least some distros expect it these days; turn it on. Also, random
churn from doing a savedefconfig for the first time in a year or so.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
not been converted by page_address() to the newly allocated PTE page.
When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
with an address to the PTE page that has been converted by page_address().
The mismatch in the PTE's page address causes pgtable_page_dtor() to access
invalid memory, so resources for that PTE (such as the page lock) is not
properly cleaned up.
On PPC32, only SMP kernels are affected.
On PPC64, only SMP kernels with 4K page size are affected.
This bug was introduced by commit d614bb0412
"powerpc: Move the pte free routines from common header".
On a preempt-rt kernel, a spinlock is dynamically allocated for each
PTE in pgtable_page_ctor(). When the PTE is freed, calling
pgtable_page_dtor() with a mismatched page address causes a memory leak,
as the pointer to the PTE's spinlock is bogus.
On mainline, there isn't any immediately obvious symptoms, but the
problem still exists here.
Fixes: d614bb0412 "powerpc: Move the pte free routes from common header"
Cc: Paul Mackerras <paulus@samba.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-stable <stable@vger.kernel.org> # v3.10+
Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Allocate enough memory for the ocm_block structure, not just a pointer
to it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A kernel configured with PPC_EARLY_DEBUG_BOOTX=y but PPC_PMAC=n and
PPC_MAPLE=n will fail to link:
btext.c:(.text+0x2d0fc): undefined reference to `.rmci_off'
btext.c:(.text+0x2d214): undefined reference to `.rmci_on'
Fix it by making the build of rmci_on/off() depend on
PPC_EARLY_DEBUG_BOOTX, which also enable the only code that uses them.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that the svcpu sync is interrupt aware we can enable interrupts
earlier in the exit code path again, moving 32bit and 64bit closer
together.
While at it, document the fact that we're always executing the exit
path with interrupts enabled so that the next person doesn't trap
over this.
Signed-off-by: Alexander Graf <agraf@suse.de>
As soon as we get back to our "highmem" handler in virtual address
space we may get preempted. Today the reason we can get preempted is
that we replay interrupts and all the lazy logic thinks we have
interrupts enabled.
However, it's not hard to make the code interruptible and that way
we can enable and handle interrupts even earlier.
This fixes random guest crashes that happened with CONFIG_PREEMPT=y
for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvmppc_copy_{to,from}_svcpu functions are publically visible,
so we should also export them in a header for others C files to
consume.
So far we didn't need this because we only called it from asm code.
The next patch will introduce a C caller.
Signed-off-by: Alexander Graf <agraf@suse.de>
We call a C helper to save all svcpu fields into our vcpu. The C
ABI states that r12 is considered volatile. However, we keep our
exit handler id in r12 currently.
So we need to save it away into a non-volatile register instead
that definitely does get preserved across the C call.
This bug usually didn't hit anyone yet since gcc is smart enough
to generate code that doesn't even need r12 which means it stayed
identical throughout the call by sheer luck. But we can't rely on
that.
Signed-off-by: Alexander Graf <agraf@suse.de>
for tmp_part->header.name:
it is "Terminating null required only for names < 12 chars".
so need to limit the %.12s for it in printk
additional info:
%12s limit the width, not for the original string output length
if name length is more than 12, it still can be fully displayed.
if name length is less than 12, the ' ' will be filled before name.
%.12s truly limit the original string output length (precision)
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In a recent patch:
commit c13f20ac48
Author: Michael Neuling <mikey@neuling.org>
powerpc/signals: Mark VSX not saved with small contexts
We fixed an issue but an improved solution was later discussed after the patch
was merged.
Firstly, this patch doesn't handle the 64bit signals case, which could also hit
this issue (but has never been reported).
Secondly, the original patch isn't clear what MSR VSX should be set to. The
new approach below always clears the MSR VSX bit (to indicate no VSX is in the
context) and sets it only in the specific case where VSX is available (ie. when
VSX has been used and the signal context passed has space to provide the
state).
This reverts the original patch and replaces it with the improved solution. It
also adds a 64 bit version.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When CONFIG_SPARSEMEM_VMEMMAP option is used in kernel, makedumpfile fails
to filter vmcore dump as it fails to do vmemmap translations. So far
dump filtering on ppc64 never had to deal with vmemmap addresses seperately
as vmemmap regions where mapped in zone normal. But with the inclusion of
CONFIG_SPARSEMEM_VMEMMAP config option in kernel, this vmemmap address
translation support becomes necessary for dump filtering. For vmemmap adress
translation, few kernel symbols are needed by dump filtering tool. This patch
adds those symbols to vmcoreinfo, which a dump filtering tool can use for
filtering the kernel dump. Tested this changes successfully with makedumpfile
tool that supports vmemmap to physical address translation outside zone normal.
[ Removed unneeded #ifdef as suggested by Michael Ellerman --BenH ]
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen reported a failure in an allyesconfig build.
CONFIG_CPU_LITTLE_ENDIAN=y gets set but his toolchain is not
new enough to support little endian. We really want to
default to a big endian build; Ben suggested using a choice
which defaults to CPU_BIG_ENDIAN.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently if I cross build TAGS or cscope from x86 I get this:
% make ARCH=powerpc TAGS
gcc-4.8.real: error: unrecognized command line option ‘-mbig-endian’
GEN TAGS
%
I'm not setting CROSS_COMPILE= as logically I shouldn't need to and I
haven't needed to in the past when building TAGS or cscope. Also, the
above completess correct as the error is not fatal to the build.
This was caused by:
commit d72b080171
Author: Ian Munsie <imunsie@au1.ibm.com>
powerpc: Add ability to build little endian kernels
The below fixes this by testing for the -mbig-endian option before
adding it.
I've not done the same thing in the little endian case as if
-mlittle-endian doesn't exist, we probably want to fail quickly as you
probably have an old big endian compiler.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
And in flush_hugetlb_page(), don't check whether vma is NULL after
we've already dereferenced it.
This was found by Dan using static analysis as described here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-November/113161.html
We currently get away with this because the callers that currently pass
NULL for vma seem to be 32-bit-only (e.g. highmem, and
CONFIG_DEBUG_PGALLOC in pgtable_32.c) Hugetlb is currently 64-bit only,
so we never saw a NULL vma here.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
These lines were inoperative for four years, which puts some doubt into
their importance, and it's possible the fixed version will regress, but
at the very least they should be removed instead.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Commit beb2dc0a7a breaks the MPC8xx which
seems to not support using mfspr SPRN_TBRx instead of mftb/mftbu
despite what is written in the reference manual.
This patch reverts to the use of mftb/mftbu when CONFIG_8xx is
selected.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
If CONFIG_ALTIVEC is enabled for CoreNet64, and if we also
select CONFIG_E{5,6}500_CPU this may introduce -mcpu=e500mc64
into $CFLAGS. But Altivec option not allowed with e500mc64,
then some compiling errors occur like this:
CC arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:1:0: error: AltiVec not supported in this target
make[1]: *** [arch/powerpc/lib/xor_vmx.o] Error 1
make: *** [arch/powerpc/lib] Error 2
So we should restrict e500mc64 in altivec scenario.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Pull third set of powerpc updates from Benjamin Herrenschmidt:
"This is a small collection of random bug fixes and a few improvements
of Oops output which I deemed valuable enough to include as well.
The fixes are essentially recent build breakage and regressions, and a
couple of older bugs such as the DTL log duplication, the EEH issue
with PCI_COMMAND_MASTER and the problem with small contexts passed to
get/set_context with VSX enabled"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/signals: Mark VSX not saved with small contexts
powerpc/pseries: Fix SMP=n build of rng.c
powerpc: Make cpu_to_chip_id() available when SMP=n
powerpc/vio: Fix a dma_mask issue of vio
powerpc: booke: Fix build failures
powerpc: ppc64 address space capped at 32TB, mmap randomisation disabled
powerpc: Only print PACATMSCRATCH in oops when TM is active
powerpc/pseries: Duplicate dtl entries sometimes sent to userspace
powerpc: Remove a few lines of oops output
powerpc: Print DAR and DSISR on machine check oopses
powerpc: Fix __get_user_pages_fast() irq handling
powerpc/eeh: More accurate log
powerpc/eeh: Enable PCI_COMMAND_MASTER for PCI bridges
In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently,
this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to HV guest, so powernv can return to HV guest without heavy exit, i.e,
no need to swap TLB, HTAB,.. etc
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The VSX MSR bit in the user context indicates if the context contains VSX
state. Currently we set this when the process has touched VSX at any stage.
Unfortunately, if the user has not provided enough space to save the VSX state,
we can't save it but we currently still set the MSR VSX bit.
This patch changes this to clear the MSR VSX bit when the user doesn't provide
enough space. This indicates that there is no valid VSX state in the user
context.
This is needed to support get/set/make/swapcontext for applications that use
VSX but only provide a small context. For example, getcontext in glibc
provides a smaller context since the VSX registers don't need to be saved over
the glibc function call. But since the program calling getcontext may have
used VSX, the kernel currently says the VSX state is valid when it's not. If
the returned context is then used in setcontext (ie. a small context without
VSX but with MSR VSX set), the kernel will refuse the context. This situation
has been reported by the glibc community.
Based on patch from Carlos O'Donell.
Tested-by: Haren Myneni <haren@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In commit a489043 "Implement arch_get_random_long() based on H_RANDOM" I
broke the SMP=n build. We were getting plpar_wrappers.h via spinlock.h
which breaks when SMP=n.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Up until now we have only used cpu_to_chip_id() in the topology code,
which is only used on SMP builds. However my recent commit a4da0d5
"Implement arch_get_random_long/int() for powernv" added a usage when
SMP=n, breaking the build.
Move cpu_to_chip_id() into prom.c so it is available for SMP=n builds.
We would move the extern to prom.h, but that breaks the include in
topology.h. Instead we leave it in smp.h, but move it out of the
CONFIG_SMP #ifdef. We also need to include asm/smp.h in rng.c, because
the linux version skips asm/smp.h on UP. What a mess.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I encountered following issue:
[ 0.283035] ibmvscsi 30000015: couldn't initialize event pool
[ 5.688822] ibmvscsi: probe of 30000015 failed with error -1
which prevents the storage from being recognized, and the machine from
booting.
After some digging, it seems that it is caused by commit 4886c399da
as dma_mask pointer in viodev->dev is not set, so in
dma_set_mask_and_coherent(), dma_set_coherent_mask() is not called
because dma_set_mask(), which is dma_set_mask_pSeriesLP() returned EIO.
While before the commit, dma_set_coherent_mask() is always called.
I tried to replace dma_set_mask_and_coherent() with
dma_coerce_mask_and_coherent(), and the machine could boot again.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/platforms/wsp/wsp.c: In function ‘wsp_probe_devices’:
arch/powerpc/platforms/wsp/wsp.c:76:3: error: implicit declaration of function ‘of_address_to_resource’ [-Werror=implicit-function-declaration]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit fba2369e6c (mm: use vm_unmapped_area() on powerpc architecture)
has a bug in slice_scan_available() where we compare an unsigned long
(high_slices) against a shifted int. As a result, comparisons against
the top 32 bits of high_slices (representing the top 32TB) always
returns 0 and the top of our mmap region is clamped at 32TB
This also breaks mmap randomisation since the randomised address is
always up near the top of the address space and it gets clamped down
to 32TB.
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If TM is not active there is no need to print PACATMSCRATCH
so we can save ourselves a line.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Machine check exceptions set DAR and DSISR, so print them in our
oops output.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
__get_user_pages_fast() may be called with interrupts disabled (see e.g.
get_futex_key() in kernel/futex.c) and therefore should use local_irq_save()
and local_irq_restore() instead of local_irq_disable()/enable().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: <stable@vger.kernel.org> [v3.12]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This clarifies in the log whether the error is a global PHB error
or an individual PE being frozen.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PHB3, we will fail to fetch IODA tables without PCI_COMMAND_MASTER
on PCI bridges. According to one experiment I had, the MSIx interrupts
didn't raise from the adapter without the bit applied to all upstream
PCI bridges including root port of the adapter. The patch forces to
have that bit enabled accordingly.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc LE updates from Ben Herrenschmidt:
"With my previous pull request I mentioned some remaining Little Endian
patches, notably support for our new ABI, which I was sitting on
making sure it was all finalized.
The toolchain folks confirmed it now, the new ABI is stable and merged
with gcc, so we are all good. Oh and we actually missed the actual
Kconfig switch for LE so here it is, along with a couple more bug
fixes.
I have more fixes but not related to LE so I'll send them as a
separate pull request tomorrow, let's get this one out of the way.
Note that this supports running user space binaries using the new ABI,
but the kernel itself still needs to be built with the old one. We'll
bring fixes for that after -rc1.
Here's Anton log that goes with this series:
This patch series adds support for the new ABI, LPAR support for
H_SET_MODE and finally adds a kconfig option and defconfig.
ABIv2 support was recently committed to binutils and gcc, and should
be merged into glibc soon. There are a number of very nice
improvements including the removal of function descriptors. Rusty's
kernel patches allow binaries of either ABI to work, easing the
transition"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2
powerpc: Add pseries_le_defconfig
powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.
powerpc: Don't use ELFv2 ABI to build the kernel
powerpc: ELF2 binaries signal handling
powerpc: ELF2 binaries launched directly.
powerpc: Set eflags correctly for ELF ABIv2 core dumps.
powerpc: Add TIF_ELF2ABI flag.
pseries: Add H_SET_MODE to change exception endianness
powerpc/pseries: Fix endian issues in pseries EEH code
I've finally tracked down why my CR signal-unwind test case still
fails on little-endian. The problem turned to be that the kernel
installs a signal trampoline in the vDSO, and provides a DWARF CFI
record for that trampoline. This CFI describes the save location
for CR:
rsave (70, 38*RSIZE + (RSIZE - CRSIZE))
which is correct for big-endian, but points to the wrong word on
little-endian. This is wrong no matter which ABI.
In addition, for the ELFv2 ABI, we should not only provide a CFI
record for register 70 (cr2), but for all CR fields separately.
Strictly speaking, I guess this would mean providing two separate
vDSO images, one for ELFv1 processes and one for ELFv2 processes (or
maybe playing some tricks with conditional DWARF expressions).
However, having CFI records for the other CR fields in ELFv1 is not
actually wrong, they just will be ignored. So it seems the simplest
fix would be just to always provide CFI for all the fields.
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With the little endian support merged, we can add the
CONFIG_CPU_LITTLE_ENDIAN kernel config option.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The kernel doesn't build correctly using the ELFv2 ABI. This patch
ensures that the ELFv1 ABI is used when building a kernel with an
ELFv2 enabled compiler.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For the ELFv2 ABI, the hander is the entry point, not a function descriptor.
We also need to set up r12, and fortunately the fast_exception_return
exit path restores r12 for us so nothing else is required.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
No function descriptor, but we set r12 up and set TIF_RESTOREALL as it
normally isn't restored on return from syscall.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We leave it at zero (though it could be 1) for old tasks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Little endian ppc64 is getting an exciting new ABI. This is reflected
by the bottom two bits of e_flags in the ELF header:
0 == legacy binaries (v1 ABI)
1 == binaries using the old ABI (compiled with a new toolchain)
2 == binaries using the new ABI.
We store this in a thread flag, because we need to set it in core
dumps and for signal delivery. Our chief concern is that it doesn't
use function descriptors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On little endian builds call H_SET_MODE so exceptions have the
correct endianness. We need to reset the endian during kexec
so do that in the MMU hashtable clear callback.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull slave-dmaengine changes from Vinod Koul:
"This brings for slave dmaengine:
- Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
dmaengine can only transfer and not verify validaty of dma
transfers
- Bunch of fixes across drivers:
- cppi41 driver fixes from Daniel
- 8 channel freescale dma engine support and updated bindings from
Hongbo
- msx-dma fixes and cleanup by Markus
- DMAengine updates from Dan:
- Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
- In the course of testing 1/ a collection of enhancements to
dmatest fell out. Notably basic performance statistics, and
fixed / enhanced test control through new module parameters
'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and
Linus [Walleij] for their review.
- Testing the raid related corner cases of 1/ triggered bugs in
the recently added 16-source operation support in the ioatdma
driver.
- Some minor fixes / cleanups to mv_xor and ioatdma"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
dma: mv_xor: Remove unneeded NULL address check
ioat: fix ioat3_irq_reinit
ioat: kill msix_single_vector support
raid6test: add new corner case for ioatdma driver
ioatdma: clean up sed pool kmem_cache
ioatdma: fix selection of 16 vs 8 source path
ioatdma: fix sed pool selection
ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
dmatest: verbose mode
dmatest: convert to dmaengine_unmap_data
dmatest: add a 'wait' parameter
dmatest: add basic performance metrics
dmatest: add support for skipping verification and random data setup
dmatest: use pseudo random numbers
dmatest: support xor-only, or pq-only channels in tests
dmatest: restore ability to start test at module load and init
dmatest: cleanup redundant "dmatest: " prefixes
dmatest: replace stored results mechanism, with uniform messages
Revert "dmatest: append verify result to results"
...
powerpc has both arch_uprobe->insn and arch_uprobe->ainsn to
make the generic code happy. This is no longer needed after
the previous change, powerpc can just use "u32 insn".
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Pull irq cleanups from Ingo Molnar:
"This is a multi-arch cleanup series from Thomas Gleixner, which we
kept to near the end of the merge window, to not interfere with
architecture updates.
This series (motivated by the -rt kernel) unifies more aspects of IRQ
handling and generalizes PREEMPT_ACTIVE"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Make PREEMPT_ACTIVE generic
sparc: Use preempt_schedule_irq
ia64: Use preempt_schedule_irq
m32r: Use preempt_schedule_irq
hardirq: Make hardirq bits generic
m68k: Simplify low level interrupt handling code
genirq: Prevent spurious detection for unconditionally polled interrupts
Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, so it can trigger the deadlock.
Suppose the following scene:
Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
vcpus.
If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be
caught in realmode for a long time.
What makes things even worse if the following happens,
On cpuM, bitlockX is hold, on cpuN, Y is hold.
vcpu_B_2 try to lock Y on cpuM in realmode
vcpu_A_2 try to lock X on cpuN in realmode
Oops! deadlock happens
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>