Commit Graph

426177 Commits

Author SHA1 Message Date
Scott Wood
9d378dfac8 powerpc/booke64: Use SPRG7 for VDSO
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).

In commit 8b64a9dfb0 ("powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:

 - It's incompatible with EXCEPTION_COMMON which restores r13 from the
   by-then-overwritten scratch (this cost me some debugging time).
 - It forces critical exceptions to be a special case handled
   differently from even machine check and debug level exceptions.
 - It didn't occur to me that it was possible to make this work at all
   (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
   I made (most of) this patch. :-)

It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles.  Until then, let's stick with
SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
2014-03-19 19:57:14 -05:00
Scott Wood
82d86de25b powerpc/e6500: Make TLB lock recursive
Once special level interrupts are supported, we may take nested TLB
misses -- so allow the same thread to acquire the lock recursively.

The lock will not be effective against the nested TLB miss handler
trying to write the same entry as the interrupted TLB miss handler, but
that's also a problem on non-threaded CPUs that lack TLB write
conditional.  This will be addressed in the patch that enables crit/mc
support by invalidating the TLB on return from level exceptions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:57:13 -05:00
Scott Wood
c4787d1ecf powerpc/booke64: Fix exception numbers
altivec_unavailable was commented as 0xf20 but the code uses 0x200.
Note that 0xf20 is also used by ap_unavailable.

altivec_assist was commented as 0x1700 but the code uses 0x220.

critical_input was commented as 0x580 but the code uses 0x100.

machine_check was commented and implemented as 0x200, which conflicts
with altivec_assist (it only builds because MC_EXCEPTION_PROLOG is
commented out).  Changed to the fixed IVOR value of 0x000.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:57:12 -05:00
Tiejun Chen
19007b340d powerpc/book3e: store crit/mc/dbg exception thread info
We need to store thread info to these exception thread info like something
we already did for PPC32.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:57:10 -05:00
Tiejun Chen
160c732433 powerpc/book3e: initialize crit/mc/dbg kernel stack pointers
We already allocated critical/machine/debug check exceptions, but
we also should initialize those associated kernel stack pointers
for use by special exceptions in the PACA.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:57:09 -05:00
Wang Dongsheng
b0b7dcbdf3 powerpc/fsl: add PVR definition for E500MC and E5500
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:39:25 -05:00
Zhao Qiang
0f5a869600 Corenet: Add QE platform support for Corenet
There is QE on platform T104x, add support.
Call funcs qe_ic_init and qe_init if CONFIG_QUICC_ENGINE is defined.

Signed-off-by: Zhao Qiang <B45475@freescale.com>
[scottwood@freesacle.com: whitespace fix]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:37:59 -05:00
Zhao Qiang
706f4aa035 QE: split function mpc85xx_qe_init() into two functions.
New QE doesn't have par_io, it doesn't need to init par_io
for new QE.
Split function mpc85xx_qe_init() into mpc85xx_qe_init()
and mpc85xx_qe_par_io_init().
Call mpc85xx_qe_init() for both new and old while
mpc85xx_qe_par_io_init() after mpc85xx_qe_init() for old.

Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:37:31 -05:00
Sebastian Siewior
b0ad062cc4 powerpc: 85xx rdb: move np pointer to avoid builderror
If CONFIG_UCC_GETH or CONFIG_SERIAL_QE is not defined then we get a
warning about an used variable which leads to a build error.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:02:42 -05:00
harninder rai
c70a97447b powerpc/fsl: Add/update miscellaneous missing binding
Missing bindings were found on running checkpatch.pl on bsc9132
device tree. This patch add/update the following

- Add bindings for L2 cache controller
- Add bindings for memory controller
- Update bindings for USB controller

Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 18:09:40 -05:00
Prabhakar Kushwaha
2eb1b9a41b powerpc/config: Remove unnecssary CONFIG_FSL_IFC
CONFIG_FSL_IFC gets enabled by Kconfig dependancies.
So remove unnecssary define from the defconfigs

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 17:38:13 -05:00
Tang Yuantian
5a7c258ef4 powerpc: T4240: Add ina220 node in dts
Add power sensor chip ina220 node in dts to support
power monitor

Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 16:57:25 -05:00
Luis Henriques
3cae801695 powerpc/kconfig: Remove TSI108_BRIDGE duplicates
The MPC7448HPC2 and PPC_HOLLY config options contain TSI108_BRIDGE
duplicates since commit:

commit 3490cba56f
Author: Jon Loeliger <jdl@jdl.com>
Date:   Wed Jan 23 12:42:50 2008 -0600

    [POWERPC] Add initial iomega StorCenter board port.

This patch cleans these duplicates.

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 16:14:44 -05:00
Minghuan Lian
a424b97b7e powerpc/pci: Fix IMMRBAR address
For PEXCSRBAR, bit 3-0 indicate prefetchable and address type.
So when getting base address, these bits should be masked,
otherwise we may get incorrect base address.

Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 16:09:05 -05:00
Tang Yuantian
5d1a566e51 powerpc/mpc85xx: Update clock nodes in device tree
The following SoCs will be affected: p2041, p3041, p4080,
p5020, p5040, b4420, b4860, t4240

Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 16:04:23 -05:00
Stewart Smith
c7e64b9ce0 powerpc/powernv Platform dump interface
This enables support for userspace to fetch and initiate FSP and
Platform dumps from the service processor (via firmware) through sysfs.

Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

Flow:
  - We register for OPAL notification events.
  - OPAL sends new dump available notification.
  - We make information on dump available via sysfs
  - Userspace requests dump contents
  - We retrieve the dump via OPAL interface
  - User copies the dump data
  - userspace sends ack for dump
  - We send ACK to OPAL.

sysfs files:
  - We add the /sys/firmware/opal/dump directory
  - echoing 1 (well, anything, but in future we may support
    different dump types) to /sys/firmware/opal/dump/initiate_dump
    will initiate a dump.
  - Each dump that we've been notified of gets a directory
    in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
    as this is what's used elsewhere to identify the dump).
  - Each dump has files: id, type, dump and acknowledge
    dump is binary and is the dump itself.
    echoing 'ack' to acknowledge (currently any string will do) will
    acknowledge the dump and it will soon after disappear from sysfs.

OPAL APIs:
  - opal_dump_init()
  - opal_dump_info()
  - opal_dump_read()
  - opal_dump_ack()
  - opal_dump_resend_notification()

Currently we are only ever notified for one dump at a time (until
the user explicitly acks the current dump, then we get a notification
of the next dump), but this kernel code should "just work" when OPAL
starts notifying us of all the dumps present.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 16:19:10 +11:00
Stewart Smith
774fea1a38 powerpc/powernv: Read OPAL error log and export it through sysfs
Based on a patch by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch adds support to read error logs from OPAL and export
them to userspace through a sysfs interface.

We export each log entry as a directory in /sys/firmware/opal/elog/

Currently, OPAL will buffer up to 128 error log records, we don't
need to have any knowledge of this limit on the Linux side as that
is actually largely transparent to us.

Each error log entry has the following files: id, type, acknowledge, raw.
Currently we just export the raw binary error log in the 'raw' attribute.
In a future patch, we may parse more of the error log to make it a bit
easier for userspace (e.g. to be able to display a brief summary in
petitboot without having to have a full parser).

If we have >128 logs from OPAL, we'll only be notified of 128 until
userspace starts acknowledging them. This limitation may be lifted in
the future and with this patch, that should "just work" from the linux side.

A userspace daemon should:
- wait for error log entries using normal mechanisms (we announce creation)
- read error log entry
- save error log entry safely to disk
- acknowledge the error log entry
- rinse, repeat.

On the Linux side, we read the error log when we're notified of it. This
possibly isn't ideal as it would be better to only read them on-demand.
However, this doesn't really work with current OPAL interface, so we
read the error log immediately when notified at the moment.

I've tested this pretty extensively and am rather confident that the
linux side of things works rather well. There is currently an issue with
the service processor side of things for >128 error logs though.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 16:19:00 +11:00
Benjamin Krill
60b962239a powerpc/book3e: Fix check for linear mapping in TLB miss handler
The previous code added wrong TLBs and causes machine check errors if
a driver accessed passed the end of the linear mapping instead of
a clean page fault.

Signed-off-by: Ralph E. Bellofatto <ralphbel@us.ibm.com>
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:51 +11:00
Sebastian Siewior
eb3b80f676 powerpc: Add "force config cmd line" Kconfig option
powerpc uses early_init_dt_scan_chosen() from common fdt code. By
enabling this option, the common code can take the built in
command line over the one that is comming from bootloader / DT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:50 +11:00
Tyrel Datwyler
9da3489210 powerpc/pseries: Expose in kernel device tree update to drmgr
Traditionally it has been drmgr's responsibilty to update the device tree
through the /proc/ppc64/ofdt interface after a suspend/resume operation.
This patchset however has modified suspend/resume ops to preform an update
entirely in the kernel during the resume. Therefore, a mechanism is required
to expose that information to drmgr.

This patch adds a show function to the "hibernate" attribute that returns 1
if the kernel performs a device tree update after the resume and 0 otherwise.
This allows newer versions of drmgr to avoid doing a second unnecessary
device tree update.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:50 +11:00
Haren Myneni
6b36ba8492 powerpc/pseries: Update dynamic cache nodes for suspend/resume operation
pHyp can change cache nodes for suspend/resume operation. Currently the
device tree is updated by drmgr in userspace after all non boot CPUs are
enabled. Hence, we do not modify the cache list based on the latest cache
nodes. Also we do not remove cache entries for the primary CPU.

This patch removes the cache list for the boot CPU, updates the device tree
before enabling nonboot CPUs and adds cache list for the boot cpu.

This patch also has the side effect that older versions of drmgr will
perform a second device tree update from userspace. While this is a
redundant waste of a couple cycles it is harmless since firmware returns the
same data for the subsequent update-nodes/properties rtas calls.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:49 +11:00
Haren Myneni
39a33b59f4 powerpc/pseries: Device tree should only be updated once after suspend/migrate
The current code makes rtas calls for update-nodes, activate-firmware and then
update-nodes again. The FW provides the same data for both update-nodes calls.
As a result a proc entry exists error is reported for the second update while
adding device nodes.

This patch makes a single rtas call for update-nodes after activating the FW.
It also add rtas_busy delay for the activate-firmware rtas call.

Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:49 +11:00
Shuah Khan
639291f263 macintosh/adb: Change platform power management to use dev_pm_ops
Change adb platform driver to register pm ops using dev_pm_ops instead of
legacy pm_ops. .pm hooks call existing legacy suspend and resume interfaces
by passing in the right pm state.

Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:48 +11:00
Paul Gortmaker
3c8464a9b1 powerpc: Delete old PrPMC 280/2800 support
This processor/memory module was mostly used on ATCA blades and
before that, on cPCI blades.  It wasn't really user friendly, with
custom non u-boot bootloaders (powerboot/motload) and no real way
to recover corrupted boot flash (which was a common problem).

As such, it had its day back before the big ppc --> powerpc move
to device trees, and that was largely through commercial BSPs that
started to dry up around 2007.

Systems using one were largely in a "deploy and sustain" mode,
so interest in upgrading to new kernels in the field was nil.
Also, requiring 50A, 48V power supplies and a 2'x2'x2' ATCA
chassis largely rules out any hobbyist/enthusiast interest.

The point of all this, is that we might as well delete the in
kernel files relating to this platform.  No point in continuing
to build it via walking the defconfigs or via linux-next testing.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:54:48 +11:00
Brandon Stewart
74e7cd432c macintosh/adb: Fixed some coding style problems
Signed-off-by: Brandon Stewart <stewartb2@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:13 +11:00
Nathan Fontenot
9ac8cde938 powerpc/pseries: Use remove_memory() to remove memory
The memory remove code for powerpc/pseries should call remove_memory()
so that we are holding the hotplug_memory lock during memory remove
operations.

This patch updates the memory node remove handler to call remove_memory()
and adds a ppc_md.remove_memory() entry to handle pseries specific work
that is called from arch_remove_memory().

During memory remove in pseries_remove_memblock() we have to stay with
removing memory one section at a time. This is needed because of how memory
resources are handled. During memory add for pseries (via the probe file in
sysfs) we add memory one section at a time which gives us a memory resource
for each section. Future patches will aim to address this so will not have
to remove memory one section at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:13 +11:00
Michael Ellerman
22d651dcef selftests/powerpc: Import Anton's memcpy / copy_tofrom_user tests
Turn Anton's memcpy / copy_tofrom_user test into something that can
live in tools/testing/selftests.

It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
harmless IMHO.

We are sailing very close to the wind with the feature macros. We define
them to nothing, which currently means we get a few extra nops and
include the unaligned calls.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:12 +11:00
Mahesh Salgaonkar
55672ecfa2 powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO.
Detect and recover from machine check when inside opal on a special
scom load instructions. On specific SCOM read via MMIO we may get a machine
check exception with SRR0 pointing inside opal. To recover from MC
in this scenario, get a recovery instruction address and return to it from
MC.

OPAL will export the machine check recoverable ranges through
device tree node mcheck-recoverable-ranges under ibm,opal:

# hexdump /proc/device-tree/ibm,opal/mcheck-recoverable-ranges
0000000 0000 0000 3000 2804 0000 000c 0000 0000
0000010 3000 2814 0000 0000 3000 27f0 0000 000c
0000020 0000 0000 3000 2814 xxxx xxxx xxxx xxxx
0000030 llll llll yyyy yyyy yyyy yyyy
...
...
#

where:
	xxxx xxxx xxxx xxxx = Starting instruction address
	llll llll           = Length of the address range.
	yyyy yyyy yyyy yyyy = recovery address

Each recoverable address range entry is (start address, len,
recovery address), 2 cells each for start and recovery address, 1 cell for
len, totalling 5 cells per entry. During kernel boot time, build up the
recovery table with the list of recovery ranges from device-tree node which
will be used during machine check exception to recover from MMIO SCOM UE.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:52:10 +11:00
Benjamin Herrenschmidt
d2a36071ef powerpc/pseries: Don't try to register pseries cpu hotplug on non-pseries
This results in oddball messages at boot on other platforms telling us
that CPU hotplug isn't supported even when it is.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:50:13 +11:00
Philippe Bergheaud
72eceef67a powerpc: Fix xmon disassembler for little-endian
This patch fixes the disassembler of the powerpc kernel debugger xmon,
for little-endian.

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:50:12 +11:00
Li Zhong
10862a0c71 powerpc: Revert c6102609 and replace it with the correct fix for vio dma mask setting
This patch reverts my previous "fix", and replace it with the correct
fix from Russell.

And as Russell pointed out -- dma_set_mask_and_coherent() (and the other
dma_set_mask() functions) are really supposed to be used by drivers
only.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:50:12 +11:00
송은봉
847443774b powerpc: : Kill CONFIG_MTD_PARTITIONS
This patch removes CONFIG_MTD_PARTITIONS in config files for powerpc.
 Because CONFIG_MTD_PARTITIONS was removed by commit 6a8a98b22b.

Signed-off-by: Eunbong Song <eunb.song@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:50:11 +11:00
Anton Blanchard
a5b2cf5b1a powerpc: Align p_dyn, p_rela and p_st symbols
The 64bit relocation code places a few symbols in the text segment.
These symbols are only 4 byte aligned where they need to be 8 byte
aligned. Add an explicit alignment.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 13:50:19 +11:00
Michael Neuling
621b5060e8 powerpc/tm: Fix crash when forking inside a transaction
When we fork/clone we currently don't copy any of the TM state to the new
thread.  This results in a TM bad thing (program check) when the new process is
switched in as the kernel does a tmrechkpt with TEXASR FS not set.  Also, since
R1 is from userspace, we trigger the bad kernel stack pointer detection.  So we
end up with something like this:

   Bad kernel stack pointer 0 at c0000000000404fc
   cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40]
       pc: c0000000000404fc: restore_gprs+0xc0/0x148
       lr: 0000000000000000
       sp: 0
      msr: 9000000100201030
     current = 0xc000001dd1417c30
     paca    = 0xc00000000fe00800   softe: 0        irq_happened: 0x01
       pid   = 0, comm = swapper/2
   WARNING: exception is not recoverable, can't continue

The below fixes this by flushing the TM state before we copy the task_struct to
the clone.  To do this we go through the tmreclaim patch, which removes the
checkpointed registers from the CPU and transitions the CPU out of TM suspend
mode.  Hence we need to call tmrechkpt after to restore the checkpointed state
and the TM mode for the current task.

To make this fail from userspace is simply:
	tbegin
	li	r0, 2
	sc
	<boom>

Kudos to Adhemerval Zanella Neto for finding this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: Adhemerval Zanella Neto <azanella@br.ibm.com>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 13:50:15 +11:00
Benjamin Herrenschmidt
e0cf957614 powerpc/powernv: Fix indirect XSCOM unmangling
We need to unmangle the full address, not just the register
number, and we also need to support the real indirect bit
being set for in-kernel uses.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13]
2014-02-28 19:15:49 +11:00
Benjamin Herrenschmidt
2f3f38e4d3 powerpc/powernv: Fix opal_xscom_{read,write} prototype
The OPAL firmware functions opal_xscom_read and opal_xscom_write
take a 64-bit argument for the XSCOM (PCB) address in order to
support the indirect mode on P8.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13]
2014-02-28 19:15:48 +11:00
Gavin Shan
af87d2fe95 powerpc/powernv: Refactor PHB diag-data dump
As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl:     00000002
  RootSts:     0000000f 00400000 b0830008 00100147 00002000
  nFir:        0000000000000000 0030006e00000000 0000000000000000
  PhbSts:      0000001c00000000 0000000000000000
  Lem:         0000000000100000 42498e327f502eae 0000000000000000
  InAErr:      8000000000000000 8000000000000000 0402030000000000 0000000000000000
  PE[  8] A/B: 8480002b00000000 8000000000000000

[ The current diag data is so big that it overflows the printk
  buffer pretty quickly in cases when we get a handful of errors
  at once which can happen. --BenH
]

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:43:19 +11:00
Gavin Shan
9471660437 powerpc/powernv: Dump PHB diag-data immediately
The PHB diag-data is important to help locating the root cause for
EEH errors such as frozen PE or fenced PHB. However, the EEH core
enables IO path by clearing part of HW registers before collecting
this data causing it to be corrupted.

This patch fixes this by dumping the PHB diag-data immediately when
frozen/fenced state on PE or PHB is detected for the first time in
eeh_ops::get_state() or next_error() backend.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:43:10 +11:00
Paul Mackerras
573ebfa660 powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
The new ELFv2 little-endian ABI increases the stack redzone -- the
area below the stack pointer that can be used for storing data --
from 288 bytes to 512 bytes.  This means that we need to allow more
space on the user stack when delivering a signal to a 64-bit process.

To make the code a bit clearer, we define new USER_REDZONE_SIZE and
KERNEL_REDZONE_SIZE symbols in ptrace.h.  For now, we leave the
kernel redzone size at 288 bytes, since increasing it to 512 bytes
would increase the size of interrupt stack frames correspondingly.

Gcc currently only makes use of 288 bytes of redzone even when
compiling for the new little-endian ABI, and the kernel cannot
currently be compiled with the new ABI anyway.

In the future, hopefully gcc will provide an option to control the
amount of redzone used, and then we could reduce it even more.

This also changes the code in arch_compat_alloc_user_space() to
preserve the expanded redzone.  It is not clear why this function would
ever be used on a 64-bit process, though.

Signed-off-by: Paul Mackerras <paulus@samba.org>
CC: <stable@vger.kernel.org> [v3.13]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:06:26 +11:00
Liu Ping Fan
a95fc58549 powerpc/ftrace: bugfix for test_24bit_addr
The branch target should be the func addr, not the addr of func_descr_t.
So using ppc_function_entry() to generate the right target addr.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:06:25 +11:00
Laurent Dufour
f5295bd8ea powerpc/crashdump : Fix page frame number check in copy_oldmem_page
In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.

This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.

This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:06:25 +11:00
Tony Breeds
41dd03a94c powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
Currently we're storing a host endian RTAS token in
rtas_stop_self_args.token.  We then pass that directly to rtas.  This is
fine on big endian however on little endian the token is not what we
expect.

This will typically result in hitting:
	panic("Alas, I survived.\n");

To fix this we always use the stop-self token in host order and always
convert it to be32 before passing this to rtas.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-28 18:06:24 +11:00
Gavin Shan
66f9af83e5 powerpc/eeh: Disable EEH on reboot
We possiblly detect EEH errors during reboot, particularly in kexec
path, but it's impossible for device drivers and EEH core to handle
or recover them properly.

The patch registers one reboot notifier for EEH and disable EEH
subsystem during reboot. That means the EEH errors is going to be
cleared by hardware reset or second kernel during early stage of
PCI probe.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:39 +11:00
Gavin Shan
2ec5a0adf6 powerpc/eeh: Cleanup on eeh_subsystem_enabled
The patch cleans up variable eeh_subsystem_enabled so that we needn't
refer the variable directly from external. Instead, we will use
function eeh_enabled() and eeh_set_enable() to operate the variable.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:39 +11:00
Gavin Shan
5b2e198e50 powerpc/powernv: Rework EEH reset
When doing reset in order to recover the affected PE, we issue
hot reset on PE primary bus if it's not root bus. Otherwise, we
issue hot or fundamental reset on root port or PHB accordingly.
For the later case, we didn't cover the situation where PE only
includes root port and it potentially causes kernel crash upon
EEH error to the PE.

The patch reworks the logic of EEH reset to improve the code
readability and also avoid the kernel crash.

Cc: stable@vger.kernel.org
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:38 +11:00
Anton Blanchard
24b659a138 powerpc: Use unstripped VDSO image for more accurate profiling data
We are seeing a lot of hits in the VDSO that are not resolved by perf.
A while(1) gettimeofday() loop shows the issue:

27.64%  [vdso]  [.] 0x000000000000060c
22.57%  [vdso]  [.] 0x0000000000000628
16.88%  [vdso]  [.] 0x0000000000000610
12.39%  [vdso]  [.] __kernel_gettimeofday
 6.09%  [vdso]  [.] 0x00000000000005f8
 3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.94%  [vdso]  [.] __kernel_datapage_offset
 2.90%  test    [.] main

We are using a stripped VDSO image which means only symbols with
relocation info can be resolved. There isn't a lot of point to
stripping the VDSO, the debug info is only about 1kB:

4680 arch/powerpc/kernel/vdso64/vdso64.so
5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg

By using the unstripped image, we can resolve all the symbols in the
VDSO and the perf profile data looks much better:

76.53%  [vdso]  [.] __do_get_tspec
12.20%  [vdso]  [.] __kernel_gettimeofday
 5.05%  [vdso]  [.] __get_datapage
 3.20%  test    [.] main
 2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:37 +11:00
Anton Blanchard
a0a4419e30 powerpc: Link VDSOs at 0x0
perf is failing to resolve symbols in the VDSO. A while (1)
gettimeofday() loop shows:

93.99%  [vdso]  [.] 0x00000000000005e0
 3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.81%  test    [.] main

The reason for this is that we are linking our VDSO shared libraries
at 1MB, which is a little weird. Even though this is uncommon, Alan
points out that it is valid and we should probably fix perf userspace.

Regardless, I can't see a reason why we are doing this. The code
is all position independent and we never rely on the VDSO ending
up at 1M (and we never place it there on 64bit tasks).

Changing our link address to 0x0 fixes perf VDSO symbol resolution:

73.18%  [vdso]  [.] 0x000000000000060c
12.39%  [vdso]  [.] __kernel_gettimeofday
 3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.94%  [vdso]  [.] __kernel_datapage_offset
 2.90%  test    [.] main

We still have some local symbol resolution issues that will be
fixed in a subsequent patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:37 +11:00
Aneesh Kumar K.V
56eecdb912 mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
a hash table MMU for various reasons (the flush is handled as part of
the PTE modification when necessary).

ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

Additionally ppc64 require the tlb flushing to be batched within ptl locks.

The reason to do that is to ensure that the hash page table is in sync with
linux page table.

We track the hpte index in linux pte and if we clear them without flushing
hash and drop the ptl lock, we can have another cpu update the pte and can
end up with duplicate entry in the hash table, which is fatal.

We also want to keep set_pte_at simpler by not requiring them to do hash
flush for performance reason. We do that by assuming that set_pte_at() is
never *ever* called on a PTE that is already valid.

This was the case until the NUMA code went in which broke that assumption.

Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
using set_pte_at() and a powerpc specific one using the appropriate
mechanism needed to keep the hash table in sync.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:36 +11:00
Aneesh Kumar K.V
9d85d5863f mm: Dirty accountable change only apply to non prot numa case
So move it within the if loop

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:36 +11:00
Aneesh Kumar K.V
88247e8d7b powerpc/mm: Add new "set" flag argument to pte/pmd update function
pte_update() is a powerpc-ism used to change the bits of a PTE
when the access permission is being restricted (a flush is
potentially needed).

It uses atomic operations on when needed and handles the hash
synchronization on hash based processors.

It is currently only used to clear PTE bits and so the current
implementation doesn't provide a way to also set PTE bits.

The new _PAGE_NUMA bit, when set, is actually restricting access
so it must use that function too, so this change adds the ability
for pte_update() to also set bits.

We will use this later to set the _PAGE_NUMA bit.

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:35 +11:00