This update the proto-VSID and VSID scramble related information
to be more generic by using names instead of current values.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Increase max addressable range to 64TB. This is not tested on
real hardware yet.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With larger vsid we need to track more bits of ESID in slb cache
for slb invalidate.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch makes the high psizes mask as an unsigned char array
so that we can have more than 16TB. Currently we support upto
64TB
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As we keep increasing PGTABLE_RANGE we need not increase the virual
map area for kernel.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch convert different functions to take virtual page number
instead of virtual address. Virtual page number is virtual address
shifted right by VPN_SHIFT (12) bits. This enable us to have an
address range of upto 76 bits.
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PTRS_PER_PUD should be based on PUD_INDEX_SIZE, not PMD_INDEX_SIZE. We
got away with it because PUD and PMD had the same index size, but this is
no longer true with Aneesh's patchset to support a 46-bit user effective
address space.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On POWER6 and POWER7 if the input operand to an instruction is a
denormalised single precision binary floating point value we can take
a denormalisation exception where it's expected that the hypervisor
(HV=1) will fix up the inputs before the instruction is run.
This adds code to handle this denormalisation exception for POWER6 and
POWER7.
It also add a CONFIG_PPC_DENORMALISATION option and sets it in
pseries/ppc64_defconfig.
This is useful on bare metal systems only. Based on patch from Milton
Miller.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge Gavin patches from the PCI tree as subsequent powerpc
patches are going to depend on them
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently fifo transfer is started when submitting a transfer
request. Add posibility to defer the fifo transfer and start it
later by calling additional function. This change is backward
compatible, the behaviour of mpc52xx_lpbfifo_submit() is the same
for previous driver users, so there is no need to adapt them.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
* commit 'v3.6-rc5': (1098 commits)
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
powerpc/powernv: Always go into nap mode when CPU is offline
powerpc: Give hypervisor decrementer interrupts their own handler
powerpc/vphn: Fix arch_update_cpu_topology() return value
ARM: gemini: fix the gemini build
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/rapidio/devices/tsi721.c
* commit 'v3.6-rc5': (1098 commits)
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
powerpc/powernv: Always go into nap mode when CPU is offline
powerpc: Give hypervisor decrementer interrupts their own handler
powerpc/vphn: Fix arch_update_cpu_topology() return value
ARM: gemini: fix the gemini build
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/rapidio/devices/tsi721.c
Freescale's Integrated Flash controller(IFC) v1.1.0 supports 40 bit
address bus width.
In case more than 32 bit address is used, the EXT registers should be set.
Add support of ext registers.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: York Sun <yorksun@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
All SOC device error interrupts are muxed and delivered to the core
as a single MPIC error interrupt. Currently all the device drivers
requiring access to device errors have to register for the MPIC error
interrupt as a shared interrupt.
With this patch we add interrupt demuxing capability in the mpic driver,
allowing device drivers to register for their individual error interrupts.
This is achieved by handling error interrupts in a cascaded fashion.
MPIC error interrupt is handled by the "error_int_handler", which
subsequently demuxes it using the EISR and delivers it to the respective
drivers.
The error interrupt capability is dependent on the MPIC EIMR register,
which was introduced in FSL MPIC version 4.1 (P4080 rev2). So, error
interrupt demuxing capability is dependent on the MPIC version and can
be used for versions >= 4.1.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Remove the dependency on PCI initialization for SWIOTLB initialization.
So that PCI can be initialized at proper time.
SWIOTLB is partly determined by PCI inbound/outbound map which is assigned
in PCI initialization. But swiotlb_init() should be done at the stage of
mem_init() which is much earlier than PCI initialization. So we reserve the
memory for SWIOTLB first and free it if not necessary.
All boards are converted to fit this change.
Signed-off-by: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add support to disable and re-enable individual cores at runtime on
MPC85xx/QorIQ SMP machines. Currently support e500v1/e500v2 core.
MPC85xx machines use ePAPR spin-table in boot page for CPU kick-off. This
patch uses the boot page from bootloader to boot core at runtime. It
supports 32-bit and 36-bit physical address.
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Do hardware timebase sync. Firstly, stop all timebases, and transfer the
timebase value of the boot core to the other core. Finally, start all
timebases.
Only apply to dual-core chips, such as MPC8572, P2020, etc.
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In the case of cpu hotplug, the cpu_state should be set to CPU_UP_PREPARE
when kicking cpu. Otherwise, the cpu_state is always CPU_DEAD after
calling generic_set_cpu_dead(), which makes the delay in generic_cpu_die()
not happen.
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Previously, these interrupts would be mapped, but the offset calculation
was broken, and only the first group was initialized.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch implements pcibios_window_alignment() so powerpc platforms can
force P2P bridge windows to be at larger alignments than the PCI spec
requires.
[bhelgaas: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These are no longer used so get rid of them
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we mark the DABRX to interrupt on all matches
(hypervisor/kernel/user and then filter in software. We can be a lot
smarter now that we can set the DABRX dynamically.
This sets the DABRX based on the flags passed by the user.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rework set_dabr to take a DABRX value as well.
Both the pseries and PS3 hypervisors do some checks on the DABRX
values that are passed in the hcall. This patch stops bogus values
from being passed to hypervisor. Also, in the case where we are
clearing the breakpoint, where DABR and DABRX are zero, we modify the
DABRX value to make it valid so that the hcall won't fail.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch does cleanup on EEH PCI address cache based on the fact
EEH core is the only user of the component.
* Cleanup on function names so that they all have prefix
"eeh" and looks more short.
* Function printk() has been replaced with pr_debug() or
pr_warning() accordingly.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The idea comes from Benjamin Herrenschmidt. The eeh cache helps
fetching the pci device according to the given I/O address. Since
the eeh cache is serving for eeh, it's reasonable for eeh cache
to trace eeh device except pci device.
The patch make eeh cache to trace eeh device. Also, the major
eeh entry function eeh_dn_check_failure has been renamed to
eeh_dev_check_failure since it will take eeh device as input
parameter.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While EEH module is installed, PCI devices is checked one by one
to see if it supports eeh. On different platforms, the PCI devices
are referred through different ways when the EEH module is loaded.
For example, on pSeries platform, that is done by OF node. However,
we would do that by real PCI devices (struct pci_dev) on PowerNV
platform in future. So we needs some mechanism to differentiate
those cases by classifying them to probe modes, either from OF
nodes or real PCI devices.
The patch implements the support to eeh probe mode. Also, the
EEH on pSeries has set it into EEH_PROBE_MODE_DEVTREE. That means
the probe will be done based on OF nodes on pSeries platform.
In addition, On pSeries platform, it's done by OF nodes. The patch
moves the the probe function from EEH core to platform dependent
backend and some cleanup applied.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch removes the eeh related statistics for eeh device since
they have been maintained by the corresponding eeh PE. Also, the
flags used to trace the state of eeh device and PE have been reworked
for a little bit.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch reworks the current implementation so that the eeh errors
will be handled basing on PE instead of eeh device.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch implements reset based on PE instead of eeh device. Also,
The functions used to retrieve the reset type, either hot or fundamental
reset, have been reworked for a little bit. More specificly, it's
implemented based the the eeh device traverse function.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch refactors the original implementation in order to enable
I/O and retrieve EEH log based on PE.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch introduces the function to traverse the devices of the
specified PE and its child PEs. Also, the restore on device bars
is implemented based on the traverse function.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Originally, all the EEH operations were implemented based on OF node.
Actually, it explicitly breaks the rules that the operation target
is PE instead of device. Therefore, the patch makes all the operations
based on PE instead of device.
Unfortunately, the backend for config space has to be kept as original
because it doesn't depend on PE.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There're 2 conditions to trigger EEH error detection: invalid value
returned from reading I/O or config space. On each case, the function
eeh_dn_check_failure will be called to initialize EEH event and put
it into the poll for further processing.
The patch changes the function for a little bit so that the EEH error
will be traced based on PE instead of EEH device any more. Also, the
function eeh_find_device_pe() has been removed since the eeh device
is tracing the PE by struct eeh_dev::pe.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since we've introduced dedicated struct to trace individual PEs,
it's reasonable to trace its state through the dedicated struct
instead of using "eeh_dev" any more.
The patches implements the state tracing based on PE. It's notable
that the PE state will be applied to the specified PE as well as
its child PEs. That complies with the rule that problematic parent
PE will prevent those child PEs from working properly.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The original implementation builds EEH event based on EEH device.
We already had dedicated struct to depict PE. It's reasonable to
build EEH event based on PE.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During PCI hotplug and EEH recovery, the PE hierarchy tree might be
changed due to the PCI topology changes. At later point when the
PCI device is added, the PE will be created dynamically again.
The patch introduces new function to remove EEH devices from the
associated PE. That also can cause that the parent PE is removed
from the PE tree if the parent PE doesn't include valid EEH devices
and child PEs.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch creates PEs and associated the newly created PEs with
it parent/silbing as well as EEH devices. It would become more
straight to trace EEH errors and recover them accordingly.
Once the EEH functionality on one PCI IOA has been enabled, we
tries to create PE against it. If there's existing PE, to which
the current PCI IOA should be attached, the existing PE will be
converted from "device" type to "bus" type accordingly.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch implements searching PE based on the following
requirements:
* Search PE according to PE address, which is traditional
PE address that is composed of PCI bus/device/function
number, or unified PE address assigned by firmware or
platform.
* Search parent PE according to the given EEH device. It's
useful when creating new PE and put it into right position.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For one particular PE, it's only meaningful in the ancestor PHB
domain. Therefore, each PHB should have its own PE hierarchy tree
to trace those PEs created against the PHB.
The patch creates PEs for the PHBs and put those PEs into the
global link list traced by "eeh_phb_pe". The link list of PEs
would be first level of overall PE hierarchy tree across the
system.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch introduces global mutex for EEH so that the core data
structures can be protected by that. Also, 2 inline functions
are exported for that: eeh_lock() and eeh_unlock().
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As defined in PAPR 2.4, Partitionable Endpoint (PE) is an I/O subtree
that can be treated as a unit for the purposes of partitioning and error
recovery. Therefore, eeh core should be aware of PE. With eeh_pe struct,
we can support PE explicitly. Further more, it makes all the stuff much
more data centralized. Another important reason is for eeh core to support
multiple platforms. Some of them like pSeries figures out PEs through
OF nodes while others like powernv have to do that through PCI bus/device
tree. With explicit PE support, eeh core will be implemented based on
the centrialized data and platform dependent implementations figure it
out by their feasible ways.
When the struct is designed, following factors are taken in account:
* Reflecting the relationships of PEs. PE might have parent
as well children.
* Reflecting the association of PE and (eeh) devices.
* PEs have PHB boundary.
* PE should have unique address assigned in the corresponding
PHB domain.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we have 3 phases for EEH initialization on pSeries platform.
All of them are done through builtin functions: platform initialization,
EEH device creation, and EEH subsystem enablement. All of them are done
no later than ppc_md.setup_arch. That means that the slab/slub isn't ready
yet, so we have to allocate memory chunks on basis of PAGE_SIZE for those
dynamically created EEH devices. That's pretty expensive.
In order to utilize slab/slub for memory allocation, we have to move the EEH
initialization functions around, but all of them should be called after slab
is ready.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are some device-tree nodes, whose values are of type phys_addr_t.
The phys_addr_t is variable sized based on the CONFIG_PHSY_T_64BIT.
Change these to a fixed unsigned long long for consistency.
This patch does the change only for memory_limit.
The following is a list of such variables which need the change:
1) kernel_end, crashk_size - in arch/powerpc/kernel/machine_kexec.c
2) (struct resource *)crashk_res.start - We could export a local static
variable from machine_kexec.c.
Changing the above values might break the kexec-tools. So, I will
fix kexec-tools first to handle the different sized values and then change
the above.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This definition will be used by subsequent perf and oprofile patches
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit: 8b7b80b9eb
[24/29] powerpc: Uprobes port to powerpc
Caused a clash with the fore200e driver:
In file included from drivers/atm/fore200e.c:70:0:
drivers/atm/fore200e.h:263:3: error: redefinition of typedef 'opcode_t' with different type
arch/powerpc/include/asm/probes.h:25:13: note: previous declaration of 'opcode_t' was here
Fix the namespace clash by making opcode_t in probes.h to ppc_opcode_t.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Critical exception on 64-bit booke uses user-visible SPRG3 as scratch.
Restore VDSO information in SPRG3 on exception prolog.
Use a common sprg3 field in PACA for all powerpc64 architectures.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The CPU hotplug code for the powernv platform currently only puts
offline CPUs into nap mode if the powersave_nap variable is set.
However, HV-style KVM on this platform requires secondary CPU threads
to be offline and in nap mode. Since we know nap mode works just
fine on all POWER7 machines, and the only machines that support the
powernv platform are POWER7 machines, this changes the code to
always put offline CPUs into nap mode, regardless of powersave_nap.
Powersave_nap still controls whether or not CPUs go into nap mode
when idle, as before.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
SPRG4-7 registers will be clobbered.
For bolted TLB miss exception handlers, which is the version currently
supported by KVM, use SPRN_SPRG_GEN_SCRATCH aka SPRG0 instead of
SPRN_SPRG_TLB_SCRATCH aka SPRG6. Keep using TLB PACA slots to fit in one
64-byte cache line.
For critical exception handlers use SPRG3 instead of SPRG7. Provide a routine
to store and restore user-visible SPRGs. This will be subsequently used
to restore VDSO information in SPRG3. Add EX_R13 to paca slots to free up
SPRG3 and change the critical exception epilog to use it.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Guest Doorbell interrupts use guest save and restore registers. Add a new
Guest Doorbell exception type to accommodate GSRR0/1 SPRs usage in exception
prolog and fix the exception handler.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of uprobes to powerpc. Usage is similar to x86.
[root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
Added new event:
probe_libc:malloc (on 0xb4860)
You can now use it in all perf tools, such as:
perf record -e probe_libc:malloc -aR sleep 1
[root@xxxx ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
[root@xxxx ~]# ./bin/perf report --stdio
...
69.05% tar libc-2.12.so [.] malloc
28.57% rm libc-2.12.so [.] malloc
1.32% avahi-daemon libc-2.12.so [.] malloc
0.58% bash libc-2.12.so [.] malloc
0.28% sshd libc-2.12.so [.] malloc
0.08% irqbalance libc-2.12.so [.] malloc
0.05% bzip2 libc-2.12.so [.] malloc
0.04% sleep libc-2.12.so [.] malloc
0.03% multipathd libc-2.12.so [.] malloc
0.01% sendmail libc-2.12.so [.] malloc
0.01% automount libc-2.12.so [.] malloc
The trap_nr addition patch is a prereq.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add thread_struct.trap_nr and use it to store the last exception
the thread experienced. In this patch, we populate the field at
various places where we force_sig_info() to the process.
This is also used in uprobes to determine if the probed instruction
caused an exception.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_trap() and relatives to a common file to be shared between kprobes
and uprobes.
Code movement only; no change in functionality.
Suggested by Michael Ellerman.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have an old FIXME in reg.h which points out that we should standardise
on PVR_foo for our PVR #defines. Currently we use PVR_ on 32-bit and PV_
on 64-bit.
So do that rename and remove the FIXME.
Seeing as we're touching all but one usage of __is_processor(), rename it
to something less ugly and more indicative of what it does, which is
simply to check the PVR version.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It contains no code and is not included by anyone.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In commit f5339277 "powerpc: Remove FW_FEATURE ISERIES from arch code", we
removed the bulk of the iSeries code, but missed a few bits.
Remove the mschunks bits, these were only ever used on iSeries as far as I
know, and are definitely not used anymore.
Make it even clearer that phys_to_abs() is a nop, by making it a macro. We
still have a few users of this, but should clean those up.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merging critical fixes from upstream required for development.
* upstream/master: (809 commits)
libata: Add a space to " 2GB ATA Flash Disk" DMA blacklist entry
Revert "powerpc: Update g5_defconfig"
powerpc/perf: Use pmc_overflow() to detect rolled back events
powerpc: Fix VMX in interrupt check in POWER7 copy loops
powerpc: POWER7 copy_to_user/copy_from_user patch applied twice
powerpc: Fix personality handling in ppc64_personality()
powerpc/dma-iommu: Fix IOMMU window check
powerpc: Remove unnecessary ifdefs
powerpc/kgdb: Restore current_thread_info properly
powerpc/kgdb: Bail out of KGDB when we've been triggered
powerpc/kgdb: Do not set kgdb_single_step on ppc
powerpc/mpic_msgr: Add missing includes
powerpc: Fix null pointer deref in perf hardware breakpoints
powerpc: Fixup whitespace in xmon
powerpc: Fix xmon dl command for new printk implementation
xfs: check for possible overflow in xfs_ioc_trim
xfs: unlock the AGI buffer when looping in xfs_dialloc
xfs: fix uninitialised variable in xfs_rtbuf_get()
powerpc/fsl: fix "Failed to mount /dev: No such device" errors
powerpc/fsl: update defconfigs
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pull powerpc fixes from Benjamin Herrenschmidt:
"I meant to sent that earlier but got swamped with other things, so
here are some powerpc fixes for 3.6. A few regression fixes and some
bug fixes that I deemed should still make it.
There's a FSL update from Kumar with a bunch of defconfig updates
along with a few embedded fixes.
I also reverted my g5_defconfig update that I merged earlier as it was
completely busted, not too sure what happened there, I'll do a new one
later."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
Revert "powerpc: Update g5_defconfig"
powerpc/perf: Use pmc_overflow() to detect rolled back events
powerpc: Fix VMX in interrupt check in POWER7 copy loops
powerpc: POWER7 copy_to_user/copy_from_user patch applied twice
powerpc: Fix personality handling in ppc64_personality()
powerpc/dma-iommu: Fix IOMMU window check
powerpc: Remove unnecessary ifdefs
powerpc/kgdb: Restore current_thread_info properly
powerpc/kgdb: Bail out of KGDB when we've been triggered
powerpc/kgdb: Do not set kgdb_single_step on ppc
powerpc/mpic_msgr: Add missing includes
powerpc: Fix null pointer deref in perf hardware breakpoints
powerpc: Fixup whitespace in xmon
powerpc: Fix xmon dl command for new printk implementation
powerpc/fsl: fix "Failed to mount /dev: No such device" errors
powerpc/fsl: update defconfigs
booke/wdt: some ioctls do not return values properly
powerpc/p4080ds: dts - add usb controller version info and port0
powerpc/85xx: mpc85xx_defconfig - add VIA PATA support for MPC85xxCDS
powerpc/fsl-pci: Only scan PCI bus if configured as a host
Add several #includes that mpic_msgr relies on being pulled implicitly,
which only happens on certain configs.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Meador Inge <meador_inge@mentor.com>
Cc: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The archs that implement virtual cputime accounting all
flush the cputime of a task when it gets descheduled
and sometimes set up some ground initialization for the
next task to account its cputime.
These archs all put their own hooks in their context
switch callbacks and handle the off-case themselves.
Consolidate this by creating a new account_switch_vtime()
callback called in generic code right after a context switch
and that these archs must implement to flush the prev task
cputime and initialize the next task cputime related state.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.
Signed-off-by: Alexander Graf <agraf@suse.de>
Two reasons:
- x86 can integrate rmap and rmap_pde and remove heuristics in
__gfn_to_rmap().
- Some architectures do not need rmap.
Since rmap is one of the most memory consuming stuff in KVM, ppc'd
better restrict the allocation to Book3S HV.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
- bring back critical fixes (esp. aa67f6096c)
- provide an updated base for development
* upstream: (4334 commits)
missed mnt_drop_write() in do_dentry_open()
UBIFS: nuke pdflush from comments
gfs2: nuke pdflush from comments
drbd: nuke pdflush from comments
nilfs2: nuke write_super from comments
hfs: nuke write_super from comments
vfs: nuke pdflush from comments
jbd/jbd2: nuke write_super from comments
btrfs: nuke pdflush from comments
btrfs: nuke write_super from comments
ext4: nuke pdflush from comments
ext4: nuke write_super from comments
ext3: nuke write_super from comments
Documentation: fix the VM knobs descritpion WRT pdflush
Documentation: get rid of write_super
vfs: kill write_super and sync_supers
ACPI processor: Fix tick_broadcast_mask online/offline regression
ACPI: Only count valid srat memory structures
ACPI: Untangle a return statement for better readability
Linux 3.6-rc1
...
Signed-off-by: Avi Kivity <avi@redhat.com>
Merge Andrew's first set of patches:
"Non-MM patches:
- lots of misc bits
- tree-wide have_clk() cleanups
- quite a lot of printk tweaks. I draw your attention to "printk:
convert the format for KERN_<LEVEL> to a 2 byte pattern" which
looks a bit scary. But afaict it's solid.
- backlight updates
- lib/ feature work (notably the addition and use of memweight())
- checkpatch updates
- rtc updates
- nilfs updates
- fatfs updates (partial, still waiting for acks)
- kdump, proc, fork, IPC, sysctl, taskstats, pps, etc
- new fault-injection feature work"
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
drivers/misc/lkdtm.c: fix missing allocation failure check
lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
fault-injection: add tool to run command with failslab or fail_page_alloc
fault-injection: add selftests for cpu and memory hotplug
powerpc: pSeries reconfig notifier error injection module
memory: memory notifier error injection module
PM: PM notifier error injection module
cpu: rewrite cpu-notifier-error-inject module
fault-injection: notifier error injection
c/r: fcntl: add F_GETOWNER_UIDS option
resource: make sure requested range is included in the root range
include/linux/aio.h: cpp->C conversions
fs: cachefiles: add support for large files in filesystem caching
pps: return PTR_ERR on error in device_create
taskstats: check nla_reserve() return
sysctl: suppress kmemleak messages
ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
ipc: compat: use signed size_t types for msgsnd and msgrcv
ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
ipc: add COMPAT_SHMLBA support
...
Rather than #define the options manually in the architecture code, add
Kconfig options for them and select them there instead. This also allows
us to select the compat IPC version parsing automatically for platforms
using the old compat IPC interface.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull DMA-mapping updates from Marek Szyprowski:
"Those patches are continuation of my earlier work.
They contains extensions to DMA-mapping framework to remove limitation
of the current ARM implementation (like limited total size of DMA
coherent/write combine buffers), improve performance of buffer sharing
between devices (attributes to skip cpu cache operations or creation
of additional kernel mapping for some specific use cases) as well as
some unification of the common code for dma_mmap_attrs() and
dma_mmap_coherent() functions. All extensions have been implemented
and tested for ARM architecture."
* 'for-linus-for-3.6-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for dma_get_sgtable()
common: dma-mapping: introduce dma_get_sgtable() function
ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
common: dma-mapping: add support for generic dma_mmap_* calls
ARM: dma-mapping: fix error path for memory allocation failure
ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
ARM: dma-mapping: remove custom consistent dma region
mm: vmalloc: use const void * for caller argument
scatterlist: add sg_alloc_table_from_pages function
Commit 9adc5374 ('common: dma-mapping: introduce mmap method') added a
generic method for implementing mmap user call to dma_map_ops structure.
This patch converts ARM and PowerPC architectures (the only providers of
dma_mmap_coherent/dma_mmap_writecombine calls) to use this generic
dma_map_ops based call and adds a generic cross architecture
definition for dma_mmap_attrs, dma_mmap_coherent, dma_mmap_writecombine
functions.
The generic mmap virt_to_page-based fallback implementation is provided for
architectures which don't provide their own implementation for mmap method.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Pull final kmap_atomic cleanups from Cong Wang:
"This should be the final round of cleanup, as the definitions of enum
km_type finally get removed from the whole tree. The patches have
been in linux-next for a long time."
* 'kmap_atomic' of git://github.com/congwang/linux:
pipe: remove KM_USER0 from comments
vmalloc: remove KM_USER0 from comments
feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
tile: remove km_type definitions
um: remove km_type definitions
asm-generic: remove km_type definitions
avr32: remove km_type definitions
frv: remove km_type definitions
powerpc: remove km_type definitions
arm: remove km_type definitions
highmem: remove the deprecated form of kmap_atomic
tile: remove usage of enum km_type
frv: remove the second parameter of kmap_atomic_primary()
jbd2: remove the second argument of kmap_atomic
Merge patches queued during the run-up to the merge window.
* queue: (25 commits)
KVM: Choose better candidate for directed yield
KVM: Note down when cpu relax intercepted or pause loop exited
KVM: Add config to support ple or cpu relax optimzation
KVM: switch to symbolic name for irq_states size
KVM: x86: Fix typos in pmu.c
KVM: x86: Fix typos in lapic.c
KVM: x86: Fix typos in cpuid.c
KVM: x86: Fix typos in emulate.c
KVM: x86: Fix typos in x86.c
KVM: SVM: Fix typos
KVM: VMX: Fix typos
KVM: remove the unused parameter of gfn_to_pfn_memslot
KVM: remove is_error_hpa
KVM: make bad_pfn static to kvm_main.c
KVM: using get_fault_pfn to get the fault pfn
KVM: MMU: track the refcount when unmap the page
KVM: x86: remove unnecessary mark_page_dirty
KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range()
KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp()
KVM: MMU: Add memslot parameter to hva handlers
...
Signed-off-by: Avi Kivity <avi@redhat.com>
Host bridge hotplug
- Add MMCONFIG support for hot-added host bridges (Jiang Liu)
Device hotplug
- Move fixups from __init to __devinit (Sebastian Andrzej Siewior)
- Call FINAL fixups for hot-added devices, too (Myron Stowe)
- Factor out generic code for P2P bridge hot-add (Yinghai Lu)
- Remove all functions in a slot, not just those with _EJx (Amos Kong)
Dynamic resource management
- Track bus number allocation (struct resource tree per domain) (Yinghai Lu)
- Make P2P bridge 1K I/O windows work with resource reassignment (Bjorn Helgaas, Yinghai Lu)
- Disable decoding while updating 64-bit BARs (Bjorn Helgaas)
Power management
- Add PCIe runtime D3cold support (Huang Ying)
Virtualization
- Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex Williamson)
- Add quirks for devices with broken INTx masking (Jan Kiszka)
Miscellaneous
- Fix some PCI Express capability version issues (Myron Stowe)
- Factor out some arch code with a weak, generic, pcibios_setup() (Myron Stowe)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQBy+9AAoJEPGMOI97Hn6zOpQP+wVFvA7pcteFj6HPs5nTq2Hc
55oeRqCO0wBHoFMCKB0AjeTATjqxi9OhcjaiVrZejxNyWKC9MnrXuunpQ0l/hCbR
M/TK+BCelfX2FU4eXNf+TBCCcOhOVWqQft9Gm6nYKwX8Y0msRVCceI4WwhZgSwtI
vdtmnqlwolscdnq+8ThsnvUMtwkN0gExmn2FJRl6EoEgG0DTqhMkZ83uA+NPBhvv
I+g0XbA6haaZph2nnSYR0hIW4Q7JkT/LgA6uVAQxamctwxLol7xxsjCRnfqrulkf
kaRr2fAgBXfmaOIltro4UkXrCM52ZSyggCDfExHp6mWGPKMjE5ZcyK1YbGfmmumk
DS3t1S0eBdDJXrnf9l/Yb8e95dQxRCYKelKzr1rTD9QAXsInE8rC40hvhfFaTa4s
nZYRTz0SKv6coQihqaOR7shx1DNomLFk7jndaWEElfl9/cT/nQnZ8XLfVMzkJNNB
Y4SM6zkiIaCL0aiSEE16MqVjmODYRjbURLYzQIrqr2KJQg8X6XjIRojQLjL6xEgA
22ry2ZRPhqO68g7aLqvixiSDaTp0Z0Vw+JmgjtBqvkokwZcGQtm4umkpAdOi+Es8
3bJaMY7ZUpDX53FE8iyP6AnmR/1k19rC1gNnNq/syWyjtYOYJ9i3QCTafFgvE1VC
5coQ1L5tByHvpzK5PHwf
=oo/A
-----END PGP SIGNATURE-----
Merge tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug:
- Add MMCONFIG support for hot-added host bridges (Jiang Liu)
Device hotplug:
- Move fixups from __init to __devinit (Sebastian Andrzej Siewior)
- Call FINAL fixups for hot-added devices, too (Myron Stowe)
- Factor out generic code for P2P bridge hot-add (Yinghai Lu)
- Remove all functions in a slot, not just those with _EJx (Amos
Kong)
Dynamic resource management:
- Track bus number allocation (struct resource tree per domain)
(Yinghai Lu)
- Make P2P bridge 1K I/O windows work with resource reassignment
(Bjorn Helgaas, Yinghai Lu)
- Disable decoding while updating 64-bit BARs (Bjorn Helgaas)
Power management:
- Add PCIe runtime D3cold support (Huang Ying)
Virtualization:
- Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex
Williamson)
- Add quirks for devices with broken INTx masking (Jan Kiszka)
Miscellaneous:
- Fix some PCI Express capability version issues (Myron Stowe)
- Factor out some arch code with a weak, generic, pcibios_setup()
(Myron Stowe)"
* tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (122 commits)
PCI: hotplug: ensure a consistent return value in error case
PCI: fix undefined reference to 'pci_fixup_final_inited'
PCI: build resource code for M68K architecture
PCI: pciehp: remove unused pciehp_get_max_lnk_width(), pciehp_get_cur_lnk_width()
PCI: reorder __pci_assign_resource() (no change)
PCI: fix truncation of resource size to 32 bits
PCI: acpiphp: merge acpiphp_debug and debug
PCI: acpiphp: remove unused res_lock
sparc/PCI: replace pci_cfg_fake_ranges() with pci_read_bridge_bases()
PCI: call final fixups hot-added devices
PCI: move final fixups from __init to __devinit
x86/PCI: move final fixups from __init to __devinit
MIPS/PCI: move final fixups from __init to __devinit
PCI: support sizing P2P bridge I/O windows with 1K granularity
PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2)
PCI: disable MEM decoding while updating 64-bit MEM BARs
PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too
PCI: never discard enable/suspend/resume_early/resume fixups
PCI: release temporary reference in __nv_msi_ht_cap_quirk()
PCI: restructure 'pci_do_fixups()'
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ
ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8
ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN
UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG
csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64
3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv
pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb
Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi
Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/
pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5
JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm
2AsN5bS0BWq+aqPpZHa5
=pECD
-----END PGP SIGNATURE-----
Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Avi Kivity:
"Highlights include
- full big real mode emulation on pre-Westmere Intel hosts (can be
disabled with emulate_invalid_guest_state=0)
- relatively small ppc and s390 updates
- PCID/INVPCID support in guests
- EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
interrupt intensive workloads)
- Lockless write faults during live migration
- EPT accessed/dirty bits support for new Intel processors"
Fix up conflicts in:
- Documentation/virtual/kvm/api.txt:
Stupid subchapter numbering, added next to each other.
- arch/powerpc/kvm/booke_interrupts.S:
PPC asm changes clashing with the KVM fixes
- arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:
Duplicated commits through the kvm tree and the s390 tree, with
subsequent edits in the KVM tree.
* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
KVM: fix race with level interrupts
x86, hyper: fix build with !CONFIG_KVM_GUEST
Revert "apic: fix kvm build on UP without IOAPIC"
KVM guest: switch to apic_set_eoi_write, apic_write
apic: add apic_set_eoi_write for PV use
KVM: VMX: Implement PCID/INVPCID for guests with EPT
KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
KVM: PPC: Critical interrupt emulation support
KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
KVM: PPC: bookehv64: Add support for std/ld emulation.
booke: Added crit/mc exception handler for e500v2
booke/bookehv: Add host crit-watchdog exception support
KVM: MMU: document mmu-lock and fast page fault
KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
KVM: MMU: trace fast page fault
KVM: MMU: fast path of handling guest page fault
KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
KVM: MMU: fold tlb flush judgement into mmu_spte_update
...
When we tested KVM under memory pressure, with THP enabled on the host,
we noticed that MMU notifier took a long time to invalidate huge pages.
Since the invalidation was done with mmu_lock held, it not only wasted
the CPU but also made the host harder to respond.
This patch mitigates this by using kvm_handle_hva_range().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Some power systems do not have legacy ISA devices. So, /dev/port is not
a valid interface on these systems. User level tools such as kbdrate is
trying to access the device using this interface which is causing the
system crash.
This patch will fix this issue by not creating this interface on these
powerpc systems.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add "memory" attribute in inline assembly language as a compiler
barrier to make sure 4.6.x GCC don't reorder mfmsr().
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
We have a request for a fast method of getting CPU and NUMA node IDs
from userspace. This patch implements a getcpu VDSO function,
similar to x86.
Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
modified by a KVM guest, so we save the SPRG3 value in the paca and
restore it when transitioning from the guest to the host.
I have a glibc patch that implements sched_getcpu on top of this.
Testing on a POWER7:
baseline: 538 cycles
vdso: 30 cycles
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
QE Microcode Initialization using qe_upload_microcode() does not work on
P1021 if the IRAM-Ready register is not set after the microcode upload. Add
a definition for the "I-RAM Ready" register and sets it upon microcode
upload completion.
Signed-off-by: Ioannis Kokkoris <ioannis.kokoris@siemens-enterprise.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add the ability to inject IOMMU faults. We enable this per device
via a fail_iommu sysfs property, similar to fault injection on other
subsystems.
An example:
...
0003:01:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 02)
To inject one error to this device:
echo 1 > /sys/bus/pci/devices/0003:01:00.1/fail_iommu
echo 1 > /sys/kernel/debug/fail_iommu/probability
echo 1 > /sys/kernel/debug/fail_iommu/times
As feared, the first failure injected on the be3 results in an
unrecoverable error, taking down both functions of the card
permanently:
be2net 0003:01:00.1: Unrecoverable error in the card
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DMA API debug code has hooks to verify all DMA entries have been
freed at time of hot unplug. We need to call dma_debug_add_bus for
this to work.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The logic to choose whether to use the SIAR or get the information
out of pt_regs is going to get more complicated, so do it once in
perf_read_regs.
We overload regs->result which is gross but we are already doing it
with regs->dsisr.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some macros use RA where when RA=R0 the values is 0, so make this
the enforced mnemonic in the macro.
Idea suggested by Andreas Schwab.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enforce the use of R0-R31 in macros where possible now we have all the
fixes in.
R0-R31 macros are removed here so that can't be used anymore. They
should not be defined anywhere.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now have ___PPC_RA/B/S/T we can use it in some places. These are
places where we can't use the existing defines which will soon enforce
R0-R31 usage.
The macros being changed here are being used in inline asm, which
can't convert to enforce the R0-R31 usage.
bpf_jit uses a mix of both generated and non-generated with the same
code, so just convert all these to use the ___PPC_R versions which
won't enforce R usage later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These are currently the same as __PPC_RA/B/S/T but we'll wrap them
soon.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to do this so we can enforce the name of a and b in called
macros PPC_RA/B later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
mtocrf define is just a wrapper around the real instructions so we can
just use real register names here (ie. lower case).
Also remove braces in macro so this is possible.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move this duplicated definition to ppc_asm.h and remove the
braces which prevent the use of %rN register names
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge the defines of VCPU_GPR from different places.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different
places.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>