Commit Graph

293 Commits

Author SHA1 Message Date
Anton Blanchard
3ece16632b powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp
A number of our assembly implementations of string functions do not
align their hot loops. I was going to align them manually, but I
realised that they are are almost instruction for instruction
identical to what gcc produces, with the advantage that gcc does
align them.

In light of that, let's just remove the assembly versions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14 13:58:25 +10:00
Oliver O'Halloran
6670783606 powerpc/sstep: Fix emulation fall-through
There is a switch fallthough in instr_analyze() which can cause an
invalid instruction to be emulated as a different, valid, instruction.
The rld* (opcode 30) case extracts a sub-opcode from bits 3:1 of the
instruction word. However, the only valid values of this field are 001
and 000. These cases are correctly handled, but the others are not which
causes execution to fall through into case 31.

Breaking out of the switch causes the instruction to be marked as
unknown and allows the caller to deal with the invalid instruction in a
manner consistent with other invalid instructions.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:08 +10:00
Lennart Sorensen
dd21731022 powerpc/sstep: Fix sstep.c compile on powerpcspe
Commit be96f63375 ("powerpc: Split out instruction analysis part of
emulate_step()") introduced ldarx and stdcx into the instructions in
sstep.c, which are not accepted by the assembler on powerpcspe, but does
seem to be accepted by the normal powerpc assembler even in 32 bit mode.

Wrap these two instructions in a __powerpc64__ check like it is
everywhere else in the file.

Fixes: be96f63375 ("powerpc: Split out instruction analysis part of emulate_step()")
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:07 +10:00
Daniel Axtens
8fe088850f powerpc: rework sparse for lib/xor_vmx.c
Sparse doesn't seem to be passing -maltivec around properly, leading
to lots of errors:

.../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support
arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration
arch/powerpc/lib/xor_vmx.c:27:16: error: got signed
arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression
arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement
arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in
...
arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors

Only include the altivec.h header for non-__CHECKER__ builds.
For builds with __CHECKER__, make up some stubs instead, as
suggested by Balbir. (The vector size of 16 is arbitrary.)

Suggested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27 09:33:37 +10:00
Michael Ellerman
b4c6afdc3a powerpc: Make generic_memcpy() private to copy_32.S
generic_memcpy() is only called from copy_32.S, so there's no reason for
it to be global.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11 20:30:41 +10:00
Michael Ellerman
a1b5344620 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 8xx optimizations, 32-bit checksum optimizations,
86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt
bits, and minor fixes/cleanup."
2016-03-14 20:05:14 +11:00
Christophe Leroy
7e393220b6 powerpc: optimise csum_partial() call when len is constant
csum_partial is often called for small fixed length packets
for which it is suboptimal to use the generic csum_partial()
function.

For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and
implements csum_partial() as a wrapper inline function which
* uses csum_add() for small 16bits multiple constant length
* uses ip_fast_csum() for other 32bits multiple constant
* uses __csum_partial() in all other cases

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-09 10:44:18 -06:00
Torsten Duwe
9a7841ae8d powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace
Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.

This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-07 14:53:55 +11:00
Christophe Leroy
f867d556dd powerpc32: optimise csum_partial() loop
On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 23:03:45 -06:00
Christophe Leroy
48821a34b1 powerpc32: optimise a few instructions in csum_partial()
r5 does contain the value to be updated, so lets use r5 all way long
for that. It makes the code more readable.

To avoid confusion, it is better to use adde instead of addc

The first addition is useless. Its only purpose is to clear carry.
As r4 is a signed int that is always positive, this can be done by
using srawi instead of srwi

Let's also remove the comment about bdnz having no overhead as it
is not correct on all powerpc, at least on MPC8xx

In the last part, in our situation, the remaining quantity of bytes
to be proceeded is between 0 and 3. Therefore, we can base that part
on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
then proceding on comparisons and substractions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 23:00:52 -06:00
Christophe Leroy
7aef413656 powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()
csum_partial_copy_generic() does the same as copy_tofrom_user and also
calculates the checksum during the copy. Unlike copy_tofrom_user(),
the existing version of csum_partial_copy_generic() doesn't take
benefit of the cache.

This patch is a rewrite of csum_partial_copy_generic() based on
copy_tofrom_user().
The previous version of csum_partial_copy_generic() was handling
errors. Now we have the checksum wrapper functions to handle the error
case like in powerpc64 so we can make the error case simple:
just return -EFAULT.
copy_tofrom_user() only has r12 available => we use it for the
checksum r7 and r8 which contains pointers to error feedback are used,
so we stack them.

On a TCP benchmark using socklib on the loopback interface on which
checksum offload and scatter/gather have been deactivated, we get
about 20% performance increase.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 22:53:27 -06:00
Christophe Leroy
37e08cad8f powerpc: inline ip_fast_csum()
In several architectures, ip_fast_csum() is inlined
There are functions like ip_send_check() which do nothing
much more than calling ip_fast_csum().
Inlining ip_fast_csum() allows the compiler to optimise better

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 21:49:49 -06:00
Christophe Leroy
03bc8b0fc8 powerpc32: checksum_wrappers_64 becomes checksum_wrappers
The powerpc64 checksum wrapper functions adds csum_and_copy_to_user()
which otherwise is implemented in include/net/checksum.h by using
csum_partial() then copy_to_user()

Those two wrapper fonctions are also applicable to powerpc32 as it is
based on the use of csum_partial_copy_generic() which also
exists on powerpc32

This patch renames arch/powerpc/lib/checksum_wrappers_64.c to
arch/powerpc/lib/checksum_wrappers.c and
makes it non-conditional to CONFIG_WORD_SIZE

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 21:47:47 -06:00
Christophe Leroy
e0f82bdf2d powerpc: unexport csum_tcpudp_magic
csum_tcpudp_magic is now an inline function, so there is
nothing to export

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 21:47:22 -06:00
Anton Blanchard
dc4fbba11e powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()
The enable_kernel_*() functions leave the relevant MSR bits enabled
until we exit the kernel sometime later. Create disable versions
that wrap the kernel use of FP, Altivec VSX or SPE.

While we don't want to disable it normally for performance reasons
(MSR writes are slow), it will be used for a debug boot option that
does this and catches bad uses in other areas of the kernel.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01 13:52:25 +11:00
LEROY Christophe
400c47d81c powerpc32: memset: only use dcbz once cache is enabled
memset() uses instruction dcbz to speed up clearing by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memset(). Allthough no part of the code
explicitly uses memset(), GCC may make calls to it.

This patch modifies memset() such that at startup, memset()
unconditionally skip the optimised bloc that uses dcbz instruction.

Once the initial MMU is set up, in machine_init() we patch memset()
by replacing this inconditional jump by a NOP

Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-17 10:36:53 +10:00
LEROY Christophe
1cd03890ea powerpc32: memcpy: only use dcbz once cache is enabled
memcpy() uses instruction dcbz to speed up copy by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memcpy(). Allthough no part of the code
explicitly uses memcpy(), GCC makes calls to it.

This patch modifies memcpy() such that at startup, memcpy()
unconditionally jumps to generic_memcpy() which doesn't use
the dcbz instruction.

Once the initial MMU is set up, in machine_init() we patch memcpy()
by replacing this inconditional jump by a NOP

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-17 10:36:44 +10:00
LEROY Christophe
295ffb4189 powerpc/32: Few optimisations in memcpy
This patch adds a few optimisations in memcpy functions by using
lbzu/stbu instead of lxb/stb and by re-ordering insn inside a loop
to reduce latency due to loading

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:29 -05:00
LEROY Christophe
0b05e2d671 powerpc/32: cacheable_memcpy becomes memcpy
cacheable_memcpy uses dcbz instruction and is more efficient than
memcpy when the destination is in RAM. If the destination is in an
io area, memcpy_toio() is normally used, not memcpy

This patch renames memcpy as generic_memcpy, and renames
cacheable_memcpy as memcpy

On MPC885, we get approximatly 7% increase of the transfer rate
on an FTP reception

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:27 -05:00
LEROY Christophe
c152f149ce powerpc/32: Merge the new memset() with the old one
cacheable_memzero() which has become the new memset() and the old
memset() are quite similar, so just merge them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:24 -05:00
LEROY Christophe
5b2a32e806 powerpc/32: memset(0): use cacheable_memzero
cacheable_memzero uses dcbz instruction and is more efficient than
memset(0) when the destination is in RAM

This patch renames memset as generic_memset, and defines memset
as a prolog to cacheable_memzero. This prolog checks if the byte
to set is 0. If not, it falls back to generic_memcpy()

cacheable_memzero disappears as it is not referenced anywhere anymore

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:21 -05:00
LEROY Christophe
df087e450d Partially revert "powerpc: Remove duplicate cacheable_memcpy/memzero functions"
This partially reverts
commit 'powerpc: Remove duplicate cacheable_memcpy/memzero functions
("b05ae4ee602b7dc90771408ccf0972e1b3801a35")'

Functions cacheable_memcpy/memzero are more efficient than
memcpy/memset as they use the dcbz instruction which avoids refill
of the cacheline with the data that we will overwrite.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:21 -05:00
LEROY Christophe
92c985f1d7 powerpc: put csum_tcpudp_magic inline
csum_tcpudp_magic() is only a few instructions, and does modify
really few registers. So it is not worth having it as a separate
function and suffer function branching and saving of volatile
registers.

This patch makes it inline by use of the already existing
csum_tcpudp_nofold() function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:19 -05:00
Linus Torvalds
08d183e3c1 powerpc updates for 4.2
- Disable the 32-bit vdso when building LE, so we can build with a 64-bit only
    toolchain.
  - EEH fixes from Gavin & Richard.
  - Enable the sys_kcmp syscall from Laurent.
  - Sysfs control for fastsleep workaround from Shreyas.
  - Expose OPAL events as an irq chip by Alistair.
  - MSI ops moved to pci_controller_ops by Daniel.
  - Fix for kernel to userspace backtraces for perf from Anton.
  - Merge pseries and pseries_le defconfigs from Cyril.
  - CXL in-kernel API from Mikey.
  - OPAL prd driver from Jeremy.
  - Fix for DSCR handling & tests from Anshuman.
  - Powernv flash mtd driver from Cyril.
  - Dynamic DMA Window support on powernv from Alexey.
  - LLVM clang fixes & workarounds from Anton.
  - Reworked version of the patch to abort syscalls when transactional.
  - Fix the swap encoding to support 4TB, from Aneesh.
  - Various fixes as usual.
  - Freescale updates from Scott: Highlights include more 8xx optimizations, an
    e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and
    various fixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao
 vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK
 uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73
 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5
 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx
 r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H
 lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs
 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E
 oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31
 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp
 Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK
 WID2bc5yVtIL6p6x5ICH
 =d9Wh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - disable the 32-bit vdso when building LE, so we can build with a
   64-bit only toolchain.

 - EEH fixes from Gavin & Richard.

 - enable the sys_kcmp syscall from Laurent.

 - sysfs control for fastsleep workaround from Shreyas.

 - expose OPAL events as an irq chip by Alistair.

 - MSI ops moved to pci_controller_ops by Daniel.

 - fix for kernel to userspace backtraces for perf from Anton.

 - merge pseries and pseries_le defconfigs from Cyril.

 - CXL in-kernel API from Mikey.

 - OPAL prd driver from Jeremy.

 - fix for DSCR handling & tests from Anshuman.

 - Powernv flash mtd driver from Cyril.

 - dynamic DMA Window support on powernv from Alexey.

 - LLVM clang fixes & workarounds from Anton.

 - reworked version of the patch to abort syscalls when transactional.

 - fix the swap encoding to support 4TB, from Aneesh.

 - various fixes as usual.

 - Freescale updates from Scott: Highlights include more 8xx
   optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
   t1024/t1023 support, and various fixes and cleanup.

* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
  cxl: Fix typo in debug print
  cxl: Add CXL_KERNEL_API config option
  powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
  powerpc/mm: Change the swap encoding in pte.
  powerpc/mm: PTE_RPN_MAX is not used, remove the same
  powerpc/tm: Abort syscalls in active transactions
  powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
  powerpc/include: Add opal-prd to installed uapi headers
  powerpc/powernv: fix construction of opal PRD messages
  powerpc/powernv: Increase opal-irqchip initcall priority
  powerpc: Make doorbell check preemption safe
  powerpc/powernv: pnv_init_idle_states() should only run on powernv
  macintosh/nvram: Remove as unused
  powerpc: Don't use gcc specific options on clang
  powerpc: Don't use -mno-strict-align on clang
  powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
  powerpc: Only use -mabi=altivec if toolchain supports it
  powerpc: Fix duplicate const clang warning in user access code
  vfio: powerpc/spapr: Support Dynamic DMA windows
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  ...
2015-06-24 08:46:32 -07:00
Anton Blanchard
1fb3f5a7ca powerpc: Only use -mabi=altivec if toolchain supports it
The -mabi=altivec option is not recognised on LLVM, so use call cc-option
to check for support.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 17:33:05 +10:00
David Hildenbrand
5f76eea88d sched/preempt, powerpc: Disable preemption in enable_kernel_altivec() explicitly
enable_kernel_altivec() has to be called with disabled preemption.
Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-14-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:17 +02:00
Michael Ellerman
f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Michael Ellerman
df60f57684 Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test
Merge miscellaneous bits from benh. Fix a minor conflict with
OpalMessageType changing names to opal_msg_type.
2015-03-26 20:04:28 +11:00
Geert Uytterhoeven
1f8c82ab1b cpufreq/ppc: Add missing #include <asm/smp.h>
If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing:

drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-funcuresh E. Warrier" <warrier@linux.vnet.ibm.com>
X-Patchwork-Id: 443703
Message-Id: <54EE5989.7010800@linux.vnet.ibm.com>
To: linuxppc-dev@ozlabs.org
Date: Wed, 25 Feb 2015 17:23:53 -0600

Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also is acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:28 +11:00
Kyle Moffett
b05ae4ee60 powerpc: Remove duplicate cacheable_memcpy/memzero functions
These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:25:50 +11:00
Markus Elfring
7f4eec3953 powerpc: Delete unnecessary checks before kfree()
The kfree() function tests whether its argument is NULL and then returns
immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:14 +11:00
Anton Blanchard
df99e6eb3f powerpc: Change vsrX register defines to vsX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VSX register definitions - the kernel uses vsrX whereas gcc uses
vsX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Anton Blanchard
c2ce6f9f3d powerpc: Change vrX register defines to vX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Michael Ellerman
1dcee55fea powerpc/lib: Makefile, use obj64-y to consolidate 64-bit rules
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Michael Ellerman
564ec2f2a0 powerpc/lib: Makefile, consolidate obj-y sections
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Anton Blanchard
15c2d45d17 powerpc: Add 64bit optimised memcmp
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:55 +11:00
Andreas Ruprecht
803d57de2b powerpc/lib: Do not include string.o in obj-y twice
In the Makefile, string.o (which is generated from string.S) is
included into the list of objects being built unconditionally
(obj-y) in line 12.

Additionally, if CONFIG_PPC64 is set, it is included again in
line 17.

This patch removes the latter unnecessary inclusion.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:55 +11:00
Linus Torvalds
a7cb7bb664 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree update from Jiri Kosina:
 "Usual stuff: documentation updates, printk() fixes, etc"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  intel_ips: fix a type in error message
  cpufreq: cpufreq-dt: Move newline to end of error message
  ps3rom: fix error return code
  treewide: fix typo in printk and Kconfig
  ARM: dts: bcm63138: change "interupts" to "interrupts"
  Replace mentions of "list_struct" to "list_head"
  kernel: trace: fix printk message
  scsi: mpt2sas: fix ioctl in comment
  zbud, zswap: change module author email
  clocksource: Fix 'clcoksource' typo in comment
  arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
  gpio: msm-v1: make boolean argument more obvious
  usb: Fix typo in usb-serial-simple.c
  PCI: Fix comment typo 'COMFIG_PM_OPS'
  powerpc: Fix comment typo 'CONIFG_8xx'
  powerpc: Fix comment typos 'CONFiG_ALTIVEC'
  clk: st: Spelling s/stucture/structure/
  isci: Spelling s/stucture/structure/
  usb: gadget: zero: Spelling s/infrastucture/infrastructure/
  treewide: Fix company name in module descriptions
  ...
2014-12-12 10:08:06 -08:00
Jiri Kosina
a02001086b Merge Linus' tree to be be to apply submitted patches to newer code than
current trivial.git base
2014-11-20 14:42:02 +01:00
Michael Ellerman
e39f223fc9 powerpc: Remove more traces of bootmem
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.

Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:

  p = (struct foo *)alloc_bootmem(x);

Becomes:

  p = memblock_virt_alloc(x, 0);

We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.

The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().

Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19 21:41:51 +11:00
Paul Mackerras
7048c84694 powerpc: Fix compilation of emulate_step()
Commit be96f63375 ("powerpc: Split out instruction analysis
part of emulate_step()") added some calls to do_fp_load()
and do_fp_store(), which fail to compile on configs with
CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y.  This fixes
the compile by adding #ifdef CONFIG_PPC_FPU around the code
that calls these functions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-12 15:54:29 +11:00
Kyle McMartin
dedd24a12f powerpc: Remove unused devm_ioremap_prot()
Added in 2008, but has never had any in-tree users, and no other
architectures provide it.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-10 09:59:28 +11:00
Paul Bolle
c2522dcdcb powerpc: Fix comment typos 'CONFiG_ALTIVEC'
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-10-29 14:41:49 +01:00
Paul Mackerras
c9f6f4ed95 powerpc: Implement emulation of string loads and stores
The size field of the op.type word is now the total number of bytes
to be loaded or stored.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:52 +10:00
Paul Mackerras
cf87c3f6b6 powerpc: Emulate icbi, mcrf and conditional-trap instructions
This extends the instruction emulation done by analyse_instr() and
emulate_step() to handle a few more instructions that are found in
the kernel.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:51 +10:00
Paul Mackerras
be96f63375 powerpc: Split out instruction analysis part of emulate_step()
This splits out the instruction analysis part of emulate_step() into
a separate analyse_instr() function, which decodes the instruction,
but doesn't execute any load or store instructions.  It does execute
integer instructions and branches which can be executed purely by
updating register values in the pt_regs struct.  For other instructions,
it returns the instruction type and other details in a new
instruction_op struct.  emulate_step() then uses that information
to execute loads, stores, cache operations, mfmsr, mtmsr[d], and
(on 64-bit) sc instructions.

The reason for doing this is so that the KVM code can use it instead
of having its own separate instruction emulation code.  Possibly the
alignment interrupt handler could also use this.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:51 +10:00
Anton Blanchard
e51df2c170 powerpc: Make a bunch of things static
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:41 +10:00
Anton Blanchard
7b20a955c3 powerpc: Move lib symbol exports into arch/powerpc/lib/ppc_ksyms.c
Move the lib symbol exports closer to their function definitions

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:39 +10:00
Michael Ellerman
78e05b1421 powerpc: Add smp_mb()s to arch_spin_unlock_wait()
Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().

We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.

It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:27 +10:00
Michael Ellerman
0f369103ce powerpc: Remove power3 from comments
There are still a few occurences where it remains, because it helps to
explain something that persists.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:10:26 +10:00
Li Zhong
6f5405bc2e powerpc: use _GLOBAL_TOC for memmove
memmove may be called from module code copy_pages(btrfs), and it may
call memcpy, which may call back to C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly.

This fixes following error when I tried to boot an le guest:

Vector: 300 (Data Access) at [c000000073f97210]
    pc: c000000000015004: enable_kernel_altivec+0x24/0x80
    lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
    sp: c000000073f97490
   msr: 8000000002009033
   dar: d000000001d50170
 dsisr: 40000000
  current = 0xc0000000734c0000
  paca    = 0xc00000000fff0000	 softe: 0	 irq_happened: 0x01
    pid   = 815, comm = mktemp
enter ? for help
[c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
[c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
[c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
[c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
[c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
[c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
[c000000073f97c20] c000000000271010 path_openat+0x100/0x810
[c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
[c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
[c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-22 15:56:04 +10:00
Paul Mackerras
e698b96678 powerpc: Fix bugs in emulate_step()
This fixes some bugs in emulate_step().  First, the setting of the carry
bit for the arithmetic right-shift instructions was not correct on 64-bit
machines because we were masking with a mask of type int rather than
unsigned long.  Secondly, the sld (shift left doubleword) instruction was
using the wrong instruction field for the register containing the shift
count.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-22 15:55:51 +10:00
Paul Bolle
b69a1da94f powerpc: fix typo 'CONFIG_PPC_CPU'
Commit cd64d1697c ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.

Fixes: cd64d1697c ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:25 +10:00
Anton Blanchard
2ac7b0166a powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:41 +10:00
Benjamin Herrenschmidt
f6869e7fe6 Merge remote-tracking branch 'anton/abiv2' into next
This series adds support for building the powerpc 64-bit
LE kernel using the new ABI v2. We already supported
running ABI v2 userspace programs but this adds support
for building the kernel itself using the new ABI.
2014-05-05 20:57:12 +10:00
Philippe Bergheaud
00f554fade powerpc: memcpy optimization for 64bit LE
Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exception would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.

[ I simplified the loop a bit - Anton ]

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-30 15:26:18 +10:00
Anton Blanchard
169c7cee31 powerpc: Add _GLOBAL_TOC for ABIv2 assembly functions exported to modules
If an assembly function that calls back into c code is exported to
modules, we need to ensure r2 is setup correctly. There are only
two places crazy enough to do it (two of which are my fault).

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:32 +10:00
Ulrich Weigand
752a6422fe powerpc: Fix unsafe accesses to parameter area in ELFv2
Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10.  This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.

Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).

In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:24 +10:00
Anton Blanchard
b37c10d128 powerpc: Fix ABIv2 issues with stack offsets in assembly code
Fix STK_PARAM and use it instead of hardcoding ABIv1 offsets.

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:23 +10:00
Anton Blanchard
b1576fec7f powerpc: No need to use dot symbols when branching to a function
binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.

Alan tells me that binutils has been doing this for 9 years.

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:16 +10:00
Michael Ellerman
22d651dcef selftests/powerpc: Import Anton's memcpy / copy_tofrom_user tests
Turn Anton's memcpy / copy_tofrom_user test into something that can
live in tools/testing/selftests.

It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
harmless IMHO.

We are sailing very close to the wind with the feature macros. We define
them to nothing, which currently means we get a few extra nops and
include the unaligned calls.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:12 +11:00
Andreas Schwab
8fe9c93e74 powerpc: Add vr save/restore functions
GCC 4.8 now generates out-of-line vr save/restore functions when
optimizing for size.  They are needed for the raid6 altivec support.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:43 +11:00
Benjamin Herrenschmidt
dece8ada99 Merge branch 'merge' into next
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton Little Endian fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 15:19:31 +11:00
Paul E. McKenney
20151169f1 powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations.  However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.

This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.

[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:02:30 +11:00
Kevin Hao
1e8341ae0c powerpc: Move the patch_exception to a common place
So that it can be used by other codes. No function change.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-02 14:06:54 +11:00
Anton Blanchard
ef1313deaf powerpc: Add VMX optimised xor for RAID5
Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
this is a decent win:

   32regs    : 17932.800 MB/sec
   altivec   : 19724.800 MB/sec

The bigger gain is when the same test is run in SMT4 mode, as it
would if there was a lot of work going on:

   8regs     :  8377.600 MB/sec
   altivec   : 15801.600 MB/sec

I tested this against an array created without the patch, and also
verified it worked as expected on a little endian kernel.

[ Fix !CONFIG_ALTIVEC build -- BenH ]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:02:28 +11:00
Tom Musta
dbc2fbd7c2 powerpc: Fix Unaligned LE Floating Point Loads and Stores
This patch addresses unaligned single precision floating point loads
and stores in the single-step code.  The old implementation
improperly treated an 8 byte structure as an array of two 4 byte
words, which is a classic little endian bug.

Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:01:36 +11:00
Tom Musta
6506b4718b powerpc: Fix Unaligned Loads and Stores
This patch modifies the unaligned access routines of the sstep.c
module so that it properly reverses the bytes of storage operands
in the little endian kernel kernel.   This is implemented by
breaking an unaligned little endian access into a combination of
single byte accesses plus an overal byte reversal operation.

Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:01:30 +11:00
Anton Blanchard
de577a3568 powerpc: Use generic memcpy code in little endian
We need to fix some endian issues in our memcpy code. For now
just enable the generic memcpy routine for little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:40 +11:00
Anton Blanchard
7a332b0c9a powerpc: Use generic checksum code in little endian
We need to fix some endian issues in our checksum code. For now
just enable the generic checksum routines for little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:39 +11:00
Anton Blanchard
32ee1e188e powerpc: Fix endian issues in VMX copy loops
Fix the permute loops for little endian.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:25 +11:00
Paul E. McKenney
8f21bd0090 powerpc: Restore registers on error exit from csum_partial_copy_generic()
The csum_partial_copy_generic() function saves the PowerPC non-volatile
r14, r15, and r16 registers for the main checksum-and-copy loop.
Unfortunately, it fails to restore them upon error exit from this loop,
which results in silent corruption of these registers in the presumably
rare event of an access exception within that loop.

This commit therefore restores these register on error exit from the loop.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 17:22:42 +10:00
Paul E. McKenney
d9813c3681 powerpc: Fix parameter clobber in csum_partial_copy_generic()
The csum_partial_copy_generic() uses register r7 to adjust the remaining
bytes to process.  Unfortunately, r7 also holds a parameter, namely the
address of the flag to set in case of access exceptions while reading
the source buffer.  Lacking a quantum implementation of PowerPC, this
commit instead uses register r9 to do the adjusting, leaving r7's
pointer uncorrupted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 17:22:36 +10:00
Benjamin Herrenschmidt
cbc9565ee8 powerpc: Remove ksp_limit on ppc64
We've been keeping that field in thread_struct for a while, it contains
the "limit" of the current stack pointer and is meant to be used for
detecting stack overflows.

It has a few problems however:

 - First, it was never actually *used* on 64-bit. Set and updated but
not actually exploited

 - When switching stack to/from irq and softirq stacks, it's update
is racy unless we hard disable interrupts, which is costly. This
is fine on 32-bit as we don't soft-disable there but not on 64-bit.

Thus rather than fixing 2 in order to implement 1 in some hypothetical
future, let's remove the code completely from 64-bit. In order to avoid
a clutter of ifdef's, we remove the updates from C code completely
during interrupt stack switching, and instead maintain it from the
asm helper that is used to do the stack switching in the first place.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25 14:15:51 +10:00
Tom Musta
17e8de7e18 powerpc: Unaligned stores and stmw are broken in emulation code
The stmw instruction was incorrectly decoded as an update form instruction
and thus the RA register was being clobbered.

Also, the utility routine to write memory to unaligned addresses breaks the
operation into smaller aligned accesses but was incorrectly incrementing
the address by only one; it needs to increment the address by the size of
the smaller aligned chunk.

Signed-off-by: Tom Musta <tmusta@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27 14:36:08 +10:00
Anton Blanchard
7ffcf8ec26 powerpc: Fix little endian lppaca, slb_shadow and dtl_entry
The lppaca, slb_shadow and dtl_entry hypervisor structures are
big endian, so we have to byte swap them in little endian builds.

LE KVM hosts will also need to be fixed but for now add an #error
to remind us.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 15:33:35 +10:00
Michael Neuling
70a54a4fae powerpc: Fix single step emulation of 32bit overflowed branches
Check truncate_if_32bit() on final write to nip.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:13 +10:00
Michael Neuling
280a5ba22c powerpc/pseries: Improve stream generation comments in copypage/user
No code changes, just documenting what's happening a little better.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Suzuki K. Poulose
5e249d4528 uprobes/powerpc: Add dependency on single step emulation
Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified
the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no
such luxury.

Consolidate other users that depend on sstep and create a new config option.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-29 11:35:06 +11:00
Anton Blanchard
1fbe9cf259 powerpc: Build kernel with -mcmodel=medium
Finally remove the two level TOC and build with -mcmodel=medium.

Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:

# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.

On older kernels we fall back to the two level TOC (-mminimal-toc)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:00:31 +11:00
Nishanth Aravamudan
c8adfeccee powerpc: Fix VMX fix for memcpy case
In 2fae7cdb60 ("powerpc: Fix VMX in
interrupt check in POWER7 copy loops"), Anton inadvertently
introduced a regression for memcpy on POWER7 machines. copyuser and
memcpy diverge slightly in their use of cr1 (copyuser doesn't use it,
but memcpy does) and you end up clobbering that register with your fix.
That results in (taken from an FC18 kernel):

[   18.824604] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052f40
[   18.824618] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[   18.824623] SMP NR_CPUS=1024 NUMA pSeries
[   18.824633] Modules linked in: tg3(+) be2net(+) cxgb4(+) ipr(+) sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua squashfs cramfs
[   18.824705] NIP: c000000000052f40 LR: c00000000020b874 CTR: 0000000000000512
[   18.824709] REGS: c000001f1fef7790 TRAP: 0f20   Not tainted  (3.6.0-0.rc6.git0.2.fc18.ppc64)
[   18.824713] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 4802802e  XER: 20000010
[   18.824726] SOFTE: 0
[   18.824728] CFAR: 0000000000000f20
[   18.824731] TASK = c000000fa7128400[0] 'swapper/24' THREAD: c000000fa7480000 CPU: 24
GPR00: 00000000ffffffc0 c000001f1fef7a10 c00000000164edc0 c000000f9b9a8120
GPR04: c000000f9b9a8124 0000000000001438 0000000000000060 03ffffff064657ee
GPR08: 0000000080000000 0000000000000010 0000000000000020 0000000000000030
GPR12: 0000000028028022 c00000000ff25400 0000000000000001 0000000000000000
GPR16: 0000000000000000 7fffffffffffffff c0000000016b2180 c00000000156a500
GPR20: c000000f968c7a90 c0000000131c31d8 c000001f1fef4000 c000000001561d00
GPR24: 000000000000000a 0000000000000000 0000000000000001 0000000000000012
GPR28: c000000fa5c04f80 00000000000008bc c0000000015c0a28 000000000000022e
[   18.824792] NIP [c000000000052f40] .memcpy_power7+0x5a0/0x7c4
[   18.824797] LR [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[   18.824800] Call Trace:
[   18.824803] [c000001f1fef7a10] [c000000000052c14] .memcpy_power7+0x274/0x7c4 (unreliable)
[   18.824809] [c000001f1fef7b10] [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[   18.824813] [c000001f1fef7bb0] [c00000000020ba88] .free_percpu+0xb8/0x1b0
[   18.824819] [c000001f1fef7c50] [c00000000043d144] .throtl_pd_exit+0x94/0xd0
[   18.824824] [c000001f1fef7cf0] [c00000000043acf8] .blkg_free+0x88/0xe0
[   18.824829] [c000001f1fef7d90] [c00000000018c048] .rcu_process_callbacks+0x2e8/0x8a0
[   18.824835] [c000001f1fef7e90] [c0000000000a8ce8] .__do_softirq+0x158/0x4d0
[   18.824840] [c000001f1fef7f90] [c000000000025ecc] .call_do_softirq+0x14/0x24
[   18.824845] [c000000fa7483650] [c000000000010e80] .do_softirq+0x160/0x1a0
[   18.824850] [c000000fa74836f0] [c0000000000a94a4] .irq_exit+0xf4/0x120
[   18.824854] [c000000fa7483780] [c000000000020c44] .timer_interrupt+0x154/0x4d0
[   18.824859] [c000000fa7483830] [c000000000003be0] decrementer_common+0x160/0x180
[   18.824866] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4
[   18.824866]     LR = .check_and_cede_processor+0x48/0x80
[   18.824871] [c000000fa7483b20] [c00000000007f018] .check_and_cede_processor+0x18/0x80 (unreliable)
[   18.824877] [c000000fa7483b90] [c00000000007f104] .dedicated_cede_loop+0x84/0x150
[   18.824883] [c000000fa7483c50] [c0000000006bc030] .cpuidle_enter+0x30/0x50
[   18.824887] [c000000fa7483cc0] [c0000000006bc9f4] .cpuidle_idle_call+0x104/0x720
[   18.824892] [c000000fa7483d80] [c000000000070af8] .pSeries_idle+0x18/0x40
[   18.824897] [c000000fa7483df0] [c000000000019084] .cpu_idle+0x1a4/0x380
[   18.824902] [c000000fa7483ec0] [c0000000008a4c18] .start_secondary+0x520/0x528
[   18.824907] [c000000fa7483f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14
[   18.824911] Instruction dump:
[   18.824914] 38840008 90030000 90e30004 38630008 7ca62850 7cc300d0 78c7e102 7cf01120
[   18.824923] 78c60660 39200010 39400020 39600030 <7e00200c> 7c0020ce 38840010 409f001c
[   18.824935] ---[ end trace 0bb95124affaaa45 ]---
[   18.825046] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052d08

I believe the right fix is to make memcpy match usercopy and not use
cr1.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@kernel.org> [v3.6]
2012-10-04 18:02:43 +10:00
Tiejun Chen
8e9f693715 powerpc/kprobe: Don't emulate store when kprobe stwu r1
We don't do the real store operation for kprobing 'stwu Rx,(y)R1'
since this may corrupt the exception frame, now we will do this
operation safely in exception return code after migrate current
exception frame below the kprobed function stack.

So we only update gpr[1] here and trigger a thread flag to mask
this.

Note we should make sure if we trigger kernel stack over flow.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-18 15:32:45 +10:00
Benjamin Herrenschmidt
636802ef96 powerpc: Don't use __put_user() in patch_instruction
patch_instruction() can be called very early on ppc32, when the kernel
isn't yet running at it's linked address. That can cause the !
is_kernel_addr() test in __put_user() to trip and call might_sleep()
which is very bad at that point during boot.

Use a lower level function instead for now, at least until we get to
rework ppc32 boot process to do the code patching later, like ppc64
does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 16:05:23 +10:00
Anton Blanchard
2fae7cdb60 powerpc: Fix VMX in interrupt check in POWER7 copy loops
The enhanced prefetch hint patches corrupt the condition register
that was used to check if we are in interrupt. Fix this by using cr1.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-08-24 20:26:09 +10:00
Anton Blanchard
dad477ccd6 powerpc: POWER7 copy_to_user/copy_from_user patch applied twice
"powerpc: Use enhanced touch instructions in POWER7
copy_to_user/copy_from_user" was applied twice. Remove one.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-08-24 20:26:09 +10:00
Stephen Rothwell
1d5a436d2c powerpc: Put the gpr save/restore functions in their own section
This allows the linker to know that calls to them do not need to switch
TOC and stop errors like the following when linking large configurations:

powerpc64-linux-ld: drivers/built-in.o: In function `.gpiochip_is_requested':
(.text+0x4): sibling call optimization to `_savegpr0_29' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `_savegpr0_29' extern

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 14:19:59 +10:00
Michael Neuling
e55174e911 powerpc: Fixes for instructions not using correct register naming
These macros are using integers where they could be using logical
names since they take registers.

We are going to enforce this soon, so fix these up now.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:16 +10:00
Michael Neuling
86e32fdce7 powerpc: Change mtcrf to use real register names
mtocrf define is just a wrapper around the real instructions so we can
just use real register names here (ie. lower case).

Also remove braces in macro so this is possible.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:11 +10:00
Michael Neuling
44ce6a5ee7 powerpc: Merge STK_REG/PARAM/FRAMESIZE
Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different
places.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:03 +10:00
Michael Neuling
c75df6f96c powerpc: Fix usage of register macros getting ready for %r0 change
Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
	std	r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:17:55 +10:00
Anton Blanchard
cf8fb5533f powerpc: Optimise the 64bit optimised __clear_user
I blame Mikey for this. He elevated my slightly dubious testcase:

to benchmark status. And naturally we need to be number 1 at creating
zeros. So lets improve __clear_user some more.

As Paul suggests we can use dcbz for large lengths. This patch gets
the destination cacheline aligned then uses dcbz on whole cachelines.

Before:
10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s

After:
10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s

39 GB/s, a new record.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Olof Johansson <olof@lixom.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:48 +10:00
Anton Blanchard
b3f271e86e powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user
loop. Detailed implementation and performance details can be found in
commit a66086b819 (powerpc: POWER7 optimised
copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

	.memcpy
	.async_memcpy
	.async_copy_data
	.__raid_run_ops
	.handle_stripe
	.raid5d
	.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After:  999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:46 +10:00
Anton Blanchard
bce4b4bd91 powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user
Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes including the length of
a stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated
and we have to use an eieio to ensure correct ordering of the "GO"
command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set
bigger than the caches so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user
testcase we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 10000000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:45 +10:00
Anton Blanchard
fde69282b7 powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch
Implement a POWER7 optimised copy_page using VMX and enhanced
prefetch instructions. We use enhanced prefetch hints to prefetch
both the load and store side. We copy a cacheline at a time and
fall back to regular loads and stores if we are unable to use VMX
(eg we are in an interrupt).

The following microbenchmark was used to assess the impact of
the patch:

http://ozlabs.org/~anton/junkcode/page_fault_file.c

We test MAP_PRIVATE page faults across a 1GB file, 100 times:

# time ./page_fault_file -p -l 1G -i 100

Before: 22.25s
After:  18.89s

17% faster

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:44 +10:00
Anton Blanchard
6f7839e542 powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c
Subsequent patches will add more VMX library functions and it makes
sense to keep all the c-code helper functions in the one file.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:43 +10:00
Anton Blanchard
a9514dc69d powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user
Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes including the length of
a stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated
and we have to use an eieio to ensure correct ordering of the "GO"
command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set
bigger than the caches so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user
testcase we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 10000000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:42 +10:00
Anton Blanchard
17968fbbd1 powerpc: 64bit optimised __clear_user
I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the desination
and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:41 +10:00
Steven Rostedt
b6e3796834 powerpc: Have patch_instruction detect faults
For ftrace to use the patch_instruction code, it needs to check for
faults on write. Ftrace updates code all over the kernel, and we need to
know if code is updated or not due to protections that are placed on
some portions of the kernel. If ftrace does not detect a fault, it will
error later on, and it will be much more difficult to find the problem.

By changing patch_instruction() to detect faults, then ftrace will be
able to make use of it too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:38 +10:00
Paul Mackerras
1629372caa powerpc: Use the new generic strncpy_from_user() and strnlen_user()
This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-27 21:00:07 -07:00
Anton Blanchard
694caf0255 powerpc: Remove CONFIG_POWER4_ONLY
Remove CONFIG_POWER4_ONLY, the option is badly named and only does two
things:

- It wraps the MMU segment table code. With feature fixups there is
  little downside to compiling this in.

- It uses the newer mtocrf instruction in various assembly functions.
  Instead of making this a compile option just do it at runtime via
  a feature fixup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-30 15:37:26 +10:00