Commit Graph

51 Commits

Author SHA1 Message Date
Ingo Molnar
f541ae326f Merge branch 'linus' into perfcounters/core-v2
Merge reason: we have gathered quite a few conflicts, need to merge upstream

Conflicts:
	arch/powerpc/kernel/Makefile
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/hardirq.h
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/irq.c
	arch/x86/kernel/syscall_table_32.S
	arch/x86/mm/iomap_32.c
	include/linux/sched.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:02:57 +02:00
Steven Rostedt
bb7253403f powerpc64, ftrace: save toc only on modules for function graph
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.

Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.

This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt
4654288847 powerpc64, tracing: add function graph tracer with dynamic tracing
This is the port of the function graph tracer to PowerPC with
dynamic tracing.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt
6794c78243 powerpc64: port of the function graph tracer
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.

This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.

The trace produces a visual calling of functions:

 # tracer: function_graph
 #
 # CPU  DURATION                  FUNCTION CALLS
 # |     |   |                     |   |   |   |
  0)   2.224 us    |                        }
  0) ! 271.024 us  |                      }
  0) ! 320.080 us  |                    }
  0) ! 324.656 us  |                  }
  0) ! 329.136 us  |                }
  0)               |                .put_prev_task_fair() {
  0)               |                  .update_curr() {
  0)   2.240 us    |                    .update_min_vruntime();
  0)   6.512 us    |                  }
  0)   2.528 us    |                  .__enqueue_entity();
  0) + 15.536 us   |                }
  0)               |                .pick_next_task_fair() {
  0)   2.032 us    |                  .__pick_next_entity();
  0)   2.064 us    |                  .__clear_buddies();
  0)               |                  .set_next_entity() {
  0)   2.672 us    |                    .__dequeue_entity();
  0)   6.864 us    |                  }

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:53 +11:00
Paul Mackerras
93a6d3ce69 powerpc: Provide a way to defer perf counter work until interrupts are enabled
Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
possible for a performance monitor exception to come in when the
kernel thinks interrupts are disabled (i.e. when they are
soft-disabled but hard-enabled).  In such a situation the performance
monitor exception handler might have some processing to do (such as
process wakeups) which can't be done in what is effectively an NMI
handler.

This provides a way to defer that work until interrupts get enabled,
either in raw_local_irq_restore() or by returning from an interrupt
handler to code that had interrupts enabled.  We have a per-processor
flag that indicates that there is work pending to do when interrupts
subsequently get re-enabled.  This flag is checked in the interrupt
return path and in raw_local_irq_restore(), and if it is set,
perf_counter_do_pending() is called to do the pending work.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-09 19:48:17 +11:00
Ingo Molnar
b8307db247 Merge commit 'v2.6.28-rc7' into tracing/core 2008-12-04 09:07:19 +01:00
Paul Mackerras
ab598b6680 powerpc: Fix system calls on Cell entered with XER.SO=1
It turns out that on Cell, on a kernel with CONFIG_VIRT_CPU_ACCOUNTING
= y, if a program sets the SO (summary overflow) bit in the XER and
then does a system call, the SO bit in CR0 will be set on return
regardless of whether the system call detected an error.  Since CR0.SO
is used as the error indication from the system call, this means that
all system calls appear to fail.

The reason is that the workaround for the timebase bug on Cell uses a
compare instruction.  With CONFIG_VIRT_CPU_ACCOUNTING = y, the
ACCOUNT_CPU_USER_ENTRY macro reads the timebase, so we end up doing a
compare instruction, which copies XER.SO to CR0.SO.  Since we were
doing this in the system call entry patch after clearing CR0.SO but
before saving the CR, this meant that the saved CR image had CR0.SO
set if XER.SO was set on entry.

This fixes it by moving the clearing of CR0.SO to after the
ACCOUNT_CPU_USER_ENTRY call in the system call entry path.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-12-01 09:40:19 +11:00
Steven Rostedt
c7b0d17366 powerpc: ftrace, do nothing in mcount call for dyn ftrace
Impact: quicken mcount calls that are not replaced by dyn ftrace

Dynamic ftrace no longer does on the fly recording of mcount locations.
The mcount locations are now found at compile time. The mcount
function no longer needs to store registers and call a stub function.
It can now just simply return.

Since there are some functions that do not get converted to a nop
(.init sections and other code that may disappear), this patch should
help speed up that code.

Also, the stub for mcount on PowerPC 32 can not be a simple branch
link register like it is on PowerPC 64. According to the ABI specification:

"The _mcount routine is required to restore the link register from
 the stack so that the profiling code can be inserted transparently,
 whether or not the profiled function saves the link register itself."

This means that we must restore the link register that was used
to make the call to mcount.  The minimal mcount function for PPC32
ends up being:

 mcount:
        mflr    r0
        mtctr   r0
        lwz     r0, 4(r1)
        mtlr    r0
        bctr

Where we move the link register used to call mcount into the
ctr register, and then restore the link register from the stack.
Then we use the ctr register to jump back to the mcount caller.
The r0 register is free for us to use.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 14:07:45 +01:00
Steven Rostedt
606576ce81 ftrace: rename FTRACE to FUNCTION_TRACER
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER.  The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-20 18:27:03 +02:00
Paul Mackerras
e31aa453bb powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit
Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
generates 5 instructions where LOAD_REG_ADDR can do it in one,
and will generate R_PPC64_ADDR16_* relocations in the output when
we get to making the kernel as a position-independent executable,
which we'd rather not have to handle.  This changes various bits
of assembly code to use LOAD_REG_ADDR when we need to get the
address of a symbol, or to use suitable position-independent code
for cases where we can't access the TOC for various reasons, or
if we're not running at the address we were linked at.

It also cleans up a few minor things; there's no reason to save and
restore SRR0/1 around RTAS calls, __mmu_off can get the return
address from LR more conveniently than the caller can supply it in
R4 (and we already assume elsewhere that EA == RA if the MMU is on
in early boot), and enable_64b_mode was using 5 instructions where
2 would do.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-09-15 11:08:35 -07:00
Michael Ellerman
01f3880dd8 powerpc: Streamline ret_from_except_lite for non-iSeries platforms
There is a small passage of code in ret_from_except_lite which is
only required on iSeries.  For a multi-platform kernel on non-iSeries
machines this means we end up executing ~15 nops in ret_from_except_lite.

It would be nicer if non-iSeries could skip the code entirely, and on
iSeries we can jump out of line to execute the code.

I have no performance numbers to justify this, other than the assertion
that executing 15 nops takes longer than executing 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-20 16:34:57 +10:00
Roland McGrath
7d6d637dac powerpc: Add TIF_NOTIFY_RESUME support for tracehook
This adds TIF_NOTIFY_RESUME support for powerpc.  When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28 16:30:50 +10:00
Roland McGrath
4f72c4279e powerpc: Make syscall tracing use tracehook.h helpers
This changes powerpc syscall tracing to use the new tracehook.h entry
points.  There is no change, only cleanup.

In addition, the assembly changes allow do_syscall_trace_enter() to
abort the syscall without losing the information about the original
r0 value.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28 16:30:49 +10:00
Benjamin Herrenschmidt
43d2548bb2 Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build
Manual fixup of:

	arch/powerpc/Kconfig
2008-07-15 15:44:51 +10:00
Michael Neuling
ce48b21007 powerpc: Add VSX context save/restore, ptrace and signal support
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128bits rather than just 64bits.

Mixing FP, VMX and VSX code will get constant architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 11:28:50 +10:00
Michael Ellerman
c230328def powerpc: Use an alternative feature section in entry_64.S
Use an alternative feature section in _switch. There are three cases
handled here, either we don't have an SLB, in which case we jump over the
entire code section, or if we do we either do or don't have 1TB segments.

Boot tested on Power3, Power5 and Power5+.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01 11:28:31 +10:00
Abhishek Sagar
395a59d0f8 ftrace: store mcount address in rec->ip
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:10:56 +02:00
Steven Rostedt
4e491d14f2 ftrace: support for PowerPC
This patch adds full support for ftrace for PowerPC (both 64 and 32 bit).
This includes dynamic tracing and function filtering.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 22:43:11 +02:00
Benjamin Herrenschmidt
945feb174b [POWERPC] irqtrace support for 64-bit powerpc
This adds the low level irq tracing hooks to the powerpc architecture
needed to enable full lockdep functionality.

This is partly based on Johannes Berg's initial version.  I removed
the asm trampoline that isn't needed (thus improving performance) and
modified all sorts of bits and pieces, reworking most of the assembly,
etc...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:38:47 +10:00
Benjamin Herrenschmidt
ec2b36b9f2 [POWERPC] Move stackframe definitions to common header
This moves various definitions used all over the place to parse stack
frames to ptrace.h so only one definition is needed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:37:18 +10:00
Olof Johansson
f66bce5e6a [POWERPC] Add 1TB workaround for PA6T
PA6T has a bug where the slbie instruction does not honor the large
segment bit.  As a result, we have to always use slbia when switching
context.

We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID.  I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry.  So as long as we clear it out on context
switch we should be fine.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17 22:30:09 +10:00
Paul Mackerras
1189be6508 [POWERPC] Use 1TB segments
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).

We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.

We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments.  That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.

Parts of this patch were originally written by Ben Herrenschmidt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-12 14:05:17 +10:00
Michael Neuling
00efee7d5d [POWERPC] Remove barriers from the SLB shadow buffer update
After talking to an IBM POWER hypervisor (PHYP) design and development
guy, there seems to be no need for memory barriers when updating the SLB
shadow buffer provided we only update it from the current CPU, which we
do.

Also, these guys see no need in the future for these barriers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-19 14:40:54 +10:00
Michael Neuling
67439b76f2 [POWERPC] Fixes for the SLB shadow buffer code
On a machine with hardware 64kB pages and a kernel configured for a
64kB base page size, we need to change the vmalloc segment from 64kB
pages to 4kB pages if some driver creates a non-cacheable mapping in
the vmalloc area.  However, we never updated with SLB shadow buffer.
This fixes it.  Thanks to paulus for finding this.

Also added some write barriers to ensure the shadow buffer contents
are always consistent.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-03 19:36:01 +10:00
Stephane Eranian
a583f1b542 remove unused TIF_NOTIFY_RESUME flag
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures.  The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Paul Mackerras
e56a6e20f3 [POWERPC] Clear RI bit in MSR before restoring r13 when returning to userspace
Some instruction tracing tools use the RI (recoverable interrupt) bit
in the MSR to indicate when it's safe to single-step.  Currently we
clear RI after restoring r13 when returning to userspace.  However,
if we single-step past the point where r13 is restored, we'll corrupt
r13 in the exception entry code and not restore it.  This moves the
clearing of RI to just before r13 is restored so this doesn't happen.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07 14:03:23 +11:00
David Woodhouse
007d88d042 [POWERPC] Fix manual assembly WARN_ON() in enter_rtas().
When we switched over to the generic BUG mechanism we forgot to change
the assembly code which open-codes a WARN_ON() in enter_rtas(), so the
bug table got corrupted.

This patch provides an EMIT_BUG_ENTRY macro for use in assembly code,
and uses it in entry_64.S. Tested with CONFIG_DEBUG_BUGVERBOSE on ppc64
but not without -- I tried to turn it off but it wouldn't go away; I
suspect Aunt Tillie probably needed it.

This version gets __FILE__ and __LINE__ right in the assembly version --
rather than saying include/asm-powerpc/bug.h line 21 every time which is
a little suboptimal.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-09 17:03:02 +11:00
Stephen Rothwell
c705677e1c [POWERPC] iSeries: Eliminate "exceeds stub group size" warnings
Commit 3ccfc65c50 missed the same fixes for
legacy iSeries specific code, so make some more symbols no longer global.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04 20:41:31 +11:00
s.hauer@pengutronix.de
a7a1ed3050 [PATCH] Remove occurences of PPC_MULTIPLATFORM in head_64.S
Since iSeries is merged to MULTIPLATFORM, there is no way to build a 64bit
kernel without MULTIPLATFORM, so PPC_MULTIPLATFORM can be removed in
64bit-only files.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-11-13 14:44:58 +11:00
Paul Mackerras
b0a779debd [POWERPC] Make sure interrupt enable gets restored properly
The lazy IRQ disable patch missed a couple of places where the
interrupt enable flags need to be restored correctly.  First, we
weren't restoring the paca->hard_enabled flag on interrupt exit.
Instead of saving it on entry, we compute it from the MSR_EE bit
in the MSR we are restoring at exit.  Secondly, the MMU hash miss
code was clearing both paca->soft_enabled and paca->hard_enabled
but not restoring them in the case where hash_page was able to
resolve the miss from the Linux page tables.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-10-18 10:12:53 +10:00
Paul Mackerras
d04c56f73c [POWERPC] Lazy interrupt disabling for 64-bit machines
This implements a lazy strategy for disabling interrupts.  This means
that local_irq_disable() et al. just clear the 'interrupts are
enabled' flag in the paca.  If an interrupt comes along, the interrupt
entry code notices that interrupts are supposed to be disabled, and
clears the EE bit in SRR1, clears the 'interrupts are hard-enabled'
flag in the paca, and returns.  This means that interrupts only
actually get disabled in the processor when an interrupt comes along.

When interrupts are enabled by local_irq_enable() et al., the code
sets the interrupts-enabled flag in the paca, and then checks whether
interrupts got hard-disabled.  If so, it also sets the EE bit in the
MSR to hard-enable the interrupts.

This has the potential to improve performance, and also makes it
easier to make a kernel that can boot on iSeries and on other 64-bit
machines, since this lazy-disable strategy is very similar to the
soft-disable strategy that iSeries already uses.

This version renames paca->proc_enabled to paca->soft_enabled, and
changes a couple of soft-disables in the kexec code to hard-disables,
which should fix the crash that Michael Ellerman saw.  This doesn't
yet use a reserved CR field for the soft_enabled and hard_enabled
flags.  This applies on top of Stephen Rothwell's patches to make it
possible to build a combined iSeries/other kernel.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-10-16 16:31:36 +10:00
Stephen Rothwell
3f639ee8c5 [POWERPC] implement BEGIN/END_FW_FTR_SECTION
and use it an all the obvious places in assembler code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2006-10-03 16:50:21 +10:00
Michael Neuling
11a27ad782 [POWERPC] SLB shadow buffer cleanup
Cleanup some of the #define magic as suggested by Milton.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-25 13:17:08 +10:00
Michael Neuling
2f6093c847 [POWERPC] Implement SLB shadow buffer
This adds a shadow buffer for the SLBs and regsiters it with PHYP.
Only the bolted SLB entries (top 3) are shadowed.

The SLB shadow buffer tells the hypervisor what the kernel needs to
have in the SLB for the kernel to be able to function.  The hypervisor
can use this information to speed up partition context switches.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-08 17:08:56 +10:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Anton Blanchard
bd19c8994a [POWERPC] system call micro optimisation
In the syscall path we currently have:

       crclr   so
       mfcr    r9

If we shift the crclr up we can avoid a stall on some CPUs.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-15 19:31:26 +10:00
Mike Kravetz
9fe901d124 [PATCH] powerpc: Workaround for pSeries RTAS bug
A bug in the RTAS services incorrectly interprets some bits in the CR
when called from the OS.  Specifically, bits in CR4.  The result could
be a firmware crash that also takes down the partition.  A firmware
fix is in the works.  We have seen this situation when performing DLPAR
operations.  As a temporary workaround, clear the CR in enter_rtas().
Note that enter_rtas() will not set any bits in CR4 before calling RTAS.

Also note that the 32 bit version of enter_rtas() should have the same
work around even though the chances of hitting the bug are much smaller
due to the lack of DLPAR on 32 bit kernels.  However, my assembly skills
are a bit rusty and the 32 bit code doesn't seem to follow the conventions
for where things should be saved.  In addition, I don't have a system
to test 32 bit kernels.  Help creating and at least touch testing the
same workaround for 32 bit would be appreciated.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:33 +11:00
Paul Mackerras
5164501794 Merge ../linux-2.6 2006-03-09 14:32:05 +11:00
Paul Mackerras
1bd79336a4 powerpc: Fix various syscall/signal/swapcontext bugs
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:

* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
  path, so it was only doing anything with it once it saw some other
  bit being set.  In other words, the noerror behaviour would apply to
  the next system call where we had to reschedule or deliver a signal,
  which is not necessarily the current system call.

* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
  path when the _TIF_SINGLESTEP bit was set.

* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
  _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
  by system calls.  I took it out of _TIF_USER_WORK_MASK.

* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
  to be restored (unless perhaps a signal was delivered or the syscall
  was traced or single-stepped).  Thus the non-volatile registers
  weren't restored on exit from a signal handler.  We probably got
  away with it mostly because signal handlers written in C wouldn't
  alter the non-volatile registers.

* On 32-bit I simplified the code and made it more like 64-bit by
  making the syscall exit path jump to ret_from_except to handle
  preemption and signal delivery.

* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
  set - but I think because of that 32-bit was actually restoring the
  non-volatile registers on exit from a signal handler.

* I changed the order of enabling interrupts and saving the
  non-volatile registers before calling do_syscall_trace_leave; now we
  enable interrupts first.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-08 13:24:22 +11:00
Paul Mackerras
c6622f63db powerpc: Implement accurate task and CPU time accounting
This implements accurate task and cpu time accounting for 64-bit
powerpc kernels.  Instead of accounting a whole jiffy of time to a
task on a timer interrupt because that task happened to be running at
the time, we now account time in units of timebase ticks according to
the actual time spent by the task in user mode and kernel mode.  We
also count the time spent processing hardware and software interrupts
accurately.  This is conditional on CONFIG_VIRT_CPU_ACCOUNTING.  If
that is not set, we do tick-based approximate accounting as before.

To get this accurate information, we read either the PURR (processor
utilization of resources register) on POWER5 machines, or the timebase
on other machines on

* each entry to the kernel from usermode
* each exit to usermode
* transitions between process context, hard irq context and soft irq
  context in kernel mode
* context switches.

On POWER5 systems with shared-processor logical partitioning we also
read both the PURR and the timebase at each timer interrupt and
context switch in order to determine how much time has been taken by
the hypervisor to run other partitions ("steal" time).  Unfortunately,
since we need values of the PURR on both threads at the same time to
accurately calculate the steal time, and since we can only calculate
steal time on a per-core basis, the apportioning of the steal time
between idle time (time which we ceded to the hypervisor in the idle
loop) and actual stolen time is somewhat approximate at the moment.

This is all based quite heavily on what s390 does, and it uses the
generic interfaces that were added by the s390 developers,
i.e. account_system_time(), account_user_time(), etc.

This patch doesn't add any new interfaces between the kernel and
userspace, and doesn't change the units in which time is reported to
userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
times(), etc.  Internally the various task and cpu times are stored in
timebase units, but they are converted to USER_HZ units (1/100th of a
second) when reported to userspace.  Some precision is therefore lost
but there should not be any accumulating error, since the internal
accumulation is at full precision.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-24 14:05:56 +11:00
Jon Mason
2ef9481e66 [PATCH] powerpc: trivial: modify comments to refer to new location of files
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree.  I think this accomplises
everything wanted, though there might be a few references I missed.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:53:51 +11:00
David Woodhouse
f27201da5c [PATCH] TIF_RESTORE_SIGMASK support for arch/powerpc
Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for
both 32-bit and 64-bit system call paths.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:30 -08:00
David Gibson
3356bb9f7b [PATCH] powerpc: Remove lppaca structure from the PACA
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure.  This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.

This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need aligning gap between the Linux and
hypervisor portions of every PACA.  On the other hand it means an
extra level of dereference in many accesses to the lppaca.

The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-13 21:17:39 +11:00
David Gibson
e58c3495e6 [PATCH] powerpc: Cleanup LOADADDR etc. asm macros
This patch consolidates the variety of macros used for loading 32 or
64-bit constants in assembler (LOADADDR, LOADBASE, SET_REG_TO_*).  The
idea is to make the set of macros consistent across 32 and 64 bit and
to make it more obvious which is the appropriate one to use in a given
situation.  The new macros and their semantics are described in the
comments in ppc_asm.h.

In the process, we change several places that were unnecessarily using
immediate loads on ppc64 to use the GOT/TOC.  Likewise we cleanup a
couple of places where we were clumsily subtracting PAGE_OFFSET with
asm instructions to use assemble-time arithmetic or the toreal() macro
instead.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-13 21:16:23 +11:00
Michael Ellerman
b5666f7039 [PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET
This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't
looked at any of the PPC32 code, if we ever want to support Kdump on
PPC we'll have to do another audit, ditto for iSeries.

This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1
gazillion for 64-bit.

To get a physical address from a virtual one you subtract PAGE_OFFSET,
_not_ KERNELBASE.

KERNELBASE is the virtual address of the start of the kernel, it's
often the same as PAGE_OFFSET, but _might not be_.

If you want to know something's offset from the start of the kernel
you should subtract KERNELBASE.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:54 +11:00
David Woodhouse
bcb05504ed [PATCH] ppc64 syscall_exit_work: call the save_nvgprs function, not its descriptor.
On Tue, 2005-11-15 at 18:52 +0000, David Woodhouse wrote:
> This cleanup patch speeds up the null syscall path on ppc64 by about 3%,
> and brings the ppc32 and ppc64 code slightly closer together.

Needs this unless your binutils, like mine, are clever enough to notice
my stupidity and fix it up automatically...

Spotted by Paul.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:49:42 +11:00
David Woodhouse
401d1f029b [PATCH] syscall entry/exit revamp
This cleanup patch speeds up the null syscall path on ppc64 by about 3%,
and brings the ppc32 and ppc64 code slightly closer together.

The ppc64 code was checking current_thread_info()->flags twice in the
syscall exit path; once for TIF_SYSCALL_T_OR_A before disabling
interrupts, and then again for TIF_SIGPENDING|TIF_NEED_RESCHED etc after
disabling interrupts. Now we do the same as ppc32 -- check the flags
only once in the fast path, and re-enable interrupts if necessary in the
ptrace case.

The patch abolishes the 'syscall_noerror' member of struct thread_info
and replaces it with a TIF_NOERROR bit in the flags, which is handled in
the slow path. This shortens the syscall entry code, which no longer
needs to clear syscall_noerror.

The patch adds a TIF_SAVE_NVGPRS flag which causes the syscall exit slow
path to save the non-volatile GPRs into a signal frame. This removes the
need for the assembly wrappers around sys_sigsuspend(),
sys_rt_sigsuspend(), et al which existed solely to save those registers
in advance. It also means I don't have to add new wrappers for ppoll()
and pselect(), which is what I was supposed to be doing when I got
distracted into this...

Finally, it unifies the ppc64 and ppc32 methods of handling syscall exit
directly into a signal handler (as required by sigsuspend et al) by
introducing a TIF_RESTOREALL flag which causes _all_ the registers to be
reloaded from the pt_regs by taking the ret_from_exception path, instead
of the normal syscall exit path which stomps on the callee-saved GPRs.

It appears to pass an LTP test run on ppc64, and passes basic testing on
ppc32 too. Brief tests of ptrace functionality with strace and gdb also
appear OK. I wouldn't send it to Linus for 2.6.15 just yet though :)

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:49:01 +11:00
Paul Mackerras
3eb6f26bcd powerpc: correct register usage in 64-bit syscall exit path
Since we don't restore the volatile registers in the syscall exit
path, we need to make sure we don't leak any potentially interesting
values from the kernel to userspace.  This was already the case for
all except r11.  This makes it use r11 for an MSR value, so r11 will
have an (uninteresting) MSR value in it on return to userspace.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-12-20 15:38:47 +11:00
Horms
ebd50e5001 [PATCH] audit_sysctl_exit can only be used with CONF_AUDIT_SYSCTL
This section of code calls .audit_syscal_exit, but is inside CONFIG_AUDIT,
so it will fail to build if CONFIG_AUDITSYSCALL is not defined.

After discussion with David Woodhouse, change the ifdef to
CONFIG_AUDITSYSCALL

Signed-off-by: Horms <horms@verge.net.au>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-01 21:44:46 -08:00
Stephen Rothwell
b09a4913b1 powerpc: change sys32_ to compat_sys_
This allows us to get rid of one type of entry in systbl.S.

In passing we remove the duplicate compat_sys_getdents and
compat_sys_utimes for which there are generic versions.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2005-10-18 14:51:57 +10:00