Split machine check handler code and move it to cio and kernel code
where it belongs to. No functional change.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently we use the cpuid (via STIDP instruction) to recognize LPAR,
z/VM and KVM.
The architecture states, that bit 0-7 of STIDP returns all zero, and
if STIDP is executed in a virtual machine, the VM operating system
will replace bits 0-7 with FF.
KVM should not use FE to distinguish z/VM from KVM for interested
guests. The proper way to detect the hypervisor is the STSI (Store
System Information) instruction, which return information about the
hypervisors via function code 3, selector1=2, selector2=2.
This patch changes the detection routine of Linux to use STSI instead
of STIDP. This detection is earlier than bootmem, we have to use a
static buffer. Since STSI expects a 4kb block (4kb aligned) this
patch also changes the init.data alignment for s390. As this section
will be freed during boot, this should be no problem.
Patch is tested with LPAR, z/VM, KVM on LPAR, and KVM under z/VM.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To support High Performance FICON, the DASD device driver has to
translate I/O requests into the new transport mode control words (TCW)
instead of the traditional (command mode) CCW requests.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd device driver will now support ECKD devices with more then
65520 cylinders.
In the traditional ECKD adressing scheme each track is addressed
by a 16-bit cylinder and 16-bit head number. The new addressing
scheme makes use of the fact that the actual number of heads is
never larger then 15, so 12 bits of the head number can be redefined
to be part of the cylinder address.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide new shutdown action "dump_reipl" for automatic ipl after dump.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
After TASK_SIZE now gives the current size of the address space the
upgrade of a 64 bit process from 3 to 4 levels of page table needs
to use the arch_mmap_check hook to catch large mmap lengths. The
get_unmapped_area* functions need to check for -ENOMEM from the
arch_get_unmapped_area*, upgrade the page table and retry.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make page table walking on s390 more robust. The current code requires
that the pgd/pud/pmd/pte loop is only done for address ranges that are
below the end address of the last vma of the address space. But this
is not always true, e.g. the generic page table walker does not guarantee
this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
address space. This makes the generic page table walker happy but it
breaks the upgrade of a 3 level page table to a 4 level page table.
To make the upgrade work again another fix is required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The default values for SD_MC_INIT cause an additional cpu usage of up
to 40% on some network benchmarks compared to the plain SD_CPU_INIT
values. So just define SD_MC_INIT to SD_CPU_INIT.
More tuning needs to be done.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit aa5e97ce4b
[PATCH] improve precision of process accounting.
Introduced a timing regression:
-bash-3.2# time ls
real 0m0.006s
user 0m1.754s
sys 0m1.094s
The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The vdso_per_cpu_data entry in the lowcore structure uses __u32
instead of __u64. If the data page is above 4GB the pointer is
truncated and the kernel crashes.
Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the personality() macro to mask out all bits that are not
relevant for the personality type.
The personality field contains bits for other things as well,
so without masking out the not relevalent bits the comparison
won't do what is expected.
Reported-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As requested by Andrew. Same as what sparc did.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
/include/asm/chpid.h:12: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/chsc.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/cmb.h:28: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/dasd.h:195: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/kvm.h:16: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/kvm.h:30: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/qeth.h:24: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/schid.h:5: found __[us]{8,16,32,64} type without #include <linux/types.h>
/include/asm/swab.h:12: include of <linux/types.h> is preferred over <asm/types.h>
/include/asm/swab.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the connection between host and storage server is lost, the
dasd device driver usually blocks all I/O on affected devices and
waits for them to reappear. In some setups however it would be
better if the I/O is returned as error so that device can be
recovered by some other means, eg. in a raid or multipath setup.
Signed-off-by: Holger Smolinski <Holger.Smolinski@de.ibm.com>
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Bring s390 in line with all the other ports. Not sure how s390 missed
this change as all the other arches were being updated ...
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux390@de.ibm.com
CC: linux-s390@vger.kernel.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
/include/asm/ptrace.h:275: extern's make no sense in userspace
/include/asm/ptrace.h:279: extern's make no sense in userspace
/include/asm/ptrace.h:280: extern's make no sense in userspace
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h. Move the type definition
to linux/types.h to break the loop.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
Impact: New API
The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
The extract cpu time instruction (ectg) instruction allows the user
process to get the current thread cputime without calling into the
kernel. The code that uses the instruction needs to switch to the
access registers mode to get access to the per-cpu info page that
contains the two base values that are needed to calculate the current
cputime from the CPU timer with the ectg instruction.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Distinguish the cputime of the idle process where idle is actually using
cpu cycles from the cputime where idle is sleeping on an enabled wait psw.
The former is accounted as system time, the later as idle time.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Increase the precision of the idle time calculation that is exported
to user space via /sys/devices/system/cpu/cpu<x>/idle_time_us
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The unit of the cputime accouting values that are stored per process is
currently a microsecond. The CPU timer has a maximum granularity of
2**-12 microseconds. There is no benefit in storing the per process values
in the lesser precision and there is the disadvantage that the backend
has to do the rounding to microseconds. The better solution is to use
the maximum granularity of the CPU timer as cputime unit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This doesn't really matter, since s390 pagesize is 4k anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Like cpu_coregroup_map, but returns a (const) pointer.
Compile-tested on s390 (defconfig).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Tell the compile that the clear_table inline assembly writes to the
memory referenced by *s.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 we always want to run with precise cputime accounting.
Remove the config options VIRT_TIMER and VIRT_CPU_ACCOUNTING.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Functions which end in a BUG() statement and skip the return statement
cause compile warnings on s390, e.g.:
mm/bootmem.c: In function 'mark_bootmem':
mm/bootmem.c:321: warning: control reaches end of non-void function
To avoid the warning add an endless loop to the BUG() macro.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
disabled_wait() won't return, so add an __attribute__((noreturn)).
This will remove a false positive finding which our internal code
checker reports.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Values for the sac field have changed - update code accordingly.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This implements just the basic function tracer (_mcount) backend for s390.
The dynamic variant will come later.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new proc interface /proc/service_levels that allows any code
to report a relevant service level, e.g. the microcode level of
devices, the service level of the hypervisor, etc.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
qeth needs to get the port count information before
qdio has allocated a page for the chsc operation.
Extend qdio_get_ssqd_desc() to store the data in the
specified structure.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the machine supports AP adapter interrupts polling will be
switched off at module initialization and the driver will work in
interrupt mode.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
stfle will be needed by the ap_bus module to figure out wether the AP
queue adapter interruption facility is installed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since etr/stp don't need the old smp_call_function semantics anymore
we can convert s390 to the generic IPI infrastructure.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a vdso to speed up gettimeofday and clock_getres/clock_gettime for
CLOCK_REALTIME/CLOCK_MONOTONIC.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).
Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running several kvm processes with lots of memory overcommitment,
we have seen an oops during process shutdown:
------------[ cut here ]------------
Kernel BUG at 0000000000193434 [verbose debug info unavailable]
addressing exception: 0005 [#1] PREEMPT SMP
Modules linked in: kvm sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 10 Not tainted 2.6.28-rc4-kvm-bigiron-00521-g0ccca08-dirty #8
Process kuli (pid: 14460, task: 0000000149822338, ksp: 0000000024f57650)
Krnl PSW : 0704e00180000000 0000000000193434 (unmap_vmas+0x884/0xf10)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000002 0000000000000000 000000051008d000 000003e05e6034e0
00000000001933f6 00000000000001e9 0000000407259e0a 00000002be88c400
00000200001c1000 0000000407259608 0000000407259e08 0000000024f577f0
0000000407259e09 0000000000445fa8 00000000001933f6 0000000024f577f0
Krnl Code: 0000000000193426: eb22000c000d sllg %r2,%r2,12
000000000019342c: a7180000 lhi %r1,0
0000000000193430: b2290012 iske %r1,%r2
>0000000000193434: a7110002 tmll %r1,2
0000000000193438: a7840006 brc 8,193444
000000000019343c: 9602c000 oi 0(%r12),2
0000000000193440: 96806000 oi 0(%r6),128
0000000000193444: a7110004 tmll %r1,4
Call Trace:
([<00000000001933f6>] unmap_vmas+0x846/0xf10)
[<0000000000199680>] exit_mmap+0x210/0x458
[<000000000012a8f8>] mmput+0x54/0xfc
[<000000000012f714>] exit_mm+0x134/0x144
[<000000000013120c>] do_exit+0x240/0x878
[<00000000001318dc>] do_group_exit+0x98/0xc8
[<000000000013e6b0>] get_signal_to_deliver+0x30c/0x358
[<000000000010bee0>] do_signal+0xec/0x860
[<0000000000112e30>] sysc_sigpending+0xe/0x22
[<000002000013198a>] 0x2000013198a
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000000001a68d0>] free_swap_and_cache+0x1a0/0x1a4
<4>---[ end trace bc19f1d51ac9db7c ]---
The faulting instruction is the storage key operation (iske) in
ptep_rcp_copy (called by pte_clear, called by unmap_vmas). iske
reads dirty and reference bit information for a physical page and
requires a valid physical address. Since we are in pte_clear, we
cannot rely on the pte containing a valid address. Fortunately we
dont need these information in pte_clear - after all there is no
mapping. The best fix is to remove the needless call to ptep_rcp_copy
that contains the iske.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
syscall_get_nr() currently returns a valid result only if the call
chain of the traced process includes do_syscall_trace_enter(). But
collect_syscall() can be called for any sleeping task, the result of
syscall_get_nr() in general is completely bogus.
To make syscall_get_nr() work for any sleeping task the traps field
in pt_regs is replace with svcnr - the system call number the process
is executing. If svcnr == 0 the process is not on a system call path.
The syscall_get_arguments and syscall_set_arguments use regs->gprs[2]
for the first system call parameter. This is incorrect since gprs[2]
may have been overwritten with the system call number if the call
chain includes do_syscall_trace_enter. Use regs->orig_gprs2 instead.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>