Commit Graph

2022 Commits

Author SHA1 Message Date
Martin Schwidefsky
207a05499b [S390] return address of compat signals
A 31-bit kernel always sets the high order bit in the return address
for a signal handler.
git commit d4e81b35b8 "[S390] allow all addressing modes" makes
sure that the high order bit is set in the signal return address for
standard signals of a 31-bit compat process but fails to do the same
for real-time signals. To make things consistent the bit needs to be
set by setup_rt_frame32 as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:14 +01:00
Heiko Carsten
77ecd06ae4 [S390] sysctl: get rid of dead declaration
Remove dead sysctl_userprocess_debug. The variable is
gone since we map our old userprocess_debug sysctl to
the common code show_unhandled_signals sysctl on s390.

Signed-off-by: Heiko Carsten <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:14 +01:00
Steffen Maier
e58b0d902f [S390] qdio: fix kernel panic for zfcp 31-bit
The queue_start_poll function pointer field in struct qdio_initialize
had to change its type and become a vector of function pointers to
support asynchronous delivery of storage blocks so rename the field to
make the type change explicit and ensure no other user of qdio tries
to use the field the old way. During setting up the qdio queues, only
dereference vector elements if the vector is actually allocated.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Einar Lueck <elelueck@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:14 +01:00
Michael Holzheu
7fe7a18cdd [S390] Add VMCOREINFO_SYMBOL(high_memory) to vmcoreinfo
Currently the vmalloc_start address (or better end of real memory) for s390x
is obtained by makedumpfile using vmlist.addr symbol, which is not correct.
The correct vmalloc_start address can be obtained using 'high_memory' symbol.

This patch adds the high_memory symbol to vmcoreinfo.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:13 +01:00
Martin Schwidefsky
85ac7ca597 [S390] outstanding interrupts vs. smp_send_stop
The panic function will first print the panic message to the console,
then stop additional cpus with smp_send_stop and finally call the
function on the panic notifier list.
In case of an I/O based console the panic message will cause I/O to
be started and a function on the panic notifier list will wait for the
completion of the I/O. That does not work if an I/O completion interrupt
has already been delivered to a cpu that is then stopped by smp_send_stop.
To break this cyclic dependency add code to smp_send_stop that gives
the additional cpu the opportunity to complete outstanding interrupts.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:13 +01:00
Heiko Carstens
3a3954ceae [S390] ipc: call generic sys_ipc demultiplexer
Call generic IPC demultiplexer instead of having a nearly identical
s390 variant. Also make sure that native and compat handling now have
the same behaviour.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:13 +01:00
Martin Schwidefsky
aa33c8cbba [S390] cleanup trap handling
Move the program interruption code and the translation exception identifier
to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
first level interrupt handler in entry[64].S store the two values. That
makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
and to reduce the number of arguments to a lot of functions. Finally
un-inline do_trap. Overall this saves 5812 bytes in the .text section of
the 64 bit kernel.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Heiko Carstens
679e2ea733 [S390] Remove Kerntypes leftovers
Remove last traces of our kerntypes patch which was always an addon
patch which never got upstream. Somehow a few bits got upstream
anyway.
Since kerntypes aren't used anymore and lcrash isn't maintained (for
s390 at least) remove the last traces of kerntypes that somehow went
upstream. Also remove the documentation that mentions lcrash.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Heiko Carstens
d68bddb732 [S390] topology: increase poll frequency if change is anticipated
Increase cpu topology change poll frequency if a change is anticipated.
Otherwise a user might be a bit confused to have to wait up to a minute
in order to see a change this should be visible immediatly.
However there is no guarantee that the change will happen during the
time frame the poll frequency is increased.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Martin Schwidefsky
c5328901aa [S390] entry[64].S improvements
Another round of cleanup for entry[64].S, in particular the program check
handler looks more reasonable now. The code size for the 31 bit kernel
has been reduced by 616 byte and by 528 byte for the 64 bit version.
Even better the code is a bit faster as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Jan Glauber
3b7f993394 [S390] make arch/s390 subdirectories depend on config option
Only add subdirectories of arch/s390 to kbuild if their respective
config option is selected.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Martin Schwidefsky
ddd6f9537d [S390] kvm: move cmf host id constant out of lowcore
There is no reason for the cpu-measurement-facility host id constant to
reside in the lowcore where space is precious. Use an entry in the literal
pool in HANDLE_SIE_INTERCEPT and a stack slot in sie64a.
While we are at it replace the id -1 with 0 to indicate host execution.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Heiko Carstens
4baeb964d9 [S390] topology: cleanup z10 topology handling
Cleanup z10 topology handling. This adds some more code but hopefully
the result is more readable and easier to maintain.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:11 +01:00
Carsten Otte
f32269a0d0 [S390] disable MACHINE_IS_VM check for pfault
This patch disables the check for MACHINE_IS_VM when initializing the
pfault infrastructure. The code checks for successful completion of
diag 258 anyway, thus it's safe to try initialization on LPAR anyway.
This is needed to use pfault on kvm

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Heiko Carstens
83a24e3290 [S390] topology: get rid of ifdefs
Remove all ifdefs from topology code and also only compile it for the
CONFIG_SCHED_BOOK case. The new code selects SCHED_MC if SCHED_BOOK is
selected. SCHED_MC without SCHED_BOOK is not possible anymore.
Furthermore various sysfs attributes are not available anymore for the
!SCHED_BOOK case. In particular all attributes that correspond to
CPU polarization.
But since all real world kernels have SCHED_BOOK selected anyway this
doesn't matter too much.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Michael Holzheu
3931723f36 [S390] kernel: Fix smp_switch_to_ipl_cpu() stack frame setup
Currently, when smp_switch_to_ipl_cpu() is done, the backchain in the dump
analysis tool crash looks like the following:

 #0 [1f746e70] __machine_kexec at 11dd92
 #1 [1f746eb8] smp_restart_cpu at 11820e
 #0 [00907eb0] cpu_idle at 10602e
 #1 [00907ef8] start_kernel at 979a08

It would be good to see the registers of the interrupted function.
To achieve this, the backchain on the new stack has to be set to zero.
This looks then like the following:

 #0 [1f746e70] __machine_kexec at 11dd8e
 #1 [1f746eb8] smp_restart_cpu at 11820a
 PSW:  0706000180000000 00000000005c6fe6 (vtime_stop_cpu+134)
 GPRS: 0000000000000000 00000000005c6fe6 0000000001ad0228 0000000001ad0248
       0000000000907f08 0000000001ad0b40 0000000000979344 0000000000000000
       00000000009c0000 00000000009c0010 00000000009ab024 0000000001ad0200
       0000000001ad0238 00000000005cc9d8 000000000010602e 0000000000907e68
 #0 [00907eb0] cpu_idle at 10602e
 #1 [00907ef8] start_kernel at 979a08

In addition to this, now also the correct PSW is stored in the pt_regs
structure that is located at the start of the panic stack.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Martin Schwidefsky
14045ebf1e [S390] add support for physical memory > 4TB
The kernel address space of a 64 bit kernel currently uses a three level
page table and the vmemmap array has a fixed address and a fixed maximum
size. A three level page table is good enough for systems with less than
3.8TB of memory, for bigger systems four page table levels need to be
used. Each page table level costs a bit of performance, use 3 levels for
normal systems and 4 levels only for the really big systems.
To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
maximum of 64TB of memory.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Michael Holzheu
4999023aa9 [S390] Remove useless newline in reserve_kdump_bootmem()
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:09 +01:00
Michael Holzheu
44e5ddc4e9 [S390] Rework create_mem_hole() function
This patch makes the create_mem_hole() function more readable and
fixes some minor bugs (e.g. off-by-one problems).

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:09 +01:00
Christian Borntraeger
c86cce2a20 [S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260
commit cc772456ac
    [S390] fix list corruption in gmap reverse mapping

added a potential dead lock:

BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
3 locks held by qemu-system-s39/1108:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
 #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
 #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
CPU: 0 Not tainted 3.1.3 #45
Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
       00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
       0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
       000000000000000d 000000000000000c 00000004fd597988 0000000000000000
       0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
Call Trace:
([<0000000000100926>] show_trace+0xee/0x144)
 [<0000000000131f3a>] __might_sleep+0x12a/0x158
 [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc
 [<0000000000123086>] gmap_alloc_table+0x46/0x114
 [<000000000012395c>] gmap_map_segment+0x268/0x298
 [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
 [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
 [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm]
 [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm]
 [<0000000000292100>] do_vfs_ioctl+0x94/0x588
 [<0000000000292688>] SyS_ioctl+0x94/0xac
 [<000000000061e124>] sysc_noemu+0x22/0x28
 [<000003fffcd5e7ca>] 0x3fffcd5e7ca
3 locks held by qemu-system-s39/1108:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
 #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
 #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298

Fix this by freeing the lock on the alloc path. This is ok, since the
gmap table is never freed until we call gmap_free, so the table we are
walking cannot go.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:25:48 +01:00
Michael Holzheu
1fb810576f [S390] Check for NULL termination in command line setup
The current code in setup_boot_command_line() uses a heuristic to
detect an EBCDIC command line. It checks if any of the bytes in
the command line has bit one (0x80) set. In that case it is assumed
that we have an EBCDIC string and the complete command line is
converted.

On s390 there are cases where the boot loader provides a kernel
command line that is NULL terminated, but has random data after
the NULL termination. In that case, setup_boot_command_line()
might misinterpret an ASCII string for an EBCDIC string. A
subsequent string conversion can then damage the ASCII string.

This patch solves the problem by checking for NULL termination.
If no EBCDIC character has been found until the the NULL
termination has been found, we now assume that we have an ASCII
string.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:25:48 +01:00
Heiko Carstens
272f01bf9b [S390] irq: fix accounting of external call/emergency signal
Mask the extint_code parameter of the smp external interrupt handler
to get the interruption code. Otherwise emergency call interrupts
erroneously might be accounted as emergency signal interrupts.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:25:48 +01:00
David S. Miller
abb434cb05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 17:13:56 -05:00
Christoph Lameter
933393f58f percpu: Remove irqsafe_cpu_xxx variants
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state.  That has no material change for x86
and s390 implementations of this_cpu operations.  However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.

-tj: This is part of on-going percpu API cleanup.  For detailed
     discussion of the subject, please refer to the following thread.

     http://thread.gmane.org/gmane.linux.kernel/1222078

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
2011-12-22 10:40:20 -08:00
Kay Sievers
3fbacffbe9 s390: time - convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 15:09:50 -08:00
Kay Sievers
8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Rafael J. Wysocki
b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Ingo Molnar
d87f69a16e Merge commit 'v3.2-rc6' into perf/core
Merge reason: Update with the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-20 20:32:11 +01:00
Ingo Molnar
45aa0663cc Merge branch 'memblock-kill-early_node_map' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock 2011-12-20 12:14:26 +01:00
Ingo Molnar
124ba94033 Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core 2011-12-20 12:10:29 +01:00
Martin Schwidefsky
612ef28a04 Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip
Conflicts:
	drivers/cpufreq/cpufreq_conservative.c
	drivers/cpufreq/cpufreq_ondemand.c
	drivers/macintosh/rack-meter.c
	fs/proc/stat.c
	fs/proc/uptime.c
	kernel/sched/core.c
2011-12-19 19:23:15 +01:00
Robert Richter
913050b91e oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
If oprofilefs_ulong_from_user() is called with count equals
zero, *val remains unchanged. Depending on the implementation it
might be uninitialized.

Change oprofilefs_ulong_from_user()'s interface to return count
on success. Thus, we are able to return early if count equals
zero which avoids using *val uninitialized. Fixing all users of
oprofilefs_ulong_ from_user().

This follows write syscall implementation when count is zero:
"If count is zero ... [and if] no errors are detected, 0 will be
returned without causing any other effect." (man 2 write)

Reported-By: Mike Waychison <mikew@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Link: http://lkml.kernel.org/r/20111219153830.GH16765@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-19 17:18:43 +01:00
Martin Schwidefsky
648616343c [S390] cputime: add sparse checking and cleanup
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-15 14:56:19 +01:00
Ingo Molnar
6a54aebf69 Merge commit 'v3.2-rc5' into sched/core
Merge reason: Pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15 08:21:30 +01:00
David Howells
1632b9e2a1 UAPI: Split trivial #if defined(__KERNEL__) && X conditionals
Split trivial #if defined(__KERNEL__) && X conditionals to make automated
disintegration easier.

Signed-off-by: David Howells <dhowells@redhat.com>
2011-12-13 15:07:49 +00:00
David Howells
87d2f5e398 UAPI: Alter the S390 asm include guards to be recognisable by the UAPI splitter
Alter some of the S390 asm include guards to fit a pattern that the UAPI
splitter recognises.

Signed-off-by: David Howells <dhowells@redhat.com>
2011-12-13 09:26:45 +00:00
Frederic Weisbecker
1268fbc746 nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()
Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.

Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:57 -08:00
Frederic Weisbecker
2bbb6817c0 nohz: Allow rcu extended quiescent state handling seperately from tick stop
It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:31:36 -08:00
Frederic Weisbecker
280f06774a nohz: Separate out irq exit and idle loop dyntick logic
The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:35 -08:00
Tejun Heo
0ee332c145 memblock: Kill early_node_map[]
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
there's no user of early_node_map[] left.  Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately one to one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them.  Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
 CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
 mmzone.h.  Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:09 -08:00
Tejun Heo
ff38df377c s390: Use HAVE_MEMBLOCK_NODE_MAP
s390 used early_node_map[] just to prime free_area_init_nodes().  Now
memblock can be used for the same purpose and early_node_map[] is
scheduled to be dropped.  Use memblock instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
2011-12-08 10:22:09 -08:00
Andreas Krebbel
dd3c4670d7 oprofile, s390: Add event interface to the System z hardware sampling module
With this patch the OProfile Basic Mode Sampling support for System z
is enhanced with a counter file system.  That way hardware sampling
can be configured using the user space tools with only little
modifications.

With the patch by default new cpu_types (s390/z10, s390/z196) are
returned in order to indicate that we are running a CPU which provides
the hardware sampling facility.  Existing user space tools will
complain about an unknown cpu type. In order to be compatible with
existing user space tools the `cpu_type' module parameter has been
added.  Setting the parameter to `timer' will force the module to
return `timer' as cpu_type.  The module will still try to use hardware
sampling if available and the hwsampling virtual filesystem will be
also be available for configuration.  So this has a different effect
than using the generic oprofile module parameter `timer=1'.

If the basic mode sampling is enabled on the machine and the
cpu_type=timer parameter is not used the kernel module will provide
the following virtual filesystem:

/dev/oprofile/0/enabled
/dev/oprofile/0/event
/dev/oprofile/0/count
/dev/oprofile/0/unit_mask
/dev/oprofile/0/kernel
/dev/oprofile/0/user

In the counter file system only the values of 'enabled', 'count',
'kernel', and 'user' are evaluated by the kernel module. Everything
else must contain fixed values.

The 'event' value only supports a single event - HWSAMPLING with value
0.

The 'count' value specifies the hardware sampling rate as it is passed
to the CPU measurement facility.

The 'kernel' and 'user' flags can now be used to filter for samples
when using hardware sampling.

Additionally also the following file will be created:
/dev/oprofile/timer/enabled

This will always be the inverted value of /dev/oprofile/0/enabled. 0
is not accepted without hardware sampling.

Signed-off-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-12-07 11:47:09 +01:00
Robert Richter
f8c852031a oprofile: Fix oprofile_timer_exit() breakage
Removing remainings of oprofile_timer_exit() completly.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-12-07 11:16:38 +01:00
David S. Miller
959327c784 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-06 21:10:05 -05:00
Glauber Costa
3292beb340 sched/accounting: Change cpustat fields to an array
This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:38 +01:00
Martin Schwidefsky
cfc9066bcd [S390] remove reset of system call restart on psw changes
git commit 20b40a794b "signal race with restarting system calls"
added code to the poke_user/poke_user_compat to reset the system call
restart information in the thread-info if the PSW address is changed.
The purpose of that change has been to workaround old gdbs that do
not know about the REGSET_SYSTEM_CALL. It turned out that this is not
a good idea, it makes the behaviour of the debuggee dependent on the
order of specific ptrace call, e.g. the REGSET_SYSTEM_CALL register
set needs to be written last. And the workaround does not really fix
old gdbs, inferior calls on interrupted restarting system calls do not
work either way.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-01 13:32:17 +01:00
Martin Schwidefsky
b934069c99 [S390] add missing .set function for NT_S390_LAST_BREAK regset
The last breaking event address is a read-only value, the regset misses the
.set function. If a PTRACE_SETREGSET is done for NT_S390_LAST_BREAK we
get an oops due to a branch to zero:

Kernel BUG at 0000000000000002 verbose debug info unavailable
illegal operation: 0001 #1 SMP
...
Call Trace:
(<0000000000158294> ptrace_regset+0x184/0x188)
 <00000000001595b6> ptrace_request+0x37a/0x4fc
 <0000000000109a78> arch_ptrace+0x108/0x1fc
 <00000000001590d6> SyS_ptrace+0xaa/0x12c
 <00000000005c7a42> sysc_noemu+0x16/0x1c
 <000003fffd5ec10c> 0x3fffd5ec10c
Last Breaking-Event-Address:
 <0000000000158242> ptrace_regset+0x132/0x188

Add a nop .set function to prevent the branch to zero.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@kernel.org
2011-12-01 13:32:17 +01:00
Carsten Otte
7c81878b34 [S390] fix page change underindication in pgste_update_all
This patch makes sure we don't underindicate _PAGE_CHANGED in case
we have a race between an operation that changes the page and this
code path that hits us between page_get_storage_key and
page_set_storage_key. Note that we still have a potential
underindication on _PAGE_REFERENCED in the unlikely event that
the page was changed but not referenced _and_ someone references
the page in the race window. That's not considered to be a problem.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-01 13:32:17 +01:00
Martin Schwidefsky
d9ae6772d3 [S390] ptrace inferior call interactions with TIF_SYSCALL
The TIF_SYSCALL bit needs to be cleared if the debugger changes the state
of the ptraced process in regard to the presence of a system call.
Otherwise the system call will be restarted although the debugger set up
an inferior call.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-01 13:32:17 +01:00
Michael Holzheu
5f894cbb68 [S390] kdump: Replace is_kdump_kernel() with OLDMEM_BASE check
In order to have the same behavior for kdump based stand-alone dump
as for the kexec method, the is_kdump_kernel() check (only true for
the kexec method) has to be replaced by the OLDMEM_BASE check (true
for both methods).

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-01 13:32:17 +01:00
Tejun Heo
d88e4cb671 freezer: remove now unused TIF_FREEZE
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
2011-11-21 12:32:25 -08:00
David S. Miller
efd0bf97de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The forcedeth changes had a conflict with the conversion over
to atomic u64 statistics in net-next.

The libertas cfg.c code had a conflict with the bss reference
counting fix by John Linville in net-next.

Conflicts:
	drivers/net/ethernet/nvidia/forcedeth.c
	drivers/net/wireless/libertas/cfg.c
2011-11-21 13:50:33 -05:00
Linus Torvalds
a4cc3889f7 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: prevent tracing recursion with kvmclock
  Revert "KVM: PPC: Add support for explicit HIOR setting"
  KVM: VMX: Check for automatic switch msr table overflow
  KVM: VMX: Add support for guest/host-only profiling
  KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
  KVM: s390: announce SYNC_MMU
  KVM: s390: Fix tprot locking
  KVM: s390: handle SIGP sense running intercepts
  KVM: s390: Fix RUNNING flag misinterpretation
2011-11-20 14:57:43 -08:00
John W. Linville
e11c259f74 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	include/net/bluetooth/bluetooth.h
2011-11-17 13:11:43 -05:00
Christian Borntraeger
52e16b185f KVM: s390: announce SYNC_MMU
KVM on s390 always had a sync mmu. Any mapping change in userspace
mapping was always reflected immediately in the guest mapping.
- In older code the guest mapping was just an offset
- In newer code the last level page table is shared

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:25:52 +02:00
Christian Borntraeger
1eddb85f88 KVM: s390: Fix tprot locking
There is a potential host deadlock in the tprot intercept handling.
We must not hold the mmap semaphore while resolving the guest
address. If userspace is remapping, then the memory detection in
the guest is broken anyway so we can safely separate the
address translation from walking the vmas.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:25:48 +02:00
Cornelia Huck
bd59d3a444 KVM: s390: handle SIGP sense running intercepts
SIGP sense running may cause an intercept on higher level
virtualization, so handle it by checking the CPUSTAT_RUNNING flag.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:25:46 +02:00
Cornelia Huck
9e6dabeffd KVM: s390: Fix RUNNING flag misinterpretation
CPUSTAT_RUNNING was implemented signifying that a vcpu is not stopped.
This is not, however, what the architecture says: RUNNING should be
set when the host is acting on the behalf of the guest operating
system.

CPUSTAT_RUNNING has been changed to be set in kvm_arch_vcpu_load()
and to be unset in kvm_arch_vcpu_put().

For signifying stopped state of a vcpu, a host-controlled bit has
been used and is set/unset basically on the reverse as the old
CPUSTAT_RUNNING bit (including pushing it down into stop handling
proper in handle_stop()).

Cc: stable@kernel.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-11-17 16:25:43 +02:00
Heiko Carstens
f6bf1a8acd [S390] topology: fix topology on z10 machines
Make sure that all cpus in a book on a z10 appear as book siblings
and not as core siblings. This fixes some performance regressions that
appeared after the book scheduling domain got introduced.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:09 +01:00
Jan Glauber
6ed54387dc [S390] crypto: avoid MSA3 and MSA4 instructions in ESA mode
MSA3 and MSA4 instructions are only available under CONFIG_64BIT.
Bail out before using any of these instructions if the kernel is
running in 31 bit mode.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:09 +01:00
Jan Glauber
cfa1e7e1d4 [S390] avoid STCKF if running in ESA mode
In ESA mode STCKF is not defined even if the facility bit is enabled.
To prevent an illegal operation we must also check if we run a 64 bit kernel.
To make the check perform well add the STCKF bit to the machine flags.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:09 +01:00
Michael Holzheu
3f25dc4fcb [S390] zfcpdump: Do not initialize zfcpdump in kdump mode
When the kernel is started in kdump mode, zfcpdump should not be
initialized because both dump methods can't be used at the same time.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:09 +01:00
Michael Holzheu
96603b505c [S390] Kconfig: Select CONFIG_KEXEC for CONFIG_CRASH_DUMP
The kdump infrastructure is built on top of kexec. Therefore
CONFIG_KEXEC has to be enabled when CONFIG_CRASH_DUMP is selected.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Martin Schwidefsky
7a2512b744 [S390] incorrect note program header
'readelf -n' on the s390 vmlinux file generates lots of warnings about
corrupt notes. The reason is that the 'NOTE' program header has incorrect
file and memory sizes. The problem is that the section following the
NOTES section do not switch to a different phdr and they get added to
the NOTE program section. Add a dummy entry to the linker script that
switches to the data phdr before the start of the RODATA section.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Heiko Carstens
fa2fb2f4a5 [S390] pfault: ignore leftover completion interrupts
Ignore completion interrupts if the initial interrupt hasn't been
received and the addressed task is not running. This case can only
happen if leftover (pending) completion interrupt gets delivered
which wasn't removed with the PFAULT CANCEL operation during cpu
hotplug.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Martin Schwidefsky
09b538833b [S390] fix pgste update logic
The pgste_update_all / pgste_update_young and pgste_set_pte need to
check if the pte entry contains a valid page address before the storage
key can be accessed. In addition pgste_set_pte needs to set the access
key and fetch protection bit of the new pte entry, not the old entry.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Heiko Carstens
800252976b [S390] wire up process_vm syscalls
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Jiri Kosina
2290c0d06d Merge branch 'master' into for-next
Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver
in the config and Makefile") as I have patch depending on that one.
2011-11-13 20:55:53 +01:00
Paul Bolle
06b19a5526 s390: drop "select HAVE_GET_USER_PAGES_FAST"
There is no Kconfig symbol named HAVE_GET_USER_PAGES_FAST. The select
statement for that symbol is a nop. Drop it.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-13 20:48:22 +01:00
Johannes Berg
6e3e939f3b net: add wireless TX status socket option
The 802.1X EAPOL handshake hostapd does requires
knowing whether the frame was ack'ed by the peer.
Currently, we fudge this pretty badly by not even
transmitting the frame as a normal data frame but
injecting it with radiotap and getting the status
out of radiotap monitor as well. This is rather
complex, confuses users (mon.wlan0 presence) and
doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX
status option for data frame transmissions.

This works similar to the existing TX timestamping
in that it reflects the SKB back to the socket's
error queue with a SCM_WIFI_STATUS cmsg that has
an int indicating ACK status (0/1).

Since it is possible that at some point we will
want to have TX timestamping and wifi status in a
single errqueue SKB (there's little point in not
doing that), redefine SO_EE_ORIGIN_TIMESTAMPING
to SO_EE_ORIGIN_TXSTATUS which can collect more
than just the timestamp; keep the old constant
as an alias of course. Currently the internal APIs
don't make that possible, but it wouldn't be hard
to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out
the functions that add the control messages.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-09 16:01:02 -05:00
Linus Torvalds
b32fc0a062 Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  jump-label: initialize jump-label subsystem much earlier
  x86/jump_label: add arch_jump_label_transform_static()
  s390/jump-label: add arch_jump_label_transform_static()
  jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
  sparc/jump_label: drop arch_jump_label_text_poke_early()
  x86/jump_label: drop arch_jump_label_text_poke_early()
  jump_label: if a key has already been initialized, don't nop it out
  stop_machine: make stop_machine safe and efficient to call early
  jump_label: use proper atomic_t initializer

Conflicts:
 - arch/x86/kernel/jump_label.c
	Added __init_or_module to arch_jump_label_text_poke_early vs
	removal of that function entirely
 - kernel/stop_machine.c
	same patch ("stop_machine: make stop_machine safe and efficient
	to call early") merged twice, with whitespace fix in one version
2011-11-06 20:20:46 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
092f4c56c1 Merge branch 'akpm' (Andrew's incoming - part two)
Says Andrew:

 "60 patches.  That's good enough for -rc1 I guess.  I have quite a lot
  of detritus to be rechecked, work through maintainers, etc.

 - most of the remains of MM
 - rtc
 - various misc
 - cgroups
 - memcg
 - cpusets
 - procfs
 - ipc
 - rapidio
 - sysctl
 - pps
 - w1
 - drivers/misc
 - aio"

* akpm: (60 commits)
  memcg: replace ss->id_lock with a rwlock
  aio: allocate kiocbs in batches
  drivers/misc/vmw_balloon.c: fix typo in code comment
  drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
  w1: disable irqs in critical section
  drivers/w1/w1_int.c: multiple masters used same init_name
  drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
  drivers/power/ds2780_battery.c: add a nolock function to w1 interface
  drivers/power/ds2780_battery.c: create central point for calling w1 interface
  w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
  pps gpio client: add missing dependency
  pps: new client driver using GPIO
  pps: default echo function
  include/linux/dma-mapping.h: add dma_zalloc_coherent()
  sysctl: make CONFIG_SYSCTL_SYSCALL default to n
  sysctl: add support for poll()
  RapidIO: documentation update
  drivers/net/rionet.c: fix ethernet address macros for LE platforms
  RapidIO: fix potential null deref in rio_setup_device()
  RapidIO: add mport driver for Tsi721 bridge
  ...
2011-11-02 16:07:27 -07:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
0693bc9ce2 s390: gup_huge_pmd() return 0 if pte changes
s390 didn't return 0 in that case, if it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
220a2eb228 s390: gup_huge_pmd() support THP tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Miklos Szeredi
bfe8684869 filesystems: add set_nlink()
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Miklos Szeredi
6d6b77f163 filesystems: add missing nlink wrappers
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-11-02 12:53:43 +01:00
Miklos Szeredi
dc78610228 hypfs: remove unnecessary nlink setting
alloc_inode() initializes i_nlink to 1.  Remove unnecessary
re-initialization.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:42 +01:00
Heiko Carstens
3a4c5d5964 s390: add missing module.h/export.h includes
Fix several compile errors on s390 caused by splitting module.h.

Some include additions [e.g. qdio_setup.c, zfcp_qdio.c] are in
anticipation of pending changes queued for s390 that increase
the modular use footprint.

[PG: added additional obvious changes since Heiko's original patch]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:58 -04:00
Linus Torvalds
32087d4eec Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (54 commits)
  [S390] Remove error checking from copy_oldmem_page()
  [S390] qdio: prevent dsci access without adapter interrupts
  [S390] irqstats: split IPI interrupt accounting
  [S390] add missing __tlb_flush_global() for !CONFIG_SMP
  [S390] sparse: fix sparse symbol shadow warning
  [S390] sparse: fix sparse NULL pointer warnings
  [S390] sparse: fix sparse warnings with __user pointers
  [S390] sparse: fix sparse warnings in math-emu
  [S390] sparse: fix sparse warnings about missing prototypes
  [S390] sparse: fix sparse ANSI-C warnings
  [S390] sparse: fix sparse static warnings
  [S390] sparse: fix access past end of array warnings
  [S390] dasd: prevent path verification before resume
  [S390] qdio: remove multicast polling
  [S390] qdio: reset outbound SBAL error states
  [S390] qdio: EQBS retry after CCQ 96
  [S390] qdio: add timestamp for last queue scan time
  [S390] Introduce get_clock_fast()
  [S390] kvm: Handle diagnose 0x10 (release pages)
  [S390] take mmap_sem when walking guest page table
  ...
2011-10-31 16:14:20 -07:00
Linus Torvalds
1bc87b0055 Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits)
  KVM: SVM: Keep intercepting task switching with NPT enabled
  KVM: s390: implement sigp external call
  KVM: s390: fix register setting
  KVM: s390: fix return value of kvm_arch_init_vm
  KVM: s390: check cpu_id prior to using it
  KVM: emulate lapic tsc deadline timer for guest
  x86: TSC deadline definitions
  KVM: Fix simultaneous NMIs
  KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode
  KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode
  KVM: x86 emulator: streamline decode of segment registers
  KVM: x86 emulator: simplify OpMem64 decode
  KVM: x86 emulator: switch src decode to decode_operand()
  KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack
  KVM: x86 emulator: switch OpImmUByte decode to decode_imm()
  KVM: x86 emulator: free up some flag bits near src, dst
  KVM: x86 emulator: switch src2 to generic decode_operand()
  KVM: x86 emulator: expand decode flags to 64 bits
  KVM: x86 emulator: split dst decode to a generic decode_operand()
  KVM: x86 emulator: move memop, memopp into emulation context
  ...
2011-10-30 15:36:45 -07:00
Michael Holzheu
07ea815b22 [S390] Remove error checking from copy_oldmem_page()
Currently it can happen that the pre-allocated ELF header contains a wrong
memory map which would result in errors when copying /proc/vmcore.
In order to still get a valid vmcore, we (temporarily) disable the error
checking in copy_oldmem_page(). This will then produce zero pages for those
memory regions.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:47 +01:00
Heiko Carstens
2a3a2d66aa [S390] irqstats: split IPI interrupt accounting
We use both the external call and emergency call IPIs to signal remote
cpus. Therefore it makes sense to account them differently withing
/proc/irqstats so we actually know what happened.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:47 +01:00
Jan Glauber
e1c4d0142d [S390] add missing __tlb_flush_global() for !CONFIG_SMP
Fix this compiler error for !CONFIG_SMP:

  CC      arch/s390/mm/pgtable.o
arch/s390/mm/pgtable.c: In function ‘gmap_flush_tlb’:
arch/s390/mm/pgtable.c:202:3: error: implicit declaration of function ‘__tlb_flush_global’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:47 +01:00
Martin Schwidefsky
3c52e49d7c [S390] sparse: fix sparse warnings with __user pointers
Use __force to quiet sparse warnings about user address space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Martin Schwidefsky
5b479a79bf [S390] sparse: fix sparse warnings in math-emu
Fix three sparse warnings in math-emu / sysinfo:

arch/s390/kernel/sysinfo.c:448:17: error: return expression in void function
arch/s390/kernel/sysinfo.c:445:25: warning: shift too big (32) for type unsigned int
arch/s390/kernel/sysinfo.c:445:25: warning: shift too big (32) for type unsigned int

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Martin Schwidefsky
638ad34a88 [S390] sparse: fix sparse warnings about missing prototypes
Add prototypes and includes for functions used in different modules.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Martin Schwidefsky
e54aafa0c3 [S390] sparse: fix sparse ANSI-C warnings
Fix prototype of some functions in arch/s390/oprofile to avoid non-ANSI
warnings from sparse.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Martin Schwidefsky
c4736d9682 [S390] sparse: fix sparse static warnings
Make functions and data static to avoid sparse warnings.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Martin Schwidefsky
399c1d8dbf [S390] sparse: fix access past end of array warnings
Remove unnecessary code to avoid false positives from sparse, e.g.

arch/s390/kernel/compat_signal.c:221:61: warning: invalid access past the end of 'set32' (8 8)

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Jan Glauber
80376f347d [S390] Introduce get_clock_fast()
Add get_clock_fast() which uses the slightly faster stckf if available.
If stckf is not available fall back to stck, which has the same width.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Christian Borntraeger
388186bc92 [S390] kvm: Handle diagnose 0x10 (release pages)
Linux on System z uses a ballooner based on diagnose 0x10. (aka as
collaborative memory management). This patch implements diagnose
0x10 on the guest address space.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte
499069e1a4 [S390] take mmap_sem when walking guest page table
gmap_fault needs to walk the guest page table. However, parts of
that may change if some other thread does munmap. In that case
gmap_unmap_notifier will also unmap the corresponding parts from
the guest page table. We need to take mmap_sem in order to serialize
these operations.
do_exception now calls __gmap_fault with mmap_sem held which does
not get exported to modules. The exported function, which is called
from KVM, now takes mmap_sem.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte
cc772456ac [S390] fix list corruption in gmap reverse mapping
This introduces locking via mm->page_table_lock to protect
the rmap list for guest mappings from being corrupted by concurrent
operations.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Carsten Otte
a9162f238a [S390] fix possible deadlock in gmap_map_segment
Fix possible deadlock reported by lockdep:
qemu-system-s39/2963 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: gmap_alloc_table+0x9c/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: gmap_map_segment+0xa6/0x27c

Actually gmap_alloc_table is the only called in gmap_map_segment with
mmap_sem held, thus it's safe to simply remove the inner lock.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Carsten Otte
69ba974366 [S390] load user asce on sie_fault
On sie_fault we need to switch back to user ASCE. Otherwise we get
interresting effects when exiting to "userspace" while the guest
space is still active.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Martin Schwidefsky
d98e19ccef [S390] smp: external call vs. emergency signal
Use a sigp sense running to decide which signal processor order to use
for an ipi. If the target cpu is running use external call, if the target
cpu is not running use emergency signal.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Sebastian Ott
65b4e403ac [S390] chsc_sch: add support for irq statistics
Add support for CHSC I/O interrupt statistics in /proc/interrupts.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Martin Schwidefsky
d4e81b35b8 [S390] allow all addressing modes
The user space program can change its addressing mode between the
24-bit, 31-bit and the 64-bit mode if the kernel is 64 bit. Currently
the kernel always forces the standard amode on signal delivery and
signal return and on ptrace: 64-bit for a 64-bit process, 31-bit for
a compat process and 31-bit kernels. Change the signal and ptrace code
to allow the full range of addressing modes. Signal handlers are
run in the standard addressing mode for the process.

One caveat is that even an 31-bit compat process can switch to the
64-bit mode. The next signal will switch back into the 31-bit mode
and there is no room in the 31-bit compat signal frame to store the
information that the program came from the 64-bit mode.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00