Commit Graph

449520 Commits

Author SHA1 Message Date
Joe Perches
804bcaf79b tile: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Joe Perches
2841efa636 ia64: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Joe Perches
37649de239 arm: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Fabian Frederick
119ce5c8b9 kernel/seccomp.c: kernel-doc warning fix
+ fix small typo

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
9b44ee2eef ipc/sem.c: add a printk_once for semctl(GETNCNT/GETZCNT)
The actual Linux implementation for semctl(GETNCNT) and semctl(GETZCNT)
always (since 0.99.10) reported a thread as sleeping on all semaphores
that are listed in the semop() call.

The documented behavior (both in the Linux man page and in the Single
Unix Specification) is that a task should be reported on exactly one
semaphore: The semaphore that caused the thread to got to sleep.

This patch adds a pr_info_once() that is triggered if a thread hits the
relevant case.

The code triggers slightly too often, otherwise it would be necessary to
replicate the old code.  As there are no known users of GETNCNT or
GETZCNT, this is done to prevent unnecessary bloat.

The task that triggered is reported with name (tsk->comm) and pid.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
b220c57aec ipc/sem.c: make semctl(,,{GETNCNT,GETZCNT}) standard compliant
SUSv4 clearly defines how semncnt and semzcnt must be calculated: A task
waits on exactly one semaphore: The semaphore from the first operation
in the sop array that cannot proceed.

The Linux implementation never followed the standard, it tried to count
all semaphores that might be the reason why a task sleeps.

This patch fixes that.

Note:
a) The implementation assumes that GETNCNT and GETZCNT are rare operations,
   therefore the code counts them only on demand.
   (If they wouldn't be rare, then the non-compliance would have
   been found earlier)

b) compared to the initial version of the patch, the BUG_ONs were removed
   and it was clarified that the new behavior conforms to SUS.

Back-compatibility concerns:

Manfred:

: - there is no application in Fedora that uses GETNCNT or GETZCNT.
:
: - application that use only single-sop semop() are also safe, the
:   difference only affects complex apps.
:
: - portable application are also safe, the new behavior is standard
:   compliant.
:
: But that's it.  The old behavior existed in Linux from 0.99.something
: until now.

Michael:

: * These operations seem to be very little used.  Grepping the public
:   source that is contained Fedora 20 source DVD, there appear to be no
:   uses.  Of course, this says nothing about uses in private /
:   non-mainstream FOSS code, but it seems likely that the same pattern
:   is followed there.
:
: * The existing behavior is hard enough to understand that I suspect
:   that no one understood it well enough to rely on it anyway
:   (especially as that behavior contradicted both man page and POSIX).
:
: So, there's a chance of breakage, but I estimate that it's minute.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
ed247b7ca0 ipc/sem.c: store which operation blocks in perform_atomic_semop()
Preparation for the next patch:

In the slow-path of perform_atomic_semop(), store a pointer to the
operation that caused the operation to block.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
d198cd6d6d ipc/sem.c: change perform_atomic_semop parameters
Right now, perform_atomic_semop gets the content of sem_queue as
individual fields.  Changes that, instead pass a pointer to sem_queue.

This is a preparation for the next patch: it uses sem_queue to store the
reason why a task must sleep.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
2f2ed41dca ipc/sem.c: remove code duplication
count_semzcnt and count_semncnt are more of less identical.  The patch
creates a single function that either counts the number of tasks waiting
for zero or waiting due to a decrease operation.

Compared to the initial version, the BUG_ONs were removed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Manfred Spraul
1994862dc9 ipc/sem.c: bugfix for semctl(,,GETZCNT)
GETZCNT is supposed to return the number of threads that wait until a
semaphore value becomes 0.

The current implementation overlooks complex operations that contain
both wait-for-zero operation and operations that alter at least one
semaphore.

The patch fixes that.  It's intentionally copy&paste, this will be
cleaned up in the next patch.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Davidlohr Bueso
4bb6657dd3 ipc,msg: document volatile r_msg
The need for volatile is not obvious, document it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Davidlohr Bueso
3440a6bd1d ipc,msg: move some msgq ns code around
Nothing big and no logical changes, just get rid of some redundant
function declarations.  Move msg_[init/exit]_ns down the end of the
file.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Davidlohr Bueso
f75a2f358d ipc,msg: use current->state helpers
Call __set_current_state() instead of assigning the new state directly.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullif.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Davidlohr Bueso
f57a19a7bc ipc,shm: document new limits in the uapi header
This is useful in the future and allows users to better understand the
reasoning behind the changes.

Also use UL as we're dealing with it anyways.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
060028bac9 ipc/shm.c: increase the defaults for SHMALL, SHMMAX
System V shared memory

a) can be abused to trigger out-of-memory conditions and the standard
   measures against out-of-memory do not work:

    - it is not possible to use setrlimit to limit the size of shm segments.

    - segments can exist without association with any processes, thus
      the oom-killer is unable to free that memory.

b) is typically used for shared information - today often multiple GB.
   (e.g. database shared buffers)

The current default is a maximum segment size of 32 MB and a maximum
total size of 8 GB.  This is often too much for a) and not enough for
b), which means that lots of users must change the defaults.

This patch increases the default limits (nearly) to the maximum, which
is perfect for case b).  The defaults are used after boot and as the
initial value for each new namespace.

Admins/distros that need a protection against a) should reduce the
limits and/or enable shm_rmid_forced.

Unix has historically required setting these limits for shared memory,
and Linux inherited such behavior.  The consequence of this is added
complexity for users and administrators.  One very common example are
Database setup/installation documents and scripts, where users must
manually calculate the values for these limits.  This also requires
(some) knowledge of how the underlying memory management works, thus
causing, in many occasions, the limits to just be flat out wrong.
Disabling these limits sooner could have saved companies a lot of time,
headaches and money for support.  But it's never too late, simplify
users life now.

Further notes:
- The patch only changes default, overrides behave as before:
        # sysctl kernel.shmall=33554432
  would recreate the previous limit for SHMMAX (for the current namespace).

- Disabling sysv shm allocation is possible with:
        # sysctl kernel.shmall=0
  (not a new feature, also per-namespace)

- The limits are intentionally set to a value slightly less than ULONG_MAX,
  to avoid triggering overflows in user space apps.
  [not unreasonable, see http://marc.info/?l=linux-mm&m=139638334330127]

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
1376327ce1 ipc/shm.c: check for integer overflow during shmget.
SHMMAX is the upper limit for the size of a shared memory segment, counted
in bytes.  The actual allocation is that size, rounded up to the next full
page.

Add a check that prevents the creation of segments where the rounded up
size causes an integer overflow.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
09c6eb1f65 ipc/shm.c: check for overflows of shm_tot
shm_tot counts the total number of pages used by shm segments.

If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can
overflow.  Subsequent calls to shmctl(,SHM_INFO,) would return wrong
values for shm_tot.

The patch adds a detection for overflows.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
247a8ce822 ipc/shm.c: check for ulong overflows in shmat
The increase of SHMMAX/SHMALL is a 4 patch series.

The change itself is trivial, the only problem are interger overflows.
The overflows are not new, but if we make huge values the default, then
the code should be free from overflows.

SHMMAX:

- shmmem_file_setup places a hard limit on the segment size:
  MAX_LFS_FILESIZE.

  On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are
  possible. Rounded up to full pages the actual allocated size
  is 0. --> must be fixed, patch 3

- shmat:
  - find_vma_intersection does not handle overflows properly.
    --> must be fixed, patch 1

  - the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE
    and checks for overflows (i.e.: map 2 GB, starting from
    addr=2.5GB fails).

SHMALL:
- after creating 8192 segments size (1L<<63)-1, shm_tot overflows and
  returns 0.  --> must be fixed, patch 2.

Userspace:
- Obviously, there could be overflows in userspace. There is nothing
  we can do, only use values smaller than ULONG_MAX.
  I ended with "ULONG_MAX - 1L<<24":

  - TASK_SIZE cannot be used because it is the size of the current
    task. Could be 4G if it's a 32-bit task on a 64-bit kernel.

  - The maximum size is not standardized across archs:
    I found TASK_MAX_SIZE, TASK_SIZE_MAX and TASK_SIZE_64.

  - Just in case some arch revives a 4G/4G split, nearly
    ULONG_MAX is a valid segment size.

  - Using "0" as a magic value for infinity is even worse, because
    right now 0 means 0, i.e. fail all allocations.

This patch (of 4):

find_vma_intersection() does not work as intended if addr+size overflows.
The patch adds a manual check before the call to find_vma_intersection.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul McQuade
46c0a8ca3e ipc, kernel: clear whitespace
trailing whitespace

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul McQuade
7153e40273 ipc, kernel: use Linux headers
Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Mathias Krause
eb66ec44f8 ipc: constify ipc_ops
There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget.  Just declare it static and be
done with it.  While at it, constify it as we don't modify the structure
at runtime.

Found in the PaX patch, written by the PaX Team.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul Bolle
3e4e0f0a87 initramfs: remove "compression mode" choice
Commit 9ba4bcb645 ("initramfs: read CONFIG_RD_ variables for initramfs
compression") removed the users of the various INITRAMFS_COMPRESSION_*
Kconfig symbols.  So since v3.13 the entire "Built-in initramfs
compression mode" choice is a set of knobs connected to nothing.  The
entire choice can safely be removed.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: P J P <ppandit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Fabian Frederick
04541a2f31 fs/devpts/inode.c: convert printk to pr_foo()
Also convert spaces to tabs (checkpatch warnings) if (!dentry) KERN_NOTICE
converted to pr_err (like if (!inode) error process)

[akpm@linux-foundation.org: use KBUILD_MODNAME, per Joe]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Fabian Frederick
0227d6abb3 fs/cachefiles: replace kerror by pr_err
Also add pr_fmt in internal.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Fabian Frederick
4e1eb88305 FS/CACHEFILES: convert printk to pr_foo()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Fabian Frederick
ef74885353 fs/pstore: logging clean-up
- Define pr_fmt in plateform.c and ram_core.c for global prefix.

- Coalesce format fragments.

- Separate format/arguments on lines > 80 characters.

Note: Some pr_foo() were initially declared without prefix and therefore
this could break existing log analyzer.

[akpm@linux-foundation.org: missed a couple of prefix removals]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
f3da64d1ea kernel/profile.c: use static const char instead of static char
schedstr, sleepstr and kvmstr are only used in strcmp & strlen

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
aba871f1e9 kernel/profile.c: convert printk to pr_foo()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
9606d9aa85 fs/affs: pr_debug cleanup
- Remove AFFS: prefix (defined in pr_fmt)

- Use __func__

- Separate format/arguments on lines > 80 characters.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
0158de12b0 fs/affs: convert printk to pr_foo()
-All printk(KERN_foo converted to pr_foo()

-Default printk converted to pr_warn()

-Add pr_fmt to affs.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
0c89d67016 fs/affs/file.c: remove unnecessary function parameters
- affs_do_readpage_ofs is always called with from = 0 ie reading from
  page->index

- File parameter is never used

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Hans Verkuil
d55875f5d5 include/asm-generic/ioctl.h: fix _IOC_TYPECHECK sparse error
When running sparse over drivers/media/v4l2-core/v4l2-ioctl.c I get these
errors:

  drivers/media/v4l2-core/v4l2-ioctl.c:2043:9: error: bad integer constant expression
  drivers/media/v4l2-core/v4l2-ioctl.c:2044:9: error: bad integer constant expression
  drivers/media/v4l2-core/v4l2-ioctl.c:2045:9: error: bad integer constant expression
  drivers/media/v4l2-core/v4l2-ioctl.c:2046:9: error: bad integer constant expression

etc.

The root cause of that turns out to be in include/asm-generic/ioctl.h:

#include <uapi/asm-generic/ioctl.h>

/* provoke compile error for invalid uses of size argument */
extern unsigned int __invalid_size_argument_for_IOC;
#define _IOC_TYPECHECK(t) \
        ((sizeof(t) == sizeof(t[1]) && \
          sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
          sizeof(t) : __invalid_size_argument_for_IOC)

If it is defined as this (as is already done if __KERNEL__ is not defined):

  #define _IOC_TYPECHECK(t) (sizeof(t))

then all is well with the world.

This patch allows sparse to work correctly.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
68a9a435e4 kernel/user_namespace.c: kernel-doc/checkpatch fixes
-uid->gid
-split some function declarations
-if/then/else warning

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
24fe831c17 tools/testing/selftests/sysctl: validate sysctl_writes_strict
This adds several behavioral tests to sysctl string and number writing
to detect unexpected cases that behaved differently when the sysctl
kernel.sysctl_writes_strict != 1.

[ original ]
    root@localhost:~# make test_num
    == Testing sysctl behavior against /proc/sys/kernel/domainname ==
    Writing test file ... ok
    Checking sysctl is not set to test value ... ok
    Writing sysctl from shell ... ok
    Resetting sysctl to original value ... ok
    Writing entire sysctl in single write ... ok
    Writing middle of sysctl after synchronized seek ... FAIL
    Writing beyond end of sysctl ... FAIL
    Writing sysctl with multiple long writes ... FAIL
    Writing entire sysctl in short writes ... FAIL
    Writing middle of sysctl after unsynchronized seek ... ok
    Checking sysctl maxlen is at least 65 ... ok
    Checking sysctl keeps original string on overflow append ... FAIL
    Checking sysctl stays NULL terminated on write ... ok
    Checking sysctl stays NULL terminated on overwrite ... ok
    make: *** [test_num] Error 1
    root@localhost:~# make test_string
    == Testing sysctl behavior against /proc/sys/vm/swappiness ==
    Writing test file ... ok
    Checking sysctl is not set to test value ... ok
    Writing sysctl from shell ... ok
    Resetting sysctl to original value ... ok
    Writing entire sysctl in single write ... ok
    Writing middle of sysctl after synchronized seek ... FAIL
    Writing beyond end of sysctl ... FAIL
    Writing sysctl with multiple long writes ... ok
    make: *** [test_string] Error 1

[ with CONFIG_PROC_SYSCTL_STRICT_WRITES ]
    root@localhost:~# make run_tests
    == Testing sysctl behavior against /proc/sys/kernel/domainname ==
    Writing test file ... ok
    Checking sysctl is not set to test value ... ok
    Writing sysctl from shell ... ok
    Resetting sysctl to original value ... ok
    Writing entire sysctl in single write ... ok
    Writing middle of sysctl after synchronized seek ... ok
    Writing beyond end of sysctl ... ok
    Writing sysctl with multiple long writes ... ok
    Writing entire sysctl in short writes ... ok
    Writing middle of sysctl after unsynchronized seek ... ok
    Checking sysctl maxlen is at least 65 ... ok
    Checking sysctl keeps original string on overflow append ... ok
    Checking sysctl stays NULL terminated on write ... ok
    Checking sysctl stays NULL terminated on overwrite ... ok
    == Testing sysctl behavior against /proc/sys/vm/swappiness ==
    Writing test file ... ok
    Checking sysctl is not set to test value ... ok
    Writing sysctl from shell ... ok
    Resetting sysctl to original value ... ok
    Writing entire sysctl in single write ... ok
    Writing middle of sysctl after synchronized seek ... ok
    Writing beyond end of sysctl ... ok
    Writing sysctl with multiple long writes ... ok

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
f4aacea2f5 sysctl: allow for strict write position handling
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start.  This means the contents of
the last write to the sysctl controls the string contents instead of the
first:

  open("/proc/sys/kernel/modprobe", O_WRONLY)   = 1
  write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
  write(1, "/bin/true", 9)                = 9
  close(1)                                = 0

  $ cat /proc/sys/kernel/modprobe
  /bin/true

Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write.  Similarly, multiple short writes would
not append to the sysctl.

The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations.  For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.

This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions).  For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior.  Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
2ca9bb456a sysctl: refactor sysctl string writing logic
Consolidate buffer length checking with new-line/end-of-line checking.
Additionally, instead of reading user memory twice, just do the
assignment during the loop.

This change doesn't affect the potential races here.  It was already
possible to read a sysctl that was in the middle of a write.  In both
cases, the string will always be NULL terminated.  The pre-existing race
remains a problem to be solved.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
f88083005a sysctl: clean up char buffer arguments
When writing to a sysctl string, each write, regardless of VFS position,
began writing the string from the start.  This meant the contents of the
last write to the sysctl controlled the string contents instead of the
first.

This misbehavior was featured in an exploit against Chrome OS.  While
it's not in itself a vulnerability, it's a weirdness that isn't on the
mind of most auditors: "This filter looks correct, the first line
written would not be meaningful to sysctl" doesn't apply here, since the
size of the write and the contents of the final write are what matter
when writing to sysctls.

This adds the sysctl kernel.sysctl_writes_strict to control the write
behavior.  The default (0) reports when VFS position is non-0 on a
write, but retains legacy behavior, -1 disables the warning, and 1
enables the position-respecting behavior.

The long-term plan here is to wait for userspace to be fixed in response
to the new warning and to then switch the default kernel behavior to the
new position-respecting behavior.

This patch (of 4):

The char buffer arguments are needlessly cast in weird places.  Clean it
up so things are easier to read.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Alexander Gordeev
1c92ab1e74 rapidio/tsi721: use pci_enable_msix_exact() instead of pci_enable_msix()
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers using these two
interfaces need to be updated to use the new pci_enable_msi_range() or
pci_enable_msi_exact() and pci_enable_msix_range() or
pci_enable_msix_exact() interfaces.

The patch has no runtime effect.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Lai Jiangshan
dcbff5d1ef idr: reorder the fields
idr_layer->layer is always accessed in read path, move it in the front.

idr_layer->bitmap is moved on the bottom.  And rcu_head shares with
bitmap due to they do not be accessed at the same time.

idr->id_free/id_free_cnt/lock are free list fields, and moved to the
bottom.  They will be removed in near future.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Lai Jiangshan
15f3ec3f23 idr: reduce the unneeded check in free_layer()
If "idr->hint == p" is true, it also implies "idr->hint" is true(not NULL).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Lai Jiangshan
aefb768297 idr: don't need to shink the free list when idr_remove()
After idr subsystem is changed to RCU-awared, the free layer will not go
to the free list.  The free list will not be filled up when
idr_remove().  So we don't need to shink it too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Lai Jiangshan
b93804b2fc idr: fix idr_replace()'s returned error code
When the smaller id is not found, idr_replace() returns -ENOENT.  But
when the id is bigger enough, idr_replace() returns -EINVAL, actually
there is no difference between these two kinds of ids.

These are all unallocated id, the return values of the idr_replace() for
these ids should be the same: -ENOENT.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Lai Jiangshan
aef0f62e87 idr: fix NULL pointer dereference when ida_remove(unallocated_id)
If the ida has at least one existing id, and when an unallocated ID
which meets a certain condition is passed to the ida_remove(), the
system will crash because it hits NULL pointer dereference.

The condition is that the unallocated ID shares the same lowest idr
layer with the existing ID, but the idr slot would be different if the
unallocated ID were to be allocated.

In this case the matching idr slot for the unallocated_id is NULL,
causing @bitmap to be NULL which the function dereferences without
checking crashing the kernel.

See the test code:

  static void test3(void)
  {
        int id;
        DEFINE_IDA(test_ida);

        printk(KERN_INFO "Start test3\n");
        if (ida_pre_get(&test_ida, GFP_KERNEL) < 0) return;
        if (ida_get_new(&test_ida,  &id) < 0) return;
        ida_remove(&test_ida, 4000); /* bug: null deference here */
        printk(KERN_INFO "End of test3\n");
  }

It happens only when the caller tries to free an unallocated ID which is
the caller's fault.  It is not a bug.  But it is better to add the
proper check and complain rather than crashing the kernel.

[tj@kernel.org: updated patch description]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Lai Jiangshan
8f9f665a70 idr: fix unexpected ID-removal when idr_remove(unallocated_id)
If unallocated_id = (ANY * idr_max(idp->layers) + existing_id) is passed
to idr_remove().  The existing_id will be removed unexpectedly.

The following test shows this unexpected id-removal:

  static void test4(void)
  {
        int id;
        DEFINE_IDR(test_idr);

        printk(KERN_INFO "Start test4\n");
        id = idr_alloc(&test_idr, (void *)1, 42, 43, GFP_KERNEL);
        BUG_ON(id != 42);
        idr_remove(&test_idr, 42 + IDR_SIZE);
        TEST_BUG_ON(idr_find(&test_idr, 42) != (void *)1);
        idr_destroy(&test_idr);
        printk(KERN_INFO "End of test4\n");
  }

ida_remove() shares the similar problem.

It happens only when the caller tries to free an unallocated ID which is
the caller's fault.  It is not a bug.  But it is better to add the
proper check and complain rather than removing an existing_id silently.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Lai Jiangshan
3afb69cb55 idr: fix overflow bug during maximum ID calculation at maximum height
idr_replace() open-codes the logic to calculate the maximum valid ID
given the height of the idr tree; unfortunately, the open-coded logic
doesn't account for the fact that the top layer may have unused slots
and over-shifts the limit to zero when the tree is at its maximum
height.

The following test code shows it fails to replace the value for
id=((1<<27)+42):

  static void test5(void)
  {
        int id;
        DEFINE_IDR(test_idr);
  #define TEST5_START ((1<<27)+42) /* use the highest layer */

        printk(KERN_INFO "Start test5\n");
        id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
        BUG_ON(id != TEST5_START);
        TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
        idr_destroy(&test_idr);
        printk(KERN_INFO "End of test5\n");
  }

Fix the bug by using idr_max() which correctly takes into account the
maximum allowed shift.

sub_alloc() shares the same problem and may incorrectly fail with
-EAGAIN; however, this bug doesn't affect correct operation because
idr_get_empty_slot(), which already uses idr_max(), retries with the
increased @id in such cases.

[tj@kernel.org: Updated patch description.]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Fabian Frederick
e1bebcf41e kernel/kexec.c: convert printk to pr_foo()
+ some pr_warning -> pr_warn and checkpatch warning fixes

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Masami Hiramatsu
f06e5153f4 kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg.  This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.

Note that this actually increases risks of the failure of kdump.  This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Srivatsa S. Bhat
a219ccf463 smp: print more useful debug info upon receiving IPI on an offline CPU
There is a longstanding problem related to CPU hotplug which causes IPIs
to be delivered to offline CPUs, and the smp-call-function IPI handler
code prints out a warning whenever this is detected.  Every once in a
while this (usually harmless) warning gets reported on LKML, but so far
it has not been completely fixed.  Usually the solution involves finding
out the IPI sender and fixing it by adding appropriate synchronization
with CPU hotplug.

However, while going through one such internal bug reports, I found that
there is a significant bug in the receiver side itself (more
specifically, in stop-machine) that can lead to this problem even when
the sender code is perfectly fine.  This patchset fixes that
synchronization problem in the CPU hotplug stop-machine code.

Patch 1 adds some additional debug code to the smp-call-function
framework, to help debug such issues easily.

Patch 2 modifies the stop-machine code to ensure that any IPIs that were
sent while the target CPU was online, would be noticed and handled by
that CPU without fail before it goes offline.  Thus, this avoids
scenarios where IPIs are received on offline CPUs (as long as the sender
uses proper hotplug synchronization).

In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq()
function.  But I was not able to find anything wrong with the block
layer code.  That's when I started looking at the stop-machine code and
realized that there is a race-window which makes the IPI _receiver_ the
culprit, not the sender.  Patch 2 fixes that race and hence this should
put an end to most of the hard-to-debug IPI-to-offline-CPU issues.

This patch (of 2):

Today the smp-call-function code just prints a warning if we get an IPI
on an offline CPU.  This info is sufficient to let us know that
something went wrong, but often it is very hard to debug exactly who
sent the IPI and why, from this info alone.

In most cases, we get the warning about the IPI to an offline CPU,
immediately after the CPU going offline comes out of the stop-machine
phase and reenables interrupts.  Since all online CPUs participate in
stop-machine, the information regarding the sender of the IPI is already
lost by the time we exit the stop-machine loop.  So even if we dump the
stack on each CPU at this point, we won't find anything useful since all
of them will show the stack-trace of the stopper thread.  So we need a
better way to figure out who sent the IPI and why.

To achieve this, when we detect an IPI targeted to an offline CPU, loop
through the call-single-data linked list and print out the payload
(i.e., the name of the function which was supposed to be executed by the
target CPU).  This would give us an insight as to who might have sent
the IPI and help us debug this further.

[akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Fabian Frederick
a05e16ada4 fs/proc/vmcore.c: remove NULL assignment to static
Static values are automatically initialized to NULL.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Fabian Frederick
17c2b4ee40 fs/proc/task_mmu.c: replace seq_printf by seq_puts
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00