Commit Graph

17794 Commits

Author SHA1 Message Date
Petr Mladek
39b25109b4 printk: use also the last bytes in the ring buffer
It seems that we have newer used the last byte in the ring buffer.  In
fact, we have newer used the last 4 bytes because of padding.

First problem is in the check for free space.  The exact number of free
bytes is enough to store the length of data.

Second problem is in the check where the ring buffer is rotated.  The
left side counts the first unused index.  It is unused, so it might be
the same as the size of the buffer.

Note that the first problem has to be fixed together with the second
one.  Otherwise, the buffer is rotated even when there is enough space
on the end of the buffer.  Then the beginning of the buffer is rewritten
and valid entries get corrupted.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Petr Mladek
e8c42d36ab printk: add comment about tricky check for text buffer size
There is no check for potential "text_len" overflow.  It is not needed
because only valid level is detected.  It took me some time to
understand why.  It would deserve a comment ;-)

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Petr Mladek
c64730b26f printk: remove obsolete check for log level "c"
The kernel log level "c" was removed in commit 61e99ab8e3 ("printk:
remove the now unnecessary "C" annotation for KERN_CONT").  It is no
longer detected in printk_get_level().  Hence we do not need to check it
in vprintk_emit.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Daeseok Youn
28ab49ff7f kernel/resource.c: make reallocate_resource() static
sparse says:

kernel/resource.c:518:5: warning:
 symbol 'reallocate_resource' was not declared. Should it be static?

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Paul Gortmaker
c96d6660dc kernel: audit/fix non-modular users of module_init in core code
Code that is obj-y (always built-in) or dependent on a bool Kconfig
(built-in or absent) can never be modular.  So using module_init as an
alias for __initcall can be somewhat misleading.

Fix these up now, so that we can relocate module_init from init.h into
module.h in the future.  If we don't do this, we'd have to add module.h
to obviously non-modular code, and that would be a worse thing.

The audit targets the following module_init users for change:
 kernel/user.c                  obj-y
 kernel/kexec.c                 bool KEXEC (one instance per arch)
 kernel/profile.c               bool PROFILING
 kernel/hung_task.c             bool DETECT_HUNG_TASK
 kernel/sched/stats.c           bool SCHEDSTATS
 kernel/user_namespace.c        bool USER_NS

Note that direct use of __initcall is discouraged, vs.  one of the
priority categorized subgroups.  As __initcall gets mapped onto
device_initcall, our use of subsys_initcall (which makes sense for these
files) will thus change this registration from level 6-device to level
4-subsys (i.e.  slightly earlier).  However no observable impact of that
difference has been observed during testing.

Also, two instances of missing ";" at EOL are fixed in kexec.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Josh Triplett
69369a7003 fs, kernel: permit disabling the uselib syscall
uselib hasn't been used since libc5; glibc does not use it.  Support
turning it off.

When disabled, also omit the load_elf_library implementation from
binfmt_elf.c, which only uselib invokes.

bloat-o-meter:
add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
function                                     old     new   delta
padzero                                       39      36      -3
uselib_flags                                  20       -     -20
sys_uselib                                   168       -    -168
SyS_uselib                                   168       -    -168
load_elf_library                             426       -    -426

The new CONFIG_USELIB defaults to `y'.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Wang YanQing
8f6c5ffc89 kernel/groups.c: remove return value of set_groups
After commit 6307f8fee2 ("security: remove dead hook task_setgroups"),
set_groups will always return zero, so we could just remove return value
of set_groups.

This patch reduces code size, and simplfies code to use set_groups,
because we don't need to check its return value any more.

[akpm@linux-foundation.org: remove obsolete claims from set_groups() comment]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Fabian Frederick
6af9f7bf3c sys_sysfs: Add CONFIG_SYSFS_SYSCALL
sys_sysfs is an obsolete system call no longer supported by libc.

 - This patch adds a default CONFIG_SYSFS_SYSCALL=y

 - Option can be turned off in expert mode.

 - cond_syscall added to kernel/sys_ni.c

[akpm@linux-foundation.org: tweak Kconfig help text]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Dave Hansen
5509a5d27b drop_caches: add some documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Mel Gorman
d26914d117 mm: optimize put_mems_allowed() usage
Since put_mems_allowed() is strictly optional, its a seqcount retry, we
don't need to evaluate the function if the allocation was in fact
successful, saving a smp_rmb some loads and comparisons on some relative
fast-paths.

Since the naming, get/put_mems_allowed() does suggest a mandatory
pairing, rename the interface, as suggested by Mel, to resemble the
seqcount interface.

This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
where it is important to note that the return value of the latter call
is inverted from its previous incarnation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:58 -07:00
Ben Zhang
62572e29bc kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one
I ran into a scenario where while one cpu was stuck and should have
panic'd because of the NMI watchdog, it didn't.  The reason was another
cpu was spewing stack dumps on to the console.  Upon investigation, I
noticed that when writing to the console and also when dumping the
stack, the watchdog is touched.

This causes all the cpus to reset their NMI watchdog flags and the
'stuck' cpu just spins forever.

This change causes the semantics of touch_nmi_watchdog to be changed
slightly.  Previously, I accidentally changed the semantics and we
noticed there was a codepath in which touch_nmi_watchdog could be
touched from a preemtible area.  That caused a BUG() to happen when
CONFIG_DEBUG_PREEMPT was enabled.  I believe it was the acpi code.

My attempt here re-introduces the change to have the
touch_nmi_watchdog() code only touch the local cpu instead of all of the
cpus.  But instead of using __get_cpu_var(), I use the
__raw_get_cpu_var() version.

This avoids the preemption problem.  However my reasoning wasn't because
I was trying to be lazy.  Instead I rationalized it as, well if
preemption is enabled then interrupts should be enabled to and the NMI
watchdog will have no reason to trigger.  So it won't matter if the
wrong cpu is touched because the percpu interrupt counters the NMI
watchdog uses should still be incrementing.

Don said:

: I'm ok with this patch, though it does alter the behaviour of how
: touch_nmi_watchdog works.  For the most part I don't think most callers
: need to touch all of the watchdogs (on each cpu).  Perhaps a corner case
: will pop up (the scheduler??  to mimic touch_all_softlockup_watchdogs() ).
:
: But this does address an issue where if a system is locked up and one cpu
: is spewing out useful debug messages (or error messages), the hard lockup
: will fail to go off.  We have seen this on RHEL also.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Ben Zhang <benzh@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:58 -07:00
Nishanth Aravamudan
81c98869fa kthread: ensure locality of task_struct allocations
In the presence of memoryless nodes, numa_node_id() will return the
current CPU's NUMA node, but that may not be where we expect to allocate
from memory from.  Instead, we should rely on the fallback code in the
memory allocator itself, by using NUMA_NO_NODE.  Also, when calling
kthread_create_on_node(), use the nearest node with memory to the cpu in
question, rather than the node it is running on.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:49 -07:00
Linus Torvalds
32d01dc7be Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot updates for cgroup:

   - The biggest one is cgroup's conversion to kernfs.  cgroup took
     after the long abandoned vfs-entangled sysfs implementation and
     made it even more convoluted over time.  cgroup's internal objects
     were fused with vfs objects which also brought in vfs locking and
     object lifetime rules.  Naturally, there are places where vfs rules
     don't fit and nasty hacks, such as credential switching or lock
     dance interleaving inode mutex and cgroup_mutex with object serial
     number comparison thrown in to decide whether the operation is
     actually necessary, needed to be employed.

     After conversion to kernfs, internal object lifetime and locking
     rules are mostly isolated from vfs interactions allowing shedding
     of several nasty hacks and overall simplification.  This will also
     allow implmentation of operations which may affect multiple cgroups
     which weren't possible before as it would have required nesting
     i_mutexes.

   - Various simplifications including dropping of module support,
     easier cgroup name/path handling, simplified cgroup file type
     handling and task_cg_lists optimization.

   - Prepatory changes for the planned unified hierarchy, which is still
     a patchset away from being actually operational.  The dummy
     hierarchy is updated to serve as the default unified hierarchy.
     Controllers which aren't claimed by other hierarchies are
     associated with it, which BTW was what the dummy hierarchy was for
     anyway.

   - Various fixes from Li and others.  This pull request includes some
     patches to add missing slab.h to various subsystems.  This was
     triggered xattr.h include removal from cgroup.h.  cgroup.h
     indirectly got included a lot of files which brought in xattr.h
     which brought in slab.h.

  There are several merge commits - one to pull in kernfs updates
  necessary for converting cgroup (already in upstream through
  driver-core), others for interfering changes in the fixes branch"

* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
  cgroup: remove useless argument from cgroup_exit()
  cgroup: fix spurious lockdep warning in cgroup_exit()
  cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
  cgroup: break kernfs active_ref protection in cgroup directory operations
  cgroup: fix cgroup_taskset walking order
  cgroup: implement CFTYPE_ONLY_ON_DFL
  cgroup: make cgrp_dfl_root mountable
  cgroup: drop const from @buffer of cftype->write_string()
  cgroup: rename cgroup_dummy_root and related names
  cgroup: move ->subsys_mask from cgroupfs_root to cgroup
  cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
  cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
  cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
  cgroup: reorganize cgroup bootstrapping
  cgroup: relocate setting of CGRP_DEAD
  cpuset: use rcu_read_lock() to protect task_cs()
  cgroup_freezer: document freezer_fork() subtleties
  cgroup: update cgroup_transfer_tasks() to either succeed or fail
  cgroup: drop task_lock() protection around task->cgroups
  cgroup: update how a newly forked task gets associated with css_set
  ...
2014-04-03 13:05:42 -07:00
Linus Torvalds
68114e5eb8 Most of the changes were largely clean ups, and some documentation.
But there were a few features that were added.
 
 Uprobes now work with event triggers and multi buffers.
 Uprobes have support under ftrace and perf.
 
 The big feature is that the function tracer can now be used within the
 multi buffer instances. That is, you can now trace some functions
 in one buffer, others in another buffer, all functions in a third buffer
 and so on. They are basically agnostic from each other. This only
 works for the function tracer and not for the function graph trace,
 although you can have the function graph tracer running in the top level
 buffer (or any tracer for that matter) and have different function tracing
 going on in the sub buffers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTOthtAAoJEKQekfcNnQGu5c8H/Ana/U+0tmksp1dbHkRHsKSH
 +Fsv4Jeu8gf1NaFKHEhkUTcFtnzE6qAPV2VCrcJwXbhAhhwZm+LjrnWdoy3215S3
 cQW4LftLEonh2cM36Cos74TulMEYN6XmL6dQZV+CILKQkDrWU4qJjQ64okXEkqrd
 9iG3p/mSXyvJcmnyg61ALnMOhZDLsXY3djBhWBPhiTPGS6BRb9zh4Pmw6Zv0n2rJ
 U93Gt/3AQrv1ybu73dUxqP0abp60oXOiWoF/R2jcbKqIM+K9RPJX79unCV3jq3u9
 f+6jMlB9PgAMqQj6ihJdwxKDDuzwyrVdEPnsgvl4jarCBCtVVwhKedBaKN/KS8k=
 =HdXY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Most of the changes were largely clean ups, and some documentation.
  But there were a few features that were added:

  Uprobes now work with event triggers and multi buffers and have
  support under ftrace and perf.

  The big feature is that the function tracer can now be used within the
  multi buffer instances.  That is, you can now trace some functions in
  one buffer, others in another buffer, all functions in a third buffer
  and so on.  They are basically agnostic from each other.  This only
  works for the function tracer and not for the function graph trace,
  although you can have the function graph tracer running in the top
  level buffer (or any tracer for that matter) and have different
  function tracing going on in the sub buffers"

* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
  tracing: Add BUG_ON when stack end location is over written
  tracepoint: Remove unused API functions
  Revert "tracing: Move event storage for array from macro to standalone function"
  ftrace: Constify ftrace_text_reserved
  tracepoints: API doc update to tracepoint_probe_register() return value
  tracepoints: API doc update to data argument
  ftrace: Fix compilation warning about control_ops_free
  ftrace/x86: BUG when ftrace recovery fails
  ftrace: Warn on error when modifying ftrace function
  ftrace: Remove freelist from struct dyn_ftrace
  ftrace: Do not pass data to ftrace_dyn_arch_init
  ftrace: Pass retval through return in ftrace_dyn_arch_init()
  ftrace: Inline the code from ftrace_dyn_table_alloc()
  ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
  tracing: Evaluate len expression only once in __dynamic_array macro
  tracing: Correctly expand len expressions from __dynamic_array macro
  tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
  tracing: Fix event header migrate.h to include tracepoint.h
  tracing: Fix event header writeback.h to include tracepoint.h
  tracing: Warn if a tracepoint is not set via debugfs
  ...
2014-04-03 10:26:31 -07:00
Linus Torvalds
bea803183e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Apart from reordering the SELinux mmap code to ensure DAC is called
  before MAC, these are minor maintenance updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: correctly label /proc inodes in use before the policy is loaded
  selinux: put the mmap() DAC controls before the MAC controls
  selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
  evm: enable key retention service automatically
  ima: skip memory allocation for empty files
  evm: EVM does not use MD5
  ima: return d_name.name if d_path fails
  integrity: fix checkpatch errors
  ima: fix erroneous removal of security.ima xattr
  security: integrity: Use a more current logging style
  MAINTAINERS: email updates and other misc. changes
  ima: reduce memory usage when a template containing the n field is used
  ima: restore the original behavior for sending data with ima template
  Integrity: Pass commname via get_task_comm()
  fs: move i_readcount
  ima: use static const char array definitions
  security: have cap_dentry_init_security return error
  ima: new helper: file_inode(file)
  kernel: Mark function as static in kernel/seccomp.c
  capability: Use current logging styles
  ...
2014-04-03 09:26:18 -07:00
Linus Torvalds
cd6362befe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here is my initial pull request for the networking subsystem during
  this merge window:

   1) Support for ESN in AH (RFC 4302) from Fan Du.

   2) Add full kernel doc for ethtool command structures, from Ben
      Hutchings.

   3) Add BCM7xxx PHY driver, from Florian Fainelli.

   4) Export computed TCP rate information in netlink socket dumps, from
      Eric Dumazet.

   5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
      Dichtel.

   6) Convert many drivers to pci_enable_msix_range(), from Alexander
      Gordeev.

   7) Record SKB timestamps more efficiently, from Eric Dumazet.

   8) Switch to microsecond resolution for TCP round trip times, also
      from Eric Dumazet.

   9) Clean up and fix 6lowpan fragmentation handling by making use of
      the existing inet_frag api for it's implementation.

  10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

  11) Auto size SKB lengths when composing netlink messages based upon
      past message sizes used, from Eric Dumazet.

  12) qdisc dumps can take a long time, add a cond_resched(), From Eric
      Dumazet.

  13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
      Get rid of never-used-in-tree netpoll RX handling.  From Eric W
      Biederman.

  14) Support inter-address-family and namespace changing in VTI tunnel
      driver(s).  From Steffen Klassert.

  15) Add Altera TSE driver, from Vince Bridgers.

  16) Optimizing csum_replace2() so that it doesn't adjust the checksum
      by checksumming the entire header, from Eric Dumazet.

  17) Expand BPF internal implementation for faster interpreting, more
      direct translations into JIT'd code, and much cleaner uses of BPF
      filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
      Starovoitov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
  netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
  net: Add a test to see if a skb is freeable in irq context
  qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
  net: ptp: move PTP classifier in its own file
  net: sxgbe: make "core_ops" static
  net: sxgbe: fix logical vs bitwise operation
  net: sxgbe: sxgbe_mdio_register() frees the bus
  Call efx_set_channels() before efx->type->dimension_resources()
  xen-netback: disable rogue vif in kthread context
  net/mlx4: Set proper build dependancy with vxlan
  be2net: fix build dependency on VxLAN
  mac802154: make csma/cca parameters per-wpan
  mac802154: allow only one WPAN to be up at any given time
  net: filter: minor: fix kdoc in __sk_run_filter
  netlink: don't compare the nul-termination in nla_strcmp
  can: c_can: Avoid led toggling for every packet.
  can: c_can: Simplify TX interrupt cleanup
  can: c_can: Store dlc private
  can: c_can: Reduce register access
  can: c_can: Make the code readable
  ...
2014-04-02 20:53:45 -07:00
Linus Torvalds
159d8133d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science -- mostly documentation and comment updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  sparse: fix comment
  doc: fix double words
  isdn: capi: fix "CAPI_VERSION" comment
  doc: DocBook: Fix typos in xml and template file
  Bluetooth: add module name for btwilink
  driver core: unexport static function create_syslog_header
  mmc: core: typo fix in printk specifier
  ARM: spear: clean up editing mistake
  net-sysfs: fix comment typo 'CONFIG_SYFS'
  doc: Insert MODULE_ in module-signing macros
  Documentation: update URL to hfsplus Technote 1150
  gpio: update path to documentation
  ixgbe: Fix format string in ixgbe_fcoe.
  Kconfig: Remove useless "default N" lines
  user_namespace.c: Remove duplicated word in comment
  CREDITS: fix formatting
  treewide: Fix typo in Documentation/DocBook
  mm: Fix warning on make htmldocs caused by slab.c
  ata: ata-samsung_cf: cleanup in header file
  idr: remove unused prototype of idr_free()
2014-04-02 16:23:38 -07:00
Linus Torvalds
05bf58ca4b Merge branch 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched/idle changes from Ingo Molnar:
 "More idle code reorganization, to prepare for more integration.

  (Sent separately because it depended on pending timer work, which is
  now upstream)"

* 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/idle: Add more comments to the code
  sched/idle: Move idle conditions in cpuidle_idle main function
  sched/idle: Reorganize the idle loop
  cpuidle/idle: Move the cpuidle_idle_call function to idle.c
  idle/cpuidle: Split cpuidle_idle_call main function into smaller functions
2014-04-02 16:22:27 -07:00
Oleg Nesterov
d23082257d pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
go away, but task_active_pid_ns(task) is NULL if release_task(task)
was already called. Alternatively we could change get_pid_ns(ns) to
check ns != NULL, but it seems that other callers are fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman ebiederm@xmission.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-02 16:20:21 -07:00
Eric Paris
56c4911aed audit: do not cast audit_rule_data pointers pointlesly
For some sort of legacy support audit_rule is a subset of (and first
entry in) audit_rule_data.  We don't actually need or use audit_rule.
We just do a cast from one to the other for no gain what so ever.  Stop
the crazy casting.

Signed-off-by: Eric Paris <eparis@redhat.com>
2014-04-02 15:55:14 -04:00
Linus Torvalds
7125764c5d Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull compat time conversion changes from Peter Anvin:
 "Despite the branch name this is really neither an x86 nor an
  x32-specific patchset, although it the implementation of the
  discussions that followed the x32 security hole a few months ago.

  This removes get/put_compat_timespec/val() and replaces them with
  compat_get/put_timespec/val() which are savvy as to the current status
  of COMPAT_USE_64BIT_TIME.

  It removes several unused and/or incorrect/misleading functions (like
  compat_put_timeval_convert which doesn't in fact do any conversion)
  and also replaces several open-coded implementations what is now
  called compat_convert_timespec() with that function"

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compat: Fix sparse address space warnings
  compat: Get rid of (get|put)_compat_time(val|spec)
2014-04-02 12:51:41 -07:00
Al Viro
fbb32750a6 pipe: kill ->map() and ->unmap()
all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Linus Torvalds
7a48837732 Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the pull request for the core block IO bits for the 3.15
  kernel.  It's a smaller round this time, it contains:

   - Various little blk-mq fixes and additions from Christoph and
     myself.

   - Cleanup of the IPI usage from the block layer, and associated
     helper code.  From Frederic Weisbecker and Jan Kara.

   - Duplicate code cleanup in bio-integrity from Gu Zheng.  This will
     give you a merge conflict, but that should be easy to resolve.

   - blk-mq notify spinlock fix for RT from Mike Galbraith.

   - A blktrace partial accounting bug fix from Roman Pen.

   - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"

* 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
  blk-mq: add REQ_SYNC early
  rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
  blk-mq: support partial I/O completions
  blk-mq: merge blk_mq_insert_request and blk_mq_run_request
  blk-mq: remove blk_mq_alloc_rq
  blk-mq: don't dump CPU -> hw queue map on driver load
  blk-mq: fix wrong usage of hctx->state vs hctx->flags
  blk-mq: allow blk_mq_init_commands() to return failure
  block: remove old blk_iopoll_enabled variable
  blktrace: fix accounting of partially completed requests
  smp: Rename __smp_call_function_single() to smp_call_function_single_async()
  smp: Remove wait argument from __smp_call_function_single()
  watchdog: Simplify a little the IPI call
  smp: Move __smp_call_function_single() below its safe version
  smp: Consolidate the various smp_call_function_single() declensions
  smp: Teach __smp_call_function_single() to check for offline cpus
  smp: Remove unused list_head from csd
  smp: Iterate functions through llist_for_each_entry_safe()
  block: Stop abusing rq->csd.list in blk-softirq
  block: Remove useless IPI struct initialization
  ...
2014-04-01 19:19:15 -07:00
Linus Torvalds
4b1779c2cf PCI changes for the v3.15 merge window:
Enumeration
     - Increment max correctly in pci_scan_bridge() (Andreas Noever)
     - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
     - Assign CardBus bus number only during the second pass (Andreas Noever)
     - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
     - Make sure bus number resources stay within their parents bounds (Andreas Noever)
     - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
     - Check for child busses which use more bus numbers than allocated (Andreas Noever)
     - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
     - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
     - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
     - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
     - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
     - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)
 
   NUMA
     - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
     - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
     - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
     - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
     - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
     - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
     - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
     - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)
 
   Resource management
     - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
     - Add resource_contains() (Bjorn Helgaas)
     - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
     - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
     - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
     - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
     - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
     - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
     - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
     - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
     - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
     - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
     - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
     - Set type in __request_region() (Bjorn Helgaas)
     - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
     - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
     - Log IDE resource quirk in dmesg (Bjorn Helgaas)
     - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)
 
   PCI device hotplug
     - Make check_link_active() non-static (Rajat Jain)
     - Use link change notifications for hot-plug and removal (Rajat Jain)
     - Enable link state change notifications (Rajat Jain)
     - Don't disable the link permanently during removal (Rajat Jain)
     - Don't check adapter or latch status while disabling (Rajat Jain)
     - Disable link notification across slot reset (Rajat Jain)
     - Ensure very fast hotplug events are also processed (Rajat Jain)
     - Add hotplug_lock to serialize hotplug events (Rajat Jain)
     - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
     - Don't turn slot off when hot-added device already exists (Yijing Wang)
 
   MSI
     - Keep pci_enable_msi() documentation (Alexander Gordeev)
     - ahci: Fix broken single MSI fallback (Alexander Gordeev)
     - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
     - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
     - Fix leak of msi_attrs (Greg Kroah-Hartman)
     - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)
 
   Virtualization
     - Device-specific ACS support (Alex Williamson)
 
   Freescale i.MX6
     - Wait for retraining (Marek Vasut)
 
   Marvell MVEBU
     - Use Device ID and revision from underlying endpoint (Andrew Lunn)
     - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
     - Call request_resource() on the apertures (Jason Gunthorpe)
     - Fix potential issue in range parsing (Jean-Jacques Hiblot)
 
   Renesas R-Car
     - Check platform_get_irq() return code (Ben Dooks)
     - Add error interrupt handling (Ben Dooks)
     - Fix bridge logic configuration accesses (Ben Dooks)
     - Register each instance independently (Magnus Damm)
     - Break out window size handling (Magnus Damm)
     - Make the Kconfig dependencies more generic (Magnus Damm)
 
   Synopsys DesignWare
     - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)
 
   Miscellaneous
     - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
     - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
     - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
     - Clean up par-arch object file list (Liviu Dudau)
     - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
     - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
     - Fix pci_bus_b() build failure (Paul Gortmaker)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOdAZAAoJEFmIoMA60/r8VYUQALRrReyMBk3pjRt/fKIX4Kwi
 ydSo/YJeeKTN8K93fLw8bb8bdPItJScJFTfEa4Q2SpZezR/ecGXLowisy0BBaPHK
 qtOyB8EqjkLS17GfyecIe9Nd2SIAI2De/0bchK3kDtIX1YlZB/k/tD3eCPMHDnnl
 m8c5kAHKPQYd8g01I+S8nrtGHk/A33grfYpJXPZbcqyhE0lWU3SI8KDAGbcKzNHE
 23Do0yNyd4nHIdixWlhETcNvzHn35Q/O38JJwW9Mf1aI9gusYuml6GFefCgu/iov
 lxqp3CEW7iPZgQEgNbrQ0HzWn/durL2Trd6S/Yh6f2xbm1LGYKWh3LZUFLd3AQDd
 INEpUgKsyb//nF3dtiyGnZlp0QykoqFyLo2AEDrb+ILTd4up5DeRY/m1UpjAXR5p
 QicBmrDksHrSivPmMZwLx1DFQYKjQbdx5lOqy9hQM/Jmsr+N3/l7QBrbQWXks3JZ
 NNAyn4RZHQB7UDQS/MmVPArs+JK5qaEDQD57QuOTlqgP19VY9C9E/l/aEqefjdFo
 XOAm7CwGpB/iBAkIbE6ROEDiJArigRVHEfxLYeE/jtGOdRDCD1deWk+g3S8DWD7m
 ZxWSgIVB00PMAmomczdg59YVFBhocgwPUa8/cw6yqzx2QKP4mWXIFZ/Sjau5I3tn
 WWoxXlUirZfTJc29XnVy
 =3mNS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
   - Increment max correctly in pci_scan_bridge() (Andreas Noever)
   - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
   - Assign CardBus bus number only during the second pass (Andreas Noever)
   - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
   - Make sure bus number resources stay within their parents bounds (Andreas Noever)
   - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
   - Check for child busses which use more bus numbers than allocated (Andreas Noever)
   - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
   - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
   - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
   - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
   - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
   - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)

  NUMA
   - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
   - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
   - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
   - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
   - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
   - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
   - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
   - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)

  Resource management
   - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
   - Add resource_contains() (Bjorn Helgaas)
   - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
   - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
   - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
   - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
   - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
   - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
   - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
   - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
   - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
   - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
   - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
   - Set type in __request_region() (Bjorn Helgaas)
   - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
   - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
   - Log IDE resource quirk in dmesg (Bjorn Helgaas)
   - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)

  PCI device hotplug
   - Make check_link_active() non-static (Rajat Jain)
   - Use link change notifications for hot-plug and removal (Rajat Jain)
   - Enable link state change notifications (Rajat Jain)
   - Don't disable the link permanently during removal (Rajat Jain)
   - Don't check adapter or latch status while disabling (Rajat Jain)
   - Disable link notification across slot reset (Rajat Jain)
   - Ensure very fast hotplug events are also processed (Rajat Jain)
   - Add hotplug_lock to serialize hotplug events (Rajat Jain)
   - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
   - Don't turn slot off when hot-added device already exists (Yijing Wang)

  MSI
   - Keep pci_enable_msi() documentation (Alexander Gordeev)
   - ahci: Fix broken single MSI fallback (Alexander Gordeev)
   - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
   - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
   - Fix leak of msi_attrs (Greg Kroah-Hartman)
   - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)

  Virtualization
   - Device-specific ACS support (Alex Williamson)

  Freescale i.MX6
   - Wait for retraining (Marek Vasut)

  Marvell MVEBU
   - Use Device ID and revision from underlying endpoint (Andrew Lunn)
   - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
   - Call request_resource() on the apertures (Jason Gunthorpe)
   - Fix potential issue in range parsing (Jean-Jacques Hiblot)

  Renesas R-Car
   - Check platform_get_irq() return code (Ben Dooks)
   - Add error interrupt handling (Ben Dooks)
   - Fix bridge logic configuration accesses (Ben Dooks)
   - Register each instance independently (Magnus Damm)
   - Break out window size handling (Magnus Damm)
   - Make the Kconfig dependencies more generic (Magnus Damm)

  Synopsys DesignWare
   - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)

  Miscellaneous
   - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
   - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
   - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
   - Clean up par-arch object file list (Liviu Dudau)
   - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
   - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
   - Fix pci_bus_b() build failure (Paul Gortmaker)"

* tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
  Revert "[PATCH] Insert GART region into resource map"
  PCI: Log IDE resource quirk in dmesg
  PCI: Change pci_bus_alloc_resource() type_mask to unsigned long
  PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
  resources: Set type in __request_region()
  PCI: Don't check resource_size() in pci_bus_alloc_resource()
  s390/PCI: Use generic pci_enable_resources()
  tile PCI RC: Use default pcibios_enable_device()
  sparc/PCI: Use default pcibios_enable_device() (Leon only)
  sh/PCI: Use default pcibios_enable_device()
  microblaze/PCI: Use default pcibios_enable_device()
  alpha/PCI: Use default pcibios_enable_device()
  PCI: Add "weak" generic pcibios_enable_device() implementation
  PCI: Don't enable decoding if BAR hasn't been assigned an address
  PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled
  PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
  PCI: Don't try to claim IORESOURCE_UNSET resources
  PCI: Check IORESOURCE_UNSET before updating BAR
  PCI: Don't clear IORESOURCE_UNSET when updating BAR
  PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
  ...

Conflicts:
	arch/x86/include/asm/topology.h
	drivers/ata/ahci.c
2014-04-01 15:14:04 -07:00
Linus Torvalds
4dedde7c7a ACPI and power management updates for 3.15-rc1
- Device PM QoS support for latency tolerance constraints on systems with
    hardware interfaces allowing such constraints to be specified.  That is
    necessary to prevent hardware-driven power management from becoming
    overly aggressive on some systems and to prevent power management
    features leading to excessive latencies from being used in some cases.
 
  - Consolidation of the handling of ACPI hotplug notifications for device
    objects.  This causes all device hotplug notifications to go through
    the root notify handler (that was executed for all of them anyway
    before) that propagates them to individual subsystems, if necessary,
    by executing callbacks provided by those subsystems (those callbacks
    are associated with struct acpi_device objects during device
    enumeration).  As a result, the code in question becomes both smaller
    in size and more straightforward and all of those changes should not
    affect users.
 
  - ACPICA update, including fixes related to the handling of _PRT in cases
    when it is broken and the addition of "Windows 2013" to the list of
    supported "features" for _OSI (which is necessary to support systems
    that work incorrectly or don't even boot without it).  Changes from
    Bob Moore and Lv Zheng.
 
  - Consolidation of ACPI _OST handling from Jiang Liu.
 
  - ACPI battery and AC fixes allowing unusual system configurations to
    be handled by that code from Alexander Mezin.
 
  - New device IDs for the ACPI LPSS driver from Chiau Ee Chew.
 
  - ACPI fan and thermal optimizations related to system suspend and resume
    from Aaron Lu.
 
  - Cleanups related to ACPI video from Jean Delvare.
 
  - Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan Tianyu,
    Paul Bolle, Tomasz Nowicki.
 
  - Intel RAPL (Running Average Power Limits) driver cleanups from Jacob Pan.
 
  - intel_pstate fixes and cleanups from Dirk Brandewie.
 
  - cpufreq fixes related to system suspend/resume handling from Viresh Kumar.
 
  - cpufreq core fixes and cleanups from Viresh Kumar, Stratos Karafotis,
    Saravana Kannan, Rashika Kheria, Joe Perches.
 
  - cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob Herring.
 
  - cpuidle fixes related to the menu governor from Tuukka Tikkanen.
 
  - cpuidle fix related to coupled CPUs handling from Paul Burton.
 
  - Asynchronous execution of all device suspend and resume callbacks,
    except for ->prepare and ->complete, during system suspend and resume
    from Chuansheng Liu.
 
  - Delayed resuming of runtime-suspended devices during system suspend for
    the PCI bus type and ACPI PM domain.
 
  - New set of PM helper routines to allow device runtime PM callbacks to
    be used during system suspend and resume more easily from Ulf Hansson.
 
  - Assorted fixes and cleanups in the PM core from Geert Uytterhoeven,
    Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella.
 
  - devfreq fix from Saravana Kannan.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTLgB1AAoJEILEb/54YlRxfs4P/35fIu9h8ClNWUPXqi3nlGIt
 yMyumKvF1VdsOKLbjTtFq6B3UOlhqDijYTCQd7Xt7X8ONTk/ND9ec2t/5xGkSdUI
 q46fa0qZXeqUn0Kt2t+kl6tgVQOkDj94aNlEh+7Ya3Uu6WYDDfmZtOBOFAMk6D8l
 ND4rHJpX+eUsRLBrcxaUxxdD8AW5guGcPKyeyzsXv1bY1BZnpLFrZ3PhuI5dn2CL
 L/zmk3A+wG6+ZlQxnwDdrKa3E6uhRSIDeF0vI4Byspa1wi5zXknJG2J7MoQ9JEE9
 VQpBXlqach5wgXqJ8PAqAeaB6Ie26/F7PYG8r446zKw/5UUtdNUx+0dkjQ7Mz8Tu
 ajuVxfwrrPhZeQqmVBxlH5Gg7Ez2KBKEfDxTdRnzI7FoA7PE5XDcg3kO64bhj8LJ
 yugnV/ToU9wMztZnPC7CoGPwUgxMJvr9LwmxS4aeKcVUBES05eg0vS3lwdZMgqkV
 iO0QkWTmhZ952qZCqZxbh0JqaaX8Wgx2kpX2tf1G2GJqLMZco289bLh6njNT+8CH
 EzdQKYYyn6G6+Qg2M0f/6So3qU17x9XtE4ZBWQdGDpqYOGZhjZAOs/VnB1Ysw/K3
 cDBzswlJd0CyyUps9B+qbf49OpbWVwl5kKeuHUuPxugEVryhpSp9AuG+tNil74Sj
 JuGTGR4fyFjDBX5cvAPm
 =ywR6
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "The majority of this material spent some time in linux-next, some of
  it even several weeks.  There are a few relatively fresh commits in
  it, but they are mostly fixes and simple cleanups.

  ACPI took the lead this time, both in terms of the number of commits
  and the number of modified lines of code, cpufreq follows and there
  are a few changes in the PM core and in cpuidle too.

  A new feature that already got some LWN.net's attention is the device
  PM QoS extension allowing latency tolerance requirements to be
  propagated from leaf devices to their ancestors with hardware
  interfaces for specifying latency tolerance.  That should help systems
  with hardware-driven power management to avoid going too far with it
  in cases when there are latency tolerance constraints.

  There also are some significant changes in the ACPI core related to
  the way in which hotplug notifications are handled.  They affect PCI
  hotplug (ACPIPHP) and the ACPI dock station code too.  The bottom line
  is that all those notification now go through the root notify handler
  and are propagated to the interested subsystems by means of callbacks
  instead of having to install a notify handler for each device object
  that we can potentially get hotplug notifications for.

  In addition to that ACPICA will now advertise "Windows 2013"
  compatibility for _OSI, because some systems out there don't work
  correctly if that is not done (some of them don't even boot).

  On the system suspend side of things, all of the device suspend and
  resume callbacks, except for ->prepare() and ->complete(), are now
  going to be executed asynchronously as that turns out to speed up
  system suspend and resume on some platforms quite significantly and we
  have a few more optimizations in that area.

  Apart from that, there are some new device IDs and fixes and cleanups
  all over.  In particular, the system suspend and resume handling by
  cpufreq should be improved and the cpuidle menu governor should be a
  bit more robust now.

  Specifics:

   - Device PM QoS support for latency tolerance constraints on systems
     with hardware interfaces allowing such constraints to be specified.
     That is necessary to prevent hardware-driven power management from
     becoming overly aggressive on some systems and to prevent power
     management features leading to excessive latencies from being used
     in some cases.

   - Consolidation of the handling of ACPI hotplug notifications for
     device objects.  This causes all device hotplug notifications to go
     through the root notify handler (that was executed for all of them
     anyway before) that propagates them to individual subsystems, if
     necessary, by executing callbacks provided by those subsystems
     (those callbacks are associated with struct acpi_device objects
     during device enumeration).  As a result, the code in question
     becomes both smaller in size and more straightforward and all of
     those changes should not affect users.

   - ACPICA update, including fixes related to the handling of _PRT in
     cases when it is broken and the addition of "Windows 2013" to the
     list of supported "features" for _OSI (which is necessary to
     support systems that work incorrectly or don't even boot without
     it).  Changes from Bob Moore and Lv Zheng.

   - Consolidation of ACPI _OST handling from Jiang Liu.

   - ACPI battery and AC fixes allowing unusual system configurations to
     be handled by that code from Alexander Mezin.

   - New device IDs for the ACPI LPSS driver from Chiau Ee Chew.

   - ACPI fan and thermal optimizations related to system suspend and
     resume from Aaron Lu.

   - Cleanups related to ACPI video from Jean Delvare.

   - Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan
     Tianyu, Paul Bolle, Tomasz Nowicki.

   - Intel RAPL (Running Average Power Limits) driver cleanups from
     Jacob Pan.

   - intel_pstate fixes and cleanups from Dirk Brandewie.

   - cpufreq fixes related to system suspend/resume handling from Viresh
     Kumar.

   - cpufreq core fixes and cleanups from Viresh Kumar, Stratos
     Karafotis, Saravana Kannan, Rashika Kheria, Joe Perches.

   - cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob
     Herring.

   - cpuidle fixes related to the menu governor from Tuukka Tikkanen.

   - cpuidle fix related to coupled CPUs handling from Paul Burton.

   - Asynchronous execution of all device suspend and resume callbacks,
     except for ->prepare and ->complete, during system suspend and
     resume from Chuansheng Liu.

   - Delayed resuming of runtime-suspended devices during system suspend
     for the PCI bus type and ACPI PM domain.

   - New set of PM helper routines to allow device runtime PM callbacks
     to be used during system suspend and resume more easily from Ulf
     Hansson.

   - Assorted fixes and cleanups in the PM core from Geert Uytterhoeven,
     Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella.

   - devfreq fix from Saravana Kannan"

* tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  PM / devfreq: Rewrite devfreq_update_status() to fix multiple bugs
  PM / sleep: Correct whitespace errors in <linux/pm.h>
  intel_pstate: Set core to min P state during core offline
  cpufreq: Add stop CPU callback to cpufreq_driver interface
  cpufreq: Remove unnecessary braces
  cpufreq: Fix checkpatch errors and warnings
  cpufreq: powerpc: add cpufreq transition latency for FSL e500mc SoCs
  MAINTAINERS: Reorder maintainer addresses for PM and ACPI
  PM / Runtime: Update runtime_idle() documentation for return value meaning
  video / output: Drop display output class support
  fujitsu-laptop: Drop unneeded include
  acer-wmi: Stop selecting VIDEO_OUTPUT_CONTROL
  ACPI / gpu / drm: Stop selecting VIDEO_OUTPUT_CONTROL
  ACPI / video: fix ACPI_VIDEO dependencies
  cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}
  cpufreq: Do not allow ->setpolicy drivers to provide ->target
  cpufreq: arm_big_little: set 'physical_cluster' for each CPU
  cpufreq: arm_big_little: make vexpress driver depend on bL core driver
  ACPI / button: Add ACPI Button event via netlink routine
  ACPI: Remove duplicate definitions of PREFIX
  ...
2014-04-01 12:48:54 -07:00
Linus Torvalds
683b6c6f82 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq code updates from Thomas Gleixner:
 "The irq department proudly presents:

   - Another tree wide sweep of irq infrastructure abuse.  Clear winner
     of the trainwreck engineering contest was:
         #include "../../../kernel/irq/settings.h"

   - Tree wide update of irq_set_affinity() callbacks which miss a cpu
     online check when picking a single cpu out of the affinity mask.

   - Tree wide consolidation of interrupt statistics.

   - Updates to the threaded interrupt infrastructure to allow explicit
     wakeup of the interrupt thread and a variant of synchronize_irq()
     which synchronizes only the hard interrupt handler.  Both are
     needed to replace the homebrewn thread handling in the mmc/sdhci
     code.

   - New irq chip callbacks to allow proper support for GPIO based irqs.
     The GPIO based interrupts need to request/release GPIO resources
     from request/free_irq.

   - A few new ARM interrupt chips.  No revolutionary new hardware, just
     differently wreckaged variations of the scheme.

   - Small improvments, cleanups and updates all over the place"

I was hoping that that trainwreck engineering contest was a April Fools'
joke.  But no.

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  irqchip: sun7i/sun6i: Disable NMI before registering the handler
  ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller
  ARM: sun7i/sun6i: irqchip: Update the documentation
  ARM: sun7i/sun6i: dts: Add NMI irqchip support
  ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller
  genirq: Export symbol no_action()
  arm: omap: Fix typo in ams-delta-fiq.c
  m68k: atari: Fix the last kernel_stat.h fallout
  irqchip: sun4i: Simplify sun4i_irq_ack
  irqchip: sun4i: Use handle_fasteoi_irq for all interrupts
  genirq: procfs: Make smp_affinity values go+r
  softirq: Add linux/irq.h to make it compile again
  m68k: amiga: Add linux/irq.h to make it compile again
  irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0
  irqchip: sun4i: Fix a comment about mask register initialization
  irqchip: sun4i: Fix irq 0 not working
  genirq: Add a new IRQCHIP_EOI_THREADED flag
  genirq: Document IRQCHIP_ONESHOT_SAFE flag
  ARM: sunxi: dt: Convert to the new irq controller compatibles
  irqchip: sunxi: Change compatibles
  ...
2014-04-01 11:22:57 -07:00
Linus Torvalds
1ead658124 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Thomas Gleixner:
 "This assorted collection provides:

   - A new timer based timer broadcast feature for systems which do not
     provide a global accessible timer device.  That allows those
     systems to put CPUs into deep idle states where the per cpu timer
     device stops.

   - A few NOHZ_FULL related improvements to the timer wheel

   - The usual updates to timer devices found in ARM SoCs

   - Small improvements and updates all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  tick: Remove code duplication in tick_handle_periodic()
  tick: Fix spelling mistake in tick_handle_periodic()
  x86: hpet: Use proper destructor for delayed work
  workqueue: Provide destroy_delayed_work_on_stack()
  clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS
  timer: Remove code redundancy while calling get_nohz_timer_target()
  hrtimer: Rearrange comments in the order struct members are declared
  timer: Use variable head instead of &work_list in __run_timers()
  clocksource: exynos_mct: silence a static checker warning
  arm: zynq: Add support for cpufreq
  arm: zynq: Don't use arm_global_timer with cpufreq
  clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
  clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled
  clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI
  sh: Remove Kconfig entries for TMU, CMT and MTU2
  ARM: shmobile: Remove CMT, TMU and STI Kconfig entries
  clocksource: armada-370-xp: Use atomic access for shared registers
  clocksource: orion: Use atomic access for shared registers
  clocksource: timer-keystone: Delete unnecessary variable
  clocksource: timer-keystone: introduce clocksource driver for Keystone
  ...
2014-04-01 11:00:07 -07:00
Linus Torvalds
a21e40877a Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main purpose is to fix a full dynticks bug related to
  virtualization, where steal time accounting appears to be zero in
  /proc/stat even after a few seconds of competing guests running busy
  loops in a same host CPU.  It's not a regression though as it was
  there since the beginning.

  The other commits are preparatory work to fix the bug and various
  cleanups"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch: Remove stub cputime.h headers
  sched: Remove needless round trip nsecs <-> tick conversion of steal time
  cputime: Fix jiffies based cputime assumption on steal accounting
  cputime: Bring cputime -> nsecs conversion
  cputime: Default implementation of nsecs -> cputime conversion
  cputime: Fix nsecs_to_cputime() return type cast
2014-04-01 10:16:10 -07:00
Linus Torvalds
1ce235faa8 - KGDB support for arm64
- PCI I/O space extended to 16M (in preparation of PCIe support patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOaqsAAoJEGvWsS0AyF7xYNUP/3/IPySIB+/6pyUG6q7kvIpF
 Di93M+VdmnLEOKhhx/tjkiEmEQMp0hFPeOlQRWf/Ugg4ksulP6gRejdDEjIfkmsk
 LrRXLjvH79NDJbN0pTUXqGDvLLZ9Qnib+HEOuKABIYUrwhNKySBk+5omGfXFtwLR
 Mb5JxPX0kbBXOqbOX4RgANQoRlE8GxJR3V245zlGxA4klcN4IiaDy/99kj+kaeaa
 Cl8X9K2I550IZ2YUAWPOut2aee2qRFQtAhIDgVthTYlGRx7Y/rDLM16B8fFY/T0H
 7azIpSO5hk5lp8J3giJHYajlJlXNla5FeHQb8XAVnlyqFBmCUn0vvd2VbPvWREJp
 UD8t1vZZt/s2he6CVAQIfQghwLyzrpPa19KbnyI+3HtsZ+NS/puBJmcVKZ2PBY/L
 28BsRzB7BKAPEVhNmyPwFHNdZTvjaqYUCLhQ0uTp1sSHMcLeSs7+vyMR99f/0u9E
 doSYAeF41ZkxHXL5xEevdj4sFkCEY1XFxER1Y8VM1rqHTeGEoeYbdS/u9tEeBgit
 jBelvHAlNTBgbur2nW4E9fQpAF2CsvWnRq6lSmDRTkyjzcLUQqA8bsQJ3aUyJtZt
 j17kUIzSH1q7x3zAaWQcvMVeawdkv2+HanjuTOdeO2ehvyG71vvxA3RkCv8o5Jhh
 da+jAMhkpYQxk8mSKkWm
 =8+cB
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull ARM64 updates from Catalin Marinas:
 - KGDB support for arm64
 - PCI I/O space extended to 16M (in preparation of PCIe support
   patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: Remove pgprot_dmacoherent()
  arm64: Support DMA_ATTR_WRITE_COMBINE
  arm64: Implement custom mmap functions for dma mapping
  arm64: Fix __range_ok macro
  arm64: Fix duplicated Kconfig entries
  arm64: mm: Route pmd thp functions through pte equivalents
  arm64: rwsem: use asm-generic rwsem implementation
  asm-generic: rwsem: de-PPCify rwsem.h
  arm64: enable generic CPU feature modalias matching for this architecture
  arm64: smp: make local symbol static
  arm64: debug: make local symbols static
  ARM64: perf: support dwarf unwinding in compat mode
  ARM64: perf: add support for frame pointer unwinding in compat mode
  ARM64: perf: add support for perf registers API
  arm64: Add boot time configuration of Intermediate Physical Address size
  arm64: Do not synchronise I and D caches for special ptes
  arm64: Make DMA coherent and strongly ordered mappings not executable
  arm64: barriers: add dmb barrier
  arm64: topology: Implement basic CPU topology support
  arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries
  ...
2014-03-31 15:01:45 -07:00
Linus Torvalds
1f8c538ed6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There are two memory management related changes, the CMMA support for
  KVM to avoid swap-in of freed pages and the split page table lock for
  the PMD level.  These two come with common code changes in mm/.

  A fix for the long standing theoretical TLB flush problem, this one
  comes with a common code change in kernel/sched/.

  Another set of changes is Heikos uaccess work, included is the initial
  set of patches with more to come.

  And fixes and cleanups as usual"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits)
  s390/con3270: optionally disable auto update
  s390/mm: remove unecessary parameter from pgste_ipte_notify
  s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
  s390/mm: fixing comment so that parameter name match
  s390/smp: limit number of cpus in possible cpu mask
  hypfs: Add clarification for "weight_min" attribute
  s390: update defconfigs
  s390/ptrace: add support for PTRACE_SINGLEBLOCK
  s390/perf: make print_debug_cf() static
  s390/topology: Remove call to update_cpu_masks()
  s390/compat: remove compat exec domain
  s390: select CONFIG_TTY for use of tty in unconditional keyboard driver
  s390/appldata_os: fix cpu array size calculation
  s390/checksum: remove memset() within csum_partial_copy_from_user()
  s390/uaccess: remove copy_from_user_real()
  s390/sclp_early: Return correct HSA block count also for zero
  s390: add some drivers/subsystems to the MAINTAINERS file
  s390: improve debug feature usage
  s390/airq: add support for irq ranges
  s390/mm: enable split page table lock for PMD level
  ...
2014-03-31 14:35:30 -07:00
Linus Torvalds
190f918660 Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 compat wrapper rework from Heiko Carstens:
 "S390 compat system call wrapper simplification work.

  The intention of this work is to get rid of all hand written assembly
  compat system call wrappers on s390, which perform proper sign or zero
  extension, or pointer conversion of compat system call parameters.
  Instead all of this should be done with C code eg by using Al's
  COMPAT_SYSCALL_DEFINEx() macro.

  Therefore all common code and s390 specific compat system calls have
  been converted to the COMPAT_SYSCALL_DEFINEx() macro.

  In order to generate correct code all compat system calls may only
  have eg compat_ulong_t parameters, but no unsigned long parameters.
  Those patches which change parameter types from unsigned long to
  compat_ulong_t parameters are separate in this series, but shouldn't
  cause any harm.

  The only compat system calls which intentionally have 64 bit
  parameters (preadv64 and pwritev64) in support of the x86/32 ABI
  haven't been changed, but are now only available if an architecture
  defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

  System calls which do not have a compat variant but still need proper
  zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
  get a proper wrapper function with the new s390 specific
  COMPAT_SYSCALL_WRAPx() macro:

     COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

  which generates the following code (simplified):

     asmlinkage long sys_brk(unsigned long brk);
     asmlinkage long compat_sys_brk(long brk)
     {
         return sys_brk((u32)brk);
     }

  Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
  includes both linux/syscall.h and linux/compat.h, it will generate
  build errors, if the declaration of sys_brk() doesn't match, or if
  there exists a non-matching compat_sys_brk() declaration.

  In addition this will intentionally result in a link error if
  somewhere else a compat_sys_brk() function exists, which probably
  should have been used instead.  Two more BUILD_BUG_ONs make sure the
  size and type of each compat syscall parameter can be handled
  correctly with the s390 specific macros.

  I converted the compat system calls step by step to verify the
  generated code is correct and matches the previous code.  In fact it
  did not always match, however that was always a bug in the hand
  written asm code.

  In result we get less code, less bugs, and much more sanity checking"

* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/compat: add copyright statement
  compat: include linux/unistd.h within linux/compat.h
  s390/compat: get rid of compat wrapper assembly code
  s390/compat: build error for large compat syscall args
  mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: convert to COMPAT_SYSCALL_DEFINE
  security/compat: convert to COMPAT_SYSCALL_DEFINE
  mm/compat: convert to COMPAT_SYSCALL_DEFINE
  net/compat: convert to COMPAT_SYSCALL_DEFINE
  kernel/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: optional preadv64/pwrite64 compat system calls
  ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
  s390/compat: partial parameter conversion within syscall wrappers
  s390/compat: automatic zero, sign and pointer conversion of syscalls
  s390/compat: add sync_file_range and fallocate compat syscalls
  ...
2014-03-31 14:32:17 -07:00
Linus Torvalds
176ab02d49 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LTO changes from Peter Anvin:
 "More infrastructure work in preparation for link-time optimization
  (LTO).  Most of these changes is to make sure symbols accessed from
  assembly code are properly marked as visible so the linker doesn't
  remove them.

  My understanding is that the changes to support LTO are still not
  upstream in binutils, but are on the way there.  This patchset should
  conclude the x86-specific changes, and remaining patches to actually
  enable LTO will be fed through the Kbuild tree (other than keeping up
  with changes to the x86 code base, of course), although not
  necessarily in this merge window"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  Kbuild, lto: Handle basic LTO in modpost
  Kbuild, lto: Disable LTO for asm-offsets.c
  Kbuild, lto: Add a gcc-ld script to let run gcc as ld
  Kbuild, lto: add ld-version and ld-ifversion macros
  Kbuild, lto: Drop .number postfixes in modpost
  Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
  lto: Disable LTO for sys_ni
  lto: Handle LTO common symbols in module loader
  lto, workaround: Add workaround for initcall reordering
  lto: Make asmlinkage __visible
  x86, lto: Disable LTO for the x86 VDSO
  initconst, x86: Fix initconst mistake in ts5500 code
  initconst: Fix initconst mistake in dcdbas
  asmlinkage: Make trace_hardirqs_on/off_caller visible
  asmlinkage, x86: Fix 32bit memcpy for LTO
  asmlinkage Make __stack_chk_failed and memcmp visible
  asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
  asmlinkage: Make main_extable_sort_needed visible
  asmlinkage, mutex: Mark __visible
  asmlinkage: Make trace_hardirq visible
  ...
2014-03-31 14:13:25 -07:00
Eric Paris
543bc6a1a9 AUDIT: Allow login in non-init namespaces
It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent.  This is common in
many distros and thus normal configuration of many containers.  The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel.  If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all non-init
relevant namespaces.  That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'.  They don't really work, since audit
isn't logging things.  But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world.  This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Adam Richter <adam_richter2004@yahoo.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	kernel/audit.c
2014-03-31 15:36:41 -04:00
Linus Torvalds
918d80a136 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu handling changes from Ingo Molnar:
 "Bigger changes:

   - Intel CPU hardware-enablement: new vector instructions support
     (AVX-512), by Fenghua Yu.

   - Support the clflushopt instruction and use it in appropriate
     places.  clflushopt is similar to clflush but with more relaxed
     ordering, by Ross Zwisler.

   - MSR accessor cleanups, by Borislav Petkov.

   - 'forcepae' boot flag for those who have way too much time to spend
     on way too old Pentium-M systems and want to live way too
     dangerously, by Chris Bainbridge"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
  Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
  x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic
  x86, Intel: Convert to the new bit access MSR accessors
  x86, AMD: Convert to the new bit access MSR accessors
  x86: Add another set of MSR accessor functions
  x86: Use clflushopt in drm_clflush_virt_range
  x86: Use clflushopt in drm_clflush_page
  x86: Use clflushopt in clflush_cache_range
  x86: Add support for the clflushopt instruction
  x86, AVX-512: Enable AVX-512 States Context Switch
  x86, AVX-512: AVX-512 Feature Detection
2014-03-31 12:00:45 -07:00
Linus Torvalds
971eae7c99 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Bigger changes:

   - sched/idle restructuring: they are WIP preparation for deeper
     integration between the scheduler and idle state selection, by
     Nicolas Pitre.

   - add NUMA scheduling pseudo-interleaving, by Rik van Riel.

   - optimize cgroup context switches, by Peter Zijlstra.

   - RT scheduling enhancements, by Thomas Gleixner.

  The rest is smaller changes, non-urgnt fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  sched: Clean up the task_hot() function
  sched: Remove double calculation in fix_small_imbalance()
  sched: Fix broken setscheduler()
  sparc64, sched: Remove unused sparc64_multi_core
  sched: Remove unused mc_capable() and smt_capable()
  sched/numa: Move task_numa_free() to __put_task_struct()
  sched/fair: Fix endless loop in idle_balance()
  sched/core: Fix endless loop in pick_next_task()
  sched/fair: Push down check for high priority class task into idle_balance()
  sched/rt: Fix picking RT and DL tasks from empty queue
  trace: Replace hardcoding of 19 with MAX_NICE
  sched: Guarantee task priority in pick_next_task()
  sched/idle: Remove stale old file
  sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED
  cpuidle/arm64: Remove redundant cpuidle_idle_call()
  cpuidle/powernv: Remove redundant cpuidle_idle_call()
  sched, nohz: Exclude isolated cores from load balancing
  sched: Fix select_task_rq_fair() description comments
  workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  ...
2014-03-31 11:21:19 -07:00
Linus Torvalds
8c292f1174 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Main changes:

  Kernel side changes:

   - Add SNB/IVB/HSW client uncore memory controller support (Stephane
     Eranian)

   - Fix various x86/P4 PMU driver bugs (Don Zickus)

  Tooling, user visible changes:

   - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso)

   - Speed up thread map generation (Don Zickus)

   - Introduce 'perf kvm --list-cmds' command line option for use by
     scripts (Ramkumar Ramachandra)

   - Print the evsel name in the annotate stdio output, prep to fix
     support outputting annotation for multiple events, not just for the
     first one (Arnaldo Carvalho de Melo)

   - Allow setting preferred callchain method in .perfconfig (Jiri Olsa)

   - Show in what binaries/modules 'perf probe's are set (Masami
     Hiramatsu)

   - Support distro-style debuginfo for uprobe in 'perf probe' (Masami
     Hiramatsu)

  Tooling, internal changes and fixes:

   - Use tid in mmap/mmap2 events to find maps (Don Zickus)

   - Record the reason for filtering an address_location (Namhyung Kim)

   - Apply all filters to an addr_location (Namhyung Kim)

   - Merge al->filtered with hist_entry->filtered in report/hists
     (Namhyung Kim)

   - Fix memory leak when synthesizing thread records (Namhyung Kim)

   - Use ui__has_annotation() in 'report' (Namhyung Kim)

   - hists browser refactorings to reuse code accross UIs (Namhyung Kim)

   - Add support for the new DWARF unwinder library in elfutils (Jiri
     Olsa)

   - Fix build race in the generation of bison files (Jiri Olsa)

   - Further streamline the feature detection display, trimming it a bit
     to show just the libraries detected, using VF=1 gets a more verbose
     output, showing the less interesting feature checks as well (Jiri
     Olsa).

   - Check compatible symtab type before loading dso (Namhyung Kim)

   - Check return value of filename__read_debuglink() (Stephane Eranian)

   - Move some hashing and fs related code from tools/perf/util/ to
     tools/lib/ so that it can be used by more tools/ living utilities
     (Borislav Petkov)

   - Prepare DWARF unwinding code for using an elfutils alternative
     unwinding library (Jiri Olsa)

   - Fix DWARF unwind max_stack processing (Jiri Olsa)

   - Add dwarf unwind 'perf test' entry (Jiri Olsa)

   - 'perf probe' improvements including memory leak fixes, sharing the
     intlist class with other tools, uprobes/kprobes code sharing and
     use of ref_reloc_sym (Masami Hiramatsu)

   - Shorten sample symbol resolving by adding cpumode to struct
     addr_location (Arnaldo Carvalho de Melo)

   - Fix synthesizing mmaps for threads (Don Zickus)

   - Fix invalid output on event group stdio report (Namhyung Kim)

   - Fixup header alignment in 'perf sched latency' output (Ramkumar
     Ramachandra)

   - Fix off-by-one error in 'perf timechart record' argv handling
     (Ramkumar Ramachandra)

  Tooling, cleanups:

   - Remove unused thread__find_map function (Jiri Olsa)

   - Remove unused simple_strtoul() function (Ramkumar Ramachandra)

  Tooling, documentation updates:

   - Update function names in debug messages (Ramkumar Ramachandra)

   - Update some code references in design.txt (Ramkumar Ramachandra)

   - Clarify load-latency information in the 'perf mem' docs (Andi
     Kleen)

   - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits)
  perf tools: Remove unused simple_strtoul() function
  perf tools: Update some code references in design.txt
  perf evsel: Update function names in debug messages
  perf tools: Remove thread__find_map function
  perf annotate: Print the evsel name in the stdio output
  perf report: Use ui__has_annotation()
  perf tools: Fix memory leak when synthesizing thread records
  perf tools: Use tid in mmap/mmap2 events to find maps
  perf report: Merge al->filtered with hist_entry->filtered
  perf symbols: Apply all filters to an addr_location
  perf symbols: Record the reason for filtering an address_location
  perf sched: Fixup header alignment in 'latency' output
  perf timechart: Fix off-by-one error in 'record' argv handling
  perf machine: Factor machine__find_thread to take tid argument
  perf tools: Speed up thread map generation
  perf kvm: introduce --list-cmds for use by scripts
  perf ui hists: Pass evsel to hpp->header/width functions explicitly
  perf symbols: Introduce thread__find_cpumode_addr_location
  perf session: Change header.misc dump from decimal to hex
  perf ui/tui: Reuse generic __hpp__fmt() code
  ...
2014-03-31 11:13:25 -07:00
Linus Torvalds
b3fd4ea9df Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main changes:

   - Torture-test changes, including refactoring of rcutorture and
     introduction of a vestigial locktorture.

   - Real-time latency fixes.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  rcu: Provide grace-period piggybacking API
  rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone
  rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
  notifier: Substitute rcu_access_pointer() for rcu_dereference_raw()
  Documentation/memory-barriers.txt: Clarify release/acquire ordering
  rcutorture: Save kvm.sh output to log
  rcutorture: Add a lock_busted to test the test
  rcutorture: Place kvm-test-1-run.sh output into res directory
  rcutorture: Rename TREE_RCU-Kconfig.txt
  locktorture: Add kvm-recheck.sh plug-in for locktorture
  rcutorture: Gracefully handle NULL cleanup hooks
  locktorture: Add vestigial locktorture configuration
  rcutorture: Introduce "rcu" directory level underneath configs
  rcutorture: Rename kvm-test-1-rcu.sh
  rcutorture: Remove RCU dependencies from ver_functions.sh API
  rcutorture: Create CFcommon file for common Kconfig parameters
  rcutorture: Create config files for scripted test-the-test testing
  rcutorture: Add an rcu_busted to test the test
  locktorture: Add a lock-torture kernel module
  rcutorture: Abstract kvm-recheck.sh
  ...
2014-03-31 11:05:24 -07:00
Linus Torvalds
462bf234a8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock->check == 0
  lockdep: Make held_lock->check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
2014-03-31 10:59:39 -07:00
Alexei Starovoitov
bd4cf0ed33 net: filter: rework/optimize internal BPF interpreter's instruction set
This patch replaces/reworks the kernel-internal BPF interpreter with
an optimized BPF instruction set format that is modelled closer to
mimic native instruction sets and is designed to be JITed with one to
one mapping. Thus, the new interpreter is noticeably faster than the
current implementation of sk_run_filter(); mainly for two reasons:

1. Fall-through jumps:

  BPF jump instructions are forced to go either 'true' or 'false'
  branch which causes branch-miss penalty. The new BPF jump
  instructions have only one branch and fall-through otherwise,
  which fits the CPU branch predictor logic better. `perf stat`
  shows drastic difference for branch-misses between the old and
  new code.

2. Jump-threaded implementation of interpreter vs switch
   statement:

  Instead of single table-jump at the top of 'switch' statement,
  gcc will now generate multiple table-jump instructions, which
  helps CPU branch predictor logic.

Note that the verification of filters is still being done through
sk_chk_filter() in classical BPF format, so filters from user- or
kernel space are verified in the same way as we do now, and same
restrictions/constraints hold as well.

We reuse current BPF JIT compilers in a way that this upgrade would
even be fine as is, but nevertheless allows for a successive upgrade
of BPF JIT compilers to the new format.

The internal instruction set migration is being done after the
probing for JIT compilation, so in case JIT compilers are able to
create a native opcode image, we're going to use that, and in all
other cases we're doing a follow-up migration of the BPF program's
instruction set, so that it can be transparently run in the new
interpreter.

In short, the *internal* format extends BPF in the following way (more
details can be taken from the appended documentation):

  - Number of registers increase from 2 to 10
  - Register width increases from 32-bit to 64-bit
  - Conditional jt/jf targets replaced with jt/fall-through
  - Adds signed > and >= insns
  - 16 4-byte stack slots for register spill-fill replaced
    with up to 512 bytes of multi-use stack space
  - Introduction of bpf_call insn and register passing convention
    for zero overhead calls from/to other kernel functions
  - Adds arithmetic right shift and endianness conversion insns
  - Adds atomic_add insn
  - Old tax/txa insns are replaced with 'mov dst,src' insn

Performance of two BPF filters generated by libpcap resp. bpf_asm
was measured on x86_64, i386 and arm32 (other libpcap programs
have similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
   ((tcp[12]&0xf0)>>2)) != 0)' -dd

Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call,
smaller is better:

--x86_64--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF      90       101        192       202
new BPF      31        71         47        97
old BPF jit  12        34         17        44
new BPF jit TBD

--i386--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     107       136        227       252
new BPF      40       119         69       172

--arm32--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     202       300        475       540
new BPF     180       270        330       470
old BPF jit  26       182         37       202
new BPF jit TBD

Thus, without changing any userland BPF filters, applications on
top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
classifier, netfilter's xt_bpf, team driver's load-balancing mode,
and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need
to convert seccomp BPF in the same step to make use of the new
internal structure since it makes use of lower-level API details
without being further decoupled through higher-level calls like
sk_unattached_filter_{create,destroy}(), for example.

Just as for normal socket filtering, also seccomp BPF experiences
a time-to-verdict speedup:

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

  seccomp_rule_add_exact(ctx,...
  seccomp_rule_add_exact(ctx,...

  rc = seccomp_load(ctx);

  for (i = 0; i < 10000000; i++)
     syscall(199, 100);

'short filter' has 2 rules
'large filter' has 200 rules

'short filter' performance is slightly better on x86_64/i386/arm32
'large filter' is much faster on x86_64 and i386 and shows no
               difference on arm32

--x86_64-- short filter
old BPF: 2.7 sec
 39.12%  bench  libc-2.15.so       [.] syscall
  8.10%  bench  [kernel.kallsyms]  [k] sk_run_filter
  6.31%  bench  [kernel.kallsyms]  [k] system_call
  5.59%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  4.37%  bench  [kernel.kallsyms]  [k] trace_hardirqs_off_caller
  3.70%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.67%  bench  [kernel.kallsyms]  [k] lock_is_held
  3.03%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
new BPF: 2.58 sec
 42.05%  bench  libc-2.15.so       [.] syscall
  6.91%  bench  [kernel.kallsyms]  [k] system_call
  6.25%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  6.07%  bench  [kernel.kallsyms]  [k] __secure_computing
  5.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp

--arm32-- short filter
old BPF: 4.0 sec
 39.92%  bench  [kernel.kallsyms]  [k] vector_swi
 16.60%  bench  [kernel.kallsyms]  [k] sk_run_filter
 14.66%  bench  libc-2.17.so       [.] syscall
  5.42%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  5.10%  bench  [kernel.kallsyms]  [k] __secure_computing
new BPF: 3.7 sec
 35.93%  bench  [kernel.kallsyms]  [k] vector_swi
 21.89%  bench  libc-2.17.so       [.] syscall
 13.45%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
  6.25%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.96%  bench  [kernel.kallsyms]  [k] syscall_trace_exit

--x86_64-- large filter
old BPF: 8.6 seconds
    73.38%    bench  [kernel.kallsyms]  [k] sk_run_filter
    10.70%    bench  libc-2.15.so       [.] syscall
     5.09%    bench  [kernel.kallsyms]  [k] seccomp_bpf_load
     1.97%    bench  [kernel.kallsyms]  [k] system_call
new BPF: 5.7 seconds
    66.20%    bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
    16.75%    bench  libc-2.15.so       [.] syscall
     3.31%    bench  [kernel.kallsyms]  [k] system_call
     2.88%    bench  [kernel.kallsyms]  [k] __secure_computing

--i386-- large filter
old BPF: 5.4 sec
new BPF: 3.8 sec

--arm32-- large filter
old BPF: 13.5 sec
 73.88%  bench  [kernel.kallsyms]  [k] sk_run_filter
 10.29%  bench  [kernel.kallsyms]  [k] vector_swi
  6.46%  bench  libc-2.17.so       [.] syscall
  2.94%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  1.19%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.87%  bench  [kernel.kallsyms]  [k] sys_getuid
new BPF: 13.5 sec
 76.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
 10.98%  bench  [kernel.kallsyms]  [k] vector_swi
  5.87%  bench  libc-2.17.so       [.] syscall
  1.77%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.93%  bench  [kernel.kallsyms]  [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new
internal BPF performance is better than the old one. Performance
gains will be even higher when BPF JIT is committed for the
new structure, which is planned in future work (as successive
JIT migrations).

BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Rusty Russell
57673c2b0b Use 'E' instead of 'X' for unsigned module taint flag.
Takashi Iwai <tiwai@suse.de> says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules.  Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)

Note: this code should be cleaned up, so we don't have such maps in
three places!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-31 14:52:43 +10:30
Eric Paris
aa4af831bb AUDIT: Allow login in non-init namespaces
It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent.  This is common in
many distros and thus normal configuration of many containers.  The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel.  If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all non-init
relevant namespaces.  That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'.  They don't really work, since audit
isn't logging things.  But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world.  This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Adam Richter <adam_richter2004@yahoo.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:02:53 -07:00
Li Zefan
1ec41830e0 cgroup: remove useless argument from cgroup_exit()
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-29 09:15:54 -04:00
Li Zefan
e8604cb436 cgroup: fix spurious lockdep warning in cgroup_exit()
cgroup_exit() is called in fork and exit path. If it's called in the
failure path during fork, PF_EXITING isn't set, and then lockdep will
complain.

Fix this by removing cgroup_exit() in that failure path. cgroup_fork()
does nothing that needs cleanup.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-29 09:15:53 -04:00
John Stultz
cab5e127ee time: Revert to calling clock_was_set_delayed() while in irq context
In commit 47a1b79630 ("tick/timekeeping: Call
update_wall_time outside the jiffies lock"), we moved to calling
clock_was_set() due to the fact that we were no longer holding
the timekeeping or jiffies lock.

However, there is still the problem that clock_was_set()
triggers an IPI, which cannot be done from the timer's hard irq
context, and will generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the
dmesg log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-28 08:07:07 +01:00
Linus Torvalds
f217c44ebd While on my flight to Linux Collaboration Summit, I was working on
my slides for the event trigger tutorial. I booted a 3.14-rc7 kernel
 to perform what I wanted to teach and cut and paste it into my slides.
 When I tried the traceon event trigger with a condition attached to it
 (turns tracing on only if a field of the trigger event matches a condition
 set by the user), nothing happened. Tracing would not turn on. I stopped
 working on my presentation in order to find what was wrong.
 
 It ended up being the way trace event triggers work when they have
 conditions. Instead of copying the fields, the condition code just
 looks at the fields that were copied into the ring buffer. This works
 great, unless tracing is off. That's because when the event is reserved
 on the ring buffer, the ring buffer returns a NULL pointer, this tells
 the tracing code that the ring buffer is disabled. This ends up being
 a problem for the traceon trigger if it is using this information to
 check its condition.
 
 Luckily the code that checks if tracing is on returns the ring buffer
 to use (because the ring buffer is determined by the event file
 also passed to that field). I was able to easily solve this bug by
 checking in that helper function if the returned ring buffer entry
 is NULL, and if so, also check the file flag if it has a trace event
 trigger condition, and if so, to pass back a temp ring buffer to use.
 This will allow the trace event trigger condition to still test the
 event fields, but nothing will be recorded.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTMtD2AAoJEKQekfcNnQGuwrUH/juk/qKY7T/nq98iMleOD+LR
 iTQyWh9zvHdku/3ZlK5eJqla2TJUctM5Sf9WhvV/iXJ2/9m5qnekZ6gCoraGVpG4
 qfKZAiHdGU0sFdp2NIt1uIHcqjYc8dg2KGajEKGNr7E2F3fvSTfNZV+WKAnmGqlE
 8ICsoYJ/Uv4kh18QQJexbyd7jXWE8tzxxS9UzPvCtCiSd1xf2yCAS5OlwPYafgiv
 Egbjqq7IUCJfVb0h/YfgPbEW69sjlxOulzn42Nii+UOx9uv1RTdtXdivyCXDWfX3
 Que0Do757oTvpW/22s9T3mpUtYLCB+6B2GV+ljqbLCZ5pRywc/GDCU//D1AXKEI=
 =dKWH
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "While on my flight to Linux Collaboration Summit, I was working on my
  slides for the event trigger tutorial.  I booted a 3.14-rc7 kernel to
  perform what I wanted to teach and cut and paste it into my slides.
  When I tried the traceon event trigger with a condition attached to it
  (turns tracing on only if a field of the trigger event matches a
  condition set by the user), nothing happened.  Tracing would not turn
  on.  I stopped working on my presentation in order to find what was
  wrong.

  It ended up being the way trace event triggers work when they have
  conditions.  Instead of copying the fields, the condition code just
  looks at the fields that were copied into the ring buffer.  This works
  great, unless tracing is off.  That's because when the event is
  reserved on the ring buffer, the ring buffer returns a NULL pointer,
  this tells the tracing code that the ring buffer is disabled.  This
  ends up being a problem for the traceon trigger if it is using this
  information to check its condition.

  Luckily the code that checks if tracing is on returns the ring buffer
  to use (because the ring buffer is determined by the event file also
  passed to that field).  I was able to easily solve this bug by
  checking in that helper function if the returned ring buffer entry is
  NULL, and if so, also check the file flag if it has a trace event
  trigger condition, and if so, to pass back a temp ring buffer to use.
  This will allow the trace event trigger condition to still test the
  event fields, but nothing will be recorded"

* tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix traceon trigger condition to actually turn tracing on
2014-03-26 09:09:18 -07:00
Steven Rostedt (Red Hat)
2c4a33aba5 tracing: Fix traceon trigger condition to actually turn tracing on
While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.

The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-25 23:39:41 -04:00
Viresh Kumar
b97f0291a2 tick: Remove code duplication in tick_handle_periodic()
tick_handle_periodic() is calling ktime_add() at two places, first before the
infinite loop and then at the end of infinite loop. We can rearrange code a bit
to fix code duplication here.

It looks quite simple and shouldn't break anything, I guess :)

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/be3481e8f3f71df694a4b43623254fc93ca51b59.1395735873.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-26 00:56:49 +01:00
Viresh Kumar
cacb3c76c2 tick: Fix spelling mistake in tick_handle_periodic()
One of the comments in tick_handle_periodic() had 'when' instead of 'which' (My
guess :)). Fix it.

Also fix spelling mistake in 'Possible'.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: skarafotis@gmail.com
Link: http://lkml.kernel.org/r/2b29ca4230c163e44179941d7c7a16c1474385c2.1395743878.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-26 00:56:49 +01:00
Thomas Gleixner
ea2e64f280 workqueue: Provide destroy_delayed_work_on_stack()
If a delayed or deferrable work is on stack we need to tell debug
objects that we are destroying the timer and the work. Otherwise we
leak the tracking object.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25 17:33:42 +01:00
Ingo Molnar
7de700e680 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU update from Paul E. McKenney:

" [...] one late-breaking commit.  This one was requested for 3.15 by Peter Zijlstra.
  It is low risk because it adds a new in-kernel API with minimal changes to the
  existing code.  Those minimal changes are the addition of memory barriers and
  ACCESS_ONCE() macro calls, neither of which should be able to break things.
  This commit has passed significant rcutorture testing, with these additional
  additions to rcutorture slated for 3.16.  This commit has also been exposed to
  -next testing. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-25 06:45:39 +01:00
Monam Agarwal
e231d54c12 kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-03-24 12:00:22 -04:00
Aaron Tomlin
3862807880 tracing: Add BUG_ON when stack end location is over written
It is difficult to detect a stack overrun when it
actually occurs.

We have observed that this type of corruption is often
silent and can go unnoticed. Once the corrupted region
is examined, the outcome is undefined and often
results in sporadic system crashes.

When the stack tracing feature is enabled, let's check
for this condition and take appropriate action.

Note: init_task doesn't get its stack end location
set to STACK_END_MAGIC.

Link: http://lkml.kernel.org/r/1395669837-30209-1-git-send-email-atomlin@redhat.com

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-24 10:39:11 -04:00
Monam Agarwal
01a9714061 cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
This patch replaces rcu_assign_pointer(x, NULL) with
RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a
structure is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to
initialize.  So, rcu_assign_pointer(p, NULL) can be safely converted
to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-24 08:48:02 -04:00
Alexander Shiyan
d6ee6d2325 genirq: Export symbol no_action()
This will allow to use the dummy IRQ handler no_action() from drivers
compiled as module. Drivers which use ARM FIQ interrupts can use this
to request the interrupt via the normal request_irq() mechanism w/o
having to copy the dummy handler to their own code.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Link: http://lkml.kernel.org/r/1395476431-16070-1-git-send-email-shc_work@mail.ru
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-22 11:33:09 +01:00
Mathieu Desnoyers
0dea6d5263 tracepoint: Remove unused API functions
After the following commit:

commit b75ef8b44b
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Wed Aug 10 15:18:39 2011 -0400

    Tracepoint: Dissociate from module mutex

The following functions became unnecessary:

- tracepoint_probe_register_noupdate,
- tracepoint_probe_unregister_noupdate,
- tracepoint_probe_update_all.

In fact, none of the in-kernel tracers, nor LTTng, nor SystemTAP use
them. Remove those.

Moreover, the functions:

- tracepoint_iter_start,
- tracepoint_iter_next,
- tracepoint_iter_stop,
- tracepoint_iter_reset.

are unused by in-kernel tracers, LTTng and SystemTAP. Remove those too.

Link: http://lkml.kernel.org/r/1395379142-2118-2-git-send-email-mathieu.desnoyers@efficios.com

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-21 14:01:15 -04:00
Steven Rostedt (Red Hat)
bc4c426ee2 Revert "tracing: Move event storage for array from macro to standalone function"
I originally wrote commit 35bb4399bd to shrink the size of the overhead of
tracepoints by several kilobytes. Later, I received a patch from Vaibhav
Nagarnaik that fixed a bug in the same code that this commit touches. Not
only did it fix a bug, it also removed code and shrunk the size of the
overhead of trace events even more than this commit did.

Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
mainline, I need to revert this patch in order to keep it from conflicting
with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
obsolete.

Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-21 13:11:41 -04:00
Linus Torvalds
11d4616bd0 futex: revert back to the explicit waiter counting code
Srikar Dronamraju reports that commit b0c29f79ec ("futexes: Avoid
taking the hb->lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-20 22:11:17 -07:00
Linus Torvalds
477cc48484 Vaibhav Nagarnaik discovered that since 3.10 a clean up patch made the
array index in the trace event format bogus. He supplied an elegant solution
 that uses __stringify() and also removes the need for the event_storage
 and event_storage_mutex and also cuts off a few K of overhead from
 the trace events.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTK6cxAAoJEKQekfcNnQGu+3oIAIPGwsevXcVNlmqLwtsoUhu2
 d0uMJYBpi+aQ/stenhkAThzY/5/O0SN2o+uyacSJKDo3WayF9fxGTvOHVbJhvmLF
 YsX6oQLVmzqPrq7BGJTvglv4+RYf+HkV1MAb/1iacA7sFtd7jVpUiPvLlQ3CEwph
 kNqdmoFT16iyE1snUviE0GmZmrdOqZBjwC1Ys+oSbaycRSFnvmDjYAGYot5tJfU5
 gYgpkeJ8J3bxHOGzNCRgmpLQNR3P1HzailPQVi51We14FlzSwOTwuKDtpf8WwzXV
 0fIEkdTU3+K62XxVw5/YQ5o/PpFKO/J5dSPjFe7PF2e6hCTTOABcK5foSSP1KSU=
 =559y
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull trace fix from Steven Rostedt:
 "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the
  array index in the trace event format bogus.

  He supplied an elegant solution that uses __stringify() and also
  removes the need for the event_storage and event_storage_mutex and
  also cuts off a few K of overhead from the trace events"

* tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix array size mismatch in format string
2014-03-20 22:09:30 -07:00
Paul E. McKenney
765a3f4fed rcu: Provide grace-period piggybacking API
The following pattern is currently not well supported by RCU:

1.	Make data element inaccessible to RCU readers.

2.	Do work that probably lasts for more than one grace period.

3.	Do something to make sure RCU readers in flight before #1 above
	have completed.

Here are some things that could currently be done:

a.	Do a synchronize_rcu() unconditionally at either #1 or #3 above.
	This works, but imposes needless work and latency.

b.	Post an RCU callback at #1 above that does a wakeup, then
	wait for the wakeup at #3.  This works well, but likely results
	in an extra unneeded grace period.  Open-coding this is also
	a bit more semi-tricky code than would be good.

This commit therefore adds get_state_synchronize_rcu() and
cond_synchronize_rcu() APIs.  Call get_state_synchronize_rcu() at #1
above and pass its return value to cond_synchronize_rcu() at #3 above.
This results in a call to synchronize_rcu() if no grace period has
elapsed between #1 and #3, but requires only a load, comparison, and
memory barrier if a full grace period did elapse.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
2014-03-20 17:12:25 -07:00
Dave Jones
8c90487cdc Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20 16:28:09 -07:00
Vaibhav Nagarnaik
87291347c4 tracing: Fix array size mismatch in format string
In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20 13:21:05 -04:00
Tejun Heo
e1b2dc176f cgroup: break kernfs active_ref protection in cgroup directory operations
cgroup_tree_mutex should nest above the kernfs active_ref protection;
however, cgroup_create() and cgroup_rename() were grabbing
cgroup_tree_mutex while under kernfs active_ref protection.  This has
actualy possibility to lead to deadlocks in case these operations race
against cgroup_rmdir() which invokes kernfs_remove() on directory
kernfs_node while holding cgroup_tree_mutex.

Neither cgroup_create() or cgroup_rename() requires active_ref
protection.  The former already has enough synchronization through
cgroup_lock_live_group() and the latter doesn't care, so this can be
fixed by updating both functions to break all active_ref protections
before grabbing cgroup_tree_mutex.

While this patch fixes the immediate issue, it probably needs further
work in the long term - kernfs directories should enable lockdep
annotations and maybe the better way to handle this is marking
directory nodes as not needing active_ref protection rather than
breaking it in each operation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-20 11:10:15 -04:00
Eric Paris
5e937a9ae9 syscall_get_arch: remove useless function arguments
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
2014-03-20 10:11:59 -04:00
Joe Perches
b7550787fe audit: remove stray newline from audit_log_execve_info() audit_panic() call
There's an unnecessary use of a \n in audit_panic.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:58 -04:00
Josh Boyer
f12835276c audit: remove stray newlines from audit_log_lost messages
Calling audit_log_lost with a \n in the format string leads to extra
newlines in dmesg.  That function will eventually call audit_panic which
uses pr_err with an explicit \n included.  Just make these calls match the
others that lack \n.

Reported-by: Jonathan Kamens <jik@kamens.brookline.ma.us>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:58 -04:00
Eric Paris
ddfad8affd audit: include subject in login records
The login uid change record does not include the selinux context of the
task logging in.  Add that information.

(Updated from 2011-01: RHBZ:670328 -- RGB)

Reported-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:57 -04:00
Richard Guy Briggs
aa589a13b5 audit: remove superfluous new- prefix in AUDIT_LOGIN messages
The new- prefix on ses and auid are un-necessary and break ausearch.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:57 -04:00
Richard Guy Briggs
5a3cb3b6c3 audit: allow user processes to log from another PID namespace
Still only permit the audit logging daemon and control to operate from the
initial PID namespace, but allow processes to log from another PID namespace.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
(informed by ebiederman's c776b5d2)

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:56 -04:00
Richard Guy Briggs
f1dc4867ff audit: anchor all pid references in the initial pid namespace
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:55 -04:00
Richard Guy Briggs
c92cdeb45e audit: convert PPIDs to the inital PID namespace.
sys_getppid() returns the parent pid of the current process in its own pid
namespace.  Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.

Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
init_pid_ns.

(informed by ebiederman's 6c621b7e)
Cc: stable@vger.kernel.org
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:55 -04:00
Richard Guy Briggs
4a3eb726d1 audit: rename the misleading audit_get_context() to audit_take_context()
"get" usually implies incrementing a refcount into a structure to indicate a
reference being held by another part of code.

Change this function name to indicate it is in fact being taken from it,
returning the value while clearing it in the supplying structure.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
2014-03-20 10:11:54 -04:00
Eric W. Biederman
099dd23511 audit: Send replies in the proper network namespace.
In perverse cases of file descriptor passing the current network
namespace of a process and the network namespace of a socket used by
that socket may differ.  Therefore use the network namespace of the
appropiate socket to ensure replies always go to the appropiate
socket.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-03-20 10:11:02 -04:00
Eric W. Biederman
638a0fd2a0 audit: Use struct net not pid_t to remember the network namespce to reply in
While reading through 3.14-rc1 I found a pretty siginficant mishandling
of network namespaces in the recent audit changes.

In struct audit_netlink_list and audit_reply add a reference to the
network namespace of the caller and remove the userspace pid of the
caller.  This cleanly remembers the callers network namespace, and
removes a huge class of races and nasty failure modes that can occur
when attempting to relook up the callers network namespace from a pid_t
(including the caller's network namespace changing, pid wraparound, and
the pid simply not being present).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-03-20 10:10:53 -04:00
William Roberts
3f1c82502c audit: Audit proc/<pid>/cmdline aka proctitle
During an audit event, cache and print the value of the process's
proctitle value (proc/<pid>/cmdline). This is useful in situations
where processes are started via fork'd virtual machines where the
comm field is incorrect. Often times, setting the comm field still
is insufficient as the comm width is not very wide and most
virtual machine "package names" do not fit. Also, during execution,
many threads have their comm field set as well. By tying it back to
the global cmdline value for the process, audit records will be more
complete in systems with these properties. An example of where this
is useful and applicable is in the realm of Android. With Android,
their is no fork/exec for VM instances. The bare, preloaded Dalvik
VM listens for a fork and specialize request. When this request comes
in, the VM forks, and the loads the specific application (specializing).
This was done to take advantage of COW and to not require a load of
basic packages by the VM on very app spawn. When this spawn occurs,
the package name is set via setproctitle() and shows up in procfs.
Many of these package names are longer then 16 bytes, the historical
width of task->comm. Having the cmdline in the audit records will
couple the application back to the record directly. Also, on my
Debian development box, some audit records were more useful then
what was printed under comm.

The cached proctitle is tied to the life-cycle of the audit_context
structure and is built on demand.

Proctitle is controllable by userspace, and thus should not be trusted.
It is meant as an aid to assist in debugging. The proctitle event is
emitted during syscall audits, and can be filtered with auditctl.

Example:
type=AVC msg=audit(1391217013.924:386): avc:  denied  { getattr } for  pid=1971 comm="mkdir" name="/" dev="selinuxfs" ino=1 scontext=system_u:system_r:consolekit_t:s0-s0:c0.c255 tcontext=system_u:object_r:security_t:s0 tclass=filesystem
type=SYSCALL msg=audit(1391217013.924:386): arch=c000003e syscall=137 success=yes exit=0 a0=7f019dfc8bd7 a1=7fffa6aed2c0 a2=fffffffffff4bd25 a3=7fffa6aed050 items=0 ppid=1967 pid=1971 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mkdir" exe="/bin/mkdir" subj=system_u:system_r:consolekit_t:s0-s0:c0.c255 key=(null)
type=UNKNOWN[1327] msg=audit(1391217013.924:386):  proctitle=6D6B646972002D70002F7661722F72756E2F636F6E736F6C65

Acked-by: Steve Grubb <sgrubb@redhat.com> (wrt record formating)

Signed-off-by: William Roberts <wroberts@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-03-20 10:10:52 -04:00
Srivatsa S. Bhat
c270a81719 profile: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the profile code by using this latter form of callback registration.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:48 +01:00
Srivatsa S. Bhat
d39ad278a3 trace, ring-buffer: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the tracing ring-buffer code by using this latter form of callback
registration.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:48 +01:00
Srivatsa S. Bhat
93ae4f978c CPU hotplug: Provide lockless versions of callback registration functions
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

The deadlock is shown below:

          CPU 0                                         CPU 1
          -----                                         -----

   Acquire cpu_hotplug.lock
   [via get_online_cpus()]

                                              CPU online/offline operation
                                              takes cpu_add_remove_lock
                                              [via cpu_maps_update_begin()]

   Try to acquire
   cpu_add_remove_lock
   [via register_cpu_notifier()]

                                              CPU online/offline operation
                                              tries to acquire cpu_hotplug.lock
                                              [via cpu_hotplug_begin()]

                            *** DEADLOCK! ***

The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.

This can be achieved by writing the callback registration code as follows:

	cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* This doesn't take the cpu_add_remove_lock */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_maps_update_done();  [ or cpu_notifier_register_done(); see below ]

Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.

Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.

Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().

In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:40 +01:00
Gautham R. Shenoy
a19423b987 CPU hotplug: Add lockdep annotations to get/put_online_cpus()
Add lockdep annotations for get/put_online_cpus() and
cpu_hotplug_begin()/cpu_hotplug_end().

Cc: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:40 +01:00
Rafael J. Wysocki
36cc86e8ec Merge branches 'pm-runtime' and 'pm-sleep'
* pm-runtime:
  PM / Runtime: Update runtime_idle() documentation for return value meaning

* pm-sleep:
  PM / sleep: Correct whitespace errors in <linux/pm.h>
  PM: Add missing "freeze" state
  PM / Hibernate: Spelling s/anonymouns/anonymous/
  PM / Runtime: Add missing "it" in comment
  PM / suspend: Remove unnecessary !!
  PCI / PM: Resume runtime-suspended devices later during system suspend
  ACPI / PM: Resume runtime-suspended devices later during system suspend
  PM / sleep: Set pm_generic functions to NULL for !CONFIG_PM_SLEEP
  PM: fix typo in comment
  PM / hibernate: use name_to_dev_t to parse resume
  PM / wakeup: Include appropriate header file in kernel/power/wakelock.c
  PM / sleep: Move prototype declaration to header file kernel/power/power.h
  PM / sleep: Asynchronous threads for suspend_late
  PM / sleep: Asynchronous threads for suspend_noirq
  PM / sleep: Asynchronous threads for resume_early
  PM / sleep: Asynchronous threads for resume_noirq
  PM / sleep: Two flags for async suspend_noirq and suspend_late
2014-03-20 13:25:54 +01:00
Rafael J. Wysocki
165f5fd04a Merge branches 'pm-qos', 'pm-domains' and 'pm-drivers'
* pm-qos:
  PM / QoS: Add type to dev_pm_qos_add_ancestor_request() arguments
  ACPI / LPSS: Support for device latency tolerance PM QoS
  ACPI / scan: Add bind/unbind callbacks to struct acpi_scan_handler
  PM / QoS: Introcuce latency tolerance device PM QoS type
  PM / QoS: Add no_constraints_value field to struct pm_qos_constraints
  PM / QoS: Rename device resume latency QoS items

* pm-domains:
  PM / domains: Turn latency warning into debug message

* pm-drivers:
  PM: Add pm_runtime_suspend|resume_force functions
  PM / runtime: Fetch runtime PM callbacks using a macro
2014-03-20 13:25:36 +01:00
Viresh Kumar
6201b4d61f timer: Remove code redundancy while calling get_nohz_timer_target()
There are only two users of get_nohz_timer_target(): timer and hrtimer. Both
call it under same circumstances, i.e.

	#ifdef CONFIG_NO_HZ_COMMON
	       if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
	               return get_nohz_timer_target();
	#endif

So, it makes more sense to get all this as part of get_nohz_timer_target()
instead of duplicating code at two places. For this another parameter is
required to be passed to this routine, pinned.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1e1b53537217d58d48c2d7a222a9c3ac47d5b64c.1395140107.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-20 12:35:46 +01:00
Viresh Kumar
c41eba7de1 timer: Use variable head instead of &work_list in __run_timers()
We already have a variable 'head' that points to '&work_list', and so
we should use that instead wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/0d8645a6efc8360c4196c9797d59343abbfdcc5e.1395129136.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-20 12:35:45 +01:00
Linus Torvalds
7c3895383f Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "One really late cgroup patch to fix error path in create_css().
  Hitting this bug would be pretty rare but still possible and it gets
  delayed we'd need to backport it through -stable anyway.  It only
  updates error path in create_css() and has low chance of new
  breakages"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix a failure path in create_css()
2014-03-19 16:13:40 -07:00
Tejun Heo
1b9aba49ea cgroup: fix cgroup_taskset walking order
cgroup_taskset is used to track and iterate target tasks while
migrating a task or process and should guarantee that the first task
iterated is the task group leader if a process is being migrated.

b3dc094e93 ("cgroup: use css_set->mg_tasks to track target tasks
during migration") replaced flex array cgroup_taskset->tc_array with
css_set->mg_tasks list to remove process size limit and dynamic
allocation during migration; unfortunately, it incorrectly used list
operations which don't preserve order breaking the guarantee that
cgroup_taskset_first() returns the leader for a process target.

Fix it by using order preserving list operations.  Note that as
multiple src_csets may map to a single dst_cset, the iteration order
may change across cgroup_task_migrate(); however, the leader is still
guaranteed to be the first entry.

The switch to list_splice_tail_init() at the end of cgroup_migrate()
isn't strictly necessary.  Let's still do it for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-19 17:43:21 -04:00
Bjorn Helgaas
6404e88e83 resources: Set type in __request_region()
We don't set the type (I/O, memory, etc.) of resources added by
__request_region(), which leads to confusing messages like this:

    address space collision: [io  0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000]

Set the type of a new resource added by __request_region() (used by
request_region() and request_mem_region()) to the type of its parent.  This
makes the resource tree internally consistent and fixes messages like the
above, where the ACPI CPU throttle resource really is an I/O port region,
but request_region() didn't fill in the type, so %pR didn't know how to
print it.

Sample dmesg showing the issue at the link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-03-19 15:00:16 -06:00
Tejun Heo
8cbbf2c972 cgroup: implement CFTYPE_ONLY_ON_DFL
This cftype flag makes the file only appear on the default hierarchy.
This will later be used for cgroup.controllers file.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:55 -04:00
Tejun Heo
a2dd424750 cgroup: make cgrp_dfl_root mountable
cgrp_dfl_root will be used as the default unified hierarchy.  This
patch makes cgrp_dfl_root mountable by making the following changes.

* cgroup_init_early() now initializes cgrp_dfl_root w/
  CGRP_ROOT_SANE_BEHAVIOR.  The default hierarchy is always sane.

* parse_cgroupfs_options() and cgroup_mount() are updated such that
  cgrp_dfl_root is mounted if sane_behavior is specified w/o any
  subsystems.

* rebind_subsystems() now populates the root directory of
  cgrp_dfl_root.  Note that the function still guarantees success of
  rebinding subsystems to cgrp_dfl_root.  If populating fails while
  rebinding to cgrp_dfl_root, it whines but ignores the error.

* For backward compatibility, the default hierarchy shows up in
  /proc/$PID/cgroup only after it's explicitly mounted so that
  userland which doesn't make use of it doesn't see any change.

* "current_css_set_cg_links" file of debug cgroup now treats the
  default hierarchy the same as other hierarchies.  This is visible to
  userland.  Given that it's for debug controller, this should be
  fine.

* While at it, implement cgroup_on_dfl() which tests whether a give
  cgroup is on the default hierarchy or not.

The above changes make cgrp_dfl_root mostly equivalent to other
controllers but the actual unified hierarchy behaviors are not
implemented yet.  Let's plug child cgroup creation in cgrp_dfl_root
from create_cgroup() for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:55 -04:00
Tejun Heo
4d3bb511b5 cgroup: drop const from @buffer of cftype->write_string()
cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add const restriction on the buffer.  The
only thing const achieves is unnecessarily complicating parsing of the
buffer.  Drop const from @buffer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>                                           
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2014-03-19 10:23:54 -04:00
Tejun Heo
3dd06ffa9d cgroup: rename cgroup_dummy_root and related names
The dummy root will be repurposed to serve as the default unified
hierarchy.  Let's rename things in preparation.

* s/cgroup_dummy_root/cgrp_dfl_root/
* s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore
* s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity

This is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:54 -04:00
Tejun Heo
944196278d cgroup: move ->subsys_mask from cgroupfs_root to cgroup
cgroupfs_root->subsys_mask represents the controllers attached to the
hierarchy.  This patch moves the field to cgroup.  Subsystem
initialization and rebinding updates the top cgroup's subsys_mask.
For !root cgroups, the subsys_mask bits are set from create_css() and
cleared from kill_css(), which effectively means that all cgroups will
have the same subsys_mask as the top cgroup.

While this doesn't make any difference now, this will help
implementation of the default unified hierarchy where !root cgroups
may have subsets of the top_cgroup's subsys_mask.

While at it, __kill_css() is split out of kill_css().  The former
doesn't care about the subsys_mask while the latter becomes noop if
the controller is already killed and clears the matching bit if not
before proceeding to killing the css.  This will be used later by the
default unified hierarchy implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:54 -04:00
Tejun Heo
5df3603229 cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
Currently, while rebinding, cgroup_dummy_root serves as the anchor
point.  In addition to the target root, rebind_subsystems() takes
@added_mask and @removed_mask.  The subsystems specified in the former
are expected to be on the dummy root and then moved to the target
root.  The ones in the latter are moved from non-dummy root to dummy.
Now that the dummy root is a fully functional one and we're planning
to use it for the default unified hierarchy, this level of distinction
between dummy and non-dummy roots is quite awkward.

This patch updates rebind_subsystems() to take the target root and one
subsystem mask and move the specified subsystmes to the target root
which may or may not be the dummy root.  IOW, unbinding now becomes
moving the subsystems to the dummy root and binding to non-dummy root.
This makes the dummy root mostly equivalent to other hierarchies in
terms of the mechanism of moving subsystems around; however, we still
retain all the semantical restrictions so that this patch doesn't
introduce any visible behavior differences.  Another noteworthy detail
is that rebind_subsystems() guarantees that moving a subsystem to the
dummy root never fails so that valid unmounting attempts always
succeed.

This unifies binding and unbinding of subsystems.  The invocation
points of ->bind() were inconsistent between the two and now moved
after whole rebinding is complete.  This doesn't break the current
users and generally makes more sense.

All rebind_subsystems() users are converted accordingly.  Note that
cgroup_remount() now makes two calls to rebind_subsystems() to bind
and then unbind the requested subsystems.

This will allow repurposing of the dummy hierarchy as the default
unified hierarchy and shouldn't make any userland visible behavior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:54 -04:00
Tejun Heo
985ed67014 cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
cgroup_dummy_root is used to host controllers which aren't attached to
any other hierarchy.  The root is minimally set up during kernfs
bootstrap and didn't go through full hierarchy initialization.  We're
planning to use cgroup_dummy_root for the default unified hierarchy
and thus want it to be fully functional.

Replace the special initialization, which was collected into
cgroup_init() by the previous patch, with an invocation of
cgroup_setup_root().  This simplifies the init path and makes
cgroup_dummy_root a full hierarchy with its own kernfs_root and all.

As this puts the dummy hierarchy on the cgroup_roots list, rename
for_each_active_root() to for_each_root() and update its users to skip
the dummy root for now.

This patch doesn't cause any userland visible behavior changes at this
point.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:53 -04:00
Tejun Heo
172a2c0685 cgroup: reorganize cgroup bootstrapping
* Fields of init_css_set and css_set_count are now set using
  initializer instead of programmatically from cgroup_init_early().

* init_cgroup_root() now also takes @opts and performs the optional
  part of initialization too.  The leftover part of
  cgroup_root_from_opts() is collapsed into its only caller -
  cgroup_mount().

* Initialization of cgroup_root_count and linking of init_css_set are
  moved from cgroup_init_early() to to cgroup_init().  None of the
  early_init users depends on init_css_set being linked.

* Subsystem initializations are moved after dummy hierarchy init and
  init_css_set linking.

These changes reorganize the bootstrap logic so that the dummy
hierarchy can share the usual hierarchy init path and be made more
normal.  These changes don't make noticeable behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:53 -04:00
Tejun Heo
5d77381fd8 cgroup: relocate setting of CGRP_DEAD
In cgroup_destroy_locked(), move setting of CGRP_DEAD above
invocations of kill_css().  This doesn't make any visible behavior
difference now but will be used to inhibit manipulating controller
enable states of a dying cgroup on the unified hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19 10:23:53 -04:00
Chema Gonzalez
bab5c790cc genirq: procfs: Make smp_affinity values go+r
Includes:
- /proc/irq/default_smp_affinity
- /proc/irq/*/affinity_hint
- /proc/irq/*/smp_affinity
- /proc/irq/*/smp_affinity_list

Users can distill the same information by reading /proc/interrupts.

Signed-off-by: Chema Gonzalez <chema@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: http://lkml.kernel.org/r/1394765455-1217-1-git-send-email-chema@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-19 12:34:29 +01:00
Thomas Gleixner
d532676cc7 softirq: Add linux/irq.h to make it compile again
On Sparc and S390 the removal of irq.h from kernel_stat.h causes:

   kernel/softirq.c:774:9: error: 'NR_IRQS_LEGACY' undeclared

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-19 11:28:14 +01:00
Li Zefan
3eb59ec64f cgroup: fix a failure path in create_css()
If online_css() fails, we should remove cgroup files belonging
to css->ss.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-18 17:15:36 -04:00
David A. Long
6fe50a28ba uprobes: allow ignoring of probe hits
Allow arches to decided to ignore a probe hit.  ARM will use this to
only call handlers if the conditions to execute a conditionally executed
instruction are satisfied.

Signed-off-by: David A. Long <dave.long@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-18 16:39:34 -04:00
David A. Long
09294e31b1 uprobes: Kconfig dependency fix
Suggested change from Oleg Nesterov. Fixes incomplete dependencies
for uprobes feature.

Signed-off-by: David A. Long <dave.long@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-18 16:39:33 -04:00
Linus Torvalds
59bf6c3c6c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Three small fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Prevent tracing recursion in sched_clock_cpu()
  stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
  sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
2014-03-16 10:42:07 -07:00