Commit Graph

442569 Commits

Author SHA1 Message Date
Dave Chinner
43ec1460a2 xfs: xfs_dir_fsync() returns positive errno
And it should be negative.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-05-15 09:21:11 +10:00
Helge Deller
042d27acb6 parisc,metag: Do not hardcode maximum userspace stack size
This patch affects only architectures where the stack grows upwards
(currently parisc and metag only). On those do not hardcode the maximum
initial stack size to 1GB for 32-bit processes, but make it configurable
via a config option.

The main problem with the hardcoded stack size is, that we have two
memory regions which grow upwards: stack and heap. To keep most of the
memory available for heap in a flexmap memory layout, it makes no sense
to hard allocate up to 1GB of the memory for stack which can't be used
as heap then.

This patch makes the stack size for 32-bit processes configurable and
uses 80MB as default value which has been in use during the last few
years on parisc and which hasn't showed any problems yet.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin <dave.anglin@bell.net>
2014-05-15 00:01:41 +01:00
James Hogan
d71f290b4e metag: Reduce maximum stack size to 256MB
Specify the maximum stack size for arches where the stack grows upward
(parisc and metag) in asm/processor.h rather than hard coding in
fs/exec.c so that metag can specify a smaller value of 256MB rather than
1GB.

This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased
beyond a safe value by root. E.g. when starting a process after running
"ulimit -H -s unlimited" it will then attempt to use a stack size of the
maximum 1GB which is far too big for metag's limited user virtual
address space (stack_top is usually 0x3ffff000):

BUG: failure at fs/exec.c:589/shift_arg_pages()!

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # only needed for >= v3.9 (arch/metag)
2014-05-15 00:00:35 +01:00
Mikulas Patocka
2425ce8402 metag: fix memory barriers
Volatile access doesn't really imply the compiler barrier. Volatile access
is only ordered with respect to other volatile accesses, it isn't ordered
with respect to general memory accesses. Gcc may reorder memory accesses
around volatile access, as we can see in this simple example (if we
compile it with optimization, both increments of *b will be collapsed to
just one):

void fn(volatile int *a, long *b)
{
	(*b)++;
	*a = 10;
	(*b)++;
}

Consequently, we need the compiler barrier after a write to the volatile
variable, to make sure that the compiler doesn't reorder the volatile
write with something else.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
2014-05-15 00:00:34 +01:00
Mike Snitzer
4cdd2ad780 dm mpath: fix lock order inconsistency in multipath_ioctl
Commit 3e9f1be1b4 ("dm mpath: remove process_queued_ios()") did not
consistently take the multipath device's spinlock (m->lock) before
calling dm_table_run_md_queue_async() -- which takes the q->queue_lock.

Found with code inspection using hint from reported lockdep warning.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-05-14 16:12:17 -04:00
Joe Thornber
85ad643b7e dm thin: add timeout to stop out-of-data-space mode holding IO forever
If the pool runs out of data space, dm-thin can be configured to
either error IOs that would trigger provisioning, or hold those IOs
until the pool is resized.  Unfortunately, holding IOs until the pool is
resized can result in a cascade of tasks hitting the hung_task_timeout,
which may render the system unavailable.

Add a fixed timeout so IOs can only be held for a maximum of 60 seconds.
If LVM is going to resize a thin-pool that is out of data space it needs
to be prompt about it.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
2014-05-14 16:11:37 -04:00
Joe Thornber
8d07e8a5f5 dm thin: allow metadata commit if pool is in PM_OUT_OF_DATA_SPACE mode
Commit 3e1a0699 ("dm thin: fix out of data space handling") introduced
a regression in the metadata commit() method by returning an error if
the pool is in PM_OUT_OF_DATA_SPACE mode.  This oversight caused a thin
device to return errors even if the default queue_if_no_space ENOSPC
handling mode is used.

Fix commit() to only fail if pool is in PM_READ_ONLY or PM_FAIL mode.

Reported-by: qindehua@163.com
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
2014-05-14 16:11:36 -04:00
Mikulas Patocka
610f2de355 dm crypt: fix cpu hotplug crash by removing per-cpu structure
The DM crypt target used per-cpu structures to hold pointers to a
ablkcipher_request structure.  The code assumed that the work item keeps
executing on a single CPU, so it didn't use synchronization when
accessing this structure.

If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online,
the work item could be moved to another CPU.  This causes dm-crypt
crashes, like the following, because the code starts using an incorrect
ablkcipher_request:

 smpboot: CPU 7 is now offline
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000130
 IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt]
 ...
 Call Trace:
  [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt]
  [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0
  [<ffffffff81052a28>] ? process_one_work+0x168/0x470
  [<ffffffff8105366b>] ? worker_thread+0x10b/0x390
  [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290
  [<ffffffff81058d9f>] ? kthread+0xaf/0xc0
  [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120
  [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0
  [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120

Fix this bug by removing the per-cpu definition.  The structure
ablkcipher_request is accessed via a pointer from convert_context.
Consequently, if the work item is rescheduled to a different CPU, the
thread still uses the same ablkcipher_request.

This change may undermine performance improvements intended by commit
c0297721 ("dm crypt: scale to multiple cpus") on select hardware.  In
practice no performance difference was observed on recent hardware.  But
regardless, correctness is more important than performance.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-05-14 16:11:35 -04:00
Heiko Carstens
e84d2f8d2a net: filter: s390: fix JIT address randomization
This is the s390 variant of Alexei's JIT bug fix.
(patch description below stolen from Alexei's patch)

bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():

kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
 [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
 [<ffffffff8106c90c>] worker_thread+0x11c/0x370

since bpf_jit_free() does:
  unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
  struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
  set_memory_rw(addr, header->pages);

Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page.

Fixes: aa2d2c73c2 ("s390/bpf,jit: address randomize and write protect jit code")

Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: <stable@vger.kernel.org> # v3.11+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 16:10:16 -04:00
Rajkumar Manoharan
faf1dc64e3 ath9k_htc: Stop ANI before doing hw_reset
During remain on channel request, ANI worker thread is not stopped
before doing hw reset. This is causing kernel crash in
hw_per_calibration. This change ensures that ANI is stopped before
doing chip reset and it will be rescheduled later when the chip is
configured back to home channel and having valid bss.

Reported-by: David Herrmann <dh.herrmann@gmail.com>
Tested-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-05-14 15:42:46 -04:00
John W. Linville
eac94da8b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2014-05-14 15:39:45 -04:00
Ursula Braun
f5738e2ef8 af_iucv: wrong mapping of sent and confirmed skbs
When sending data through IUCV a MESSAGE COMPLETE interrupt
signals that sent data memory can be freed or reused again.
With commit f9c41a62bb
"af_iucv: fix recvmsg by replacing skb_pull() function" the
MESSAGE COMPLETE callback iucv_callback_txdone() identifies
the wrong skb as being confirmed, which leads to data corruption.
This patch fixes the skb mapping logic in iucv_callback_txdone().

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:38:39 -04:00
Kalesh AP
03a58baa78 be2net: enable interrupts in EEH resume
On some BE3 FW versions, after a HW reset, interrupts will remain disabled
for each function. So, explicitly enable the interrupts in the eeh_resume
handler, else after an eeh recovery interrupts wouldn't work.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:36:39 -04:00
Neil Horman
c4b160685f jme: Fix unmap loop counting error:
In my recent fix (76a691d0a: fix dma unmap warning), Ben Hutchings noted that my
loop count was incorrect.  Where j started at startidx, it should have started
at zero, and gone on for count entries, not to endidx.  Additionally, a DMA
resource exhaustion should drop the frame and (for now), return
NETDEV_TX_OK, not NETEV_TX_BUSY.  This patch fixes both of those issues:

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Ben Hutchings <ben@decadent.org.uk>
CC: "David S. Miller" <davem@davemloft.net>
CC: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 15:11:22 -04:00
Wolfram Sang
d7653964c5 i2c: rcar: bail out on zero length transfers
This hardware does not support zero length transfers. Instead, the
driver does one (random) byte transfers currently with undefined results
for the slaves. We now bail out.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
2014-05-14 18:59:57 +02:00
Andy Gross
fa01d096bf i2c: qup: Fix pm_runtime_get_sync usage
This patch corrects the error check on the call to pm_runtime_get_sync.

Signed-off-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Ivan T. Ivanov <iivanov@mm-sol.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-05-14 18:14:49 +02:00
Olof Johansson
ce78cc071f i2c: s3c2410: resume race fix
Don't unmark the device as suspended until after it's been re-setup.

The main race would be w.r.t. an i2c driver that gets resumed at the same
time (asyncronously), that is allowed to do a transfer since suspended
is set to 0 before reinit, but really should have seen the -EIO return
instead.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
2014-05-14 18:14:43 +02:00
Ulf Hansson
37e5eb0bae i2c: nomadik: Don't use IS_ERR for devm_ioremap
devm_ioremap() returns NULL on error, not an error.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
2014-05-14 18:14:35 +02:00
Du, Wenkai
47bb27e788 i2c: designware: Mask all interrupts during i2c controller enable
There have been "i2c_designware 80860F41:00: controller timed out" errors
on a number of Baytrail platforms. The issue is caused by incorrect value in
Interrupt Mask Register (DW_IC_INTR_MASK)  when i2c core is being enabled.
This causes call to __i2c_dw_enable() to immediately start the transfer which
leads to timeout. There are 3 failure modes observed:

1. Failure in S0 to S3 resume path

The default value after reset for DW_IC_INTR_MASK is 0x8ff. When we start
the first transaction after resuming from system sleep, TX_EMPTY interrupt
is already unmasked because of the hardware default.

2. Failure in normal operational path

This failure happens rarely and is hard to reproduce. Debug trace showed that
DW_IC_INTR_MASK had value of 0x254 when failure occurred, which meant
TX_EMPTY was unmasked.

3. Failure in S3 to S0 suspend path

This failure also happens rarely and is hard to reproduce. Adding debug trace
that read DW_IC_INTR_MASK made this failure not reproducible. But from ISR
call trace we could conclude TX_EMPTY was unmasked when problem occurred.

The patch masks all interrupts before the controller is enabled to resolve the
faulty DW_IC_INTR_MASK conditions.

Signed-off-by: Wenkai Du <wenkai.du@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[wsa: improved the comment and removed typo in commit msg]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
2014-05-14 18:14:20 +02:00
Steven J. Hill
7bb3940940 MIPS: mm: Fix broken microMIPS kernel regression.
Commit f4ae17aa0f [MIPS: mm: Use scratch for
PGD when !CONFIG_MIPS_PGD_C0_CONTEXT] broke microMIPS kernel builds. This
patch refactors that code similar to what was done for the 'clear_page'
and 'copy_page' functions.

Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6744/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-05-14 18:11:06 +02:00
Dan Carpenter
665ebe926e ALSA: sb_mixer: missing return statement
The if condition here was supposed to return on error but the return
statement is missing.  The effect is that the ->mixername is set to
"???" instead of "DT019X".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-05-14 16:46:48 +02:00
Thomas Petazzoni
582da6527d of: make of_update_property() usable earlier in the boot process
Commit 75b57ecf9d ('of: Make device
nodes kobjects so they show up in sysfs') has turned Device Tree nodes
in kobjects and added a sysfs based representation for Device Tree
nodes. Since the sysfs logic is only available after the execution of
a core_initcall(), the patch took precautions in of_add_property() and
of_remove_property() to not do any sysfs related manipulation early in
the boot process.

However, it forgot to do the same for of_update_property(), which if
used early in the boot process (before core_initcalls have been
called), tries to call sysfs_remove_bin_file(), and crashes:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /home/thomas/projets/linux-2.6/fs/kernfs/dir.c:1216 kernfs_remove_by_name_ns+0x80/0x88()
kernfs: can not remove '(null)', no directory
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc1-00127-g1d7e7b2-dirty #423
[<c0014910>] (unwind_backtrace) from [<c00110ec>] (show_stack+0x10/0x14)
[<c00110ec>] (show_stack) from [<c04c84b8>] (dump_stack+0x84/0x94)
[<c04c84b8>] (dump_stack) from [<c001d8c0>] (warn_slowpath_common+0x6c/0x88)
[<c001d8c0>] (warn_slowpath_common) from [<c001d90c>] (warn_slowpath_fmt+0x30/0x40)
[<c001d90c>] (warn_slowpath_fmt) from [<c0104468>] (kernfs_remove_by_name_ns+0x80/0x88)
[<c0104468>] (kernfs_remove_by_name_ns) from [<c0394d98>] (of_update_property+0xc0/0xf0)
[<c0394d98>] (of_update_property) from [<c0647248>] (mvebu_timer_and_clk_init+0xfc/0x194)
[<c0647248>] (mvebu_timer_and_clk_init) from [<c0640934>] (start_kernel+0x218/0x350)
[<c0640934>] (start_kernel) from [<00008070>] (0x8070)
---[ end trace 3406ff24bd97382e ]---
Unable to handle kernel NULL pointer dereference at virtual address 0000003c
pgd = c0004000
[0000003c] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W     3.15.0-rc1-00127-g1d7e7b2-dirty #423
task: c10ad4d8 ti: c10a2000 task.ti: c10a2000
PC is at kernfs_find_ns+0x8/0xf0
LR is at kernfs_find_and_get_ns+0x30/0x48
pc : [<c0103834>]    lr : [<c010394c>]    psr: 600001d3
sp : c10a3f34  ip : 00000073  fp : 00000000
r10: 00000000  r9 : cfffc240  r8 : cfdf2980
r7 : cf812c00  r6 : 00000000  r5 : 00000000  r4 : c10b45e0
r3 : c10ad4d8  r2 : 00000000  r1 : cf812c00  r0 : 00000000
Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 0000404a  DAC: 00000015
Process swapper/0 (pid: 0, stack limit = 0xc10a2240)
Stack: (0xc10a3f34 to 0xc10a4000)
3f20:                                              c10b45e0 00000000 00000000
3f40: cf812c00 c010394c 00000063 cf812c00 00000001 cf812c00 cfdf29ac c03932cc
3f60: 00000063 cf812bc0 cfdf29ac cf812c00 ffffffff c03943f8 cfdf2980 c0104468
3f80: cfdf2a04 cfdf2980 cf812bc0 c06634b0 c10aa3c0 c0394da4 c10f74dc cfdf2980
3fa0: cf812bc0 c0647248 c10aa3c0 ffffffff c10de940 c10aa3c0 ffffffff c0640934
3fc0: ffffffff ffffffff c06404ec 00000000 00000000 c06634b0 00000000 10c53c7d
3fe0: c10aa434 c06634ac c10ae4c8 0000406a 414fc091 00008070 00000000 00000000
[<c0103834>] (kernfs_find_ns) from [<00000001>] (0x1)
Code: e5c89001 eaffffcf e92d40f0 e1a06002 (e1d023bc)
---[ end trace 3406ff24bd97382f ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!

To fix this problem, we simply skip the sysfs related calls in
of_update_property(), and rely on of_init() to fix up things when it
will be called, exactly as is done in of_add_property() and
of_remove_property().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: 75b57ecf9d ("of: Make device nodes kobjects so they show up in sysfs")
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2014-05-14 15:27:36 +01:00
Johannes Berg
b4b177a555 mac80211: fix on-channel remain-on-channel
Jouni reported that if a remain-on-channel was active on the
same channel as the current operating channel, then the ROC
would start, but any frames transmitted using mgmt-tx on the
same channel would get delayed until after the ROC.

The reason for this is that the ROC starts, but doesn't have
any handling for "remain on the same channel", so it stops
the interface queues. The later mgmt-tx then puts the frame
on the interface queues (since it's on the current operating
channel) and thus they get delayed until after the ROC.

To fix this, add some logic to handle remaining on the same
channel specially and not stop the queues etc. in this case.
This not only fixes the bug but also improves behaviour in
this case as data frames etc. can continue to flow.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-14 15:48:38 +02:00
Takashi Iwai
ff2354bc6e ASoC: Intel fixes for v3.15
This is a relatively large batch of fixes for the newly added
 Haswell/Baytrail drivers from Intel.  It's a bit larger than is good for
 this point in the cycle but it's all for a newly added driver so not so
 worrying as it might otherwise be.  Some of it's integration problems,
 some of it's the sort of problem usually turned up in stress tests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTc1YrAAoJELSic+t+oim9losP/0I50EKB6hrEugvFrNT+vBkR
 OQxhcj+WGgIguVQNqaHPBr3rQ1iGG54C5Yf1KDSi2jNIjkHSTIJ1bT2QB40rmclT
 14V6LZJL481TadcKcjLux1jgmaMvNO5Fa1gqYdLLVZFUYP/yTCLhqYrVIDO7NC6M
 VCKtZ965l4u6TMJACTdz9MzCbAwlhb7OTpgxIiyVQlE4SPq2D7M8W9IceLjiDzGg
 rviGK7MejMxc2b4i2vJGi4msaqK8aazDmMhoqrI+HZr/6pZgWkvKJ2S/zZf8AdEH
 6KfEQ8vPU/ag8M4UoH72JJtn1gsjphkEY8GyNWCvP7fIXnpcYB34c/cyaaMQ6lGN
 fGPfwQpbNobx3sJsPMRj21kFgy61rXM2PcbA4QEhPniHd6UlVPUgjkxBNE2YVM3d
 0+tWtgzWVT10F10fcKIkw00/gDohBK+4rViAu0VK5Ml90F0eYLeITWYFVyjmU52f
 7Q+0Duwm65LsA4hqFKcH5lRbk6IM29yRte/wGYY8mLWODADO0+cU6WmVRDHTlRFo
 35HidtfY8EGcU+rS24XyGd0+GRoO+nghzKTckoD3z9OCPilePkMb/dD5vC7NvmMO
 5Q15VbxmJgsus0aTDOPD634XoVTlQ/ESBA6bj5APylZorNKyANorJBphn4rBfZV+
 GXtzzjFrllDCOpKtjDzB
 =ROtF
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v3.15-rc5-intel' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Intel fixes for v3.15

This is a relatively large batch of fixes for the newly added
Haswell/Baytrail drivers from Intel.  It's a bit larger than is good for
this point in the cycle but it's all for a newly added driver so not so
worrying as it might otherwise be.  Some of it's integration problems,
some of it's the sort of problem usually turned up in stress tests.
2014-05-14 14:27:12 +02:00
Takashi Iwai
7ca33c7a1d ASoC: Driver fixes for v3.15
A small set of driver fixes, nothing remarkable in itself or of any
 relevance outside of the driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTc1hlAAoJELSic+t+oim9H/sP/A6r/9/qBesBiBvK7q+jXKNo
 FW6M6zM2lZpLuAnUlSLNC6nb6e6nQYTvvCV4VIXfuj8r8NPMmxwdreg5Vp2fJdAk
 0Qlr3l025B4VmPxqik5wA04jE74G8BW8ttXHpWxxOXdceaUJ4uDzryJBuCpV3vJk
 +2gprKcnR4/wgqpzkGBAwNfurkfXOxpqB8DfPj+FcfRdWyxTqFtvZi+9dSwm2bLv
 T7z8alaMcG/ZpkH7nF+Q4Vu/0tJhgyHIVRTFsJvBKNYVr8cdGi+VDq++AE/62YOE
 v6EGpsrjn243AA8UDBnCznVVZcnwhSf33OpBg44DHb0/1J343x5r4bsGlQkb+19V
 FkBBnhiozWRwVjTBU/E6ss5eua+ESNqL6/EFLpCrD7ykduNOxC/qyKJxcWmmYq9K
 B7Z3tWjX/EzdQ/tMEt26MTnXWSmtk/Yq8dg2UDkdF5r4zBsGyNgHwpi1nbYDCNcm
 alBF7TKkhCV5woJcF+ygG7oFarnQnjSL+J+LTuKRo1gn6k4/NWaVmb/8K9pkN5wS
 hZhL12rjySVXcijrbwVJC9HfFYCtGDjJeTC0ifGQ9jTvEan/Hwi/vwk5XocTywbF
 PglDm9ygwyLAC3A602RWhd/WdgLdmfCYuyKJEcnlnVlkEFmpa5LyJ7wSi6qYYmAk
 j+xPZI2ROuKtebPAP6Es
 =0trh
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v3.15-rc5-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Driver fixes for v3.15

A small set of driver fixes, nothing remarkable in itself or of any
relevance outside of the driver.
2014-05-14 14:24:09 +02:00
Takashi Iwai
927cdab3b6 ASoC: Core fixes for v3.15
A few things here:
 
  - Fix the creation of spurious CODEC<->CODEC links which caused DAPM to
    have audio paths which shouldn't be present causing spurious powerups
    and potential audible issues for users.
  - Ensure the suspend->off transition doesn't have spurious transitions
    to prepare added to the sequence.
  - Fix incorrect skipping of PCM suspension for active audio streams.
  - Remove Timur Tabi from the CS4270 maintainers, Cirrus are now doing
    this and Timur no longer has the boards that he was using.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTc1qXAAoJELSic+t+oim9rQsP/0plEA4KmnSUAD5l+LnFLZVI
 l4PrbTM9BrjdNpGk/dn0yeF9xAmKvlEa6hcclhPMSF5QIVWUxK5wiNZ4ZN1W64as
 VQsh9XHnD0AHSCD1v7ugz3vh4bvDk2cQkQUsJb9CS4Eh4Dgt1T2bjk74K3AnUBCV
 3yDkGW+15Yumo7WW8TKB1Qd7fIsuo95qua+caC1btnFz2VLWkdWixZ5D/t7tp4G2
 SriITNMqUF6gT0RWCue9sDKyfMkCN8tIOh5mvckHEYWLC5/pSgi7zKabDulUayS9
 GgG8mQIto49LL6NmzeyzBsDlf8Gk0O50BZOrEHvHQWw4dMiQ6ml8NTTxuz6oynzE
 k/b1aBUlnOf7wFHA1ILIgAHq3rMah9+/XVkxnHqPBxmP9IgIZoL//rc3DwCtqMbs
 CiIgHOPONdfiHtxMPJwCiBNqfQKDCerYVq4dmZTwU3m99Zn1keFKSZ2dcEWJK02S
 s3kQlYQ6sStpGjPrlrDbS7UcNtX+pqSI2c46GXHHRsLZbAAhHe2kwQ2y/Iry0ntc
 eh1ztL8FLZylEiXmYWjC2Sx5azKOhWVMJGdlKBbX3CNtxFTKzG0NXwHjXJXH72Y3
 Zm8SS9lK4uuJz4IKr7k2RHXETxFb1vdFlijYN5VvkGmYBoGo69G/dFDv6QAg0Ow/
 ANnqLBtrgKir3q49T2YQ
 =s9u3
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v3.15-rc5-core' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Core fixes for v3.15

A few things here:

 - Fix the creation of spurious CODEC<->CODEC links which caused DAPM to
   have audio paths which shouldn't be present causing spurious powerups
   and potential audible issues for users.
 - Ensure the suspend->off transition doesn't have spurious transitions
   to prepare added to the sequence.
 - Fix incorrect skipping of PCM suspension for active audio streams.
 - Remove Timur Tabi from the CS4270 maintainers, Cirrus are now doing
   this and Timur no longer has the boards that he was using.
2014-05-14 14:23:48 +02:00
Mark Brown
cf86197ec5 Merge remote-tracking branch 'asoc/fix/pcm' into asoc-linus 2014-05-14 12:52:41 +01:00
Mark Brown
0d1203f291 Merge remote-tracking branch 'asoc/fix/dapm' into asoc-linus 2014-05-14 12:52:32 +01:00
Mark Brown
f9a405961e Merge remote-tracking branches 'asoc/fix/audmux', 'asoc/fix/cs42l52', 'asoc/fix/fsl-esai', 'asoc/fix/fsl-spdif', 'asoc/fix/rcar', 'asoc/fix/tlv320aic31xx' and 'asoc/fix/wm8962' into asoc-linus 2014-05-14 12:49:10 +01:00
Hannes Frederic Sowa
3a1cebe7e0 ipv6: fix calculation of option len in ip6_append_data
tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
opt_nflen to calculate the overall length of additional ipv6 extensions.

I found this while auditing the ipv6 output path for a memory corruption
reported by Alexey Preobrazhensky while he fuzzed an instrumented
AddressSanitizer kernel with trinity. This may or may not be the cause
of the original bug.

Fixes: 4df98e76cd ("ipv6: pmtudisc setting not respected with UFO/CORK")
Reported-by: Alexey Preobrazhensky <preobr@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 00:40:27 -04:00
Hannes Frederic Sowa
3d4405226d net: avoid dependency of net_get_random_once on nop patching
net_get_random_once depends on the static keys infrastructure to patch up
the branch to the slow path during boot. This was realized by abusing the
static keys api and defining a new initializer to not enable the call
site while still indicating that the branch point should get patched
up. This was needed to have the fast path considered likely by gcc.

The static key initialization during boot up normally walks through all
the registered keys and either patches in ideal nops or enables the jump
site but omitted that step on x86 if ideal nops where already placed at
static_key branch points. Thus net_get_random_once branches not always
became active.

This patch switches net_get_random_once to the ordinary static_key
api and thus places the kernel fast path in the - by gcc considered -
unlikely path.  Microbenchmarks on Intel and AMD x86-64 showed that
the unlikely path actually beats the likely path in terms of cycle cost
and that different nop patterns did not make much difference, thus this
switch should not be noticeable.

Fixes: a48e42920f ("net: introduce new macro net_get_random_once")
Reported-by: Tuomas Räsänen <tuomasjjrasanen@tjjr.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 00:37:34 -04:00
Markos Chandras
f2d0801f00 MIPS: Add new AUDIT_ARCH token for the N32 ABI on MIPS64
A MIPS64 kernel may support ELF files for all 3 MIPS ABIs
(O32, N32, N64). Furthermore, the AUDIT_ARCH_MIPS{,EL}64 token
does not provide enough information about the ABI for the 64-bit
process. As a result of which, userland needs to use complex
seccomp filters to decide whether a syscall belongs to the o32 or n32
or n64 ABI. Therefore, a new arch token for MIPS64/n32 is added so it
can be used by seccomp to explicitely set syscall filters for this ABI.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Paul Moore <pmoore@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mips@linux-mips.org
Link: http://sourceforge.net/p/libseccomp/mailman/message/32239040/
Patchwork: https://patchwork.linux-mips.org/patch/6818/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-05-14 01:39:54 +02:00
Anthony Iliopoulos
9844f54623 x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-13 16:34:09 -07:00
Guennadi Liakhovetski
97d9d23dda [media] V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode
If a struct contains 64-bit fields, it is aligned on 64-bit boundaries
within containing structs in 64-bit compilations. This is the case with
struct v4l2_window, which contains pointers and is embedded into struct
v4l2_format, and that one is embedded into struct v4l2_create_buffers.
Unlike some other structs, used as a part of the kernel ABI as ioctl()
arguments, that are packed, these structs aren't packed. This isn't a
problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains
a bug, that triggers in such 64-bit builds. That code wrongly assumes,
that in struct v4l2_create_buffers, struct v4l2_format immediately follows
the __u32 memory field, which in fact isn't the case. This bug wasn't
visible until now, because until recently hardly any applications used
this ioctl() and mostly embedded 32-bit only drivers implemented it. This
is changing now with addition of this ioctl() to some USB drivers, e.g.
UVC. This patch fixes the bug by copying parts of struct
v4l2_create_buffers separately.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: stable@vger.kernel.org
2014-05-13 20:03:31 -03:00
Guennadi Liakhovetski
cfece5857c [media] V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space
Commit 75e2bdad89 "ov7670: allow
configuration of image size, clock speed, and I/O method" uses a wrong
index to iterate an array. Apart from being wrong, it also uses an
unchecked value from user-space, which can cause access to unmapped
memory in the kernel, triggered by a normal desktop user with rights to
use V4L2 devices.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-05-13 20:03:02 -03:00
Alexei Starovoitov
773cd38f40 net: filter: x86: fix JIT address randomization
bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():

kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
 [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
 [<ffffffff8106c90c>] worker_thread+0x11c/0x370

since bpf_jit_free() does:
  unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
  struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
  set_memory_rw(addr, header->pages);

Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page

Fixes: 314beb9bca ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 18:31:13 -04:00
Jason Cooper
2ed9fd28c2 MAINTAINERS: Add co-maintainer for drivers/irqchip
Thomas Gleixner has asked me to assist with the review and merging of
patches for the irqchip subsystem.

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1400006821-32145-1-git-send-email-jason@lakedaemon.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-13 21:10:55 +02:00
Vikas Chaudhary
629c27aa0c iscsi_ibft: Fix finding Broadcom specific ibft sign
Search for Broadcom specific ibft sign "BIFT"
along with other possible values on UEFI

This patch is fix for regression introduced in
“935a9fee51c945b8942be2d7b4bae069167b4886”.
https://lkml.org/lkml/2011/12/16/353

This impacts Broadcom CNA for iSCSI Boot on UEFI platform.

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
2014-05-13 14:54:14 -04:00
John W. Linville
209f6c3754 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 2014-05-13 14:52:34 -04:00
Mark Brown
fd30c37331 Merge remote-tracking branches 'spi/fix/pxa2xx' and 'spi/fix/qup' into spi-linus 2014-05-13 19:08:34 +01:00
Mark Brown
6c7bdf2d9c Merge remote-tracking branch 'spi/fix/core' into spi-linus 2014-05-13 19:08:33 +01:00
Charles Keepax
44330ab516 ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatile
The register CLASS_D_CONTROL_1 is marked as volatile because it contains
a bit, DAC_MUTE, which is also mirrored in the ADC_DAC_CONTROL_1
register. This causes problems for the "Speaker Switch" control, which
will report an error if the CODEC is suspended because it relies on a
volatile register.

To resolve this issue mark CLASS_D_CONTROL_1 as non-volatile and
manually keep the register cache in sync by updating both bits when
changing the mute status.

Reported-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org
2014-05-13 19:02:30 +01:00
Jarkko Nikula
cffd6665f5 ASoC: Intel: Fix Baytrail SST DSP firmware loading
Commit 10df350977 ("ASoC: Intel: Fix Audio DSP usage when IOMMU is
enabled.") caused following regression in Baytrail SST:

baytrail-pcm-audio baytrail-pcm-audio: error: DMA alloc failed
baytrail-pcm-audio baytrail-pcm-audio: error: failed to load firmware

Fix this by calling dma_coerce_mask_and_coherent() in sst_byt_init() with
the same dma_dev device what is now used in sst_fw_new() when allocating the
DMA buffer.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
2014-05-13 18:21:02 +01:00
David S. Miller
6262971a8a Included changes:
- properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif()
 - properly release orig_ifinfo->router when freeing orig_ifinfo
 - properly release neigh_node objects during periodic check
 - properly release neigh_info objects when the related hard_iface
   is free'd
 
 These changes are all very important because they fix some
 reference counting imbalances that lead to the
 impossibility of releasing the netdev object used by
 batman-adv on shutdown.
 The consequence is that such object cannot be destroyed by
 the networking stack (the refcounter does not reach zero)
 thus bringing the system in hanging state during a normal
 reboot operation or a network reconfiguration.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTbyNlAAoJEEKTMo6mOh1VzeUQAJmcP73MMwdFDGfI+3DUD43Z
 ziaWzHK1/NAkERIJMYu/Nj9BPhFJ/JgYNoYGd4eZ+0IVzIidBKffpGvZYLKJaBBb
 kVzDt8sHgm7T+bmJdGK5zBCkCrQ66T1/7jF7evzWCtdmzAj9Ld+cJha6sZ6OLY4v
 WusFFHH2yQgzOGML52HdM99lIfZJu53sdQtYrMI7FpmObwmoBw1VQsmLsJbbFj0A
 XbFWYNOtQ0s8JvuHPnHB2gsczMXG6AdDuYdG1douOUryjsdg4AsKVWbPWaSuIyS9
 ED6TiNsxtRt3A2YDgKrYmcGWHIc7CR4TE97DpdaB1xOEe/h0JPy8NEXaTiXifVi0
 yWXaDZAl0J1gEKxda5foqIJZEScQyqWnAGFIIMVsxWxMpv9V3C+XaMgpgC5yQdoQ
 hgs6lv8U/w7Qevu4oaU2oq64C5ipyzheLuL+l9Ykwig9brJ9pqvBhEr34VDyyLnK
 l1VVQP5Y94gsPX2FuBaFgQ6oN3xjAkzFWDVKPtdYhMW7l93ER31KWgyJ53zK0Avk
 wl/h5Xvep7vgA1pvyiu7Lom47QX2SVY3Xt6vsJ42qrR9bp1sLZ+piZaSBPTSuNmo
 YySwgku6QlQfCFThh09zjuQ8+zwlq5Enjp+fvy/NtzEhTzK1gmknrQo0QF+Fj1Fj
 5yz30/XWjUTn1dtBNeBw
 =GsPT
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- properly release neigh_ifinfo in batadv_iv_ogm_process_per_outif()
- properly release orig_ifinfo->router when freeing orig_ifinfo
- properly release neigh_node objects during periodic check
- properly release neigh_info objects when the related hard_iface
  is free'd

These changes are all very important because they fix some
reference counting imbalances that lead to the
impossibility of releasing the netdev object used by
batman-adv on shutdown.
The consequence is that such object cannot be destroyed by
the networking stack (the refcounter does not reach zero)
thus bringing the system in hanging state during a normal
reboot operation or a network reconfiguration.
2014-05-13 12:53:36 -04:00
Duan Jiong
2176d5d418 neigh: set nud_state to NUD_INCOMPLETE when probing router reachability
Since commit 7e98056964("ipv6: router reachability probing"), a router falls
into NUD_FAILED will be probed.

Now if function rt6_select() selects a router which neighbour state is NUD_FAILED,
and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE,
then function dst_neigh_output() can directly send packets, but actually the
neighbour still is unreachable. If we set nud_state to NUD_INCOMPLETE instead
NUD_PROBE, packets will not be sent out until the neihbour is reachable.

In addition, because the route should be probes with a single NS, so we must
set neigh->probes to neigh_max_probes(), then the neigh timer timeout and function
neigh_timer_handler() will not send other NS Messages.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 12:43:05 -04:00
Ralf Baechle
367f0b50e5 MIPS: Wire up renameat2 syscall.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-05-13 17:57:33 +02:00
Dirk Brandewie
d40a63c45b intel_pstate: remove setting P state to MAX on init
Setting the P state of the core to max at init time is a hold over
from early implementation of intel_pstate where intel_pstate disabled
cpufreq and loaded VERY early in the boot sequence.  This was to
ensure that intel_pstate did not affect boot time. This in not needed
now that intel_pstate is a cpufreq driver.

Removing this covers the case where a CPU has gone through a manual
CPU offline/online cycle and the P state is set to MAX on init and the
CPU immediately goes idle.  Due to HW coordination the P state request
on the idle CPU will drag all cores to MAX P state until the load is
reevaluated when to core goes non-idle.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-13 17:39:13 +02:00
Tejun Heo
36e9d2ebcc cgroup: fix rcu_read_lock() leak in update_if_frozen()
While updating cgroup_freezer locking, 68fafb77d827 ("cgroup_freezer:
replace freezer->lock with freezer_mutex") introduced a bug in
update_if_frozen() where it returns with rcu_read_lock() held.  Fix it
by adding rcu_read_unlock() before returning.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
2014-05-13 11:28:30 -04:00
Tejun Heo
e5ced8ebb1 cgroup_freezer: replace freezer->lock with freezer_mutex
After 96d365e0b8 ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e0b8 ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-13 11:26:31 -04:00
Tejun Heo
5024ae29cd cgroup: introduce task_css_is_root()
Determining the css of a task usually requires RCU read lock as that's
the only thing which keeps the returned css accessible till its
reference is acquired; however, testing whether a task belongs to the
root can be performed without dereferencing the returned css by
comparing the returned pointer against the root one in init_css_set[]
which never changes.

Implement task_css_is_root() which can be invoked in any context.
This will be used by the scheduled cgroup_freezer change.

v2: cgroup no longer supports modular controllers.  No need to export
    init_css_set.  Pointed out by Li.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-13 11:26:27 -04:00