Sometimes, warnings about ioctls to partition happen often enough that they
form majority of the warnings in the kernel log and users complain. In some
cases warnings are about ioctls such as SG_IO so it's not good to get rid of
the warnings completely as they can ease debugging of userspace problems
when ioctl is refused.
Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future if ever. So lets just
stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: James Bottomley <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This function was only used by btrfs code in btrfs_abort_devices()
(seems in a wrong way).
It was removed in commit d07eb91170,
So, Let's remove the dead code to avoid any confusion.
Changes in v2: update commit log, btrfs_abort_devices() was removed
already.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: David Sterba <dave@jikos.cz>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 777eb1bf15 disconnects externally
supplied queue_lock before blk_drain_queue(). Switching the lock would
introduce lock unbalance because theads which have taken the external
lock might unlock the internal lock in the during the queue drain. This
patch mitigate this by disconnecting the lock after the queue draining
since queue draining makes a lot of request_queue users go away.
However, please note, this patch only makes the problem less likely to
happen. Anyone who still holds a ref might try to issue a new request on
a dead queue after the blk_cleanup_queue() finishes draining, the lock
unbalance might still happen in this case.
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0+ #288 Not tainted
-------------------------------------
fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/17706:
#0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
get_request_wait+0x19a/0x250
stack backtrace:
Pid: 17706, comm: fio Not tainted 3.4.0+ #288
Call Trace:
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
[<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
[<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
[<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
[<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff810e0079>] lock_release+0xd9/0x250
[<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
[<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
[<ffffffff81328faa>] generic_make_request+0xca/0x100
[<ffffffff81329056>] submit_bio+0x76/0xf0
[<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
[<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
[<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
[<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
[<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
[<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
[<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
[<ffffffff811e8250>] ? aio_fsync+0x30/0x30
[<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
[<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
[<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff811eae50>] sys_io_submit+0x10/0x20
[<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b
Changes since v2: Update commit log to explain how the code is still
broken even if we delay the lock switching after the drain.
Changes since v1: Update commit log as Tejun suggested.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After hot-unplug a stressed disk, I found that rl->wait[] is not empty
while rl->count[] is empty and there are theads still sleeping on
get_request after the queue cleanup. With simple debug code, I found
there are exactly nr_sleep - nr_wakeup of theads in D state. So there
are missed wakeup.
$ dmesg | grep nr_sleep
[ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
$ vmstat 1
1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0
To quote Tejun:
Ah, okay, freed_request() wakes up single waiter with the assumption
that after the wakeup there will at least be one successful allocation
which in turn will continue the wakeup chain until the wait list is
empty - ie. waiter wakeup is dependent on successful request
allocation happening after each wakeup. With queue marked dead, any
woken up waiter fails the allocation path, so the wakeup chaining is
lost and we're left with hung waiters. What we need is wake_up_all()
after drain completion.
This patch fixes the missed wakeup by waking up all the theads which
are sleeping on wait queue after queue drain.
Changes in v2: Drop waitqueue_active() optimization
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Asias He <asias@redhat.com>
Fixed a bug by me, where stacked devices would oops on calling
blk_drain_queue() since ->rq.wait[] do not get initialized unless
it's a full queue setup.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix a regression introduced by 7eaceaccab ("block: remove per-queue
plugging"). In that patch, Jens removed the whole mm_unplug_device()
function, which used to be the trigger to make umem start to work.
We need to implement unplugging to make umem start to work, or I/O will
never be triggered.
Signed-off-by: Tao Guo <Tao.Guo@emc.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Shaohua Li <shli@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()
commit 35f3d14dbb (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.
Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.
Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.
splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We must not look at mdev->actlog, unless we have a get_ldev() reference.
It also does not make much sense to try to disconnect or pull-ahead of
the peer, if we don't have good local data.
Only even consider congestion policies, if our local disk is D_UP_TO_DATE.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If a read is aborted due to force-detach of a supposedly unresponsive
local backing device, and retried on the peer, it can happen that the
local request later still completes (hopefully with an error).
As it may already have been completed to upper layers meanwhile,
it must not be retried again now.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
BUG: unable to handle kernel NULL pointer dereference at (null)
...
[<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd]
[<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd]
This fixes an off-by-one error in the receive_bitmap() path,
if run-length encoded bitmap transfer is enabled.
If the bitmap is an exact multiple of PAGE_SIZE, which means the visible
capacity of the drbd device is an exact multiple of 128 MiB (for 4k page
size), and bitmap compression (use-rle) is enabled (which became default
with 8.4), and the very last bit is dirty and reported in an rle
comressed bitmap packet, we ended up trying to kmap_atomic a page pointer
that does not exist (bitmap->bm_pages[last index + 1]).
bug introduced by:
Date: Fri Jul 24 15:33:24 2009 +0200
set bits: optimize for complete last word, fix off-by-one-word corner case
made effective by:
Date: Thu Dec 16 00:32:38 2010 +0100
drbd: get rid of unused debug code
Long time ago, we had paranoia code in the bitmap that allocated one
extra word, assigned a magic value, and checked on every occasion that
the magic value was still unchanged.
That debug code is unused, the extra long word complicates code a bit.
Get rid of it.
No-one triggered this bug in the last few years, because a large subset
of our userbase is unaffected:
* typically the last few blocks of a device are not modified
frequently, and remain unset
* use-rle was disabled by default in drbd < 8.4
* those with slightly "odd" device sizes, or
* drbd internal meta data (which will skew the device size slightly,
thus makes it harder to have a bug relevant device size)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Part of the ring structure is the 'id' field which is under
control of the frontend. The frontend stamps it with "some"
value (this some in this implementation being a value less
than BLK_RING_SIZE), and when it gets a response expects
said value to be in the response structure. We have a check
for the id field when spolling new requests but not when
de-spolling responses.
We also add an extra check in add_id_to_freelist to make
sure that the 'struct request' was not NULL - as we cannot
pass a NULL to __blk_end_request_all, otherwise that crashes
(and all the operations that the response is dealing with
end up with __blk_end_request_all).
Lastly we also print the name of the operation that failed.
[v1: s/BUG/WARN/ suggested by Stefano]
[v2: Add extra check in add_id_to_freelist]
[v3: Redid op_name per Jan's suggestion]
[v4: add const * and add WARN on failure returns]
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
blkg_destroy() caches @blkg->q in local variable @q. While there are
two places which needs @blkg->q, only lockdep_assert_held() used the
local variable leading to unused local variable warning if lockdep is
configured out. Drop the local variable and just use @blkg->q
directly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rakesh Iyer <rni@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On module load, creates a debugfs parent 'rssd' in debugfs root. Then for each
device, create a new node with corresponding disk name. Under the new node, two
entries 'registers' and 'flags' are created.
NOTE: These entries were removed from sysfs in the previous patch
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch removes entries 'registers' and 'flags' from sysfs. Updated ABI file
to reflect this change.
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When policy data allocation fails in the middle, blkg_alloc() invokes
blkg_free() to destroy the half constructed blkg. This ends up
calling pd_exit_fn() on policy datas which didn't go through
pd_init_fn(). Fix it by making blkg_alloc() call pd_init_fn()
immediately after each policy data allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cfq may be built w/ or w/o blkcg support depending on
CONFIG_CFQ_CGROUP_IOSCHED. If blkcg support is disabled, most of
related code is ifdef'd out but some part is left dangling -
blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
calls are made on it.
Feeding zero filled policy to blkcg_policy_register() is incorrect and
triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
!CONFIG_CFQ_GROUP_IOSCHED.
------------[ cut here ]------------
WARNING: at block/blk-cgroup.c:867
Modules linked in:
Modules linked in:
CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1
Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
Krnl Code: 00000000003d76be: a7280001 lhi %r2,1
00000000003d76c2: a7f4ffdf brc 15,3d7680
#00000000003d76c6: a7f40001 brc 15,3d76c8
>00000000003d76ca: a7c8ffea lhi %r12,-22
00000000003d76ce: a7f4ffce brc 15,3d766a
00000000003d76d2: a7f40001 brc 15,3d76d4
00000000003d76d6: a7c80000 lhi %r12,0
00000000003d76da: a7f4ffc2 brc 15,3d765e
Call Trace:
([<0000000000b6f000>] initcall_debug+0x0/0x4)
[<0000000000989e8a>] cfq_init+0x62/0xd4
[<00000000001000ba>] do_one_initcall+0x3a/0x170
[<000000000096fb60>] kernel_init+0x214/0x2bc
[<0000000000623202>] kernel_thread_starter+0x6/0xc
[<00000000006231fc>] kernel_thread_starter+0x0/0xc
no locks held by swapper/0/1.
Last Breaking-Event-Address:
[<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0
---[ end trace b8ef4903fcbf9dd3 ]---
This patch fixes the problem by ensuring all blkcg support code is
inside CONFIG_CFQ_GROUP_IOSCHED.
* blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is
dropped from blkcg_policy_cfq decl.
* blkcg_deactivate_poilcy() invocation is moved inside ifdef. This
also makes the activation logic match cfq_init_queue().
* All blkcg_policy_[un]register() invocations are moved inside ifdef.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cfq_init() would return zero after kmem cache creation failure. Fix
so that it returns -ENOMEM.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Should be 'exynos5_xxx' instead of 'exonys5_xxx'.
It happened at the commit 30b842889e ("Merge tag 'soc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc")
during v3.5 merge window.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
[ My bad - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull some left-over PM patches from Rafael J. Wysocki.
* 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Make acpi_pm_device_sleep_state() follow the specification
ACPI / PM: Make __acpi_bus_get_power() cover D3cold correctly
ACPI / PM: Fix error messages in drivers/acpi/bus.c
rtc-cmos / PM: report wakeup event on ACPI RTC alarm
ACPI / PM: Generate wakeup events on fixed power button
This reverts commit 5ceb9ce6fe.
That commit seems to be the cause of the mm compation list corruption
issues that Dave Jones reported. The locking (or rather, absense
there-of) is dubious, as is the use of the 'page' variable once it has
been found to be outside the pageblock range.
So revert it for now, we can re-visit this for 3.6. If we even need to:
as Minchan Kim says, "The patch wasn't a bug fix and even test workload
was very theoretical".
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New tmpfs use of !PageUptodate pages for fallocate() is triggering the
WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
is called from migrate_page_copy() for compaction.
It is anomalous that migration should use __set_page_dirty_nobuffers()
on an address_space that does not participate in dirty and writeback
accounting; and this has also been observed to insert surprising dirty
tags into a tmpfs radix_tree, despite tmpfs not using tags at all.
We should probably give migrate_page_copy() a better way to preserve the
tag and migrate accounting info, when mapping_cap_account_dirty(). But
that needs some more work: so in the interim, avoid the warning by using
a simple SetPageDirty on PageSwapBacked pages.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment above it says "Stat data, not accessed from path walking",
but in fact some of inode fields we use for the common stat data was way
down at the end of the inode, causing unnecessary cache misses for the
common stat operations.
The inode structure is pretty big, and this can change padding depending
on field width, but at least on the common 64-bit configurations this
doesn't change the size. Some of our inode layout has historically been
to tro to avoid unnecessary padding fields, but cache locality is at
least as important for layout, if not more.
Noticed by looking at kernel profiles, and noticing that the "i_blkbits"
access stood out like a sore thumb.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and provide a simple reserve/release mechanism for userspace tools to
access thin provisioning metadata while the pool is in use.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPyqIdAAoJEK2W1qbAHj1nSPoQAJvAb/6UHufTWC/lufbEyo7t
ft6uwZZ4S/VV1Gdx8V5YXo3rxkVIZj/CV0hiJIDctmDMKGPMlzup39kCgjD/rOUF
mzcFAE8sEr3QEavkfjSWw2RHIIlhnJpvqVnb8nu3p/mSgAB4qYGgaDjBpi+W60PV
aqQSSWgwH1uNhfGDBIxQoJ8OIjjYvKPIf2Ir2FAXam/dNi9chWO9nzFdj3q2LccP
nZir094BDsFac1BF0FYW3J+rgT1FfPO7RRGAQct6WNJ197IZlYWYjKH3XehxnUHE
wgiJmjfUO8vrho1hhWmWDOesKJPPWFN67EQnl5FqAu9itP7c7k8bd7Ay4jWgtZQU
QIx10uiAgAuFUmTdWGK1fLlE8HGKUFINYLp63N5n5NZ4TDJrgo8e7CIID3rvYf/O
EtmL7HzAyztL9Uc6oaXzCK6TgMUtd/ht8OJCDFhjitzQTNjbrfAGz6m+RHnEZyyj
dtOVK7WBlmuKEANl2vDFGuVVF0+MwJLTlvPx1/b/ejFvnHI/R5Wuk9EH7t/DO4LB
nCmiwzB6uWMzU3y3vnZG72AYSF5NTKSvnAl5B8U/0rI1MZU+6PehjeviJNx6ddJN
2YheHBLU4vbBV/LF4XIpaHK2aiHN1ltaKCp8INo3EKhCwpR4ZdlVvnAGU9ocf9+c
qoaFTOP7zGD9zgPeGjoG
=wCpY
-----END PGP SIGNATURE-----
Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper updates from Alasdair G Kergon:
"Improve multipath's retrying mechanism in some defined circumstances
and provide a simple reserve/release mechanism for userspace tools to
access thin provisioning metadata while the pool is in use."
* tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm thin: provide userspace access to pool metadata
dm thin: use slab mempools
dm mpath: allow ioctls to trigger pg init
dm mpath: delay retry of bypassed pg
dm mpath: reduce size of struct multipath
This patch implements two new messages that can be sent to the thin
pool target allowing it to take a snapshot of the _metadata_. This,
read-only snapshot can be accessed by userland, concurrently with the
live target.
Only one metadata snapshot can be held at a time. The pool's status
line will give the block location for the current msnap.
Since version 0.1.5 of the userland thin provisioning tools, the
thin_dump program displays the msnap as follows:
thin_dump -m <msnap root> <metadata dev>
Available here: https://github.com/jthornber/thin-provisioning-tools
Now that userland can access the metadata we can do various things
that have traditionally been kernel side tasks:
i) Incremental backups.
By using metadata snapshots we can work out what blocks have
changed over time. Combined with data snapshots we can ensure
the data doesn't change while we back it up.
A short proof of concept script can be found here:
https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
ii) Migration of thin devices from one pool to another.
iii) Merging snapshots back into an external origin.
iv) Asyncronous replication.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Use dedicated caches prefixed with a "dm_" name rather than relying on
kmalloc mempools backed by generic slab caches so the memory usage of
thin provisioning (and any leaks) can be accounted for independently.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
After the failure of a group of paths, any alternative paths that
need initialising do not become available until further I/O is sent to
the device. Until this has happened, ioctls return -EAGAIN.
With this patch, new paths are made available in response to an ioctl
too. The processing of the ioctl gets delayed until this has happened.
Instead of returning an error, we submit a work item to kmultipathd
(that will potentially activate the new path) and retry in ten
milliseconds.
Note that the patch doesn't retry an ioctl if the ioctl itself fails due
to a path failure. Such retries should be handled intelligently by the
code that generated the ioctl in the first place, noting that some SCSI
commands should not be retried because they are not idempotent (XOR write
commands). For commands that could be retried, there is a danger that
if the device rejected the SCSI command, the path could be errorneously
marked as failed, and the request would be retried on another path which
might fail too. It can be determined if the failure happens on the
device or on the SCSI controller, but there is no guarantee that all
SCSI drivers set these flags correctly.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If I/O needs retrying and only bypassed priority groups are available,
set the pg_init_delay_retry flag to wait before retrying.
If, for example, the reason for the bypass is that the controller is
getting reset or there is a firmware upgrade happening, retrying right
away would cause a flood of log messages and retries for what could be a
few seconds or even several minutes.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move multipath structure's 'lock' and 'queue_size' members to eliminate
two 4-byte holes. Also use a bit within a single unsigned int for each
existing flag (saves 8-bytes). This allows future flags to be added
without each consuming an unsigned int.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Pull networking updates from David Miller:
1) Make syn floods consume significantly less resources by
a) Not pre-COW'ing routing metrics for SYN/ACKs
b) Mirroring the device queue mapping of the SYN for the SYN/ACK
reply.
Both from Eric Dumazet.
2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA.
3) Validate the length requested when building a paged SKB for a
socket, so we don't overrun the page vector accidently. From Jason
Wang.
4) When netlabel is disabled, we abort all IP option processing when we
see a CIPSO option. This isn't the right thing to do, we should
simply skip over it and continue processing the remaining options
(if any). Fix from Paul Moore.
5) SRIOV fixes for the mellanox driver from Jack orgenstein and Marcel
Apfelbaum.
6) 8139cp enables the receiver before the ring address is properly
programmed, which potentially lets the device crap over random
memory. Fix from Jason Wang.
7) e1000/e1000e fixes for i217 RST handling, and an improper buffer
address reference in jumbo RX frame processing from Bruce Allan and
Sebastian Andrzej Siewior, respectively.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fec_mpc52xx: fix timestamp filtering
mcs7830: Implement link state detection
e1000e: fix Rapid Start Technology support for i217
e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats()
r8169: call netif_napi_del at errpaths and at driver unload
tcp: reflect SYN queue_mapping into SYNACK packets
tcp: do not create inetpeer on SYNACK message
8139cp/8139too: terminate the eeprom access with the right opmode
8139cp: set ring address before enabling receiver
cipso: handle CIPSO options correctly when NetLabel is disabled
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
bql: Avoid possible inconsistent calculation.
bql: Avoid unneeded limit decrement.
bql: Fix POSDIFF() to integer overflow aware.
net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
net/mlx4_core: Fixes for VF / Guest startup flow
net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
net/mlx4_core: Fix number of EQs used in ICM initialisation
net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
Pull straggler x86 fixes from Peter Anvin:
"Three groups of patches:
- EFI boot stub documentation and the ability to print error messages;
- Removal for PTRACE_ARCH_PRCTL for x32 (obsolete interface which
should never have been ported, and the port is broken and
potentially dangerous.)
- ftrace stack corruption fixes. I'm not super-happy about the
technical implementation, but it is probably the least invasive in
the short term. In the future I would like a single method for
nesting the debug stack, however."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32
x86, efi: Add EFI boot stub documentation
x86, efi; Add EFI boot stub console support
x86, efi: Only close open files in error path
ftrace/x86: Do not change stacks in DEBUG when calling lockdep
x86: Allow nesting of the debug stack IDT setting
x86: Reset the debug_stack update counter
ftrace: Use breakpoint method to update ftrace caller
ftrace: Synchronize variable setting with breakpoints
This reverts the tty layer change to use per-tty locking, because it's
not correct yet, and fixing it will require some more deep surgery.
The main revert is d29f3ef39b ("tty_lock: Localise the lock"), but
there are several smaller commits that built upon it, they also get
reverted here. The list of reverted commits is:
fde86d3108 - tty: add lockdep annotations
8f6576ad47 - tty: fix ldisc lock inversion trace
d3ca8b64b9 - pty: Fix lock inversion
b1d679afd7 - tty: drop the pty lock during hangup
abcefe5fc3 - tty/amiserial: Add missing argument for tty_unlock()
fd11b42e35 - cris: fix missing tty arg in wait_event_interruptible_tty call
d29f3ef39b - tty_lock: Localise the lock
The revert had a trivial conflict in the 68360serial.c staging driver
that got removed in the meantime.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
skb_defer_rx_timestamp was called with a freshly allocated skb but must
be called with rskb instead.
Signed-off-by: Stephan Gatzka <stephan@gatzka.org>
Cc: stable <stable@vger.kernel.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add .status callback that detects link state changes.
Tested with MCS7832CV-AA chip (9710:7830, identified as rev.C by the driver).
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=28532
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs fix and a fix from the signal changes for frv from Al Viro.
The __kernel_nlink_t for powerpc got scrogged because 64-bit powerpc
actually depended on the default "unsigned long", while 32-bit powerpc
had an explicit override to "unsigned short". Al didn't notice, and
made both of them be the unsigned short.
The frv signal fix is fallout from simplifying the do_notify_resume()
code, and leaving an extra parenthesis.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
powerpc: Fix size of st_nlink on 64bit
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
frv: Remove bogus closing parenthesis
commit e57f93cc53 (powerpc: get rid of nlink_t uses, switch to
explicitly-sized type) changed the size of st_nlink on ppc64 from
a long to a short, resulting in boot failures.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduced by commit 6fd84c0831
("TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The definition of I217_PROXY_CTRL must use the BM_PHY_REG() macro instead
of the PHY_REG() macro for PHY page 800 register 70 since it is for a PHY
register greater than the maximum allowed by the latter macro, and fix a
typo setting the I217_MEMPWR register in e1000_suspend_workarounds_ich8lan.
Also for clarity, rename a few defines as bit definitions instead of masks.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is another fixup where the data is not transfered into buffer
addressed by skb->data but into a page.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Merge fixups for the mac NLS tables from Andrew.
* emailed from Andrew Morton, and one cleanup by me:
nls: fix (and rename) mac NLS table files and config options
fs/nls/Makefile: remove bogus CONFIG_ assignments
The config options in the Kconfig file (with _CODEPAGE_ in the name)
didn't match the config option name in the Makefile (no _CODEPAGE_).
And both of them were of the hard-to-read MACXYZZY variety, which made
them hard to parse for normal humans: MACROMAN easily reads as "macro
man", not as "Mac Roman".
So rename the options to be consistent, and be NLS_MAC_xyzzy. Rename
the files to be mac-xyzzy.c too, and drop the "nls" part entirely (it's
already in the directory name).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These were debug things which snuck through.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Vladimir Serbinenko <phcoder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It includes:
- driver for AUO-K1900 and AUO-K1901 epaper controller
- large updates for OMAP (e.g. decouple HDMI audio and video)
- some updates for Exynos and SH Mobile
- various other small fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (GNU/Linux)
iQIcBAABAgAGBQJPyAhmAAoJECSVL5KnPj1PBcoQAIWftuoXo3sk94f5jKcV4Ucx
MthEc5iEpMVs8xaEruHHNHXWv8ic0x/PfdC2xrpKOEbNXQcNPlb/QE2xWmBRxmT1
ucDyu10HJ36jKcwcK4ra5IQwOW+GtbTBEoBZT+WNAjxHZtJmxzjQGM4C12zVQpdJ
+qV2RP93JmsJoVBL9aKVAg1Ko135LLfD8TcKd+z8TmgFnLfSwKhfl7Jtd2xXwyvz
/hmW3kJUEnD8E5wuj+/g8sKJhQkGalEiITTqG2j2vJyFgxHSqyLSw8BBixrFW1uT
B9VnZsHF35ccCo+96UZRH4QsGJTx08+rea/qsv8IMSGczyRp5ey1ufjL+CzKiiIN
FWfex6fY0HHqZGAopQhjag54e914SIbSxdBwWS/iRrtVt3e9d03BzkhYs4rXl4Ey
CTC5obzWNTbQ6hLEjgWfVKkKcrF56BnRn3zGPgCTKGp2NK3vODdBkt/EmzUFvCWR
CcyQhh+PvZzEWp3XsdOGossYs/0aP4bO+7XPGJxZaa3+WVcRaZwAG/uZvJXXBfnp
DGRFy4wPsTTwKYIx4+t/KrsLtNVKioSMS5GEtuM1YEb8pA7mkUIkqwJv1I261h58
heTr6vWUsviUqHlKALJ+1CdwWGr3CtktCZssGsSUri61nm8CvlSRn2Nr2aJ/L3RN
AkemC/33RE5X/+lfkdMx
=tmIU
-----END PGP SIGNATURE-----
Merge tag 'fbdev-updates-for-3.5' of git://github.com/schandinat/linux-2.6
Pull fbdev updates from Florian Tobias Schandinat:
- driver for AUO-K1900 and AUO-K1901 epaper controller
- large updates for OMAP (e.g. decouple HDMI audio and video)
- some updates for Exynos and SH Mobile
- various other small fixes and cleanups
* tag 'fbdev-updates-for-3.5' of git://github.com/schandinat/linux-2.6: (130 commits)
video: bfin_adv7393fb: Fix cleanup code
video: exynos_dp: reduce delay time when configuring video setting
video: exynos_dp: move sw reset prioir to enabling sw defined function
video: exynos_dp: use devm_ functions
fb: handle NULL pointers in framebuffer release
OMAPDSS: HDMI: OMAP4: Update IRQ flags for the HPD IRQ request
OMAPDSS: Apply VENC timings even if panel is disabled
OMAPDSS: VENC/DISPC: Delay dividing Y resolution for managers connected to VENC
OMAPDSS: DISPC: Support rotation through TILER
OMAPDSS: VRFB: remove compiler warnings when CONFIG_BUG=n
OMAPFB: remove compiler warnings when CONFIG_BUG=n
OMAPDSS: remove compiler warnings when CONFIG_BUG=n
OMAPDSS: DISPC: fix usage of dispc_ovl_set_accu_uv
OMAPDSS: use DSI_FIFO_BUG workaround only for manual update displays
OMAPDSS: DSI: Support command mode interleaving during video mode blanking periods
OMAPDSS: DISPC: Update Accumulator configuration for chroma plane
drivers/video: fsl-diu-fb: don't initialize the THRESHOLDS registers
video: exynos mipi dsi: support reverse panel type
video: exynos mipi dsi: Properly interpret the interrupt source flags
video: exynos mipi dsi: Avoid races in probe()
...
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
- Improve consistency of information about ECC strength in NAND devices
- Clean up partition handling of plat_nand
- Support NAND drivers without dedicated access to OOB area
- BCH hardware ECC support for OMAP
- Other fixes and cleanups, and a few new device IDs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEABECAAYFAk/JG1wACgkQdwG7hYl686M80wCglN4kutx20j+KJWuZofkr9Hog
weEAoI4jrqEWEdW9EcT2CIWQw7eG+1v+
=7tdo
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
Pull mtd update from David Woodhouse:
- More robust parsing especially of xattr data in JFFS2
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
- Improve consistency of information about ECC strength in NAND devices
- Clean up partition handling of plat_nand
- Support NAND drivers without dedicated access to OOB area
- BCH hardware ECC support for OMAP
- Other fixes and cleanups, and a few new device IDs
Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to
added include files next to each other.
* tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits)
mtd: mxc_nand: move ecc strengh setup before nand_scan_tail
mtd: block2mtd: fix recursive call of mtd_writev
mtd: gpmi-nand: define ecc.strength
mtd: of_parts: fix breakage in Kconfig
mtd: nand: fix scan_read_raw_oob
mtd: docg3 fix in-middle of blocks reads
mtd: cfi_cmdset_0002: Slight cleanup of fixup messages
mtd: add fixup for S29NS512P NOR flash.
jffs2: allow to complete xattr integrity check on first GC scan
jffs2: allow to discriminate between recoverable and non-recoverable errors
mtd: nand: omap: add support for hardware BCH ecc
ARM: OMAP3: gpmc: add BCH ecc api and modes
mtd: nand: check the return code of 'read_oob/read_oob_raw'
mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw'
mtd: m25p80: Add support for Winbond W25Q80BW
jffs2: get rid of jffs2_sync_super
jffs2: remove unnecessary GC pass on sync
jffs2: remove unnecessary GC pass on umount
jffs2: remove lock_super
mtd: gpmi: add gpmi support for mx6q
...
Pull x86 platform driver updates from Matthew Garrett:
"Some significant improvements for the Sony driver on newer machines,
but other than that mostly just minor fixes and a patch to remove the
broken rfkill code from the Dell driver."
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86: (35 commits)
apple-gmux: Fix up the suspend/resume patch
dell-laptop: Remove rfkill code
toshiba_acpi: Fix mis-merge
dell-laptop: Add touchpad led support for Dell V3450
acer-wmi: add 3 laptops to video backlight vendor mode quirk table
sony-laptop: add touchpad enable/disable function
sony-laptop: add missing Fn key combos for 0x100 handlers
sony-laptop: add support for more WWAN modems
sony-laptop: new keyboard backlight handle
sony-laptop: add high speed battery charging function
sony-laptop: support automatic resume on lid open
sony-laptop: adjust error handling in finding SNC handles
sony-laptop: add thermal profiles support
sony-laptop: support battery care functions
sony-laptop: additional debug statements
sony-laptop: improve SNC initialization and acpi notify callback code
sony-laptop: use kstrtoul to parse sysfs values
sony-laptop: generalise ACPI calls into SNC functions
sony-laptop: fix return path when no ACPI buffer is allocated
sony-laptop: use soft rfkill status stored in hw
...
Pull slab updates from Pekka Enberg:
"Mainly a bunch of SLUB fixes from Joonsoo Kim"
* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: use __SetPageSlab function to set PG_slab flag
slub: fix a memory leak in get_partial_node()
slub: remove unused argument of init_kmem_cache_node()
slub: fix a possible memory leak
Documentations: Fix slabinfo.c directory in vm/slub.txt
slub: fix incorrect return type of get_any_partial()
Pull arm fixes for ux500 mismerge mishap from Arnd Bergmann:
"The device tree conversion for arm/ux500 in 3.5 turns out to be
incomplete because of a mismerge done by Linus Walleij that I failed
to notice early enough and that Lee Jones as the original author of
those patches did not manage to fix during the -next cycle. While we
originally to get a much larger set of ux500 device tree enablement
patches merged, this did not happen in time.
After some discussion at Linaro Connect conference this week, Lee has
been able to do damage control and provide a series to put the broken
platform back into usable shape for both DT and non-DT based booting.
This series has not been part of linux-next and is based on top of the
current state of the upstream kernel rather than an -rc, but this is
the best we could manage given the earlier breakage."
* 'ux500/hickup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: ux500: Enable probing of pinctrl through Device Tree
ARM: ux500: Add support for ab8500 regulators into the Device Tree
ARM: ux500: Provide regulator support for SMSC911x via Device Tree
ARM: ux500: Allow PRCMU regulator to be probed during a DT enabled boot
ARM: ux500: Apply db8500-prcmu regulator information to db8500 Device Tree
ARM: ux500: Only initialise STE's UIBs on boards which support them
ARM: ux500: Disable platform setup of the ab8500 when DT is enabled
ARM: ux500: Use correct format for dynamic IRQ assignment
ARM: ux500: Re-enable SMSC911x platform code registration during non-DT boots
ARM: ux500: PRCMU related configuration and layout corrections for Device Tree
ARM: ux500: Remove DB8500 PRCMU platform registration when DT is enabled
ARM: ux500: Disable SMSC911x platform code registration when DT is enabled
ARM: ux500: New DT:ed u8500_init_devices for one-by-one device enablement
ARM: ux500: New DT:ed snowball_platform_devs for one-by-one device enablement
pinctrl-nomadik: Allow Device Tree driver probing