2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-01 00:54:15 +08:00
Commit Graph

4882 Commits

Author SHA1 Message Date
Jens Axboe
6443af9855 Merge branch 'stable/for-jens-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus 2015-06-27 11:47:07 -06:00
Keith Busch
3399a3f746 NVMe: Fix filesystem deadlock on removal
Move gendisk deletion before controller shutdown so filesystem may sync
dirty pages. Before, this would deadlock trying to allocate requests
on frozen queues that are about to be deleted.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:42:54 -06:00
Keith Busch
de3eff2bad NVMe: Failed controller initialization fixes
This fixes an infinite device reset loop that may occur on devices that
fail initialization. If the drive fails to become ready for any reason
that does not involve an admin command timeout, the probe task should
assume the drive is unavailable and remove it from the topology. In
the case an admin command times out during device probing, the driver's
existing reset action will handle removing the drive.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:42:53 -06:00
Keith Busch
ffe7704d59 NVMe: Unify controller probe and resume
This unifies probe and resume so they both may be scheduled in the same
way. This is necessary for error handling that may occur during device
initialization since the task to cleanup the device wouldn't be able to
run if it is blocked on device initialization.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:42:51 -06:00
Keith Busch
17188bb403 NVMe: Don't use fake status on cancelled command
Synchronized commands do different things for timed out commands
vs. controller returned errors.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:42:50 -06:00
Keith Busch
4af0e21caf NVMe: Fix device cleanup on initialization failure
Don't release block queue and tagging resoureces if the driver never
got them in the first place. This can happen if the controller fails to
become ready, if memory wasn't available to allocate a tagset or admin
queue, or if the resources were released as part of error recovery.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-27 11:42:48 -06:00
Linus Torvalds
d87823813f Char/Misc driver patches for 4.2-rc1
Here's the big char/misc driver pull request for 4.2-rc1.
 
 Lots of mei, extcon, coresight, uio, mic, and other driver updates in
 here.  Full details in the shortlog.  All of these have been in
 linux-next for some time with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNn0gACgkQMUfUDdst+ykCCQCgvdF4F2+Hy9+RATdk22ak1uq1
 JDMAoJTf4oyaIEdaiOKfEIWg9MasS42B
 =H5wD
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here's the big char/misc driver pull request for 4.2-rc1.

  Lots of mei, extcon, coresight, uio, mic, and other driver updates in
  here.  Full details in the shortlog.  All of these have been in
  linux-next for some time with no reported problems"

* tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (176 commits)
  mei: me: wait for power gating exit confirmation
  mei: reset flow control on the last client disconnection
  MAINTAINERS: mei: add mei_cl_bus.h to maintained file list
  misc: sram: sort and clean up included headers
  misc: sram: move reserved block logic out of probe function
  misc: sram: add private struct device and virt_base members
  misc: sram: report correct SRAM pool size
  misc: sram: bump error message level on unclean driver unbinding
  misc: sram: fix device node reference leak on error
  misc: sram: fix enabled clock leak on error path
  misc: mic: Fix reported static checker warning
  misc: mic: Fix randconfig build error by including errno.h
  uio: pruss: Drop depends on ARCH_DAVINCI_DA850 from config
  uio: pruss: Add CONFIG_HAS_IOMEM dependence
  uio: pruss: Include <linux/sizes.h>
  extcon: Redefine the unique id of supported external connectors without 'enum extcon' type
  char:xilinx_hwicap:buffer_icap - change 1/0 to true/false for bool type variable in function buffer_icap_set_configuration().
  Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
  parport: check exclusive access before register
  w1: use correct lock on error in w1_seq_show()
  ...
2015-06-26 14:51:15 -07:00
Linus Torvalds
47a469421d Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - most of the rest of MM

 - lots of misc things

 - procfs updates

 - printk feature work

 - updates to get_maintainer, MAINTAINERS, checkpatch

 - lib/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  exit,stats: /* obey this comment */
  coredump: add __printf attribute to cn_*printf functions
  coredump: use from_kuid/kgid when formatting corename
  fs/reiserfs: remove unneeded cast
  NILFS2: support NFSv2 export
  fs/befs/btree.c: remove unneeded initializations
  fs/minix: remove unneeded cast
  init/do_mounts.c: add create_dev() failure log
  kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
  fs/efs: femove unneeded cast
  checkpatch: emit "NOTE: <types>" message only once after multiple files
  checkpatch: emit an error when there's a diff in a changelog
  checkpatch: validate MODULE_LICENSE content
  checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
  checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
  checkpatch: fix processing of MEMSET issues
  checkpatch: suggest using ether_addr_equal*()
  checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
  checkpatch: remove local from codespell path
  checkpatch: add --showfile to allow input via pipe to show filenames
  ...
2015-06-26 09:52:05 -07:00
Sergey Senozhatsky
d93435c3fb zram: check comp algorithm availability earlier
Improvement idea by Marcin Jabrzyk.

comp_algorithm_store() silently accepts any supplied algorithm name,
because zram performs algorithm availability check later, during the
device configuration phase in disksize_store() and emits the following
error:

  "zram: Cannot initialise %s compressing backend"

this error line is somewhat generic and, besides, can indicate a failed
attempt to allocate compression backend's working buffers.

add algorithm availability check to comp_algorithm_store():

  echo lzz > /sys/block/zram0/comp_algorithm
  -bash: echo: write error: Invalid argument

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:37 -07:00
Sergey Senozhatsky
4bbacd51a6 zram: cut trailing newline in algorithm name
Supplied sysfs values sometimes contain new-line symbols (echo vs.  echo
-n), which we also copy as a compression algorithm name.  it works fine
when we lookup for compression algorithm, because we use sysfs_streq()
which takes care of new line symbols.  however, it doesn't look nice when
we print compression algorithm name if zcomp_create() failed:

 zram: Cannot initialise LXZ
            compressing backend

cut trailing new-line, so the error string will look like

  zram: Cannot initialise LXZ compressing backend

we also now can replace sysfs_streq() in zcomp_available_show() with
strcmp().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
17162f41f0 zram: cosmetic zram_bvec_write() cleanup
`bool locked' local variable tells us if we should perform
zcomp_strm_release() or not (jumped to `out' label before
zcomp_strm_find() occurred), which is equivalent to `zstrm' being or not
being NULL.  remove `locked' and check `zstrm' instead.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
6566d1a32b zram: add dynamic device add/remove functionality
We currently don't support on-demand device creation.  The one and only
way to have N zram devices is to specify num_devices module parameter
(default value: 1).  IOW if, for some reason, at some point, user wants
to have N + 1 devies he/she must umount all the existing devices, unload
the module, load the module passing num_devices equals to N + 1.  And do
this again, if needed.

This patch introduces zram control sysfs class, which has two sysfs
attrs:
- hot_add      -- add a new zram device
- hot_remove   -- remove a specific (device_id) zram device

hot_add sysfs attr is read-only and has only automatic device id
assignment mode (as requested by Minchan Kim).  read operation performed
on this attr creates a new zram device and returns back its device_id or
error status.

Usage example:
	# add a new specific zram device
	cat /sys/class/zram-control/hot_add
	2

	# remove a specific zram device
	echo 4 > /sys/class/zram-control/hot_remove

Returning zram_add() error code back to user (-ENOMEM in this case)

	cat /sys/class/zram-control/hot_add
	cat: /sys/class/zram-control/hot_add: Cannot allocate memory

NOTE, there might be users who already depend on the fact that at least
zram0 device gets always created by zram_init(). Preserve this behavior.

[minchan@kernel.org: use zram->claim to avoid lockdep splat]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
f405c445a4 zram: close race by open overriding
[ Original patch from Minchan Kim <minchan@kernel.org> ]

Commit ba6b17d68c ("zram: fix umount-reset_store-mount race
condition") introduced bdev->bd_mutex to protect a race between mount
and reset.  At that time, we don't have dynamic zram-add/remove feature
so it was okay.

However, as we introduce dynamic device feature, bd_mutex became
trouble.

	CPU 0

echo 1 > /sys/block/zram<id>/reset
  -> kernfs->s_active(A)
    -> zram:reset_store->bd_mutex(B)

	CPU 1

echo <id> > /sys/class/zram/zram-remove
  ->zram:zram_remove: bd_mutex(B)
  -> sysfs_remove_group
    -> kernfs->s_active(A)

IOW, AB -> BA deadlock

The reason we are holding bd_mutex for zram_remove is to prevent
any incoming open /dev/zram[0-9]. Otherwise, we could remove zram
others already have opened. But it causes above deadlock problem.

To fix the problem, this patch overrides block_device.open and
it returns -EBUSY if zram asserts he claims zram to reset so any
incoming open will be failed so we don't need to hold bd_mutex
for zram_remove ayn more.

This patch is to prepare for zram-add/remove feature.

[sergey.senozhatsky@gmail.com: simplify reset_store()]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
92ff152887 zram: return zram device_id from zram_add()
This patch prepares zram to enable on-demand device creation.
zram_add() performs automatic device_id assignment and returns
new device id (>= 0) or error code (< 0).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
b31177f2a9 zram: trivial: correct flag operations comment
We don't have meta->tb_lock anymore and use meta table entry bit_spin_lock
instead. update corresponding comment.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
d12b63c927 zram: report every added and removed device
With dynamic device creation/removal (which will be introduced later in
the series) printing num_devices in zram_init() will not make a lot of
sense, as well as printing the number of destroyed devices in
destroy_devices().  Print per-device action (added/removed) in zram_add()
and zram_remove() instead.

Example:

[ 3645.259652] zram: Added device: zram5
[ 3646.152074] zram: Added device: zram6
[ 3650.585012] zram: Removed device: zram5
[ 3655.845584] zram: Added device: zram8
[ 3660.975223] zram: Removed device: zram6

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
c3cdb40e66 zram: remove max_num_devices limitation
Limiting the number of zram devices to 32 (default max_num_devices value)
is confusing, let's drop it.  A user with 2TB or 4TB of RAM, for example,
can request as many devices as he can handle.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
522698d7ca zram: reorganize code layout
This patch looks big, but basically it just moves code blocks.
No functional changes.

Our current code layout looks like a sandwitch.

For example,
a) between read/write handlers, we have update_used_max() helper function:

static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw

b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:

static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request

c) we first a bunch of sysfs read/store functions. then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.

Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):

-- one-liners: zram_test_flag/etc.

-- helpers: is_partial_io/update_position/etc

-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions. because we do device create/destroy actions in these
sysfs handlers.

-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page

-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page

-- device contol: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
85508ec6cb zram: use idr instead of `zram_devices' array
This patch makes some preparations for on-demand device add/remove
functionality.

Remove `zram_devices' array and switch to id-to-pointer translation (idr).
idr doesn't bloat zram struct with additional members, f.e.  list_head,
yet still provides ability to match the device_id with the device pointer.

No user-space visible changes.

[Julia.Lawall@lip6.fr: return -ENOMEM when `queue' alloc fails]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Sergey Senozhatsky
3bca3ef769 zram: cosmetic ZRAM_ATTR_RO code formatting tweak
Fix a misplaced backslash.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:36 -07:00
Marcin Jabrzyk
9e65bf68a8 zram: remove obsolete ZRAM_DEBUG option
This config option doesn't provide any usage for zram.

Signed-off-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:35 -07:00
Linus Torvalds
e4bc13adfd Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
2015-06-25 16:00:17 -07:00
Linus Torvalds
6a398a3ef4 Merge branch 'for-4.2/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This contains:

   - a few race fixes for null_blk, from Akinobu Mita.

   - a series of fixes for mtip32xx, from Asai Thambi and Selvan Mani at
     Micron.

   - NVMe:
        * Fix for missing error return on allocation failure, from Axel
          Lin.

        * Code consolidation and cleanups from Christoph.

        * Memory barrier addition, syncing queue count and queue
          pointers. From Jon Derrick.

        * Various fixes from Keith, an addition to support user
          issue reset from sysfs or ioctl, and automatic namespace
          rescan.

        * Fix from Matias, avoiding losing some request flags when
          marking the request failfast.

   - small cleanups and sparse fixups for ps3vram.  From Geert
     Uytterhoeven and Geoff Lavand.

   - s390/dasd dead code removal, from Jarod Wilson.

   - a set of fixes and optimizations for loop, from Ming Lei.

   - conversion to blkdev_reread_part() of loop, dasd, ndb.  From Ming
     Lei.

   - updates to cciss.  From Tomas Henzl"

* 'for-4.2/drivers' of git://git.kernel.dk/linux-block: (44 commits)
  mtip32xx: Fix accessing freed memory
  block: nvme-scsi: Catch kcalloc failure
  NVMe: Fix IO for extended metadata formats
  nvme: don't overwrite req->cmd_flags on sync cmd
  mtip32xx: increase wait time for hba reset
  mtip32xx: fix minor number
  mtip32xx: remove unnecessary sleep in mtip_ftl_rebuild_poll()
  mtip32xx: fix crash on surprise removal of the drive
  mtip32xx: Abort I/O during secure erase operation
  mtip32xx: fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT
  mtip32xx: remove unused variable 'port->allocated'
  mtip32xx: fix rmmod issue
  MAINTAINERS: Update ps3vram block driver
  block/ps3vram: Remove obsolete reference to MTD
  block/ps3vram: Fix sparse warnings
  NVMe: Automatic namespace rescan
  NVMe: Memory barrier before queue_count is incremented
  NVMe: add sysfs and ioctl controller reset
  null_blk: restart request processing on completion handler
  null_blk: prevent timer handler running on a different CPU where started
  ...
2015-06-25 15:12:50 -07:00
Linus Torvalds
bfffa1cc9d Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-block
Pull core block IO update from Jens Axboe:
 "Nothing really major in here, mostly a collection of smaller
  optimizations and cleanups, mixed with various fixes.  In more detail,
  this contains:

   - Addition of policy specific data to blkcg for block cgroups.  From
     Arianna Avanzini.

   - Various cleanups around command types from Christoph.

   - Cleanup of the suspend block I/O path from Christoph.

   - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

   - Eliminating atomic inc/dec of both remaining IO count and reference
     count in a bio.  From me.

   - Fixes for SG gap and chunk size support for data-less (discards)
     IO, so we can merge these better.  From me.

   - Small restructuring of blk-mq shared tag support, freeing drivers
     from iterating hardware queues.  From Keith Busch.

   - A few cfq-iosched tweaks, from Tahsin Erdogan and me.  Makes the
     IOPS mode the default for non-rotational storage"

* 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
  cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
  cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
  cfq-iosched: move group scheduling functions under ifdef
  cfq-iosched: fix the setting of IOPS mode on SSDs
  blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
  block, cgroup: implement policy-specific per-blkcg data
  block: Make CFQ default to IOPS mode on SSDs
  block: add blk_set_queue_dying() to blkdev.h
  blk-mq: Shared tag enhancements
  block: don't honor chunk sizes for data-less IO
  block: only honor SG gap prevention for merges that contain data
  block: fix returnvar.cocci warnings
  block, dm: don't copy bios for request clones
  block: remove management of bi_remaining when restoring original bi_end_io
  block: replace trylock with mutex_lock in blkdev_reread_part()
  block: export blkdev_reread_part() and __blkdev_reread_part()
  suspend: simplify block I/O handling
  block: collapse bio bit space
  block: remove unused BIO_RW_BLOCK and BIO_EOF flags
  block: remove BIO_EOPNOTSUPP
  ...
2015-06-25 14:29:53 -07:00
Ilya Dryomov
b55841807f rbd: queue_depth map option
nr_requests (/sys/block/rbd<id>/queue/nr_requests) is pretty much
irrelevant in blk-mq case because each driver sets its own max depth
that it can handle and that's the number of tags that gets preallocated
on setup.  Users can't increase queue depth beyond that value via
writing to nr_requests.

For rbd we are happy with the default BLKDEV_MAX_RQ (128) for most
cases but we want to give users the opportunity to increase it.
Introduce a new per-device queue_depth option to do just that:

    $ sudo rbd map -o queue_depth=1024 ...

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 18:30:55 +03:00
Ilya Dryomov
d147543d79 rbd: store rbd_options in rbd_device
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 18:30:55 +03:00
Ilya Dryomov
210c104c54 rbd: terminate rbd_opts_tokens with Opt_err
Also nuke useless Opt_last_bool and don't break lines unnecessarily.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 18:30:54 +03:00
Ilya Dryomov
d3834fefcf rbd: bump queue_max_segments
The default queue_limits::max_segments value (BLK_MAX_SEGMENTS = 128)
unnecessarily limits bio sizes to 512k (assuming 4k pages).  rbd, being
a virtual block device, doesn't have any restrictions on the number of
physical segments, so bump max_segments to max_hw_sectors, in theory
allowing a sector per segment (although the only case this matters that
I can think of is some readv/writev style thing).  In practice this is
going to give us 1M bios - the number of segments in a bio is limited
in bio_get_nr_vecs() by BIO_MAX_PAGES = 256.

Note that this doesn't result in any improvement on a typical direct
sequential test.  This is because on a box with a not too badly
fragmented memory the default BLK_MAX_SEGMENTS is enough to see nice
rbd object size sized requests.  The only difference is the size of
bios being merged - 512k vs 1M for something like

    $ dd if=/dev/zero of=/dev/rbd0 oflag=direct bs=$RBD_OBJ_SIZE
    $ dd if=/dev/rbd0 iflag=direct of=/dev/null bs=$RBD_OBJ_SIZE

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 18:30:49 +03:00
Ilya Dryomov
2894e1d769 rbd: timeout watch teardown on unmap with mount_timeout
As part of unmap sequence, kernel client has to talk to the OSDs to
teardown watch on the header object.  If none of the OSDs are available
it would hang forever, until interrupted by a signal - when that
happens we follow through with the rest of unmap procedure (i.e.
unregister the device and put all the data structures) and the unmap is
still considired successful (rbd cli tool exits with 0).  The watch on
the userspace side should eventually timeout so that's fine.

This isn't very nice, because various userspace tools (pacemaker rbd
resource agent, for example) then have to worry about setting up their
own timeouts.  Timeout it with mount_timeout (60 seconds by default).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@redhat.com>
2015-06-25 11:49:30 +03:00
Ilya Dryomov
a319bf56a6 libceph: store timeouts in jiffies, verify user input
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:29 +03:00
Yan, Zheng
144cba1493 libceph: allow setting osd_req_op's flags
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:27 +03:00
Dan Williams
18da2c9ee4 libnvdimm, pmem: move pmem to drivers/nvdimm/
Prepare the pmem driver to consume PMEM namespaces emitted by regions of
an nvdimm_bus instance.  No functional change.

Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
Selvan Mani
98f57c5196 mtip32xx: Fix accessing freed memory
In mtip_pci_remove(), driver data 'dd' is accessed after freeing it. This
is a residue of SRSI code cleanup in the patch 016a41c38821 "mtip32xx: fix
crash on surprise removal of the drive". Removed the bit flags
MTIP_DDF_REMOVE_DONE_BIT and MTIP_PF_SR_CLEANUP_BIT.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-24 08:48:46 -06:00
Linus Torvalds
acd53127c4 SCSI misc on 20150622
This is the usual grab bag of driver updates (lpfc, hpsa,
 megaraid_sas, cxgbi, be2iscsi) plus an assortment of minor updates.
 There are also one new driver: the Cisco snic; the advansys driver has
 been rewritten to get rid of the warning about converting it to the
 DMA API, the tape statistics patch got in and finally, there's a
 resuffle of SCSI header files to separate more cleanly initiator from
 target mode (and better share the common definitions).
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJViKWdAAoJEDeqqVYsXL0MAr8IAMmlA6HBVjMJJFCEOY9corHj
 e70MNQa7LUgf+JCdOtzGcvHXTiFFd4IHZAwXUJAnsC4IU2QWEfi1bjUTErlqBIGk
 LoZlXXpEHnFpmWot3OluOzzcGcxede8rVgPiKWVVdojIngBC2+LL/i2vPCJ84ri9
 WCVlk6KBvWZXuU6JuOKAb2FO9HOX7Q61wuKAMast2Qc6RNc2ksgc7VbstsITqzZ9
 FVEsjmQ5lqUj+xdxBpiUOdUpc22IJ4VcpBgQ2HrThvg6vf4aq937RJ/g4vi/g0SU
 Utk0a3bUw1H/WnYAfJVFx83nVEsS/954Z7/ERDg1sjlfLYwQtQnpov0XIbPIbZU=
 =k9IT
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is the usual grab bag of driver updates (lpfc, hpsa,
  megaraid_sas, cxgbi, be2iscsi) plus an assortment of minor updates.

  There is also one new driver: the Cisco snic.  The advansys driver has
  been rewritten to get rid of the warning about converting it to the
  DMA API, the tape statistics patch got in and finally, there's a
  resuffle of SCSI header files to separate more cleanly initiator from
  target mode (and better share the common definitions)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (156 commits)
  snic: driver for Cisco SCSI HBA
  qla2xxx: Fix indentation
  qla2xxx: Comment out unreachable code
  fusion: remove dead MTRR code
  advansys: fix compilation errors and warnings when CONFIG_PCI is not set
  mptsas: fix depth param in scsi_track_queue_full
  megaraid: fix irq setup process regression
  lpfc: Update version to 10.7.0.0 for upstream patch set.
  lpfc: Fix to drop PLOGIs from fabric node till LOGO processing completes
  lpfc: Fix scsi task management error message.
  lpfc: Fix cq_id masking problem.
  lpfc: Fix scsi prep dma buf error.
  lpfc: Add support for using block multi-queue
  lpfc: Devices are not discovered during takeaway/giveback testing
  lpfc: Fix vport deletion failure.
  lpfc: Check for active portpeerbeacon.
  lpfc: Update driver version for upstream patch set 10.6.0.1.
  lpfc: Change buffer pool empty message to miscellaneous category
  lpfc: Fix incorrect log message reported for empty FCF record.
  lpfc: Fix rport leak.
  ...
2015-06-23 15:55:44 -07:00
Al Viro
dc3f4198ea make simple_positive() public
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:01 -04:00
Miklos Szeredi
9bf39ab2ad vfs: add file_path() helper
Turn
	d_path(&file->f_path, ...);
into
	file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:00:05 -04:00
Linus Torvalds
d70b3ef54c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request,
  collected into the 'x86/core' topic.

  The topics were still maintained separately as far as possible, so
  bisectability and conceptual separation should still be pretty good -
  but there were a handful of merge points to avoid excessive
  dependencies (and conflicts) that would have been poorly tested in the
  end.

  The next cycle will hopefully be much more quiet (or at least will
  have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
     Gleixner)

     - This is the second and most intrusive part of changes to the x86
       interrupt handling - full conversion to hierarchical interrupt
       domains:

          [IOAPIC domain]   -----
                                 |
          [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                 |   (optional)          |
          [HPET MSI domain] -----                        |
                                                         |
          [DMAR domain]     -----------------------------
                                                         |
          [Legacy domain]   -----------------------------

       This now reflects the actual hardware and allowed us to distangle
       the domain specific code from the underlying parent domain, which
       can be optional in the case of interrupt remapping.  It's a clear
       separation of functionality and removes quite some duct tape
       constructs which plugged the remap code between ioapic/msi/hpet
       and the vector management.

     - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
       injection into guests (Feng Wu)

   * x86/asm changes:

     - Tons of cleanups and small speedups, micro-optimizations.  This
       is in preparation to move a good chunk of the low level entry
       code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
       Brian Gerst)

     - Moved all system entry related code to a new home under
       arch/x86/entry/ (Ingo Molnar)

     - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
       Conversion to C will reintroduce many of them - but meanwhile
       they are only getting in the way, and the upstream kernel does
       not rely on them (Ingo Molnar)

     - NOP handling refinements. (Borislav Petkov)

   * x86/mm changes:

     - Big PAT and MTRR rework: making the code more robust and
       preparing to phase out exposing direct MTRR interfaces to drivers -
       in favor of using PAT driven interfaces (Toshi Kani, Luis R
       Rodriguez, Borislav Petkov)

     - New ioremap_wt()/set_memory_wt() interfaces to support
       Write-Through cached memory mappings.  This is especially
       important for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

     - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

       This is an important RAS feature which adds hardware support for
       poisoned data.  That means roughly that the hardware marks data
       which it has detected as corrupted but wasn't able to correct, as
       poisoned data and raises an APIC interrupt to signal that in the
       form of a deferred error.  It is the OS's responsibility then to
       take proper recovery action and thus prolonge system lifetime as
       far as possible.

     - Add support for Intel "Local MCE"s: upcoming CPUs will support
       CPU-local MCE interrupts, as opposed to the traditional system-
       wide broadcasted MCE interrupts (Ashok Raj)

     - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

     - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
  ...
2015-06-22 17:59:09 -07:00
Bob Liu
a9b54bb951 drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising
Patch 69b91ede5c
"drivers: xen-blkback: delay pending_req allocation to connect_ring"
exposed an problem that Xen blkfront has. There is a race
with XenStored and the drivers such that we can see two:

vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 2.
vbd vbd-268440320: blkfront:blkback_changed to state 4.

state changes to XenbusStateInitWait ('2'). The end result is that
blkback_changed() receives two notify and calls twice setup_blkring().

While the backend driver may only get the first setup_blkring() which is
wrong and reads out-dated (or reads them as they are being updated
with new ring-ref values).

The end result is that the ring ends up being incorrectly set.

The other drivers in the tree have such checks already in.

Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-22 10:06:29 -04:00
Axel Lin
51ef72bda7 block: nvme-scsi: Catch kcalloc failure
res variable was initialized to -ENOMEM, but it's override by
nvme_trans_copy_from_user(). So current code returns 0 if kcalloc fails.
Fix it to return proper error code.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-20 10:34:07 -06:00
Keith Busch
71feb364e7 NVMe: Fix IO for extended metadata formats
This fixes io submit ioctl handling when using extended metadata
formats. When these formats are used, the user provides a single virtually
contiguous buffer containing both the block and metadata interleaved,
so the metadata size needs to be added to the total length and not mapped
as a separate transfer.

The command is also driver generated, so this patch does not enforce
blk-integrity extensions provide the metadata buffer.

Reported-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-19 13:16:26 -06:00
Matias Bjørling
e112af0dc9 nvme: don't overwrite req->cmd_flags on sync cmd
In __nvme_submit_sync_cmd, the request direction is overwritten when
the REQ_FAILFAST_DRIVER flag is set.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 75619bfa90 ("NVMe: End sync requests immediately on failure")
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-17 09:36:57 -06:00
Julien Grall
6684fa1cdb block/xen-blkback: s/nr_pages/nr_segs/
Make the code less confusing to read now that Linux may not have the
same page size as Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17 16:35:19 +01:00
Julien Grall
ee4b7179fc block/xen-blkfront: Remove invalid comment
Since commit b764915 "xen-blkfront: use a different scatterlist for each
request", biovec has been replaced by scatterlist when copying back the
data during a completion request.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17 16:35:19 +01:00
Julien Grall
db26a68695 block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17 16:35:18 +01:00
Asai Thambi SP
2f17d71dd7 mtip32xx: increase wait time for hba reset
In LUN failure conditions, device takes longer time to complete the hba reset.
Increased wait time from 1 second to 10 seconds.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:53 -06:00
Asai Thambi SP
75787265d6 mtip32xx: fix minor number
When a device is surprise removed and inserted, it is assigned a new minor
number because driver use multiples of 'instance' number. Modified to use the
multiples of 'index' for minor number.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:52 -06:00
Asai Thambi SP
284eb9a202 mtip32xx: remove unnecessary sleep in mtip_ftl_rebuild_poll()
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:50 -06:00
Asai Thambi SP
2132a54472 mtip32xx: fix crash on surprise removal of the drive
pci and block layers have changed a lot compared to when SRSI support was added.
Given the current state of pci and block layers, this driver do not have to do
any specific handling.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:49 -06:00
Asai Thambi SP
686d8e0bb5 mtip32xx: Abort I/O during secure erase operation
Currently I/Os are being queued when secure erase operation starts, and issue
them after the operation completes. As all data will be gone when the operation
completes, any queued I/O doesn't make sense. Hence, abort I/O (return -ENODATA)
as soon as the driver receives.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:47 -06:00
Asai Thambi SP
ee04bed690 mtip32xx: fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT
Fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:46 -06:00
Asai Thambi SP
a7806fadc5 mtip32xx: remove unused variable 'port->allocated'
Remove unused variable 'port->allocated'

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:44 -06:00
Asai Thambi SP
02b48265e7 mtip32xx: fix rmmod issue
put_disk() need to be called after del_gendisk() to free the disk object structure.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 08:24:40 -06:00
David S. Miller
25c43bf13b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-13 23:56:52 -07:00
Linus Torvalds
b85dfd30cb Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "Remember about a week ago when I sent the last pull request for 4.1?
  Well, I lied.  Now, I don't want to shift the blame, but Dan, Ming,
  and Richard made a liar out of me.

  Here are three small patches that should go into 4.1.  More
  specifically, this pull request contains:

   - A Kconfig dependency for the pmem block driver, so it can't be
     selected if HAS_IOMEM isn't availble.  From Richard Weinberger.

   - A fix for genhd, making the ext_devt_lock softirq safe.  This makes
     lockdep happier, since we also end up grabbing this lock on release
     off the softirq path.  From Dan Williams.

   - A blk-mq software queue release fix from Ming Lei.

  Last two are headed to stable, first fixes an issue introduced in this
  cycle"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: pmem: Add dependency on HAS_IOMEM
  block: fix ext_dev_lock lockdep report
  blk-mq: free hctx->ctxs in queue's release handler
2015-06-12 11:35:19 -07:00
Richard Weinberger
b6f2098fb7 block: pmem: Add dependency on HAS_IOMEM
Not all architectures have io memory.

Fixes:
drivers/block/pmem.c: In function ‘pmem_alloc’:
drivers/block/pmem.c:146:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
  pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
  ^
drivers/block/pmem.c:146:18: warning: assignment makes pointer from integer without a cast [enabled by default]
  pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
                  ^
drivers/block/pmem.c:182:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
  iounmap(pmem->virt_addr);
  ^

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-11 15:54:58 -06:00
Weijie Yang
d7ad41a1c4 zram: clear disk io accounting when reset zram device
Clear zram disk io accounting when resetting the zram device.  Otherwise
the residual io accounting stat will affect the diskstat in the next
zram active cycle.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-10 16:43:43 -07:00
Geert Uytterhoeven
de667203fd block/ps3vram: Remove obsolete reference to MTD
The ps3vram driver is a plain block device driver since commit
f507cd2203 ("ps3/block: Replace mtd/ps3vram
by block/ps3vram").

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Jim Paris <jim@jtan.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-10 14:06:55 -06:00
Geoff Levand
e7bdd17b08 block/ps3vram: Fix sparse warnings
Fix sparse warnings like these:

 drivers/block/ps3vram.c: warning: incorrect type in assignment (different address spaces)
 drivers/block/ps3vram.c:    expected unsigned int [usertype] *ctrl
 drivers/block/ps3vram.c:    got void [noderef] <asn:2>*

Cc: Jim Paris <jim@jtan.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Jim Paris <jim@jtan.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-10 14:06:53 -06:00
David S. Miller
941742f497 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-08 20:06:56 -07:00
Toshi Kani
957561ec0f drivers/block/pmem: Map NVDIMM in Write-Through mode
The pmem driver maps NVDIMM uncacheable so that we don't lose
data which hasn't reached non-volatile storage in the case of a
crash. Change this to Write-Through mode which provides uncached
writes but cached reads, thus improving read performance.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: hch@lst.de
Cc: hmh@hmh.eng.br
Cc: jgross@suse.com
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-nvdimm@lists.01.org
Cc: stefan.bader@canonical.com
Cc: yigal@plexistor.com
Link: http://lkml.kernel.org/r/1433436928-31903-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:29:01 +02:00
Bob Liu
86839c56de xen/block: add multi-page ring support
Extend xen/block to support multi-page ring, so that more requests can be
issued by using more than one pages as the request ring between blkfront
and backend.
As a result, the performance can get improved significantly.

We got some impressive improvements on our highend iscsi storage cluster
backend. If using 64 pages as the ring, the IOPS increased about 15 times
for the throughput testing and above doubled for the latency testing.

The reason was the limit on outstanding requests is 32 if use only one-page
ring, but in our case the iscsi lun was spread across about 100 physical
drives, 32 was really not enough to keep them busy.

Changes in v2:
 - Rebased to 4.0-rc6.
 - Document on how multi-page ring feature working to linux io/blkif.h.

Changes in v3:
 - Remove changes to linux io/blkif.h and follow the protocol defined
   in io/blkif.h of XEN tree.
 - Rebased to 4.1-rc3

Changes in v4:
 - Turn to use 'ring-page-order' and 'max-ring-page-order'.
 - A few comments from Roger.

Changes in v5:
 - Clarify with 4k granularity to comment
 - Address more comments from Roger

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-05 21:14:05 -04:00
Bob Liu
8ab0144a46 driver: xen-blkfront: move talk_to_blkback to a more suitable place
The major responsibility of talk_to_blkback() is allocate and initialize
the request ring and write the ring info to xenstore.
But this work should be done after backend entered 'XenbusStateInitWait' as
defined in the protocol file.
See xen/include/public/io/blkif.h in XEN git tree:
Front                                Back
=================================    =====================================
XenbusStateInitialising              XenbusStateInitialising
 o Query virtual device               o Query backend device identification
   properties.                          data.
 o Setup OS device instance.          o Open and validate backend device.
                                      o Publish backend features and
                                        transport parameters.
                                                     |
                                                     |
                                                     V
                                     XenbusStateInitWait

o Query backend features and
  transport parameters.
o Allocate and initialize the
  request ring.

There is no problem with this yet, but it is an violation of the design and
furthermore it would not allow frontend/backend to negotiate 'multi-page'
and 'multi-queue' features.

Changes in v2:
 - Re-write the commit message to be more clear.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-05 21:14:05 -04:00
Bob Liu
69b91ede5c drivers: xen-blkback: delay pending_req allocation to connect_ring
This is a pre-patch for multi-page ring feature.
In connect_ring, we can know exactly how many pages are used for the shared
ring, delay pending_req allocation here so that we won't waste too much memory.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-05 21:14:05 -04:00
Keith Busch
a5768aa887 NVMe: Automatic namespace rescan
Namespaces may be dynamically allocated and deleted or attached and
detached. This has the driver rescan the device for namespace changes
after each device reset or namespace change asynchronous event.

There could potentially be many detached namespaces that we don't want
polluting /dev/ with unusable block handles, so this will delete disks
if the namespace is not active as indicated by the response from identify
namespace. This also skips adding the disk if no capacity is provisioned
to the namespace in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05 10:58:34 -06:00
Jon Derrick
36a7e993ee NVMe: Memory barrier before queue_count is incremented
Protects against reordering and/or preempting which would allow the
kthread to access the queue descriptor before it is set up

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05 10:33:34 -06:00
Keith Busch
4cc06521ee NVMe: add sysfs and ioctl controller reset
We need the ability to perform an nvme controller reset as discussed on
the mailing list thread:

  http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html

This adds a sysfs entry that when written to will reset perform an NVMe
controller reset if the controller was successfully initialized in the
first place.

This also adds locking around resetting the device in the async probe
method so the driver can't schedule two resets.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Brandon Schultz <brandon.schulz@hgst.com>
Cc: David Sariel <david.sariel@pmcs.com>

Updated by Jens to:

1) Merge this with the ioctl reset patch from David Sariel. The ioctl
   path now shares the reset code from the sysfs path.

2) Don't flush work if we fail issuing the reset.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05 10:30:08 -06:00
Tejun Heo
66114cad64 writeback: separate out include/linux/backing-dev-defs.h
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.

This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it.  c files
which need access to more backing-dev details now include
backing-dev.h directly.  This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.

v2: fs/fat build failure fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
4452226ea2 writeback: move backing_dev_info->state into bdi_writeback
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear.  For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi.  To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.

This patch moves bdi->state into wb.

* enum bdi_state is renamed to wb_state and the prefix of all enums is
  changed from BDI_ to WB_.

* Explicit zeroing of bdi->state is removed without adding zeoring of
  wb->state as the whole data structure is zeroed on init anyway.

* As there's still only one bdi_writeback per backing_dev_info, all
  uses of bdi->state are mechanically replaced with bdi->wb.state
  introducing no behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: drbd-dev@lists.linbit.com
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Akinobu Mita
8b70f45e2e null_blk: restart request processing on completion handler
When irqmode=2 (IRQ completion handler is timer) and queue_mode=1
(Block interface to use is rq), the completion handler should restart
request handling for any pending requests on a queue because request
processing stops when the number of commands are queued more than
hw_queue_depth (null_rq_prep_fn returns BLKPREP_DEFER).

Without this change, the following command cannot finish.

	# modprobe null_blk irqmode=2 queue_mode=1 hw_queue_depth=1
	# fio --name=t --rw=read --size=1g --direct=1 \
	  --ioengine=libaio --iodepth=64 --filename=/dev/nullb0

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01 20:09:05 -06:00
Akinobu Mita
419c21a3b6 null_blk: prevent timer handler running on a different CPU where started
When irqmode=2 (IRQ completion handler is timer), timer handler should
be called on the same CPU where the timer has been started.

Since completion_queues are per-cpu and the completion handler only
touches completion_queue for local CPU, we need to prevent the handler
from running on a different CPU where the timer has been started.
Otherwise, the IO cannot be completed until another completion handler
is executed on that CPU.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01 20:09:03 -06:00
Keith Busch
42483228d4 NVMe: Remove hctx reliance for multi-namespace
The driver needs to track shared tags to support multiple namespaces
that may be dynamically allocated or deleted. Relying on the first
request_queue's hctx's is not appropriate as we cannot clear outstanding
tags for all namespaces using this handle, nor can the driver easily track
all request_queue's hctx as namespaces are attached/detached. Instead,
this patch uses the nvme_dev's tagset to get the shared tag resources
instead of through a request_queue hctx.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01 14:36:06 -06:00
Hannes Reinecke
b84b1d522f scsi: Do not set cmd_per_lun to 1 in the host template
'0' is now used as the default cmd_per_lun value,
so there's no need to explicitly set it to '1' in the
host template.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-31 18:06:28 -07:00
Sudip Mukherjee
9f4ba6b058 paride: use new parport device model
Modify paride driver to use the new parallel port device model.

Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-01 07:10:23 +09:00
Tomas Henzl
b9ea9dcdb9 cciss: correct the non-resettable board list
The hpsa driver carries a more recent version,
copy the table from there.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Don Brace <Don.Brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-31 11:14:34 -07:00
Tomas Henzl
c854c38559 cciss: remove duplicate entries from board_type struct
and devices not supported by this driver from unresettable list

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Don Brace <Don.Brace@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-31 11:12:33 -07:00
Keith Busch
75619bfa90 NVMe: End sync requests immediately on failure
Do not retry failed sync commands so the original status may be seen
without issuing unnecessary retries.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-29 10:10:30 -06:00
Keith Busch
f4ff414aeb NVMe: Use requested sync command timeout
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-29 10:10:29 -06:00
Arnd Bergmann
fec558b5f1 NVMe: fix type warning on 32-bit
A recent change to the ioctl handling caused a new harmless
warning in the NVMe driver on all 32-bit machines:

drivers/block/nvme-core.c: In function 'nvme_submit_io':
drivers/block/nvme-core.c:1794:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

In order to shup up that warning, this introduces a new
temporary variable that uses a double cast to extract
the pointer from an __u64 structure member.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a67a95134f ("NVMe: Meta data handling through submit io ioctl")
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-29 10:06:21 -06:00
Luis R. Rodriguez
9c27847dda kernel/params: constify struct kernel_param_ops uses
Most code already uses consts for the struct kernel_param_ops,
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.

In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.

Test compiled on x86_64 against:

	* allnoconfig
	* allmodconfig
	* allyesconfig

@ const_found @
identifier ops;
@@

const struct kernel_param_ops ops = {
};

@ const_not_found depends on !const_found @
identifier ops;
@@

-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};

Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:10 +09:30
David S. Miller
36583eb54d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c
	drivers/net/phy/phy.c
	include/linux/skbuff.h
	net/ipv4/tcp.c
	net/switchdev/switchdev.c

Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two zyncq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23 01:22:35 -04:00
Keith Busch
a0a931d6a2 NVMe: Fix obtaining command result
Replaces req->sense_len usage, which is not owned by the LLD, to
req->special to contain the command result for driver created commands,
and sets the result unconditionally on completion.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Fixes: d29ec8241c ("nvme: submit internal commands through the block layer")
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 14:13:32 -06:00
Christoph Hellwig
d29ec8241c nvme: submit internal commands through the block layer
Use block layer queues with an internal cmd_type to submit internally
generated NVMe commands.  This both simplifies the code a lot and allow
for a better structure.  For example now the LighNVM code can construct
commands without knowing the details of the underlying I/O descriptors.
Or a future NVMe over network target could inject commands, as well as
could the SCSI translation and ioctl code be reused for such a beast.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:37:20 -06:00
Christoph Hellwig
772ce43559 nvme: fail SCSI read/write command with unsupported protection bit
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:44 -06:00
Christoph Hellwig
9085176848 nvme: report the DPOFUA in MODE_SENSE
NVMe device always support the FUA bit, and the SCSI translations
accepts the DPO bit, which doesn't have much of a meaning for us.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:42 -06:00
Christoph Hellwig
cbbb7a2ec6 nvme: simplify and cleanup the READ/WRITE SCSI CDB parsing code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:40 -06:00
Christoph Hellwig
3726897efd nvme: first round at deobsfucating the SCSI translation code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:38 -06:00
Christoph Hellwig
e61b0a86ca nvme: fix scsi translation error handling
Erorr handling for the scsi translation was completely broken, as there
were two different positive error number spaces overlapping.  Fix this
up by removing one of them, and centralizing the generation of the other
positive values in a single place.  Also fix up a few places that didn't
handle the NVMe error codes properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:36 -06:00
Christoph Hellwig
b90c48d0c1 nvme: split nvme_trans_send_fw_cmd
This function handles two totally different opcodes, so split it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:34 -06:00
Christoph Hellwig
e75ec752d7 nvme: store a struct device pointer in struct nvme_dev
Most users want the generic device, so store that in struct nvme_dev
instead of the pci_dev.  This also happens to be a nice step towards
making some code reusable for non-PCI transports.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:33 -06:00
Christoph Hellwig
f705f837c5 nvme: consolidate synchronous command submission helpers
Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default.  This will allow
adding a timeout to the pass through path later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 08:36:31 -06:00
Jens Axboe
6a92700758 loop: remove (now) unused 'out' label
gcc, righfully, complains:

drivers/block/loop.c:1369:1: warning: label 'out' defined but not used [-Wunused-label]

Kill it.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20 09:54:35 -06:00
Ming Lei
9dcd137953 block: nbd: convert to blkdev_reread_part()
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20 09:06:13 -06:00
Ming Lei
06f0e9e68c block: loop: fix another reread part failure
loop_clr_fd() can be run piggyback with lo_release(), and
under this situation, reread partition may always fail because
bd_mutex has been held already.

This patch detects the situation by the reference count, and
call __blkdev_reread_part() to avoid acquiring the lock again.

In the meantime, this patch switches to new kernel APIs
of blkdev_reread_part() and __blkdev_reread_part().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20 09:06:11 -06:00
Ming Lei
f893366795 block: loop: don't hold lo_ctl_mutex in lo_open
The lo_ctl_mutex is held for running all ioctl handlers, and
in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
rereading partitions, which requires bd_mutex.

So it is easy to cause failure because trylock(bd_mutex) may
fail inside blkdev_reread_part(), and follows the lock context:

blkid or other application:
	->open()
		->mutex_lock(bd_mutex)
		->lo_open()
			->mutex_lock(lo_ctl_mutex)

losetup(set fd ioctl):
	->mutex_lock(lo_ctl_mutex)
	->ioctl_by_bdev(BLKRRPART)
		->trylock(bd_mutex)

This patch trys to eliminate the ABBA lock dependency by removing
lo_ctl_mutext in lo_open() with the following approach:

1) make lo_refcnt as atomic_t and avoid acquiring lo_ctl_mutex in lo_open():
	- for open vs. add/del loop, no any problem because of loop_index_mutex
	- freeze request queue during clr_fd, so I/O can't come until
	  clearing fd is completed, like the effect of holding lo_ctl_mutex
	  in lo_open
	- both open() and release() have been serialized by bd_mutex already

2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
lo_release(), then lo_ctl_mutex is only required for the last release.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20 09:06:09 -06:00
Christoph Hellwig
cddcd72bce nvme: disable irqs in nvme_freeze_queues
The queue_lock needs to be taken with irqs disabled.  This is mostly
due to the old pre blk-mq usage pattern, but we've also picked it up
in most of the few places where we use the queue_lock with blk-mq.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:13:06 -06:00
Tomas Henzl
8a0ee3b52d cciss: correct the non-resettable board list
The hpsa driver carries a more recent version,
copy the table from there.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-18 15:27:31 -06:00
Tomas Henzl
5aea3288d3 cciss: remove duplicate entries from board_type struct
and devices not supported by this driver from unresettable list

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-18 15:27:29 -06:00
David S. Miller
b04096ff33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Four minor merge conflicts:

1) qca_spi.c renamed the local variable used for the SPI device
   from spi_device to spi, meanwhile the spi_set_drvdata() call
   got moved further up in the probe function.

2) Two changes were both adding new members to codel params
   structure, and thus we had overlapping changes to the
   initializer function.

3) 'net' was making a fix to sk_release_kernel() which is
   completely removed in 'net-next'.

4) In net_namespace.c, the rtnl_net_fill() call for GET operations
   had the command value fixed, meanwhile 'net-next' adjusted the
   argument signature a bit.

This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 14:31:43 -04:00
Christoph Hellwig
3fd61b2099 nvme: fix kernel memory corruption with short INQUIRY buffers
If userspace asks for an INQUIRY buffer smaller than 36 bytes, the SCSI
translation layer will happily write past the end of the INQUIRY buffer
allocation.

This is fairly easily reproducible by running the libiscsi test
suite and then starting an xfstests run.

Fixes: 4f1982 ("NVMe: Update SCSI Inquiry VPD 83h translation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-13 10:22:12 -04:00