2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-02 02:34:05 +08:00
Commit Graph

4370 Commits

Author SHA1 Message Date
Weijie Yang
38515c7339 zram: correct offset usage in zram_bio_discard
We want to skip the physical block(PAGE_SIZE) which is partially covered
by the discard bio, so we check the remaining size and subtract it if
there is a need to goto the next physical block.

The current offset usage in zram_bio_discard is incorrect, it will cause
its upper filesystem breakdown.  Consider the following scenario:

On some architecture or config, PAGE_SIZE is 64K for example, filesystem
is set up on zram disk without PAGE_SIZE aligned, a discard bio leads to a
offset = 4K and size=72K, normally, it should not really discard any
physical block as it partially cover two physical blocks.  However, with
the current offset usage, it will discard the second physical block and
free its memory, which will cause filesystem breakdown.

This patch corrects the offset usage in zram_bio_discard.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:13 -07:00
Matthew Wilcox
96f8d8e096 brd: return -ENOSPC rather than -ENOMEM on page allocation failure
brd is effectively a thinly provisioned device.  Thinly provisioned
devices return -ENOSPC when they can't write a new block.  -ENOMEM is an
implementation detail that callers shouldn't know.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:02 -07:00
Matthew Wilcox
a72132c31d brd: add support for rw_page()
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:02 -07:00
Jens Axboe
0e62f51f87 blk-mq: let blk_mq_tag_to_rq() take blk_mq_tags as the main parameter
We currently pass in the hardware queue, and get the tags from there.
But from scsi-mq, with a shared tag space, it's a lot more convenient
to pass in the blk_mq_tags instead as the hardware queue isn't always
directly available. So instead of having to re-map to a given
hardware queue from rq->mq_ctx, just pass in the tags structure.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-06-04 10:23:49 -06:00
Matthew Wilcox
bd67608a61 NVMe: Rename io_timeout to nvme_io_timeout
It's positively immoral to have a global variable called 'io_timeout'.
Keep the module parameter called io_timeout, though.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 23:04:30 -04:00
Keith Busch
dedf4b1561 NVMe: Use last bytes of f/w rev SCSI Inquiry
After skipping right-padded spaces, use the last four bytes of the
firmware revision when reporting the Inquiry Product Revision. These
are generally more indicative to what is running.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:59:01 -04:00
Sam Bradshaw
b4e75cbf13 NVMe: Adhere to request queue block accounting enable/disable
Recently, a new sysfs control "iostats" was added to selectively
enable or disable io statistics collection for request queues.  This
patch hooks that control.

IO statistics collection is rather expensive on large, multi-node
machines with drives pushing millions of iops.  Having the ability to
disable collection if not needed can improve throughput significantly.

As a data point, on a quad E5-4640, I see more than 50% throughput
improvement when io statistics accounting is disabled during heavily
multi-threaded small block random read benchmarks where device
performance is in the million iops+ range.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:58:54 -04:00
Keith Busch
a51afb5433 NVMe: Fix nvme get/put queue semantics
The routines to get and lock nvme queues required the caller to "put"
or "unlock" them even if getting one returned NULL. This patch fixes that.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:58:34 -04:00
Matthew Wilcox
de672b9748 NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
This define isn't used, and any code that wanted to use it should use
NVME_FEAT_TEMP_THRESH instead.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:58:27 -04:00
Keith Busch
9d43cf646e NVMe: Make admin timeout a module parameter
Signed-off-by: Keith Busch <keith.busch@intel.com>
[made admin_timeout static]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:57:56 -04:00
Keith Busch
61e4ce086d NVMe: Make iod bio timeout a parameter
This was originally set to 4 times the IO timeout, but that was when
the IO timeout was 5 seconds instead of 30. 20 seconds for total time
to failure seemed more reasonable than 2 minutes for most, but other
users have requested to make this a module parameter instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[renamed the module parameter to retry_time]
[made retry_time static]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:56:43 -04:00
Linus Torvalds
c84a1e32ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
 "The main scheduling related changes in this cycle were:

   - various sched/numa updates, for better performance

   - tree wide cleanup of open coded nice levels

   - nohz fix related to rq->nr_running use

   - cpuidle changes and continued consolidation to improve the
     kernel/sched/idle.c high level idle scheduling logic.  As part of
     this effort I pulled cpuidle driver changes from Rafael as well.

   - standardized idle polling amongst architectures

   - continued work on preparing better power/energy aware scheduling

   - sched/rt updates

   - misc fixlets and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  sched/numa: Decay ->wakee_flips instead of zeroing
  sched/numa: Update migrate_improves/degrades_locality()
  sched/numa: Allow task switch if load imbalance improves
  sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
  sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
  sched: Initialize rq->age_stamp on processor start
  sched, nohz: Change rq->nr_running to always use wrappers
  sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
  sched: Use clamp() and clamp_val() to make sys_nice() more readable
  sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
  sched/numa: Fix initialization of sched_domain_topology for NUMA
  sched: Call select_idle_sibling() when not affine_sd
  sched: Simplify return logic in sched_read_attr()
  sched: Simplify return logic in sched_copy_attr()
  sched: Fix exec_start/task_hot on migrated tasks
  arm64: Remove TIF_POLLING_NRFLAG
  metag: Remove TIF_POLLING_NRFLAG
  sched/idle: Make cpuidle_idle_call() void
  sched/idle: Reflow cpuidle_idle_call()
  sched/idle: Delay clearing the polling bit
  ...
2014-06-03 14:00:15 -07:00
Santosh Y
6808c5fb7f NVMe: Prevent possible NULL pointer dereference
kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod'
can fail. So check the return value.

Signed-off-by: Santosh Y <santosh.sy@samsung.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 16:43:30 -04:00
Indraneel Mukherjee
4131f2fcdc NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
In GetLogPage the buffer size passed to device is a 0's based value.

Signed-off-by: Indraneel M <indraneel.m@samsung.com>
Reported-by: Shiro Itou <shiro.itou@outlook.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 16:42:41 -04:00
Linus Torvalds
80081ec309 Merge branch 'for-3.16/drivers' of git://git.kernel.dk/linux-block into next
Pull block driver changes from Jens Axboe:
 "Now that the core bits are in, here's the pull request for the driver
  related changes for 3.16.  Nothing out of the ordinary here, mostly
  business as usual.  There are a few pulls of for-3.16/core into this
  branch, which were done when the blk-mq was modified after the
  mtip32xx conversion was put in.

  The pull request contains:

   - skd and cciss converted to use pci_enable_msix_exact().  From
     Alexander Gordeev.

   - A few mtip32xx fixes from Asai @ Micron.

   - The conversion of mtip32xx from make_request_fn to blk-mq, and a
     later small fix for that conversion on quiescing for non-queued IO.
     From me.

   - A fix for bsg to use an exported function to check whether this
     driver is request based or not.  Needed updating for blk-mq, which
     is request based, but does not have a request_fn hook.  From me.

   - Small floppy bug fix from Jiri.

   - A series of cleanups for the cdrom uniform layer from Joe Perches.
     Gets rid of various old ugly macros, making the code conform more
     to the modern coding style.

   - A series of patches for drbd from the drbd crew (Lars Ellenberg and
     Philipp Reisner).

   - A use-after-free fix for null_blk from Ming Lei.

   - Also from Ming Lei is a performance patch for virtio-blk, which can
     net us a 3x win on kvm platforms where world notification is
     expensive.

   - Ming Lei also fixed a stall issue in virtio-blk, due to a race
     between queue start/stop and resource limits.

   - A small batch of fixes for xen-blk{back,front} from Olaf Hering and
     Valentin Priescu"

* 'for-3.16/drivers' of git://git.kernel.dk/linux-block: (54 commits)
  block: virtio_blk: don't hold spin lock during world switch
  xen-blkback: defer freeing blkif to avoid blocking xenwatch
  xen blkif.h: fix comment typo in discard-alignment
  xen/blkback: disable discard feature if requested by toolstack
  xen-blkfront: remove type check from blkfront_setup_discard
  floppy: do not corrupt bio.bi_flags when reading block 0
  mtip32xx: move error handling to service thread
  virtio_blk: fix race between start and stop queue
  mtip32xx: stop block hardware queues before quiescing IO
  mtip32xx: blk_mq_init_queue() returns an ERR_PTR
  mtip32xx: convert to use blk-mq
  cdrom: Remove unnecessary prototype for cdrom_get_disc_info
  cdrom: Remove unnecessary prototype for cdrom_mrw_exit
  cdrom: Remove cdrom_count_tracks prototype
  cdrom: Remove cdrom_get_next_writeable prototype
  cdrom: Remove cdrom_get_last_written prototype
  cdrom: Move mmc_ioctls above cdrom_ioctl to remove unnecessary prototype
  cdrom: Remove unnecessary sanitize_format prototype
  cdrom: Remove unnecessary check_for_audio_disc prototype
  cdrom: Remove prototype for open_for_data
  ...
2014-06-02 13:57:01 -07:00
Linus Torvalds
425553209b PCI changes for the v3.16 merge window:
Enumeration
     - Notify driver before and after device reset (Keith Busch)
     - Use reset notification in NVMe (Keith Busch)
 
   NUMA
     - Warn if we have to guess host bridge node information (Myron Stowe)
     - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit)
     - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit)
 
   Driver binding
     - Add "driver_override" for force specific binding (Alex Williamson)
     - Fail "new_id" addition for devices we already know about (Bandan Das)
 
   Resource management
     - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
     - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
     - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
     - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
     - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas)
     - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
     - Don't print anything while decoding is disabled (Bjorn Helgaas)
     - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
     - Add resource allocation comments (Bjorn Helgaas)
     - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu)
     - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
 
   PCI device hotplug
     - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
     - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
     - Fix rphahp endianess issues (Laurent Dufour)
     - Acknowledge spurious "cmd completed" event (Rajat Jain)
     - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
     - Fix cpqphp possible NULL dereference (Rickard Strandqvist)
 
   MSI
     - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev)
     - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
     - Simplify populate_msi_sysfs() (Jan Beulich)
 
   Virtualization
     - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
     - Mark RTL8110SC INTx masking as broken (Alex Williamson)
 
   Generic host bridge driver
     - Add generic PCI host controller driver (Will Deacon)
 
   Freescale i.MX6
     - Use new clock names (Lucas Stach)
     - Drop old IRQ mapping (Lucas Stach)
     - Remove optional (and unused) IRQs (Lucas Stach)
     - Add support for MSI (Lucas Stach)
     - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Renesas R-Car
     - Add gen2 device tree support (Ben Dooks)
     - Use new OF interrupt mapping when possible (Lucas Stach)
     - Add PCIe driver (Phil Edworthy)
     - Add PCIe MSI support (Phil Edworthy)
     - Add PCIe device tree bindings (Phil Edworthy)
 
   Samsung Exynos
     - Remove unnecessary OOM messages (Jingoo Han)
     - Fix add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Synopsys DesignWare
     - Make MSI ISR shared IRQ aware (Lucas Stach)
 
   Miscellaneous
     - Check for broken config space aliasing (Alex Williamson)
     - Update email address (Ben Hutchings)
     - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
     - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
     - Remove unnecessary __ref annotations (Bjorn Helgaas)
     - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas)
     - Fix use of uninitialized MPS value (Bjorn Helgaas)
     - Tidy x86/gart messages (Bjorn Helgaas)
     - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
     - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
     - Remove unused serial device IDs (Jean Delvare)
     - Use designated initialization in PCI_VDEVICE (Mark Rustad)
     - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
     - Configure MPS on ARM (Murali Karicheri)
     - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
     - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
     - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
     - Remove pcibios_add_platform_entries() (Sebastian Ott)
     - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
     - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
     - Add and use new pci_is_bridge() interface (Yijing Wang)
     - Make pci_bus_add_device() void (Yijing Wang)
 
   DMA API
     - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
     - Fix typos in docs (Emilio López)
     - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
     - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas)
     - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp
 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs
 xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o
 sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t
 Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK
 Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4
 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv
 r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H
 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z
 qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+
 Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd
 p8XAf7Sl6v6YUodUrT/t
 =km55
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

  NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
      Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
      Suthikulpanit)

  Driver binding
    - Add "driver_override" for force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
      Das)

  Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
      (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
      (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

  PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rphahp endianess issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

  MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
      (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

  Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

  Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

  Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

  Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

  Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

  Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

  Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
      (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

  DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
      (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
      (Bjorn Helgaas)"

* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
  MAINTAINERS: Add generic PCI host controller driver
  PCI: generic: Add generic PCI host controller driver
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  i82875p_edac: Assign PCI resources before adding device
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
  PCI: Make pci_bus_add_device() void
  PCI: exynos: Fix add_pcie_port() section mismatch warning
  PCI: Introduce new device binding path using pci_dev.driver_override
  PCI: rcar: Add gen2 device tree support
  PCI: cpqphp: Fix possible null pointer dereference
  PCI: rcar: Add R-Car PCIe device tree bindings
  PCI: rcar: Add MSI support for PCIe
  PCI: rcar: Add Renesas R-Car PCIe driver
  PCI: Fix return value from pci_user_{read,write}_config_*()
  PCI: exynos: Remove unnecessary OOM messages
  ...
2014-06-02 12:15:19 -07:00
Linus Torvalds
681a289548 Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next
Pull block core updates from Jens Axboe:
 "It's a big(ish) round this time, lots of development effort has gone
  into blk-mq in the last 3 months.  Generally we're heading to where
  3.16 will be a feature complete and performant blk-mq.  scsi-mq is
  progressing nicely and will hopefully be in 3.17.  A nvme port is in
  progress, and the Micron pci-e flash driver, mtip32xx, is converted
  and will be sent in with the driver pull request for 3.16.

  This pull request contains:

   - Lots of prep and support patches for scsi-mq have been integrated.
     All from Christoph.

   - API and code cleanups for blk-mq from Christoph.

   - Lots of good corner case and error handling cleanup fixes for
     blk-mq from Ming Lei.

   - A flew of blk-mq updates from me:

     * Provide strict mappings so that the driver can rely on the CPU
       to queue mapping.  This enables optimizations in the driver.

     * Provided a bitmap tagging instead of percpu_ida, which never
       really worked well for blk-mq.  percpu_ida relies on the fact
       that we have a lot more tags available than we really need, it
       fails miserably for cases where we exhaust (or are close to
       exhausting) the tag space.

     * Provide sane support for shared tag maps, as utilized by scsi-mq

     * Various fixes for IO timeouts.

     * API cleanups, and lots of perf tweaks and optimizations.

   - Remove 'buffer' from struct request.  This is ancient code, from
     when requests were always virtually mapped.  Kill it, to reclaim
     some space in struct request.  From me.

   - Remove 'magic' from blk_plug.  Since we store these on the stack
     and since we've never caught any actual bugs with this, lets just
     get rid of it.  From me.

   - Only call part_in_flight() once for IO completion, as includes two
     atomic reads.  Hopefully we'll get a better implementation soon, as
     the part IO stats are now one of the more expensive parts of doing
     IO on blk-mq.  From me.

   - File migration of block code from {mm,fs}/ to block/.  This
     includes bio.c, bio-integrity.c, bounce.c, and ioprio.c.  From me,
     from a discussion on lkml.

  That should describe the meat of the pull request.  Also has various
  little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong,
  Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam
  Bradshaw"

* 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits)
  blk-mq: push IPI or local end_io decision to __blk_mq_complete_request()
  blk-mq: remember to start timeout handler for direct queue
  block: ensure that the timer is always added
  blk-mq: blk_mq_unregister_hctx() can be static
  blk-mq: make the sysfs mq/ layout reflect current mappings
  blk-mq: blk_mq_tag_to_rq should handle flush request
  block: remove dead code in scsi_ioctl:blk_verify_command
  blk-mq: request initialization optimizations
  block: add queue flag for disabling SG merging
  block: remove 'magic' from struct blk_plug
  blk-mq: remove alloc_hctx and free_hctx methods
  blk-mq: add file comments and update copyright notices
  blk-mq: remove blk_mq_alloc_request_pinned
  blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request
  blk-mq: remove blk_mq_wait_for_tags
  blk-mq: initialize request in __blk_mq_alloc_request
  blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request
  blk-mq: add helper to insert requests from irq context
  blk-mq: remove stale comment for blk_mq_complete_request()
  blk-mq: allow non-softirq completions
  ...
2014-06-02 09:29:34 -07:00
Linus Torvalds
5d70dacd4e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k into next
Pull m68k updates from Geert Uytterhoeven:
 "Highlights:
   - support for running kernels in fast TT-RAM instead of slow ST-RAM
     on Atari
   - multi-platform EARLY_PRINTK
   - better support for machines with lots of RAM (think ARAnyM), and
     for running kernels larger than 4 MiB (think multi-platform)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/hp300: Convert printk to pr_foo()
  m68k/apollo: Convert printk to pr_foo()
  m68k/amiga: Convert printk(foo to pr_foo()
  m68k: Increase initial mapping to 8 or 16 MiB if possible
  m68k: Update defconfigs for v3.15-rc2
  m68k/atari: fix SCC initialization for debug console
  m68k/mvme16x: Adopt common boot console
  m68k: Multi-platform EARLY_PRINTK
  m68k: Toward platform agnostic framebuffer debug logging
  m68k/atari - atari_scsi: use correct virt/phys translation for DMA buffer
  m68k/atari - ataflop: use correct virt/phys translation for DMA buffer
  m68k/atari - atafb: convert allocation of fb ram to new interface
  m68k/atari - stram: alloc ST-RAM pool even if kernel not in ST-RAM
2014-06-02 08:03:34 -07:00
Ming Lei
e8edca6f7f block: virtio_blk: don't hold spin lock during world switch
Firstly, it isn't necessary to hold lock of vblk->vq_lock
when notifying hypervisor about queued I/O.

Secondly, virtqueue_notify() will cause world switch and
it may take long time on some hypervisors(such as, qemu-arm),
so it isn't good to hold the lock and block other vCPUs.

On arm64 quad core VM(qemu-kvm), the patch can increase I/O
performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
	- without the patch: 14K IOPS
	- with the patch: 34K IOPS

fio script:
	[global]
	direct=1
	bsrange=4k-4k
	timeout=10
	numjobs=4
	ioengine=libaio
	iodepth=64

	filename=/dev/vdc
	group_reporting=1

	[f1]
	rw=randread

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # 3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-30 08:19:39 -06:00
Jens Axboe
f89ca16646 Merge branch 'for-3.16/core' into for-3.16/drivers
Pulled in for the blk_mq_tag_to_rq() change, which impacts
mtip32xx.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-30 08:11:50 -06:00
Jens Axboe
879466e6a5 Merge branch 'stable/for-jens-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-3.16/drivers
Konrad writes:

Please git pull the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.16

which has a bunch of fixes to the Xen block frontend and backend driver
and a new parameter for Xen backend driver - an override (set by the toolstack)
whether to expose the discard support (if disk of course supports it) or not.
2014-05-28 12:37:04 -06:00
Valentin Priescu
814d04e7df xen-blkback: defer freeing blkif to avoid blocking xenwatch
Currently xenwatch blocks in VBD disconnect, waiting for all pending I/O
requests to finish. If the VBD is attached to a hot-swappable disk, then
xenwatch can hang for a long period of time, stalling other watches.

 INFO: task xenwatch:39 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ffff880057f01bd0 0000000000000246 ffff880057f01ac0 ffffffff810b0782
 ffff880057f01ad0 00000000000131c0 0000000000000004 ffff880057edb040
 ffff8800344c6080 0000000000000000 ffff880058c00ba0 ffff880057edb040
 Call Trace:
 [<ffffffff810b0782>] ? irq_to_desc+0x12/0x20
 [<ffffffff8128f761>] ? list_del+0x11/0x40
 [<ffffffff8147a080>] ? wait_for_common+0x60/0x160
 [<ffffffff8147bcef>] ? _raw_spin_lock_irqsave+0x2f/0x50
 [<ffffffff8147bd49>] ? _raw_spin_unlock_irqrestore+0x19/0x20
 [<ffffffff8147a26a>] schedule+0x3a/0x60
 [<ffffffffa018fe6a>] xen_blkif_disconnect+0x8a/0x100 [xen_blkback]
 [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40
 [<ffffffffa018ffce>] xen_blkbk_remove+0xae/0x1e0 [xen_blkback]
 [<ffffffff8130b254>] xenbus_dev_remove+0x44/0x90
 [<ffffffff81345cb7>] __device_release_driver+0x77/0xd0
 [<ffffffff81346488>] device_release_driver+0x28/0x40
 [<ffffffff813456e8>] bus_remove_device+0x78/0xe0
 [<ffffffff81342c9f>] device_del+0x12f/0x1a0
 [<ffffffff81342d2d>] device_unregister+0x1d/0x60
 [<ffffffffa0190826>] frontend_changed+0xa6/0x4d0 [xen_blkback]
 [<ffffffffa019c252>] ? frontend_changed+0x192/0x650 [xen_netback]
 [<ffffffff8130ae50>] ? cmp_dev+0x60/0x60
 [<ffffffff81344fe4>] ? bus_for_each_dev+0x94/0xa0
 [<ffffffff8130b06e>] xenbus_otherend_changed+0xbe/0x120
 [<ffffffff8130b4cb>] frontend_changed+0xb/0x10
 [<ffffffff81309c82>] xenwatch_thread+0xf2/0x130
 [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40
 [<ffffffff81309b90>] ? xenbus_directory+0x80/0x80
 [<ffffffff810799d6>] kthread+0x96/0xa0
 [<ffffffff81485934>] kernel_thread_helper+0x4/0x10
 [<ffffffff814839f3>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8147c17c>] ? retint_restore_args+0x5/0x6
 [<ffffffff81485930>] ? gs_change+0x13/0x13

With this patch, when there is still pending I/O, the actual disconnect
is done by the last reference holder (last pending I/O request). In this
case, xenwatch doesn't block indefinitely.

Signed-off-by: Valentin Priescu <priescuv@amazon.com>
Reviewed-by: Steven Kady <stevkady@amazon.com>
Reviewed-by: Steven Noonan <snoonan@amazon.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-28 14:17:32 -04:00
Olaf Hering
c926b701fe xen/blkback: disable discard feature if requested by toolstack
Newer toolstacks may provide a boolean property "discard-enable" in the
backend node. Its purpose is to disable discard for file backed storage
to avoid fragmentation. Recognize this setting also for physical
storage.  If that property exists and is false, do not advertise
"feature-discard" to the frontend.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-28 14:17:28 -04:00
Olaf Hering
1c8cad6c1b xen-blkfront: remove type check from blkfront_setup_discard
In its initial implementation a check for "type" was added, but only phy
and file are handled. This breaks advertised discard support for other
type values such as qdisk.

Fix and simplify this function: If the backend advertises discard
support it is supposed to implement it properly, so enable
feature_discard unconditionally. If the backend advertises the need for
a certain granularity and alignment then propagate both properties to
the blocklayer. The discard-secure property is a boolean, update the code
to reflect that.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-28 14:17:26 -04:00
Jens Axboe
0fb662e225 Merge branch 'for-3.16/core' into for-3.16/drivers
Pull in core changes (again), since we got rid of the alloc/free
hctx mq_ops hooks and mtip32xx then needed updating again.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-28 10:18:51 -06:00
Christoph Hellwig
cdef54dd85 blk-mq: remove alloc_hctx and free_hctx methods
There is no need for drivers to control hardware context allocation
now that we do the context to node mapping in common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-28 10:18:31 -06:00
Jens Axboe
6178976500 Merge branch 'for-3.16/core' into for-3.16/drivers
mtip32xx uses blk_mq_alloc_reserved_request(), so pull in the
core changes so we have a properly merged end result.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-28 09:50:26 -06:00
Jiri Kosina
6314a108ec floppy: do not corrupt bio.bi_flags when reading block 0
Commit 41a55b4de3 ("floppy: silence warning during disk test") caused
bio.bi_flags being overwritten, and its initialization to BIO_UPTODATE
in bio_init() to be lost.

This was unnoticed until 7b7b68bba5 ("floppy: bail out in open() if
drive is not responding to block0 read"), because the error value wasn't
checked for in the bio completion callback.

Now we are actually looking at the error, and the loss of BIO_UPTODATE
causes EIO to be wrongly passed to the callback, which confuses the
FD_OPEN_SHOULD_FAIL_BIT logic.

Fix this by not destroying previous value of bi_flags when setting
BIO_QUIET.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-05-28 14:56:52 +02:00
Jens Axboe
f14bbe77a9 blk-mq: pass in suggested NUMA node to ->alloc_hctx()
Drivers currently have to figure this out on their own, and they
are missing information to do it properly. The ones that did
attempt to do it, do it wrong.

So just pass in the suggested node directly to the alloc
function.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-27 12:06:53 -06:00
Keith Busch
f0d54a5417 NVMe: Implement PCIe reset notification callback
Quiesce and shutdown the device prior to reset, then restart the device and
resume IO after.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-27 11:12:29 -06:00
Ming Lei
aa0818c6ee virtio_blk: fix race between start and stop queue
When there isn't enough vring descriptor for adding to vq,
blk-mq will be put as stopped state until some of pending
descriptors are completed & freed.

Unfortunately, the vq's interrupt may come just before
blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
still be kept as stopped even though lots of descriptors
are completed and freed in the interrupt handler. The worst
case is that all pending descriptors are freed in the
interrupt handler, and the queue is kept as stopped forever.

This patch fixes the problem by starting/stopping blk-mq
with holding vq_lock.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

Conflicts:
	drivers/block/virtio_blk.c
2014-05-27 08:41:10 -06:00
Michael Schmitz
a4de73fbcf m68k/atari - ataflop: use correct virt/phys translation for DMA buffer
With the kernel running from FastRAM instead of ST-RAM, none of ST-RAM is
mapped by mem_init, and DMA-addressable buffer must be mapped by ioremap.

Use platform specific virt/phys translation helpers for this case.

Signed-off-by: Michael Schmitz <schmitz@debian.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2014-05-26 22:41:24 +02:00
Ingo Molnar
65c2ce7004 Linux 3.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B
 +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c
 Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki
 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3
 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R
 wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0=
 =2ShG
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
Asai Thambi S P
9b204fbf09 mtip32xx: move error handling to service thread
Move error handling to service thread, and use mtip_set_timeout()
to set timeouts for HDIO_DRIVE_TASK and HDIO_DRIVE_CMD IOCTL commands.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-20 11:51:30 -06:00
Ming Lei
0c29e93eae virtio_blk: fix race between start and stop queue
When there isn't enough vring descriptor for adding to vq,
blk-mq will be put as stopped state until some of pending
descriptors are completed & freed.

Unfortunately, the vq's interrupt may come just before
blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
still be kept as stopped even though lots of descriptors
are completed and freed in the interrupt handler. The worst
case is that all pending descriptors are freed in the
interrupt handler, and the queue is kept as stopped forever.

This patch fixes the problem by starting/stopping blk-mq
with holding vq_lock.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-16 09:40:31 -06:00
Jens Axboe
9acf03cfb1 mtip32xx: stop block hardware queues before quiescing IO
We need to stop the block layer queues to prevent new "normal"
IO from entering the driver, while we wait for existing commands
to finish.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-14 08:22:56 -06:00
Dan Carpenter
a8a642ccd2 mtip32xx: blk_mq_init_queue() returns an ERR_PTR
We changed this from blk_alloc_queue_node() to blk_mq_init_queue() so
the check needs to be updated as well.

Fixes: ffc771b3ca ('mtip32xx: convert to use blk-mq')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-14 08:05:44 -06:00
Jens Axboe
ffc771b3ca mtip32xx: convert to use blk-mq
This rips out timeout handling, requeueing, etc in converting
it to use blk-mq instead.

Acked-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-13 19:51:22 -06:00
Matthew Wilcox
21bd78bcf4 NVMe: Enable BUILD_BUG_ON checks
Since _nvme_check_size() wasn't being called from anywhere, the compiler
was optimising it away ... along with all the link-time build failures
that would result if any of the structures were the wrong size.  Call it
from nvme_exit() for no particular reason.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-09 22:42:39 -04:00
Ingo Molnar
2fe5de9ce7 Merge branch 'sched/urgent' into sched/core, to avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 13:15:46 +02:00
Keith Busch
53562be74b NVMe: Flush with data support
It is possible a filesystem may send a flush flagged bio with write
data. There is no such composite NVMe command, so the driver sends flush
and write separately.

The device is allowed to execute these commands in any order, so it was
possible the driver ends the bio after the write completes, but while the
flush is still active. We don't want to let a filesystem believe flush
succeeded before it really has; this could cause data corruption on a
power loss between these events. To fix, this patch splits the flush
and write into chained bios.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:54:02 -04:00
Keith Busch
a7d2ce2832 NVMe: Configure support for block flush
This configures an nvme request_queue as flush capable if the device
has a volatile write cache present.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:53:53 -04:00
Matthew Daley
2145e15e05 floppy: don't write kernel-only members to FDRAWCMD ioctl output
Do not leak kernel-only floppy_raw_cmd structure members to userspace.
This includes the linked-list pointer and the pointer to the allocated
DMA space.

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-05 07:46:56 -07:00
Matthew Daley
ef87dbe761 floppy: ignore kernel-only members in FDRAWCMD ioctl input
Always clear out these floppy_raw_cmd struct members after copying the
entire structure from userspace so that the in-kernel version is always
valid and never left in an interdeterminate state.

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-05 07:46:55 -07:00
Keith Busch
3291fa57cb NVMe: Add tracepoints
Adding tracepoints for bio_complete and block_split into nvme to help
with gathering IO info using blktrace and blkparse.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:26 -04:00
Keith Busch
94bbac4052 NVMe: Protect against badly formatted CQEs
If a misbehaving device posts a CQE with a command id < depth but for
one that was never allocated, the command info will have a callback
function set to NULL and we don't want to try invoking that.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:25 -04:00
Matthew Wilcox
27e8166c31 NVMe: Improve error messages
Help people diagnose what is going wrong at initialisation time by
printing out which command has gone wrong and what the device returned.
Also fix the error message printed while waiting for reset.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2014-05-05 10:41:25 -04:00
Matthew Wilcox
8757ad65d3 NVMe: Update copyright headers
Make the copyright dates accurate and remove the final paragraph that
includes the address of the FSF.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:25 -04:00
Ming Lei
fc27691f35 block: null_blk: fix use after free
entry(cmd->ll_list) may belong to new request once end_cmd()
returns, so fix the bug with the patch.

Without the change, it is easy to observe oops when
doing null_blk(timer) test.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-01 09:17:41 -06:00
Lars Ellenberg
ec4a340789 drbd: use list_first_entry_or_null in first_peer_device/first_connection
If there are no peer_devices or connections, I'd rather have NULL
than some "arbitrary" address pretending to point to a struct.

Helps to avoid hard to debug symptoms, in case we ever try to use
and dereference a drbd_connection or drbd_peer_device
where we in fact don't have any connection at all.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:56 -06:00
Philipp Reisner
babea49ebe drbd: Allow attaching of a newly created device to any backing device
A newly created device was never exposed before, i.e. has a
exposed_data_uuid of 0. Then it is valid to attach to any current_uuid
of a backing device (of course also to a newly created one (4))

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:56 -06:00
Philipp Reisner
02df6fe145 drbd: Test cstate while holding req_lock
In case a connection transitions into C_TIMEOUT within the timer
function (request_timer_fn()) we need to make sure that the receiver
thread (potentially running on a different CPU) sees the updated
cstate later on.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:56 -06:00
Philipp Reisner
c1b3156f12 drbd: use blk_set_stacking_limits()
...instead directly assigning to q->limits.discard_zeroes_data

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
08535466bc drbd: evaluate disk and network timeout on different requests
Just because it is the oldest not yet completed request
does not make it the oldest request waiting for disk.
Or waiting for the peer.

And we completely missed already completed requests
that would still hold references to activity log extents,
waiting only for the barrier ack.

Find two oldest not yet completely processed requests,
one that is still waiting for local completion,
and one that is still waiting for some response from the peer.
These may or may not be the same request object.

Then separately apply the network and disk timeouts, respectively.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Philipp Reisner
67cca286ca drbd: Fix a hole in the challange-response connection authentication
In the implementation as it was, the two peers sent each other
a challenge, and expects the challenge hashed with the shared
secret back.

A attacker could simply wait for the challenge of the peer, and
send the same challenge back. Then it waits for the response, and
sends the same response back.

Prevent this by not accepting a challenge from the peer that is
the same as the challenge sent to the peer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
f9c78128f8 drbd: always implicitly close last epoch when idle
Once our sender thread needs to wait_for_work(),
and actually needs to schedule(), just before we do that,
we already check if it is useful to implicitly close the last epoch.

The condition was too strict: only implicitly close the epoch,
if there have been no new (write) requests at all.

The assumption was that if there were new requests, they would
always be communicated one way or another, and would send necessary
epoch separating barriers explicitly.

This is not always true, e.g. when becoming diskless,
or while explicitly starting a full resync.

The last communicated epoch could stay open for a long time,
locking down corresponding activity log extents.

It is safe to always implicitly send that last barrier, as soon as we
determin that there cannot be more requests in the last communicated
epoch, even if there have been (uncommunicated) new requests in new
epochs meanwhile.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
e4d7d6f4d3 drbd: add back some fairness to AL transactions
When batching more updates to the activity log into single transactions,
we lost the ability for new requests to force themselves into the active
set: all preparation steps became non-blocking, and if all currently
hot extents keep busy, they could starve out new incoming requests
to cold extents for quite a while.

This can only happen if your IO backend accepts more IO operations per
average DRBD replication round trip time than you have al-extents
configured.

If we have incoming requests to cold extents,
at least do one blocking update per transaction.

In an artificial worst-case workload on SSD with an asynchronous 600 ms
replication link, with al-extents = 7 (the minimum we allow), and
concurrent full resynch, without this patch, some write requests have
been observed to be starved for 40 seconds.
With this patch, application observed a worst case latency of twice the
replication round trip time.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
fa090e708a drbd: keep max-bio size during detach/attach on disconnected primary
We want to store in persistent meta data what the peer DRBD can handle,
which, due to spreading requests to multiple bios,
may be more than its backing device can handle.

Otherwise, if a disconnected Primary temporarily loses access to its local data
as well, we may accidentally shrink the max-bio setting, portentially causing
already assembled, but not yet processed, application bios to be spuriously
failed due to device limits.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
074f4afeb2 drbd: fix a race between start_resync and send_and_submit
In the drbd make request function, specifically in
drbd_send_and_submit(), we decide whether we want to send the actual
write request, or only a "set this block out of sync" information.

We do so based on the current connection state, while holding the req_lock.
The connection state is not supposed to change while holding the req_lock.

But in drbd_start_resync, we did change that state anyways,
while only holding the global_state_lock, which is enough to change
sync-after dependencies (paused vs active resync), but
not good enough to change the connection state.

Fix: in drbd_start_resync, first grab the req_lock to serialize with
drbd_send_and_submit(), before grabbing the global_state_lock
to be able to evaluate the sync-after dependencies.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
20c68fdea1 drbd: Enable QUEUE_FLAG_DISCARD only if the peer can recieve P_TRIM
Allow the user of REQ_DISCARD.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
2f632aeb53 drbd: prepare sending side for REQ_DISCARD
Note that I do NOT call __drbd_chk_io_error for failed REQ_DISCARD.
That may be wrong, though, or needs to differ between EOPNOTSUPP and
other errors...

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
a0fb3c47a1 drbd: prepare receiving side for REQ_DISCARD
If the receiver needs to serve a discard request on a queue that does
not announce to be discard cabable, it falls back to do synchronous
blkdev_issue_zeroout().

We expect only "reasonably" large (up to one activity log extent?)
discard requests.

We do this to not to not block the receiver for too long in this
fallback code path, and to not set/clear too many bits inside one
spinlock_irq_save() in drbd_set_in_sync/drbd_set_out_of_sync,

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
9e276872fe drbd: allow parallel promote/demote actions
We plan to use genl_family->parallel_ops = true in the future,
but need to review all possible interactions first.

For now, only selectively drop genl_lock() in drbd_set_role(),
instead serializing on our own internal resource->conf_update mutex.

We now can be promoted/demoted on many resources in parallel,
which may significantly improve cluster failover times
when fencing is required.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
a910b12352 drbd: perpare for genetlink parallel_ops
Because all administrative requests via genetlink have been globally
serialized via genl_lock(), we used to have one static struct
drbd_config_context "admin context".

Move this on-stack to the respective callback functions.

This will allow us to selectively drop the genl_lock()
(or use genl_family->parallel_ops) in the future.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Philipp Reisner
88ea685d33 drbd: Do not BUG() when connection breaks in a special way
When a 'cluster wide' disconnect executes, the result comes back
from the peer, and immediately after that the connection breaks
then _conn_rq_cond() reported back SS_CW_SUCCESS.
Therefore _conn_request_state() calls conn_set_state(), which
has a BUG() in it.
The BUG() is hit because conn_is_valid_transition() does not like
the transaction. Which goes back to is_valid_soft_transition()
returning SS_OUTDATE_WO_CONN.

This fix is to consider an error reported by is_valid_soft_transition()
even when the peer agreed to the transaction.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
e829987433 drbd: don't let application IO pre-empt resync too often
Before, application IO could pre-empt resync activity
for up to hardcoded 20 seconds per resync request.
A very busy server could throttle the effective resync bandwidth
down to one request per 20 seconds.

Now, we only let application IO pre-empt resync traffic
while the current resync rate estimate is above c-min-rate.

If you disable the c-min-rate throttle feature (set c-min-rate = 0),
application IO will no longer pre-empt resync traffic at all.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
0e49d7b014 drbd: fix potential distributed deadlock during verify or resync
If max-buffers and socket buffer sizes are "too small" for the chosen
resync rate, this could lead potentially lead to a distributed deadlock,
which may or may not resolve itself via the "ko-count" and request
timeout mechanism, or could be resolved by forced disconnect.

One option to deal with this is proper configuration:
use larger max-buffer and socket buffers settings,
or reduce the resync rate.

But even with bad configuration we should not deadlock,
but "gracefully" recover.

The issue is avoided by using only up to max-buffers/2 for resync
requests, and by using max-buffers not as a hard limit for data buffer
allocations, but as a throttle threshold only.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
6377b92350 drbd: resync: fix too large bursts for very slow rates
While merging adjacent dirty blocks into resync requests,
the resync rate throttle was disregarded.
For very low resync rates, the effective rate may have exceeded
the intended rate by a larger margin.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
9ae472605a drbd: fix stalled resync detection in /proc/drbd
If we don't make resync or verify progress for "too long",
we want to flag it as "stalled".

Since 2010, "use rolling marks for resync speed calculation"
this "too long" was wrong by a factor of HZ.
With HZ 250, it would have been flagged as stalled
after 100 minutes.

Hardcode 3 minutes instead.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Philipp Reisner
cdc6af8df4 drbd: Allow online layout change of AL while peer is not connected
If a user forces the operation he takes the blame in case
the peer does not have enough space. No reason to dey this...

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Philipp Reisner
d40e567149 drbd: Remove drbd_wrappers.h
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Philipp Reisner
d7fe69c6a1 drbd: Leave IO suspended if the fence handler find the peer primary
Actually we are clearing the susp_fen flag if we are not going
to call a fencing handler.

For setting the susp_fen flag needs to be edge-triggerd, and not
level triggered.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Philipp Reisner
31007745a5 drbd: Break a deadlock while concurrent fencing and establishing a connection
When we need to outdate the peer while being promoted to primary,
and the connection gets established at the same time, we deadlock
in drbd_try_outdate_peer() when trying to clear the susp_fen
bit.

Fix this by setting the STATE_SENT bit while holding the mutex.

Using drbd_change_state(.. , CS_HARD, ..) which does not block
until STATE_SENT is cleared, is only for clearness. It does
not contribute anything to the fix.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:53 -06:00
Asai Thambi S P
d1e714db81 mtip32xx: Fix ERO and NoSnoop values in PCIe upstream on AMD systems
A hardware quirk in P320h/P420m interfere with PCIe transactions on some
AMD chipsets, making P320h/P420m unusable. This workaround is to disable
ERO and NoSnoop bits in the parent and root complex for normal
functioning of these devices

NOTE: This workaround is specific to AMD chipset with a PCIe upstream
device with device id 0x5aXX

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22 19:48:52 -06:00
Asai Thambi S P
af5ded8ccf mtip32xx: Remove dfs_parent after pci unregister
In module exit, dfs_parent and it's subtree were removed before
unregistering with pci. When debugfs entry for each device is attempted
to remove in pci_remove() context, they don't exist, as dfs_parent and
its children were already ripped apart.

Modified to first unregister with pci and then remove dfs_parent.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22 19:48:51 -06:00
Asai Thambi S P
670a641420 mtip32xx: Increase timeout for STANDBY IMMEDIATE command
Increased timeout for STANDBY IMMEDIATE command to 2 minutes.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-22 19:46:21 -06:00
Alexander Gordeev
24cddb83b4 cciss: Use pci_enable_msix_exact() instead of pci_enable_msix()
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range()  or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: iss_storagedev@hp.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-21 19:02:08 -06:00
Alexander Gordeev
01aad3f0de skd: Use pci_enable_msix_exact() instead of pci_enable_msix_range()
Function pci_enable_msix_exact() is a variation of
pci_enable_msix_range() that allows a device driver
to request a particular number of MSI-X interrupts,
rather than any number within a specified range.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-21 19:02:05 -06:00
Dongsheng Yang
8698a745d8 sched, treewide: Replace hardcoded nice values with MIN_NICE/MAX_NICE
Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com
Cc: devel@driverdev.osuosl.org
Cc: devicetree@vger.kernel.org
Cc: fcoe-devel@open-fcoe.org
Cc: linux390@de.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: nbd-general@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: openipmi-developer@lists.sourceforge.net
Cc: qla2xxx-upstream@qlogic.com
Cc: linux-arch@vger.kernel.org
[ Consolidated the patches, twiddled the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:07:24 +02:00
Jens Axboe
dc4a93078b sd/skd: stuff discard page in request->completion_data
Store the pointer to the page there, so we can always safely
reference it from end_io context where ->bio may have been
cleared.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-16 21:37:30 -06:00
Christoph Hellwig
1b4a325858 blk-mq: add async parameter to blk_mq_start_stopped_hw_queues
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-16 14:15:25 -06:00
Christoph Hellwig
24d2f90309 blk-mq: split out tag initialization, support shared tags
Add a new blk_mq_tag_set structure that gets set up before we initialize
the queue.  A single blk_mq_tag_set structure can be shared by multiple
queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Modular export of blk_mq_{alloc,free}_tagset added by me.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:18:02 -06:00
Christoph Hellwig
e9b267d91f blk-mq: add ->init_request and ->exit_request methods
The current blk_mq_init_commands/blk_mq_free_commands interface has a
two problems:

 1) Because only the constructor is passed to blk_mq_init_commands there
    is no easy way to clean up when a comman initialization failed.  The
    current code simply leaks the allocations done in the constructor.

 2) There is no good place to call blk_mq_free_commands: before
    blk_cleanup_queue there is no guarantee that all outstanding
    commands have completed, so we can't free them yet.  After
    blk_cleanup_queue the queue has usually been freed.  This can be
    worked around by grabbing an unconditional reference before calling
    blk_cleanup_queue and dropping it after blk_mq_free_commands is
    done, although that's not exatly pretty and driver writers are
    guaranteed to get it wrong sooner or later.

Both issues are easily fixed by making the request constructor and
destructor normal blk_mq_ops methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:03 -06:00
Christoph Hellwig
9d74e25737 blk-mq: do not initialize req->special
Drivers can reach their private data easily using the blk_mq_rq_to_pdu
helper and don't need req->special.  By not initializing it code can
be simplified nicely, and we also shave off a few more instructions from
the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:02 -06:00
Jens Axboe
b4f42e2831 block: remove struct request buffer member
This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:02 -06:00
Linus Torvalds
5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Linus Torvalds
3e8072d48b Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVMe driver updates from Matthew Wilcox:
 "Various updates to the NVMe driver.  The most user-visible change is
  that drive hotplugging now works and CPU hotplug while an NVMe drive
  is installed should also work better"

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Retry failed commands with non-fatal errors
  NVMe: Add getgeo to block ops
  NVMe: Start-stop nvme_thread during device add-remove.
  NVMe: Make I/O timeout a module parameter
  NVMe: CPU hot plug notification
  NVMe: per-cpu io queues
  NVMe: Replace DEFINE_PCI_DEVICE_TABLE
  NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds
  NVMe: IOCTL path RCU protect queue access
  NVMe: RCU protected access to io queues
  NVMe: Initialize device reference count earlier
  NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
2014-04-11 16:45:59 -07:00
Keith Busch
edd10d3328 NVMe: Retry failed commands with non-fatal errors
For commands returned with failed status, queue these for resubmission
and continue retrying them until success or for a limited amount of
time. The final timeout was arbitrarily chosen so requests can't be
retried indefinitely.

Since these are requeued on the nvmeq that submitted the command, the
callbacks have to take an nvmeq instead of an nvme_dev as a parameter
so that we can use the locked queue to append the iod to retry later.

The nvme_iod conviently can be used to track how long we've been trying
to successfully complete an iod request. The nvme_iod also provides the
nvme prp dma mappings, so I had to move a few things around so we can
keep those mappings.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fixed checkpatch issue with long line]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:11:59 -04:00
Keith Busch
4cc09e2dc4 NVMe: Add getgeo to block ops
Some programs require HDIO_GETGEO work, which requires we implement
getgeo.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:06:11 -04:00
Dan McLeran
b9afca3efb NVMe: Start-stop nvme_thread during device add-remove.
Done to ensure nvme_thread is not running when there
are no devices to poll.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:04:46 -04:00
Keith Busch
b355084a89 NVMe: Make I/O timeout a module parameter
Increase the default timeout to 30 seconds to match SCSI.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[use byte instead of ushort]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:04:38 -04:00
Keith Busch
33b1e95c90 NVMe: CPU hot plug notification
Registers with hot cpu notification to rebalance, and potentially allocate
additional, io queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:03:42 -04:00
Keith Busch
42f614201e NVMe: per-cpu io queues
The device's IO queues are associated with CPUs, so we can use a per-cpu
variable to map the a qid to a cpu. This provides a convienient way
to optimally assign queues to multiple cpus when the device supports
fewer queues than the host has cpus. The previous implementation may
have assigned these poorly in these situations. This patch addresses
this by sharing queues among cpus that are "close" together and should
have a lower lock contention penalty.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:03:15 -04:00
Linus Torvalds
dd76a786af Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A small collection of fixes that should go in before -rc1.  The pull
  request contains:

   - A two patch fix for a regression with block enabled tagging caused
     by a commit in the initial pull request.  One patch is from Martin
     and ensures that SCSI doesn't truncate 64-bit block flags, the
     other one is from me and prevents us from double using struct
     request queuelist for both completion and busy tags.  This caused
     anything from a boot crash for some, to crashes under load.

   - A blk-mq fix for a potential soft stall when hot unplugging CPUs
     with busy IO.

   - percpu_counter fix is listed in here, that caused a suspend issue
     with virtio-blk due to percpu counters having an inconsistent state
     during CPU removal.  Andrew sent this in separately a few days ago,
     but it's here.  JFYI.

   - A few fixes for block integrity from Martin.

   - A ratelimit fix for loop from Mike Galbraith, to avoid spewing too
     much in error cases"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix regression with block enabled tagging
  scsi: Make sure cmd_flags are 64-bit
  block: Ensure we only enable integrity metadata for reads and writes
  block: Fix integrity verification
  block: Fix for_each_bvec()
  drivers/block/loop.c: ratelimit error messages
  blk-mq: fix potential stall during CPU unplug with IO pending
  percpu_counter: fix bad counter state during suspend
2014-04-10 09:26:55 -07:00
Mike Galbraith
44bd70c347 drivers/block/loop.c: ratelimit error messages
Metric tons of high speed spew is not helpful when things go pear shaped.
systemd lost its mind, forgot how to stop services it insists on being
sole manager of, massive printk() flood ensued, box eventually died.

[16206.684000] loop: Write error at byte offset 11412291584, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 13155434496, length 4096.
[16206.684000] loop: Write error at byte offset 13155438592, length 4096.
[16206.684000] loop: Write error at byte offset 13155442688, length 4096.
[16206.684000] loop: Write error at byte offset 13960736768, length 4096.
[16206.684000] loop: Write error at byte offset 14229172224, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 14766043136, length 4096.
[16206.684000] loop: Write error at byte offset 15034478592, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-08 14:44:35 -06:00
Linus Torvalds
26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Joonsoo Kim
f4659d8e62 zram: support REQ_DISCARD
zram is ram based block device and can be used by backend of filesystem.
When filesystem deletes a file, it normally doesn't do anything on data
block of that file.  It just marks on metadata of that file.  This
behavior has no problem on disk based block device, but has problems on
ram based block device, since we can't free memory used for data block.
To overcome this disadvantage, there is REQ_DISCARD functionality.  If
block device support REQ_DISCARD and filesystem is mounted with discard
option, filesystem sends REQ_DISCARD to block device whenever some data
blocks are discarded.  All we have to do is to handle this request.

This patch implements to flag up QUEUE_FLAG_DISCARD and handle this
REQ_DISCARD request.  With it, we can free memory used by zram if it isn't
used.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:02 -07:00
Sergey Senozhatsky
56b4e8cb85 zram: use scnprintf() in attrs show() methods
sysfs.txt documentation lists the following requirements:

 - The buffer will always be PAGE_SIZE bytes in length. On i386, this
   is 4096.

 - show() methods should return the number of bytes printed into the
   buffer. This is the return value of scnprintf().

 - show() should always use scnprintf().

Use scnprintf() in show() functions.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:02 -07:00
Minchan Kim
60a726e333 zram: propagate error to user
When we initialized zcomp with single, we couldn't change
max_comp_streams without zram reset but current interface doesn't show
any error to user and even it changes max_comp_streams's value without
any effect so it would make user very confusing.

This patch prevents max_comp_streams's change when zcomp was initialized
as single zcomp and emit the error to user(ex, echo).

[akpm@linux-foundation.org: don't return with the lock held, per Sergey]
[fengguang.wu@intel.com: fix coccinelle warnings]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:02 -07:00
Sergey Senozhatsky
fcfa8d95ca zram: return error-valued pointer from zcomp_create()
Instead of returning just NULL, return ERR_PTR from zcomp_create() if
compressing backend creation has failed.  ERR_PTR(-EINVAL) for unsupported
compression algorithm request, ERR_PTR(-ENOMEM) for allocation (zcomp or
compression stream) error.

Perform IS_ERR() check of returned from zcomp_create() value in
disksize_store() and set return code to PTR_ERR().

Change suggested by Jerome Marchand.

[akpm@linux-foundation.org: clean up error recovery flow]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:02 -07:00