222 Commits

Author SHA1 Message Date
Mike Snitzer
cc07d72bf3 dm raid: fix discard limits for raid1
Block core warned that discard_granularity was 0 for dm-raid with
personality of raid1.  Reason is that raid_io_hints() was incorrectly
special-casing raid1 rather than raid0.

Fix raid_io_hints() by removing discard limits settings for
raid1. Check for raid0 instead.

Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Stephan Bärwolf <stephan@matrixstorm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-04 15:02:30 -05:00
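The raid1/raid0 mix-up described in cc07d72bf3 can be illustrated with a small userspace model. This is only a sketch: `limits_sketch`, `level_sketch` and `io_hints_sketch` are hypothetical stand-ins for the kernel's queue limits, RAID level check and raid_io_hints(), and the assumption that the limits derive from the chunk size is taken from the c8156fc77d entry further down. raid10 is deliberately left out of the model.

```c
/* Hypothetical, simplified stand-in for the kernel's queue limits. */
struct limits_sketch {
    unsigned int discard_granularity;  /* bytes */
    unsigned int max_discard_sectors;  /* 512-byte sectors */
};

/* Hypothetical level tags; only the two levels the commit discusses. */
enum level_sketch { SKETCH_RAID0 = 0, SKETCH_RAID1 = 1 };

/*
 * Constrain discards to one chunk for raid0 only. raid1 has no chunk
 * size, so deriving limits from it would yield a granularity of 0,
 * which is the condition block core warned about.
 */
static void io_hints_sketch(enum level_sketch level,
                            unsigned int chunk_sectors,
                            struct limits_sketch *limits)
{
    if (level == SKETCH_RAID0) {
        limits->discard_granularity = chunk_sectors << 9; /* sectors to bytes */
        limits->max_discard_sectors = chunk_sectors;
    }
    /* Any other level keeps the limits it already had. */
}
```

In this model a raid1 set keeps whatever limits were stacked from its component devices, while a raid0 set with 128-sector chunks ends up with a 65536-byte granularity.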
Linus Torvalds
ac7ac4618c for-5.11/block-2020-12-14
Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Another series of killing more code than what is being added, again
  thanks to Christoph's relentless cleanups and tech debt tackling.

  This contains:

   - blk-iocost improvements (Baolin Wang)

   - part0 iostat fix (Jeffle Xu)

   - Disable iopoll for split bios (Jeffle Xu)

   - block tracepoint cleanups (Christoph Hellwig)

   - Merging of struct block_device and hd_struct (Christoph Hellwig)

   - Rework/cleanup of how block device sizes are updated (Christoph
     Hellwig)

   - Simplification of gendisk lookup and removal of block device
     aliasing (Christoph Hellwig)

   - Block device ioctl cleanups (Christoph Hellwig)

   - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)

   - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)

   - sbitmap improvements (Pavel Begunkov)

   - Hybrid polling fix (Pavel Begunkov)

   - bvec iteration improvements (Pavel Begunkov)

   - Zone revalidation fixes (Damien Le Moal)

   - blk-throttle limit fix (Yu Kuai)

   - Various little fixes"

* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
  blk-mq: fix msec comment from micro to milli seconds
  blk-mq: update arg in comment of blk_mq_map_queue
  blk-mq: add helper allocating tagset->tags
  Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  block: disable iopoll for split bio
  block: Improve blk_revalidate_disk_zones() checks
  sbitmap: simplify wrap check
  sbitmap: replace CAS with atomic and
  sbitmap: remove swap_lock
  sbitmap: optimise sbitmap_deferred_clear()
  blk-mq: skip hybrid polling if iopoll doesn't spin
  blk-iocost: Factor out the base vrate change into a separate function
  blk-iocost: Factor out the active iocgs' state check into a separate function
  blk-iocost: Move the usage ratio calculation to the correct place
  blk-iocost: Remove unnecessary advance declaration
  blk-iocost: Fix some typos in comments
  blktrace: fix up a kerneldoc comment
  block: remove the request_queue to argument request based tracepoints
  ...
2020-12-16 12:57:51 -08:00
Mike Snitzer
0941e3b065 Revert "dm raid: fix discard limits for raid1 and raid10"
This reverts commit e0910c8e4f.

Reverting 6ffeb1c3f8 ("md: change mddev 'chunk_sectors' from int to
unsigned") exposes dm-raid.c compiler warnings detailed in that commit's
header. Clearly this more conservative fix, of simply reverting
e0910c8e4f, would've been more prudent given how late we were in the
v5.10 release. Lessons have been learned.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-14 12:12:08 -05:00
Song Liu
e2782f560c Revert "dm raid: remove unnecessary discard limits for raid10"
This reverts commit f0e90b6c66.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:43:48 -08:00
Christoph Hellwig
dc2985a8d5 dm-raid: use set_capacity_and_notify
Use set_capacity_and_notify to set the size of both the disk and block
device.  This also gets the uevent notifications for the resize for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16 08:34:15 -07:00
Mike Snitzer
f0e90b6c66 dm raid: remove unnecessary discard limits for raid10
Commit bcc90d2804 ("md/raid10: improve raid10 discard request")
removes raid10's inability to properly handle large discards.  So
eliminate associated constraint from dm-raid's raid10 support.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29 16:33:10 -04:00
Mike Snitzer
e0910c8e4f dm raid: fix discard limits for raid1 and raid10
Block core warned that discard_granularity was 0 for dm-raid with
personality of raid1.  Reason is that raid_io_hints() was incorrectly
special-casing raid1 rather than raid0.

But since commit 29efc390b9 ("md/md0: optimize raid0 discard
handling") even raid0 properly handles large discards.

Fix raid_io_hints() by removing discard limits settings for raid1.
Also, fix limits for raid10 by properly stacking underlying limits as
done in blk_stack_limits().

Depends-on: 29efc390b9 ("md/md0: optimize raid0 discard handling")
Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29 16:33:09 -04:00
Christoph Hellwig
659e56ba86 block: add a new revalidate_disk_size helper
revalidate_disk is a relatively awkward helper for driver use, as it first
calls an optional driver method and then updates the block device size,
while most callers either don't need the method call at all, or want to
keep state between the caller and the called method.

Add a revalidate_disk_size helper that just performs the update of the
block device size from the gendisk one, and switch all drivers that do
not implement ->revalidate_disk to use the new helper instead of
revalidate_disk().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 08:00:07 -06:00
Linus Torvalds
2f12d44085 Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM multipath locking fixes around m->flags tests and improvements to
   bio-based code so that it follows patterns established by
   request-based code.

 - Request-based DM core improvement to eliminate unnecessary call to
   blk_mq_queue_stopped().

 - Add "panic_on_corruption" error handling mode to DM verity target.

 - DM bufio fix to perform buffer cleanup from a workqueue rather
   than wait for IO in reclaim context from shrinker.

 - DM crypt improvement to optionally avoid async processing via
   workqueues for reads and/or writes -- via "no_read_workqueue" and
   "no_write_workqueue" features. This more direct IO processing
   improves latency and throughput with faster storage. Avoiding
   workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
   requirement for adding zoned block device support to DM crypt.

 - Add zoned block device support to DM crypt. Makes use of
   DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
   (DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
   encryption to complete. This allows write ordering to be preserved,
   which is needed for zoned block devices.

 - Fix DM ebs target's check for REQ_OP_FLUSH.

 - Fix DM core's report zones support to not report more zones than were
   requested.

 - A few small compiler warning fixes.

 - DM dust improvements to return output directly to the user rather
   than require they scrape the system log for output.

* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: don't call report zones for more than the user requested
  dm ebs: Fix incorrect checking for REQ_OP_FLUSH
  dm init: Set file local variable static
  dm ioctl: Fix compilation warning
  dm raid: Remove empty if statement
  dm verity: Fix compilation warning
  dm crypt: Enable zoned block device support
  dm crypt: add flags to optionally bypass kcryptd workqueues
  dm bufio: do buffer cleanup from a workqueue
  dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
  dm dust: add interface to list all badblocks
  dm dust: report some message results directly back to user
  dm verity: add "panic_on_corruption" error handling mode
  dm mpath: use double checked locking in fast path
  dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
  dm mpath: rework __map_bio()
  dm mpath: factor out multipath_queue_bio
  dm mpath: push locking down to must_push_back_rq()
  dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
  dm mpath: changes from initial m->flags locking audit
2020-08-07 13:08:09 -07:00
Damien Le Moal
04dc5330e5 dm raid: Remove empty if statement
In super_init_validation(), remove a body-less if statement testing only
variables to avoid a compilation warning when compiling with W=1.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-08-04 15:48:48 -04:00
Christoph Hellwig
21cf866145 writeback: remove bdi->congested_fn
Except for pktdvd, the only places setting congested bits are file
systems that allocate their own backing_dev_info structures.  And
pktdvd is a deprecated driver that isn't useful in stack setup
either.  So remove the dead congested_fn stacking infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
[axboe: fixup unused variables in bcache/request.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08 17:20:46 -06:00
Gustavo A. R. Silva
b18ae8dd9d dm: replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get rid of those sorts of issues completely.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20 17:09:44 -04:00
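The flexible-array declaration from b18ae8dd9d can be exercised in plain userspace C. The following is a minimal sketch, not code from the patch: `foo_alloc_sketch` is a hypothetical helper, `struct boo` is given an arbitrary member so the example compiles, and malloc() stands in for the kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

struct boo {
    int value;
};

struct foo {
    int stuff;
    struct boo array[];  /* flexible array member: must be the last member */
};

/*
 * Allocate a struct foo with room for `count` trailing elements.
 * sizeof(*f) covers only the fixed part, so the array storage is
 * added explicitly. The caller releases the result with free().
 */
static struct foo *foo_alloc_sketch(size_t count)
{
    struct foo *f = malloc(sizeof(*f) + count * sizeof(f->array[0]));

    if (f)
        f->stuff = (int)count;
    return f;
}
```

Moving `array` ahead of `stuff` in this struct is rejected by the compiler, which is the safety property the commit message relies on.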
Heinz Mauelshagen
43f3952a51 dm raid: table line rebuild status fixes
raid_status() wasn't emitting rebuild flags on the table line properly
because the rdev number was not yet set properly; index raid component
devices array directly to solve.

Also fix the wrong argument count on the emitted table line, caused by
one too many rebuild/write_mostly arguments, and take any
journal_(dev|mode) pairs into account.

Link: https://bugzilla.redhat.com/1782045
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-07 11:43:37 -05:00
Nathan Chancellor
35ad035b83 dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
When building with Clang + -Wtautological-constant-compare:

 drivers/md/dm-raid.c:619:8: warning: converting the result of '<<' to a
 boolean always evaluates to true [-Wtautological-constant-compare]
                 r = !RAID10_OFFSET;
                      ^
 drivers/md/dm-raid.c:517:28: note: expanded from macro 'RAID10_OFFSET'
 #define RAID10_OFFSET                   (1 << 16) /* stripes with data
 copies area adjacent on devices */
                                           ^
 1 warning generated.

Negating a non-zero number will always make it zero, which is the
default value of r in this function so this statement is unnecessary;
remove it so that clang no longer warns.

Link: https://github.com/ClangBuiltLinux/linux/issues/753
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-07 11:59:38 -05:00
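The reasoning in 35ad035b83 (negating a non-zero constant always yields zero) can be checked with a few lines of C. This is a sketch only: `negate_sketch` is a hypothetical helper, and the value is passed through a parameter so that the example itself does not trigger the compiler warning quoted above. The RAID10_OFFSET definition is copied from that warning output.

```c
#define RAID10_OFFSET (1 << 16)

/*
 * Mirrors the shape of the removed statement: r starts at its default
 * of 0 and is then assigned the logical negation of the flag.
 */
static int negate_sketch(int flag)
{
    int r = 0;

    r = !flag;  /* any non-zero flag leaves r at 0 */
    return r;
}
```

Calling `negate_sketch(RAID10_OFFSET)` returns 0, the same value r already had, so dropping the assignment does not change behaviour.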
Heinz Mauelshagen
53be73a5d7 dm raid: streamline rs_get_progress() and its raid_status() caller side
Pass already deciphered state into rs_get_progress, simplify recovery offset
definition and combine two st_resync, st_reshape conditionals into one as is
already the case with st_check and st_repair.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-05 14:04:37 -05:00
Heinz Mauelshagen
f9f3ee9130 dm raid: simplify rs_setup_recovery call chain
rs_setup_recovery() sets the starting recovery offset.

Drop superfluous rs_setup_recovery() and replace with __rs_setup_recovery().

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-05 14:02:52 -05:00
Heinz Mauelshagen
99273d9e6e dm raid: to ensure resynchronization, perform raid set grow in preresume
This fixes a flaw causing raid set extensions not to be synchronized
in case the MD bitmap resize required additional pages to be allocated.

Also share resize code in the raid constructor between
new size changes and those occurring during recovery.

Bump the target version to define the change and document
it in Documentation/admin-guide/device-mapper/dm-raid.rst.

Reported-by: Steve D <steved424@gmail.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-05 14:02:26 -05:00
Heinz Mauelshagen
22c992e1a8 dm raid: change rs_set_dev_and_array_sectors API and callers
Add a size argument to rs_set_dev_and_array_sectors as a prerequisite
to fixing grown device resynchronization not occurring when new MD
bitmap pages have to be allocated as a result of the extension in
a follow-up patch.

Also avoid code duplication by using rs_set_rdev_sectors
in the aforementioned function.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-05 14:02:00 -05:00
Ming Lei
c8156fc77d dm raid: fix updating of max_discard_sectors limit
The unit of 'chunk_size' is bytes, not sectors, so fix it by setting
the queue_limits' max_discard_sectors to rs->md.chunk_sectors.  Also,
rename chunk_size to chunk_size_bytes.

Without this fix, an overly large max_discard_sectors is applied to the
request queue of dm-raid, and the raid code ends up having to split the
bio again.

This re-split done by raid causes the following nested clone_endio:

1) one big bio 'A' is submitted to dm queue, and served as the original
bio

2) one new bio 'B' is cloned from the original bio 'A', and .map()
is run on this bio of 'B', and B's original bio points to 'A'

3) raid code sees that 'B' is too big, splits 'B', and re-submits
the remaining part of 'B' to the dm-raid queue via generic_make_request().

4) now dm will handle 'B' as new original bio, then allocate a new
clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
points to 'B'.

5) suppose now 'C' is completed by raid directly, then the following
clone_endio() is called recursively:

	clone_endio(C)
		->clone_endio(B)		#B is original bio of 'C'
			->bio_endio(A)

'A' can be big enough to cause hundreds of nested clone_endio() calls,
which can easily corrupt the stack.

Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-11 16:18:23 -04:00
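The "hundreds of nested clone_endio()" figure in c8156fc77d follows from the unit mix-up and can be estimated with simple arithmetic. The sketch below is a model, not kernel code: `resplits_sketch` is a hypothetical helper, and it assumes each re-split by the raid code adds exactly one level of nesting, as steps 3) to 5) of the commit message suggest.

```c
#include <assert.h>

/*
 * Number of times the raid code has to re-split a clone of
 * `clone_sectors` sectors when it can process at most `chunk_sectors`
 * sectors at a time.
 */
static unsigned int resplits_sketch(unsigned int clone_sectors,
                                    unsigned int chunk_sectors)
{
    assert(chunk_sectors > 0);
    if (clone_sectors <= chunk_sectors)
        return 0;
    /* Pieces needed, rounded up, minus the piece that needs no split. */
    return (clone_sectors + chunk_sectors - 1) / chunk_sectors - 1;
}
```

With 128-sector chunks, the chunk size in bytes is 65536. Using that byte count as a sector limit lets clones of 65536 sectors through, so `resplits_sketch(65536, 128)` gives 511 re-splits, whereas the corrected limit of 128 sectors gives `resplits_sketch(128, 128)`, which is 0.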
Wenwen Wang
dc1a3e8e0c dm raid: add missing cleanup in raid_ctr()
If rs_prepare_reshape() fails, no cleanup is executed, leading to
leak of the raid_set structure allocated at the beginning of
raid_ctr(). To fix this issue, go to the label 'bad' if the error
occurs.

Fixes: 11e4723206 ("dm raid: stop keeping raid set frozen altogether")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-21 11:47:05 -04:00
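The leak fixed in dc1a3e8e0c is an instance of the usual goto-based cleanup pattern. Below is a minimal userspace sketch of that pattern. All names ending in `_sketch` are hypothetical, malloc-family calls stand in for kernel allocation, and the numeric error codes merely imitate -EINVAL and -ENOMEM.

```c
#include <stdlib.h>

struct raid_set_sketch {
    int reshaped;
};

static int live_sets_sketch;  /* allocations not yet freed */

/* Stand-in for a preparation step that may fail. */
static int prepare_reshape_sketch(struct raid_set_sketch *rs, int fail)
{
    if (fail)
        return -22;  /* imitates -EINVAL */
    rs->reshaped = 1;
    return 0;
}

/* Constructor model: on success, ownership passes to *out. */
static int ctr_sketch(int fail_reshape, struct raid_set_sketch **out)
{
    int r;
    struct raid_set_sketch *rs = calloc(1, sizeof(*rs));

    if (!rs)
        return -12;  /* imitates -ENOMEM */
    live_sets_sketch++;

    r = prepare_reshape_sketch(rs, fail_reshape);
    if (r)
        goto bad;  /* returning r directly here would leak rs */

    *out = rs;
    return 0;

bad:
    free(rs);
    live_sets_sketch--;
    return r;
}
```

A failing call leaves `live_sets_sketch` at 0, which is the property the fix restores for the real constructor.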
Mauro Carvalho Chehab
6cf2a73cb2 docs: device-mapper: move it to the admin-guide
The DM support describes lots of aspects related to mapped
disk partitions from the userspace PoV.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 11:03:01 -03:00
Mauro Carvalho Chehab
f0ba43774c docs: convert docs to ReST and rename to *.rst
The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-14 14:21:04 -06:00
Mike Snitzer
61697a6abd dm: eliminate 'split_discard_bios' flag from DM target interface
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits.  A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-20 23:24:55 -05:00
Heinz Mauelshagen
74694bcbdf dm raid: fix false -EBUSY when handling check/repair message
Sending a check/repair message occasionally leads to -EBUSY instead of
properly identifying an active resync.  This occurs because
raid_message() is testing recovery bits in a racy way.

Fix by calling decipher_sync_action() from raid_message() to properly
identify the idle state of the RAID device.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18 13:48:35 -05:00
Heinz Mauelshagen
d857ad75ed dm raid: avoid bitmap with raid4/5/6 journal device
With raid4/5/6, journal device and write intent bitmap are mutually exclusive.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-18 15:13:48 -04:00
Geert Uytterhoeven
0328ba9040 dm raid: remove bogus const from decipher_sync_action() return type
With gcc-4.1.2:

    drivers/md/dm-raid.c:3357: warning: type qualifiers ignored on function return type

Remove the "const" keyword to fix this.

Fixes: 36a240a706 ("dm raid: fix RAID leg rebuild errors")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-17 22:46:50 -04:00
Heinz Mauelshagen
5380c05b68 dm raid: bump target version, update comments and documentation
Bump target version to reflect the documented fixes are available.
Also fix some code comments (typos and clarity).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 17:07:58 -04:00
Heinz Mauelshagen
36a240a706 dm raid: fix RAID leg rebuild errors
On fast devices such as NVMe, a flaw in rs_get_progress() results in
false target status output when userspace lvm2 requests leg rebuilds
(symptom of the failure is device health chars 'aaaaaaaa' instead of
expected 'aAaAAAAA' causing lvm2 to fail).

The correct sync action state definitions already exist in
decipher_sync_action() so fix rs_get_progress() to use it.

Change decipher_sync_action() to return an enum rather than a string for
the sync states and call it from rs_get_progress().  Introduce
sync_str() to translate from enum to the string that is needed by
raid_status().

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 17:07:56 -04:00
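The enum-plus-translation split described in 36a240a706 might look roughly like the following. Treat it as a sketch under assumptions: the state names st_reshape, st_resync, st_check and st_repair appear in the 53be73a5d7 entry above, while the remaining names, their order, the "undef" fallback and the `_sketch` identifiers are illustrative guesses rather than the driver's definitions.

```c
#include <stddef.h>

/* Sync states as an enum, so callers can compare instead of strcmp(). */
enum sync_state_sketch {
    st_frozen,
    st_reshape,
    st_resync,
    st_check,
    st_repair,
    st_recover,
    st_idle
};

/* Translate a state to the string a status line would print. */
static const char *sync_str_sketch(enum sync_state_sketch state)
{
    /* Must stay in the same order as the enum above. */
    static const char *const names[] = {
        "frozen", "reshape", "resync", "check", "repair", "recover", "idle"
    };
    size_t n = sizeof(names) / sizeof(names[0]);

    return (size_t)state < n ? names[state] : "undef";
}
```

Keeping the state as an enum lets a progress calculation and a status printer share one source of truth, with the string produced only at the point of output.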
Heinz Mauelshagen
c44a5ee803 dm raid: fix rebuild of specific devices by updating superblock
Update superblock when particular devices are requested via rebuild
(e.g. lvconvert --replace ...) to avoid spurious failure with the "New
device injected into existing raid set without 'delta_disks' or
'rebuild' parameter specified" error message.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 17:07:56 -04:00
Heinz Mauelshagen
644e2537fd dm raid: fix stripe adding reshape deadlock
When initiating a stripe adding reshape, a deadlock between
md_stop_writes() waiting for the sync thread to stop and the running
sync thread waiting for inactive stripes occurs (this frequently happens
on single-core but rarely on multi-core systems).

Fix this deadlock by setting MD_RECOVERY_WAIT to have the main MD
resynchronization thread worker (md_do_sync()) bail out when initiating
the reshape via constructor arguments.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 17:07:50 -04:00
Heinz Mauelshagen
38b0bd0cda dm raid: fix reshape race on small devices
Loading a new mapping table, the dm-raid target's constructor
retrieves the volatile reshaping state from the raid superblocks.

When the new table is activated in a following resume, the actual
reshape position is retrieved.  The reshape driven by the previous
mapping can already have finished on small and/or fast devices thus
updating raid superblocks about the new raid layout.

This causes the actual array state (e.g. stripe size reshape finished)
to be inconsistent with the one in the new mapping, causing hangs with
left behind devices.

This race does not occur with usual raid device sizes but with small
ones (e.g. those created by the lvm2 test suite).

Fix by no longer transferring stale/inconsistent raid_set state during
preresume.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 14:11:00 -04:00
Linus Torvalds
08b5fa8199 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - a new driver for Rohm BU21029 touch controller

 - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free

 - updates to Atmel, eeti, pxrc and iforce drivers

 - assorted driver cleanups and fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
  MAINTAINERS: Add PhoenixRC Flight Controller Adapter
  Input: do not use WARN() in input_alloc_absinfo()
  Input: mark expected switch fall-throughs
  Input: raydium_i2c_ts - use true and false for boolean values
  Input: evdev - switch to bitmap API
  Input: gpio-keys - switch to bitmap_zalloc()
  Input: elan_i2c_smbus - cast sizeof to int for comparison
  bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
  md: Avoid namespace collision with bitmap API
  dm: Avoid namespace collision with bitmap API
  Input: pm8941-pwrkey - add resin entry
  Input: pm8941-pwrkey - abstract register offsets and event code
  Input: iforce - reorganize joystick configuration lists
  Input: atmel_mxt_ts - move completion to after config crc is updated
  Input: atmel_mxt_ts - don't report zero pressure from T9
  Input: atmel_mxt_ts - zero terminate config firmware file
  Input: atmel_mxt_ts - refactor config update code to add context struct
  Input: atmel_mxt_ts - config CRC may start at T71
  Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
  Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
  ...
2018-08-18 16:48:07 -07:00
Andy Shevchenko
e64e4018d5 md: Avoid namespace collision with bitmap API
The bitmap API (include/linux/bitmap.h) uses a 'bitmap' prefix for its
methods.

The MD bitmap API, on the other hand, is a special case.
Add an 'md' prefix to it to avoid a namespace collision.

No functional changes intended.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-08-01 15:49:39 -07:00
Arnd Bergmann
f2ccaa5904 dm raid: don't use 'const' in function return
A newly introduced function has 'const int' as the return type,
but as "make W=1" reports, that has no meaning:

drivers/md/dm-raid.c:510:18: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]

This changes the return type to plain 'int'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 33e53f0685 ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 552aa679f2 ("dm raid: use rs_is_raid*()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-22 14:51:12 -04:00
Kees Cook
acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
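The point of struct_size() in acafe7e302 is that the size calculation is checked for overflow instead of silently wrapping. A userspace approximation is sketched below. `struct_size_sketch` is a hypothetical function, not the kernel macro. It takes the sizes as plain arguments and saturates to SIZE_MAX on overflow, so that a subsequent allocation fails rather than returning a buffer that is too small.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
    int stuff;
    void *entry[];
};

/*
 * Compute base + elem * count, returning SIZE_MAX if the
 * multiplication or the addition would overflow size_t.
 */
static size_t struct_size_sketch(size_t base, size_t elem, size_t count)
{
    if (elem != 0 && count > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * count;
}
```

For example, `struct_size_sketch(sizeof(struct foo), sizeof(void *), count)` matches the open-coded expression from the commit message for small counts and yields SIZE_MAX when count is absurdly large.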
Heinz Mauelshagen
13bc62d4a6 dm raid: fix parse_raid_params() variable range issue
parse_raid_params() compares variable "int value" with INT_MAX.

E.g. related Coverity report excerpt:
   CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue]
1433                        if (value > INT_MAX) {

Fix by changing checks to avoid INT_MAX.

While at it, avoid unnecessary checks against constants
and add a check for sane recovery speed min/max.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:37 -04:00
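The Coverity finding in 13bc62d4a6 boils down to this: once a value is stored in an int, comparing it with INT_MAX can never be true. One way to keep such a range check meaningful is to test the value while it is still held in a wider type. The sketch below assumes exactly that. Both helpers are hypothetical, and the min/max rule (a max of 0 means "unset") is an assumption about what a sane recovery speed range means, not the driver's code.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Range-check a parsed number before narrowing it to int. After the
 * narrowing, a test such as "value > INT_MAX" would always be false.
 */
static bool fits_positive_int_sketch(long long parsed)
{
    return parsed > 0 && parsed <= INT_MAX;
}

/* Accept the rates when no max is set or when min does not exceed max. */
static bool recovery_rates_sane_sketch(int min_rate, int max_rate)
{
    return max_rate == 0 || min_rate <= max_rate;
}
```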
Heinz Mauelshagen
880bcce0dc dm raid: fix nosync status
Fix a race for "nosync" activations providing "aa.." device health
characters and "0/N" sync ratio rather than "AA..." and "N/N".  Occurs
when status for the raid set is retrieved during resume before the MD
sync thread starts and clears the MD_RECOVERY_NEEDED flag.

Cc: stable@vger.kernel.org # 4.16+
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:19 -04:00
Mike Snitzer
1eb5fa849f dm: allow targets to return output from messages they are sent
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:10 -04:00
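Editor's note on the message return convention above: a handler that writes text into the result buffer reports that by returning 1, while a handler that succeeds without output returns 0. The sketch below mimics that convention in userspace with snprintf(); it is not a real DM target, and `demo_message()`, its command names and its counter values are all made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical message handler following the convention described above:
 *   1       -> text was written to 'result' and should be returned
 *   0       -> success, nothing to return
 *   -EINVAL -> unknown command
 */
static int demo_message(const char *cmd, char *result, size_t maxlen)
{
	if (strcmp(cmd, "stats") == 0) {
		/* Illustrative fixed counters; a real target would report its own. */
		snprintf(result, maxlen, "reads=%u writes=%u", 3u, 5u);
		return 1;
	}
	if (strcmp(cmd, "reset") == 0)
		return 0;
	return -EINVAL;
}
```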
Jonathan Brassow
da1e148803 dm raid: fix incorrect sync_ratio when degraded
Upstream commit 4102d9de6d ("dm raid: fix rs_get_progress()
synchronization state/ratio") in combination with commit 7c29744ecc
("dm raid: simplify rs_get_progress()") introduced a regression by
incorrectly reporting a sync_ratio of 0 for degraded raid sets.  This
caused lvm2 to fail to repair raid legs automatically.

Fix by identifying the degraded state by checking the MD_RECOVERY_INTR
flag and returning mddev->recovery_cp in case it is set.

MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR
MD_RECOVERY_NEEDED ] when a RAID member fails.  It then shuts down any
sync thread that is running and leaves us with all MD_RECOVERY_* flags
cleared.  The bug occurs if a status is requested in the short time it
takes to shut down any sync thread and clear the flags, because we were
keying in on the MD_RECOVERY_NEEDED - understanding it to be the initial
phase of a “recover” sync thread.  However, this is an incorrect
interpretation if MD_RECOVERY_INTR is also set.

This also explains why the bug only happened when automatic repair was
enabled and not with a normal ‘manual’ method.  It is impossible to react
quickly enough to hit the problematic window without it being automated.

Fix passes automatic repair tests.

Fixes: 7c29744ecc ("dm raid: simplify rs_get_progress()")
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-06 20:23:57 -05:00
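Editor's note on the flag interpretation above: the fix depends on checking the interrupt flag before treating the "needed" flag as the start of a recover run. The following is a simplified userspace sketch of that ordering only; it is not rs_get_progress(), the flag values are illustrative stand-ins rather than the kernel's bit numbers, and `sync_progress()` is a hypothetical name.

```c
/* Illustrative stand-ins for the MD_RECOVERY_* bits named above. */
enum {
	RECOVERY_NEEDED  = 1 << 0,
	RECOVERY_RECOVER = 1 << 1,
	RECOVERY_INTR    = 1 << 2,
};

/*
 * Simplified progress decision:
 *  - INTR set: a member failed and the sync thread is being shut down,
 *    so report the recorded checkpoint instead of 0.
 *  - NEEDED set without INTR: a sync run is about to start, report 0.
 *  - otherwise: report the recorded checkpoint.
 */
static unsigned long long sync_progress(unsigned int recovery,
					unsigned long long recovery_cp)
{
	if (recovery & RECOVERY_INTR)
		return recovery_cp;
	if (recovery & RECOVERY_NEEDED)
		return 0;
	return recovery_cp;
}
```

Testing the interrupt flag first is the point of the sketch: with the two checks swapped, a status request during the shutdown window would report 0 again.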
Linus Torvalds
0be600a5ad Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well as leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on an NVMe data device.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to work around
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md->queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of <highest mapped sector> when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
2018-01-31 11:05:47 -08:00
Wei Yongjun
67ac901c55 dm raid: make raid_sets symbol static
Fixes the following sparse warning:

drivers/md/dm-raid.c:33:1: warning:
 symbol 'raid_sets' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:16:05 -05:00
Heinz Mauelshagen
552aa679f2 dm raid: use rs_is_raid*()
Cleanup, no functional change.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 12:15:47 -05:00
Heinz Mauelshagen
7c29744ecc dm raid: simplify rs_get_progress()
No need to calculate the reshaping progress because
mddev->curr_resync_completed holds it.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 11:59:21 -05:00
Heinz Mauelshagen
dc15b943d4 dm raid: ensure 'a' chars during reshape
During reshape, 'A' chars were reported in status rather than 'a'.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 11:57:36 -05:00
Heinz Mauelshagen
11e4723206 dm raid: stop keeping raid set frozen altogether
In order to avoid partially redoing synchronization/recovery/reshape,
the raid set was kept frozen until all flags passed in on the table
line had been cleared.  The related table reload sequence had to be
followed precisely, or reshaping could lead to data corruption caused
by the active mapping carrying on with a reshape when the inactive
mapping had already retrieved a stale reshape position.

Harden by retrieving the actual resync/recovery/reshape position
during resume whilst the active table is suspended, which avoids
keeping the raid set frozen altogether.  This prevents superfluous
redoing of an already resynchronized or recovered segment and,
most importantly, the potential for redoing an already reshaped
segment, which would cause data corruption.

Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 11:52:02 -05:00
Heinz Mauelshagen
53bf5384f9 dm raid: validate current raid sets redundancy
Verifying the current raid set's redundancy based on retrieved
superblock content has to use the superblock's raid level (e.g. raid0),
not the constructor requested one (e.g. raid10).

Using the requested raid level of raid10 led to a "divide error"
on raid0, which defines the number of data copies (the divisor) as zero.

Also check for bogus data copies.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 11:50:52 -05:00
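Editor's note on the divide error above: the failure mode is a division whose divisor (the number of data copies) is zero for raid0. A guard that rejects bogus copy counts before dividing avoids it. The sketch below is a userspace illustration of such a guard, not the dm-raid code; `raid10_mirror_groups()` and the exact bounds it enforces are assumptions.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: compute how many mirror groups a raid10-style
 * layout has. Rejects copy counts that make no sense (fewer than 2, or
 * more than the number of disks) instead of dividing by them.
 */
static bool raid10_mirror_groups(unsigned int raid_disks, unsigned int copies,
				 unsigned int *groups)
{
	if (copies < 2 || copies > raid_disks)
		return false;
	*groups = raid_disks / copies;
	return true;
}
```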
Song Liu
d5d885fd51 md: introduce new personality function start()
In do_md_run(), md threads should not wake up until the array is fully
initialized in md_run(). However, in raid5_run(), raid5-cache may wake
up mddev->thread to flush stripes that need to be written back. This
design does not cause serious breakage right now, but it could lead to
bad bugs in the future.

This patch tries to resolve this problem by splitting the startup work
into two personality functions, run() and start(). Tasks that do not
require the md threads should go into run(), while tasks that require
the md threads go into start().

r5l_load_log() is moved to raid5_start(), so it is not called until
the md threads are started in do_md_run().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-12-11 08:52:34 -08:00
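Editor's note on the run()/start() split above: the ordering can be pictured as a two-phase bring-up in which threads are only made available between the two callbacks. The sketch below is a userspace model of that ordering, not the md code; `struct array_state`, `struct personality`, `demo_run()`, `demo_start()` and `bring_up()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array being brought up. */
struct array_state {
	bool initialized;
	bool threads_running;
	bool log_loaded;
};

/* Two-phase personality: start() is optional and may be NULL. */
struct personality {
	int (*run)(struct array_state *a);
	int (*start)(struct array_state *a);
};

/* Phase 1: work that must not depend on any worker thread. */
static int demo_run(struct array_state *a)
{
	a->initialized = true;
	return 0;
}

/* Phase 2: work that needs the threads, such as loading a log. */
static int demo_start(struct array_state *a)
{
	if (!a->threads_running)
		return -1;
	a->log_loaded = true;
	return 0;
}

/* Bring-up order: run(), then enable threads, then start() if present. */
static int bring_up(const struct personality *p, struct array_state *a)
{
	int err = p->run(a);

	if (err)
		return err;
	a->threads_running = true;
	return p->start ? p->start(a) : 0;
}
```

Keeping start() optional lets personalities with no thread-dependent startup work leave it unset.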
Mike Snitzer
b84cf26924 dm raid: bump target version to reflect numerous fixes
Also update Documentation accordingly.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08 10:59:58 -05:00
Heinz Mauelshagen
78a75d10ef dm raid: small cleanup and remove unused "struct raid_set" member
Move raid_resume()'s setting of 'rw' and 'in_sync' to just prior to
mddev_resume().

Also, remove unused 'bitmap_loaded' member from "struct raid_set".

No functional changes.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08 10:59:58 -05:00
Heinz Mauelshagen
4102d9de6d dm raid: fix rs_get_progress() synchronization state/ratio
Fix various sync state issues causing racy/bogus sync ratio,
sync_action and health chars in dm_status() info output.

Sync ratio could be N/N (i.e. 100%) shortly after raid set
creation, i.e. creating a new RaidLV or upconverting a linear LV to
raid1 thus:
  "0 2097152 raid raid1 2 Aa 2097152/2097152 recover 0 0 -"
instead of:
  "0 2097152 raid raid1 2 Aa 0/2097152 idle 0 0 -"

Sync action could be non-idle, when the MD thread was done with io.

Health chars could be 'A' when they should be 'a' for a short time
before a resynchronization started.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08 10:59:58 -05:00
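Editor's note on the sync ratio field above: the status lines quoted in the commit carry the ratio as a "done/total" pair, which a consumer has to parse before deciding whether the set is in sync. The sketch below shows one way to parse that single field in userspace; it is an assumption about how a tool might read it, not code from lvm2 or the kernel, and `parse_sync_ratio()` and `fully_synced()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a "done/total" sync ratio field such as "0/2097152".
 * Rejects trailing characters and a zero total.
 */
static bool parse_sync_ratio(const char *field, unsigned long long *done,
			     unsigned long long *total)
{
	char trailing;

	return sscanf(field, "%llu/%llu%c", done, total, &trailing) == 2 &&
	       *total != 0;
}

/* True only when the field parses and done has reached total. */
static bool fully_synced(const char *field)
{
	unsigned long long done, total;

	return parse_sync_ratio(field, &done, &total) && done == total;
}
```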