linux/drivers/md
Shaohua Li 0576b1c618 raid5: log reclaim support
This is the reclaim support for raid5 log. A stripe write will have
following steps:

1. reconstruct the stripe, read data/calculate parity. ops_run_io
prepares to write data/parity to raid disks
2. hijack ops_run_io. stripe data/parity is appending to log disk
3. flush log disk cache
4. ops_run_io run again and do normal operation. stripe data/parity is
written in raid array disks. raid core can return io to upper layer.
5. flush cache of all raid array disks
6. update super block
7. log disk space used by the stripe can be reused

In practice, several stripes consist of an io_unit and we will batch
several io_unit in different steps, but the whole process doesn't
change.

It's possible io return just after data/parity hit log disk, but then
read IO will need read from log disk. For simplicity, IO return happens
at step 4, where read IO can directly read from raid disks.

Currently reclaim run if there is specific reclaimable space (1/4 disk
size or 10G) or we are out of space. Reclaim is just to free log disk
spaces, it doesn't impact data consistency. The size based force reclaim
is to make sure log isn't too big, so recovery doesn't scan log too
much.

Recovery make sure raid disks and log disk have the same data of a
stripe. If crash happens before 4, recovery might/might not recovery
stripe's data/parity depending on if data/parity and its checksum
matches. In either case, this doesn't change the syntax of an IO write.
After step 3, stripe is guaranteed recoverable, because stripe's
data/parity is persistent in log disk. In some cases, log disk content
and raid disks content of a stripe are the same, but recovery will still
copy log disk content to raid disks, this doesn't impact data
consistency. space reuse happens after superblock update and cache
flush.

There is one situation we want to avoid. A broken meta in the middle of
a log causes recovery can't find meta at the head of log. If operations
require meta at the head persistent in log, we must make sure meta
before it persistent in log too. The case is stripe data/parity is in
log and we start write stripe to raid disks (before step 4). stripe
data/parity must be persistent in log before we do the write to raid
disks. The solution is we restrictly maintain io_unit list order. In
this case, we only write stripes of an io_unit to raid disks till the
io_unit is the first one whose data/parity is in log.

The io_unit list order is important for other cases too. For example,
some io_unit are reclaimable and others not. They can be mixed in the
list, we shouldn't reuse space of an unreclaimable io_unit.

Includes fixes to problems which were...
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-24 17:16:19 +11:00
..
bcache bcache: remove driver private bio splitting code 2015-08-13 12:31:40 -06:00
persistent-data dm: remove unlikely() before IS_ERR() 2015-08-12 11:32:21 -04:00
bitmap.c md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-cache-block-types.h dm cache: revert "remove remainder of distinct discard block size" 2014-11-10 15:25:30 -05:00
dm-cache-metadata.c dm cache: add fail io mode and needs_check flag 2015-06-11 17:13:00 -04:00
dm-cache-metadata.h dm cache: add fail io mode and needs_check flag 2015-06-11 17:13:00 -04:00
dm-cache-policy-cleaner.c dm cache: fix NULL pointer when switching from cleaner policy 2015-10-09 09:16:29 -04:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm cache policy smq: move 'dm-cache-default' module alias to SMQ 2015-08-12 11:27:29 -04:00
dm-cache-policy-smq.c dm cache policy smq: change the mutex to a spinlock 2015-08-12 11:32:19 -04:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-crypt.c dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE 2015-09-14 12:04:24 -04:00
dm-delay.c dm: do not override error code returned from dm_get_device() 2015-08-12 11:32:21 -04:00
dm-era-target.c block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
dm-exception-store.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-exception-store.h dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-flakey.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-io.c block: remove bio_get_nr_vecs() 2015-08-13 12:32:04 -06:00
dm-ioctl.c char: make misc_deregister a void function 2015-08-05 10:35:49 -07:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-log-userspace-base.c dm log userspace base: fix compile warning 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-log.c
dm-mpath.c dm-mpath, scsi_dh: request scsi_dh modules in scsi_dh, not dm-mpath 2015-08-28 13:14:55 -07:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-region-hash.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-snap-transient.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-snap.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-stats.c dm stats: report precise_timestamps and histogram in @stats_list output 2015-08-18 17:20:03 -04:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: efficiently support repetitive patterns 2014-08-01 12:30:37 -04:00
dm-sysfs.c dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr 2015-04-15 12:10:17 -04:00
dm-table.c block: Replace SG_GAPS with new queue limits mask 2015-08-19 14:26:02 -07:00
dm-target.c dm: allocate requests in target when stacking on blk-mq devices 2015-02-09 13:06:47 -05:00
dm-thin-metadata.c dm thin metadata: delete btrees when releasing metadata snapshot 2015-08-12 10:42:51 -04:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: disable discard support for thin devices if pool's is disabled 2015-09-13 21:32:10 -04:00
dm-uevent.c
dm-uevent.h
dm-verity.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: fix request-based dm error reporting 2015-10-06 10:08:16 -04:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
Kconfig SCSI misc on 20150911 2015-09-11 18:15:18 -07:00
linear.c block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
linear.h
Makefile raid5: add basic stripe log 2015-10-24 17:16:19 +11:00
md-cluster.c md-cluster: remove mddev arg from add_resync_info() 2015-10-24 17:16:18 +11:00
md-cluster.h md-cluster: Fix adding of new disk with new reload code 2015-10-12 03:35:30 -05:00
md.c md: override md superblock recovery_offset for journal device 2015-10-24 17:16:18 +11:00
md.h md: override md superblock recovery_offset for journal device 2015-10-24 17:16:18 +11:00
multipath.c md: drop null test before destroy functions 2015-10-02 17:23:44 +10:00
multipath.h
raid0.c md/raid0: apply base queue limits *before* disk_stack_limits 2015-10-02 17:23:44 +10:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md-cluster: Call update_raid_disks() if another node --grow's raid_disks 2015-10-24 17:16:18 +11:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5: log reclaim support 2015-10-24 17:16:19 +11:00
raid5.c raid5: log reclaim support 2015-10-24 17:16:19 +11:00
raid5.h raid5: log reclaim support 2015-10-24 17:16:19 +11:00
raid10.c Merge branch 'md-next' of git://github.com/goldwynr/linux into for-next 2015-10-14 07:09:52 +11:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00