linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 12:14:01 +08:00

Guoqing Jiang 0ba959774e md-cluster: use sync way to handle METADATA_UPDATED msg

Previously, when a node received a METADATA_UPDATED msg, it just needed to wake up mddev->thread, and md_reload_sb would eventually be called.

We took the asynchronous way to avoid a deadlock. The deadlock could happen when one node is receiving the METADATA_UPDATED msg (wants reconfig_mutex) while trying to run the path:

md_check_recovery -> mddev_trylock (holds reconfig_mutex) -> md_update_sb -> metadata_update_start (wants EX on the token, but the token is held by the sending node)

Since we will support resizing for clustered raid, and we need the metadata update handling to be synchronous so that the initiating node can detect failure, we need to change the way the METADATA_UPDATED msg is handled.

But we obviously need to avoid the above deadlock with the sync way. To make this happen, we considered not holding reconfig_mutex when calling md_reload_sb: if some other thread has already taken reconfig_mutex and is waiting for the 'token', then process_recvd_msg() can safely call md_reload_sb() without taking the mutex. This is because we can be certain that no other thread will take the mutex, and we are also certain that the actions performed by md_reload_sb() won't interfere with anything that the other thread is in the middle of.

To make this more concrete, we added a new cinfo->state bit, MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, which is set in lock_token() just before dlm_lock_sync() is called, and cleared just after. As lock_token() is always called with reconfig_mutex held (the exception is resync_info_update, which is distinguished in the previous patch), if process_recvd_msg() finds that the new bit is set, then the mutex must be held by some other thread, and that thread will keep waiting.

So process_metadata_update() can call md_reload_sb() if either mddev_trylock() succeeds, or if MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is set. The tricky bit is what to do if neither of these applies. We need to wait. Fortunately mddev_unlock() always calls wake_up() on mddev->thread->wqueue, so we can get lock_token() to call wake_up() on that when it sets the bit.

There are also some related changes inside this commit:
1. Remove the RELOAD_SB related code since it is not valid anymore.
2. mddev is added into md_cluster_info so that we can get mddev inside lock_token.
3. Add a new parameter to lock_token to distinguish whether reconfig_mutex is held or not.

And we need to set MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD in the places below:
1. Set it before unregistering the thread, otherwise a deadlock could appear when stopping a resyncing array. This is because md_unregister_thread(&cinfo->recv_thread) is blocked by recv_daemon -> process_recvd_msg -> process_metadata_update. To resolve the issue, MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD also needs to be set before unregistering the thread.
2. Set it in metadata_update_start to fix another deadlock.
   a. Node A sends a METADATA_UPDATED msg (holds the Token lock).
   b. Node B wants to do resync, and is blocked since it can't get the Token lock, but MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is not set since the callchain (md_do_sync -> sync_request -> resync_info_update -> sendmsg -> lock_comm -> lock_token) doesn't hold reconfig_mutex.
   c. Node B tries to update the sb (holds reconfig_mutex), but stops at wait_event() in metadata_update_start since the MD_CLUSTER_SEND_LOCK flag was set in lock_comm (step b).
   d. Then Node B receives the METADATA_UPDATED msg from A, and of course recv_daemon is blocked forever.

Since metadata_update_start always calls lock_token with reconfig_mutex held, we need to set MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD here as well, and lock_token doesn't need to set it twice unless lock_token is invoked from lock_comm.

Finally, thanks to Neil for his great idea and help!

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
		2017-03-16 16:55:49 -07:00
..
bcache	drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h	2017-03-09 17:01:10 -08:00
persistent-data	sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>	2017-03-02 08:42:40 +01:00
bitmap.c	md: separate flags for superblock changes	2016-12-08 22:01:47 -08:00
bitmap.h	md-cluster: sync bitmap when node received RESYNCING msg	2016-05-04 12:39:35 -07:00
dm-bio-prison.c	block: add a bi_error field to struct bio	2015-07-29 08:55:15 -06:00
dm-bio-prison.h	dm bio prison: add dm_cell_promote_or_release()	2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c	sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h>	2017-03-02 08:42:33 +01:00
dm-bufio.h
dm-builtin.c	dm: move request-based code out to dm-rq.[hc]	2016-06-10 15:15:44 -04:00
dm-cache-block-types.h	linux: drop __bitwise__ everywhere	2016-12-16 00:13:41 +02:00
dm-cache-metadata.c	dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()	2017-02-16 13:12:51 -05:00
dm-cache-metadata.h	dm cache metadata: add "metadata2" feature	2017-02-16 13:12:47 -05:00
dm-cache-policy-cleaner.c	dm cache: speed up writing of the hint array	2016-09-22 11:15:02 -04:00
dm-cache-policy-internal.h	dm cache: speed up writing of the hint array	2016-09-22 11:15:02 -04:00
dm-cache-policy-smq.c	dm cache policy smq: use hash_32() instead of hash_32_generic()	2016-12-08 19:42:37 -05:00
dm-cache-policy.c
dm-cache-policy.h	dm cache: speed up writing of the hint array	2016-09-22 11:15:02 -04:00
dm-cache-target.c	- Fix dm-raid transient device failure processing and other smaller	2017-02-21 12:11:41 -08:00
dm-core.h	dm: always defer request allocation to the owner of the request_queue	2017-01-27 15:08:35 -07:00
dm-crypt.c	KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()	2017-03-02 10:09:00 +11:00
dm-delay.c	dm: rename target's per_bio_data_size to per_io_data_size	2016-02-22 22:34:37 -05:00
dm-era-target.c	block: Use pointer to backing_dev_info from request_queue	2017-02-02 08:20:48 -07:00
dm-exception-store.c	- Revert a dm-multipath change that caused a regression for unprivledged	2015-11-04 21:19:53 -08:00
dm-exception-store.h	dm snapshot: fix hung bios when copy error occurs	2016-01-08 20:03:05 -05:00
dm-flakey.c	dm flakey: introduce "error_writes" feature	2016-12-13 15:01:31 -05:00
dm-io.c	dm io: use bvec iterator helpers to implement .get_page and .next_page	2016-11-21 09:51:57 -05:00
dm-ioctl.c	sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h>	2017-03-02 08:42:33 +01:00
dm-kcopyd.c	dm: move request-based code out to dm-rq.[hc]	2016-06-10 15:15:44 -04:00
dm-linear.c	libnvdimm for 4.8	2016-07-28 17:38:16 -07:00
dm-log-userspace-base.c	dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()	2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c	Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block	2016-10-07 14:42:05 -07:00
dm-log.c	block,fs: use REQ_* flags directly	2016-11-01 09:43:26 -06:00
dm-mpath.c	Merge branch 'for-4.11/next' into for-4.11/linus-merge	2017-02-17 14:08:19 -07:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h	dm path selector: remove 'repeat_count' return from .select_path hook	2016-02-22 22:34:42 -05:00
dm-queue-length.c	dm path selector: remove 'repeat_count' return from .select_path hook	2016-02-22 22:34:42 -05:00
dm-raid1.c	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block	2016-12-13 10:19:16 -08:00
dm-raid.c	dm raid: bump the target version	2017-02-28 16:47:52 -05:00
dm-region-hash.c	block: rename bio bi_rw to bi_opf	2016-08-07 14:41:02 -06:00
dm-round-robin.c	dm round robin: revert "use percpu 'repeat_count' and 'current_path'"	2017-02-17 00:54:09 -05:00
dm-rq.c	dm-rq: don't dereference request payload after ending request	2017-02-24 13:19:32 -07:00
dm-rq.h	dm: always defer request allocation to the owner of the request_queue	2017-01-27 15:08:35 -07:00
dm-service-time.c	dm path selector: remove 'repeat_count' return from .select_path hook	2016-02-22 22:34:42 -05:00
dm-snap-persistent.c	block,fs: use REQ_* flags directly	2016-11-01 09:43:26 -06:00
dm-snap-transient.c	dm snapshot: fix hung bios when copy error occurs	2016-01-08 20:03:05 -05:00
dm-snap.c	block: rename bio bi_rw to bi_opf	2016-08-07 14:41:02 -06:00
dm-stats.c	dm stats: fix a leaked s->histogram_boundaries array	2017-02-16 14:17:07 -05:00
dm-stats.h	dm stats: support precise timestamps	2015-06-17 12:40:40 -04:00
dm-stripe.c	block: rename bio bi_rw to bi_opf	2016-08-07 14:41:02 -06:00
dm-switch.c	dm switch: simplify conditional in alloc_region_table()	2015-10-31 19:06:06 -04:00
dm-sysfs.c	dm: move request-based code out to dm-rq.[hc]	2016-06-10 15:15:44 -04:00
dm-table.c	block: Use pointer to backing_dev_info from request_queue	2017-02-02 08:20:48 -07:00
dm-target.c	dm: always defer request allocation to the owner of the request_queue	2017-01-27 15:08:35 -07:00
dm-thin-metadata.c	dm thin: fix a race condition between discarding and provisioning a block	2016-07-20 12:43:35 -04:00
dm-thin-metadata.h	dm thin: fix a race condition between discarding and provisioning a block	2016-07-20 12:43:35 -04:00
dm-thin.c	block: Use pointer to backing_dev_info from request_queue	2017-02-02 08:20:48 -07:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c	dm verity fec: fix block calculation	2016-07-01 23:29:08 -04:00
dm-verity-fec.h	dm verity: add support for forward error correction	2015-12-10 10:39:03 -05:00
dm-verity-target.c	dm verity: fix incorrect error message	2016-11-21 09:52:01 -05:00
dm-verity.h	dm verity: add ignore_zero_blocks feature	2015-12-10 10:39:03 -05:00
dm-zero.c	block: rename bio bi_rw to bi_opf	2016-08-07 14:41:02 -06:00
dm.c	blk: Ensure users for current->bio_list can see the full list.	2017-03-11 15:31:37 -07:00
dm.h	dm: always defer request allocation to the owner of the request_queue	2017-01-27 15:08:35 -07:00
faulty.c	md: fast clone bio in bio_clone_mddev()	2017-02-15 11:24:54 -08:00
Kconfig	dm block manager: make block locking optional	2016-11-14 15:17:47 -05:00
linear.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md	2017-02-24 14:42:19 -08:00
linear.h	md linear: fix a race between linear_add() and linear_congested()	2017-02-13 09:17:50 -08:00
Makefile	dm: move request-based code out to dm-rq.[hc]	2016-06-10 15:15:44 -04:00
md-cluster.c	md-cluster: use sync way to handle METADATA_UPDATED msg	2017-03-16 16:55:49 -07:00
md-cluster.h	md-cluster: gather resync infos and enable recv_thread after bitmap is ready	2016-05-09 09:24:03 -07:00
md.c	md-cluster: use sync way to handle METADATA_UPDATED msg	2017-03-16 16:55:49 -07:00
md.h	md-cluster: use sync way to handle METADATA_UPDATED msg	2017-03-16 16:55:49 -07:00
multipath.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md	2017-02-24 14:42:19 -08:00
multipath.h
raid0.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md	2017-02-24 14:42:19 -08:00
raid0.h	block: kill merge_bvec_fn() completely	2015-08-13 12:31:57 -06:00
raid1.c	md/raid1: fix a trivial typo in comments	2017-03-14 11:10:44 -07:00
raid1.h	RAID1: avoid unnecessary spin locks in I/O barrier code	2017-02-19 22:04:25 -08:00
raid5-cache.c	md/raid5-cache: exclude reclaiming stripes in reclaim check	2017-02-13 09:20:05 -08:00
raid5.c	md/r5cache: fix set_syndrome_sources() for data in cache	2017-03-14 09:57:10 -07:00
raid5.h	md/raid5-cache: exclude reclaiming stripes in reclaim check	2017-02-13 09:20:05 -08:00
raid10.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md	2017-03-16 11:43:48 -07:00
raid10.h	md/raid10: add failfast handling for reads.	2016-11-22 09:14:28 -08:00