2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-26 14:14:01 +08:00

md: use memalloc scope APIs in mddev_suspend()/mddev_resume()

In raid5.c:resize_chunk(), scribble_alloc() is called with GFP_NOIO
flag, then it is sent into kvmalloc_array() inside scribble_alloc().

The problem is kvmalloc_array() eventually calls kvmalloc_node() which
does not accept non GFP_KERNEL compatible flag like GFP_NOIO, then
kmalloc_node() is called indeed to allocate physically continuous
pages. When system memory is under heavy pressure, and the requesting
size is large, there is high probability that allocating continueous
pages will fail.

But simply using GFP_KERNEL flag to call kvmalloc_array() is also
progblematic. In the code path where scribble_alloc() is called, the
raid array is suspended, if kvmalloc_node() triggers memory reclaim I/Os
and such I/Os go back to the suspend raid array, deadlock will happen.

What is desired here is to allocate non-physically (a.k.a virtually)
continuous pages and avoid memory reclaim I/Os. Michal Hocko suggests
to use the mmealloc sceope APIs to restrict memory reclaim I/O in
allocating context, specifically to call memalloc_noio_save() when
suspend the raid array and to call memalloc_noio_restore() when
resume the raid array.

This patch adds the memalloc scope APIs in mddev_suspend() and
mddev_resume(), to restrict memory reclaim I/Os during the raid array
is suspended. The benifit of adding the memalloc scope API in the
unified entry point mddev_suspend()/mddev_resume() is, no matter which
md raid array type (personality), we are sure the deadlock by recursive
memory reclaim I/O won't happen on the suspending context.

Please notice that the memalloc scope APIs only take effect on the raid
array suspending context, if the memory allocation is from another new
created kthread after raid array suspended, the recursive memory reclaim
I/Os won't be restricted. The mddev_suspend()/mddev_resume() entries are
used for the critical section where the raid metadata is modifying,
creating a kthread to allocate memory inside the critical section is
queer and very probably being buggy.

Fixes: b330e6a49d ("md: convert to kvmalloc")
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
This commit is contained in:
Coly Li 2020-04-09 22:17:20 +08:00 committed by Song Liu
parent 3f79cc2234
commit 78f57ef9d5
2 changed files with 5 additions and 0 deletions

View File

@ -528,11 +528,15 @@ void mddev_suspend(struct mddev *mddev)
wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
del_timer_sync(&mddev->safemode_timer);
/* restrict memory reclaim I/O during raid array is suspend */
mddev->noio_flag = memalloc_noio_save();
}
EXPORT_SYMBOL_GPL(mddev_suspend);
void mddev_resume(struct mddev *mddev)
{
/* entred the memalloc scope from mddev_suspend() */
memalloc_noio_restore(mddev->noio_flag);
lockdep_assert_held(&mddev->reconfig_mutex);
if (--mddev->suspended)
return;

View File

@ -497,6 +497,7 @@ struct mddev {
void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev);
struct md_cluster_info *cluster_info;
unsigned int good_device_nr; /* good device num within cluster raid */
unsigned int noio_flag; /* for memalloc scope API */
bool has_superblocks:1;
bool fail_last_dev:1;