linux/fs
Jérôme Glisse 2916ecc0f9 mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY
Introduce a new migration mode that allow to offload the copy to a device
DMA engine.  This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed.  I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch).  The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations.  But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
 1 - lock page
 2 - call migratepage() callback
 3 - (extra locking in some migratepage() callback)
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
 5 - copy page
 6 - (unlock any extra lock of migratepage() callback)
 7 - return from migratepage() callback
 8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
 1 - lock multiple pages
For each page {
 2 - call migratepage() callback
 3 - abort in all problematic migratepage() callback
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
} // finished all calls to migratepage() callback
 5 - DMA copy multiple pages
 6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:46 -07:00
..
9p Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
adfs
affs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
afs Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
autofs4 Fix up over-eager 'wait_queue_t' renaming 2017-07-10 11:40:19 -07:00
befs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
bfs bfs: fix sanity checks for empty files 2017-07-12 16:26:00 -07:00
btrfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
cachefiles sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
ceph Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
cifs enable xattr support for smb3 and also a bugfix 2017-09-07 16:06:14 -07:00
coda fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
configfs configfs: Introduce config_item_get_unless_zero() 2017-06-12 13:20:20 +02:00
cramfs
crypto block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
debugfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
devpts pty: Repair TIOCGPTPEER 2017-08-24 13:23:03 -07:00
dlm File locking related changes for v4.14 2017-09-06 13:43:26 -07:00
ecryptfs ecryptfs: convert to file_write_and_wait in ->fsync 2017-08-03 14:20:22 -04:00
efivarfs VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
efs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exofs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
exportfs Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
ext2 dax: use common 4k zero page for dax mmap reads 2017-09-06 17:27:24 -07:00
ext4 Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
f2fs mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY 2017-09-08 18:26:46 -07:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-09 17:01:10 -08:00
freevxfs
fscache mm: remove nr_pages argument from pagevec_lookup{,_range}() 2017-09-06 17:27:27 -07:00
fuse Writeback error handling fixes for v4.14 2017-09-06 14:11:03 -07:00
gfs2 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
hfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hfsplus Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
hostfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hpfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
hugetlbfs mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY 2017-09-08 18:26:46 -07:00
isofs isofs: Delete an unnecessary variable initialisation in isofs_read_inode() 2017-08-23 18:54:03 +02:00
jbd2 Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
jffs2 fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
jfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
kernfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
lockd sunrpc: mark all struct svc_version instances as const 2017-05-15 17:42:31 +02:00
minix Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
ncpfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
nfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nfs_common
nfsd annotate RWF_... flags 2017-08-31 17:32:38 -04:00
nilfs2 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
nls
notify fsnotify: make dnotify_fsnotify_ops const 2017-08-30 16:02:48 +02:00
ntfs fs: convert a pile of fsync routines to errseq_t based reporting 2017-08-01 08:39:29 -04:00
ocfs2 Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
omfs omfs: Implement show_options 2017-07-06 03:31:46 -04:00
openpromfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
orangefs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
overlayfs overlayfs, locking: Remove smp_mb__before_spinlock() usage 2017-08-10 12:29:02 +02:00
proc mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory 2017-09-08 18:26:46 -07:00
pstore Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps" 2017-08-17 16:29:19 -07:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6
quota Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 15:19:35 -07:00
ramfs mm: make pagevec_lookup() update index 2017-09-06 17:27:26 -07:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-09-07 14:53:17 -07:00
romfs
squashfs fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version 2017-02-24 17:46:57 -08:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-08 17:33:32 +02:00
sysv mm: drop "wait" parameter from write_one_page() 2017-07-05 18:44:22 -04:00
tracefs VFS: Don't use save/replace_mount_options if not using generic_show_options 2017-07-06 03:31:46 -04:00
ubifs mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY 2017-09-08 18:26:46 -07:00
udf fs-udf: Delete an error message for a failed memory allocation in two functions 2017-08-16 16:43:23 +02:00
ufs Writeback error handling fixes (pile #1) 2017-07-07 18:39:15 -07:00
xfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
aio.c mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY 2017-09-08 18:26:46 -07:00
anon_inodes.c
attr.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
bad_inode.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
binfmt_aout.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
binfmt_elf_fdpic.c binfmt: Introduce secureexec flag 2017-08-01 12:03:05 -07:00
binfmt_elf.c This series has the ultimate goal of providing a sane stack rlimit when 2017-09-07 20:35:29 -07:00
binfmt_em86.c
binfmt_flat.c exec: Rename bprm->cred_prepared to called_set_creds 2017-08-01 12:02:48 -07:00
binfmt_misc.c fs: constify tree_descr arrays passed to simple_fill_super() 2017-04-26 23:54:06 -04:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
block_dev.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
buffer.c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
char_dev.c char_dev: order /proc/devices by major number 2017-07-17 15:28:50 +02:00
compat_binfmt_elf.c
compat_ioctl.c media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers 2017-09-05 08:25:07 -04:00
compat.c fs/compat.c: trim unused includes 2017-04-17 12:52:27 -04:00
coredump.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
dax.c dax: initialize variable pfn before using it 2017-09-06 17:27:24 -07:00
dcache.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
dcookies.c
direct-io.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
drop_caches.c
eventfd.c There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
eventpoll.c epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() 2017-09-01 13:07:35 -07:00
exec.c exec: Consolidate pdeath_signal clearing 2017-08-01 12:03:14 -07:00
fcntl.c vfs: fix flock compat thinko 2017-07-07 13:48:18 -07:00
fhandle.c fhandle: move compat syscalls from compat.c 2017-04-17 12:52:26 -04:00
file_table.c fs: new infrastructure for writeback error handling and reporting 2017-07-06 07:02:25 -04:00
file.c fs/file.c: replace alloc_fdmem() with kvmalloc() alternative 2017-07-06 16:24:30 -07:00
filesystems.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
fs_pin.c sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00
fs_struct.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
fs-writeback.c writeback: rework wb_[dec|inc]_stat family of functions 2017-07-12 16:26:05 -07:00
inode.c xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
internal.h xfs: evict all inodes involved with log redo item 2017-09-01 10:55:30 -07:00
ioctl.c sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency 2017-03-02 08:42:37 +01:00
iomap.c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
Kconfig fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more 2017-07-12 16:26:00 -07:00
Kconfig.binfmt
libfs.c fs: convert __generic_file_fsync to use errseq_t based reporting 2017-07-06 07:02:29 -04:00
locks.c locks: restore a warn for leaked locks on close 2017-07-21 13:57:31 -04:00
Makefile
mbcache.c ext4: xattr inode deduplication 2017-06-22 11:44:55 -04:00
mount.h Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
mpage.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
namei.c Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
namespace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
no-block.c
nsfs.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
open.c Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
pipe.c VFS: Provide empty name qstr 2017-07-06 03:27:09 -04:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-05-23 08:41:17 -05:00
pnode.h
posix_acl.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
proc_namespace.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
read_write.c annotate RWF_... flags 2017-08-31 17:32:38 -04:00
readdir.c readdir: move compat syscalls from compat.c 2017-04-17 12:52:24 -04:00
select.c fs/select: Fix memory corruption in compat_get_fd_set() 2017-08-28 16:09:19 -07:00
seq_file.c mm: introduce kv[mz]alloc helpers 2017-05-08 17:15:12 -07:00
signalfd.c sched/wait: Rename wait_queue_t => wait_queue_entry_t 2017-06-20 12:18:27 +02:00
splice.c fs: implement vfs_iter_write using do_iter_write 2017-06-29 17:49:23 -04:00
stack.c
stat.c fs: Provide __inode_get_bytes() 2017-08-17 22:06:03 +02:00
statfs.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:50:54 -07:00
super.c quota: Convert dqio_mutex to rwsem 2017-08-17 18:52:48 +02:00
sync.c Merge branch 'akpm' (patches from Andrew) 2017-09-06 20:49:49 -07:00
timerfd.c timerfd: Use get_itimerspec64() and put_itimerspec64() 2017-06-30 04:14:38 -04:00
userfaultfd.c userfaultfd: provide pid in userfault msg - add feat union 2017-09-06 17:27:29 -07:00
utimes.c utimes: move compat syscalls from compat.c 2017-04-17 12:52:23 -04:00
xattr.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00