linux/fs
Kirill Tkhai e1603b6eff inotify: Extend ioctl to allow to request id of new watch descriptor
Watch descriptor is id of the watch created by inotify_add_watch().
It is allocated in inotify_add_to_idr(), and takes the numbers
starting from 1. Every new inotify watch obtains next available
number (usually, old + 1), as served by idr_alloc_cyclic().

CRIU (Checkpoint/Restore In Userspace) project supports inotify
files, and restores watched descriptors with the same numbers,
they had before dump. Since there was no kernel support, we
had to use cycle to add a watch with specific descriptor id:

	while (1) {
		int wd;

		wd = inotify_add_watch(inotify_fd, path, mask);
		if (wd < 0) {
			break;
		} else if (wd == desired_wd_id) {
			ret = 0;
			break;
		}

		inotify_rm_watch(inotify_fd, wd);
	}

(You may find the actual code at the below link:
 https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)

The cycle is suboptiomal and very expensive, but since there is no better
kernel support, it was the only way to restore that. Happily, we had met
mostly descriptors with small id, and this approach had worked somehow.

But recent time containers with inotify with big watch descriptors
begun to come, and this way stopped to work at all. When descriptor id
is something about 0x34d71d6, the restoring process spins in busy loop
for a long time, and the restore hungs and delay of migration from node
to node could easily be watched.

This patch aims to solve this problem. It introduces new ioctl
INOTIFY_IOC_SETNEXTWD, which allows to request the number of next created
watch descriptor from userspace. It simply calls idr_set_cursor() primitive
to populate idr::idr_next, so that next idr_alloc_cyclic() allocation
will return this id, if it is not occupied. This is the way which is
used to restore some other resources from userspace. For example,
/proc/sys/kernel/ns_last_pid works the same for task pids.

The new code is under CONFIG_CHECKPOINT_RESTORE #define, so small system
may exclude it.

v2: Use INT_MAX instead of custom definition of max id,
as IDR subsystem guarantees id is between 0 and INT_MAX.

CC: Jan Kara <jack@suse.cz>
CC: Matthew Wilcox <willy@infradead.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-02-14 11:16:28 +01:00
..
9p fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at 2018-01-01 12:45:37 -07:00
adfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
affs iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} 2018-02-01 08:15:25 -05:00
afs afs: Support the AFS dynamic root 2018-02-06 14:43:37 +00:00
autofs4 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-01-30 14:43:12 -08:00
befs befs: Define usercopy region in befs_inode_cache slab cache 2018-01-15 12:07:54 -08:00
bfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
btrfs Documentation updates for 4.16. New stuff includes refcount_t 2018-01-31 19:25:25 -08:00
cachefiles vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
ceph Things have been very quiet on the rbd side, as work continues on the 2018-02-08 11:38:59 -08:00
cifs Add missing structs and defines from recent SMB3.1.1 documentation 2018-02-07 09:36:46 -06:00
coda vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
configfs configfs: make ci_type field, some pointers and function arguments const 2017-10-19 16:15:16 +02:00
cramfs cramfs: better MTD dependency expression 2018-02-08 11:37:31 -08:00
crypto fscrypt: fix build with pre-4.6 gcc versions 2018-02-01 10:51:18 -05:00
debugfs vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
devpts devpts: fix error handling in devpts_mntget() 2018-01-31 08:48:37 -08:00
dlm vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
ecryptfs vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
efivarfs VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
efs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
exofs iversion.h related cleanup for v4.16 2018-02-07 14:25:22 -08:00
exportfs
ext2 iversion.h related cleanup for v4.16 2018-02-07 14:25:22 -08:00
ext4 iversion.h related cleanup for v4.16 2018-02-07 14:25:22 -08:00
f2fs Refactor support for encrypted symlinks to move common code to fscrypt. 2018-02-04 10:43:12 -08:00
fat iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} 2018-02-01 08:15:25 -05:00
freevxfs vxfs: Define usercopy region in vxfs_inode slab cache 2018-01-15 12:07:57 -08:00
fscache AFS development 2017-11-16 11:41:22 -08:00
fuse vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
gfs2 gfs2: Glock dump performance regression fix 2018-02-01 11:27:11 -07:00
hfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
hfsplus hfsplus: honor setgid flag on directories 2018-02-06 18:32:45 -08:00
hostfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hpfs hpfs: don't bother with the i_version counter or f_version 2017-12-10 12:58:18 -08:00
hugetlbfs hugetlb: implement memfd sealing 2018-01-31 17:18:39 -08:00
isofs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
jbd2 jbd2: fix sphinx kernel-doc build warnings 2018-01-10 00:27:29 -05:00
jffs2 Documentation updates for 4.16. New stuff includes refcount_t 2018-01-31 19:25:25 -08:00
jfs Currently, hardened usercopy performs dynamic bounds checking on slab 2018-02-03 16:25:42 -08:00
kernfs vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
lockd lockd: Fix server refcounting 2018-01-24 17:33:57 -05:00
minix Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
nfs iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} 2018-02-01 08:15:25 -05:00
nfs_common lockd: fix "list_add double add" caused by legacy signal interface 2017-11-27 16:45:11 -05:00
nfsd This request is late, apologies. 2018-02-08 15:18:32 -08:00
nilfs2 nilfs2: use time64_t internally 2018-02-06 18:32:45 -08:00
nls License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
notify inotify: Extend ioctl to allow to request id of new watch descriptor 2018-02-14 11:16:28 +01:00
ntfs ntfs: remove i_version handling 2018-01-01 10:09:33 -05:00
ocfs2 vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
omfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
openpromfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
orangefs vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
overlayfs ovl: check ERR_PTR() return value from ovl_encode_fh() 2018-02-05 09:50:29 +01:00
proc vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
pstore fs: pstore: remove unused hardirq.h 2017-11-28 16:39:09 -08:00
qnx4 Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
qnx6 Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
quota quota: Check for register_shrinker() failure. 2017-11-29 16:46:48 +01:00
ramfs mm: make pagevec_lookup() update index 2017-09-06 17:27:26 -07:00
reiserfs fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at 2018-01-01 12:45:37 -07:00
romfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
squashfs Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
sysv Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
tracefs VFS: Don't use save/replace_mount_options if not using generic_show_options 2017-07-06 03:31:46 -04:00
ubifs Refactor support for encrypted symlinks to move common code to fscrypt. 2018-02-04 10:43:12 -08:00
udf udf: Sanitize nanoseconds for time stamps 2017-12-19 08:11:01 +01:00
ufs iversion.h related cleanup for v4.16 2018-02-07 14:25:22 -08:00
xfs Changes since last update: 2018-02-05 13:35:56 -08:00
aio.c Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-11-17 11:54:55 -08:00
anon_inodes.c
attr.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bad_inode.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
binfmt_aout.c fs: fix kernel_read prototype 2017-09-04 19:05:15 -04:00
binfmt_elf_fdpic.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-11-17 12:54:01 -08:00
binfmt_elf.c elf: fix NT_FILE integer overflow 2018-02-06 18:32:45 -08:00
binfmt_em86.c
binfmt_flat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
binfmt_misc.c fs/binfmt_misc.c: node could be NULL when evicting inode 2017-10-13 16:18:33 -07:00
binfmt_script.c exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array 2017-10-03 17:54:25 -07:00
block_dev.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
buffer.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-31 09:25:20 -08:00
char_dev.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_binfmt_elf.c
compat_ioctl.c fs: compat_ioctl: add new DVB demux ioctls 2017-12-28 11:17:29 -05:00
compat.c
coredump.c Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-11-17 11:54:55 -08:00
dax.c Only miscellaneous cleanups and bug fixes for ext4 this cycle. 2018-02-03 13:49:22 -08:00
dcache.c Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2018-02-05 13:05:20 -08:00
dcookies.c
direct-io.c iomap: report collisions between directio and buffered writes to userspace 2018-01-08 10:41:39 -08:00
drop_caches.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
eventfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
eventpoll.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
exec.c exec: Weaken dumpability for secureexec 2018-01-03 10:13:36 -08:00
fcntl.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
fhandle.c vfs: Copy struct mount.mnt_id to userspace using put_user() 2018-01-15 12:07:51 -08:00
file_table.c vfs: remove unused hardirq.h 2017-12-07 14:23:30 -05:00
file.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-31 09:25:20 -08:00
filesystems.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fs_pin.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
fs_struct.c
fs-writeback.c writeback: update comment in inode_io_list_move_locked 2018-01-06 09:18:00 -07:00
inode.c vfs: remove might_sleep() from clear_inode() 2018-02-06 18:32:47 -08:00
internal.h fs: expose do_unlinkat for built-in callers 2017-11-10 08:48:46 -05:00
ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
iomap.c iomap: warn on zero-length mappings 2018-01-29 07:27:24 -08:00
Kconfig libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
Kconfig.binfmt ARM: enable elf_fdpic on systems with an MMU 2017-09-10 19:31:46 -04:00
libfs.c Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
locks.c This request is late, apologies. 2018-02-08 15:18:32 -08:00
Makefile ncpfs: move net/ncpfs to drivers/staging/ncpfs 2017-11-28 13:55:01 +01:00
mbcache.c mbcache: make sure c_entry_count is not decremented past zero 2018-01-09 23:57:52 -05:00
mount.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mpage.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
namei.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-31 09:25:20 -08:00
namespace.c VFS: Handle lazytime in do_mount() 2017-12-09 20:16:33 -05:00
no-block.c
nsfs.c nsfs: generalize ns_get_path() for path resolution with a task 2017-12-31 16:12:23 +01:00
open.c ovl: don't allow writing ioctl on lower layer 2017-09-05 12:53:12 +02:00
pipe.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
pnode.c
pnode.h
posix_acl.c posix_acl: convert posix_acl.a_refcount from atomic_t to refcount_t 2018-01-02 19:27:28 -08:00
proc_namespace.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
read_write.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-11-17 12:08:18 -08:00
readdir.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
select.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
seq_file.c seq_file: fix incomplete reset on read from zero offset 2018-01-20 02:31:15 -05:00
signalfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
splice.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
stack.c
stat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
statfs.c Rename superblock flags (MS_xyz -> SB_xyz) 2017-11-27 13:05:09 -08:00
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-01-31 09:25:20 -08:00
sync.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
timerfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
userfaultfd.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
utimes.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xattr.c lsm: fix smack_inode_removexattr and xattr_getsecurity memleak 2017-10-04 18:03:15 +11:00