linux/fs
Linus Torvalds b5683a37c8 vfs-6.9.pidfd
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4/wAKCRCRxhvAZXjc
 opnBAQCaQWwxjT0VLHebPniw6tel/KYlZ9jH9kBQwLrk1pembwEA+BsCY2C8YS4a
 75v9jOPxr+Z8j1SjxwwubcONPyqYXwQ=
 =+Wa3
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pdfd updates from Christian Brauner:

 - Until now pidfds could only be created for thread-group leaders but
   not for threads. There was no technical reason for this. We simply
   had no users that needed support for this. Now we do have users that
   need support for this.

   This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
   flag is set pidfd_open() creates a pidfd that refers to a specific
   thread.

   In addition, we now allow clone() and clone3() to be called with
   CLONE_PIDFD | CLONE_THREAD which wasn't possible before.

   A pidfd that refers to an individual thread differs from a pidfd that
   refers to a thread-group leader:

    (1) Pidfds are pollable. A task may poll a pidfd and get notified
        when the task has exited.

        For thread-group leader pidfds the polling task is woken if the
        thread-group is empty. In other words, if the thread-group
        leader task exits when there are still threads alive in its
        thread-group the polling task will not be woken when the
        thread-group leader exits but rather when the last thread in the
        thread-group exits.

        For thread-specific pidfds the polling task is woken if the
        thread exits.

    (2) Passing a thread-group leader pidfd to pidfd_send_signal() will
        generate thread-group directed signals like kill(2) does.

        Passing a thread-specific pidfd to pidfd_send_signal() will
        generate thread-specific signals like tgkill(2) does.

        The default scope of the signal is thus determined by the type
        of the pidfd.

        Since use-cases exist where the default scope of the provided
        pidfd needs to be overriden the following flags are added to
        pidfd_send_signal():

         - PIDFD_SIGNAL_THREAD
           Send a thread-specific signal.

         - PIDFD_SIGNAL_THREAD_GROUP
           Send a thread-group directed signal.

         - PIDFD_SIGNAL_PROCESS_GROUP
           Send a process-group directed signal.

        The scope change will only work if the struct pid is actually
        used for this scope.

        For example, in order to send a thread-group directed signal the
        provided pidfd must be used as a thread-group leader and
        similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
        used as a process group leader.

 - Move pidfds from the anonymous inode infrastructure to a tiny pseudo
   filesystem. This will unblock further work that we weren't able to do
   simply because of the very justified limitations of anonymous inodes.
   Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
   to become useful for the first time. They can now be compared by
   inode number which are unique for the system lifetime.

   Instead of stashing struct pid in file->private_data we can now stash
   it in inode->i_private. This makes it possible to introduce concepts
   that operate on a process once all file descriptors have been closed.
   A concrete example is kill-on-last-close. Another side-effect is that
   file->private_data is now freed up for per-file options for pidfds.

   Now, each struct pid will refer to a different inode but the same
   struct pid will refer to the same inode if it's opened multiple
   times. In contrast to now where each struct pid refers to the same
   inode.

   The tiny pseudo filesystem is not visible anywhere in userspace
   exactly like e.g., pipefs and sockfs. There's no lookup, there's no
   complex inode operations, nothing. Dentries and inodes are always
   deleted when the last pidfd is closed.

   We allocate a new inode and dentry for each struct pid and we reuse
   that inode and dentry for all pidfds that refer to the same struct
   pid. The code is entirely optional and fairly small. If it's not
   selected we fallback to anonymous inodes. Heavily inspired by nsfs.

   The dentry and inode allocation mechanism is moved into generic
   infrastructure that is now shared between nsfs and pidfs. The
   path_from_stashed() helper must be provided with a stashing location,
   an inode number, a mount, and the private data that is supposed to be
   used and it will provide a path that can be passed to dentry_open().

   The helper will try retrieve an existing dentry from the provided
   stashing location. If a valid dentry is found it is reused. If not a
   new one is allocated and we try to stash it in the provided location.
   If this fails we retry until we either find an existing dentry or the
   newly allocated dentry could be stashed. Subsequent openers of the
   same namespace or task are then able to reuse it.

 - Currently it is only possible to get notified when a task has exited,
   i.e., become a zombie and userspace gets notified with EPOLLIN. We
   now also support waiting until the task has been reaped, notifying
   userspace with EPOLLHUP.

 - Ensure that ESRCH is reported for getfd if a task is exiting instead
   of the confusing EBADF.

 - Various smaller cleanups to pidfd functions.

* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  libfs: improve path_from_stashed()
  libfs: add stashed_dentry_prune()
  libfs: improve path_from_stashed() helper
  pidfs: convert to path_from_stashed() helper
  nsfs: convert to path_from_stashed() helper
  libfs: add path_from_stashed()
  pidfd: add pidfs
  pidfd: move struct pidfd_fops
  pidfd: allow to override signal scope in pidfd_send_signal()
  pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
  signal: fill in si_code in prepare_kill_siginfo()
  selftests: add ESRCH tests for pidfd_getfd()
  pidfd: getfd should always report ESRCH if a task is exiting
  pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
  pidfd: exit: kill the no longer used thread_group_exited()
  pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
  pid: kill the obsolete PIDTYPE_PID code in transfer_pid()
  pidfd: kill the no longer needed do_notify_pidfd() in de_thread()
  pidfd_poll: report POLLHUP when pid_task() == NULL
  pidfd: implement PIDFD_THREAD flag for pidfd_open()
  ...
2024-03-11 10:21:06 -07:00
..
9p 9p: Use length of data written to the server in preference to error 2024-01-04 13:15:31 +00:00
adfs adfs: remove writepage implementation 2023-12-29 11:58:33 -08:00
affs affs: free affs_sb_info with kfree_rcu() 2024-02-25 02:10:31 -05:00
afs afs: Fix endless loop in directory parsing 2024-02-27 11:20:43 +01:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs bcachefs: fix bch2_save_backtrace() 2024-02-25 15:45:36 -05:00
befs befs: d_obtain_alias(ERR_PTR(...)) will do the right thing 2023-12-21 12:51:02 -05:00
bfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
btrfs for-6.8-rc6-tag 2024-03-01 12:29:20 -08:00
cachefiles cachefiles: fix memory leak in cachefiles_add_cache() 2024-02-20 09:46:07 +01:00
ceph ceph: switch to corrected encoding of max_xattr_size in mdsmap 2024-02-26 19:20:30 +01:00
coda Convert coda to use the new mount API 2024-02-22 10:03:38 +01:00
configfs
cramfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
crypto case-insensitive updates for 6.9 2024-03-07 11:55:41 +01:00
debugfs Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog' 2024-01-04 13:19:40 +01:00
devpts fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
dlm dlm: update format header reflect current format 2023-12-20 15:36:48 -06:00
ecryptfs Merge tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux 2024-01-23 17:56:30 +01:00
efivarfs efivarfs: Drop 'duplicates' bool parameter on efivar_init() 2024-02-25 09:43:39 +01:00
efs efs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:33 +01:00
erofs erofs: apply proper VMA alignment for memory mapped files on THP 2024-03-07 10:21:10 +08:00
exfat Description for this pull request: 2024-03-01 12:22:30 -08:00
exportfs fs: Create a generic is_dot_dotdot() utility 2024-01-23 10:58:56 -05:00
ext2 fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
ext4 vfs-6.9.misc 2024-03-11 09:38:17 -07:00
f2fs vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs freevxfs: lookup: fix function params kernel-doc 2023-12-20 15:02:58 -08:00
fuse fuse: fix UAF in rcu pathwalks 2024-02-25 02:10:32 -05:00
gfs2 vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
hfs hfs: really remove hfs_writepage 2023-12-29 11:58:34 -08:00
hfsplus vfs-6.9.misc 2024-03-11 09:38:17 -07:00
hostfs hostfs: use d_splice_alias() calling conventions to simplify failure exits 2023-12-21 12:51:00 -05:00
hpfs
hugetlbfs vfs-6.9.misc 2024-03-11 09:38:17 -07:00
iomap vfs-6.9.rw_hint 2024-03-04 18:35:21 +01:00
isofs
jbd2 jbd2: abort journal when detecting metadata writeback error of fs dev 2024-01-04 23:42:21 -05:00
jffs2 jffs2: mark __jffs2_dbg_superblock_counts() static 2023-12-10 17:21:43 -08:00
jfs vfs-6.9.misc 2024-03-11 09:38:17 -07:00
kernfs Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" 2024-01-11 11:51:27 +01:00
lockd sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
minix minix: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:32 +01:00
netfs netfs: Fix missing zero-length check in unbuffered write 2024-01-29 14:53:21 +01:00
nfs nfs: fix UAF on pathwalk running into umount 2024-02-25 02:10:32 -05:00
nfs_common
nfsd nfsd-6.8 fixes: 2024-02-07 17:48:15 +00:00
nilfs2 nilfs2: fix potential bug in end_buffer_async_write 2024-02-07 21:20:37 -08:00
nls
notify dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ntfs3 vfs-6.9.misc 2024-03-11 09:38:17 -07:00
ocfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
omfs
openpromfs openpromfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:32 +01:00
orangefs orangefs: saner arguments passing in readdir guts 2023-12-21 12:53:36 -05:00
overlayfs vfs-6.9.misc 2024-03-11 09:38:17 -07:00
proc vfs-6.9.misc 2024-03-11 09:38:17 -07:00
pstore pstore: inode: Use cleanup.h for struct pstore_private 2023-12-08 14:15:44 -08:00
qnx4 qnx4: convert qnx4 to use the new mount api 2024-03-01 13:58:50 +01:00
qnx6 qnx6: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:32 +01:00
quota sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ramfs mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
reiserfs reiserfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:32 +01:00
romfs romfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
smb We still have some races in filesystem methods when exposed to RCU 2024-02-25 09:29:05 -08:00
squashfs Squashfs: fix variable overflow triggered by sysbot 2023-12-10 17:21:26 -08:00
sysfs fs/sysfs/dir.c : Fix typo in comment 2023-12-07 11:35:23 +09:00
sysv sysv: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
tracefs eventfs: Keep all directory links at 1 2024-02-01 11:53:53 -05:00
ubifs case-insensitive updates for 6.9 2024-03-07 11:55:41 +01:00
udf misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
ufs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
unicode
vboxsf fs: vboxsf: fix a kernel-doc warning 2023-12-08 15:32:31 -07:00
verity Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
xfs vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
zonefs vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
aio.c fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion 2024-03-05 10:40:58 +01:00
anon_inodes.c Merge branch 'kvm-guestmemfd' into HEAD 2023-11-14 08:31:31 -05:00
attr.c fs: fix a typo in attr.c 2024-01-22 15:33:30 +01:00
backing-file.c fs: Use KMEM_CACHE instead of kmem_cache_create 2024-02-02 13:11:50 +01:00
bad_inode.c
binfmt_elf_fdpic.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_elf_test.c
binfmt_elf.c
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c iov_iter: get rid of 'copy_mc' flag 2024-03-06 10:52:12 +01:00
d_path.c
dax.c fs : Fix warning using plain integer as NULL 2023-11-18 15:00:01 +01:00
dcache.c vfs-6.9.misc 2024-03-11 09:38:17 -07:00
direct-io.c block, fs: Restore the per-bio/request data lifetime fields 2024-02-06 14:31:05 +01:00
drop_caches.c
eventfd.c eventfd: strictly check the count parameter of eventfd_write to avoid inputting illegal strings 2024-02-08 10:12:26 +01:00
eventpoll.c eventpoll: prefer kfree_rcu() in __ep_remove() 2024-02-22 10:03:31 +01:00
exec.c vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
fcntl.c vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
fhandle.c do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak 2024-01-22 15:33:38 +01:00
file_table.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
file.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
filesystems.c
fs_context.c
fs_parser.c __fs_parse: Correct a documentation comment 2024-02-02 13:11:50 +01:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c writeback: move wb_wakeup_delayed defination to fs-writeback.c 2024-01-22 15:33:38 +01:00
fsopen.c
init.c
inode.c vfs-6.9.iomap 2024-03-11 10:07:03 -07:00
internal.h libfs: improve path_from_stashed() 2024-03-01 22:31:40 +01:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2023-12-24 15:48:03 -05:00
Kconfig vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
Kconfig.binfmt
kernel_read_file.c
libfs.c vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
locks.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
Makefile vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
mbcache.c vfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
mnt_idmapping.c fs/mnt_idmapping.c: Return -EINVAL when no map is written 2024-02-08 10:12:37 +01:00
mount.h mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
mpage.c block, fs: Restore the per-bio/request data lifetime fields 2024-02-06 14:31:05 +01:00
namei.c vfs-6.9.misc 2024-03-11 09:38:17 -07:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-07 21:16:29 +01:00
nsfs.c libfs: improve path_from_stashed() 2024-03-01 22:31:40 +01:00
open.c vfs-6.8.rw 2024-01-08 11:11:51 -08:00
pidfs.c libfs: improve path_from_stashed() 2024-03-01 22:31:40 +01:00
pipe.c fs/pipe: Convert to lockdep_cmp_fn 2024-02-02 13:11:49 +01:00
pnode.c mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
pnode.h
posix_acl.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
proc_namespace.c namespace: extract show_path() helper 2023-11-18 14:56:16 +01:00
read_write.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
readdir.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
remap_range.c remap_range: merge do_clone_file_range() into vfs_clone_file_range() 2024-02-06 17:07:21 +01:00
select.c fs/select: rework stack allocation hack for clang 2024-02-20 09:23:52 +01:00
seq_file.c
signalfd.c
splice.c fs: use splice_copy_file_range() inline helper 2023-12-12 16:20:02 +01:00
stack.c
stat.c vfs-6.8.mount 2024-01-08 10:57:34 -08:00
statfs.c
super.c fs/super.c: don't drop ->s_user_ns until we free struct super_block itself 2024-02-25 02:10:31 -05:00
sync.c
sysctls.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
timerfd.c
userfaultfd.c Generic: 2024-01-17 13:03:37 -08:00
utimes.c
xattr.c