linux/fs
Sebastian Andrzej Siewior 45f78b0a27 fs/dcache: Move the wakeup from __d_lookup_done() to the caller.
__d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
not allowed to do that with preemption disabled, since the wakeup
acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT.

Calling it under dentry->d_lock is not a problem, since that is also a
"sleeping" spinlock on the same configs.  Unfortunately, two of its
callers (__d_add() and __d_move()) are holding more than just ->d_lock
and that needs to be dealt with.

The key observation is that wakeup can be moved to any point before
dropping ->d_lock.

As a first step to solve this, move the wake up outside of the
hlist_bl_lock() held section.

This is safe because:

Waiters get inserted into ->d_wait only after they'd taken ->d_lock
and observed DCACHE_PAR_LOOKUP in flags.  As long as they are
woken up (and evicted from the queue) between the moment __d_lookup_done()
has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe,
since the waitqueue ->d_wait points to won't get destroyed without
having __d_lookup_done(dentry) called (under ->d_lock).

->d_wait is set only by d_alloc_parallel() and only in case when
it returns a freshly allocated in-lookup dentry.  Whenever that happens,
we are guaranteed that __d_lookup_done() will be called for resulting
dentry (under ->d_lock) before the wq in question gets destroyed.

With two exceptions wq lives in call frame of the caller of
d_alloc_parallel() and we have an explicit d_lookup_done() on the
resulting in-lookup dentry before we leave that frame.

One of those exceptions is nfs_call_unlink(), where wq is embedded into
(dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
nfs_async_unlink_release() after an explicit d_lookup_done() on the
dentry wq went into.

Remaining exception is d_add_ci(). There wq is what we'd found in
->d_wait of d_add_ci() argument. Callers of d_add_ci() are two
instances of ->d_lookup() and they must have been given an in-lookup
dentry.  Which means that they'd been called by __lookup_slow() or
lookup_open(), with wq in the call frame of one of those.

Result of d_alloc_parallel() in d_add_ci() is fed to
d_splice_alias(), which either returns non-NULL (and d_add_ci() does
d_lookup_done()) or feeds dentry to __d_add() that will do
__d_lookup_done() under ->d_lock.  That concludes the analysis.

Let __d_lookup_unhash():

  1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP
  2) Unhash the dentry
  3) Retrieve and clear dentry::d_wait
  4) Unlock the hash and return the retrieved waitqueue head pointer
  5) Let the caller handle the wake up.
  6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce
     build failures for OOT code that used __d_lookup_done() and is not
     aware of the new return value.

This does not yet solve the PREEMPT_RT problem completely because
preemption is still disabled due to i_dir_seq being held for write. This
will be addressed in subsequent steps.

An alternative solution would be to switch the waitqueue to a simple
waitqueue, but aside of Linus not being a fan of them, moving the wake up
closer to the place where dentry::lock is unlocked reduces lock contention
time for the woken up waiter.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-30 00:36:10 -04:00
..
9p netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
adfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
affs affs: Convert affs to read_folio 2022-05-09 16:21:44 -04:00
afs netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
autofs
befs befs: Convert befs to read_folio 2022-05-09 16:21:45 -04:00
bfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
btrfs Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
cachefiles cachefiles: add tracepoints for on-demand read mode 2022-05-18 00:11:18 +08:00
ceph netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
cifs 3 smb3 reconnect fixes 2022-06-12 11:05:44 -07:00
coda coda: Convert coda to read_folio 2022-05-09 16:21:45 -04:00
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-02-22 18:30:28 +01:00
cramfs cramfs: Convert cramfs to read_folio 2022-05-09 16:21:45 -04:00
crypto fscrypt: add new helper functions for test_dummy_encryption 2022-05-09 16:18:54 -07:00
debugfs debugfs: Document that debugfs_create functions need not be error checked 2022-02-25 11:56:13 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
dlm dlm: use kref_put_lock in __put_lkb 2022-05-02 11:23:49 -05:00
ecryptfs ecryptfs: Convert ecryptfs to read_folio 2022-05-09 16:21:45 -04:00
efivarfs
efs efs: Convert efs symlinks to read_folio 2022-05-09 16:21:45 -04:00
erofs Changes since last update: 2022-06-01 11:54:29 -07:00
exfat Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
exportfs exportfs: support idmapped mounts 2022-04-28 16:31:10 +02:00
ext2 fs: Fix syntax errors in comments 2022-06-06 09:53:03 +02:00
ext4 Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
f2fs f2fs-for-5.19 2022-05-31 16:52:59 -07:00
fat Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
freevxfs SPDX changes for 5.19-rc1 2022-06-03 10:34:34 -07:00
fscache fscache: remove FSCACHE_OLD_API Kconfig option 2022-04-08 23:54:37 +01:00
fuse libnvdimm for 5.19 2022-05-27 15:49:30 -07:00
gfs2 Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
hfs fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
hfsplus fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
hostfs hostfs: Convert hostfs to read_folio 2022-05-09 16:21:45 -04:00
hpfs hpfs: Convert symlinks to read_folio 2022-05-09 16:21:45 -04:00
hugetlbfs powerpc updates for 5.19 2022-05-28 11:27:17 -07:00
iomap Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
isofs isofs: Convert symlinks and zisofs to read_folio 2022-05-09 16:21:45 -04:00
jbd2 Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
jffs2 This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
jfs JFS: One bug fix and some code cleanup 2022-05-27 15:59:21 -07:00
kernfs kernfs: Separate kernfs_pr_cont_buf and rename_lock. 2022-05-19 19:37:06 +02:00
ksmbd ksmbd: smbd: relax the count of sges required 2022-05-27 21:31:20 -05:00
lockd NFSD: Move svc_serv_ops::svo_function into struct svc_serv 2022-02-28 10:26:40 -05:00
minix fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
netfs netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
nfs Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
nfs_common
nfsd Notable changes: 2022-06-10 17:28:43 -07:00
nilfs2 Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
nls
notify \n 2022-05-25 19:29:54 -07:00
ntfs Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
ntfs3 Ntfs3 for 5.19 2022-06-03 16:57:16 -07:00
ocfs2 Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
omfs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
openpromfs fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
orangefs orangefs: Convert to free_folio 2022-05-09 23:12:53 -04:00
overlayfs overlayfs update for 5.19 2022-05-30 11:19:16 -07:00
proc Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
pstore pstore: Don't use semaphores in always-atomic-context code 2022-03-15 11:08:23 -07:00
qnx4 fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
qnx6 fs: Convert mpage_readpage to mpage_read_folio 2022-05-09 16:21:44 -04:00
quota quota: Prevent memory allocation recursion while holding dq_lock 2022-06-06 10:08:10 +02:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
romfs romfs: Convert romfs to read_folio 2022-05-09 16:21:46 -04:00
smbfs_common Add various fsctl structs 2022-05-23 20:24:12 -05:00
squashfs Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
sysfs kobject: kobj_type: remove default_attrs 2022-04-05 15:39:19 +02:00
sysv Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
tracefs tracefs: Set the group ownership in apply_options() not parse_options() 2022-02-25 21:05:04 -05:00
ubifs This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
udf Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
ufs fs: Convert block_read_full_page() to block_read_full_folio() 2022-05-09 16:21:44 -04:00
unicode kbuild: unify cmd_copy and cmd_shipped 2022-02-14 10:37:32 +09:00
vboxsf vboxsf: Convert vboxsf to read_folio 2022-05-09 16:21:46 -04:00
verity Page cache changes for 5.19 2022-05-24 19:55:07 -07:00
xfs xfs: Changes for 5.19-rc1 [2nd set] 2022-06-01 17:23:53 -07:00
zonefs zonefs: fix zonefs_iomap_begin() for reads 2022-06-08 19:13:55 +09:00
aio.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2022-04-01 19:57:03 -07:00
anon_inodes.c
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c coredump: Snapshot the vmas in do_coredump 2022-03-08 12:55:29 -06:00
binfmt_elf_test.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
binfmt_elf.c revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" 2022-04-15 14:49:56 -07:00
binfmt_flat.c binfmt_flat: Remove shared library support 2022-04-22 10:57:18 -07:00
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c fs: Convert drop_buffers() to use a folio 2022-05-09 23:12:34 -04:00
char_dev.c
compat_binfmt_elf.c binfmt_elf: Introduce KUnit test 2022-03-03 20:38:56 -08:00
coredump.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c libnvdimm for 5.19 2022-05-27 15:49:30 -07:00
dcache.c fs/dcache: Move the wakeup from __d_lookup_done() to the caller. 2022-07-30 00:36:10 -04:00
direct-io.c direct-io: remove random prefetches 2022-04-17 19:50:02 -06:00
drop_caches.c
eventfd.c
eventpoll.c eventpoll: simplify sysctl declaration with register_sysctl() 2022-01-22 08:33:35 +02:00
exec.c This set of changes updates init and user mode helper tasks to be 2022-06-03 16:03:05 -07:00
fcntl.c VFS: add FMODE_CAN_ODIRECT file flag 2022-05-09 18:20:49 -07:00
fhandle.c
file_table.c Descriptor handling cleanups 2022-06-04 18:52:00 -07:00
file.c fix the breakage in close_fd_get_file() calling conventions change 2022-06-05 15:03:03 -04:00
filesystems.c
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-18 09:23:19 +02:00
fs_parser.c fs_parse: allow parameter value to be empty 2021-12-09 14:09:36 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
fsopen.c uninline may_mount() and don't opencode it in fspick(2)/fsopen(2) 2022-05-19 23:25:10 -04:00
init.c
inode.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
internal.h Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
io_uring.c fix for breakage in #work.fd this window 2022-06-05 17:14:03 -07:00
io-wq.c io-wq: use __set_notify_signal() to wake workers 2022-04-30 08:39:54 -06:00
io-wq.h io_uring: add support for IORING_ASYNC_CANCEL_ALL 2022-04-24 18:18:18 -06:00
ioctl.c Fixes for 5.18-rc1: 2022-04-01 19:35:56 -07:00
Kconfig mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
Kconfig.binfmt m68knommu: changes for linux 5.19 2022-05-30 10:56:18 -07:00
kernel_read_file.c
libfs.c fs: Convert simple_readpage to simple_read_folio 2022-05-09 16:21:44 -04:00
locks.c fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict 2022-05-19 12:25:39 -04:00
Makefile Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
mbcache.c
mount.h
mpage.c fs: Change try_to_free_buffers() to take a folio 2022-05-09 23:12:34 -04:00
namei.c Several cleanups in fs/namei.c. 2022-06-04 19:07:15 -07:00
namespace.c Cleanups (and one fix) around struct mount handling. 2022-06-04 19:00:05 -07:00
no-block.c
nsfs.c
open.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
pipe.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
pnode.c
pnode.h
posix_acl.c fs: fix acl translation 2022-04-19 10:19:02 -07:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
read_write.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
readdir.c
remap_range.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-11 09:03:05 -08:00
seq_file.c rxrpc: Fix locking issue 2022-05-22 21:03:01 +01:00
signalfd.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
splice.c mm: Convert remove_mapping() to take a folio 2022-03-21 12:59:01 -04:00
stack.c
stat.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
statfs.c
super.c block: add a bdev_stable_writes helper 2022-04-17 19:49:59 -06:00
sync.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
sysctls.c fs: move namespace sysctls and declare fs base directory 2022-01-22 08:33:36 +02:00
timerfd.c
userfaultfd.c mm/uffd: enable write protection for shmem & hugetlbfs 2022-05-13 07:20:11 -07:00
utimes.c
xattr.c fs: split off do_getxattr from getxattr 2022-04-24 18:18:37 -06:00