linux/fs
David Howells c3a5e3e872
vfs: Fix potential circular locking through setxattr() and removexattr()
When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below.  The problem appears to stem from the
following:

 (1) Cachefiles manipulates xattrs on the files in its cache when called
     from ->writepages().

 (2) The setxattr() and removexattr() system call handlers get the name
     (and value) from userspace after taking the sb_writers lock, putting
     accesses of the vma->vm_lock and mm->mmap_lock inside of that.

 (3) The afs filesystem uses a per-inode lock to prevent multiple
     revalidation RPCs and in writeback vs truncate to prevent parallel
     operations from deadlocking against the server on one side and local
     page locks on the other.

Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock.  This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.

Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs.  Note that this might also affect other filesystems
that use netfslib and/or cachefiles.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.10.0-build2+ #956 Not tainted
 ------------------------------------------------------
 fsstress/6050 is trying to acquire lock:
 ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0

 but task is already holding lock:
 ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #4 (&vma->vm_lock->lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_write+0x3b/0x50
        vma_start_write+0x6b/0xa0
        vma_link+0xcc/0x140
        insert_vm_struct+0xb7/0xf0
        alloc_bprm+0x2c1/0x390
        kernel_execve+0x65/0x1a0
        call_usermodehelper_exec_async+0x14d/0x190
        ret_from_fork+0x24/0x40
        ret_from_fork_asm+0x1a/0x30

 -> #3 (&mm->mmap_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        __might_fault+0x7c/0xb0
        strncpy_from_user+0x25/0x160
        removexattr+0x7f/0x100
        __do_sys_fremovexattr+0x7e/0xb0
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #2 (sb_writers#14){.+.+}-{0:0}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        percpu_down_read+0x3c/0x90
        vfs_iocb_iter_write+0xe9/0x1d0
        __cachefiles_write+0x367/0x430
        cachefiles_issue_write+0x299/0x2f0
        netfs_advance_write+0x117/0x140
        netfs_write_folio.isra.0+0x5ca/0x6e0
        netfs_writepages+0x230/0x2f0
        afs_writepages+0x4d/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        __filemap_fdatawrite_range+0xa8/0xf0
        file_write_and_wait_range+0x59/0x90
        afs_release+0x10f/0x270
        __fput+0x25f/0x3d0
        __do_sys_close+0x43/0x70
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #1 (&vnode->validate_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        afs_writepages+0x37/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        filemap_invalidate_inode+0x167/0x1e0
        netfs_unbuffered_write_iter+0x1bd/0x2d0
        vfs_write+0x22e/0x320
        ksys_write+0xbc/0x130
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
        check_noncircular+0x119/0x160
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        filemap_fault+0x26e/0x8b0
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30

 other info that might help us debug this:

 Chain exists of:
   mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(&vma->vm_lock->lock);
                                lock(&mm->mmap_lock);
                                lock(&vma->vm_lock->lock);
   rlock(mapping.invalidate_lock#3);

  *** DEADLOCK ***

 1 lock held by fsstress/6050:
  #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

 stack backtrace:
 CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x80
  check_noncircular+0x119/0x160
  ? queued_spin_lock_slowpath+0x4be/0x510
  ? __pfx_check_noncircular+0x10/0x10
  ? __pfx_queued_spin_lock_slowpath+0x10/0x10
  ? mark_lock+0x47/0x160
  ? init_chain_block+0x9c/0xc0
  ? add_chain_block+0x84/0xf0
  check_prev_add+0x195/0x430
  __lock_acquire+0xaf0/0xd80
  ? __pfx___lock_acquire+0x10/0x10
  ? __lock_release.isra.0+0x13b/0x230
  lock_acquire.part.0+0x103/0x280
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_lock_acquire.part.0+0x10/0x10
  ? rcu_is_watching+0x34/0x60
  ? lock_acquire+0xd7/0x120
  down_read+0x95/0x200
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_down_read+0x10/0x10
  ? __filemap_get_folio+0x25/0x1a0
  filemap_fault+0x26e/0x8b0
  ? __pfx_filemap_fault+0x10/0x10
  ? find_held_lock+0x7c/0x90
  ? __pfx___lock_release.isra.0+0x10/0x10
  ? __pte_offset_map+0x99/0x110
  __do_fault+0x57/0xd0
  do_pte_missing+0x23b/0x320
  __handle_mm_fault+0x2d4/0x320
  ? __pfx___handle_mm_fault+0x10/0x10
  handle_mm_fault+0x14f/0x260
  do_user_addr_fault+0x2a2/0x500
  exc_page_fault+0x71/0x90
  asm_exc_page_fault+0x22/0x30

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Gao Xiang <xiang@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-erofs@lists.ozlabs.org
cc: linux-fsdevel@vger.kernel.org
[brauner: fix minor issues]
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:14 +02:00
..
9p Two fixes headed to stable trees: 2024-05-29 09:25:15 -07:00
adfs fs/adfs: add MODULE_DESCRIPTION 2024-07-18 09:50:08 +02:00
affs affs: struct slink_front: Replace 1-element array with flexible array 2024-07-11 16:14:26 +02:00
afs - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
autofs vfs-6.11.mount.api 2024-07-15 11:31:32 -07:00
bcachefs - In the series "treewide: Refactor heap related implementation", 2024-07-21 17:56:22 -07:00
befs befs: Convert befs_symlink_read_folio() to use folio_end_read() 2024-05-31 12:31:39 +02:00
bfs mm, slab: remove last vestiges of SLAB_MEM_SPREAD 2024-03-12 20:32:19 -07:00
btrfs - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
cachefiles cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT 2024-07-24 10:53:13 +02:00
ceph ceph: drop usage of page_index 2024-07-03 19:29:55 -07:00
coda coda: Convert coda_symlink_filler() to use folio_end_read() 2024-05-31 12:31:39 +02:00
configfs fs/configfs: Add a callback to determine attribute visibility 2024-06-17 20:42:57 +02:00
cramfs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
crypto The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
debugfs vfs-6.11.mount.api 2024-07-15 11:31:32 -07:00
devpts
dlm dlm: add rcu_barrier before destroy kmem cache 2024-06-13 12:48:46 -05:00
ecryptfs hardening updates for 6.10-rc1 2024-05-13 14:14:05 -07:00
efivarfs efivarfs: Convert to new uid/gid option parsing helpers 2024-07-02 06:21:18 +02:00
efs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
erofs erofs: silence uninitialized variable warning in z_erofs_scan_folio() 2024-07-13 12:47:34 +08:00
exfat Description for this pull request: 2024-07-17 12:53:47 -07:00
exportfs fhandle: relax open_by_handle_at() permission checks 2024-05-28 15:57:23 +02:00
ext2 ext2: Verify bitmap and itable block numbers before using them 2024-06-26 12:54:11 +02:00
ext4 Many cleanups and bug fixes in ext4, especially for the fast commit 2024-07-18 17:03:42 -07:00
f2fs vfs-6.11.casefold 2024-07-15 11:18:49 -07:00
fat vfs-6.11.mount.api 2024-07-15 11:31:32 -07:00
freevxfs freevxfs: Convert freevxfs to the new mount API. 2024-03-26 09:04:53 +01:00
fuse virtio: features, fixes, cleanups 2024-07-19 11:57:55 -07:00
gfs2 gfs2: Clean up glock demote logic 2024-07-09 10:40:03 +02:00
hfs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
hfsplus vfs-6.11.misc 2024-07-15 10:52:51 -07:00
hostfs vfs-6.11.mount.api 2024-07-15 11:31:32 -07:00
hpfs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
hugetlbfs - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
iomap vfs-6.11.iomap 2024-07-15 13:28:14 -07:00
isofs \n 2024-07-17 13:11:42 -07:00
jbd2 jbd2: increase maximum transaction size 2024-07-08 23:59:37 -04:00
jffs2 jffs2: Remove calls to set/clear the folio error flag 2024-05-31 12:31:41 +02:00
jfs jfs: xattr: fix buffer overflow for invalid xattr 2024-06-04 18:09:03 +02:00
kernfs kernfs: mount: Remove unnecessary ‘NULL’ values from knparent 2024-05-04 19:02:39 +02:00
lockd lockd: Use *-y instead of *-objs in Makefile 2024-07-08 14:10:03 -04:00
minix vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
netfs netfs: Fix writeback that needs to go to both server and cache 2024-07-24 10:53:13 +02:00
nfs - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
nfs_common fs: nfs: add missing MODULE_DESCRIPTION() macros 2024-07-08 13:47:24 -04:00
nfsd NFSD 6.11 Release Notes 2024-07-17 12:00:49 -07:00
nilfs2 - In the series "treewide: Refactor heap related implementation", 2024-07-21 17:56:22 -07:00
nls fs: nls: add missing MODULE_DESCRIPTION() macros 2024-06-03 16:37:07 +02:00
notify fsnotify: clear PARENT_WATCHED flags lazily 2024-06-05 09:52:38 +02:00
ntfs3 ntfs3: Convert to new uid/gid option parsing helpers 2024-07-02 06:21:19 +02:00
ocfs2 - In the series "treewide: Refactor heap related implementation", 2024-07-21 17:56:22 -07:00
omfs
openpromfs openpromfs: add missing MODULE_DESCRIPTION() macro 2024-06-20 09:46:01 +02:00
orangefs orangefs: Remove calls to set/clear the error flag 2024-05-31 12:31:41 +02:00
overlayfs ovl: fix encoding fid for lower only root 2024-06-14 10:30:40 +02:00
proc - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
pstore memblock: updates for 6.11-rc1 2024-07-18 14:48:11 -07:00
qnx4 qnx4: add MODULE_DESCRIPTION() 2024-05-28 11:52:53 +02:00
qnx6 qnx6: add MODULE_DESCRIPTION() 2024-05-28 11:52:49 +02:00
quota vfs: replace WARN(down_read_trylock, ...) abuse with proper asserts 2024-06-03 15:45:47 +02:00
ramfs mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
reiserfs reiserfs: Remove call to folio_set_error() 2024-05-31 12:31:41 +02:00
romfs romfs: Convert romfs_read_folio() to use a folio 2024-05-31 12:31:42 +02:00
smb four ksmbd server fixes 2024-07-21 20:50:39 -07:00
squashfs Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
sysfs Merge 6.9-rc5 into driver-core-next 2024-04-23 13:27:43 +02:00
sysv fs: sysv: add MODULE_DESCRIPTION() 2024-05-28 11:52:45 +02:00
tracefs tracefs: Convert to new uid/gid option parsing helpers 2024-07-02 06:21:20 +02:00
ubifs This pull request contains updates for UBI and UBIFS: 2024-03-21 15:09:29 -07:00
udf udf: prevent integer overflow in udf_bitmap_free_blocks() 2024-06-26 12:54:11 +02:00
ufs - In the series "treewide: Refactor heap related implementation", 2024-07-21 17:56:22 -07:00
unicode kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
vboxsf vfs-6.11.mount.api 2024-07-15 11:31:32 -07:00
verity bpf: treewide: Align kfunc signatures to prog point-of-view 2024-06-12 11:01:31 -07:00
xfs New code for 6.11: 2024-07-17 12:57:48 -07:00
zonefs zonefs: enable support for large folios 2024-06-11 11:22:57 +09:00
aio.c - 875fa64577 ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
anon_inodes.c fs: Create anon_inode_getfile_fmode() 2024-04-26 10:33:05 +02:00
attr.c fs: Export in_group_or_capable() 2024-06-25 11:15:48 +02:00
backing-file.c ovl: implement tmpfile 2024-05-02 20:35:57 +02:00
bad_inode.c
binfmt_elf_fdpic.c fs: don't block i_writecount during exec 2024-06-03 15:52:10 +02:00
binfmt_elf_test.c
binfmt_elf.c execve updates for v6.11-rc1 2024-07-16 12:59:20 -07:00
binfmt_flat.c
binfmt_misc.c vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
binfmt_script.c fs: binfmt: add missing MODULE_DESCRIPTION() macros 2024-05-28 12:06:51 +02:00
buffer.c Many cleanups and bug fixes in ext4, especially for the fast commit 2024-07-18 17:03:42 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: simplify zap_process() 2024-07-04 23:43:09 -07:00
d_path.c
dax.c dax: use huge_zero_folio 2024-04-25 20:56:20 -07:00
dcache.c vfs-6.11.misc 2024-07-15 10:52:51 -07:00
direct-io.c fs/direct-io: remove redundant assignment to variable retval 2024-04-11 10:21:24 +02:00
drop_caches.c
eventfd.c eventfd: strictly check the count parameter of eventfd_write to avoid inputting illegal strings 2024-02-08 10:12:26 +01:00
eventpoll.c epoll: be better about file lifetimes 2024-05-05 14:00:48 -07:00
exec_test.c exec: Avoid pathological argc, envc, and bprm->p values 2024-07-13 21:31:58 -07:00
exec.c execve updates for v6.11-rc1 2024-07-16 12:59:20 -07:00
fcntl.c fcntl: add F_DUPFD_QUERY fcntl() 2024-05-10 08:26:31 +02:00
fhandle.c fhandle: relax open_by_handle_at() permission checks 2024-05-28 15:57:23 +02:00
file_table.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
file.c fs/file: fix the check in find_next_fd() 2024-05-30 09:11:47 +02:00
filesystems.c
fs_context.c
fs_parser.c fs_parse: add uid & gid option option parsing helpers 2024-07-02 06:20:49 +02:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs/writeback: remove unnecessary return in writeback_inodes_sb 2024-04-05 15:53:45 +02:00
fsopen.c vfs: retire user_path_at_empty and drop empty arg from getname_flags 2024-06-05 17:03:57 +02:00
init.c
inode.c vfs: handle __wait_on_freeing_inode() and evict() race 2024-07-24 10:52:58 +02:00
internal.h vfs-6.11.pidfs 2024-07-15 12:34:01 -07:00
ioctl.c fs/ioctl: Add a comment to keep the logic in sync with LSM policies 2024-05-13 06:58:35 +02:00
Kconfig - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
Kconfig.binfmt exec: Add KUnit test for bprm_stack_limits() 2024-06-19 13:13:55 -07:00
kernel_read_file.c
libfs.c libfs: Introduce case-insensitive string comparison helper 2024-06-07 17:00:44 +02:00
locks.c filelock: Fix fcntl/close race recovery compat path 2024-07-24 10:53:14 +02:00
Makefile vfs-6.9.pidfd 2024-03-11 10:21:06 -07:00
mbcache.c vfs: remove SLAB_MEM_SPREAD flag usage 2024-02-27 11:21:31 +01:00
mnt_idmapping.c fs/mnt_idmapping.c: Return -EINVAL when no map is written 2024-02-08 10:12:37 +01:00
mount.h vfs-6.11.mount 2024-07-15 11:54:04 -07:00
mpage.c buffer: Remove calls to set and clear the folio error flag 2024-05-31 12:31:43 +02:00
namei.c vfs: correct the comments of vfs_*() helpers 2024-07-24 10:53:12 +02:00
namespace.c fs: use all available ids 2024-07-24 10:53:13 +02:00
nsfs.c nsfs: use cleanup guard 2024-07-18 09:50:08 +02:00
open.c vfs-6.11.misc 2024-07-15 10:52:51 -07:00
pidfs.c pidfs: handle kernels without namespaces cleanly 2024-07-24 10:53:13 +02:00
pipe.c fs/pipe: Convert to lockdep_cmp_fn 2024-02-02 13:11:49 +01:00
pnode.c
pnode.h
posix_acl.c lsm/stable-6.9 PR 20240312 2024-03-12 20:03:34 -07:00
proc_namespace.c fs: rename show_mnt_opts -> show_vfsmnt_opts 2024-06-28 14:36:43 +02:00
read_write.c fs: Initial atomic write support 2024-06-20 15:19:17 -06:00
readdir.c readdir: Add missing quote in macro comment 2024-06-03 15:49:26 +02:00
remap_range.c vfs: export remap and write check helpers 2024-04-15 14:54:13 -07:00
select.c fs/select: rework stack allocation hack for clang 2024-02-20 09:23:52 +01:00
seq_file.c seq_file: Simplify __seq_puts() 2024-05-02 16:28:20 +02:00
signalfd.c signalfd: drop an obsolete comment 2024-05-24 13:34:07 +02:00
splice.c remove call_{read,write}_iter() functions 2024-04-15 16:03:25 -04:00
stack.c
stat.c for-6.11/block-20240710 2024-07-15 14:20:22 -07:00
statfs.c
super.c fs: don't misleadingly warn during thaw operations 2024-06-18 16:20:47 +02:00
sync.c
sysctls.c
timerfd.c timerfd: convert to ->read_iter() 2024-04-10 16:23:02 -06:00
userfaultfd.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
utimes.c
xattr.c vfs: Fix potential circular locking through setxattr() and removexattr() 2024-07-24 10:53:14 +02:00