linux/fs/btrfs
Filipe Manana a6d155d2e3 Btrfs: fix deadlock between fiemap and transaction commits
The fiemap handler locks a file range that can have unflushed delalloc,
and after locking the range, it tries to attach to a running transaction.
If the running transaction started its commit, that is, it is in state
TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
flushoncommit option or the transaction is creating a snapshot for the
subvolume that contains the file that fiemap is operating on, we end up
deadlocking. This happens because fiemap is blocked on the transaction,
waiting for it to complete, and the transaction is waiting for the flushed
dealloc to complete, which requires locking the file range that the fiemap
task already locked. The following stack traces serve as an example of
when this deadlock happens:

  (...)
  [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
  [404571.515956] Call Trace:
  [404571.516360]  ? __schedule+0x3ae/0x7b0
  [404571.516730]  schedule+0x3a/0xb0
  [404571.517104]  lock_extent_bits+0x1ec/0x2a0 [btrfs]
  [404571.517465]  ? remove_wait_queue+0x60/0x60
  [404571.517832]  btrfs_finish_ordered_io+0x292/0x800 [btrfs]
  [404571.518202]  normal_work_helper+0xea/0x530 [btrfs]
  [404571.518566]  process_one_work+0x21e/0x5c0
  [404571.518990]  worker_thread+0x4f/0x3b0
  [404571.519413]  ? process_one_work+0x5c0/0x5c0
  [404571.519829]  kthread+0x103/0x140
  [404571.520191]  ? kthread_create_worker_on_cpu+0x70/0x70
  [404571.520565]  ret_from_fork+0x3a/0x50
  [404571.520915] kworker/u8:6    D    0 31651      2 0x80004000
  [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  (...)
  [404571.537000] fsstress        D    0 13117  13115 0x00004000
  [404571.537263] Call Trace:
  [404571.537524]  ? __schedule+0x3ae/0x7b0
  [404571.537788]  schedule+0x3a/0xb0
  [404571.538066]  wait_current_trans+0xc8/0x100 [btrfs]
  [404571.538349]  ? remove_wait_queue+0x60/0x60
  [404571.538680]  start_transaction+0x33c/0x500 [btrfs]
  [404571.539076]  btrfs_check_shared+0xa3/0x1f0 [btrfs]
  [404571.539513]  ? extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.539866]  extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.540170]  do_vfs_ioctl+0x526/0x6f0
  [404571.540436]  ksys_ioctl+0x70/0x80
  [404571.540734]  __x64_sys_ioctl+0x16/0x20
  [404571.540997]  do_syscall_64+0x60/0x1d0
  [404571.541279]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)
  [404571.543729] btrfs           D    0 14210  14208 0x00004000
  [404571.544023] Call Trace:
  [404571.544275]  ? __schedule+0x3ae/0x7b0
  [404571.544526]  ? wait_for_completion+0x112/0x1a0
  [404571.544795]  schedule+0x3a/0xb0
  [404571.545064]  schedule_timeout+0x1ff/0x390
  [404571.545351]  ? lock_acquire+0xa6/0x190
  [404571.545638]  ? wait_for_completion+0x49/0x1a0
  [404571.545890]  ? wait_for_completion+0x112/0x1a0
  [404571.546228]  wait_for_completion+0x131/0x1a0
  [404571.546503]  ? wake_up_q+0x70/0x70
  [404571.546775]  btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
  [404571.547159]  btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
  [404571.547449]  ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
  [404571.547703]  ? remove_wait_queue+0x60/0x60
  [404571.547969]  btrfs_mksubvol+0x605/0x640 [btrfs]
  [404571.548226]  ? __sb_start_write+0xd4/0x1c0
  [404571.548512]  ? mnt_want_write_file+0x24/0x50
  [404571.548789]  btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
  [404571.549048]  btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
  [404571.549307]  btrfs_ioctl+0x133f/0x3150 [btrfs]
  [404571.549549]  ? mem_cgroup_charge_statistics+0x4c/0xd0
  [404571.549792]  ? mem_cgroup_commit_charge+0x84/0x4b0
  [404571.550064]  ? __handle_mm_fault+0xe3e/0x11f0
  [404571.550306]  ? do_raw_spin_unlock+0x49/0xc0
  [404571.550608]  ? _raw_spin_unlock+0x24/0x30
  [404571.550976]  ? __handle_mm_fault+0xedf/0x11f0
  [404571.551319]  ? do_vfs_ioctl+0xa2/0x6f0
  [404571.551659]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [404571.552087]  do_vfs_ioctl+0xa2/0x6f0
  [404571.552355]  ksys_ioctl+0x70/0x80
  [404571.552621]  __x64_sys_ioctl+0x16/0x20
  [404571.552864]  do_syscall_64+0x60/0x1d0
  [404571.553104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)

If we were joining the transaction instead of attaching to it, we would
not risk a deadlock because a join only blocks if the transaction is in a
state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
flush performed by a transaction is done before it reaches that state,
when it is in the state TRANS_STATE_COMMIT_START. However a transaction
join is intended for use cases where we do modify the filesystem, and
fiemap only needs to peek at delayed references from the current
transaction in order to determine if extents are shared, and, besides
that, when there is no current transaction or when it blocks to wait for
a current committing transaction to complete, it creates a new transaction
without reserving any space. Such unnecessary transactions, besides doing
unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
rotation of the precious backup roots.

So fix this by adding a new transaction join variant, named join_nostart,
which behaves like the regular join, but it does not create a transaction
when none currently exists or after waiting for a committing transaction
to complete.

Fixes: 03628cdbc6 ("Btrfs: do not start a transaction during fiemap")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30 18:25:12 +02:00
..
tests btrfs: Evaluate io_tree in find_lock_delalloc_range() 2019-07-04 17:26:17 +02:00
acl.c btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
async-thread.c btrfs: simplify workqueue name when allocating 2019-02-25 14:13:24 +01:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c Btrfs: fix deadlock between fiemap and transaction commits 2019-07-30 18:25:12 +02:00
backref.h btrfs: fiemap: preallocate ulists for btrfs_check_shared 2019-07-01 13:34:53 +02:00
block-rsv.c btrfs: migrate the global_block_rsv helpers to block-rsv.c 2019-07-02 12:30:55 +02:00
block-rsv.h btrfs: migrate the global_block_rsv helpers to block-rsv.c 2019-07-02 12:30:55 +02:00
btrfs_inode.h btrfs: remove assumption about csum type form btrfs_print_data_csum_error() 2019-07-01 13:35:02 +02:00
check-integrity.c btrfs: directly call into crypto framework for checksumming 2019-07-01 13:35:02 +02:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c btrfs: lift bio_set_dev from bio allocation helpers 2019-07-02 12:30:51 +02:00
compression.h btrfs: correctly validate compression type 2019-07-02 12:30:48 +02:00
ctree.c btrfs: ctree: Dump the leaf before BUG_ON in btrfs_set_item_key_safe 2019-04-29 19:02:52 +02:00
ctree.h btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
dedupe.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delalloc-space.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
delalloc-space.h btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
delayed-inode.c btrfs: get fs_info from eb in btrfs_leaf_free_space 2019-04-29 19:02:30 +02:00
delayed-inode.h Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root 2018-10-15 17:23:33 +02:00
delayed-ref.c btrfs: migrate the delayed refs rsv code 2019-07-04 17:26:17 +02:00
delayed-ref.h btrfs: migrate the delayed refs rsv code 2019-07-04 17:26:17 +02:00
dev-replace.c btrfs: remove mapping tree structures indirection 2019-07-01 13:34:56 +02:00
dev-replace.h btrfs: get fs_info from trans in btrfs_run_dev_replace 2019-04-29 19:02:43 +02:00
dir-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
disk-io.c btrfs: free checksum hash on in close_ctree 2019-07-17 17:03:33 +02:00
disk-io.h btrfs: directly call into crypto framework for checksumming 2019-07-01 13:35:02 +02:00
export.c btrfs: Remove 'objectid' member from struct btrfs_root 2018-10-15 17:23:25 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent_io.c btrfs: fix memory leak of path on error return path 2019-07-05 18:47:57 +02:00
extent_io.h btrfs: Evaluate io_tree in find_lock_delalloc_range() 2019-07-04 17:26:17 +02:00
extent_map.c btrfs: Optimize unallocated chunks discard 2019-04-29 19:02:38 +02:00
extent_map.h btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
extent-tree.c btrfs: move the subvolume reservation stuff out of extent-tree.c 2019-07-04 17:26:18 +02:00
file-item.c btrfs: directly call into crypto framework for checksumming 2019-07-01 13:35:02 +02:00
file.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
free-space-cache.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
free-space-cache.h btrfs: get fs_info from block group in btrfs_find_space_cluster 2019-04-29 19:02:46 +02:00
free-space-tree.c btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
free-space-tree.h btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
inode-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
inode-map.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c btrfs: inode: Don't compress if NODATASUM or NODATACOW set 2019-07-17 17:03:28 +02:00
ioctl.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
Kconfig btrfs: Fix build error while LIBCRC32C is module 2019-07-17 17:03:30 +02:00
locking.c btrfs: Fix deadlock caused by missing memory barrier 2019-07-25 17:34:08 +02:00
locking.h btrfs: merge btrfs_set_lock_blocking_rw with it's caller 2019-02-25 14:13:28 +01:00
lzo.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
Makefile btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
math.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ordered-data.c btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range 2019-07-26 12:21:22 +02:00
ordered-data.h btrfs: don't assume ordered sums to be 4 bytes 2019-07-01 13:35:00 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: switch extent_buffer write_locks from atomic to int 2019-07-02 12:30:47 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: shut up bogus -Wmaybe-uninitialized warning 2019-07-02 12:30:49 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: qgroup: Don't hold qgroup_ioctl_lock in btrfs_qgroup_inherit() 2019-07-02 12:30:48 +02:00
qgroup.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
raid56.c block: remove the i argument to bio_for_each_segment_all 2019-04-30 09:26:13 -06:00
raid56.h btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes 2019-07-01 13:34:58 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: start readahead also in seed devices 2019-06-14 17:33:46 +02:00
ref-verify.c Wimplicit-fallthrough patches for 5.2-rc1 2019-05-07 12:48:10 -07:00
ref-verify.h btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
relocation.c btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
root-tree.c btrfs: move the subvolume reservation stuff out of extent-tree.c 2019-07-04 17:26:18 +02:00
scrub.c btrfs: add mask for all RAID1 types 2019-07-02 12:30:48 +02:00
send.c Btrfs: fix incremental send failure after deduplication 2019-07-30 18:25:11 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
space-info.c btrfs: Simplify update of space_info in __reserve_metadata_bytes() 2019-07-02 12:30:53 +02:00
space-info.h btrfs: unexport can_overcommit 2019-07-02 12:30:53 +02:00
struct-funcs.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
super.c btrfs: move space_info to space-info.h 2019-07-02 12:30:51 +02:00
sysfs.c btrfs: move space_info to space-info.h 2019-07-02 12:30:51 +02:00
sysfs.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
transaction.c Btrfs: fix deadlock between fiemap and transaction commits 2019-07-30 18:25:12 +02:00
transaction.h Btrfs: fix deadlock between fiemap and transaction commits 2019-07-30 18:25:12 +02:00
tree-checker.c btrfs: tree-checker: Check if the file extent end overflows 2019-07-01 13:34:55 +02:00
tree-checker.h btrfs: get fs_info from eb in btrfs_check_chunk_valid 2019-04-29 19:02:39 +02:00
tree-defrag.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.c Btrfs: fix fsync not persisting dentry deletions due to inode evictions 2019-07-02 12:30:50 +02:00
tree-log.h btrfs: get fs_info from trans in btrfs_set_log_full_commit 2019-04-29 19:02:41 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
volumes.c btrfs: don't leak extent_map in btrfs_get_io_geometry() 2019-07-17 17:03:36 +02:00
volumes.h btrfs: Use btrfs_get_io_geometry appropriately 2019-07-02 12:30:50 +02:00
xattr.c Btrfs: fix failure to persist compression property xattr deletion on fsync 2019-06-17 16:37:17 +02:00
xattr.h btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
zlib.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
zstd.c btrfs: correct zstd workspace manager lock to use spin_lock_bh() 2019-05-28 18:54:09 +02:00