linux/fs/btrfs
Filipe Manana 061dde8245 btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
There is a race between a task aborting a transaction during a commit,
a task doing an fsync and the transaction kthread, which leads to an
use-after-free of the log root tree. When this happens, it results in a
stack trace like the following:

  BTRFS info (device dm-0): forced readonly
  BTRFS warning (device dm-0): Skipping commit of aborted transaction.
  BTRFS: error (device dm-0) in cleanup_transaction:1958: errno=-5 IO failure
  BTRFS warning (device dm-0): lost page write due to IO error on /dev/mapper/error-test (-5)
  BTRFS warning (device dm-0): Skipping commit of aborted transaction.
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0xa4e8 len 4096 err no 10
  BTRFS error (device dm-0): error writing primary super block to device 1
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e000 len 4096 err no 10
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e008 len 4096 err no 10
  BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e010 len 4096 err no 10
  BTRFS: error (device dm-0) in write_all_supers:4110: errno=-5 IO failure (1 errors while writing supers)
  BTRFS: error (device dm-0) in btrfs_sync_log:3308: errno=-5 IO failure
  general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b68: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 2 PID: 2458471 Comm: fsstress Not tainted 5.12.0-rc5-btrfs-next-84 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__mutex_lock+0x139/0xa40
  Code: c0 74 19 (...)
  RSP: 0018:ffff9f18830d7b00 EFLAGS: 00010202
  RAX: 6b6b6b6b6b6b6b68 RBX: 0000000000000001 RCX: 0000000000000002
  RDX: ffffffffb9c54d13 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff9f18830d7bc0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff9f18830d7be0 R11: 0000000000000001 R12: ffff8c6cd199c040
  R13: ffff8c6c95821358 R14: 00000000fffffffb R15: ffff8c6cbcf01358
  FS:  00007fa9140c2b80(0000) GS:ffff8c6fac600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fa913d52000 CR3: 000000013d2b4003 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? __btrfs_handle_fs_error+0xde/0x146 [btrfs]
   ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
   ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
   btrfs_sync_log+0x7c1/0xf20 [btrfs]
   btrfs_sync_file+0x40c/0x580 [btrfs]
   do_fsync+0x38/0x70
   __x64_sys_fsync+0x10/0x20
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fa9142a55c3
  Code: 8b 15 09 (...)
  RSP: 002b:00007fff26278d48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
  RAX: ffffffffffffffda RBX: 0000563c83cb4560 RCX: 00007fa9142a55c3
  RDX: 00007fff26278cb0 RSI: 00007fff26278cb0 RDI: 0000000000000005
  RBP: 0000000000000005 R08: 0000000000000001 R09: 00007fff26278d5c
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000340
  R13: 00007fff26278de0 R14: 00007fff26278d96 R15: 0000563c83ca57c0
  Modules linked in: btrfs dm_zero dm_snapshot dm_thin_pool (...)
  ---[ end trace ee2f1b19327d791d ]---

The steps that lead to this crash are the following:

1) We are at transaction N;

2) We have two tasks with a transaction handle attached to transaction N.
   Task A and Task B. Task B is doing an fsync;

3) Task B is at btrfs_sync_log(), and has saved fs_info->log_root_tree
   into a local variable named 'log_root_tree' at the top of
   btrfs_sync_log(). Task B is about to call write_all_supers(), but
   before that...

4) Task A calls btrfs_commit_transaction(), and after it sets the
   transaction state to TRANS_STATE_COMMIT_START, an error happens before
   it waits for the transaction's 'num_writers' counter to reach a value
   of 1 (no one else attached to the transaction), so it jumps to the
   label "cleanup_transaction";

5) Task A then calls cleanup_transaction(), where it aborts the
   transaction, setting BTRFS_FS_STATE_TRANS_ABORTED on fs_info->fs_state,
   setting the ->aborted field of the transaction and the handle to an
   errno value and also setting BTRFS_FS_STATE_ERROR on fs_info->fs_state.

   After that, at cleanup_transaction(), it deletes the transaction from
   the list of transactions (fs_info->trans_list), sets the transaction
   to the state TRANS_STATE_COMMIT_DOING and then waits for the number
   of writers to go down to 1, as it's currently 2 (1 for task A and 1
   for task B);

6) The transaction kthread is running and sees that BTRFS_FS_STATE_ERROR
   is set in fs_info->fs_state, so it calls btrfs_cleanup_transaction().

   There it sees the list fs_info->trans_list is empty, and then proceeds
   into calling btrfs_drop_all_logs(), which frees the log root tree with
   a call to btrfs_free_log_root_tree();

7) Task B calls write_all_supers() and, shortly after, under the label
   'out_wake_log_root', it deferences the pointer stored in
   'log_root_tree', which was already freed in the previous step by the
   transaction kthread. This results in a use-after-free leading to a
   crash.

Fix this by deleting the transaction from the list of transactions at
cleanup_transaction() only after setting the transaction state to
TRANS_STATE_COMMIT_DOING and waiting for all existing tasks that are
attached to the transaction to release their transaction handles.
This makes the transaction kthread wait for all the tasks attached to
the transaction to be done with the transaction before dropping the
log roots and doing other cleanups.

Fixes: ef67963dac ("btrfs: drop logs when we've aborted a transaction")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19 17:25:22 +02:00
..
tests idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: move the tree mod log code into its own file 2021-04-19 17:25:16 +02:00
backref.h btrfs: add asserts for deleting backref cache nodes 2021-02-08 22:58:56 +01:00
block-group.c btrfs: fix exhaustion of the system chunk array due to concurrent allocations 2021-04-19 17:25:17 +02:00
block-group.h btrfs: fix race between writes to swap files and scrub 2021-02-22 18:07:15 +01:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: remove stale comment and logic from btrfs_inode_in_log() 2021-04-19 17:25:16 +02:00
check-integrity.c btrfs: integrity-checker: convert block context kmap's to kmap_local_page 2021-04-19 17:25:16 +02:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: convert kmap to kmap_local_page, simple cases 2021-04-19 17:25:16 +02:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: improve btree readahead for full send operations 2021-04-19 17:25:17 +02:00
ctree.h btrfs: improve btree readahead for full send operations 2021-04-19 17:25:17 +02:00
delalloc-space.c btrfs: fix parameter description of btrfs_inode_rsv_release/btrfs_delalloc_release_space 2021-02-08 22:58:54 +01:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: use btrfs_inode_lock/btrfs_inode_unlock inode lock helpers 2021-04-19 17:25:15 +02:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: update debug message when checking seq number of a delayed ref 2021-04-19 17:25:17 +02:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: do not initialize dev replace for bad dev root 2021-03-17 19:42:14 +01:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: document now parameter of peek_discard_list 2021-02-08 22:58:53 +01:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: subpage: support metadata checksum calculation at write time 2021-04-19 17:25:18 +02:00
disk-io.h btrfs: split alloc_log_tree() 2021-02-09 02:46:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: introduce submit_eb_subpage() to submit a subpage metadata page 2021-04-19 17:25:22 +02:00
extent_io.h btrfs: move forward declarations to the beginning of extent_io.h 2021-04-19 17:25:19 +02:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block() 2021-04-19 17:25:16 +02:00
file-item.c btrfs: remove duplicated in_range() macro 2021-04-19 17:25:16 +02:00
file.c btrfs: fix a potential hole punching failure 2021-04-19 17:25:17 +02:00
free-space-cache.c btrfs: zoned: do not account freed region of read-only block group as zone_unusable 2021-03-04 16:16:58 +01:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume 2021-04-19 17:25:19 +02:00
ioctl.c btrfs: handle btrfs_record_root_in_trans failure in create_subvol 2021-04-19 17:25:20 +02:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
locking.c btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: convert kmap to kmap_local_page, simple cases 2021-04-19 17:25:16 +02:00
Makefile btrfs: move the tree mod log code into its own file 2021-04-19 17:25:16 +02:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: replace offset_in_entry with in_range 2021-04-19 17:25:14 +02:00
ordered-data.h btrfs: fix comment for btrfs ordered extent flag bits 2021-04-19 17:25:14 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: move the tree mod log code into its own file 2021-04-19 17:25:16 +02:00
qgroup.h btrfs: export and rename qgroup_reserve_meta 2021-03-02 16:58:30 +01:00
raid56.c btrfs: raid56: convert kmaps to kmap_local_page 2021-04-19 17:25:16 +02:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: subpage: make readahead work properly 2021-03-16 11:06:21 +01:00
ref-verify.c btrfs: ref-verify: use 'inline void' keyword ordering 2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c btrfs: make reflinks respect O_SYNC O_DSYNC and S_SYNC flags 2021-04-19 17:25:17 +02:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: check return value of btrfs_commit_transaction in relocation 2021-04-19 17:25:22 +02:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: scrub: drop a few function declarations 2021-04-19 17:25:14 +02:00
send.c btrfs: improve btree readahead for full send operations 2021-04-19 17:25:17 +02:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: use percpu_read_positive instead of sum_positive for need_preempt 2021-04-19 17:25:17 +02:00
space-info.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
struct-funcs.c btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors 2020-12-09 19:16:10 +01:00
subpage.c btrfs: subpage: add overview comments 2021-04-19 17:25:18 +02:00
subpage.h btrfs: subpage: introduce helpers for writeback status 2021-04-19 17:25:18 +02:00
super.c btrfs: fix spurious free_space_tree remount warning 2021-03-02 16:55:55 +01:00
sysfs.c btrfs: add sysfs interface for supported sectorsize 2021-04-19 17:25:18 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: fix race between transaction aborts and fsyncs leading to use-after-free 2021-04-19 17:25:22 +02:00
transaction.h btrfs: fix exhaustion of the system chunk array due to concurrent allocations 2021-04-19 17:25:17 +02:00
tree-checker.c btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly 2021-04-19 17:25:21 +02:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees 2021-04-19 17:25:20 +02:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
tree-mod-log.c btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: zoned: bail out in btrfs_alloc_chunk for bad input 2021-04-19 17:25:17 +02:00
volumes.h btrfs: zoned: relocate block group to repair IO failure in zoned filesystems 2021-02-09 02:46:07 +01:00
xattr.c for-5.12-rc1-tag 2021-03-05 12:21:14 -08:00
xattr.h
zlib.c btrfs: use memcpy_[to|from]_page() and kmap_local_page() 2021-02-26 12:45:15 +01:00
zoned.c for-5.12-rc6-tag 2021-04-11 11:53:36 -07:00
zoned.h btrfs: zoned: extend zoned allocator to use dedicated tree-log block group 2021-02-09 02:46:08 +01:00
zstd.c btrfs: use memcpy_[to|from]_page() and kmap_local_page() 2021-02-26 12:45:15 +01:00