linux/fs/btrfs
Filipe Manana bbc37d6e47 btrfs: fix space cache memory leak after transaction abort
If a transaction aborts it can cause a memory leak of the pages array of
a block group's io_ctl structure. The following steps explain how that can
happen:

1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
   and it's about to start writing out dirty extent buffers;

2) Transaction N + 1 already started and another task, task A, just called
   btrfs_commit_transaction() on it;

3) Block group B was dirtied (extents allocated from it) by transaction
   N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
   very beginning of the transaction commit, it starts writeback for the
   block group's space cache by calling btrfs_write_out_cache(), which
   allocates the pages array for the block group's io_ctl with a call to
   io_ctl_init(). Block group A is added to the io_list of transaction
   N + 1 by btrfs_start_dirty_block_groups();

4) While transaction N's commit is writing out the extent buffers, it gets
   an IO error and aborts transaction N, also setting the file system to
   RO mode;

5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
   btrfs_commit_transaction() and has set transaction N + 1 state to
   TRANS_STATE_COMMIT_START. Immediately after that it checks that the
   filesystem was turned to RO mode, due to transaction N's abort, and
   jumps to the "cleanup_transaction" label. After that we end up at
   btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
   That helper finds block group B in the transaction's io_list but it
   never releases the pages array of the block group's io_ctl, resulting in
   a memory leak.

In fact at the point when we are at btrfs_cleanup_dirty_bgs(), the pages
array points to pages that were already released by us at
__btrfs_write_out_cache() through the call to io_ctl_drop_pages(). We end
up freeing the pages array only after waiting for the ordered extent to
complete through btrfs_wait_cache_io(), which calls io_ctl_free() to do
that. But in the transaction abort case we don't wait for the space cache's
ordered extent to complete through a call to btrfs_wait_cache_io(), so
that's why we end up with a memory leak - we wait for the ordered extent
to complete indirectly by shutting down the work queues and waiting for
any jobs in them to complete before returning from close_ctree().

We can solve the leak simply by freeing the pages array right after
releasing the pages (with the call to io_ctl_drop_pages()) at
__btrfs_write_out_cache(), since we will never use it anymore after that
and the pages array points to already released pages at that point, which
is currently not a problem since no one will use it after that, but not a
good practice anyway since it can easily lead to use-after-free issues.

So fix this by freeing the pages array right after releasing the pages at
__btrfs_write_out_cache().

This issue can often be reproduced with test case generic/475 from fstests
and kmemleak can detect it and reports it with the following trace:

unreferenced object 0xffff9bbf009fa600 (size 512):
  comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
  hex dump (first 32 bytes):
    00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
    80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
  backtrace:
    [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
    [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
    [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
    [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
    [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
    [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
    [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
    [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
    [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
    [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
    [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
    [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
    [<0000000043ace2c9>] do_syscall_64+0x33/0x80
    [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:39:46 +02:00
..
tests btrfs: make btrfs_set_extent_delalloc take btrfs_inode 2020-07-27 12:55:35 +02:00
acl.c btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: check correct variable after allocation in btrfs_backref_iter_alloc 2020-08-10 19:50:54 +02:00
backref.h btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE 2020-05-25 11:25:35 +02:00
block-group.c btrfs: if we're restriping, use the target restripe profile 2020-07-27 12:55:47 +02:00
block-group.h btrfs: convert block group refcount to refcount_t 2020-07-27 12:55:42 +02:00
block-rsv.c btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE 2020-05-25 11:25:35 +02:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: reduce contention on log trees when logging checksums 2020-07-27 12:55:45 +02:00
check-integrity.c btrfs: check-integrity: remove unnecessary failure messages during memory allocation 2020-07-27 12:55:21 +02:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: remove fail label in check_compressed_csum 2020-07-27 12:55:42 +02:00
compression.h btrfs: make btrfs_submit_compressed_write take btrfs_inode 2020-07-27 12:55:31 +02:00
ctree.c btrfs: use the correct const function attribute for btrfs_get_num_csums 2020-08-19 18:39:21 +02:00
ctree.h btrfs: use the correct const function attribute for btrfs_get_num_csums 2020-08-19 18:39:21 +02:00
delalloc-space.c btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: use nofs allocations for running delayed items 2020-03-25 16:26:00 +01:00
delayed-inode.h btrfs: delayed-inode: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
delayed-ref.c btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
delayed-ref.h btrfs: migrate the delayed refs rsv code 2019-07-04 17:26:17 +02:00
dev-replace.c btrfs: sysfs, rename device_link add/remove functions 2020-03-23 17:01:35 +01:00
dev-replace.h btrfs: add __pure attribute to functions 2019-11-18 12:46:52 +01:00
dir-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
discard.c btrfs: discard: add missing put when grabbing block group from unused list 2020-07-07 16:06:28 +02:00
discard.h btrfs: discard: Use the correct style for SPDX License Identifier 2020-04-20 17:43:42 +02:00
disk-io.c btrfs: fix space cache memory leak after transaction abort 2020-08-19 18:39:46 +02:00
disk-io.h btrfs: preallocate anon block device at first phase of snapshot creation 2020-07-27 12:55:38 +02:00
export.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: do not set the full sync flag on the inode during page release 2020-07-27 12:55:48 +02:00
extent_io.h btrfs: don't use UAPI types for fiemap callback 2020-07-27 12:55:27 +02:00
extent_map.c Btrfs: fix race between using extent maps and merging them 2020-02-12 17:16:46 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
extent-io-tree.h btrfs: trim: fix underflow in trim length to prevent access beyond device boundary 2020-08-12 10:15:58 +02:00
extent-tree.c btrfs: trim: fix underflow in trim length to prevent access beyond device boundary 2020-08-12 10:15:58 +02:00
file-item.c btrfs: make btrfs_csum_one_bio takae btrfs_inode 2020-07-27 12:55:26 +02:00
file.c btrfs: make btrfs_check_data_free_space take btrfs_inode 2020-07-27 12:55:36 +02:00
free-space-cache.c btrfs: fix space cache memory leak after transaction abort 2020-08-19 18:39:46 +02:00
free-space-cache.h btrfs: let btrfs_return_cluster_to_free_space() return void 2020-07-27 12:55:21 +02:00
free-space-tree.c btrfs: move the root freeing stuff into btrfs_put_root 2020-03-23 17:01:59 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: Make btrfs_find_name_in_ext_backref return struct btrfs_inode_extref 2019-09-09 14:59:16 +02:00
inode-map.c btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
inode-map.h
inode.c btrfs: handle errors from async submission 2020-08-19 18:26:14 +02:00
ioctl.c btrfs: add missing check for nocow and compression inode flags 2020-07-27 12:55:44 +02:00
Kconfig Revert "btrfs: switch to iomap_dio_rw() for dio" 2020-06-14 01:19:02 +02:00
locking.c btrfs: add missing annotation for btrfs_tree_lock() 2020-05-25 11:25:16 +02:00
locking.h btrfs: Implement DREW lock 2020-03-23 17:01:43 +01:00
lzo.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00
Makefile Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: make btrfs_add_ordered_extent_dio take btrfs_inode 2020-07-27 12:55:34 +02:00
ordered-data.h btrfs: make btrfs_add_ordered_extent_dio take btrfs_inode 2020-07-27 12:55:34 +02:00
orphan.c
print-tree.c btrfs: Remove unneeded semicolon 2020-01-20 16:40:55 +01:00
print-tree.h
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT 2020-07-27 12:55:43 +02:00
qgroup.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
raid56.c btrfs: raid56: remove out label in __raid56_parity_recover 2020-07-27 12:55:44 +02:00
raid56.h btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes 2019-07-01 13:34:58 +02:00
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
ref-verify.c btrfs: ref-verify: fix memory leak in add_block_entry 2020-07-27 12:55:43 +02:00
ref-verify.h btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
reflink.c btrfs: reduce contention on log trees when logging checksums 2020-07-27 12:55:45 +02:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: relocation: review the call sites which can be interrupted by signal 2020-07-27 12:55:45 +02:00
root-tree.c btrfs: simplify root lookup by id 2020-05-25 11:25:36 +02:00
scrub.c btrfs: return EROFS for BTRFS_FS_STATE_ERROR cases 2020-07-27 12:55:46 +02:00
send.c for-5.8-tag 2020-06-02 19:59:25 -07:00
send.h
space-info.c btrfs: fix lockdep splat from btrfs_dump_space_info 2020-07-27 12:55:47 +02:00
space-info.h btrfs: improve global reserve stealing logic 2020-05-25 11:25:22 +02:00
struct-funcs.c btrfs: update documentation of set/get helpers 2020-05-25 11:25:35 +02:00
super.c btrfs: reset compression level for lzo on remount 2020-08-19 18:39:12 +02:00
sysfs.c btrfs: sysfs: fix NULL pointer dereference at btrfs_sysfs_del_qgroups() 2020-08-10 19:51:08 +02:00
sysfs.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
transaction.c btrfs: return EROFS for BTRFS_FS_STATE_ERROR cases 2020-07-27 12:55:46 +02:00
transaction.h btrfs: qgroup: remove ASYNC_COMMIT mechanism in favor of reserve retry-after-EDQUOT 2020-07-27 12:55:43 +02:00
tree-checker.c btrfs: tree-checker: remove duplicate definition of 'inode_item_err' 2020-05-25 11:25:23 +02:00
tree-checker.h btrfs: get fs_info from eb in btrfs_check_chunk_valid 2019-04-29 19:02:39 +02:00
tree-defrag.c btrfs: remove unused btrfs_root::defrag_trans_start 2020-07-27 12:55:28 +02:00
tree-log.c btrfs: fix memory leaks after failure to lookup checksums during inode logging 2020-08-10 18:58:30 +02:00
tree-log.h btrfs: get fs_info from trans in btrfs_set_log_full_commit 2019-04-29 19:02:41 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: simplify root lookup by id 2020-05-25 11:25:36 +02:00
volumes.c btrfs: trim: fix underflow in trim length to prevent access beyond device boundary 2020-08-12 10:15:58 +02:00
volumes.h btrfs: record btrfs_device directly in btrfs_io_bio 2020-07-27 12:55:40 +02:00
xattr.c Btrfs: fix failure to persist compression property xattr deletion on fsync 2019-06-17 16:37:17 +02:00
xattr.h btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
zlib.c btrfs: use larger zlib buffer for s390 hardware compression 2020-01-31 10:30:40 -08:00
zstd.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00