linux/fs/btrfs
Filipe Manana 6c5041a103 btrfs: fix processing of delayed tree block refs during backref walking
[ Upstream commit 943553ef9b ]

During backref walking, when processing a delayed reference with a type of
BTRFS_TREE_BLOCK_REF_KEY, we have two bugs there:

1) We are accessing the delayed ref head's extent_op, and its key, without
   holding the delayed ref head's lock;

2) If there's no extent op for the delayed ref head, we end up with an
   uninitialized key in the stack, variable 'tmp_op_key', and then pass
   it to add_indirect_ref(), which adds the reference to the indirect
   refs rb tree.

   This is wrong, because indirect references should have a NULL key
   when we don't have access to the key, and in that case they should be
   added to the indirect_missing_keys rb tree and not to the indirect rb
   tree.

   This means that if we have a BTRFS_TREE_BLOCK_REF_KEY delayed ref
   resulting from freeing an extent buffer, therefore with a count of -1,
   it will not cancel out the corresponding reference we have in the
   extent tree (with a count of 1), since both references end up in
   different rb trees.

   When using fiemap, where we often need to check if extents are shared
   through shared subtrees resulting from snapshots, it means we can
   incorrectly report an extent as shared when it's no longer shared.
   However this is temporary because after the transaction is committed
   the extent is no longer reported as shared, as running the delayed
   reference results in deleting the tree block reference from the extent
   tree.

   Outside the fiemap context, the result is unpredictable, as the key was
   not initialized but it's used when navigating the rb trees to insert
   and search for references (prelim_ref_compare()), and we expect all
   references in the indirect rb tree to have valid keys.
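
The cancellation problem described in point 2 can be illustrated with a
small userspace model. This is only a sketch: the model_* names, the flat
arrays standing in for rb trees, and the matching on (root_id, bytenr) are
all assumptions made for illustration and are not the kernel's structures
or its prelim_ref_compare() logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_key { uint64_t objectid; uint8_t type; uint64_t offset; };
struct model_ref { uint64_t root_id; uint64_t bytenr; int count; };

#define MODEL_MAX_REFS 16

/* A flat array stands in for an rb tree of preliminary references. */
struct model_tree {
	struct model_ref refs[MODEL_MAX_REFS];
	size_t nr;
};

struct model_preftrees {
	struct model_tree indirect;              /* refs with a known key */
	struct model_tree indirect_missing_keys; /* refs whose key is unknown */
};

/* Merge into an existing entry for the same (root_id, bytenr), else append. */
static void model_tree_add(struct model_tree *tree, uint64_t root_id,
			   uint64_t bytenr, int count)
{
	for (size_t i = 0; i < tree->nr; i++) {
		if (tree->refs[i].root_id == root_id &&
		    tree->refs[i].bytenr == bytenr) {
			tree->refs[i].count += count;
			return;
		}
	}
	assert(tree->nr < MODEL_MAX_REFS);
	tree->refs[tree->nr++] = (struct model_ref){ root_id, bytenr, count };
}

/* A NULL key routes the ref to indirect_missing_keys, otherwise to indirect. */
static void model_add_indirect_ref(struct model_preftrees *trees,
				   uint64_t root_id,
				   const struct model_key *key,
				   uint64_t bytenr, int count)
{
	struct model_tree *tree = key ? &trees->indirect
				      : &trees->indirect_missing_keys;

	model_tree_add(tree, root_id, bytenr, count);
}

/* Count the entries in one tree whose count did not cancel out to zero. */
static size_t model_tree_live(const struct model_tree *tree)
{
	size_t live = 0;

	for (size_t i = 0; i < tree->nr; i++)
		if (tree->refs[i].count != 0)
			live++;
	return live;
}

/* Entries across both trees that still carry a non-zero count. */
static size_t model_live_refs(const struct model_preftrees *trees)
{
	return model_tree_live(&trees->indirect) +
	       model_tree_live(&trees->indirect_missing_keys);
}
```

In this model, adding the extent tree reference (count 1) with a NULL key
and the delayed ref (count -1) with a non-NULL key leaves two entries with
non-zero counts, one per tree. Adding both with a NULL key leaves none.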

The following reproducer triggers the second bug:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdj
   MNT=/mnt/sdj

   mkfs.btrfs -f $DEV
   mount -o compress $DEV $MNT

   # With a compressed 128M file we get a tree height of 2 (level 1 root).
   xfs_io -f -c "pwrite -b 1M 0 128M" $MNT/foo

   btrfs subvolume snapshot $MNT $MNT/snap

   # Fiemap should output 0x2008 in the flags column.
   # 0x2000 means shared extent
   # 0x8 means encoded extent (because it's compressed)
   echo
   echo "fiemap after snapshot, range [120M, 120M + 128K):"
   xfs_io -c "fiemap -v 120M 128K" $MNT/foo
   echo

   # Overwrite one extent and fsync to flush delalloc and COW a new path
   # in the snapshot's tree.
   #
   # After this we have a BTRFS_DROP_DELAYED_REF delayed ref of type
   # BTRFS_TREE_BLOCK_REF_KEY with a count of -1 for every COWed extent
   # buffer in the path.
   #
   # In the extent tree we have inline references of type
   # BTRFS_TREE_BLOCK_REF_KEY, with a count of 1, for the same extent
   # buffers, so they should cancel each other, and the extent buffers in
   # the fs tree should no longer be considered as shared.
   #
   echo "Overwriting file range [120M, 120M + 128K)..."
   xfs_io -c "pwrite -b 128K 120M 128K" $MNT/snap/foo
   xfs_io -c "fsync" $MNT/snap/foo

   # Fiemap should output 0x8 in the flags column. The extent in the range
   # [120M, 120M + 128K) is no longer shared, it's now exclusive to the fs
   # tree.
   echo
   echo "fiemap after overwrite range [120M, 120M + 128K):"
   xfs_io -c "fiemap -v 120M 128K" $MNT/foo
   echo

   umount $MNT

Running it before this patch:

   $ ./test.sh
   (...)
   wrote 134217728/134217728 bytes at offset 0
   128 MiB, 128 ops; 0.1152 sec (1.085 GiB/sec and 1110.5809 ops/sec)
   Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

   fiemap after snapshot, range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

   Overwriting file range [120M, 120M + 128K)...
   wrote 131072/131072 bytes at offset 125829120
   128 KiB, 1 ops; 0.0001 sec (683.060 MiB/sec and 5464.4809 ops/sec)

   fiemap after overwrite range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

The extent in the range [120M, 120M + 128K) is still reported as shared
(0x2000 bit set) after overwriting that range and flushing delalloc, which
is not correct - an entire path was COWed in the snapshot's tree and the
extent is now only referenced by the original fs tree.

Running it after this patch:

   $ ./test.sh
   (...)
   wrote 134217728/134217728 bytes at offset 0
   128 MiB, 128 ops; 0.1198 sec (1.043 GiB/sec and 1068.2067 ops/sec)
   Create a snapshot of '/mnt/sdj' in '/mnt/sdj/snap'

   fiemap after snapshot, range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256 0x2008

   Overwriting file range [120M, 120M + 128K)...
   wrote 131072/131072 bytes at offset 125829120
   128 KiB, 1 ops; 0.0001 sec (694.444 MiB/sec and 5555.5556 ops/sec)

   fiemap after overwrite range [120M, 120M + 128K):
   /mnt/sdj/foo:
    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
      0: [245760..246015]: 34304..34559       256   0x8

Now the extent is not reported as shared anymore.
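
For reading the FLAGS column in the output above, the two bits named in
the reproducer's comments can be tested as in the following sketch. The
SKETCH_* macros and sketch_* helpers are hypothetical names introduced
here. Only the bit values 0x2000 (shared) and 0x8 (encoded) come from the
reproducer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as stated in the reproducer's comments. */
#define SKETCH_FLAG_ENCODED 0x0008u
#define SKETCH_FLAG_SHARED  0x2000u

/* True when the extent is reported as shared (0x2000 bit set). */
static bool sketch_is_shared(uint32_t flags)
{
	return (flags & SKETCH_FLAG_SHARED) != 0;
}

/* True when the extent is reported as encoded, e.g. compressed (0x8 bit set). */
static bool sketch_is_encoded(uint32_t flags)
{
	return (flags & SKETCH_FLAG_ENCODED) != 0;
}

/* Write a short description such as "shared encoded" into buf. */
static void sketch_describe(uint32_t flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s %s",
		 sketch_is_shared(flags) ? "shared" : "exclusive",
		 sketch_is_encoded(flags) ? "encoded" : "plain");
}
```

With these helpers, 0x2008 is both shared and encoded, while 0x8 is
encoded only, which matches the expected output after the overwrite.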

So fix this by passing a NULL key pointer to add_indirect_ref() when
processing a delayed reference for a tree block if there's no extent op
for our delayed ref head with a defined key. Also access the extent op
only after locking the delayed ref head's lock.
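
The shape of the fix can be sketched in userspace as follows. This is a
simplified model and not the actual patch: the sketch_* types, the
update_key flag used here to mean "the key is defined", and the pthread
mutex standing in for the delayed ref head's lock are assumptions for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct sketch_key { uint64_t objectid; uint8_t type; uint64_t offset; };

struct sketch_extent_op {
	struct sketch_key key;
	bool update_key; /* true only when 'key' holds a defined value */
};

struct sketch_ref_head {
	pthread_mutex_t lock;               /* models the head's lock */
	struct sketch_extent_op *extent_op; /* may be NULL */
};

/*
 * Return a pointer to 'out' filled with the head's key, or NULL when the
 * head has no extent op with a defined key. The extent op is only read
 * while the head's lock is held, and 'out' is never handed back
 * uninitialized.
 */
static const struct sketch_key *sketch_pick_key(struct sketch_ref_head *head,
						struct sketch_key *out)
{
	const struct sketch_key *key_ptr = NULL;

	assert(head != NULL && out != NULL);

	pthread_mutex_lock(&head->lock);
	if (head->extent_op && head->extent_op->update_key) {
		*out = head->extent_op->key;
		key_ptr = out;
	}
	pthread_mutex_unlock(&head->lock);

	return key_ptr;
}
```

A caller would pass the returned pointer on as the key argument, so that
a NULL result sends the reference to the tree for references with missing
keys instead of inserting it with stack garbage as its key.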

The reproducer will be converted later to a test case for fstests.

Fixes: 86d5f99442 ("btrfs: convert prelimary reference tracking to use rbtrees")
Fixes: a6dbceafb9 ("btrfs: Remove unused op_key var from add_delayed_refs")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29 10:12:55 +02:00
tests btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
async-thread.c btrfs: fix memory ordering between normal and ordered work functions 2021-11-25 09:48:46 +01:00
async-thread.h
backref.c btrfs: fix processing of delayed tree block refs during backref walking 2022-10-29 10:12:55 +02:00
backref.h btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
block-group.c btrfs: enhance unsupported compat RO flags handling 2022-10-29 10:12:53 +02:00
block-group.h btrfs: fix space cache corruption and potential double allocations 2022-09-05 10:30:12 +02:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h
btrfs_inode.h btrfs: put initial index value of a directory in a constant 2022-08-31 17:16:35 +02:00
check-integrity.c btrfs: rename btrfs_bio to btrfs_io_context 2022-07-21 21:24:32 +02:00
check-integrity.h
compression.c btrfs: remove unused parameter nr_pages in add_ra_bio_pages() 2022-04-20 09:34:04 +02:00
compression.h btrfs: rework btrfs_decompress_buf2page() 2021-08-23 13:19:04 +02:00
ctree.c btrfs: fix lockdep splat with reloc root extent buffers 2022-09-05 10:30:12 +02:00
ctree.h btrfs: fix space cache corruption and potential double allocations 2022-09-05 10:30:12 +02:00
delalloc-space.c btrfs: convert count_max_extents() to use fs_info->max_extent_size 2022-08-31 17:16:34 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: add ro compat flags to inodes 2021-08-23 13:19:09 +02:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
delayed-ref.h btrfs: add additional parameters to btrfs_init_tree_ref/btrfs_init_data_ref 2022-07-12 16:34:50 +02:00
dev-replace.c btrfs: add info when mount fails due to stale replace target 2022-08-31 17:16:46 +02:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: unify lookup return value when dir entry is missing 2021-10-07 22:06:32 +02:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: fix hang during unmount when stopping a space reclaim worker 2022-09-28 11:11:42 +02:00
disk-io.h btrfs: move lockdep class helpers to locking.c 2022-09-05 10:30:12 +02:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h
extent_io.c btrfs: fix lockdep splat with reloc root extent buffers 2022-09-05 10:30:12 +02:00
extent_io.h btrfs: fix qgroup reserve overflow the qgroup limit 2022-04-13 20:59:23 +02:00
extent_map.c btrfs: rename btrfs_bio to btrfs_io_context 2022-07-21 21:24:32 +02:00
extent_map.h
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer 2022-10-26 12:34:27 +02:00
file-item.c btrfs: make search_csum_tree return 0 if we get -EFBIG 2022-04-08 14:23:58 +02:00
file.c btrfs: add additional parameters to btrfs_init_tree_ref/btrfs_init_data_ref 2022-07-12 16:34:50 +02:00
free-space-cache.c btrfs: dump extra info if one free space cache has more bitmaps than it should 2022-10-26 12:35:44 +02:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix invalid delayed ref after subvolume creation failure 2022-07-12 16:34:50 +02:00
free-space-tree.h
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: remove root argument from btrfs_unlink_inode() 2022-09-05 10:30:09 +02:00
ioctl.c btrfs: fix use of uninitialized variable at rm device ioctl 2022-07-12 16:35:11 +02:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-06-22 14:11:57 +02:00
locking.c btrfs: fix lockdep splat with reloc root extent buffers 2022-09-05 10:30:12 +02:00
locking.h btrfs: fix lockdep splat with reloc root extent buffers 2022-09-05 10:30:12 +02:00
lzo.c btrfs: prevent copying too big compressed lzo segment 2022-03-02 11:48:07 +01:00
Makefile btrfs: initial fsverity support 2021-08-23 13:19:09 +02:00
misc.h btrfs: use correct header for div_u64 in misc.h 2021-09-07 14:29:50 +02:00
ordered-data.c btrfs: zoned: fix double counting of split ordered extent 2021-09-07 14:30:41 +02:00
ordered-data.h btrfs: remove uptodate parameter from btrfs_dec_test_first_ordered_pending 2021-08-23 13:19:02 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: props: change how empty value is interpreted 2021-06-22 14:11:58 +02:00
props.h
qgroup.c btrfs: fix race between quota enable and quota rescan ioctl 2022-10-26 12:34:27 +02:00
qgroup.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
raid56.c btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() 2022-08-21 15:17:49 +02:00
raid56.h btrfs: rename btrfs_bio to btrfs_io_context 2022-07-21 21:24:32 +02:00
rcu-string.h
reada.c btrfs: rename btrfs_bio to btrfs_io_context 2022-07-21 21:24:32 +02:00
ref-verify.c btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool 2021-08-23 13:19:00 +02:00
ref-verify.h
reflink.c btrfs: fix unexpected error path when reflinking an inline extent 2022-04-08 14:23:11 +02:00
reflink.h
relocation.c btrfs: fix lockdep splat with reloc root extent buffers 2022-09-05 10:30:12 +02:00
root-tree.c btrfs: fix silent failure when deleting root reference 2022-08-31 17:16:46 +02:00
scrub.c btrfs: scrub: try to fix super block errors 2022-10-26 12:35:44 +02:00
send.c btrfs: make send work with concurrent block group relocation 2022-03-16 14:23:46 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: extend locking to all space_info members accesses 2022-04-08 14:23:02 +02:00
space-info.h btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
struct-funcs.c btrfs: add special case to setget helpers for 64k pages 2021-08-23 13:18:58 +02:00
subpage.c btrfs: subpage: fix a potential use-after-free in writeback helper 2021-08-23 13:19:05 +02:00
subpage.h btrfs: subpage: fix writeback which does not have ordered extent 2021-08-23 13:19:04 +02:00
super.c btrfs: enhance unsupported compat RO flags handling 2022-10-29 10:12:53 +02:00
sysfs.c btrfs: sysfs: document structures and their associated files 2021-08-23 13:19:12 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: make send work with concurrent block group relocation 2022-03-16 14:23:46 +01:00
transaction.h btrfs: do not start relocation until in progress drops are done 2022-03-08 19:12:54 +01:00
tree-checker.c btrfs: tree-checker: check for overlapping extent items 2022-09-05 10:30:12 +02:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: fix warning during log replay when bumping inode link count 2022-09-05 10:30:09 +02:00
tree-log.h btrfs: pass the dentry to btrfs_log_new_name() instead of the inode 2022-08-31 17:16:36 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
verity.c btrfs: fix transaction handle leak after verity rollback failure 2021-09-17 19:29:41 +02:00
volumes.c btrfs: fix possible memory leak in btrfs_get_dev_args_from_path() 2022-08-31 17:16:46 +02:00
volumes.h btrfs: rename btrfs_bio to btrfs_io_context 2022-07-21 21:24:32 +02:00
xattr.c btrfs: check if root is readonly while setting security xattr 2022-08-31 17:16:46 +02:00
xattr.h
zlib.c Revert "btrfs: compression: drop kmap/kunmap from zlib" 2021-10-29 13:03:05 +02:00
zoned.c btrfs: zoned: set pseudo max append zone limit in zone emulation mode 2022-09-15 11:30:02 +02:00
zoned.h btrfs: zoned: revive max_zone_append_bytes 2022-08-31 17:16:34 +02:00
zstd.c Revert "btrfs: compression: drop kmap/kunmap from zstd" 2021-10-29 13:02:50 +02:00