linux/fs/btrfs
Qu Wenruo e42b9d8b9e btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
[BUG]
With the following file extent layout, defrag would do unnecessary IO
and result in more on-disk space usage.

  # mkfs.btrfs -f $dev
  # mount $dev $mnt
  # xfs_io -f -c "pwrite 0 40m" $mnt/foobar
  # sync
  # xfs_io -f -c "pwrite 40m 16k" $mnt/foobar
  # sync

The above commands would lead to the following file extent layout:

        item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 298844160 nr 41943040
                extent data offset 0 nr 41943040 ram 41943040
                extent compression 0 (none)
        item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13631488 nr 16384
                extent data offset 0 nr 16384 ram 16384
                extent compression 0 (none)

This is mostly fine. We could allow the final 16K to be merged with the
previous 40M, but that is up to the end user's preference.

But if we defrag the file using the default parameters, it would result
in a worse file layout:

  # btrfs filesystem defrag $mnt/foobar
  # sync

        item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 298844160 nr 41943040
                extent data offset 0 nr 8650752 ram 41943040
                extent compression 0 (none)
        item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
                generation 9 type 1 (regular)
                extent data disk byte 340787200 nr 33292288
                extent data offset 0 nr 33292288 ram 33292288
                extent compression 0 (none)
        item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13631488 nr 16384
                extent data offset 0 nr 16384 ram 16384
                extent compression 0 (none)

Note that the original 40M extent is still there, but a new extent of
nearly 32M (33292288 bytes) is created for no benefit at all.

[CAUSE]
There is an existing check to make sure we won't defrag a large enough
extent (the threshold is by default 32M).

But the check is using the length to the end of the extent:

	range_len = em->len - (cur - em->start);

	/* Skip too large extent */
	if (range_len >= extent_thresh)
		goto next;

This means that for the first 8MiB of the extent, range_len is always
at least the default threshold, so that part is skipped and not defragged.
But after the first 8MiB, range_len of the remaining part drops below the
threshold, so that part is defragged.

This inconsistent behavior inside the same extent caused the above problem,
and we should avoid making different defrag decisions inside the same extent.
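
The effect of the check can be reproduced outside the kernel with a small
sketch. Everything below is illustrative: struct em_sketch and both helpers
are hypothetical stand-ins, not kernel code, and the 256K step is an
assumption about how defrag walks the range. Under that assumption, the first
offset that the old check stops skipping in a 40M extent is 8650752, which is
the key offset of the new extent in the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_256K (256ULL * 1024)
#define SZ_1M   (1024ULL * 1024)

/* Hypothetical stand-in for the extent map fields used by the check. */
struct em_sketch {
	uint64_t start;	/* file offset where the extent begins */
	uint64_t len;	/* length of the extent in bytes */
};

/* Pre-fix check: skip only if the remaining length reaches the threshold. */
static bool old_check_skips(const struct em_sketch *em, uint64_t cur,
			    uint64_t extent_thresh)
{
	uint64_t range_len = em->len - (cur - em->start);

	return range_len >= extent_thresh;
}

/*
 * Walk the extent in fixed steps (the step size is an assumption) and return
 * the first offset the old check no longer skips, or UINT64_MAX if every
 * probed offset is skipped.
 */
static uint64_t first_defrag_offset(const struct em_sketch *em,
				    uint64_t extent_thresh, uint64_t step)
{
	assert(step > 0);
	for (uint64_t cur = em->start; cur < em->start + em->len; cur += step) {
		if (!old_check_skips(em, cur, extent_thresh))
			return cur;
	}
	return UINT64_MAX;
}
```

Calling first_defrag_offset() on an extent with start 0 and len 40 * SZ_1M,
with a threshold of 32 * SZ_1M and a step of SZ_256K, returns 8M + 256K.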

[FIX]
Instead of using @range_len, just use @em->len, so that we make a
consistent decision within the same file extent.

Now with this fix, we won't touch the extent, thus not making it any
worse.
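
A minimal sketch of the check as described here, again using a hypothetical
struct em_sketch rather than the real kernel types. It models only the size
check; any other conditions defrag applies are out of scope. Because the
result no longer depends on the current offset, every position inside one
extent gets the same answer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_16K (16ULL * 1024)
#define SZ_1M  (1024ULL * 1024)

/* Hypothetical stand-in for the extent map fields used by the check. */
struct em_sketch {
	uint64_t start;	/* file offset where the extent begins */
	uint64_t len;	/* length of the extent in bytes */
};

/*
 * Post-fix size check as described in the text: compare the full extent
 * length against the threshold, independent of the current offset.
 */
static bool skip_too_large(const struct em_sketch *em, uint64_t extent_thresh)
{
	return em->len >= extent_thresh;
}
```

With the default 32M threshold, the 40M extent from the reproducer is skipped
as a whole, while the 16K extent is not excluded by this particular check.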

Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 0cb5950f3f ("btrfs: fix deadlock when reserving space during defrag")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-19 11:19:58 +01:00
tests btrfs: migrate extent_buffer::pages[] to folio 2023-12-15 23:01:04 +01:00
accessors.c btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios 2023-12-15 23:03:58 +01:00
accessors.h btrfs: migrate extent_buffer::pages[] to folio 2023-12-15 23:01:04 +01:00
acl.c fs: port acl to mnt_idmap 2023-01-19 09:24:28 +01:00
acl.h fs: port ->set_acl() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
async-thread.c btrfs: merge ordered work callbacks in btrfs_work into one 2023-10-12 16:44:10 +02:00
async-thread.h btrfs: merge ordered work callbacks in btrfs_work into one 2023-10-12 16:44:10 +02:00
backref.c for-6.7-tag 2023-10-30 10:42:06 -10:00
backref.h for-6.7-tag 2023-10-30 10:42:06 -10:00
bio.c btrfs: migrate btrfs_repair_io_failure() to folio interfaces 2023-12-15 23:03:58 +01:00
bio.h btrfs: migrate btrfs_repair_io_failure() to folio interfaces 2023-12-15 23:03:58 +01:00
block-group.c btrfs: add new unused block groups to the list of unused block groups 2024-02-09 20:29:22 +01:00
block-group.h btrfs: add and use helper to check if block group is used 2024-02-09 20:29:14 +01:00
block-rsv.c btrfs: read raid stripe tree from disk 2023-10-12 16:44:09 +02:00
block-rsv.h btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c 2023-06-19 13:59:24 +02:00
btrfs_inode.h btrfs: fix mismatching parameter names for btrfs_get_extent() 2023-12-15 22:59:30 +01:00
compression.c btrfs: zlib: fix and simplify the inline extent decompression 2024-01-18 23:35:26 +01:00
compression.h btrfs: zstd: fix and simplify the inline extent decompression 2024-01-18 23:35:35 +01:00
ctree.c btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios 2023-12-15 23:03:58 +01:00
ctree.h btrfs: switch btrfs_root::delayed_nodes_tree to xarray from radix-tree 2023-12-15 23:01:03 +01:00
defrag.c btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size 2024-02-19 11:19:58 +01:00
defrag.h btrfs: move btrfs_defrag_root() to defrag.{c,h} 2023-10-12 16:44:13 +02:00
delalloc-space.c btrfs: don't reserve space for checksums when writing to nocow files 2024-02-13 18:36:35 +01:00
delalloc-space.h btrfs: move delalloc space related prototypes to delalloc-space.h 2022-12-05 18:00:44 +01:00
delayed-inode.c btrfs: switch btrfs_root::delayed_nodes_tree to xarray from radix-tree 2023-12-15 23:01:03 +01:00
delayed-inode.h btrfs: remove redundant root argument from btrfs_delayed_update_inode() 2023-10-12 16:44:12 +02:00
delayed-ref.c btrfs: fix qgroup record leaks when using simple quotas 2023-11-09 14:01:59 +01:00
delayed-ref.h btrfs: stop reserving excessive space for block group item insertions 2023-10-12 16:44:16 +02:00
dev-replace.c btrfs: use a dedicated data structure for chunk maps 2023-12-15 20:27:02 +01:00
dev-replace.h btrfs: move dev-replace prototypes into dev-replace.h 2022-12-05 18:00:47 +01:00
dir-item.c btrfs: abort transaction on generation mismatch when marking eb as dirty 2023-10-12 16:44:07 +02:00
dir-item.h btrfs: add fscrypt related dependencies to respective headers 2023-10-12 16:44:02 +02:00
discard.c btrfs: unexport btrfs_run_discard_work and make it static 2023-06-19 13:59:25 +02:00
discard.h btrfs: unexport btrfs_run_discard_work and make it static 2023-06-19 13:59:25 +02:00
disk-io.c btrfs: do not ASSERT() if the newly created subvolume already got read 2024-01-31 08:42:53 +01:00
disk-io.h btrfs: move one shot mount option clearing to super.c 2023-12-15 20:27:04 +01:00
export.c btrfs: move super_block specific helpers into super.h 2022-12-05 18:00:47 +01:00
export.h btrfs: simplify generation check in btrfs_get_dentry 2022-12-05 18:00:41 +01:00
extent_io.c btrfs: migrate eb_bitmap_offset() to folio interfaces 2023-12-15 23:03:58 +01:00
extent_io.h btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios 2023-12-15 23:03:58 +01:00
extent_map.c btrfs: use the flags of an extent map to identify the compression type 2023-12-15 22:59:02 +01:00
extent_map.h btrfs: use the flags of an extent map to identify the compression type 2023-12-15 22:59:02 +01:00
extent-io-tree.c btrfs: allocate btrfs_inode::file_extent_tree only without NO_HOLES 2023-12-15 22:59:01 +01:00
extent-io-tree.h btrfs: always set extent_io_tree::inode and drop fs_info 2023-12-15 20:27:02 +01:00
extent-tree.c btrfs: don't warn if discard range is not aligned to sector 2024-01-18 23:35:57 +01:00
extent-tree.h btrfs: get correct owning_root when dropping snapshot 2023-11-03 16:39:06 +01:00
file-item.c btrfs: use the flags of an extent map to identify the compression type 2023-12-15 22:59:02 +01:00
file-item.h btrfs: scrub: avoid unnecessary csum tree search preparing stripes 2023-08-21 14:54:48 +02:00
file.c btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
file.h btrfs: use cached state when looking for delalloc ranges with fiemap 2022-12-05 18:00:56 +01:00
free-space-cache.c btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
free-space-cache.h btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c 2023-06-19 13:59:24 +02:00
free-space-tree.c btrfs: abort transaction on generation mismatch when marking eb as dirty 2023-10-12 16:44:07 +02:00
free-space-tree.h btrfs: make clear_cache mount option to rebuild FST without disabling it 2023-05-10 14:51:27 +02:00
fs.c btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
fs.h btrfs: remove old mount API code 2023-12-15 20:27:04 +01:00
inode-item.c btrfs: track owning root in btrfs_ref 2023-10-12 16:44:11 +02:00
inode-item.h btrfs: add fscrypt related dependencies to respective headers 2023-10-12 16:44:02 +02:00
inode.c btrfs: reject encoded write if inode has nodatasum flag set 2024-02-13 18:38:05 +01:00
ioctl.c btrfs: forbid creating subvol qgroups 2024-01-31 08:42:44 +01:00
ioctl.h fs: port ->fileattr_set() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
Kconfig btrfs: check-integrity: remove CONFIG_BTRFS_FS_CHECK_INTEGRITY option 2023-10-12 16:44:05 +02:00
locking.c btrfs: add raid stripe tree definitions 2023-10-12 16:44:09 +02:00
locking.h btrfs: do not block starts waiting on previous transaction commit 2023-09-08 14:10:49 +02:00
lru_cache.c btrfs: fix typos found by codespell 2023-12-15 23:00:04 +01:00
lru_cache.h btrfs: remove btrfs_lru_cache_is_full() inline function 2023-04-17 18:01:18 +02:00
lzo.c btrfs: lzo: fix and simplify the inline extent decompression 2024-01-18 23:35:30 +01:00
Makefile btrfs: add support for inserting raid stripe extents 2023-10-12 16:44:09 +02:00
messages.c btrfs: constify fs_info parameter in __btrfs_panic() 2023-12-15 20:27:02 +01:00
messages.h btrfs: constify fs_info parameter in __btrfs_panic() 2023-12-15 20:27:02 +01:00
misc.h minmax: add in_range() macro 2023-08-24 16:20:18 -07:00
ordered-data.c btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
ordered-data.h btrfs: remove unused btrfs_ordered_extent::outstanding_isize 2023-12-15 20:27:01 +01:00
orphan.c btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
orphan.h btrfs: move orphan prototypes into orphan.h 2022-12-05 18:00:47 +01:00
print-tree.c btrfs: new inline ref storing owning subvol of data extents 2023-10-12 16:44:11 +02:00
print-tree.h btrfs: print-tree: pass const extent buffer pointer 2023-06-19 13:59:22 +02:00
props.c btrfs: move btrfs_name_hash to dir-item.h 2023-10-12 16:44:02 +02:00
props.h btrfs: make module init/exit match their sequence 2022-12-05 18:00:40 +01:00
qgroup.c btrfs: forbid deleting live subvol qgroup 2024-01-31 08:42:47 +01:00
qgroup.h btrfs: ensure releasing squota reserve on head refs 2023-12-06 22:32:57 +01:00
raid56.c btrfs: refactor alloc_extent_buffer() to allocate-then-attach method 2023-12-15 23:01:04 +01:00
raid56.h btrfs: use a dedicated data structure for chunk maps 2023-12-15 20:27:02 +01:00
raid-stripe-tree.c btrfs: directly return 0 on no error code in btrfs_insert_raid_extent() 2023-11-03 16:38:51 +01:00
raid-stripe-tree.h btrfs: zoned: support RAID0/1/10 on top of raid stripe tree 2023-10-12 16:44:09 +02:00
rcu-string.h btrfs: replace strncpy() with strscpy() 2022-12-05 18:00:59 +01:00
ref-verify.c btrfs: ref-verify: free ref cache before clearing mount opt 2024-01-12 01:59:49 +01:00
ref-verify.h
reflink.c btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
reflink.h
relocation.c btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
relocation.h btrfs: relocation: constify parameters where possible 2023-10-12 16:44:13 +02:00
root-tree.c btrfs: qgroup: add new quota mode for simple quotas 2023-10-12 16:44:10 +02:00
root-tree.h btrfs: drop __must_check annotations 2023-10-12 16:44:04 +02:00
scrub.c btrfs: scrub: limit RST scrub to chunk boundary 2024-01-18 23:43:08 +01:00
scrub.h btrfs: scrub: remove scrub_bio structure 2023-04-17 18:01:24 +02:00
send.c btrfs: send: return EOPNOTSUPP on unknown flags 2024-01-31 08:42:30 +01:00
send.h btrfs: send add define for v2 buffer size 2022-12-05 18:00:41 +01:00
space-info.c btrfs: adjust overcommit logic when very close to full 2023-10-12 16:44:16 +02:00
space-info.h btrfs: pass a space_info argument to btrfs_reserve_metadata_bytes() 2023-10-12 16:44:05 +02:00
subpage.c btrfs: don't unconditionally call folio_start_writeback in subpage 2024-01-18 23:39:59 +01:00
subpage.h btrfs: migrate subpage code to folio interfaces 2023-12-15 23:03:58 +01:00
super.c btrfs: use the original mount's mount options for the legacy reconfigure 2024-01-18 23:38:54 +01:00
super.h btrfs: remove old mount API code 2023-12-15 20:27:04 +01:00
sysfs.c btrfs: sysfs: validate scrub_speed_max value 2023-12-15 23:01:04 +01:00
sysfs.h btrfs: sysfs: update fs features directory asynchronously 2023-02-13 17:50:35 +01:00
transaction.c btrfs: don't refill whole delayed refs block reserve when starting transaction 2024-02-13 18:39:09 +01:00
transaction.h btrfs: free qgroup pertrans reserve on transaction abort 2023-12-06 22:32:49 +01:00
tree-checker.c btrfs: tree-checker: fix inline ref size in error messages 2024-01-18 23:35:50 +01:00
tree-checker.h btrfs: fix typos found by codespell 2023-12-15 23:00:04 +01:00
tree-log.c btrfs: use the flags of an extent map to identify the compression type 2023-12-15 22:59:02 +01:00
tree-log.h btrfs: change for_rename argument of btrfs_record_unlink_dir() to bool 2023-06-19 13:59:26 +02:00
tree-mod-log.c btrfs: avoid tree mod log ENOMEM failures when we don't need to log 2023-06-19 13:59:38 +02:00
tree-mod-log.h btrfs: fix SPDX comment in tree-mod-log.h 2022-12-05 18:00:48 +01:00
ulist.c btrfs: reformat remaining kdoc style comments 2023-10-12 16:44:04 +02:00
ulist.h btrfs: constify ulist parameter of ulist_next() 2022-12-05 18:00:50 +01:00
uuid-tree.c btrfs: abort transaction on generation mismatch when marking eb as dirty 2023-10-12 16:44:07 +02:00
uuid-tree.h btrfs: move uuid tree prototypes to uuid-tree.h 2022-12-05 18:00:46 +01:00
verity.c btrfs: remove redundant root argument from btrfs_update_inode() 2023-10-12 16:44:12 +02:00
verity.h btrfs: move verity prototypes into verity.h 2022-12-05 18:00:47 +01:00
volumes.c btrfs: fix unbalanced unlock of mapping_tree_lock 2024-01-12 01:59:59 +01:00
volumes.h btrfs: fix typos found by codespell 2023-12-15 23:00:04 +01:00
xattr.c btrfs: cache that we don't have security.capability set 2023-12-15 20:27:05 +01:00
xattr.h btrfs: move btrfs_xattr_handlers to .rodata 2023-10-09 16:24:17 +02:00
zlib.c btrfs: zlib: fix and simplify the inline extent decompression 2024-01-18 23:35:26 +01:00
zoned.c btrfs: zoned: fix chunk map leak when loading block group zone info 2024-02-13 18:38:19 +01:00
zoned.h btrfs: fix typos found by codespell 2023-12-15 23:00:04 +01:00
zstd.c btrfs: zstd: fix and simplify the inline extent decompression 2024-01-18 23:35:35 +01:00