linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-14 15:54:15 +08:00

Filipe Manana 30b80f3ce0 btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00

When logging a directory we start by flushing all its delayed items.
That results in adding dir index items to the subvolume btree, for new
dentries, and removing dir index items from the subvolume btree for any
dentries that were deleted.

This makes it straightforward to log a directory simply by iterating over
all the modified subvolume btree leaves, especially when we used to log
both dir index keys and dir item keys (before commit `339d035424`
("btrfs: only copy dir index keys when logging a directory") and when we
used to copy old dir index entries for leaves modified in the current
transaction (before commit `732d591a5d` ("btrfs: stop copying old dir
items when logging a directory")).

From an efficiency point of view this has a couple of drawbacks:

1) Adds extra latency, due to copying delayed items to the subvolume btree
   and deleting dir index items from the btree.

   Further if there are other tasks accessing the btree, which is common
   (syscalls like creat, mkdir, rename, link, unlink, truncate, reflinks,
   etc, finishing an ordered extent, etc), lock contention can cause
   further delays, both to the task logging a directory and to the other
   tasks accessing the btree;

2) More time spent overall flushing delayed items, if after logging the
   directory further changes are done to the directory in the same
   transaction.

   For example, if we add 10 dentries to a directory, fsync it, add more
   10 dentries, fsync it again, then add more 10 dentries and fsync it
   again, then we end up inserting 3 batches of 10 items to the subvolume
   btree. With the changes from this patch, we flush all the delayed items
   to the btree only once - a single batch of 30 items, and outside the
   logging code (transaction commit or when delayed items are flushed
   asynchronously).

This change simply skips the flushing of delayed items every time we log a
directory. Instead we copy the delayed insertion items directly to the log
tree and delete delayed deletion items directly from the log tree.
Therefore avoiding changing first the subvolume btree and then scanning it
for new items to copy from it to the log tree and detecting deletions by
observing gaps in consecutive dir index keys in subvolume btree leaves.

Running the following tests on a non-debug kernel (Debian's default kernel
config), on a box with a NVMe device, a 12 cores Intel CPU and 64G of ram,
produced the results below.

The results compare a branch without this patch and all the other patches
it depends on versus the same branch with the patchset applied.

The patchset is comprised of the following patches:

  btrfs: don't drop dir index range items when logging a directory
  btrfs: remove the root argument from log_new_dir_dentries()
  btrfs: update stale comment for log_new_dir_dentries()
  btrfs: free list element sooner at log_new_dir_dentries()
  btrfs: avoid memory allocation at log_new_dir_dentries() for common case
  btrfs: remove root argument from btrfs_delayed_item_reserve_metadata()
  btrfs: store index number instead of key in struct btrfs_delayed_item
  btrfs: remove unused logic when looking up delayed items
  btrfs: shrink the size of struct btrfs_delayed_item
  btrfs: search for last logged dir index if it's not cached in the inode
  btrfs: move need_log_inode() to above log_conflicting_inodes()
  btrfs: move log_new_dir_dentries() above btrfs_log_inode()
  btrfs: log conflicting inodes without holding log mutex of the initial inode
  btrfs: skip logging parent dir when conflicting inode is not a dir
  btrfs: use delayed items when logging a directory

Custom test script for testing time spent at btrfs_log_inode():

   #!/bin/bash

   DEV=/dev/nvme0n1
   MNT=/mnt/nvme0n1

   # Total number of files to create in the test directory.
   NUM_FILES=10000
   # Fsync after creating or renaming N files.
   FSYNC_AFTER=100

   umount $DEV &> /dev/null
   mkfs.btrfs -f $DEV
   mount -o ssd $DEV $MNT

   TEST_DIR=$MNT/testdir
   mkdir $TEST_DIR

   echo "Creating files..."
   for ((i = 1; i <= $NUM_FILES; i++)); do
           echo -n > $TEST_DIR/file_$i
           if (( ($i % $FSYNC_AFTER) == 0 )); then
                   xfs_io -c "fsync" $TEST_DIR
           fi
   done

   sync

   echo "Renaming files..."
   for ((i = 1; i <= $NUM_FILES; i++)); do
           mv $TEST_DIR/file_$i $TEST_DIR/file_$i.renamed
           if (( ($i % $FSYNC_AFTER) == 0 )); then
                   xfs_io -c "fsync" $TEST_DIR
           fi
   done

   umount $MNT

And using the following bpftrace script to capture the total time that is
spent at btrfs_log_inode():

   #!/usr/bin/bpftrace

   k:btrfs_log_inode
   {
           @start_log_inode[tid] = nsecs;
   }

   kr:btrfs_log_inode
   /@start_log_inode[tid]/
   {
           $dur = (nsecs - @start_log_inode[tid]) / 1000;
           @btrfs_log_inode_total_time = sum($dur);
           delete(@start_log_inode[tid]);
   }

   END
   {
           clear(@start_log_inode);
   }

Result before applying patchset:

   @btrfs_log_inode_total_time: 622642

Result after applying patchset:

   @btrfs_log_inode_total_time: 354134    (-43.1% time spent)

The following dbench script was also used for testing:

   #!/bin/bash

   NUM_JOBS=$(nproc --all)

   DEV=/dev/nvme0n1
   MNT=/mnt/nvme0n1
   MOUNT_OPTIONS="-o ssd"
   MKFS_OPTIONS="-O no-holes -R free-space-tree"

   echo "performance" | \
           tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

   umount $DEV &> /dev/null
   mkfs.btrfs -f $MKFS_OPTIONS $DEV
   mount $MOUNT_OPTIONS $DEV $MNT

   dbench -D $MNT --skip-cleanup -t 120 -S $NUM_JOBS

   umount $MNT

Before patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    3322265     0.034    21.032
 Close        2440562     0.002     0.994
 Rename        140664     1.150   269.633
 Unlink        670796     1.093   269.678
 Deltree           96     5.481    15.510
 Mkdir             48     0.004     0.052
 Qpathinfo    3010924     0.014     8.127
 Qfileinfo     528055     0.001     0.518
 Qfsinfo       552113     0.003     0.372
 Sfileinfo     270575     0.005     0.688
 Find         1164176     0.052    13.931
 WriteX       1658537     0.019     5.918
 ReadX        5207412     0.003     1.034
 LockX          10818     0.003     0.079
 UnlockX        10818     0.002     0.313
 Flush         232811     1.027   269.735

Throughput 869.867 MB/sec (sync dirs)  12 clients  12 procs  max_latency=269.741 ms

After patchset:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    4152738     0.029    20.863
 Close        3050770     0.002     1.119
 Rename        175829     0.871   211.741
 Unlink        838447     0.845   211.724
 Deltree          120     4.798    14.162
 Mkdir             60     0.003     0.005
 Qpathinfo    3763807     0.011     4.673
 Qfileinfo     660111     0.001     0.400
 Qfsinfo       690141     0.003     0.429
 Sfileinfo     338260     0.005     0.725
 Find         1455273     0.046     6.787
 WriteX       2073307     0.017     5.690
 ReadX        6509193     0.003     1.171
 LockX          13522     0.003     0.077
 UnlockX        13522     0.002     0.125
 Flush         291044     0.811   211.631

Throughput 1089.27 MB/sec (sync dirs)  12 clients  12 procs  max_latency=211.750 ms

(+25.2% throughput, -21.5% max latency)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
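
The percentages quoted in the commit message above can be re-derived from the raw numbers it reports. The following is a minimal sketch, not part of the commit: `pct_change` is a hypothetical helper name, and it assumes the percentages are plain relative changes from the "before" value to the "after" value, rounded to one decimal place.

```shell
#!/bin/bash

# Hypothetical helper (not from the commit): prints the relative change
# from "before" ($1) to "after" ($2) as a signed percentage with one
# decimal place. awk is used because bash has no floating point arithmetic.
pct_change() {
        awk -v before="$1" -v after="$2" \
                'BEGIN { printf "%+.1f\n", (after - before) / before * 100 }'
}

# Total time at btrfs_log_inode(), as summed by the bpftrace script
# (the script divides nanoseconds by 1000, so the unit is microseconds).
echo "btrfs_log_inode() time: $(pct_change 622642 354134)%"

# dbench throughput in MB/sec.
echo "dbench throughput:      $(pct_change 869.867 1089.27)%"

# dbench max_latency in ms.
echo "dbench max latency:     $(pct_change 269.741 211.750)%"
```

Under that assumption the three values come out as -43.1, +25.2 and -21.5, which agrees with the figures given in the commit message.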
tests	btrfs: add optimized btrfs_ino() version for 64 bits systems	2022-07-25 17:45:41 +02:00
acl.c	btrfs: reserve correct number of items for inode creation	2022-05-16 17:03:08 +02:00
async-thread.c	btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue	2022-05-16 17:03:15 +02:00
async-thread.h	btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t	2022-07-25 17:45:36 +02:00
backref.c	btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino	2022-07-25 17:45:36 +02:00
backref.h	btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino	2022-07-25 17:45:36 +02:00
block-group.c	btrfs: delete btrfs_wait_space_cache_v1_finished	2022-09-26 12:27:55 +02:00
block-group.h	btrfs: delete btrfs_wait_space_cache_v1_finished	2022-09-26 12:27:55 +02:00
block-rsv.c	btrfs: use enum for btrfs_block_rsv::type	2022-07-25 17:45:40 +02:00
block-rsv.h	btrfs: use enum for btrfs_block_rsv::type	2022-07-25 17:45:40 +02:00
btrfs_inode.h	btrfs: add optimized btrfs_ino() version for 64 bits systems	2022-07-25 17:45:41 +02:00
check-integrity.c	fs/btrfs: Use the enum req_op and blk_opf_t types	2022-07-14 12:14:32 -06:00
check-integrity.h	btrfs: check-integrity: split submit_bio from btrfsic checking	2022-05-16 17:03:12 +02:00
compression.c	for-5.20-tag	2022-08-03 14:54:52 -07:00
compression.h	for-5.20-tag	2022-08-03 14:54:52 -07:00
ctree.c	btrfs: fix lockdep splat with reloc root extent buffers	2022-08-17 16:19:12 +02:00
ctree.h	btrfs: send: add support for fs-verity	2022-09-26 12:27:55 +02:00
delalloc-space.c	btrfs: convert count_max_extents() to use fs_info->max_extent_size	2022-07-25 17:45:41 +02:00
delalloc-space.h
delayed-inode.c	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
delayed-inode.h	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
delayed-ref.c	btrfs: switch btrfs_block_rsv::full to bool	2022-07-25 17:45:40 +02:00
delayed-ref.h	btrfs: remove btrfs_delayed_extent_op::is_data	2022-05-16 17:17:31 +02:00
dev-replace.c	btrfs: remove lock protection for BLOCK_GROUP_FLAG_TO_COPY	2022-09-26 12:27:54 +02:00
dev-replace.h	btrfs: zoned: mark block groups to copy for device-replace	2021-02-09 02:46:07 +01:00
dir-item.c	btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item	2022-05-16 17:03:07 +02:00
discard.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
discard.h
disk-io.c	btrfs: add lockdep annotations for the ordered extents wait event	2022-09-26 12:27:53 +02:00
disk-io.h	btrfs: move lockdep class helpers to locking.c	2022-08-17 16:19:10 +02:00
export.c
export.h
extent_io.c	btrfs: use atomic_try_cmpxchg in free_extent_buffer	2022-09-26 12:27:55 +02:00
extent_io.h	btrfs: fix repair of compressed extents	2022-07-25 19:56:16 +02:00
extent_map.c	btrfs: assert we have a write lock when removing and replacing extent maps	2022-03-14 13:13:50 +01:00
extent_map.h	btrfs: defrag: don't use merged extent map for their generation check	2022-02-23 17:43:13 +01:00
extent-io-tree.h	btrfs: Convert from invalidatepage to invalidate_folio	2022-03-15 08:23:29 -04:00
extent-tree.c	btrfs: convert block group bit field to use bit helpers	2022-09-26 12:27:54 +02:00
file-item.c	btrfs: rename btrfs_insert_file_extent() to btrfs_insert_hole_extent()	2022-09-26 12:27:54 +02:00
file.c	btrfs: log conflicting inodes without holding log mutex of the initial inode	2022-09-26 12:27:57 +02:00
free-space-cache.c	btrfs: convert block group bit field to use bit helpers	2022-09-26 12:27:54 +02:00
free-space-cache.h	btrfs: change name and type of private member of btrfs_free_space_ctl	2022-01-03 15:09:50 +01:00
free-space-tree.c	btrfs: use rbtree with leftmost node cached for tracking lowest block group	2022-05-16 17:03:13 +02:00
free-space-tree.h
inode-item.c	btrfs: make should_throttle loop local in btrfs_truncate_inode_items	2022-01-07 14:18:25 +01:00
inode-item.h	btrfs: add inode to truncate control	2022-01-07 14:18:24 +01:00
inode.c	btrfs: rename btrfs_insert_file_extent() to btrfs_insert_hole_extent()	2022-09-26 12:27:54 +02:00
ioctl.c	btrfs: use fs_info->max_extent_size in get_extent_max_capacity()	2022-07-25 17:45:41 +02:00
Kconfig	btrfs: use generic Kconfig option for 256kB page size limit	2022-01-20 08:52:55 +02:00
locking.c	btrfs: fix lockdep splat with reloc root extent buffers	2022-08-17 16:19:12 +02:00
locking.h	btrfs: fix lockdep splat with reloc root extent buffers	2022-08-17 16:19:12 +02:00
lzo.c	btrfs: replace kmap() with kmap_local_page() in lzo.c	2022-07-25 17:45:33 +02:00
Makefile	Kbuild: add -Wno-shift-negative-value where -Wextra is used	2022-03-13 17:30:31 +09:00
misc.h	btrfs: use correct header for div_u64 in misc.h	2021-09-07 14:29:50 +02:00
ordered-data.c	btrfs: add lockdep annotations for the ordered extents wait event	2022-09-26 12:27:53 +02:00
ordered-data.h	btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished	2022-07-25 17:45:37 +02:00
orphan.c
print-tree.c	btrfs: unify the error handling pattern for read_tree_block()	2022-03-14 13:13:53 +01:00
print-tree.h	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
props.c	btrfs: move common inode creation code into btrfs_create_new_inode()	2022-05-16 17:03:08 +02:00
props.h	btrfs: move common inode creation code into btrfs_create_new_inode()	2022-05-16 17:03:08 +02:00
qgroup.c	btrfs: avoid blocking on space revervation when doing nowait dio writes	2022-05-16 17:03:10 +02:00
qgroup.h	btrfs: avoid blocking on space revervation when doing nowait dio writes	2022-05-16 17:03:10 +02:00
raid56.c	for-5.20-tag	2022-08-03 14:54:52 -07:00
raid56.h	btrfs: do not return errors from raid56_parity_recover	2022-07-25 17:45:39 +02:00
rcu-string.h
ref-verify.c	btrfs: stop accessing ->extent_root directly	2022-01-03 15:09:49 +01:00
ref-verify.h
reflink.c	btrfs: clean up chained assignments	2022-07-25 17:45:39 +02:00
reflink.h
relocation.c	btrfs: fix lockdep splat with reloc root extent buffers	2022-08-17 16:19:12 +02:00
root-tree.c	btrfs: fix silent failure when deleting root reference	2022-08-23 22:15:21 +02:00
scrub.c	btrfs: scrub: use larger block size for data extent scrub	2022-09-26 12:27:55 +02:00
send.c	btrfs: send: add support for fs-verity	2022-09-26 12:27:55 +02:00
send.h	btrfs: send: add support for fs-verity	2022-09-26 12:27:55 +02:00
space-info.c	btrfs: convert block group bit field to use bit helpers	2022-09-26 12:27:54 +02:00
space-info.h	btrfs: handle space_info setting of bg in btrfs_add_bg_to_space_info	2022-09-26 12:27:54 +02:00
struct-funcs.c	btrfs: remove redundant check in up check_setget_bounds	2022-07-25 17:45:33 +02:00
subpage.c	btrfs: remove extent writepage address space operation	2022-07-25 17:45:37 +02:00
subpage.h	btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine	2022-05-16 17:03:11 +02:00
super.c	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe	2022-08-05 16:32:45 -07:00
sysfs.c	btrfs: sysfs: use sysfs_streq for string matching	2022-09-26 12:27:53 +02:00
sysfs.h
transaction.c	btrfs: add lockdep annotations for pending_ordered wait event	2022-09-26 12:27:53 +02:00
transaction.h	btrfs: pass btrfs_fs_info for deleting snapshots and cleaner	2022-03-14 13:13:52 +01:00
tree-checker.c	btrfs: tree-checker: check for overlapping extent items	2022-08-17 16:20:25 +02:00
tree-checker.h	btrfs: tree-checker: check extent buffer owner against owner rootid	2022-05-16 17:03:09 +02:00
tree-defrag.c	btrfs: remove unnecessary extent root check in btrfs_defrag_leaves	2022-01-03 15:09:48 +01:00
tree-log.c	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
tree-log.h	btrfs: use delayed items when logging a directory	2022-09-26 12:27:57 +02:00
tree-mod-log.c	btrfs: fix race when picking most recent mod log operation for an old root	2021-04-20 19:27:17 +02:00
tree-mod-log.h	btrfs: add and use helper to get lowest sequence number for the tree mod log	2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c	btrfs: drop the _nr from the item helpers	2022-01-03 15:09:43 +01:00
verity.c	btrfs: send: add support for fs-verity	2022-09-26 12:27:55 +02:00
volumes.c	btrfs: remove lock protection for BLOCK_GROUP_FLAG_RELOCATING_REPAIR	2022-09-26 12:27:54 +02:00
volumes.h	btrfs: do not return errors from btrfs_map_bio	2022-07-25 17:45:39 +02:00
xattr.c	btrfs: check if root is readonly while setting security xattr	2022-08-22 18:06:30 +02:00
xattr.h
zlib.c	btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio()	2022-07-25 17:45:41 +02:00
zoned.c	btrfs: convert block group bit field to use bit helpers	2022-09-26 12:27:54 +02:00
zoned.h	btrfs: zoned: activate metadata block group on flush_space	2022-07-25 17:45:42 +02:00
zstd.c	btrfs: zstd: replace kmap() with kmap_local_page()	2022-07-25 17:45:40 +02:00