linux/fs/btrfs
Filipe Manana 732d591a5d btrfs: stop copying old dir items when logging a directory
When logging a directory, we go over every leaf of the subvolume tree that
was changed in the current transaction and copy all its dir index keys to
the log tree.

That includes copying dir index keys created in past transactions. This is
done mostly for simplicity, as after logging the keys we log an item that
specifies the start and end ranges of the keys we logged. That item is
then used during log replay to figure out which keys need to be deleted -
every key in that range that we find in the subvolume tree and is not in
the log tree, needs to be deleted.

Now that we log only dir index keys, and not dir item keys anymore, when
we remove dentries from a directory (due to unlink and rename operations),
we can get entire leaves that we changed only for deleting old dir index
keys, or that have few dir index keys that are new - this is due to the
fact that the offset for new index keys comes from a monotonically
increasing counter.

We can avoid logging dir index keys from past transactions, and in order
to track the deletions, only log range items (BTRFS_DIR_LOG_INDEX_KEY key
type) when we find gaps between consecutive index keys. This massively
reduces the amount of logged metadata when we have deleted directory
entries, even if it's a small percentage of the total number of entries.
The reduction comes from both less items that are logged and instead of
logging many dir index items (struct btrfs_dir_item), which have a size
of 30 bytes plus a file name, we typically log just a few range items
(struct btrfs_dir_log_item), which take only 8 bytes each.

Even if no entries were deleted from a directory and only new entries
were added, we typically still get a reduction on the amount of logged
metadata, because it's very likely the first leaf that got the new
dir index entries also has several old dir index entries.

So change the logging logic to not log dir index keys created in past
transactions and log a range item for every gap it finds between each
pair of consecutive index keys, to ensure deletions are tracked and
replayed on log replay.

This patch is part of a patchset comprised of the following patches:

 1/4 btrfs: don't log unnecessary boundary keys when logging directory
 2/4 btrfs: put initial index value of a directory in a constant
 3/4 btrfs: stop copying old dir items when logging a directory
 4/4 btrfs: stop trying to log subdirectories created in past transactions

The following test was run on a branch without this patchset and on a
branch with the first three patches applied:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1

  NUM_FILES=1000000
  NUM_FILE_DELETES=10000

  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  MOUNT_OPTIONS="-o ssd"

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= $NUM_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  sync

  del_inc=$(( $NUM_FILES / $NUM_FILE_DELETES ))
  for ((i = 1; i <= $NUM_FILES; i += $del_inc)); do
      rm -f $MNT/testdir/file_$i
  done

  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
  echo

  umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config),
and the results were the following for various values of NUM_FILES and
NUM_FILE_DELETES:

** before, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 585 ms after deleting 10000 files

** after, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 34 ms after deleting 10000 files   (-94.2%)

** before, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 50 ms after deleting 1000 files

** after, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 7 ms after deleting 1000 files    (-86.0%)

** before, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 9 ms after deleting 100 files

** after, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 5 ms after deleting 100 files     (-44.4%)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:46 +01:00
..
tests btrfs: selftests: dump extent io tree if extent-io-tree test failed 2022-01-07 14:18:27 +01:00
acl.c overlayfs update for 5.15 2021-09-02 09:21:27 -07:00
async-thread.c btrfs: fix memory ordering between normal and ordered work functions 2021-11-16 16:50:23 +01:00
async-thread.h
backref.c btrfs: stop accessing ->extent_root directly 2022-01-03 15:09:49 +01:00
backref.h btrfs: remove ignore_offset argument from btrfs_find_all_roots() 2021-08-23 13:19:01 +02:00
block-group.c btrfs: skip reserved bytes warning on unmount after log cleanup failure 2022-01-31 16:06:50 +01:00
block-group.h btrfs: fix deadlock between chunk allocation and chunk btree modifications 2021-10-26 19:08:07 +02:00
block-rsv.c btrfs: reserve extra space for the free space tree 2022-01-07 14:18:25 +01:00
block-rsv.h btrfs: init root block_rsv at init root time 2022-01-03 15:09:48 +01:00
btrfs_inode.h btrfs: put initial index value of a directory in a constant 2022-03-14 13:13:46 +01:00
check-integrity.c btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state 2021-10-26 19:08:07 +02:00
check-integrity.h
compression.c btrfs: remove unnecessary parameter type from compression_decompress_bio 2022-01-07 14:18:27 +01:00
compression.h btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write 2021-10-26 19:08:04 +02:00
ctree.c btrfs: refactor unlock_up 2022-01-07 14:18:26 +01:00
ctree.h btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
delalloc-space.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
delalloc-space.h
delayed-inode.c btrfs: add an inode-item.h 2022-01-07 14:18:23 +01:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: reserve extra space for the free space tree 2022-01-07 14:18:25 +01:00
delayed-ref.h btrfs: make btrfs_ref::real_root optional 2021-10-26 19:08:06 +02:00
dev-replace.c btrfs: remove reada infrastructure 2022-01-07 14:18:26 +01:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
disk-io.h btrfs: track the csum, extent, and free space trees in a rb tree 2022-01-03 15:09:50 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h
extent_io.c for-5.17-rc6-tag 2022-03-06 12:19:36 -08:00
extent_io.h btrfs: cleanup for extent_write_locked_range() 2021-10-26 19:08:04 +02:00
extent_map.c btrfs: defrag: don't use merged extent map for their generation check 2022-02-23 17:43:13 +01:00
extent_map.h btrfs: defrag: don't use merged extent map for their generation check 2022-02-23 17:43:13 +01:00
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
file-item.c btrfs: stop accessing ->csum_root directly 2022-01-03 15:09:49 +01:00
file.c btrfs: reduce extent threshold for autodefrag 2022-02-24 16:11:28 +01:00
free-space-cache.c btrfs: add inode to truncate control 2022-01-07 14:18:24 +01:00
free-space-cache.h btrfs: change name and type of private member of btrfs_free_space_ctl 2022-01-03 15:09:50 +01:00
free-space-tree.c btrfs: track the csum, extent, and free space trees in a rb tree 2022-01-03 15:09:50 +01:00
free-space-tree.h
inode-item.c btrfs: make should_throttle loop local in btrfs_truncate_inode_items 2022-01-07 14:18:25 +01:00
inode-item.h btrfs: add inode to truncate control 2022-01-07 14:18:24 +01:00
inode.c btrfs: put initial index value of a directory in a constant 2022-03-14 13:13:46 +01:00
ioctl.c btrfs: reuse existing pointers from btrfs_ioctl 2022-03-14 13:13:46 +01:00
Kconfig btrfs: use generic Kconfig option for 256kB page size limit 2022-01-20 08:52:55 +02:00
locking.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
locking.h btrfs: assert that extent buffers are write locked instead of only locked 2021-10-26 19:08:02 +02:00
lzo.c btrfs: prevent copying too big compressed lzo segment 2022-02-15 19:59:09 +01:00
Makefile btrfs: remove reada infrastructure 2022-01-07 14:18:26 +01:00
misc.h btrfs: use correct header for div_u64 in misc.h 2021-09-07 14:29:50 +02:00
ordered-data.c btrfs: zoned: fix double counting of split ordered extent 2021-09-07 14:30:41 +02:00
ordered-data.h btrfs: remove uptodate parameter from btrfs_dec_test_first_ordered_pending 2021-08-23 13:19:02 +02:00
orphan.c
print-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
props.h
qgroup.c btrfs: qgroup: fix deadlock between rescan worker and remove qgroup 2022-03-02 16:53:04 +01:00
qgroup.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
raid56.c btrfs: remove btrfs_raid_bio::fs_info member 2021-10-26 19:08:03 +02:00
raid56.h btrfs: remove btrfs_raid_bio::fs_info member 2021-10-26 19:08:03 +02:00
rcu-string.h
ref-verify.c btrfs: stop accessing ->extent_root directly 2022-01-03 15:09:49 +01:00
ref-verify.h
reflink.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
reflink.h
relocation.c btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
root-tree.c btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
scrub.c btrfs: scrub: cleanup the argument list of scrub_stripe() 2022-01-07 14:18:27 +01:00
send.c btrfs: send: in case of IO error log it 2022-02-09 18:53:26 +01:00
send.h btrfs: send: prepare for v2 protocol 2021-10-29 12:38:43 +02:00
space-info.c btrfs: fix argument list that the kdoc format and script verified 2022-01-07 14:18:27 +01:00
space-info.h btrfs: change root to fs_info for btrfs_reserve_metadata_bytes 2022-01-03 15:09:45 +01:00
struct-funcs.c btrfs: add special case to setget helpers for 64k pages 2021-08-23 13:18:58 +02:00
subpage.c btrfs: subpage: fix a wrong check on subpage->writers 2022-03-02 16:51:39 +01:00
subpage.h btrfs: rework page locking in __extent_writepage() 2021-10-26 19:08:05 +02:00
super.c mm: remove cleancache 2022-01-22 08:33:38 +02:00
sysfs.c btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device 2022-01-07 14:18:25 +01:00
sysfs.h
transaction.c btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() 2022-03-02 16:52:46 +01:00
transaction.h btrfs: do not start relocation until in progress drops are done 2022-03-02 16:52:39 +01:00
tree-checker.c btrfs: tree-checker: use u64 for item data end to avoid overflow 2022-03-02 16:52:32 +01:00
tree-checker.h
tree-defrag.c btrfs: remove unnecessary extent root check in btrfs_defrag_leaves 2022-01-03 15:09:48 +01:00
tree-log.c btrfs: stop copying old dir items when logging a directory 2022-03-14 13:13:46 +01:00
tree-log.h btrfs: change error handling for btrfs_delete_*_in_log 2021-10-26 19:08:05 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
verity.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
volumes.c btrfs: remove reada infrastructure 2022-01-07 14:18:26 +01:00
volumes.h btrfs: remove reada infrastructure 2022-01-07 14:18:26 +01:00
xattr.c btrfs: drop the _nr from the item helpers 2022-01-03 15:09:43 +01:00
xattr.h
zlib.c Revert "btrfs: compression: drop kmap/kunmap from zlib" 2021-10-29 13:03:05 +02:00
zoned.c btrfs: zoned: fix chunk allocation condition for zoned allocator 2022-01-07 14:18:26 +01:00
zoned.h btrfs: zoned: fix chunk allocation condition for zoned allocator 2022-01-07 14:18:26 +01:00
zstd.c lib: zstd: Add kernel-specific API 2021-11-08 16:55:21 -08:00