linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-13 22:14:20 +08:00


Filipe Manana c65ca98f9e btrfs: unlock path before checking if extent is shared during nocow writeback

When we are attempting to start writeback for an existing extent in NOCOW
mode, at run_delalloc_nocow(), we must check if the extent is shared, and
if it is, fall back to a COW write. However we do such check while still
holding a read lock on the leaf that contains the file extent item, and
that check, the call to btrfs_cross_ref_exist(), can take some time
because:

1) It needs to do a search on the extent tree, which obviously takes some
   time, especially if delayed references are being run at the moment, as
   we can block when trying to lock currently write locked btree nodes;

2) It needs to check the delayed references for any existing reference
   for our data extent, this requires acquiring the delayed references'
   spinlock and maybe block on the mutex of a delayed reference head in the
   case where there is a delayed reference for our data extent, in the
   worst case it makes us release the path on the extent tree and retry
   the whole process again (going back to step 1).

There are other operations we do while holding the leaf locked that can
take some significant time as well (especially all together):

* btrfs_extent_readonly() - to check if the block group containing the
  extent is currently in RO mode. This requires taking a spinlock and
  searching for the block group in a rbtree that can be big on large
  filesystems;

* csum_exist_in_range() - to search if there are any checksums in the
  csum tree for the extent. Like before, this can take some time if we are
  in a filesystem that has both COW and NOCOW files, in which case the
  csum tree is not empty;

* btrfs_inc_nocow_writers() - increment the number of nocow writers in the
  block group that contains the data extent. Needs to acquire a spinlock
  and search for the block group in a rbtree that can be big on large
  filesystems.

So just unlock the leaf (release the path) before doing all those checks,
since we do not need it anymore. In case we cannot do a NOCOW write for
the extent, due to any of those checks failing, and the writeback range
goes beyond that extent's length, we will do another btree search for the
next file extent item.

The following script that calls dbench was used to measure the impact of
this change on a VM with 8 CPUs, 16GB of RAM, using a raw NVMe device
directly (no intermediary filesystem on the host) and using a non-debug
kernel (default configuration on Debian):

    $ cat test-dbench.sh
    #!/bin/bash

    DEV=/dev/sdk
    MNT=/mnt/sdk
    MOUNT_OPTIONS="-o ssd -o nodatacow"
    MKFS_OPTIONS="-m single -d single"

    mkfs.btrfs -f $MKFS_OPTIONS $DEV
    mount $MOUNT_OPTIONS $DEV $MNT

    dbench -D $MNT -t 300 64

    umount $MNT

Before this change:

    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    NTCreateX    9326331     0.317   399.957
    Close        6851198     0.002     6.402
    Rename        394894     2.621   402.819
    Unlink       1883131     0.931   398.082
    Deltree          256    19.160   303.580
    Mkdir            128     0.003     0.016
    Qpathinfo    8452314     0.068   116.133
    Qfileinfo    1481921     0.001     5.081
    Qfsinfo      1549963     0.002     4.444
    Sfileinfo     759679     0.084    17.079
    Find         3268168     0.396   118.196
    WriteX       4653310     0.056   110.993
    ReadX       14618818     0.005    23.314
    LockX          30364     0.003     0.497
    UnlockX        30364     0.002     1.720
    Flush         653619    16.954   569.299

    Throughput 966.651 MB/sec  64 clients  64 procs  max_latency=569.377 ms

After this change:

    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    NTCreateX    9710433     0.302   232.449
    Close        7132948     0.002    11.496
    Rename        411144     2.452   131.805
    Unlink       1960961     0.893   230.383
    Deltree          256    14.858   198.646
    Mkdir            128     0.002     0.005
    Qpathinfo    8800890     0.066   111.588
    Qfileinfo    1542556     0.001     3.852
    Qfsinfo      1613835     0.002     5.483
    Sfileinfo     790871     0.081    19.492
    Find         3402743     0.386   120.185
    WriteX       4842918     0.054   179.312
    ReadX       15220407     0.005    32.435
    LockX          31612     0.003     1.533
    UnlockX        31612     0.002     1.047
    Flush         680567    16.320   463.323

    Throughput 1016.59 MB/sec  64 clients  64 procs  max_latency=463.327 ms

+5.0% throughput, -20.5% max latency

Also, the following test using fio was run:

    $ cat test-fio.sh
    #!/bin/bash

    DEV=/dev/sdk
    MNT=/mnt/sdk
    MOUNT_OPTIONS="-o ssd -o nodatacow"
    MKFS_OPTIONS="-d single -m single"

    if [ $# -ne 4 ]; then
        echo "Use $0 NUM_JOBS FILE_SIZE FSYNC_FREQ BLOCK_SIZE"
        exit 1
    fi

    NUM_JOBS=$1
    FILE_SIZE=$2
    FSYNC_FREQ=$3
    BLOCK_SIZE=$4

    cat <<EOF > /tmp/fio-job.ini
    [writers]
    rw=randwrite
    fsync=$FSYNC_FREQ
    fallocate=none
    group_reporting=1
    direct=0
    bs=$BLOCK_SIZE
    ioengine=sync
    size=$FILE_SIZE
    directory=$MNT
    numjobs=$NUM_JOBS
    EOF

    echo
    echo "Using fio config:"
    echo
    cat /tmp/fio-job.ini
    echo
    echo "mount options: $MOUNT_OPTIONS"
    echo

    mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
    mount $MOUNT_OPTIONS $DEV $MNT

    echo "Creating nodatacow files before fio runs..."
    for ((i = 0; i < $NUM_JOBS; i++)); do
        xfs_io -f -c "pwrite -b 128M 0 $FILE_SIZE" "$MNT/writers.$i.0"
    done
    sync

    fio /tmp/fio-job.ini
    umount $MNT

Before this change:

    $ ./test-fio.sh 16 512M 2 4K
    (...)
    WRITE: bw=28.3MiB/s (29.6MB/s), 28.3MiB/s-28.3MiB/s (29.6MB/s-29.6MB/s), io=8192MiB (8590MB), run=289800-289800msec

After this change:

    $ ./test-fio.sh 16 512M 2 4K
    (...)
    WRITE: bw=31.2MiB/s (32.7MB/s), 31.2MiB/s-31.2MiB/s (32.7MB/s-32.7MB/s), io=8192MiB (8590MB), run=262845-262845msec

+9.7% throughput, -9.8% runtime

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2020-12-08 15:54:15 +01:00
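
The commit message does not say how its summary percentages were derived. As a rough cross-check, the sketch below assumes they are symmetric percent differences, that is, the change divided by the mean of the before and after values. That formula is an inference from the quoted numbers, not something the commit states, and the helper name `symmetric_pct_change` is hypothetical.

```python
def symmetric_pct_change(before: float, after: float) -> float:
    """Percent change of `after` relative to the mean of both values.

    Assumption: this is the formula behind the commit's summary lines;
    the commit message itself does not state it.
    """
    mean = (before + after) / 2.0
    return (after - before) / mean * 100.0


# (label, before, after), copied from the results quoted in the commit message
results = [
    ("dbench throughput (MB/sec)", 966.651, 1016.59),
    ("dbench max latency (ms)", 569.377, 463.327),
    ("fio write bandwidth (MiB/s)", 28.3, 31.2),
    ("fio runtime (msec)", 289800, 262845),
]

for label, before, after in results:
    print(f"{label}: {symmetric_pct_change(before, after):+.1f}%")
```

Under that assumption the four values come out as +5.0%, -20.5%, +9.7% and -9.8%, matching the summary lines in the commit message. A plain "change relative to the before value" formula would give slightly different figures (for example about +5.2% for dbench throughput).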
..
9p	fs: 9p: add generic splice_write file operation	2020-12-01 21:40:47 +01:00
adfs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
affs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
afs	afs: Fix speculative status fetch going out of order wrt to modifications	2020-11-22 11:27:03 -08:00
autofs	autofs: harden ioctl table	2020-10-16 11:11:22 -07:00
befs	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
bfs	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
btrfs	btrfs: unlock path before checking if extent is shared during nocow writeback	2020-12-08 15:54:15 +01:00
cachefiles	cachefiles: Handle readpage error correctly	2020-10-26 10:42:54 -07:00
ceph	ceph: check session state after bumping session->s_seq	2020-11-04 20:55:49 +01:00
cifs	cifs: refactor create_sd_buf() and and avoid corrupting the buffer	2020-12-03 17:12:14 -06:00
coda	docs: filesystems: convert coda.txt to ReST	2020-05-05 09:22:21 -06:00
configfs	fs: configfs: delete repeated words in comments	2020-10-16 11:11:19 -07:00
cramfs	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
crypto	fscrypt: fix inline encryption not used on new files	2020-11-11 20:59:07 -08:00
debugfs	debugfs: remove return value of debugfs_create_devm_seqfile()	2020-10-30 08:37:39 +01:00
devpts
dlm	networking changes for the 5.10 merge window	2020-10-15 18:42:13 -07:00
ecryptfs	mm, treewide: rename kzfree() to kfree_sensitive()	2020-08-07 11:33:22 -07:00
efivarfs	efivarfs: revert "fix memory leak in efivarfs_create()"	2020-11-25 16:55:02 +01:00
efs	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
erofs	erofs: fix setting up pcluster for temporary pages	2020-11-04 09:15:48 +08:00
exfat	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
exportfs
ext2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
ext4	ext4: fix bogus warning in ext4_update_dx_flag()	2020-11-19 22:41:10 -05:00
f2fs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
fat	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
freevxfs
fscache	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-06-03 16:27:18 -07:00
fuse	fuse update for 5.10	2020-10-19 14:28:30 -07:00
gfs2	gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func	2020-12-01 00:21:10 +01:00
hfs	fs: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
hfsplus	fs: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
hostfs	hostfs: Use kasprintf() instead of fixed buffer formatting	2020-03-29 23:23:00 +02:00
hpfs	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
hugetlbfs	hugetlbfs: prevent filesystem stacking of hugetlbfs	2020-08-12 10:57:56 -07:00
iomap	iomap: clean up writeback state logic on writepage error	2020-11-04 08:52:46 -08:00
isofs	fs: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
jbd2	jbd2: fix kernel-doc markups	2020-11-19 22:38:29 -05:00
jffs2	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
jfs	fs: Introduce i_blocks_per_page	2020-09-21 08:59:26 -07:00
kernfs	fsnotify: pass dir and inode arguments to fsnotify()	2020-07-27 23:15:48 +02:00
lockd	The one new feature this time, from Anna Schumaker, is READ_PLUS, which	2020-10-22 09:44:27 -07:00
minix	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
nfs	NFS: Remove unnecessary inode lock in nfs_fsync_dir()	2020-11-12 10:41:26 -05:00
nfs_common	NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy	2020-10-21 10:31:20 -04:00
nfsd	NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy	2020-11-05 17:25:14 -05:00
nilfs2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
nls	treewide: replace '---help---' in Kconfig files with 'help'	2020-06-14 01:57:21 +09:00
notify	fanotify: fix logic of reporting name info with watched parent	2020-11-09 15:03:08 +01:00
ntfs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
ocfs2	ocfs2: initialize ip_next_orphan	2020-11-14 11:26:04 -08:00
omfs	fs: omfs: use kmemdup() rather than kmalloc+memcpy	2020-09-22 23:39:45 -04:00
openpromfs
orangefs	orangefs: remove unnecessary assignment to variable ret	2020-08-04 15:01:58 -04:00
overlayfs	ovl: use generic vfs_ioc_setflags_prepare() helper	2020-10-06 15:38:15 +02:00
proc	io_uring-5.10-2020-11-20	2020-11-20 11:47:22 -08:00
pstore	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
qnx4	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
qnx6	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
quota	\n	2020-10-15 14:56:15 -07:00
ramfs	ramfs: fix nommu mmap with gaps in the page cache	2020-10-16 11:11:22 -07:00
reiserfs	reiserfs: Fix oops during mount	2020-10-01 11:15:31 +02:00
romfs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
squashfs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
sysfs	sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output	2020-10-02 12:02:30 +02:00
sysv	[PATCH] reduce boilerplate in fsid handling	2020-09-18 16:45:50 -04:00
tracefs
ubifs	This pull request contains fixes for UBI and UBIFS	2020-10-18 09:56:50 -07:00
udf	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
ufs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
unicode	unicode: Add utf8_casefold_hash	2020-09-10 14:03:31 -07:00
vboxsf	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2020-10-15 15:11:56 -07:00
verity	fs-verity: use smp_load_acquire() for ->i_verity_info	2020-07-21 16:02:41 -07:00
xfs	xfs: revert "xfs: fix rmap key and record comparison functions"	2020-11-19 15:17:50 -08:00
zonefs	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
aio.c	vfs: separate __sb_start_write into blocking and non-blocking helpers	2020-11-10 16:53:07 -08:00
anon_inodes.c
attr.c
bad_inode.c	fs: move the fiemap definitions out of fs.h	2020-06-03 23:16:55 -04:00
binfmt_aout.c	exec: Rename flush_old_exec begin_new_exec	2020-05-07 16:55:47 -05:00
binfmt_elf_fdpic.c	binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot	2020-10-16 11:11:21 -07:00
binfmt_elf.c	fs: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
binfmt_em86.c	Merge branch 'akpm' (patches from Andrew)	2020-06-04 19:18:29 -07:00
binfmt_flat.c	binfmt_flat: revert "binfmt_flat: don't offset the data start"	2020-08-24 08:49:13 +10:00
binfmt_misc.c	Merge branch 'akpm' (patches from Andrew)	2020-06-04 19:18:29 -07:00
binfmt_script.c	Merge branch 'akpm' (patches from Andrew)	2020-06-04 19:18:29 -07:00
block_dev.c	block: add a bdget_part helper	2020-10-05 10:38:33 -06:00
buffer.c	mm, memcg: rework remote charging API to support nesting	2020-10-18 09:27:09 -07:00
char_dev.c	vfs: allow unprivileged whiteout creation	2020-05-14 16:44:23 +02:00
compat_binfmt_elf.c	Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK	2020-06-05 13:45:21 -07:00
coredump.c	coredump: fix core_pattern parse error	2020-12-06 10:19:07 -08:00
d_path.c	fs: fix NULL dereference due to data race in prepend_path()	2020-10-14 14:54:45 -07:00
dax.c	fuse update for 5.10	2020-10-19 14:28:30 -07:00
dcache.c	vfs: Use sequence counter with associated spinlock	2020-07-29 16:14:27 +02:00
dcookies.c
direct-io.c	\n	2020-10-15 15:03:10 -07:00
drop_caches.c	sysctl: pass kernel pointers to ->proc_handler	2020-04-27 02:07:40 -04:00
eventfd.c	eventfd: convert to f_op->read_iter()	2020-05-06 22:33:43 -04:00
eventpoll.c	ep_create_wakeup_source(): dentry name can change under you...	2020-09-24 19:41:58 -04:00
exec.c	powerpc updates for 5.10	2020-10-16 12:21:15 -07:00
fcntl.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
fhandle.c
file_table.c	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
file.c	io_uring: don't rely on weak ->files references	2020-09-30 20:32:32 -06:00
filesystems.c	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()	2020-04-10 15:36:22 -07:00
fs_context.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
fs_parser.c	fs_parse: mark fs_param_bad_value() as static	2020-10-13 18:38:27 -07:00
fs_pin.c
fs_struct.c	vfs: Use sequence counter with associated spinlock	2020-07-29 16:14:27 +02:00
fs_types.c
fs-writeback.c	block-5.10-2020-10-12	2020-10-13 12:12:44 -07:00
fsopen.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
init.c	init: add an init_dup helper	2020-08-04 21:02:38 -04:00
inode.c	fs: add a filesystem flag for THPs	2020-10-16 11:11:15 -07:00
internal.h	fs: remove compat_sys_mount	2020-09-22 23:45:57 -04:00
io_uring.c	io_uring: fix recvmsg setup with compat buf-select	2020-11-30 11:12:03 -07:00
io-wq.c	io-wq: cancel request if it's asking for files and we don't have them	2020-11-04 10:22:56 -07:00
io-wq.h	io_uring: unify fsize with def->work_flags	2020-10-20 16:03:13 -06:00
ioctl.c	fs: remove ksys_ioctl	2020-07-31 08:16:01 +02:00
Kconfig	tmpfs: support 64-bit inums per-sb	2020-08-07 11:33:24 -07:00
Kconfig.binfmt	treewide: replace '---help---' in Kconfig files with 'help'	2020-06-14 01:57:21 +09:00
kernel_read_file.c	fs/kernel_file_read: Add "offset" arg for partial reads	2020-10-05 13:37:04 +02:00
libfs.c	libfs: fix error cast of negative value in simple_attr_write()	2020-11-22 10:48:22 -08:00
locks.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
Makefile	Refactored code for 5.10:	2020-10-23 11:33:41 -07:00
mbcache.c
mount.h	proc/mounts: add cursor	2020-05-14 16:44:24 +02:00
mpage.c	fs: convert mpage_readpages to mpage_readahead	2020-06-02 10:59:07 -07:00
namei.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
namespace.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-24 12:26:05 -07:00
no-block.c
nsfs.c	nsproxy: attach to namespaces via pidfds	2020-05-13 11:41:22 +02:00
open.c	exec: move S_ISREG() check earlier	2020-08-12 10:58:01 -07:00
pipe.c	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-11 11:11:35 -07:00
pnode.c	propagate_one(): mnt_set_mountpoint() needs mount_lock	2020-04-27 10:37:14 -04:00
pnode.h
posix_acl.c	vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK	2020-06-08 11:04:19 -07:00
proc_namespace.c	Add a "nosymfollow" mount option.	2020-08-27 16:06:47 -04:00
read_write.c	Refactored code for 5.10:	2020-10-23 11:33:41 -07:00
readdir.c	fs: remove ksys_getdents64	2020-07-31 08:16:00 +02:00
remap_range.c	vfs: move the remap range helpers to remap_range.c	2020-10-15 09:48:49 -07:00
select.c	fs: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
seq_file.c	seq_file: add seq_read_iter	2020-11-06 10:05:18 -08:00
signalfd.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
splice.c	io_uring-5.10-2020-10-24	2020-10-24 12:40:18 -07:00
stack.c
stat.c	fs: remove KSTAT_QUERY_FLAGS	2020-09-26 22:55:05 -04:00
statfs.c	Add a "nosymfollow" mount option.	2020-08-27 16:06:47 -04:00
super.c	vfs: move __sb_{start,end}_write* to fs.h	2020-11-10 16:53:11 -08:00
sync.c	overlayfs update for 5.8	2020-06-09 15:40:50 -07:00
timerfd.c
userfaultfd.c	mm: remove the now-unnecessary mmget_still_valid() hack	2020-10-16 11:11:22 -07:00
utimes.c	fs: expose utimes_common	2020-07-31 08:16:01 +02:00
xattr.c	fs/xattr.c: fix kernel-doc warnings for setxattr & removexattr	2020-10-13 18:38:27 -07:00