linux/fs
Wu Fengguang 424b351fe1 writeback: refill b_io iff empty
There is no point to carry different refill policies between for_kupdate
and other type of works. Use a consistent "refill b_io iff empty" policy
which can guarantee fairness in an easy to understand way.

A b_io refill will setup a _fixed_ work set with all currently eligible
inodes and start a new round of walk through b_io. The "fixed" work set
means no new inodes will be added to the work set during the walk.
Only when a complete walk over b_io is done, new inodes that are
eligible at the time will be enqueued and the walk be started over.

This procedure provides fairness among the inodes because it guarantees
each inode to be synced once and only once at each round. So all inodes
will be free from starvations.

This change relies on wb_writeback() to keep retrying as long as we made
some progress on cleaning some pages and/or inodes. Without that ability,
the old logic on background works relies on aggressively queuing all
eligible inodes into b_io at every time. But that's not a guarantee.

The below test script completes a slightly faster now:

             2.6.39-rc3	  2.6.39-rc3-dyn-expire+
------------------------------------------------
all elapsed     256.043      252.367
stddev           24.381       12.530

tar elapsed      30.097       28.808
dd  elapsed      13.214       11.782

	#!/bin/zsh

	cp /c/linux-2.6.38.3.tar.bz2 /dev/shm/

	umount /dev/sda7
	mkfs.xfs -f /dev/sda7
	mount /dev/sda7 /fs

	echo 3 > /proc/sys/vm/drop_caches

	tic=$(cat /proc/uptime|cut -d' ' -f2)

	cd /fs
	time tar jxf /dev/shm/linux-2.6.38.3.tar.bz2 &
	time dd if=/dev/zero of=/fs/zero bs=1M count=1000 &

	wait
	sync
	tac=$(cat /proc/uptime|cut -d' ' -f2)
	echo elapsed: $((tac - tic))

It maintains roughly the same small vs. large file writeout shares, and
offers large files better chances to be written in nice 4M chunks.

Analyzes from Dave Chinner in great details:

Let's say we have lots of inodes with 100 dirty pages being created,
and one large writeback going on. We expire 8 new inodes for every
1024 pages we write back.

With the old code, we do:

	b_more_io (large inode) -> b_io (1l)
	8 newly expired inodes -> b_io (1l, 8s)

	writeback  large inode 1024 pages -> b_more_io

	b_more_io (large inode) -> b_io (8s, 1l)
	8 newly expired inodes -> b_io (8s, 1l, 8s)

	writeback  8 small inodes 800 pages
		   1 large inode 224 pages -> b_more_io

	b_more_io (large inode) -> b_io (8s, 1l)
	8 newly expired inodes -> b_io (8s, 1l, 8s)
	.....

Your new code:

	b_more_io (large inode) -> b_io (1l)
	8 newly expired inodes -> b_io (1l, 8s)

	writeback  large inode 1024 pages -> b_more_io
	(b_io == 8s)
	writeback  8 small inodes 800 pages

	b_io empty: (1800 pages written)
		b_more_io (large inode) -> b_io (1l)
		14 newly expired inodes -> b_io (1l, 14s)

	writeback  large inode 1024 pages -> b_more_io
	(b_io == 14s)
	writeback  10 small inodes 1000 pages
		   1 small inode 24 pages -> b_more_io (1l, 1s(24))
	writeback  5 small inodes 500 pages
	b_io empty: (2548 pages written)
		b_more_io (large inode) -> b_io (1l, 1s(24))
		20 newly expired inodes -> b_io (1l, 1s(24), 20s)
	......

Rough progression of pages written at b_io refill:

Old code:

	total	large file	% of writeback
	1024	224		21.9% (fixed)

New code:
	total	large file	% of writeback
	1800	1024		~55%
	2550	1024		~40%
	3050	1024		~33%
	3500	1024		~29%
	3950	1024		~26%
	4250	1024		~24%
	4500	1024		~22.7%
	4700	1024		~21.7%
	4800	1024		~21.3%
	4800	1024		~21.3%
	(pretty much steady state from here)

Ok, so the steady state is reached with a similar percentage of
writeback to the large file as the existing code. Ok, that's good,
but providing some evidence that is doesn't change the shared of
writeback to the large should be in the commit message ;)

The other advantage to this is that we always write 1024 page chunks
to the large file, rather than smaller "whatever remains" chunks.

CC: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-06-08 08:25:21 +08:00
..
9p 9p: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
adfs Fix common misspellings 2011-03-31 11:26:23 -03:00
affs affs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
afs afs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
autofs4 autofs4: bogus dentry_unhash() added in ->unlink() 2011-05-30 01:50:53 -04:00
befs Fix common misspellings 2011-03-31 11:26:23 -03:00
bfs bfs: remove unnecessary dentry_unhash on dir rename 2011-05-28 01:02:50 -04:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2011-06-05 06:17:23 +09:00
cachefiles Fix common misspellings 2011-03-31 11:26:23 -03:00
ceph ceph: remove unnecessary dentry_unhash calls 2011-05-26 07:26:53 -04:00
cifs cifs/ubifs: Fix shrinker API change fallout 2011-05-29 11:17:34 -07:00
coda coda: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
configfs configfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
cramfs cramfs: generate unique inode number for better inode cache usage 2011-01-13 08:03:23 -08:00
debugfs debugfs: move to new strtobool 2011-05-19 16:55:28 +09:30
devpts fs/devpts/inode.c: correctly check d_alloc_name() return code in devpts_pty_new() 2011-03-22 17:44:17 -07:00
dlm Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
ecryptfs eCryptfs: Remove ecryptfs_header_cache_2 2011-05-29 14:24:25 -05:00
efs block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
exofs exofs: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:57 -04:00
exportfs vfs: Add open by file handle support 2011-03-15 02:21:44 -04:00
ext2 ext2: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:56 -04:00
ext3 fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
ext4 writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage 2011-06-08 08:25:20 +08:00
fat fat: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
freevxfs treewide: fix a few typos in comments 2011-05-10 10:16:21 +02:00
fscache fscache: remove dead code under CONFIG_WORKQUEUE_DEBUGFS 2011-05-25 08:39:44 -07:00
fuse fuse: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
gfs2 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
hfs hfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hfsplus hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hostfs hostfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hpfs hpfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
hppfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
hugetlbfs mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
isofs Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
jbd jbd: Fix comment to match the code in journal_start() 2011-05-24 00:27:53 +02:00
jbd2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2011-05-26 09:53:20 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-28 13:03:41 -07:00
jfs jfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
lockd NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" 2011-01-25 15:24:47 -05:00
logfs logfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
minix minix: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
ncpfs ncpfs: fix rename over directory with dangling references 2011-05-28 01:02:53 -04:00
nfs Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd 2011-05-29 14:10:13 -07:00
nfs_common Fix common misspellings 2011-03-31 11:26:23 -03:00
nfsd Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux 2011-05-29 11:21:12 -07:00
nilfs2 nilfs2: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
nls
notify Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
ntfs Fix common misspellings 2011-03-31 11:26:23 -03:00
ocfs2 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 2011-05-27 10:23:10 -07:00
omfs omfs: remove unnecessary dentry_unhash on rmdir, dir rneame 2011-05-28 01:02:52 -04:00
openpromfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
partitions Revert "block: Remove extra discard_alignment from hd_struct." 2011-05-30 07:42:51 +02:00
proc Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2011-05-29 11:29:28 -07:00
pstore pstore: fix pstore filesystem mount/remount issue 2011-05-16 11:05:00 -07:00
qnx4 block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
quota vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
ramfs ramfs: fix memleak on no-mmu arch 2011-04-14 16:06:56 -07:00
reiserfs reiserfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
romfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2011-05-29 11:19:45 -07:00
sysfs sysfs: remove "last sysfs file:" line from the oops messages 2011-05-13 16:05:51 -07:00
sysv sysv: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:50 -04:00
ubifs UBIFS: fix-up free space earlier 2011-06-03 18:12:31 +03:00
udf udf: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:52 -04:00
ufs ufs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
xfs fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
aio.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
anon_inodes.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
attr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00
bad_inode.c fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c brk: COMPAT_BRK: fix detection of randomized brk 2011-04-14 16:06:55 -07:00
binfmt_em86.c
binfmt_flat.c CRED: Fix load_flat_shared_library() to initialise bprm correctly 2011-05-03 10:10:51 +10:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Require subsystems to explicitly allocate bio_set integrity mempool 2011-03-17 11:11:05 +01:00
bio.c block: improve the bio_add_page() and bio_add_pc_page() descriptions 2011-05-28 14:44:46 +02:00
block_dev.c block: blkdev_get() should access ->bd_disk only after success 2011-06-01 08:28:47 +02:00
buffer.c fs: block_page_mkwrite should wait for writeback to finish 2011-05-28 01:03:21 -04:00
char_dev.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 2011-01-07 14:39:20 -08:00
compat.c exec: unify do_execve/compat_do_execve code 2011-04-09 15:53:56 +02:00
dcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
dcookies.c
direct-io.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
drop_caches.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
eventfd.c Docbook: add fs/eventfd.c and fix typos in it 2011-02-21 15:07:04 -08:00
eventpoll.c Fix common misspellings 2011-03-31 11:26:23 -03:00
exec.c coredump: add support for exe_file in core name 2011-05-26 17:12:36 -07:00
fcntl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
fhandle.c fs/fhandle.c: add <linux/personality.h> for ia64 2011-04-14 16:06:56 -07:00
fifo.c Filesystem: fifo: Fixed coding style issue. 2011-03-21 00:16:09 -04:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-16 13:26:17 -07:00
file.c vfs: avoid large kmalloc()s for the fdtable 2011-04-28 11:28:20 -07:00
filesystems.c fs: synchronize_rcu when unregister_filesystem success not failure 2011-04-17 10:42:01 -07:00
fs_struct.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
fs-writeback.c writeback: refill b_io iff empty 2011-06-08 08:25:21 +08:00
generic_acl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
inode.c fs: cosmetic inode.c cleanups 2011-05-27 09:43:00 -04:00
internal.h fs: move i_wb_list out from under inode_lock 2011-03-24 21:17:51 -04:00
ioctl.c vfs: cleanup do_vfs_ioctl() 2011-03-21 00:16:08 -04:00
ioprio.c
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-26 09:52:14 -07:00
Kconfig.binfmt
libfs.c libfs: drop unneeded dentry_unhash 2011-05-26 07:26:50 -04:00
locks.c Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux 2011-03-24 08:20:39 -07:00
Makefile Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2011-03-16 19:01:29 -07:00
mbcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
mpage.c mm/fs: add hooks to support cleancache 2011-05-26 10:01:43 -06:00
namei.c vfs: shrink_dcache_parent before rmdir, dir rename 2011-05-30 01:48:27 -04:00
namespace.c fs/namespace.c: bound mount propagation fix 2011-05-26 07:26:44 -04:00
nfsctl.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
no-block.c
open.c fs: Use BUG_ON(!mnt) at dentry_open(). 2011-03-21 01:10:41 -04:00
pipe.c Fix broken "pipe: use event aware wakeups" optimization 2011-01-20 16:21:59 -08:00
pnode.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.h
posix_acl.c NFS: Prevent memory allocation failure in nfsacl_encode() 2011-01-25 15:24:47 -05:00
read_write.c fix signedness mess in rw_verify_area() on 64bit architectures 2011-01-12 20:06:58 -05:00
read_write.h
readdir.c
select.c select: remove unused MAX_SELECT_SECONDS 2011-03-21 00:16:08 -04:00
seq_file.c
signalfd.c
splice.c splice: add wakeup_pipe_readers() 2011-05-23 19:58:53 +02:00
stack.c
stat.c readlinkat(), fchownat() and fstatat() with empty relative pathnames 2011-03-15 02:21:45 -04:00
statfs.c clean statfs-like syscalls up 2011-03-14 09:15:28 -04:00
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem 2011-05-26 10:50:56 -07:00
sync.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
timerfd.c timerfd: Manage cancelable timers in timerfd 2011-05-23 13:59:53 +02:00
utimes.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
xattr_acl.c
xattr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00