linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-14 15:54:15 +08:00


Johannes Weiner 7b785645e8 mm: fix page cache convergence regression

Since `a283348629` ("page cache: Finish XArray conversion"), on most major Linux distributions, the page cache doesn't correctly transition when the hot data set is changing, and leaves the new pages thrashing indefinitely instead of kicking out the cold ones.

On a freshly booted, freshly ssh'd into virtual machine with 1G RAM running stock Arch Linux:

    [root@ham ~]# ./reclaimtest.sh
    + dd of=workingset-a bs=1M count=0 seek=600
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + ./mincore workingset-a
    153600/153600 workingset-a
    + dd of=workingset-b bs=1M count=0 seek=600
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    104029/153600 workingset-a
    120086/153600 workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    104029/153600 workingset-a
    120268/153600 workingset-b

workingset-b is a 600M file on a 1G host that is otherwise entirely idle. No matter how often it's being accessed, it won't get cached.

While investigating, I noticed that the non-resident information gets aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is a problem because a workingset transition like this relies on the non-resident information tracked in the page cache tree of evicted file ranges: when the cache faults are refaults of recently evicted cache, we challenge the existing active set, and that allows a new workingset to establish itself.

Tracing the shrinker that maintains this memory revealed that all page cache tree nodes were allocated to the root cgroup.

This is a problem, because 1) the shrinker sizes the amount of non-resident information it keeps to the size of the cgroup's other memory and 2) on most major Linux distributions, only kernel threads live in the root cgroup and everything else gets put into services or session groups:

    [root@ham ~]# cat /proc/self/cgroup
    0::/user.slice/user-0.slice/session-c1.scope

As a result, we basically maintain no non-resident information for the workloads running on the system, thus breaking the caching algorithm.

Looking through the code, I found the culprit in the above-mentioned patch: when switching from the radix tree to xarray, it dropped the __GFP_ACCOUNT flag from the tree node allocations - the flag that makes sure the allocated memory gets charged to and tracked by the cgroup of the calling process - in this case, the one doing the fault.

To fix this, allow xarray users to specify a per-tree flag that makes xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache tree annotation to request such cgroup tracking for the cache nodes.

With this patch applied, the page cache correctly converges on new workingsets again after just a few iterations:

    [root@ham ~]# ./reclaimtest.sh
    + dd of=workingset-a bs=1M count=0 seek=600
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + ./mincore workingset-a
    153600/153600 workingset-a
    + dd of=workingset-b bs=1M count=0 seek=600
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    124607/153600 workingset-a
    87876/153600 workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    81313/153600 workingset-a
    133321/153600 workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    63036/153600 workingset-a
    153600/153600 workingset-b

Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

2019-05-31 13:52:41 -04:00
9p	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
adfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
affs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
afs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
autofs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 83	2019-05-24 17:37:52 +02:00
befs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
bfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
btrfs	for-5.2-rc2-tag	2019-05-30 20:52:40 -07:00
cachefiles	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
ceph	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
cifs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 61	2019-05-24 17:36:45 +02:00
coda	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
configfs	configfs: Fix use-after-free when accessing sd->s_dentry	2019-05-28 08:11:58 +02:00
cramfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
crypto	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
debugfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
devpts	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 83	2019-05-24 17:37:52 +02:00
dlm	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ecryptfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
efivarfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
efs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
exportfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ext2	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ext4	Bug fixes (including a regression fix) for ext4.	2019-05-25 15:03:12 -07:00
f2fs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
fat	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
freevxfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
fscache	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
fuse	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
gfs2	Fix a gfs2 sign extension bug introduced in v4.3.	2019-05-22 08:31:09 -07:00
hfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
hfsplus	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
hostfs	This pull request contains the following changes for UML:	2019-05-12 17:52:13 -04:00
hpfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
hugetlbfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
isofs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
jbd2	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
jffs2	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
jfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
kernfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
lockd	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
minix	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
nfs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
nfs_common	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
nfsd	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1	2019-05-21 11:28:39 +02:00
nilfs2	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
nls	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
notify	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118	2019-05-24 17:39:02 +02:00
ntfs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 97	2019-05-24 17:37:53 +02:00
ocfs2	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
omfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
openpromfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
orangefs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
overlayfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
proc	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
pstore	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
qnx4	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
qnx6	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
quota	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ramfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
reiserfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
romfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
squashfs	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118	2019-05-24 17:39:02 +02:00
sysfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
sysv	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
tracefs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ubifs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
udf	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
ufs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
unicode	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
xfs	Fixes for 5.1:	2019-05-23 11:18:18 -07:00
aio.c	aio: use kmem_cache_free() instead of kfree()	2019-04-04 20:13:59 -04:00
anon_inodes.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
attr.c
bad_inode.c
binfmt_aout.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_elf_fdpic.c
binfmt_elf.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_em86.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_flat.c
binfmt_misc.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_script.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
block_dev.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
buffer.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
char_dev.c	chardev: update comment based on the code	2019-04-02 17:49:58 +02:00
compat_binfmt_elf.c
compat_ioctl.c
compat.c
coredump.c
d_path.c
dax.c	mm: page_mkclean vs MADV_DONTNEED race	2019-05-14 09:47:48 -07:00
dcache.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
dcookies.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
direct-io.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
drop_caches.c	fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()	2019-02-01 15:46:24 -08:00
eventfd.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
eventpoll.c	epoll: use rwlock in order to reduce ep_poll_callback() contention	2019-03-07 18:32:01 -08:00
exec.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
fcntl.c	fs: mark expected switch fall-throughs	2019-04-08 18:21:02 -05:00
fhandle.c
file_table.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
file.c	io_uring-2019-03-06	2019-03-08 14:48:40 -08:00
filesystems.c	vfs: Implement a filesystem superblock creation/configuration context	2019-02-28 03:29:26 -05:00
fs_context.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
fs_parser.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
fs_pin.c
fs_struct.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
fs_types.c	fs: common implementation of file type	2019-01-21 17:48:13 +01:00
fs-writeback.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
fsopen.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
inode.c	mm: fix page cache convergence regression	2019-05-31 13:52:41 -04:00
internal.h	Merge branch 'work.mount-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-05-07 20:17:51 -07:00
io_uring.c	for-linus-20190516	2019-05-16 19:10:37 -07:00
ioctl.c
iomap.c	for-5.2/block-20190507	2019-05-07 18:14:36 -07:00
Kconfig	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Kconfig.binfmt	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
libfs.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
locks.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
Makefile	Add as a feature case-insensitive directories (the casefold feature)	2019-05-07 21:12:44 -07:00
mbcache.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
mount.h	saner handling of temporary namespaces	2019-01-30 17:44:07 -05:00
mpage.c	block: remove the i argument to bio_for_each_segment_all	2019-04-30 09:26:13 -06:00
namei.c	Clean up fscrypt's dcache revalidation support, and other	2019-05-07 21:28:04 -07:00
namespace.c	do_move_mount(): fix an unsafe use of is_anon_ns()	2019-05-09 02:32:50 -04:00
no-block.c
nsfs.c	nsfs: unobfuscate	2019-04-09 19:20:57 -04:00
open.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
pipe.c	Merge branch 'page-refs' (page ref overflow)	2019-04-14 15:09:40 -07:00
pnode.c	separate copying and locking mount tree on cross-userns copies	2019-01-30 17:14:50 -05:00
pnode.h	separate copying and locking mount tree on cross-userns copies	2019-01-30 17:14:50 -05:00
posix_acl.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
proc_namespace.c
read_write.c	vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files	2019-05-06 17:46:52 +03:00
readdir.c
select.c	y2038: syscalls: rename y2038 compat syscalls	2019-02-07 00:13:27 +01:00
seq_file.c	fs: mark expected switch fall-throughs	2019-04-08 18:21:02 -05:00
signalfd.c	fs: mark expected switch fall-throughs	2019-04-08 18:21:02 -05:00
splice.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
stack.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
stat.c	fs: move generic stat response attr handling to vfs_getattr_nosec	2019-02-01 01:55:45 -05:00
statfs.c	vfs: add vfs_get_fsid() helper	2019-02-07 16:38:35 +01:00
super.c	[fix] get rid of checking for absent device name in vfs_get_tree()	2019-04-28 21:34:21 -04:00
sync.c	fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback	2019-05-14 09:47:50 -07:00
timerfd.c	y2038: syscalls: rename y2038 compat syscalls	2019-02-07 00:13:27 +01:00
userfaultfd.c	userfaultfd/sysctl: add vm.unprivileged_userfaultfd	2019-05-14 09:47:45 -07:00
utimes.c	y2038: syscalls: rename y2038 compat syscalls	2019-02-07 00:13:27 +01:00
xattr.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00