linux/fs/proc
Johannes Weiner 4b85afbdac mm: zero-seek shrinkers
The page cache and most shrinkable slab caches hold data that has been
read from disk, but there are some caches that only cache CPU work, such
as the dentry and inode caches of procfs and sysfs, as well as the subset
of radix tree nodes that track non-resident page cache.

Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
the shrinker's seeks setting tells the reclaim algorithm that for every
two page cache pages scanned it should scan one slab object.

This is a bogus setting.  A virtual inode that required no IO to create is
not twice as valuable as a page cache page; shadow cache entries with
eviction distances beyond the size of memory aren't either.

In most cases, the behavior in practice is still fine.  Such virtual
caches don't tend to grow and assert themselves aggressively, and usually
get picked up before they cause problems.  But there are scenarios where
that's not true.

Our database workloads suffer from two of those.  For one, their file
workingset is several times bigger than available memory, which has the
kernel aggressively create shadow page cache entries for the non-resident
parts of it.  The workingset code does tell the VM that most of these are
expendable, but the VM ends up balancing them 2:1 to cache pages as per
the seeks setting.  This is a huge waste of memory.

These workloads also deal with tens of thousands of open files and use
/proc for introspection, which ends up growing the proc_inode_cache to
absurdly large sizes - again at the cost of valuable cache space, which
isn't a reasonable trade-off, given that proc inodes can be re-created
without involving the disk.

This patch implements a "zero-seek" setting for shrinkers that results in
a target ratio of 0:1 between their objects and IO-backed caches.  This
allows such virtual caches to grow when memory is available (they do
cache/avoid CPU work after all), but effectively disables them as soon as
IO-backed objects are under pressure.

It then switches the shrinkers for procfs and sysfs metadata, as well as
excess page cache shadow nodes, to the new zero-seek setting.

Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Domas Mituzas <dmituzas@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:33 -07:00
..
array.c proc: use "unsigned int" for sigqueue length 2018-06-07 17:34:38 -07:00
base.c proc: restrict kernel stack dumps to root 2018-10-05 16:32:05 -07:00
cmdline.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
consoles.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
cpuinfo.c x86 / CPU: Always show current CPU frequency in /proc/cpuinfo 2017-11-15 19:46:50 +01:00
devices.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
fd.c proc: use "unsigned int" in proc_fill_cache() 2018-06-07 17:34:38 -07:00
fd.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
generic.c proc: smaller readlock section in readdir("/proc") 2018-08-22 10:52:45 -07:00
inode.c mm: zero-seek shrinkers 2018-10-26 16:26:33 -07:00
internal.h proc: spread "const" a bit 2018-08-22 10:52:46 -07:00
interrupts.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
Kconfig proc/kcore: add vmcoreinfo note to /proc/kcore 2018-08-22 10:52:46 -07:00
kcore.c fs/proc/kcore.c: fix invalid memory access in multi-page read optimization 2018-09-20 22:01:11 +02:00
kmsg.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
loadavg.c sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD 2018-10-26 16:26:32 -07:00
Makefile proc: : uninline name_to_int() 2017-11-17 16:10:00 -08:00
meminfo.c mm, proc: add KReclaimable to /proc/meminfo 2018-10-26 16:26:32 -07:00
namespaces.c procfs: switch instantiate_t to d_splice_alias() 2018-05-26 14:20:50 -04:00
nommu.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
page.c mm: mark pages in use for page tables 2018-06-07 17:34:37 -07:00
proc_net.c proc: Add a way to make network proc files writable 2018-05-18 11:46:15 +01:00
proc_sysctl.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
proc_tty.c tty: replace ->proc_fops with ->proc_show 2018-05-16 07:24:30 +02:00
root.c proc: Make inline name size calculation automatic 2018-06-15 00:48:57 -04:00
self.c proc: introduce a proc_pid_ns helper 2018-05-16 07:23:35 +02:00
softirqs.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
stat.c proc: use "unsigned int" in /proc/stat hook 2018-08-22 10:52:46 -07:00
task_mmu.c mm: /proc/pid/smaps_rollup: fix NULL pointer deref in smaps_pte_range() 2018-10-26 16:25:18 -07:00
task_nommu.c mm: /proc/pid/*maps remove is_pid and related wrappers 2018-08-22 10:52:44 -07:00
thread_self.c proc: introduce a proc_pid_ns helper 2018-05-16 07:23:35 +02:00
uptime.c fs/proc/uptime.c: use ktime_get_boottime_ts64 2018-08-22 10:52:45 -07:00
util.c proc: use do-while in name_to_int() 2017-11-17 16:10:00 -08:00
version.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
vmcore.c proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted() 2018-10-09 11:57:28 +02:00