linux/Documentation
Johannes Weiner 00501b531c mm: memcontrol: rewrite charge API
These patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages.  This drastically simplifies the code and
reduces charging and uncharging overhead.  The most expensive part of
charging and uncharging is the page_cgroup bit spinlock, which is removed
entirely after this series.

Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
executing in the root memcg).  Before:

    15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
    13.31%              cat  [kernel.kallsyms]   [k] memset
    11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
     4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
     2.38%              cat  [kernel.kallsyms]   [k] put_page
     2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
     2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
     1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
     1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
     1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn

After:

    15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
    13.48%           cat  [kernel.kallsyms]   [k] memset
    11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
     3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
     2.46%           cat  [kernel.kallsyms]   [k] put_page
     2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
     1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
     1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
     1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
     1.30%           cat  [kernel.kallsyms]   [k] kfree

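As you can see, the memcg footprint has shrunk quite a bit:
__mem_cgroup_commit_charge and __mem_cgroup_uncharge_common no longer
appear in the top 10, and the object's text size drops by 2731 bytes.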

   text    data     bss     dec     hex filename
  37970    9892     400   48262    bc86 mm/memcontrol.o.old
  35239    9892     400   45531    b1db mm/memcontrol.o

This patch (of 4):

The memcg charge API charges pages before they are rmapped - i.e.  have an
actual "type" - and so every callsite needs its own set of charge and
uncharge functions to know what type is being operated on.  Worse,
uncharge has to happen from a context that is still type-specific, rather
than at the end of the page's lifetime with exclusive access, and so
requires a lot of synchronization.

Rewrite the charge API to provide a generic set of try_charge(),
commit_charge() and cancel_charge() transaction operations, much like
what's currently done for swap-in:

  mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
  pages from the memcg if necessary.

  mem_cgroup_commit_charge() commits the page to the charge once it
  has a valid page->mapping and PageAnon() reliably tells the type.

  mem_cgroup_cancel_charge() aborts the transaction.

This reduces the charge API and enables subsequent patches to
drastically simplify uncharging.
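
The try/commit/cancel transaction described above can be sketched in
userspace C.  This is only an illustrative model, not the kernel code:
every toy_* name, both structs, and the simple usage counter are
assumptions made for the example, and the real mem_cgroup_try_charge()
would attempt reclaim before failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
struct toy_memcg {
	long limit;	/* maximum pages this group may hold */
	long usage;	/* pages currently reserved or committed */
};

struct toy_page {
	struct toy_memcg *memcg;	/* owner, set only at commit time */
	bool mapped;			/* models "page->mapping is valid" */
};

/*
 * Step 1: reserve one page of charge.  Fails if the group is at its
 * limit (the real function would first try to reclaim pages).
 */
static int toy_try_charge(struct toy_memcg *memcg)
{
	if (memcg->usage >= memcg->limit)
		return -1;
	memcg->usage++;
	return 0;
}

/* Step 2a: bind the reserved charge to the page once its type is known. */
static void toy_commit_charge(struct toy_page *page, struct toy_memcg *memcg)
{
	assert(page->mapped && page->memcg == NULL);
	page->memcg = memcg;
}

/* Step 2b: abort the transaction and give the reservation back. */
static void toy_cancel_charge(struct toy_memcg *memcg)
{
	memcg->usage--;
}

/* A caller: charge first, do the fallible setup, then commit or cancel. */
static int toy_fault_in_page(struct toy_memcg *memcg, struct toy_page *page,
			     bool setup_succeeds)
{
	if (toy_try_charge(memcg))
		return -1;		/* over limit, nothing to undo */
	if (!setup_succeeds) {
		toy_cancel_charge(memcg);
		return -2;		/* setup failed, reservation released */
	}
	page->mapped = true;		/* models establishing the rmap */
	toy_commit_charge(page, memcg);
	return 0;
}
```

In this model a failed setup leaves the group's usage unchanged, which
is the property the cancel operation exists to guarantee; the caller
never needs a type-specific uncharge path.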

As pages need to be committed after rmap is established but before they
are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
additions again.  Revive lru_cache_add_active_or_unevictable().

[hughd@google.com: fix shmem_unuse]
[hughd@google.com: Add comments on the private use of -EAGAIN]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:17 -07:00
ABI Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2014-08-06 20:56:28 -07:00
accounting Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call 2014-06-23 16:47:44 -07:00
acpi ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt 2014-07-12 00:07:05 +02:00
aoe aoe: remove do-nothing NAME="%k" term from example udev rules 2013-09-11 15:59:28 -07:00
arm Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm into next 2014-06-05 15:57:04 -07:00
arm64 arm64: Add support for 48-bit VA space with 64KB page configuration 2014-07-23 15:28:15 +01:00
auxdisplay
backlight backlight: lp855x_bl: support new LP8555 device 2013-11-13 12:09:14 +09:00
blackfin Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
block Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
blockdev zram: propagate error to user 2014-04-07 16:36:02 -07:00
bus-devices
cdrom
cgroups mm: memcontrol: rewrite charge API 2014-08-08 15:57:17 -07:00
connector w1: optional bundling of netlink kernel replies 2014-05-27 13:56:21 -07:00
console TTY:console: update document console.txt 2013-05-21 10:21:57 -07:00
cpu-freq intel_pstate: Update documentation of {max,min}_perf_pct sysfs files 2014-07-07 01:22:19 +02:00
cpuidle cpuidle: remove cpuidle_unregister_governor() 2013-10-30 01:21:24 +01:00
cris
crypto drivers/dma: remove unused support for MEMSET operations 2013-07-03 16:07:42 -07:00
development-process Documentation: development-process: Update -mm and -next URLs 2013-07-25 12:37:24 +02:00
device-mapper dm thin: add 'no_space_timeout' dm-thin-pool module param 2014-05-20 14:30:36 -04:00
devicetree Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-08-07 08:50:34 -07:00
DocBook Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
driver-model Documentation: devres: Sort managed interfaces 2014-07-11 17:56:55 -07:00
dvb [media] get_dvb_firmware: Add firmware extractor for si2165 2014-07-27 17:01:12 -03:00
early-userspace Documentation: remove reference to 2.7 kernel in early-userspace 2013-08-20 12:47:28 +02:00
EDID drm: Add 800x600 (SVGA) screen resolution to the built-in EDIDs 2014-05-26 12:53:40 +10:00
extcon extcon: fix switch class porting guide (Documentation) 2014-01-07 11:54:28 +09:00
fault-injection
fb doc: spelling error changes 2014-05-05 15:32:05 +02:00
filesystems Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
firmware_class doc: fix minor typos in firmware_class README 2014-07-17 18:43:40 -07:00
fmc FMC: make eeprom attribute writable 2014-02-28 15:12:08 -08:00
frv
gpio Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
hwmon hwmon: Add pwm-fan driver 2014-08-04 07:01:38 -07:00
i2c Documentation: i2c: improve section about flags mangling the protocol 2014-04-06 13:53:48 +02:00
i2o
ia64
ide Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
infiniband
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2014-07-23 15:42:53 -07:00
ioctl crypto: qat - Update to makefiles 2014-06-20 21:26:19 +08:00
isdn
ja_JP Documentation: Update stable address in Chinese and Japanese translations 2014-04-16 14:13:27 -07:00
kbuild kbuild: fix a typo in a kbuild document 2014-06-18 21:38:18 +02:00
kdump Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-07-04 11:40:58 -07:00
ko_KR Documentation: HOWTO: Updates on subsystem trees, patchwork, -next (vs. -mm) in ko_KR 2014-01-08 15:32:51 -08:00
laptops Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
leds Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
m68k Documentation/: update 00-INDEX files 2014-02-10 16:01:40 -08:00
make
memory-devices
metag doc: fix misspellings with 'codespell' tool 2013-05-28 12:02:12 +02:00
mic misc: mic: add support for loading/unloading dma driver 2014-07-11 18:31:12 -07:00
mips
misc-devices Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging 2014-01-29 18:56:27 -08:00
mmc
mn10300
mtd MTD updates for 3.16: 2014-06-11 08:35:34 -07:00
namespaces
netlabel
networking i40e: adds FCoE to build and updates its documentation 2014-08-02 19:41:13 -07:00
nfc
parisc parisc: document the shadow registers 2013-07-09 22:09:19 +02:00
PCI doc: replace "practise" with "practice" in Documentation 2014-06-19 15:28:56 +02:00
pcmcia
phy phy: Add new Exynos USB 2.0 PHY driver 2014-03-08 12:39:44 +05:30
platform Documentation: Add list of laptop models supported by the Compal driver 2014-06-10 19:11:06 -04:00
power ACPI and power management updates for 3.17-rc1 2014-08-06 20:34:19 -07:00
powerpc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-10 18:54:22 -07:00
pps USB: serial: invoke dcd_change ldisc's handler. 2013-09-26 09:45:40 -07:00
prctl
pti
ptp ptp: In the testptp utility, use clock_adjtime from glibc when available 2014-06-16 21:32:31 -07:00
rapidio rapidio: rework device hierarchy and introduce mport class of devices 2014-04-07 16:36:07 -07:00
RCU list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
scheduler asm/system.h: clean asm/system.h from docs 2014-04-07 16:36:11 -07:00
scsi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
security Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
serial tty/serial: Add GPIOLIB helpers for controlling modem lines 2014-05-28 12:49:14 -07:00
sh
sound ALSA: virtuoso: add Xonar Essence STX II support 2014-08-04 15:20:48 +02:00
spi Merge remote-tracking branches 'spi/topic/s3c64xx', 'spi/topic/sc18is602', 'spi/topic/sh-hspi', 'spi/topic/sh-msiof', 'spi/topic/sh-sci', 'spi/topic/sirf' and 'spi/topic/spidev' into spi-next 2014-03-30 00:51:34 +00:00
sysctl kernel/watchdog.c: print traces for all cpus on lockup detection 2014-06-23 16:47:44 -07:00
target target: Remove TF_CIT_TMPL macro 2013-10-16 13:35:02 -07:00
thermal drm/nouveau/doc: update the thermal documentation 2014-06-17 14:50:17 +10:00
timers clocksource: document some basic timekeeping concepts 2014-07-23 15:07:13 -07:00
tpm drivers/tpm: add xen tpmfront interface 2013-08-09 10:57:06 -04:00
trace mm: trace-vmscan-postprocess.pl: report the number of file/anon pages respectively 2014-08-06 18:01:20 -07:00
usb usb: doc: hotplug.txt code typos 2014-07-09 16:05:42 -07:00
vDSO x86/vdso/doc: Make vDSO examples more portable 2014-06-12 19:01:24 -07:00
video4linux [media] update cx23885 and em28xx cardlists 2014-07-26 11:55:10 -03:00
virtual Bugfixes 2014-07-22 10:22:53 +02:00
vm mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
w1 w1: new w1_ds2406 driver 2014-06-19 17:45:14 -07:00
watchdog Documentation: fix two typos in watchdog-api.txt 2014-08-05 22:43:21 +02:00
wimax
x86 x86/mm: New tunable for single vs full TLB flush 2014-07-31 08:48:51 -07:00
xtensa xtensa: remap io area defined in device tree 2014-01-15 00:25:14 +04:00
zh_CN Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
.gitignore
00-INDEX Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 13:15:58 -07:00
applying-patches.txt
assoc_array.txt KEYS: Fix multiple key add into associative array 2013-12-02 11:24:18 +00:00
atomic_ops.txt arch,doc: Convert smp_mb__*() 2014-04-18 14:20:48 +02:00
bad_memory.txt
basic_profiling.txt
bcache.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-07-04 11:40:58 -07:00
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt Documentation: fix typo and update version in cachetlb.txt 2013-08-20 12:46:52 +02:00
Changes Documentation/Changes: clean up mcelog paragraph 2014-07-12 11:30:36 -07:00
circular-buffers.txt documentation: Update circular buffer for load-acquire/store-release 2013-12-03 10:08:57 -08:00
clk.txt clk: Improve clk_ops documentation 2014-05-12 17:08:33 -07:00
coccinelle.txt Coccinelle: Update information about the minimal version required 2013-07-03 22:58:20 +02:00
CodingStyle Documentation: expand/clarify debug documentation 2014-06-04 16:54:17 -07:00
cpu-hotplug.txt Doc/cpu-hotplug: Specify race-free way to register CPU hotplug callbacks 2014-03-20 13:43:40 +01:00
cpu-load.txt
cputopology.txt doc: Documentation/cputopology.txt fix typo 2013-09-04 12:59:47 +02:00
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt firewire: revert to 4 GB RDMA, fix protocols using Memory Space 2014-05-29 15:50:30 +02:00
dell_rbu.txt
devices.txt Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
digsig.txt
DMA-API-HOWTO.txt DMA-API: Update dma_pool_create ()and dma_pool_alloc() descriptions 2014-05-26 17:28:28 -06:00
DMA-API.txt DMA-API: Capitalize "CPU" consistently 2014-05-26 17:28:27 -06:00
DMA-attributes.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
dma-buf-sharing.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
DMA-ISA-LPC.txt DMA-API: Clarify physical/bus address distinction 2014-05-20 16:54:21 -06:00
dmaengine.txt
dmatest.txt dmatest: add a 'wait' parameter 2013-11-14 11:04:40 -08:00
dontdiff Documentation: LLVMLinux: Update Documentation/dontdiff 2014-04-09 13:44:34 -07:00
dynamic-debug-howto.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
edac.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
efi-stub.txt doc: arm64: add description of EFI stub support 2014-04-30 19:57:05 +01:00
eisa.txt
email-clients.txt Documentation: add section about git to email-clients.txt 2014-06-29 13:38:33 -07:00
flexible-arrays.txt
futex-requeue-pi.txt doc: fix double words 2014-03-21 13:16:58 +01:00
gcov.txt gcov: compile specific gcov implementation based on gcc version 2013-11-13 12:09:34 +09:00
highuid.txt
HOWTO Documentation: HOWTO: Update broken links to tpp 2013-12-10 23:09:08 -08:00
hsi.txt Documentation: HSI: Add some general description for the HSI subsystem 2014-05-04 09:49:46 +02:00
hw_random.txt
hwspinlock.txt doc: documentation/hwspinlock.txt fix typo 2013-08-27 10:46:02 +02:00
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt doc: fix some typos 2013-12-02 14:48:28 +01:00
iostats.txt
IPMI.txt
IRQ-affinity.txt doc: fix a typo about irq affinity 2013-08-20 12:59:18 +02:00
IRQ-domain.txt genirq: Improve documentation to match current implementation 2014-05-27 10:16:44 +02:00
IRQ.txt
irqflags-tracing.txt asm/system.h: clean asm/system.h from docs 2014-04-07 16:36:11 -07:00
isapnp.txt
java.txt Documentation: update java sample wrapper for java 7 2014-05-25 12:39:00 -07:00
kernel-doc-nano-HOWTO.txt kernel-doc: Update references to SGML to refs to XML instead. 2013-05-28 12:02:11 +02:00
kernel-docs.txt
kernel-parameters.txt Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-08-07 08:47:00 -07:00
kernel-per-CPU-kthreads.txt Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity 2014-02-17 14:56:08 -08:00
kmemcheck.txt doc: fix double words 2014-03-21 13:16:58 +01:00
kmemleak.txt mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
kobject.txt kobject: remove kset from sysfs immediately in kset_unregister() 2013-12-07 21:20:11 -08:00
kprobes.txt kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist 2014-04-24 10:02:56 +02:00
kref.txt
ldm.txt
local_ops.txt
lockdep-design.txt
lockstat.txt lockstat: Report avg wait and hold times 2013-10-09 08:19:08 +02:00
lockup-watchdogs.txt
logo.gif
logo.txt
magic-number.txt Documentation/serial: Delete obsolete driver documentation 2014-04-16 14:20:34 -07:00
Makefile
ManagementStyle
md.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
media-framework.txt Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2013-07-13 12:09:57 -07:00
memory-barriers.txt documentation: Add acquire/release barriers to pairing rules 2014-07-08 08:32:51 -07:00
memory-hotplug.txt mm, hotplug: probe interface is available on several platforms 2014-06-23 16:47:43 -07:00
module-signing.txt Nothing major: the stricter permissions checking for sysfs broke 2014-04-06 09:38:07 -07:00
mono.txt
mutex-design.txt locking/mutexes: Documentation update/rewrite 2014-06-05 13:29:37 +02:00
nommu-mmap.txt
numastat.txt
oops-tracing.txt Use 'E' instead of 'X' for unsigned module taint flag. 2014-03-31 14:52:43 +10:30
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt phy: core: Let node ptr of PHY point to PHY and not of PHY provider 2014-07-22 12:46:11 +05:30
pi-futex.txt
pinctrl.txt pinctrl: Fix some typos and grammar issues in the documentation 2014-01-15 13:59:50 +01:00
pnp.txt
preempt-locking.txt
printk-formats.txt doc: printk-formats: do not mention casts for u64/s64 2014-05-05 15:32:42 +02:00
pwm.txt pwm: modify PWM_LOOKUP to initialize all struct pwm_lookup members 2014-05-21 11:19:36 +02:00
ramoops.txt
rbtree.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
remoteproc.txt
rfkill.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
robust-futex-ABI.txt Documentation/robust-futex-API: Count properly to 4 2013-11-30 14:08:28 +01:00
robust-futexes.txt doc: spelling error changes 2014-05-05 15:32:05 +02:00
rpmsg.txt
rt-mutex-design.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
rt-mutex.txt
rtc.txt rtc: add ability to push out an existing wakealarm using sysfs 2013-07-03 16:07:54 -07:00
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
spinlocks.txt sched: Rename sched.c as sched/core.c in comments and Documentation 2013-06-19 12:58:42 +02:00
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules: Add pointer to netdev-FAQ for network patches 2014-07-09 15:54:27 -07:00
static-keys.txt doc: fix some typos in documentations 2013-12-02 14:45:19 +01:00
SubmitChecklist Finally eradicate CONFIG_HOTPLUG 2013-06-03 14:20:18 -07:00
SubmittingDrivers doc: SubmittingPatches: remove dead link, kerneltrap.org no longer works 2014-06-19 15:15:27 +02:00
SubmittingPatches Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
svga.txt
sysfs-rules.txt doc: Fix typo in doucmentations 2013-07-25 12:34:15 +02:00
sysrq.txt sysrq: Allow magic SysRq key functions to be disabled through Kconfig 2013-10-16 13:01:44 -07:00
this_cpu_ops.txt
unaligned-memory-access.txt ether_addr_equal: Optimize implementation, remove unused compare_ether_addr 2013-12-06 16:37:43 -05:00
unicode.txt
unshare.txt
vfio.txt drivers/vfio: EEH support for VFIO PCI device 2014-08-05 15:28:48 +10:00
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt VME: Rename vme_slot_get to avoid confusion with reference counting 2013-12-03 11:15:58 -08:00
volatile-considered-harmful.txt
workqueue.txt workqueue: Correct/Drop references to gcwq in Documentation 2013-08-21 10:32:09 -04:00
ww-mutex-design.txt mutex: Add support for wound/wait style locks 2013-06-26 12:10:56 +02:00
xz.txt
zorro.txt zorro/UAPI: Disintegrate include/linux/zorro*.h 2013-11-26 11:09:08 +01:00