linux/mm
Joonsoo Kim 37a2140dc2 mm, hugetlb: do not use a page in page cache for cow optimization
Currently, we use a page with mapped count 1 in page cache for cow
optimization.  If we find this condition, we don't allocate a new page and
copy contents.  Instead, we map this page directly.  This may introduce a
problem that writting to private mapping overwrite hugetlb file directly.
You can find this situation with following code.

        size = 20 * MB;
        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }
        p[0] = 's';
        fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]);
        munmap(p, size);

        flag = MAP_PRIVATE;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
        }
        p[0] = 'c';
        munmap(p, size);

        flag = MAP_SHARED;
        p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0);
        if (p == MAP_FAILED) {
                fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
                return -1;
        }
        fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]);
        munmap(p, size);

We can see that "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL PRIVATE
WRITE: s".  If we turn off this optimization to a page in page cache, the
problem is disappeared.

So, I change the trigger condition of optimization.  If this page is not
AnonPage, we don't do optimization.  This makes this optimization turning
off for a page cache.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:27 -07:00
..
backing-dev.c backing-dev: convert class code to use dev_groups 2013-08-19 21:22:34 -07:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm: kill free_all_bootmem_node() 2013-07-03 16:07:39 -07:00
bounce.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c mm: add & use zone_end_pfn() and zone_spans_pfn() 2013-02-23 17:50:20 -08:00
debug-pagealloc.c
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
filemap.c direct-io: Handle O_(D)SYNC AIO 2013-09-04 09:23:46 -04:00
fremap.c mm: save soft-dirty bits on file pages 2013-08-13 17:57:48 -07:00
frontswap.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c mm/huge_memory.c: fix potential NULL pointer dereference 2013-09-11 15:57:19 -07:00
hugetlb_cgroup.c cgroup: pass around cgroup_subsys_state instead of cgroup in file methods 2013-08-08 20:11:24 -04:00
hugetlb.c mm, hugetlb: do not use a page in page cache for cow optimization 2013-09-11 15:57:27 -07:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c
internal.h mm: remove unused __put_page() 2013-07-09 10:33:22 -07:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
Kconfig Merge remote-tracking branch 'origin/next' into kvm-ppc-next 2013-08-29 00:41:59 +02:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: replace strict_strtoul() with kstrtoul() 2013-09-11 15:57:11 -07:00
ksm.c mm: replace strict_strtoul() with kstrtoul() 2013-09-11 15:57:11 -07:00
maccess.c
madvise.c mm/madvise.c: fix coding-style errors 2013-09-11 15:57:00 -07:00
Makefile zswap: add to mm/ 2013-07-10 18:11:34 -07:00
memblock.c mm/memblock.c: fix wrong comment in __next_free_mem_range() 2013-07-09 10:33:23 -07:00
memcontrol.c Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-09-03 18:25:03 -07:00
memory_hotplug.c mm/memory_hotplug.c: fix return value of online_pages() 2013-07-09 10:33:25 -07:00
memory-failure.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-09-06 09:36:28 -07:00
memory.c Merge 3.11-rc6 into char-misc-next 2013-08-18 20:40:33 -07:00
mempolicy.c mm: mempolicy: turn vma_set_policy() into vma_dup_policy() 2013-09-11 15:57:00 -07:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: migration: add migrate_entry_wait_huge() 2013-06-12 16:29:46 -07:00
mincore.c swap: make each swap partition have one address_space 2013-02-23 17:50:17 -08:00
mlock.c Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs" 2013-03-28 17:45:51 -07:00
mm_init.c mm: tune vm_committed_as percpu_counter batching size 2013-07-03 16:07:32 -07:00
mmap.c mm: mmap_region: kill correct_wcount/inode, use allow_write_access() 2013-09-11 15:57:07 -07:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c treewide: relase -> release 2013-06-28 14:34:33 +02:00
mmzone.c mm: rename page struct field helpers 2013-02-23 17:50:18 -08:00
mprotect.c mm/mprotect.c: coding-style cleanups 2012-12-18 15:02:15 -08:00
mremap.c mm: move_ptes -- Set soft dirty bit depending on pte type 2013-08-27 09:36:17 -07:00
msync.c
nobootmem.c mm: concentrate modification of totalram_pages into the mm core 2013-07-03 16:07:33 -07:00
nommu.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
oom_kill.c mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). 2013-07-15 11:25:05 +09:30
page_alloc.c mm: page_alloc: fair zone allocator policy 2013-09-11 15:57:23 -07:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c mm: remove compressed copy from zram in-memory 2013-07-03 16:07:26 -07:00
page_isolation.c page_isolation: Fix a comment typo in test_pages_isolated() 2013-08-20 13:03:41 +02:00
page-writeback.c mm: revert "page-writeback.c: subtract min_free_kbytes from dirtyable memory" 2013-09-11 15:57:23 -07:00
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-05-24 16:22:53 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c mm, percpu: Make sure percpu_alloc early parameter has an argument 2012-12-02 06:23:04 -08:00
pgtable-generic.c mm/THP: add pmd args to pgtable deposit and withdraw APIs 2013-06-20 16:55:07 +10:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c mm: change invalidatepage prototype to accept length 2013-05-21 23:17:23 -04:00
rmap.c s390/mm: implement software referenced bits 2013-08-29 13:20:11 +02:00
shmem.c shm_mnt is as longterm as it gets, TYVM... 2013-09-03 22:50:27 -04:00
slab_common.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-07-14 15:14:29 -07:00
slab.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
slab.h memcg: check that kmem_cache has memcg_params before accessing it 2013-08-28 19:26:38 -07:00
slob.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-07-14 15:14:29 -07:00
slub.c mm: replace strict_strtoul() with kstrtoul() 2013-09-11 15:57:11 -07:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE 2013-07-09 10:33:22 -07:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-12 16:29:45 -07:00
swap.c thp, mm: avoid PageUnevictable on active/inactive lru lists 2013-07-31 14:41:03 -07:00
swapfile.c swap: make cluster allocation per-cpu 2013-09-11 15:57:17 -07:00
truncate.c mm: teach truncate_inode_pages_range() to handle non page aligned ranges 2013-05-27 23:32:35 -04:00
util.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
vmalloc.c mm/vmalloc.c: fix an overflow bug in alloc_vmap_area() 2013-07-09 10:33:23 -07:00
vmpressure.c Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-09-03 18:25:03 -07:00
vmscan.c mm: vmscan: fix numa reclaim balance problem in kswapd 2013-09-11 15:57:22 -07:00
vmstat.c mm: page_alloc: fair zone allocator policy 2013-09-11 15:57:23 -07:00
zbud.c mm: zbud: fix condition check on allocation size 2013-07-31 14:41:03 -07:00
zswap.c mm/zswap.c: get swapper address_space by using macro 2013-09-11 15:57:08 -07:00