linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-23 02:54:32 +08:00

Luiz Capitulino 2906dd5283 hugetlb: prep_compound_gigantic_page(): drop __init marker

The HugeTLB subsystem uses the buddy allocator to allocate hugepages during runtime. This means that hugepages allocation during runtime is limited to MAX_ORDER order. For archs supporting gigantic pages (that is, page sizes greater than MAX_ORDER), this in turn means that those pages can't be allocated at runtime.

HugeTLB supports gigantic page allocation during boottime, via the boot allocator. To this end the kernel provides the command-line options hugepagesz= and hugepages=, which can be used to instruct the kernel to allocate N gigantic pages during boot.

For example, x86_64 supports 2M and 1G hugepages, but only 2M hugepages can be allocated and freed at runtime. If one wants to allocate 1G gigantic pages, this has to be done at boot via the hugepagesz= and hugepages= command-line options.

Now, gigantic page allocation at boottime has two serious problems:

 1. Boottime allocation is not NUMA aware. On a NUMA machine the kernel evenly distributes boottime allocated hugepages among nodes.

    For example, suppose you have a four-node NUMA machine and want to allocate four 1G gigantic pages at boottime. The kernel will allocate one gigantic page per node.

    On the other hand, we do have users who want to be able to specify which NUMA node gigantic pages should allocated from. So that they can place virtual machines on a specific NUMA node.

 2. Gigantic pages allocated at boottime can't be freed

At this point it's important to observe that regular hugepages allocated at runtime don't have those problems. This is so because HugeTLB interface for runtime allocation in sysfs supports NUMA and runtime allocated pages can be freed just fine via the buddy allocator.

This series adds support for allocating gigantic pages at runtime. It does so by allocating gigantic pages via CMA instead of the buddy allocator. Releasing gigantic pages is also supported via CMA.

As this series builds on top of the existing HugeTLB interface, it makes gigantic page allocation and releasing just like regular sized hugepages. This also means that NUMA support just works.

For example, to allocate two 1G gigantic pages on node 1, one can do:

 # echo 2 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

And, to release all gigantic pages on the same node:

 # echo 0 > \
   /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

Please, refer to patch 5/5 for full technical details.

Finally, please note that this series is a follow up for a previous series that tried to extend the command-line options set to be NUMA aware:

 http://marc.info/?l=linux-mm&m=139593335312191&w=2

During the discussion of that series it was agreed that having runtime allocation support for gigantic pages was a better solution.

This patch (of 5):

This function is going to be used by non-init code in a future commit.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2014-06-04 16:53:58 -07:00
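
The per-node sysfs interface used in the commit message's echo examples could also be driven from a userspace program. The following is a minimal sketch, not code from this series: the helper names hugepage_sysfs_path() and write_count() are hypothetical, and only the sysfs path layout is taken from the commit message. Writing to nr_hugepages normally requires root, and the write only succeeds for gigantic pages on a kernel that supports runtime allocation for that page size.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the per-node nr_hugepages path for a given
 * NUMA node and hugepage size in kB (1048576 for 1G pages), following the
 * path layout shown in the commit message.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int hugepage_sysfs_path(char *buf, size_t len, int node,
                               unsigned long size_kb)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/node/node%d/hugepages/"
                     "hugepages-%lukB/nr_hugepages",
                     node, size_kb);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write a page count to the given file, the same way
 * "echo N > path" would. Returns 0 on success or a negative errno value.
 */
static int write_count(const char *path, unsigned long count)
{
    FILE *f = fopen(path, "w");
    int rc = 0;

    if (!f)
        return -errno;
    if (fprintf(f, "%lu\n", count) < 0)
        rc = -EIO;
    /* Buffered data is flushed on close, so a failed write may show up here. */
    if (fclose(f) != 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

To mirror the first example in the commit message, a caller would build the path with hugepage_sysfs_path(buf, sizeof(buf), 1, 1048576UL) and then call write_count(buf, 2); passing a count of 0 instead corresponds to releasing the pages on that node. Reading nr_hugepages back afterwards is advisable, since the kernel may allocate fewer pages than requested.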
..
backing-dev.c	arch: Mass conversion of smp_mb__*()	2014-04-18 14:20:48 +02:00
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
compaction.c	mm/compaction: make isolate_freepages start at pageblock boundary	2014-05-06 13:04:59 -07:00
debug-pagealloc.c
dmapool.c	mm: Fix printk typo in dmapool.c	2014-05-05 15:44:47 +02:00
early_ioremap.c	mm: create generic early_ioremap() support	2014-04-07 16:36:15 -07:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
filemap.c	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next	2014-06-03 12:57:53 -07:00
fremap.c	mm: softdirty: make freshly remapped file pages being softdirty unconditionally	2014-06-04 16:53:56 -07:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	mm/huge_memory.c: complete conversion to pr_foo()	2014-06-04 16:53:58 -07:00
hugetlb_cgroup.c	cgroup: drop const from @buffer of cftype->write_string()	2014-03-19 10:23:54 -04:00
hugetlb.c	hugetlb: prep_compound_gigantic_page(): drop __init marker	2014-06-04 16:53:58 -07:00
hwpoison-inject.c	mm/hwpoison: add '#' to hwpoison_inject	2014-01-21 16:19:48 -08:00
init-mm.c
internal.h	mm/readahead.c: inline ra_submit	2014-04-07 16:35:58 -07:00
interval_tree.c
iov_iter.c	take iov_iter stuff to mm/iov_iter.c	2014-04-01 23:19:30 -04:00
Kconfig	hugetlb: restrict hugepage_migration_support() to x86_64	2014-06-04 16:53:51 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c	mm: postpone the disabling of kmemleak early logging	2014-05-11 17:55:48 +09:00
ksm.c	mm: close PageTail race	2014-03-04 07:55:47 -08:00
list_lru.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
maccess.c
madvise.c	mm: madvise: fix MADV_WILLNEED on shmem swapouts	2014-05-23 09:37:29 -07:00
Makefile	block: move mm/bounce.c to block/	2014-05-19 20:01:52 -06:00
memblock.c	memblock: introduce memblock_alloc_range()	2014-06-04 16:53:57 -07:00
memcontrol.c	mm: memcontrol: remove hierarchy restrictions for swappiness and oom_control	2014-06-04 16:53:58 -07:00
memory_hotplug.c	mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug	2014-01-23 16:36:52 -08:00
memory-failure.c	mm/memory-failure.c: fix memory leak by race between poison and unpoison	2014-05-23 09:37:30 -07:00
memory.c	x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels	2014-06-04 16:53:55 -07:00
mempolicy.c	mm, mempolicy: remove per-process flag	2014-04-07 16:35:54 -07:00
mempool.c	mm/mempool: warn about __GFP_ZERO usage	2014-06-04 16:53:58 -07:00
migrate.c	mm: fix swapops.h:131 bug if remap_file_pages raced migration	2014-03-20 22:09:09 -07:00
mincore.c	mm + fs: prepare for non-page entries in page cache radix trees	2014-04-03 16:21:00 -07:00
mlock.c	mm: try_to_unmap_cluster() should lock_page() before mlocking	2014-04-07 16:35:57 -07:00
mm_init.c	mm: bring back /sys/kernel/mm	2014-01-27 21:02:39 -08:00
mmap.c	mm/mmap.c: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO	2014-06-04 16:53:58 -07:00
mmu_context.c	sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm	2014-02-21 08:50:17 +01:00
mmu_notifier.c	mm: audit/fix non-modular users of module_init in core code	2014-01-23 16:36:52 -08:00
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
mremap.c	mm, thp: close race between mremap() and split_huge_page()	2014-05-11 17:55:48 +09:00
msync.c
nobootmem.c	mm/nobootmem.c: mark function as static	2014-04-03 16:21:02 -07:00
nommu.c	mm: fix 'ERROR: do not initialise globals to 0 or NULL' and coding style	2014-04-07 16:35:55 -07:00
oom_kill.c	mm, oom: base root bonus on current usage	2014-01-30 16:56:56 -08:00
page_alloc.c	mm: get rid of __GFP_KMEMCG	2014-06-04 16:53:56 -07:00
page_cgroup.c	mm/page_cgroup.c: mark functions as static	2014-04-03 16:21:02 -07:00
page_io.c	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	2014-01-30 11:19:05 -08:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
page-writeback.c	mm/page-writeback.c: fix divide by zero in pos_ratio_polynom	2014-05-06 13:04:58 -07:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c	percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()	2014-04-14 16:18:06 -04:00
pgtable-generic.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
process_vm_access.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-04-12 14:49:50 -07:00
quicklist.c
readahead.c	mm/readahead.c: inline ra_submit	2014-04-07 16:35:58 -07:00
rmap.c	mm: softdirty: don't forget to save file map softdiry bit on unmap	2014-06-04 16:53:56 -07:00
shmem.c	mm: Initialize error in shmem_file_aio_read()	2014-04-13 14:10:26 -07:00
slab_common.c	slab: document kmalloc_order	2014-06-04 16:53:58 -07:00
slab.c	sl[au]b: charge slabs to kmemcg explicitly	2014-06-04 16:53:56 -07:00
slab.h	sl[au]b: charge slabs to kmemcg explicitly	2014-06-04 16:53:56 -07:00
slob.c	mm: slab/slub: use page->list consistently instead of page->lru	2014-04-11 10:06:06 +03:00
slub.c	mm: get rid of __GFP_KMEMCG	2014-06-04 16:53:56 -07:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	mm: use macros from compiler.h instead of __attribute__((...))	2014-04-07 16:35:54 -07:00
swap_state.c	swap: add a simple detector for inappropriate swapin readahead	2014-02-06 13:48:51 -08:00
swap.c	mm: thrash detection-based file cache sizing	2014-04-03 16:21:01 -07:00
swapfile.c	mm/swap: fix race on swap_info reuse between swapoff and swapon	2014-02-06 13:48:51 -08:00
truncate.c	mm: filemap: update find_get_pages_tag() to deal with shadow entries	2014-05-06 13:04:59 -07:00
util.c	nick kvfree() from apparmor	2014-05-06 14:02:53 -04:00
vmacache.c	mm,vmacache: optimize overflow system-wide flushing	2014-06-04 16:53:57 -07:00
vmalloc.c	mm/vmalloc.c: enhance vm_map_ram() comment	2014-04-07 16:35:55 -07:00
vmpressure.c	arm, pm, vmpressure: add missing slab.h includes	2014-02-03 13:24:01 -05:00
vmscan.c	mm: only force scan in reclaim when none of the LRUs are big enough.	2014-06-04 16:53:56 -07:00
vmstat.c	mm,vmacache: add debug data	2014-06-04 16:53:57 -07:00
workingset.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
zbud.c	mm/zbud: fix some trivial typos in comments	2013-09-11 15:57:35 -07:00
zsmalloc.c	zsmalloc: Fix CPU hotplug callback registration	2014-03-20 13:43:45 +01:00
zswap.c	Merge branch 'akpm' (incoming from Andrew)	2014-04-07 16:38:06 -07:00