linux/include
KAMEZAWA Hiroyuki f0c0b2b808 change zonelist order: zonelist order selection logic
Make zonelist creation policy selectable from sysctl/boot option v6.

This patch makes NUMA's zonelist (of pgdat) order selectable.
Available orders are Default (automatic) / Node-based / Zone-based.

[Default Order]
The kernel selects Node-based or Zone-based order automatically.

[Node-based Order]
This policy treats the locality of memory as the most important parameter.
The zonelist is ordered by each zone's locality. This means lower zones
(e.g. ZONE_DMA) can be used before higher zones (e.g. ZONE_NORMAL) are exhausted.
IOW, ZONE_DMA will be in the middle of the zonelist.
The current 2.6.21 kernel uses this order.

Pros.
 * A user can expect local memory as much as possible.
Cons.
 * A lower zone can be exhausted before a higher zone. This may cause OOM_KILL.

May be suitable if ZONE_DMA is relatively big, you never see OOM_KILL
because of ZONE_DMA exhaustion, and you need the best locality.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

 node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.

*node(1)'s memory allocation order:

 node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
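
The node-based example above can be reproduced with a small toy model. This is only a sketch in Python, not the kernel's C implementation: the names `NODES`, `ZONE_RANK`, `nodes_by_locality` and `node_order_zonelist` are hypothetical, and "locality" is simplified to "local node first, then the other nodes by id", whereas a real system would rank nodes by distance.

```python
# Toy model of Node-based zonelist order (illustrative only, not kernel code).
# Assumption: zone "height" is DMA < NORMAL, as in the example above.
ZONE_RANK = {"DMA": 0, "NORMAL": 1}

# Topology from the example: node 0 has DMA and NORMAL, node 1 has NORMAL only.
NODES = {0: ["DMA", "NORMAL"], 1: ["NORMAL"]}


def nodes_by_locality(local, nodes):
    """Local node first, then the remaining nodes by id (a simplification)."""
    return sorted(nodes, key=lambda n: (n != local, n))


def node_order_zonelist(local, nodes=NODES):
    """Walk nodes nearest-first; within each node, take zones highest-first."""
    zonelist = []
    for node in nodes_by_locality(local, nodes):
        for zone in sorted(nodes[node], key=ZONE_RANK.get, reverse=True):
            zonelist.append((node, zone))
    return zonelist


if __name__ == "__main__":
    for local in NODES:
        print(local, node_order_zonelist(local))
```

For node 0 this places node 0's DMA zone ahead of node 1's NORMAL zone, which is the "ZONE_DMA in the middle of the zonelist" behaviour described above.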

[Zone-based order]
This policy treats the zone type as the most important parameter.
The zonelist is ordered by zone type. This means a lower zone is
never used before the higher zones are exhausted.
IOW, ZONE_DMA will always be at the tail of the zonelist.

Pros.
 * OOM_KILL (because of a lower zone) occurs only if all zones are exhausted.
Cons.
 * memory locality may not be best.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

 node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.

*node(1)'s memory allocation order:

 node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
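
The zone-based example above can be modelled the same way. Again this is a hedged Python sketch rather than the kernel code: `NODES`, `ZONE_RANK`, `nodes_by_locality` and `zone_order_zonelist` are hypothetical names, and node locality is reduced to "local node first, then the others by id".

```python
# Toy model of Zone-based zonelist order (illustrative only, not kernel code).
# Assumption: zone "height" is DMA < NORMAL, as in the example above.
ZONE_RANK = {"DMA": 0, "NORMAL": 1}

# Topology from the example: node 0 has DMA and NORMAL, node 1 has NORMAL only.
NODES = {0: ["DMA", "NORMAL"], 1: ["NORMAL"]}


def nodes_by_locality(local, nodes):
    """Local node first, then the remaining nodes by id (a simplification)."""
    return sorted(nodes, key=lambda n: (n != local, n))


def zone_order_zonelist(local, nodes=NODES):
    """Walk zone types highest-first; within each type, take nodes nearest-first."""
    zonelist = []
    for zone in sorted(ZONE_RANK, key=ZONE_RANK.get, reverse=True):
        for node in nodes_by_locality(local, nodes):
            if zone in nodes[node]:  # skip nodes that lack this zone type
                zonelist.append((node, zone))
    return zonelist


if __name__ == "__main__":
    for local in NODES:
        print(local, zone_order_zonelist(local))
```

Swapping the two loops relative to the node-based walk is the whole difference: every NORMAL zone is listed before any DMA zone, so ZONE_DMA ends up at the tail for both nodes.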

The boot option "numa_zonelist_order=" and proc/sysctl are supported.

command:
%echo N > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Node-based order.

command:
%echo Z > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Zone-based order.
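
The two echo commands above can also be issued from a script. The following is a minimal sketch, assuming a kernel that carries this patch and a caller with permission to write the sysctl file; the helper name `set_zonelist_order` and its `path` parameter are hypothetical, and only the values N and Z shown above are accepted here.

```python
from pathlib import Path

# Path taken from the commands above.
SYSCTL_PATH = "/proc/sys/vm/numa_zonelist_order"


def set_zonelist_order(order, path=SYSCTL_PATH):
    """Write 'N' (Node-based) or 'Z' (Zone-based) to the sysctl file.

    `path` is overridable so the helper can be tried against a scratch file.
    """
    if order not in ("N", "Z"):
        raise ValueError("order must be 'N' or 'Z', got %r" % (order,))
    # Same bytes that `echo N > ...` would write: the letter plus a newline.
    Path(path).write_text(order + "\n")
```

Calling `set_zonelist_order("Z")` corresponds to the second command above; validating the value first avoids writing anything other than the two letters the message documents.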

Thanks to Lee Schermerhorn, who gave me a lot of help and code.

[Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
[akpm@linux-foundation.org: build fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:35 -07:00
..
acpi Pull osi-now into release branch 2007-06-02 01:02:09 -04:00
asm-alpha PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-arm Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop 2007-07-13 10:52:27 -07:00
asm-arm26 lots-of-architectures: enable arbitary speed tty support 2007-07-10 17:51:13 -07:00
asm-avr32 lots-of-architectures: enable arbitary speed tty support 2007-07-10 17:51:13 -07:00
asm-blackfin Blackfin arch: Add peripheral io API to gpio header file 2007-07-12 17:06:45 +08:00
asm-cris PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-frv frv: missing __clear_user() 2007-07-15 16:40:52 -07:00
asm-generic sched: simplify sched_find_first_bit() 2007-07-09 18:52:00 +02:00
asm-h8300 PCI: Use a weak symbol for the empty version of pcibios_add_platform_entries() 2007-07-11 16:02:07 -07:00
asm-i386 serial: convert early_uart to earlycon for 8250 2007-07-16 09:05:35 -07:00
asm-ia64 [IA64] Un-break ia64 build 2007-07-12 16:04:39 -07:00
asm-m32r lots-of-architectures: enable arbitary speed tty support 2007-07-10 17:51:13 -07:00
asm-m68k PCI: Use a weak symbol for the empty version of pcibios_add_platform_entries() 2007-07-11 16:02:07 -07:00
asm-m68knommu PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-mips [MIPS] Workaround for a sparse warning in include/asm-mips/mach-tx4927/ioremap.h 2007-07-13 17:40:01 +01:00
asm-parisc PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-powerpc PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-ppc PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-s390 lots-of-architectures: enable arbitary speed tty support 2007-07-10 17:51:13 -07:00
asm-sh PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-sh64 PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-sparc PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-sparc64 PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-um uml: add asm/paravirt.h 2007-06-24 08:59:11 -07:00
asm-v850 PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
asm-x86_64 serial: convert early_uart to earlycon for 8250 2007-07-16 09:05:35 -07:00
asm-xtensa PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
crypto
keys
linux change zonelist order: zonelist order selection logic 2007-07-16 09:05:35 -07:00
math-emu
media V4L/DVB (5592): DMA: Correctly free resources on error, sync PCI streamed data 2007-05-09 10:12:42 -03:00
mtd
net Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2007-07-15 16:50:46 -07:00
pcmcia PCMCIA-NETDEV : add new ID of lan&modem multifunction card 2007-07-08 22:16:39 -04:00
rdma IB/cm: Include HCA ACK delay in local ACK timeout 2007-07-10 21:50:05 -07:00
rxrpc
scsi [SCSI] Remove unused method scsi_device_cancel 2007-07-14 16:01:16 -05:00
sound [ALSA] version 1.0.14 2007-05-31 11:03:27 +02:00
video atmel_lcdfb: AT91/AT32 LCD Controller framebuffer driver 2007-05-11 08:29:37 -07:00
Kbuild