linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 08:14:15 +08:00

Author	SHA1	Message	Date
KAMEZAWA Hiroyuki	f2dbcfa738	mm: clean up for early_pfn_to_nid() What's happening is that the assertion in mm/page_alloc.c:move_freepages() is triggering: BUG_ON(page_zone(start_page) != page_zone(end_page)); Once I knew this is what was happening, I added some annotations: if (unlikely(page_zone(start_page) != page_zone(end_page))) { printk(KERN_ERR "move_freepages: Bogus zones: " "start_page[%p] end_page[%p] zone[%p]\n", start_page, end_page, zone); printk(KERN_ERR "move_freepages: " "start_zone[%p] end_zone[%p]\n", page_zone(start_page), page_zone(end_page)); printk(KERN_ERR "move_freepages: " "start_pfn[0x%lx] end_pfn[0x%lx]\n", page_to_pfn(start_page), page_to_pfn(end_page)); printk(KERN_ERR "move_freepages: " "start_nid[%d] end_nid[%d]\n", page_to_nid(start_page), page_to_nid(end_page)); ... And here's what I got: move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00] move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00] move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff] move_freepages: start_nid[1] end_nid[0] My memory layout on this box is: [ 0.000000] Zone PFN ranges: [ 0.000000] Normal 0x00000000 -> 0x0081ff5d [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[8] active PFN ranges [ 0.000000] 0: 0x00000000 -> 0x00020000 [ 0.000000] 1: 0x00800000 -> 0x0081f7ff [ 0.000000] 1: 0x0081f800 -> 0x0081fe50 [ 0.000000] 1: 0x0081fed1 -> 0x0081fed8 [ 0.000000] 1: 0x0081feda -> 0x0081fedb [ 0.000000] 1: 0x0081fedd -> 0x0081fee5 [ 0.000000] 1: 0x0081fee7 -> 0x0081ff51 [ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d So it's a block move in that 0x81f600-->0x81f7ff region which triggers the problem. This patch: Declaration of early_pfn_to_nid() is scattered over per-arch include files, and it seems it's complicated to know when the declaration is used. I think it makes fix-for-memmap-init not easy. This patch moves all declaration to include/linux/mm.h After this, if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use static definition in include/linux/mm.h else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use generic definition in mm/page_alloc.c else -> per-arch back end function will be called. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:55 -08:00
Brian Gerst	44581a28e8	x86: fix abuse of per_cpu_offset Impact: bug fix Don't use per_cpu_offset() to determine if it valid to access a per-cpu variable for a given cpu number. It is not a valid assumption on x86-64 anymore. Use cpu_possible() instead. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:30:30 +01:00
Brian Gerst	6470aff619	x86: move 64-bit NUMA code Impact: Code movement, no functional change. Move the 64-bit NUMA code from setup_percpu.c to numa_64.c Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-01-27 12:56:47 +09:00
Mike Travis	168ef543a4	x86: prepare for cpumask iterators to only go to nr_cpu_ids Impact: cleanup, futureproof In fact, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in various places. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 17:40:58 -08:00
Joerg Roedel	be3e89ee6d	x86: convert numa_64.c from round_up to roundup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 15:39:21 +02:00
Johannes Weiner	b61bfa3c46	mm: move bootmem descriptors definition to a single place There are a lot of places that define either a single bootmem descriptor or an array of them. Use only one central array with MAX_NUMNODES items instead. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Thomas Gleixner	cfc1b9a6a6	x86: convert Dprintk to pr_debug There are a couple of places where (P)Dprintk is used which is an old compile time enabled printk wrapper. Convert it to the generic pr_debug(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-21 21:35:38 +02:00
Yinghai Lu	c987d12f84	x86: remove end_pfn in 64bit and use max_pfn directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:38 +02:00
Yinghai Lu	1f75d7e32e	x86: introduce initmem_init for 64 bit Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:50:14 +02:00
Ingo Molnar	2b4fa851b2	Merge branch 'x86/numa' into x86/devel Conflicts: arch/x86/Kconfig arch/x86/kernel/e820.c arch/x86/kernel/efi_64.c arch/x86/kernel/mpparse.c arch/x86/kernel/setup.c arch/x86/kernel/setup_32.c arch/x86/mm/init_64.c include/asm-x86/proto.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 11:59:23 +02:00
Thomas Gleixner	886533a3e3	x86: numa_64.c fix shadowed variable sparse mutters: arch/x86/mm/numa_64.c:195:27: warning: symbol 'end_pfn' shadows an earlier one Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 11:31:29 +02:00
Thomas Gleixner	864fc31ea5	x86: numa_64.c make local variables static plat_node_bdata, cmdline, nodemap_addr, nodemap_size are local to numa_64.c. Make them static Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 11:31:28 +02:00
Mike Travis	9f248bde9d	x86: remove the static 256k node_to_cpumask_map * Consolidate node_to_cpumask operations and remove the 256k byte node_to_cpumask_map. This is done by allocating the node_to_cpumask_map array after the number of possible nodes (nr_node_ids) is known. * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have been increased. It now shows faults when calling node_to_cpumask() and node_to_cpumask_ptr(). For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-08 11:31:24 +02:00
Mike Travis	23ca4bba3e	x86: cleanup early per cpu variables/accesses v4 * Introduce a new PER_CPU macro called "EARLY_PER_CPU". This is used by some per_cpu variables that are initialized and accessed before there are per_cpu areas allocated. ["Early" in respect to per_cpu variables is "earlier than the per_cpu areas have been setup".] This patchset adds these new macros: DEFINE_EARLY_PER_CPU(_type, _name, _initvalue) EXPORT_EARLY_PER_CPU_SYMBOL(_name) DECLARE_EARLY_PER_CPU(_type, _name) early_per_cpu_ptr(_name) early_per_cpu_map(_name, _idx) early_per_cpu(_name, _cpu) The DEFINE macro defines the per_cpu variable as well as the early map and pointer. It also initializes the per_cpu variable and map elements to "_initvalue". The early_* macros provide access to the initial map (usually setup during system init) and the early pointer. This pointer is initialized to point to the early map but is then NULL'ed when the actual per_cpu areas are setup. After that the per_cpu variable is the correct access to the variable. The early_per_cpu() macro is not very efficient but does show how to access the variable if you have a function that can be called both "early" and "late". It tests the early ptr to be NULL, and if not then it's still valid. Otherwise, the per_cpu variable is used instead: #define early_per_cpu(_name, _cpu) \ (early_per_cpu_ptr(_name) ? \ early_per_cpu_ptr(_name)[_cpu] : \ per_cpu(_name, _cpu)) A better method is to actually check the pointer manually. In the case below, numa_set_node can be called both "early" and "late": void __cpuinit numa_set_node(int cpu, int node) { int cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map); if (cpu_to_node_map) cpu_to_node_map[cpu] = node; else per_cpu(x86_cpu_to_node_map, cpu) = node; } Add a flag "arch_provides_topology_pointers" that indicates pointers to topology cpumask_t maps are available. Otherwise, use the function returning the cpumask_t value. This is useful if cpumask_t set size is very large to avoid copying data on to/off of the stack. * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while the non-debug case has been optimized a bit. * Remove an unreferenced compiler warning in drivers/base/topology.c * Clean up #ifdef in setup.c For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-08 11:31:20 +02:00
Paul Jackson	e9197bf011	x86 boot: remove some unused extern function declarations Remove three extern declarations for routines that don't exist. Fix a typo in a comment. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-05-25 10:55:10 +02:00
Yinghai Lu	1a27fc0a42	x86_64: fix setup_node_bootmem to support big mem excluding with memmap typical case: four sockets system, every node has 4g ram, and we are using: memmap=10g$4g to mask out memory on node1 and node2 when numa is enabled, early_node_mem is used to get node_data and node_bootmap. if it can not get memory from the same node with find_e820_area(), it will use alloc_bootmem to get buff from previous nodes. so check it and print out some info about it. need to move early_res_to_bootmem into every setup_node_bootmem. and it takes range that node has. otherwise alloc_bootmem could return addr that reserved early. depends on "mm: make reserve_bootmem can crossed the nodes". Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 22:51:08 +02:00
Linus Torvalds	ec965350bb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits) sched: build fix sched: better rt-group documentation sched: features fix sched: /debug/sched_features sched: add SCHED_FEAT_DEADLINE sched: debug: show a weight tree sched: fair: weight calculations sched: fair-group: de-couple load-balancing from the rb-trees sched: fair-group scheduling vs latency sched: rt-group: optimize dequeue_rt_stack sched: debug: add some debug code to handle the full hierarchy sched: fair-group: SMP-nice for group scheduling sched, cpuset: customize sched domains, core sched, cpuset: customize sched domains, docs sched: prepatory code movement sched: rt: multi level group constraints sched: task_group hierarchy sched: fix the task_group hierarchy for UID grouping sched: allow the group scheduler to have multiple levels sched: mix tasks and groups ...	2008-04-21 15:40:24 -07:00
Mike Travis	f46bdf2db2	numa: move large array from stack to _initdata section * Move large array "struct bootnode nodes" from stack to _initdata section to reduce amount of stack space required. Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:58 +02:00
Suresh Siddha	6ec6e0d9f2	srat, x86: add support for nodes spanning other nodes For example, If the physical address layout on a two node system with 8 GB memory is something like: node 0: 0-2GB, 4-6GB node 1: 2-4GB, 6-8GB Current kernels fail to boot/detect this NUMA topology. ACPI SRAT tables can expose such a topology which needs to be supported. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-19 19:19:55 +02:00
Mike Travis	b447a468fc	x86: clean up non-smp usage of cpu maps Cleanup references to the early cpu maps for the non-SMP configuration and remove some functions called for SMP configurations only. Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:41:34 +02:00
Yinghai Lu	04adf11435	x86: remove never used nodenumer in pda Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-17 17:40:47 +02:00
Yinghai Lu	37bff62e98	x86_64: free_bootmem should take phys so use nodedata_phys directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-03-21 17:06:15 +01:00
Yinghai Lu	7c9e92b6cd	x86: not set node to cpu_to_node if the node is not online resolve boot problem reported by Mel Gorman: http://lkml.org/lkml/2008/2/13/404 init_cpu_to_node will use cpu->apic (from MADT or mptable) and apic->node(from SRAT or AMD config space with k8_bus_64.c) to have cpu->node mapping, and later identify_cpu will overwrite them again...(with nearby_node...) this patch checks if the node is online, otherwise it will not update cpu_node map. so keep cpu_node map to online node before identify_cpu..., to prevent possible error. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2008-03-04 17:10:12 +01:00
Yinghai Lu	b7ad149d62	x86: reenable support for system without on node0 One system doesn't have RAM for node0 installed. SRAT: PXM 0 -> APIC 0 -> Node 0 SRAT: PXM 0 -> APIC 1 -> Node 0 SRAT: PXM 1 -> APIC 2 -> Node 1 SRAT: PXM 1 -> APIC 3 -> Node 1 SRAT: Node 1 PXM 1 0-a0000 SRAT: Node 1 PXM 1 0-dd000000 SRAT: Node 1 PXM 1 0-123000000 ACPI: SLIT: nodes = 2 10 13 13 10 mapped APIC to ffffffffff5fb000 ( fee00000) Bootmem setup node 1 0000000000000000-0000000123000000 NODE_DATA [000000000000e000 - 0000000000014fff] bootmap [0000000000015000 - 00000000000395ff] pages 25 Could not find start_pfn for node 0 Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #14 Call Trace: [<ffffffff80bab498>] free_area_init_node+0x22/0x381 [<ffffffff8045ffc5>] generic_swap+0x0/0x17 [<ffffffff80bab0cc>] find_zone_movable_pfns_for_nodes+0x54/0x271 [<ffffffff80baba5f>] free_area_init_nodes+0x239/0x287 [<ffffffff80ba6311>] paging_init+0x46/0x4c [<ffffffff80b9dda5>] setup_arch+0x3c3/0x44e [<ffffffff80b978be>] start_kernel+0x6f/0x2c7 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3 This happens because node 0 is not online, but the node state in mm/page_alloc.c has node 0 set. nodemask_t node_states[NR_NODE_STATES] __read_mostly = { [N_POSSIBLE] = NODE_MASK_ALL, [N_ONLINE] = { { [0] = 1UL } }, So we need to clear node_online_map before initializing the memory. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-18 20:54:14 +01:00
Bernhard Walle	72a7fe3967	Introduce flags for reserve_bootmem() This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:25 -08:00
Yinghai Lu	6118f76fb7	x86: print out node_data addr and bootmap_start addr print out node_data addr and bootmap_start addr. helpful for debugging early crashes on high-end NUMA systems. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-04 16:47:56 +01:00
Yinghai Lu	9347e0b0ce	x86: remove unneeded round_up Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-01 17:49:42 +01:00
Yinghai Lu	24a5da73f4	x86_64: make bootmap_start page align v6 boot oopses when a system has 64 or 128 GB of RAM installed: Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711() BUG: unable to handle kernel NULL pointer dereference at 000000000000005f IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f PGD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6 RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f RSP: 0000:ffff810824c57e60 EFLAGS: 00010246 RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08 RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460 RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80 R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000) Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5 Call Trace: [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1 [<ffffffff8020ccee>] ? child_rip+0x0/0x12 Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0 RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f RSP <ffff810824c57e60> CR2: 000000000000005f ---[ end trace 02c2d78def82877a ]--- Kernel panic - not syncing: Attempted to kill init! it turns out some variables near end of bss are corrupted already. in System.map we have ffffffff80d40420 b rsi_table ffffffff80d40620 B krb5_seq_lock ffffffff80d40628 b i.20437 ffffffff80d40630 b xprt_rdma_inline_write_padding ffffffff80d40638 b sunrpc_table_header ffffffff80d40640 b zero ffffffff80d40644 b min_memreg ffffffff80d40648 b rpcrdma_tk_lock_g ffffffff80d40650 B sctp_assocs_id_lock ffffffff80d40658 B proc_net_sctp ffffffff80d40660 B sctp_assocs_id ffffffff80d40680 B sysctl_sctp_mem ffffffff80d40690 B sysctl_sctp_rmem ffffffff80d406a0 B sysctl_sctp_wmem ffffffff80d406b0 b sctp_ctl_socket ffffffff80d406b8 b sctp_pf_inet6_specific ffffffff80d406c0 b sctp_pf_inet_specific ffffffff80d406c8 b sctp_af_v4_specific ffffffff80d406d0 b sctp_af_v6_specific ffffffff80d406d8 b sctp_rand.33270 ffffffff80d406dc b sctp_memory_pressure ffffffff80d406e0 b sctp_sockets_allocated ffffffff80d406e4 b sctp_memory_allocated ffffffff80d406e8 b sctp_sysctl_header ffffffff80d406f0 b zero ffffffff80d406f4 A __bss_stop ffffffff80d406f4 A _end and setup_node_bootmem() will use that page 0xd40000 for bootmap Bootmem setup node 0 0000000000000000-0000000828000000 NODE_DATA [000000000008a485 - 0000000000091484] bootmap [0000000000d406f4 - 0000000000e456f3] pages 105 Bootmem setup node 1 0000000828000000-0000001028000000 NODE_DATA [0000000828000000 - 0000000828006fff] bootmap [0000000828007000 - 0000000828106fff] pages 100 Bootmem setup node 2 0000001028000000-0000001828000000 NODE_DATA [0000001028000000 - 0000001028006fff] bootmap [0000001028007000 - 0000001028106fff] pages 100 Bootmem setup node 3 0000001828000000-0000002028000000 NODE_DATA [0000001828000000 - 0000001828006fff] bootmap [0000001828007000 - 0000001828106fff] pages 100 setup_node_bootmem() makes NODE_DATA cacheline aligned, and bootmap is page-aligned. the patch updates find_e820_area() to make sure we can meet the alignment constraints. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-01 17:49:41 +01:00
Yinghai Lu	25eff8d4cd	x86_64: add debug name for early_res helps debugging problems in this rather murky area of code. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-01 17:49:41 +01:00
Yinghai Lu	afadcd788f	x86: fix nodemap_size according to nodeid bits memnode.map is s16 array because of nodeid is 16 bit now. so need to increase the nodemap_size according to that bits. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:12 +01:00
travis@sgi.com	1ce357129a	x86: early cpu_to_node fix in numa_64.c Both of these references to cpu_to_node() can potentially occur before the "late" cpu_to_node map is setup. Therefore, they should be changed to use early_cpu_to_node(). Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:33 +01:00
travis@sgi.com	4323838215	x86: change size of node ids from u8 to s16 Change the size of node ids for X86_64 from u8 to s16 to accomodate more than 32k nodes and allow for NUMA_NO_NODE (-1) to be sign extended to int. Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:25 +01:00
Mike Travis	625d6cffca	x86: fix early cpu_to_node panic from nr_free_zone_pages call early_cpu_to_node() since per_cpu(cpu_to_node_map) might not be setup yet. I also had to export x86_cpu_to_node_map_early_ptr because of some calls from the network code to numa_node_id(): net/ipv4/netfilter/arp_tables.c: net/ipv4/netfilter/ip_tables.c: net/ipv4/netfilter/ip_tables.c: Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:25 +01:00
Ingo Molnar	9c5ba48958	x86: clean up paging_init() Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:24 +01:00
Yinghai Lu	fc7250ab38	x86: only support sparsemem sparsemem is only one supported, so could remove FLAT_NODE_MEM related, that is only needed !SPARSEMEM Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:24 +01:00
travis@sgi.com	c49a4955ea	x86: add debug of invalid per_cpu map accesses Provide a means to trap usages of per_cpu map variables before they are setup. Define CONFIG_DEBUG_PER_CPU_MAPS to activate. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:22 +01:00
Andi Kleen	7517527891	x86: replace hard coded reservations in 64-bit early boot code with dynamic table On x86-64 there are several memory allocations before bootmem. To avoid them stomping on each other they used to be all hard coded in bad_area(). Replace this with an array that is filled as needed. This cleans up the code considerably and allows to expand its use. Cc: peterz@infradead.org Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:17 +01:00
travis@sgi.com	316390b093	x86: fixup NR-CPUS patch for numa This patch removes the EXPORT_SYMBOL for: x86_cpu_to_node_map_init x86_cpu_to_node_map_early_ptr ... thus fixing the section mismatch problem. Also, the mem -> node hash lookup is fixed. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:15 +01:00
Andrew Morton	118c890961	arch/x86/mm/numa_64.c: section fix WARNING: vmlinux.o(__ksymtab+0x670): Section mismatch: reference to .init.data:x86_cpu_to_node_map_init (between '__ksymtab_x86_cpu_to_node_map_init' and '__ksymtab_node_data') Cc: Matthew Dobson <colpatch@us.ibm.com> Cc: Mike Travis <travis@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:14 +01:00
Mike Travis	693e3c5603	x86: reduce memory and intra-node effects Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:14 +01:00
travis@sgi.com	df3825c56d	x86: change NR_CPUS arrays in numa_64 Change the following static arrays sized by NR_CPUS to per_cpu data variables: char cpu_to_node_map[NR_CPUS]; Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:11 +01:00
travis@sgi.com	3cc87e3f40	x86: change size of node ids from u8 to u16 Change the size of node ids from 8 bits to 16 bits to accomodate more than 256 nodes. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:11 +01:00
travis@sgi.com	ef97001f3d	x86: change size of APICIDs from u8 to u16 Change the size of APICIDs from u8 to u16. This partially supports the new x2apic mode that will be present on future processor chips. (Chips actually support 32-bit APICIDs, but that change is more intrusive. Supporting 16-bit is sufficient for now). Signed-off-by: Jack Steiner <steiner@sgi.com> I've included just the partial change from u8 to u16 apicids. The remaining x2apic changes will be in a separate patch. In addition, the fake_node_to_pxm_map[] and fake_apicid_to_node[] tables have been moved from local data to the __initdata section reducing stack pressure when MAX_NUMNODES and MAX_LOCAL_APIC are increased in size. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:10 +01:00
Yinghai Lu	a261670aed	x86: cleanup setup_node_zones called by paging_init() setup_node_zones() calcuates some variables but only use them when FLAT_NODE_MEM_MAP is set so change the MACRO postion to avoid calculating. also change it to static, and rename it to flat_setup_node_zones(). Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:09 +01:00
Jeremy Fitzhardinge	5548fecdff	x86: clean up bitops-related warnings Add casts to appropriate places to silence spurious bitops warnings. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:55 +01:00
Christoph Lameter	b263295dbf	x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:47 +01:00
Thomas Gleixner	7462894a7c	x86: fixup numa 64 namespace Using a variable name, which is the same as a macro name is not really smart. Change the variable names and fixup all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:38 +01:00
Thomas Gleixner	e3cfe529dd	x86: cleanup numa_64.c Clean it up before applying more patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:37 +01:00
Thomas Gleixner	3abf024d2a	x86: nuke a ton of unused exports Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:28 +01:00
Thomas Gleixner	c9ff03428f	x86: move k8 related declarations Move k8 related declarations to k8.h and fix numa_64.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:16 +01:00
Mike Travis	71fff5e6ca	x86: convert cpu_to_apicid to be a per cpu variable This patch converts the x86_cpu_to_apicid array to be a per cpu variable. This saves sizeof(apicid) * NR unused cpus. Access is mostly from startup and CPU HOTPLUG functions. MP_processor_info() is one of the functions that require access to the x86_cpu_to_apicid array before the per_cpu data area is setup. For this case, a pointer to the __initdata array is initialized in setup_arch() and removed in smp_prepare_cpus() after the per_cpu data area is initialized. A second change is included to change the initial array value of ARCH i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:03 +02:00
Mike Travis	98c9e27a56	x86: fix cpu_to_node references In x86_64 and i386 architectures most arrays that are sized using NR_CPUS lay in local memory on node 0. Not only will most (99%?) of the systems not use all the slots in these arrays, particularly when NR_CPUS is increased to accommodate future very high cpu count systems, but a number of cache lines are passed unnecessarily on the system bus when these arrays are referenced by cpus on other nodes. Typically, the values in these arrays are referenced by the cpu accessing it's own values, though when passing IPI interrupts, the cpu does access the data relevant to the targeted cpu/node. Of course, if the referencing cpu is not on node 0, then the reference will still require cross node exchanges of cache lines. A common use of this is for an interrupt service routine to pass the interrupt to other cpus local to that node. Ideally, all the elements in these arrays should be moved to the per_cpu data area. In some cases (such as x86_cpu_to_apicid) the array is referenced before the per_cpu data areas are setup. In this case, a static array is declared in the __initdata area and initialized by the booting cpu (BSP). The values are then moved to the per_cpu area after it is initialized and the original static array is freed with the rest of the __initdata. This patch: Fix four instances where cpu_to_node is referenced by array instead of via the cpu_to_node macro. This is preparation to moving it to the per_cpu data area. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:16:37 +02:00
Yoann Padioleau	83e83d546c	x86: 0 -> NULL, for arch/x86_64 When comparing a pointer, it's clearer to compare it to NULL than to 0. [ tglx: arch/x86 adaptation ] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Signed-off-by: Andi Kleen <ak@suse.de> Cc: ak@suse.de Cc: discuss@x86-64.org Cc: akpm@linux-foundation.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:52 +02:00
Thomas Gleixner	95119fbd87	x86_64: move mm Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-11 11:17:18 +02:00

1 2 3 4

154 Commits