Let's make 32bit consistent with 64bit.
v2: Andrew pointed out that for 32bit we should use -1ULL
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-25-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Simplify setup_node_mem: don't use bootmem from another node; instead,
just use find_e820_area() in early_node_mem().
This keeps the boundary between early_res and bootmem clearer, and
lets us call early_res_to_bootmem() only once instead of once per
node.
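Roughly, a sketch of the resulting allocation path (simplified; error
handling and bookkeeping trimmed to the essentials):

    static void * __init early_node_mem(int nodeid, unsigned long start,
                                        unsigned long end, unsigned long size,
                                        unsigned long align)
    {
            unsigned long mem;

            /* take node-local memory straight from e820 instead of
             * falling back to another node's bootmem */
            mem = find_e820_area(start, end, size, align);
            if (mem == -1L) {
                    printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
                           size, nodeid);
                    return NULL;
            }
            reserve_early(mem, mem + size, "NODE_DATA");
            return __va(mem);
    }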
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Makes it consistent with the extern declaration used when CONFIG_HIGHMEM
is set. Removes redundant casts in printout messages.
Signed-off-by: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt reported that we are unconditionally making the
kernel text mapping read-only. I.e., if someone does cpa() on the
kernel text area to set or clear any page table attribute, we
unconditionally clear the read-write attribute for the kernel text
mapping that is set at compile time.
We should delay forbidding the write attribute and enforce it only
after the kernel has mapped the text as read-only.
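A sketch of the gated check in the cpa path (kernel_set_to_readonly is
the flag described above; the actual placement is in static_protections()):

    /*
     * Sketch: only forbid RW on the kernel text mapping once the text
     * has really been marked read-only; before that, leave it alone.
     */
    if (kernel_set_to_readonly &&
        within(address, (unsigned long)_text, (unsigned long)_etext))
            pgprot_val(forbidden) |= _PAGE_RW;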
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com>
[ marked kernel_set_to_readonly as __read_mostly ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit cc9f7a0ccf changed add_one_highpage_init(): we don't use pfn
any more, so remove the unnecessary argument.
This patch doesn't change function behavior. It is based on
v2.6.32-rc5.
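Roughly, the signature change (a sketch, not the exact hunk):

    /* before: pfn was passed in but no longer used */
    static void __init add_one_highpage_init(struct page *page, int pfn);

    /* after */
    static void __init add_one_highpage_init(struct page *page);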
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20091022112722.adc8e55c.minchan.kim@barrios-desktop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it. This does the k8 node setup in two parts:
detection and registration. NUMA emulation can then use the detected
physical topology to set up the address ranges of emulated
nodes accordingly. If emulation isn't used, the k8 nodes are
registered as normal.
Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time. This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.
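A sketch of how the two formals surface in the setup entry points (the
exact prototypes may differ):

    void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
                             int acpi, int k8);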
This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch. The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.
This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.
k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For /proc/kcore, each arch registers its memory ranges by kclist_add().
Usually,
- the range of physical memory
- the range of the vmalloc area
- text, etc...
are registered, but the "range of physical memory" entry has problems:
it isn't updated at memory hotplug and it tends to include unnecessary
memory holes. Now, /proc/iomem (kernel/resource.c) includes the required
physical memory range information and is properly updated at memory
hotplug. So it's better to avoid duplicating that information and to
rebuild the kclist for physical memory based on /proc/iomem.
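A hedged sketch of the rebuild using the generic resource walker;
kcore_register_ram() is a hypothetical helper standing in for the real
registration code:

    /* walk every "System RAM" resource instead of arch-private tables */
    static int __init kclist_ram_callback(unsigned long start_pfn,
                                          unsigned long nr_pages, void *arg)
    {
            /* hypothetical helper: (re)register one RAM chunk for kcore */
            kcore_register_ram(start_pfn, nr_pages);
            return 0;
    }

    /* run at init time and again after memory hotplug events */
    walk_system_ram_range(0, max_pfn, NULL, kclist_ram_callback);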
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For /proc/kcore, vmalloc areas are registered per arch, but all of them
register the same range, [VMALLOC_START...VMALLOC_END). This patch
unifies them. As a result, archs which have no kclist_add() hooks can
see the vmalloc area correctly.
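The unified registration is essentially one generic call (sketch):

    static struct kcore_list kcore_vmalloc;

    /* identical on every arch, so do it once in fs/proc/kcore.c */
    kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
               VMALLOC_END - VMALLOC_START);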
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, kclist_add() only takes a start address and size as its
arguments. To make the kclist dynamically reconfigurable, it's necessary
to know which kclists are for System RAM and which are not.
This patch adds kclist types:
KCORE_RAM
KCORE_VMALLOC
KCORE_TEXT
KCORE_OTHER
This "type" is used in a following patch for detecting KCORE_RAM.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since alloc_bootmem() will never return inaccessible (via virtual
addressing) memory anyway, using the ..._low() variant only makes sense
when the physical address range of the allocated memory must fulfill
further constraints, especially since on 64-bit (or more generally, in
all cases where the pools the two variants allocate from are the same)
the _low() variant buys nothing.
Probably the use in alloc_tce_table() could also be eliminated (based on
code inspection of pci-calgary_64.c), but that seems too risky given I
know nothing about that hardware and have no way to test it.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9617729941 ("Drop free_pages()")
modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
int'. This made the casts to 'unsigned long' in most callers superfluous,
so remove them.
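An illustrative before/after (not a specific call site from the patch):

    /* before: cast needed while nr_free_pages() returned unsigned int */
    printk(KERN_INFO "Free: %luk\n",
           (unsigned long)nr_free_pages() << (PAGE_SHIFT - 10));

    /* after: the return type is already unsigned long */
    printk(KERN_INFO "Free: %luk\n",
           nr_free_pages() << (PAGE_SHIFT - 10));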
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <zankel@tensilica.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hooks that we modify are:
- Page fault handler (to handle kmemcheck faults)
- Debug exception handler (to hide pages after single-stepping
the instruction that caused the page fault)
Also redefine memset() to use the optimized version if kmemcheck is
enabled.
(Thanks to Pekka Enberg for minimizing the impact on the page fault
handler.)
As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
the optimized xor code, and rely instead on the generic C implementation
in order to avoid false-positive warnings.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
[whitespace fixlet]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
With sparse memory, holes should not be marked present in the memmap.
This patch makes sure sparsemem really works in SMP mode (!NUMA).
[ Impact: use less memory to map fragmented RAM, avoid boot-OOM/crash ]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
LKML-Reference: <1242117600.22431.0.camel@sli10-desk.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves the max_pfn_mapped and max_low_pfn_mapped global
variables to kernel/setup.c where they're initialized.
[ Impact: cleanup ]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1240923649.1982.21.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Ingo found a warning about nodeid with some configs.
Try to use for_each_online_node() for non-NUMA too; in that case
nodeid will be 0.
Also move the boundary checking out of setup_node_bootmem(), so the
non-NUMA config will not check it.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49B03069.80001@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather than relying on the ever-unreliable system_state,
add a specific __vmalloc_start_set flag to indicate whether
the vmalloc area has meaningful boundaries yet, and use that
in x86-32's __phys_addr and __virt_addr_valid.
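A minimal sketch of the flag's use; vmalloc_bounds_ready() and
vmalloc_addr_checkable() are hypothetical helpers for illustration:

    bool __vmalloc_start_set;               /* true once VMALLOC_START is final */

    void __init vmalloc_bounds_ready(void)  /* hypothetical init hook */
    {
            __vmalloc_start_set = true;
    }

    static bool vmalloc_addr_checkable(unsigned long x)
    {
            /* before the flag is set, the vmalloc bounds mean nothing */
            return __vmalloc_start_set &&
                   x >= VMALLOC_START && x < VMALLOC_END;
    }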
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
In preparation for moving the function declaration to a header file,
unify 32-bit and 64-bit signatures.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-16-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The table_start, table_end, and table_top names are too generic for the
global namespace, so rename them to be more specific.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-15-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch moves the init_memory_mapping() function to common mm/init.c.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch adds an empty static inline init_gbpages() for the 32-bit
version of init_memory_mapping() making both versions identical.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-13-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
As a trivial preparation for moving common code to arch/x86/mm/init.c,
ifdef the 32-bit and 64-bit versions of NR_RANGE_MR.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-12-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), ifdef the configuration-specific pfn setup
code in the function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-11-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), ifdef the configuration-specific setup code
in the function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-10-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch adds a sanity check to the 32-bit version of
init_memory_mapping() to reduce the diff to the 64-bit version.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-9-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The 64-bit version of init_memory_mapping() uses the last mapped
address returned from kernel_physical_mapping_init() whereas the
32-bit version doesn't. This patch adds relevant ifdefs to both
versions of the function to reduce the diff between them.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-8-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch renames after_init_bootmem to after_bootmem in
mm/init_32.c to reduce the diff to the 64-bit version of
init_memory_mapping().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-7-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The save_mr() function already checks that start_pfn is less than
end_pfn, so we can remove the unnecessary check, which reduces the
diff between the 32-bit and the 64-bit versions of init_memory_mapping().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-6-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Enabling NX, PSE, and PGE is only required on 32-bit, so ifdef it
in both versions of the function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-5-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch moves pgd_base out of init_memory_mapping() to reduce
the diff between the 32-bit version and the 64-bit version of the
function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-4-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
There are some minor differences between the 32-bit and 64-bit
find_early_table_space() functions. This patch wraps those
differences under CONFIG_X86_32 to make the function identical
on both configurations.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-3-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), add gbpages support to the 32-bit version.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-2-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
To reduce the diff between the 32-bit and 64-bit versions of
init_memory_mapping(), fix up all trivial issues.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1236257708-27269-1-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Simplify the code, reuse some lines.
Remove the min_low_pfn reference; it is always 0.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AEE2C4.2030602@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The function is identical on 32-bit and 64-bit configurations so move it to the
common mm/init.c file.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236158020.29024.28.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Make the code more readable and more like the 64-bit version.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE48B4.8010907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix panic on systems with 4 sockets of 2G each
Found one system with 4 sockets, each with 2G, that can not boot
with numa32 because the boot mem crosses nodes.
So add a NUMA version of setup_bootmem_allocator().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49AE485B.8000902@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The function is identical on 32-bit and 64-bit configurations so move
it to the common mm/init.c file.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236160001.29024.29.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c.
The declaration of the function is kept in asm/numa_32.h because
asm/highmem.h is included only if CONFIG_HIGHMEM is enabled so we
can't put the empty static inline function there.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: unification
This patch introduces a common arch/x86/mm/init.c and moves the identical
free_init_pages() and free_initmem() functions to the file.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078906.2675.18.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: unification
This patch adds sanity checks that are already in init_64.c to init_32.c.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236078902.2675.16.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch changes find_early_table_space() to use roundup() for rounding up
tables to page size to unify the common parts of the 32-bit and 64-bit
implementations.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077705.2675.6.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The __VMALLOC_RESERVE global variable is not used in init_32.c. Move that to
pgtable_32.c to reduce the diff between init_32.c and init_64.c.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1236077704.2675.4.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: minor change to populate_extra_pte() and addition of pmd flavor
Update populate_extra_pte() to return a pointer to the pte_t for the
specified address, and add populate_extra_pmd(), which only populates
down to the pmd and returns a pointer to the pmd entry for the address.
For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().
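The resulting interface, roughly:

    pte_t *populate_extra_pte(unsigned long vaddr);  /* down to the pte */
    pmd_t *populate_extra_pmd(unsigned long vaddr);  /* stops at the pmd */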
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: keep kernel text read only
Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.
But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.
The kernel text is converted to read/write, stop_machine is called to
modify the code, then the kernel text is converted back to read only.
The original version used SYSTEM_STATE to determine when it was OK
or not to change the code to rw or ro. Andrew Morton pointed out that
using SYSTEM_STATE is a bad idea since there is no guarantee to what
its state will actually be.
Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.
[ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
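A sketch of the resulting gate (bounds symbols simplified):

    static int kernel_set_to_readonly __read_mostly;

    void set_kernel_text_rw(void)
    {
            unsigned long start = PFN_ALIGN(_text);
            unsigned long end = PFN_ALIGN(__start_rodata);

            if (!kernel_set_to_readonly)
                    return;         /* text was never marked RO; nothing to do */

            set_memory_rw(start, (end - start) >> PAGE_SHIFT);
    }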
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: use new dynamic allocator, unified access to static/dynamic
percpu memory
Convert to the new dynamic percpu allocator.
* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the max_low_pfn logic a bit more standard between
lowmem_pfn_init() and highmem_pfn_init().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Split find_low_pfn_range() into two functions:
- lowmem_pfn_init()
- highmem_pfn_init()
The former gets called if all of RAM fits into lowmem,
otherwise we call highmem_pfn_init().
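After the split, find_low_pfn_range() reduces to a simple dispatcher:

    void __init find_low_pfn_range(void)
    {
            /* it could update max_pfn */
            if (max_pfn <= MAXMEM_PFN)
                    lowmem_pfn_init();
            else
                    highmem_pfn_init();
    }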
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/mm/init_32.c: In function ‘find_low_pfn_range’:
arch/x86/mm/init_32.c:696: warning: format ‘%u’ expects type ‘unsigned int’, but
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Debugging and original patch from Nick Piggin <npiggin@suse.de>
The early fixmap pmd entry inserted at the very top of the KVA is causing the
subsequent fixmap mapping code to not provide physically linear pte pages over
the kmap atomic portion of the fixmap (which relies on said property to
calculate pte addresses).
This has caused weird boot failures in kmap_atomic much later in the boot
process (initial userspace faults) on a 32-bit PAE system with a larger
number of CPUs (smaller CPU counts tend not to run over into the next
page, so they don't show the problem).
Solve this by attempting to clear out the page table and copy any of its
entries to the new one. Also, BUG() if a nonlinear condition is encountered
and can't be resolved, which might save some hours of debugging if this
fragile scheme ever breaks again...
Once we have such logic, we can also use it to eliminate the early ioremap
trickery around the page table setup for the fixmap area. This also fixes
potential issues with FIX_* entries sharing the leaf page table with the early
ioremap ones getting discarded by early_ioremap_clear() and not restored by
early_ioremap_reset(). It also eliminates the temporary (and
configuration-dependent, via NR_CPUS) unavailability of early fixed
mappings during the time the fixmap area page tables get constructed.
Finally, also replace the hard coded calculation of the initial table space
needed for the fixmap area with a proper one, allowing kernels configured for
large CPU counts to actually boot.
Based-on: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.
This patch adds the option for device drivers to add their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.
In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose driver issues
from userspace in the field.
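A hedged usage sketch from a driver probe path; "example" is a
placeholder driver name:

    static int example_probe(struct pci_dev *pdev,
                             const struct pci_device_id *id)
    {
            int err;

            /* like pci_request_regions(), but also bans the BARs
             * from /dev/mem while the driver owns them */
            err = pci_request_regions_exclusive(pdev, "example");
            if (err)
                    return err;

            /* ... normal probe work ... */
            return 0;
    }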
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Show node to memory section relationship with symlinks in sysfs
Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.
Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.
In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.
Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
stacktrace: provide save_stack_trace_tsk() weak alias
rcu: provide RCU options on non-preempt architectures too
printk: fix discarding message when recursion_bug
futex: clean up futex_(un)lock_pi fault handling
"Tree RCU": scalable classic RCU implementation
futex: rename field in futex_q to clarify single waiter semantics
x86/swiotlb: add default swiotlb_arch_range_needs_mapping
x86/swiotlb: add default phys<->bus conversion
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
x86: add swiotlb allocation functions
swiotlb: consolidate swiotlb info message printing
swiotlb: support bouncing of HighMem pages
swiotlb: factor out copy to/from device
swiotlb: add arch hook to force mapping
swiotlb: allow architectures to override phys<->bus<->phys conversions
swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
rcu: fix rcutorture behavior during reboot
resources: skip sanity check of busy resources
swiotlb: move some definitions to header
swiotlb: allow architectures to override swiotlb pool allocation
...
Fix up trivial conflicts in
arch/x86/kernel/Makefile
arch/x86/mm/init_32.c
include/linux/hardirq.h
as per Ingo's suggestions.
swiotlb on 32 bit will be used by Xen domain 0 support.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some of the inconsistencies checked for at run time can already be
detected at build time, so mirror the run-time checks with build-time
equivalents.
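The pattern, with hypothetical constants standing in for the real checks:

    /* run-time check, kept for values only known at run time */
    BUG_ON(MY_AREA_START >= MY_AREA_END);

    /* the same invariant on compile-time constants fails the build */
    BUILD_BUG_ON(MY_AREA_START >= MY_AREA_END);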
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Neither of the callers really needs the physical address this function
returns, so eliminate the pointless argument.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: simplify implementation, cleanup
If !(pgd_val(*pgd) & _PAGE_PRESENT) in PAE mode, we need not fetch the
value of pmd_table again.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM
This takes the code used for CONFIG_HIGHMEM memory mappings, but is
designed for dynamic IO resource mapping.
These fixmaps are available even with CONFIG_HIGHMEM turned off.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: change the implementation of the debug feature
The periodic corruption checks are better off run from a work queue;
there's nothing time-critical about them, and this way the amount of
interrupt-context work is reduced.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use one of the software-defined PTE bits to indicate that a mapping is
intended for an IO address. On native hardware this is irrelevant,
since a physical address is a physical address. But in a virtual
environment, physical addresses are also virtualized, so there needs
to be some way to distinguish between pseudo-physical addresses and
actual hardware addresses; _PAGE_IOMAP indicates this intent.
By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
even appear in the final pagetable.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.
Also, remove the check from the power management code - this is a
pure x86 matter for now.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Interrupt context no longer splits large pages in cpa(), so we can do
away with the cpa memory pool code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.
This will simplify the cpa code, as we no longer do the split from
interrupt context. The cpa code simplification comes in subsequent patches.
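The effect on the mapping code, roughly (a sketch; variable names may
differ):

    #ifdef CONFIG_DEBUG_PAGEALLOC
            /* never use large pages: pages may be unmapped individually */
            use_pse = use_gbpages = 0;
    #else
            use_pse = cpu_has_pse;
            use_gbpages = direct_gbpages;
    #endif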
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the first pass, the kernel physical mapping is set up using large or
small pages, but with the same PTE attributes as the early PTE
attributes set up by the early boot code in head_[32|64].S.
After flushing the TLBs, we go through the second pass, which sets up the
direct-mapped PTEs with the appropriate attributes (like NX, GLOBAL, etc.)
that are runtime detectable.
This two-pass mechanism conforms to the TLB app note, which says:
"Software should not write to a paging-structure entry in a way that would
change, for any linear address, both the page size and either the page frame
or attributes."
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They were already called once in arch/x86/kernel/setup.c - we don't need to call them again.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=11485
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Periodically check for corruption in low physical memory. Don't bother
checking at fault time, since it won't show anything useful.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Include <asm/smp.h> in mm/init_32.c for zap_low_mappings() and
declare free_initmem() in asm-x86/page_XX.h.
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
When more than 4G of memory is installed, don't map the big hole below 4G.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Older x86-32 processors do not support global mappings (PGE), so we must
only use them if the processor supports them.
The _PAGE_KERNEL* flags always have _PAGE_KERNEL set, since logically
we always want it set.
This is OK even on processors which do not support PGE, since all
_PAGE flags are masked with __supported_pte_mask before being turned
into a real in-pagetable pte. On 32-bit systems, __supported_pte_mask
is initialized to not contain _PAGE_GLOBAL, and it is then added if
the CPU is found to support it.
The x86-32 code used to use __PAGE_KERNEL/__PAGE_KERNEL_EXEC for this
purpose, but they're now redundant and can be removed.
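A sketch of how the mask only gains the bit when the CPU supports it:

    /* global pages are only honoured once PGE support is detected */
    if (cpu_has_pge) {
            set_in_cr4(X86_CR4_PGE);
            __supported_pte_mask |= _PAGE_GLOBAL;
    }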
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move e820_register_active_regions() out of the non-NUMA
zones_sizes_init() and remove the NUMA version of zones_sizes_init();
let 32-bit call remove_all_active_ranges() in setup_arch() directly,
like 64-bit.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use PMD_SHIFT to calculate the boundary; also adjust the size for the
pre-allocated tables.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some RAM-end boundaries only have page alignment instead of 2M alignment.
v2: make init_memory_mapping() more robust: start could be any value other than 0
v3: fix non-PAE by handling the leftover range in kernel_physical_mapping
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do that in init_memory_mapping(); also remove one
init_ohci1394_dma_on_all_controllers() call.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch "x86: introduce init_memory_mapping for 32bit" does not allocate
enough space for PTEs if the CPU does not implement PSE.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>