Add TTB accessor functions and give it a sensible default
value. We will use this later for optimizing the fault
path.
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Remove the previous saving of fault codes into the thread_struct
as they are never used, and appeared to be inherited from x86.
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds some preliminary support for the SH-X2 MMU, used by
newer SH-4A parts (particularly SH7785).
This MMU implements a 'compat' mode with SH-X MMUs and an
'extended' mode for SH-X2 extended features. Extended features
include additional page sizes (8kB, 4MB, 64MB), as well as the
addition of page execute permissions.
The extended mode attributes are placed in a second data array,
which requires us to switch to 64-bit PTEs when in X2 mode.
With the addition of the exec perms, we also overhaul the mmap
prots somewhat, now that it's possible to handle them more
intelligently.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This implements initial support for the SH7206 (SH-2A) and SH7619
(SH-2) MMU-less CPUs.
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel test
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.
While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach. For
example, in arch/alpha/mm/fault.c:
"
if (cause < 0) {
if (!(vma->vm_flags & VM_EXEC))
goto bad_area;
} else if (!cause) {
/* Allow reads even for write-only mappings */
if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
goto bad_area;
} else {
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
}
"
Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest. I've verified the patch on
ia64, x86_64 and x86.
Additional discussion:
Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write. Thus, write only is not supported in h/w.
Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV. That check is enforced in
arch/blah/mm/fault.c. However, if i first write that page it will fault in
and the pte will be set to read/write. Thus, any subsequent reads to the page
will succeed. It is this inconsistency in behavior that this patch is
attempting to address. Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV. Thus, any arbitrary read
on a page can potentially result in a SEGV.
According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.
The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations. This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable. If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IRQs disabling in flush_cache_4096 for cache purge. Under certain
workloads we would get an IRQ in the middle of a purge operation,
and the cachelines would remain in an inconsistent state, leading
to occasional stack corruption.
Signed-off-by: Takeo Takahashi <takahashi.takeo@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This implements initial support for the vsyscall page on SH.
At the moment we leave it configurable due to having nommu
to support from the same code base. We hook it up for the
signal trampoline return at present, with more to be added
later, once uClibc catches up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
flush_cache_mm() wraps in to flush_cache_all(), which is rather
excessive given that the number of PTEs within the specified context
are generally quite low. Optimize for walking the mm's VMA list and
selectively flushing the VMA ranges from the dcache. Invalidate the
icache only if a VMA sets VM_EXEC.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds support for the aforementioned CPU subtypes, and cleans
up some build issues encountered as a result.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This fixes up some of the various outstanding nommu bugs on
SH.
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
nommu needs to be able to shift PAGE_OFFSET, so we switch it to a
non-user-visible CONFIG_PAGE_OFFSET and use that in the few places
where it matters.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Inhibit mapping through page tables in __ioremap() for PCI memory
apertures on SH7751 and SH7780-style PCI controllers, translation is
not possible for these areas. For other users that map a small window
in P1/P2 space, ioremap() traps that already, and should never make
it to __ioremap().
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
There was a bug that got introduced when the split ptlock changes
went in where mm could be unintialized for user mappings, this
fixes it up..
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
ioremap() overhaul. Add support for transparent PMB mapping, get rid of
p3_ioremap(), etc. Also drop ioremap() and iounmap() routines from the
machvec, as everyone can use the generic ioremap() API instead. For PCI
memory apertures and other special cases, use the pci_iomap() API, as
boards are already required to get the mapping right there.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cleanup of page table allocators, using generic folded PMD and PUD
helpers. TLB flushing operations are moved to a more sensible spot.
The page fault handler is also optimized slightly, we no longer waste
cycles on IRQ disabling for flushing of the page from the ITLB, since
we're already under CLI protection by the initial exception handler.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently when making changes to control registers, we
typically need some time for changes to take effect (8
nops, generally). However, for sh4a we simply need to
do an icbi..
This is a simple patch for implementing a general purpose
ctrl_barrier() which functions as a control register write
barrier. There's some additional documentation in the patch
itself, but it's pretty self explanatory.
There were also some places where we were not doing the
barrier, which didn't seem to have any adverse effects on
legacy parts, but certainly did on sh4a. It's safer to have
the barrier in place for legacy parts as well in these cases,
though this does make flush_tlb_all() more expensive (by an
order of 8 nops). We can ifdef around the flush_tlb_all()
case for now if it's clear that all legacy parts won't have
a problem with this.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We had a pretty interesting oops happening, where copy_user_page()
was down()'ing p3map_sem[] with a bogus offset (particularly, an
offset that hadn't been initialized with sema_init(), due to the
mismatch between cpu_data->dcache.n_aliases and what was assumed
based off of the old CACHE_ALIAS value).
Luckily, spinlock debugging caught this for us, and so we drop
the old hardcoded CACHE_ALIAS for sh4 completely and rely on the
run-time probed cpu_data->dcache.alias_mask. This in turn gets
the p3map_sem[] index right, and everything works again.
While we're at it, also convert to 4-level page tables..
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This reworks some of the SH-4 cache handling code to more easily
accomodate newer-style caches (particularly for the > direct-mapped
case), as well as optimizing some of the old code.
Signed-off-by: Richard Curnow <richard.curnow@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
For some of the larger sizes we permitted spanning pages
across several PTEs, but this turned out to not be generally
useful. This reverts the sh hugetlbpage interface to something
more sensible using huge pages at single PTE granularity.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
flush_cache_range() wasn't page aligning the end of the range,
we can't assume that it will always be page aligned, and we
ended up getting unaligned faults in some rare call paths.
Additionally, we add a small optimization to just purge the
dcache entirely if the range is large enough that the page
table walking will take longer. We use an arbitrary value of
64 pages for the large range size, as per sh64.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
One of the changes necessary for shared page tables is to standardize the
pxx_page macros. pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address. pud_page and pgd_page, on the
other hand, return the kernel virtual address.
Shared page tables needs pud_page and pgd_page to return the actual page
structures. There are very few actual users of these functions, so it is
simple to standardize their usage.
Since this is basic cleanup, I am submitting these changes as a standalone
patch. Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have an explicit mm call to split higher order pages into individual pages.
Should help to avoid bugs and be more explicit about the code's intention.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the CPU subtype options are cluttering up arch/sh/Kconfig somewhat.
Given that, this moves all of that in to its own arch/sh/mm/Kconfig. Things
like cache configuration are also moved to this new location.
This also adds support for strict CPU tuning on newer cores, which requires
the addition of as-option.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a few changes in the way that the I/O routines are defined on
SH, specifically so that things like the iomap API properly wrap through the
machvec for board-specific quirks.
In addition to this, the old p3_ioremap() work is converted to a more generic
__ioremap() that will map through the PMB if it's available, or fall back on
page tables for everything else.
An alpha-like IO_CONCAT is also added so we can start to clean up the
board-specific io.h mess, which will be handled in board update patches..
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SH7705 in extended cache mode has some left-over VALID_PAGE() cruft that it
checks when doing lazy dcache write-back. This has been gone for some time
(the last bits were in the discontig code, which should now also be gone --
this also fixes up a build error in the non-discontig case).
pfn_valid() gives the desired behaviour, so we switch to that.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was only one board using this (hp690 specifically), and it just so
happens that it's only physically discontiguous at the "normal" P1 offset. If
we bump up the P1 offset, it's possible to hit a shadowed region of memory
where we suddenly become magically contiguous.
As people have been using this shadowed region workaround for quite some time
(and without any adverse effects), it's time to drop the left over discontig
bits that no longer have any practical use (it was always very much
hp690-centric to begin with).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places.
The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
screen fits inside the one page table, as indeed it does.
The sh __do_page_fault: which handles both kernel faults (without lock) and
user mm faults (locked - though it set_pte without locking before).
The sh64 flush_cache_range and helpers: which wrongly thought callers held
page_table_lock before (only its tlb_start_vma did, and no longer does so);
moved the flush loop down, and adjusted the large versus small range decision
to consider a range which spans page tables as large.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
First step in pushing down the page_table_lock. init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did. Similarly no page_table_lock in vmalloc's map_vm_area.
Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.
If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it). So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The sh64 hugetlbpage.c seems to be erroneous, left over from a bygone age,
clashing with the common hugetlb.c. Replace it by a copy of the sh
hugetlbpage.c. Except, delete that mk_pte_huge macro neither uses.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c. There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code. It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.
Tested, at least a little, on ppc64, i386 and x86_64.
Notes:
- this patch changes the meaning of set_huge_pte() to be more
analagous to set_pte()
- does SH4 need s special huge_ptep_get_and_clear()??
Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!