- Remove the ifdef'ery and write distinct versions for each mmu ver even
if there is some code duplication
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
That is because __after_dc_op() already reads it for status check, so it
is better anyways to use that "newer" value.
Also reduces the clutter in callers for passing from/to these routines.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 609838cfed ("mm: invoke oom-killer from remaining unconverted page
fault handlers") converted arc to call pagefault_out_of_memory(), so remove
the comment about future conversion.
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
INV cmd for dcache provides 2 modes discard or wback-before-discard.
One is default and other needs to be set, if so desired. This is common
for line-op/entire-cache-op. So refactor them out into a helper
Doesn't affect generated code but paves way for any common
micro-optimization.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* print aliasing or not, VIPT/PIPT etc
* compress param storage using bitfields
* more use of IS_ENABLED to de-uglify code
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
icaches are not snooped hence not cohrent in SMP setups which means
kernel has to do cross core calls to ensure the same.
The leaf routine __ic_line_inv_vaddr() now does cross core calls.
__sync_icache_dcache() is affected due to this:
* local dcache line flushed ahead of remote icache inv requests
* can't disable interrupts anymore, since
__ic_line_inv_vaddr()->on_each_cpu() can deadlock.
| WARNING: CPU: 0 PID: 1 at kernel/smp.c:374
| smp_call_function_many+0x25a/0x2c4()
|
| init_kprobes+0x90/0xc8
| register_kprobe+0x1d6/0x510
| __sync_icache_dcache+0x28/0x80
|
| DISABLE IRQ
|
| __ic_line_inv_vaddr
| on_each_cpu
| smp_call_function_many+0x25a/0x2c4 --> WARN
| __ic_line_inv_vaddr_local
| __dc_line_op
* TODO: Needs to use mask of relevant CPUs to avoid broadcasting
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With commit 9df62f0544 "arch: use ASM_NL instead of ';'" the generic
macros can handle the arch specific newline quirk. Hence we can get rid
of ARC asm macros and use the "C" style macros.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This fixes a subtle issue with cache flush which could potentially cause
random userspace crashes because of stale icache lines.
This error crept in when consolidating the cache flush code
Fixes: bd12976c36 (ARC: cacheflush refactor #3: Unify the {d,i}cache)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 3.13
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
- Add mm_cpumask setting (aggregating only, unlike some other arches)
used to restrict the TLB flush cross-calling
- cross-calling versions of TLB flush routines (thanks to Noam)
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
------------------>8----------------------
arch/arc/mm/tlb.c: In function ‘do_tlb_overlap_fault’:
arch/arc/mm/tlb.c:688:13: warning: array subscript is above array bounds
[-Warray-bounds]
(pd0[n] & PAGE_MASK)) {
^
------------------>8----------------------
While at it, remove the usless last iteration of outer loop when reading
a TLB SET for duplicate entries.
Suggested-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
switch the args (address, pt_regs) to match with all the other "C"
exception handlers.
This removes the awkwardness in EV_ProtV for page fault vs. unaligned
access.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Line op needs vaddr (indexing) and paddr (tag match). For page sized
flushes (V-P const), each line op will need a different index, but the
tag bits wil remain constant, hence paddr can be setup once outside the
loop.
This improves select LMBench numbers for Aliasing dcache where we have
more "preventive" cache flushing.
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.11-rc7- Linux 3.11.0- 80 4.66 8.88 69.7 112. 268. 8.60 28.0 3489 13.K 27.K # Non alias ARC700
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 68.6 98.5 271. 8.58 28.1 4160 15.K 32.K # Aliasing
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 69.8 99.4 270. 8.73 27.5 3880 15.K 31.K # PTAG loop Inv
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With Line length being constant now, we can fold the 2 helpers into 1.
This allows applying any optimizations (forthcoming) to single place.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC dcache supports 3 ops - Inv, Flush, Flush-n-Inv.
The programming model however provides 2 commands FLUSH, INV.
INV will either discard or flush-n-discard (based on DT_CTRL bit)
The leaf helper __dc_line_loop() used to take the AUX register
(corresponding to the 2 commands). Now we push that to within the
helper, paving way for code consolidations to follow.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
task's "active_mm". ARC vmalloc fault handler however was using mm.
A vmalloc fault for non user task context (actually pre-userland, from
init thread's open for /dev/console) caused the handler to deref NULL mm
(for mm->pgd)
The reasons it worked so far is amazing:
1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
In SMP that MMU register is repurposed hence need for mm pointer deref.
2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
pre-userland code path - it was introduced with commit 20bafb3d23
"n_tty: Move buffers into n_tty_data"
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: stable@vger.kernel.org #3.10 and 3.11
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All arches do essentially the same thing now for
early_init_dt_setup_initrd_arch, so it can now be removed.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved. They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.
This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.
Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.
Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM. OOM
handling is restricted to user triggered faults that have no other
option.
Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.
Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.
This patch:
Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.
Now that all fault handlers call into the generic OOM killer (see commit
609838cfed: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding the
entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSL12LAAoJEEFnBt12D9kB64gP/RBipnYbo3RPanHg+lE/J1V7
KSVFNGKWJHxTg47VVC1YJGIG21jqxAilpdS2MQL5FP7iyd+IzvtHpQiJgp+2G+pq
di06yrdyrYErxRgZgGQi8IpR538ZzOEVLCKJGdb09YelkRzPT5au7CC1MAsX3qco
yba7PHk0/Nc4hZE4aGbgR1DlRmn86ob7mM0KFE/LORaSN2BueMgWcwKhQXYNGyoh
assX4yNhAbUG6Bgw7paBLDGqHh8c5Ei5AppU8yPb+N094jgYHBJryUoDlzzUHD23
qqiEqHhUKT0TpgHNs8KH0WZFugcmjKvYEbzdzadBxqfXnJN4fKSEcdfF3iz4T14j
U6EZks89GoHwA523OghUZkKNOqlsUdWfdKz+8/grQqKisYwDcf3fCxEYk/4weDCQ
b6fFlOv6+AI3btjXp6F511ZKxyT4ZZzkHjp/ZSrhBygyamNZfax0ma0j+ZS9AZql
kPxQS0nOve6NKaP7vXxMmW5sGMnL19ER/Hm31wthGcWI43GVebUdklnzfGaEeSjs
pmP8oiCNemceqVpiPKxcOxiguf/eyIjP1SFXbguASygUmQeTDbbJ8n1FYznCitue
xJgWttKWsEf/aMR3eJtQ3aBmHR3rijAV4E28Wlq8XMkocwvpQm2zMocS2Z5BJ80S
hi1kQVy8+RxNX96tOSp1
=GSWl
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree core updates from Grant Likely:
"Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding
the entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either"
Tim Bird questions whether the boot time cost of the random feeding may
be noticeable. And "add_device_randomness()" is definitely not some
speed deamon of a function.
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
of/platform: add error reporting to of_amba_device_create()
irq/of: Fix comment typo for irq_of_parse_and_map
of: Feed entire flattened device tree into the random pool
of/fdt: Clean up casting in unflattening path
of/fdt: Remove duplicate memory clearing on FDT unflattening
gpio: implement gpio-ranges binding document fix
of: call __of_parse_phandle_with_args from of_parse_phandle
of: introduce of_parse_phandle_with_fixed_args
of: move of_parse_phandle()
of: move documentation of of_parse_phandle_with_args
of: Fix missing memory initialization on FDT unflattening
of: consolidate definition of early_init_dt_alloc_memory_arch()
of: Make of_get_phy_mode() return int i.s.o. const int
include: dt-binding: input: create a DT header defining key codes.
of/platform: Staticize of_platform_device_create_pdata()
of: Specify initrd location using 64-bit
dt: Typo fix
OF: make of_property_for_each_{u32|string}() use parameters if OF is not enabled
This helps remove asid-to-mm reverse map
While mm->context.id contains the ASID assigned to a process, our ASID
allocator also used asid_mm_map[] reverse map. In a new allocation
cycle (mm->ASID >= @asid_cache), the Round Robin ASID allocator used this
to check if new @asid_cache belonged to some mm2 (from prev cycle).
If so, it could locate that mm using the ASID reverse map, and mark that
mm as unallocated ASID, to force it to refresh at the time of switch_mm()
However, for SMP, the reverse map has to be maintained per CPU, so
becomes 2 dimensional, hence got rid of it.
With reverse map gone, it is NOT possible to reach out to current
assignee. So we track the ASID allocation generation/cycle and
on every switch_mm(), check if the current generation of CPU ASID is
same as mm's ASID; If not it is refreshed.
(Based loosely on arch/sh implementation)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ASID allocation changes/1
This patch does 2 things:
(1) get_new_mmu_context() NOW moves mm->ASID to a new value ONLY if it
was from a prev allocation cycle/generation OR if mm had no ASID
allocated (vs. before would unconditionally moving to a new ASID)
Callers desiring unconditional update of ASID, e.g.local_flush_tlb_mm()
(for parent's address space invalidation at fork) need to first force
the parent to an unallocated ASID.
(2) get_new_mmu_context() always sets the MMU PID reg with unchanged/new
ASID value.
The gains are:
- consolidation of all asid alloc logic into get_new_mmu_context()
- avoiding code duplication in switch_mm() for PID reg setting
- Enables future change to fold activate_mm() into switch_mm()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-Asm code already has values of SW and HW ASID values, so they can be
passed to the printing routine.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This reorganizes the current TLB operations into psuedo-ops to better
pair with MMUv4's native Insert/Delete operations
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With previous commit freeing up PTE bits, reassign them so as to:
- Match the bit to H/w counterpart where possible
(e.g. MMUv2 GLOBAL/PRESENT, this avoids a shift in create_tlb())
- Avoid holes in _PAGE_xxx definitions
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The current ARC VM code has 13 flags in Page Table entry: some software
(accesed/dirty/non-linear-maps) and rest hardware specific. With 8k MMU
page, we need 19 bits for addressing page frame so remaining 13 bits is
just about enough to accomodate the current flags.
In MMUv4 there are 2 additional flags, SZ (normal or super page) and WT
(cache access mode write-thru) - and additionally PFN is 20 bits (vs. 19
before for 8k). Thus these can't be held in current PTE w/o making each
entry 64bit wide.
It seems there is some scope of compressing the current PTE flags (and
freeing up a few bits). Currently PTE contains fully orthogonal distinct
access permissions for kernel and user mode (Kr, Kw, Kx; Ur, Uw, Ux)
which can be folded into one set (R, W, X). The translation of 3 PTE
bits into 6 TLB bits (when programming the MMU) can be done based on
following pre-requites/assumptions:
1. For kernel-mode-only translations (vmalloc: 0x7000_0000 to
0x7FFF_FFFF), PTE additionally has PAGE_GLOBAL flag set (and user
space entries can never be global). Thus such a PTE can translate
to Kr, Kw, Kx (as appropriate) and zero for User mode counterparts.
2. For non global entries, the PTE flags can be used to create mirrored
K and U TLB bits. This is true after commit a950549c67
"ARC: copy_(to|from)_user() to honor usermode-access permissions"
which ensured that user-space translations _MUST_ have same access
permissions for both U/K mode accesses so that copy_{to,from}_user()
play fair with fault based CoW break and such...
There is no such thing as free lunch - the cost is slightly infalted
TLB-Miss Handlers.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>