Pull DMA-mapping updates from Marek Szyprowski:
"Those patches are continuation of my earlier work.
They contains extensions to DMA-mapping framework to remove limitation
of the current ARM implementation (like limited total size of DMA
coherent/write combine buffers), improve performance of buffer sharing
between devices (attributes to skip cpu cache operations or creation
of additional kernel mapping for some specific use cases) as well as
some unification of the common code for dma_mmap_attrs() and
dma_mmap_coherent() functions. All extensions have been implemented
and tested for ARM architecture."
* 'for-linus-for-3.6-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for dma_get_sgtable()
common: dma-mapping: introduce dma_get_sgtable() function
ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
common: dma-mapping: add support for generic dma_mmap_* calls
ARM: dma-mapping: fix error path for memory allocation failure
ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
ARM: dma-mapping: remove custom consistent dma region
mm: vmalloc: use const void * for caller argument
scatterlist: add sg_alloc_table_from_pages function
This patch adds support for DMA_ATTR_SKIP_CPU_SYNC attribute for
dma_(un)map_(single,page,sg) functions family. It lets dma mapping clients
to create a mapping for the buffer for the given device without performing
a CPU cache synchronization. CPU cache synchronization can be skipped for
the buffers which it is known that they are already in 'device' domain (CPU
caches have been already synchronized or there are only coherent mappings
for the buffer). For advanced users only, please use it with care.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
This patch adds support for dma_get_sgtable() function which is required
to let drivers to share the buffers allocated by DMA-mapping subsystem.
Generic implementation based on virt_to_page() is not suitable for ARM
dma-mapping subsystem.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
This patch adds support for DMA_ATTR_NO_KERNEL_MAPPING attribute for
IOMMU allocations, what let drivers to save precious kernel virtual
address space for large buffers that are intended to be accessed only
from userspace.
This patch is heavily based on initial work kindly provided by Abhinav
Kochhar <abhinav.k@samsung.com>.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
This patch fixes incorrect check in error path. When the allocation of
first page fails, the kernel ops appears due to accessing -1 element of
the pages array.
Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Add some sanity checks and forbid mmaping of buffers into vma areas larger
than allocated dma buffer.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
This patch changes dma-mapping subsystem to use generic vmalloc areas
for all consistent dma allocations. This increases the total size limit
of the consistent allocations and removes platform hacks and a lot of
duplicated code.
Atomic allocations are served from special pool preallocated on boot,
because vmalloc areas cannot be reliably created in atomic context.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
WARNING: at mm/vmalloc.c:1471 __iommu_free_buffer+0xcc/0xd0()
Trying to vfree() nonexistent vm area (ef095000)
Modules linked in:
[<c0015a18>] (unwind_backtrace+0x0/0xfc) from [<c0025a94>] (warn_slowpath_common+0x54/0x64)
[<c0025a94>] (warn_slowpath_common+0x54/0x64) from [<c0025b38>] (warn_slowpath_fmt+0x30/0x40)
[<c0025b38>] (warn_slowpath_fmt+0x30/0x40) from [<c0016de0>] (__iommu_free_buffer+0xcc/0xd0)
[<c0016de0>] (__iommu_free_buffer+0xcc/0xd0) from [<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138)
[<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138) from [<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc)
[<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc) from [<c0211230>] (drm_gem_object_free+0x28/0x34)
[<c0211230>] (drm_gem_object_free+0x28/0x34) from [<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8)
[<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8) from [<c01abe10>] (idr_for_each+0x74/0xb8)
[<c01abe10>] (idr_for_each+0x74/0xb8) from [<c02114e4>] (drm_gem_release+0x1c/0x30)
[<c02114e4>] (drm_gem_release+0x1c/0x30) from [<c0210ae8>] (drm_release+0x608/0x694)
[<c0210ae8>] (drm_release+0x608/0x694) from [<c00b75a0>] (fput+0xb8/0x228)
[<c00b75a0>] (fput+0xb8/0x228) from [<c00b40c4>] (filp_close+0x64/0x84)
[<c00b40c4>] (filp_close+0x64/0x84) from [<c0029d54>] (put_files_struct+0xe8/0x104)
[<c0029d54>] (put_files_struct+0xe8/0x104) from [<c002b930>] (do_exit+0x608/0x774)
[<c002b930>] (do_exit+0x608/0x774) from [<c002bae4>] (do_group_exit+0x48/0xb4)
[<c002bae4>] (do_group_exit+0x48/0xb4) from [<c002bb60>] (sys_exit_group+0x10/0x18)
[<c002bb60>] (sys_exit_group+0x10/0x18) from [<c000ee80>] (ret_fast_syscall+0x0/0x30)
This patch modifies the condition while freeing to match the condition
used while allocation. This fixes the above warning which arises when
array size is equal to PAGE_SIZE where allocation is done using kzalloc
but free is done using vfree.
Signed-off-by: Prathyush K <prathyush.k@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
IOMMU-aware dma_alloc_attrs() implementation allocates buffers in
power-of-two chunks to improve performance and take advantage of large
page mappings provided by some IOMMU hardware. However current code, due
to a subtle bug, allocated those chunks in the smallest-to-largest
order, what completely killed all the advantages of using larger than
page chunks. If a 4KiB chunk has been mapped as a first chunk, the
consecutive chunks are not aligned correctly to the power-of-two which
match their size and IOMMU drivers were not able to use internal
mappings of size other than the 4KiB (largest common denominator of
alignment and chunk size).
This patch fixes this issue by changing to the correct largest-to-smallest
chunk size allocation sequence.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes the following sparse warnings:
arch/arm/mm/dma-mapping.c:231:15: warning: symbol 'consistent_base' was not
declared. Should it be static?
arch/arm/mm/dma-mapping.c:326:8: warning: symbol 'coherent_pool_size' was not
declared. Should it be static?
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CMA has been enabled unconditionally on all ARMv6+ systems to solve the
long standing issue of double kernel mappings for all dma coherent
buffers. This however created a dependency on CONFIG_EXPERIMENTAL for
the whole ARM architecture what should be really avoided. This patch
removes this dependency and lets one use old, well-tested dma-mapping
implementation also on ARMv6+ systems without the need to use
EXPERIMENTAL stuff.
Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
The dma_contiguous_remap() function clears existing section maps using
the wrong size (PGDIR_SIZE instead of PMD_SIZE). This is a bug which
does not affect non-LPAE systems, where PGDIR_SIZE and PMD_SIZE are the same.
On LPAE systems, however, this bug causes the kernel to hang at this point.
This fix has been tested on both LPAE and non-LPAE kernel builds.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
This patch adds support for CMA to dma-mapping subsystem for ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their private memory areas if required (they can be
created with dma_declare_contiguous() function during board
initialisation).
Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, a low memory kernel mapping
is updated to to match requested memory access type.
GFP_ATOMIC allocations are performed from special pool which is created
early during boot. This way remapping page attributes is not needed on
allocation time.
CMA has been enabled unconditionally for ARMv6+ systems.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
This patch add a complete implementation of DMA-mapping API for
devices which have IOMMU support.
This implementation tries to optimize dma address space usage by remapping
all possible physical memory chunks into a single dma address space chunk.
DMA address space is managed on top of the bitmap stored in the
dma_iommu_mapping structure stored in device->archdata. Platform setup
code has to initialize parameters of the dma address space (base address,
size, allocation precision order) with arm_iommu_create_mapping() function.
To reduce the size of the bitmap, all allocations are aligned to the
specified order of base 4 KiB pages.
dma_alloc_* functions allocate physical memory in chunks, each with
alloc_pages() function to avoid failing if the physical memory gets
fragmented. In worst case the allocated buffer is composed of 4 KiB page
chunks.
dma_map_sg() function minimizes the total number of dma address space
chunks by merging of physical memory chunks into one larger dma address
space chunk. If requested chunk (scatter list entry) boundaries
match physical page boundaries, most calls to dma_map_sg() requests will
result in creating only one chunk in dma address space.
dma_map_page() simply creates a mapping for the given page(s) in the dma
address space.
All dma functions also perform required cache operation like their
counterparts from the arm linear physical memory mapping version.
This patch contains code and fixes kindly provided by:
- Krishna Reddy <vdumpa@nvidia.com>,
- Andrzej Pietrasiewicz <andrzej.p@samsung.com>,
- Hiroshi DOYU <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch converts dma_alloc/free/mmap_{coherent,writecombine}
functions to use generic alloc/free/mmap methods from dma_map_ops
structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute have been
introduced to implement writecombine methods.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch just performs a global cleanup in DMA mapping implementation
for ARM architecture. Some of the tiny helper functions have been moved
to the caller code, some have been merged together.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch removes dma bounce hooks from the common dma mapping
implementation on ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch converts all dma_sg methods to be generic (independent of the
current DMA mapping implementation for ARM architecture). All dma sg
operations are now implemented on top of respective
dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.
Before this patch there were custom methods for all scatter/gather
related operations. They iterated over the whole scatter list and called
cache related operations directly (which in turn checked if we use dma
bounce code or not and called respective version). This patch changes
them not to use such shortcut. Instead it provides similar loop over
scatter list and calls methods from the device's dma_map_ops structure.
This enables us to use device dependent implementations of cache related
operations (direct linear or dma bounce) depending on the provided
dma_map_ops structure.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch modifies dma-mapping implementation on ARM architecture to
use common dma_map_ops structure and asm-generic/dma-mapping-common.h
helpers.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
This patch removes the need for the offset parameter in dma bounce
functions. This is required to let dma-mapping framework on ARM
architecture to use common, generic dma_map_ops based dma-mapping
helpers.
Background and more detailed explaination:
dma_*_range_* functions are available from the early days of the dma
mapping api. They are the correct way of doing a partial syncs on the
buffer (usually used by the network device drivers). This patch changes
only the internal implementation of the dma bounce functions to let
them tunnel through dma_map_ops structure. The driver api stays
unchanged, so driver are obliged to call dma_*_range_* functions to
keep code clean and easy to understand.
The only drawback from this patch is reduced detection of the dma api
abuse. Let us consider the following code:
dma_addr = dma_map_single(dev, ptr, 64, DMA_TO_DEVICE);
dma_sync_single_range_for_cpu(dev, dma_addr+16, 0, 32, DMA_TO_DEVICE);
Without the patch such code fails, because dma bounce code is unable
to find the bounce buffer for the given dma_address. After the patch
the above sync call will be equivalent to:
dma_sync_single_range_for_cpu(dev, dma_addr, 16, 32, DMA_TO_DEVICE);
which succeeds.
I don't consider this as a real problem, because DMA API abuse should be
caught by debug_dma_* function family. This patch lets us to simplify
the internal low-level implementation without chaning the driver visible
API.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
Replace all uses of ~0 with DMA_ERROR_CODE, what should make the code
easier to read.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
Replace all calls to printk with pr_* functions family.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
Add a new seqfile for reporting coherent DMA allocations. This contains
the address range, size and the function which was used to allocate
each region, allowing these allocations to be viewed in much the same
way as /proc/vmallocinfo.
The DMA coherent region has limited space, so this allows allocation
failures to be viewed, as well as finding out how much space is being
used.
Make sure this file is only readable by root - same as vmallocinfo - to
prevent information leakage.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
dma_alloc_coherent wants to split pages after allocation in order to
reduce the memory footprint. This does not work well with GFP_COMP
pages, so drop this flag before allocation.
This patch is ported from arch/avr32
(commit 3611553ef9).
[swarren: s/HUGETLB_PAGE/HUGETLBFS/ in comment, minor comment cleanup]
Signed-off-by: Sumit Bhattacharya <sumitb@nvidia.com>
Tested-by: Varun Colbert <vcolbert@nvidia.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 99d1717d (ARM: Add init_consistent_dma_size()) introduces dynamic
allocation of the consistent_pte array. The number of PTEs should be
calculated based on the number of PMD entries rather than PGD, hence the
PMD_SHIFT.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If the attempt to map a page for DMA fails (eg, because we're out of
mapping space) then we must not hold on to the page we allocated for
DMA - doing so will result in a memory leak.
Cc: <stable@kernel.org>
Reported-by: Bryan Phillippe <bp@darkforest.org>
Tested-by: Bryan Phillippe <bp@darkforest.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
PGDIR_SHIFT and PMD_SHIFT for the classic 2-level page table format have
the same value (21). This patch converts the PGDIR_* uses in the kernel
to the PMD_* equivalent so that LPAE builds can reuse the same code.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This function can be called during boot to increase the size of the consistent
DMA region above it's default value of 2MB. It must be called before the memory
allocator is initialised, i.e. before any core_initcall.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
ISA_DMA_THRESHOLD has been unused by non-arch code, so lets now get
rid of it from ARM by replacing it with arm_dma_zone_mask. Move
dma_supported() and dma_set_mask() out of line, and have
dma_supported() check this new variable instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add pud_offset() et.al. between the pgd and pmd code in preparation of
using pgtable-nopud.h rather than 4level-fixup.h.
This incorporates a fix from Jamie Iles <jamie@jamieiles.com> for
uaccess_with_memcpy.c.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The kerneldoc for this function is at odds with the DMA-API
document, which holds, so fix it.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add ARM support for the DMA debug infrastructure, which allows the
DMA API usage to be debugged.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Replace the page_to_dma() and dma_to_page() macros with their PFN
equivalents. This allows us to map parts of memory which do not have
a struct page allocated to them to bus addresses. This will be used
internally by dma_alloc_coherent()/dma_alloc_writecombine().
Build tested on Versatile, OMAP1, IOP13xx and KS8695.
Tested-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since commit 3e4d3af501 "mm: stack based kmap_atomic()", it is no longer
necessary to carry an ad hoc version of kmap_atomic() added in commit
7e5a69e83b "ARM: 6007/1: fix highmem with VIPT cache and DMA" to cope
with reentrancy.
In fact, it is now actively wrong to rely on fixed kmap type indices
(namely KM_L1_CACHE) as kmap_atomic() totally ignores them now and a
concurrent instance of it may reuse any slot for any purpose.
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
An out by one bug meant that the DMA coherent allocator was aligning
to one more bit than it should, causing it to run out of available
memory quicker. Fix this.
Reported-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are places in Linux where writes to newly allocated page cache
pages happen without a subsequent call to flush_dcache_page() (several
PIO drivers including USB HCD). This patch changes the meaning of
PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
mapped page in update_mmu_cache().
The patch also sets the PG_arch_1 bit in the DMA cache maintenance
function to avoid additional cache flushing in update_mmu_cache().
Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Hylands reports:
| We've observed a problem with dma_alloc_writecombine when the system
| is under heavy load (heavy bus traffic). We've managed to reduce the
| problem to the following snippet, which is run from a kthread in a
| continuous loop:
|
| void *virtAddr;
| dma_addr_t physAddr;
| unsigned int numBytes = 256;
|
| for (;;) {
| virtAddr = dma_alloc_writecombine(NULL,
| numBytes, &physAddr, GFP_KERNEL);
| if (virtAddr == NULL) {
| printk(KERN_ERR "Running out of memory\n");
| break;
| }
|
| /* access DMA memory allocated */
| tmp = virtAddr;
| *tmp = 0x77;
|
| /* free DMA memory */
| dma_free_writecombine(NULL,
| numBytes, virtAddr, physAddr);
|
| ...sleep here...
| }
|
| By itself, the code will run forever with no issues. However, as we
| increase our bus traffic (typically using DMA) then the *tmp = 0x77
| line will eventually cause a page fault. If we add a small delay (a
| few microseconds) before the *tmp = 0x77, then we don't see a page
| fault, even under heavy load.
A dsb() is required after modifying the PTE entries to ensure that they
will always be visible. Add this dsb().
Reported-by: Dave Hylands <dhylands@gmail.com>
Tested-by: Dave Hylands <dhylands@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The DMA coherent remap area is used to provide an uncached mapping
of memory for coherency with DMA engines. Currently, we look for
any free hole which our allocation will fit in with page alignment.
However, this can lead to fragmentation of the area, and allows small
allocations to cross L1 entry boundaries. This is undesirable as we
want to move towards allocating sections of memory.
Align allocations according to the size, limiting the alignment between
the page and section sizes.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This macro is not defined when !CONFIG_MMU so this patch moves the
CONSISTENT_* definitions to the CONFIG_MMU section.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The VIVT cache of a highmem page is always flushed before the page
is unmapped. This cache flush is explicit through flush_cache_kmaps()
in flush_all_zero_pkmaps(), or through __cpuc_flush_dcache_area() in
kunmap_atomic(). There is also an implicit flush of those highmem pages
that were part of a process that just terminated making those pages free
as the whole VIVT cache has to be flushed on every task switch. Hence
unmapped highmem pages need no cache maintenance in that case.
However unmapped pages may still be cached with a VIPT cache because the
cache is tagged with physical addresses. There is no need for a whole
cache flush during task switching for that reason, and despite the
explicit cache flushes in flush_all_zero_pkmaps() and kunmap_atomic(),
some highmem pages that were mapped in user space end up still cached
even when they become unmapped.
So, we do have to perform cache maintenance on those unmapped highmem
pages in the context of DMA when using a VIPT cache. Unfortunately,
it is not possible to perform that cache maintenance using physical
addresses as all the L1 cache maintenance coprocessor functions accept
virtual addresses only. Therefore we have no choice but to set up a
temporary virtual mapping for that purpose.
And of course the explicit cache flushing when unmapping a highmem page
on a system with a VIPT cache now can go, which should increase
performance.
While at it, because the code in __flush_dcache_page() has to be modified
anyway, let's also make sure the mapped highmem pages are pinned with
kmap_high_get() for the duration of the cache maintenance operation.
Because kunmap() does unmap highmem pages lazily, it was reported by
Gary King <GKing@nvidia.com> that those pages ended up being unmapped
during cache maintenance on SMP causing segmentation faults.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>