2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-19 10:44:14 +08:00
Commit Graph

50728 Commits

Author SHA1 Message Date
Konstantin Khlebnikov
bbf808ed7d mm/memcg: kill mem_cgroup_lru_del()
This patch kills mem_cgroup_lru_del(), we can use
mem_cgroup_lru_del_list() instead.  On 0-order isolation we already have
right lru list id.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:25 -07:00
Konstantin Khlebnikov
f3fd4a6192 mm: remove lru type checks from __isolate_lru_page()
After patch "mm: forbid lumpy-reclaim in shrink_active_list()" we can
completely remove anon/file and active/inactive lru type filters from
__isolate_lru_page(), because isolation for 0-order reclaim always
isolates pages from right lru list.  And pages-isolation for lumpy
shrink_inactive_list() or memory-compaction anyway allowed to isolate
pages from all evictable lru lists.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:25 -07:00
Konstantin Khlebnikov
014483bccc mm: mark mm-inline functions as __always_inline
GCC sometimes ignores "inline" directives even for small and simple functions.
This supposed to be fixed in gcc 4.7, but it was released only yesterday.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:25 -07:00
Hugh Dickins
89abfab133 mm/memcg: move reclaim_stat into lruvec
With mem_cgroup_disabled() now explicit, it becomes clear that the
zone_reclaim_stat structure actually belongs in lruvec, per-zone when
memcg is disabled but per-memcg per-zone when it's enabled.

We can delete mem_cgroup_get_reclaim_stat(), and change
update_page_reclaim_stat() to update just the one set of stats, the one
which get_scan_count() will actually use.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:25 -07:00
KAMEZAWA Hiroyuki
4b91355e9d memcg: fix/change behavior of shared anon at moving task
This patch changes memcg's behavior at task_move().

At task_move(), the kernel scans a task's page table and move the changes
for mapped pages from source cgroup to target cgroup.  There has been a
bug at handling shared anonymous pages for a long time.

Before patch:
  - The spec says 'shared anonymous pages are not moved.'
  - The implementation was 'shared anonymoys pages may be moved'.
    If page_mapcount <=2, shared anonymous pages's charge were moved.

After patch:
  - The spec says 'all anonymous pages are moved'.
  - The implementation is 'all anonymous pages are moved'.

Considering usage of memcg, this will not affect user's experience.
'shared anonymous' pages only exists between a tree of processes which
don't do exec().  Moving one of process without exec() seems not sane.
For example, libcgroup will not be affected by this change.  (Anyway, no
one noticed the implementation for a long time...)

Below is a discussion log:

 - current spec/implementation are complex
 - Now, shared file caches are moved
 - It adds unclear check as page_mapcount(). To do correct check,
   we should check swap users, etc.
 - No one notice this implementation behavior. So, no one get benefit
   from the design.
 - In general, once task is moved to a cgroup for running, it will not
   be moved....
 - Finally, we have control knob as memory.move_charge_at_immigrate.

Here is a patch to allow moving shared pages, completely. This makes
memcg simpler and fix current broken code.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
Pravin B Shelar
5bf5f03c27 mm: fix slab->page flags corruption
Transparent huge pages can change page->flags (PG_compound_lock) without
taking Slab lock.  Since THP can not break slab pages we can safely access
compound page without taking compound lock.

Specifically this patch fixes a race between compound_unlock() and slab
functions which perform page-flags updates.  This can occur when
get_page()/put_page() is called on a page from slab.

[akpm@linux-foundation.org: tweak comment text, fix comment layout, fix label indenting]
Reported-by: Amey Bhide <abhide@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
Andrea Arcangeli
26c191788f mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
David Rientjes
a7f638f999 mm, oom: normalize oom scores to oom_score_adj scale only for userspace
The oom_score_adj scale ranges from -1000 to 1000 and represents the
proportion of memory available to the process at allocation time.  This
means an oom_score_adj value of 300, for example, will bias a process as
though it was using an extra 30.0% of available memory and a value of
-350 will discount 35.0% of available memory from its usage.

The oom killer badness heuristic also uses this scale to report the oom
score for each eligible process in determining the "best" process to
kill.  Thus, it can only differentiate each process's memory usage by
0.1% of system RAM.

On large systems, this can end up being a large amount of memory: 256MB
on 256GB systems, for example.

This can be fixed by having the badness heuristic to use the actual
memory usage in scoring threads and then normalizing it to the
oom_score_adj scale for userspace.  This results in better comparison
between eligible threads for kill and no change from the userspace
perspective.

Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
Hugh Dickins
17cf28afea mm/fs: remove truncate_range
Remove vmtruncate_range(), and remove the truncate_range method from
struct inode_operations: only tmpfs ever supported it, and tmpfs has now
converted over to using the fallocate method of file_operations.

Update Documentation accordingly, adding (setlease and) fallocate lines.
And while we're in mm.h, remove duplicate declarations of shmem_lock() and
shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:23 -07:00
Hugh Dickins
bde05d1ccd shmem: replace page if mapping excludes its zone
The GMA500 GPU driver uses GEM shmem objects, but with a new twist: the
backing RAM has to be below 4GB.  Not a problem while the boards
supported only 4GB: but now Intel's D2700MUD boards support 8GB, and
their GMA3600 is managed by the GMA500 driver.

shmem/tmpfs has never pretended to support hardware restrictions on the
backing memory, but it might have appeared to do so before v3.1, and
even now it works fine until a page is swapped out then back in.  When
read_cache_page_gfp() supplied a freshly allocated page for copy, that
compensated for whatever choice might have been made by earlier swapin
readahead; but swapoff was likely to destroy the illusion.

We'd like to continue to support GMA500, so now add a new
shmem_should_replace_page() check on the zone when about to move a page
from swapcache to filecache (in swapin and swapoff cases), with
shmem_replace_page() to allocate and substitute a suitable page (given
gma500/gem.c's mapping_set_gfp_mask GFP_KERNEL | __GFP_DMA32).

This does involve a minor extension to mem_cgroup_replace_page_cache()
(the page may or may not have already been charged); and I've removed a
comment and call to mem_cgroup_uncharge_cache_page(), which in fact is
always a no-op while PageSwapCache.

Also removed optimization of an unlikely path in shmem_getpage_gfp(),
now that we need to check PageSwapCache more carefully (a racing caller
might already have made the copy).  And at one point shmem_unuse_inode()
needs to use the hitherto private page_swapcount(), to guard against
racing with inode eviction.

It would make sense to extend shmem_should_replace_page(), to cover
cpuset and NUMA mempolicy restrictions too, but set that aside for now:
needs a cleanup of shmem mempolicy handling, and more testing, and ought
to handle swap faults in do_swap_page() as well as shmem.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Marchesin <marcheu@chromium.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:22 -07:00
Bartlomiej Zolnierkiewicz
5ceb9ce6fe mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE type
pageblock (and some MIGRATE_MOVABLE pages are left in it) waiting until an
allocation takes ownership of the block may take too long.  The type of
the pageblock remains unchanged so the pageblock cannot be used as a
migration target during compaction.

Fix it by:

* Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE], and
  COMPACT_SYNC) and then converting sync field in struct compact_control
  to use it.

* Adding nr_pageblocks_skipped field to struct compact_control and
  tracking how many destination pageblocks were of MIGRATE_UNMOVABLE type.
   If COMPACT_ASYNC_MOVABLE mode compaction ran fully in
  try_to_compact_pages() (COMPACT_COMPLETE) it implies that there is not a
  suitable page for allocation.  In this case then check how if there were
  enough MIGRATE_UNMOVABLE pageblocks to try a second pass in
  COMPACT_ASYNC_UNMOVABLE mode.

* Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC and
  COMPACT_ASYNC_UNMOVABLE compaction modes) and building a count based on
  finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.  If all
  pages within the MIGRATE_UNMOVABLE pageblock are in one of those three
  sets change the whole pageblock type to MIGRATE_MOVABLE.

My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
131072 standard 4KiB pages in 'Normal' zone) is to:

- allocate 120000 pages for kernel's usage
- free every second page (60000 pages) of memory just allocated
- allocate and use 60000 pages from user space
- free remaining 60000 pages of kernel memory
  (now we have fragmented memory occupied mostly by user space pages)
- try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

The results:
- with compaction disabled I get 11 successful allocations
- with compaction enabled - 14 successful allocations
- with this patch I'm able to get all 100 successful allocations

NOTE: If we can make kswapd aware of order-0 request during compaction, we
can enhance kswapd with changing mode to COMPACT_ASYNC_FULL
(COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).  Please see the
following thread:

	http://marc.info/?l=linux-mm&m=133552069417068&w=2

[minchan@kernel.org: minor cleanups]
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:22 -07:00
Johannes Weiner
238305bb4d mm: remove sparsemem allocation details from the bootmem allocator
alloc_bootmem_section() derives allocation area constraints from the
specified sparsemem section.  This is a bit specific for a generic memory
allocator like bootmem, though, so move it over to sparsemem.

As __alloc_bootmem_node_nopanic() already retries failed allocations with
relaxed area constraints, the fallback code in sparsemem.c can be removed
and the code becomes a bit more compact overall.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:22 -07:00
Alex Shi
2099597401 mm: move is_vma_temporary_stack() declaration to huge_mm.h
When transparent_hugepage_enabled() is used outside mm/, such as in
arch/x86/xx/tlb.c:

+       if (!cpu_has_invlpg || vma->vm_flags & VM_HUGETLB
+                       || transparent_hugepage_enabled(vma)) {
+               flush_tlb_mm(vma->vm_mm);

is_vma_temporary_stack() isn't referenced in huge_mm.h, so it has compile
errors:

  arch/x86/mm/tlb.c: In function `flush_tlb_range':
  arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration]

Since is_vma_temporay_stack() is just used in rmap.c and huge_memory.c, it
is better to move it to huge_mm.h from rmap.h to avoid such errors.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:21 -07:00
Ulrich Drepper
9295b7a07c kbuild: install kernel-page-flags.h
Programs using /proc/kpageflags need to know about the various flags.  The
<linux/kernel-page-flags.h> provides them and the comments in the file
indicate that it is supposed to be used by user-level code.  But the file
is not installed.

Install the headers and mark the unstable flags as out-of-bounds.  The
page-type tool is also adjusted to not duplicate the definitions

Signed-off-by: Ulrich Drepper <drepper@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:21 -07:00
Konstantin Khlebnikov
02602a18c3 bug: completely remove code generated by disabled VM_BUG_ON()
Even if CONFIG_DEBUG_VM=n gcc genereates code for some VM_BUG_ON()

for example VM_BUG_ON(!PageCompound(page) || !PageHead(page)); in
do_huge_pmd_wp_page() generates 114 bytes of code.

But they mostly disappears when I split this VM_BUG_ON into two:

  -VM_BUG_ON(!PageCompound(page) || !PageHead(page));
  +VM_BUG_ON(!PageCompound(page));
  +VM_BUG_ON(!PageHead(page));

weird... but anyway after this patch code disappears completely.

  add/remove: 0/0 grow/shrink: 7/97 up/down: 135/-1784 (-1649)

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:20 -07:00
Konstantin Khlebnikov
baf05aa927 bug: introduce BUILD_BUG_ON_INVALID() macro
Sometimes we want to check some expressions correctness at compile time.
"(void)(e);" or "if (e);" can be dangerous if the expression has
side-effects, and gcc sometimes generates a lot of code, even if the
expression has no effect.

This patch introduces macro BUILD_BUG_ON_INVALID() for such checks, it
forces a compilation error if expression is invalid without any extra
code.

[Cast to "long" required because sizeof does not work for bit-fields.]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:20 -07:00
Johannes Weiner
c3ac9a8ade mm: memcg: count pte references from every member of the reclaimed hierarchy
The rmap walker checking page table references has historically ignored
references from VMAs that were not part of the memcg that was being
reclaimed during memcg hard limit reclaim.

When transitioning global reclaim to memcg hierarchy reclaim, I missed
that bit and now references from outside a memcg are ignored even during
global reclaim.

Reverting back to traditional behaviour - count all references during
global reclaim and only mind references of the memcg being reclaimed
during limit reclaim would be one option.

However, the more generic idea is to ignore references exactly then when
they are outside the hierarchy that is currently under reclaim; because
only then will their reclamation be of any use to help the pressure
situation.  It makes no sense to ignore references from a sibling memcg
and then evict a page that will be immediately refaulted by that sibling
which contributes to the same usage of the common ancestor under
reclaim.

The solution: make the rmap walker ignore references from VMAs that are
not part of the hierarchy that is being reclaimed.

Flat limit reclaim will stay the same, hierarchical limit reclaim will
mind the references only to pages that the hierarchy owns.  Global
reclaim, since it reclaims from all memcgs, will be fixed to regard all
references.

[akpm@linux-foundation.org: name the args in the declaration]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:20 -07:00
Andrew Morton
0ce72d4f73 mm: do_migrate_pages(): rename arguments
s/from_nodes/from and s/to_nodes/to/.  The "_nodes" is redundant - it
duplicates the argument's type.

Done in a fit of irritation over 80-col issues :(

Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <mkosaki@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:20 -07:00
Mel Gorman
23b9da55c5 mm: vmscan: remove reclaim_mode_t
There is little motiviation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed.  This patch gets rid of
reclaim_mode_t as well and improves the documentation about what
reclaim/compaction is and when it is triggered.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:19 -07:00
Mel Gorman
41ac1999c3 mm: vmscan: do not stall on writeback during memory compaction
This patch stops reclaim/compaction entering sync reclaim as this was
only intended for lumpy reclaim and an oversight.  Page migration has
its own logic for stalling on writeback pages if necessary and memory
compaction is already using it.

Waiting on page writeback is bad for a number of reasons but the primary
one is that waiting on writeback to a slow device like USB can take a
considerable length of time.  Page reclaim instead uses
wait_iff_congested() to throttle if too many dirty pages are being
scanned.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:19 -07:00
Mel Gorman
c53919adc0 mm: vmscan: remove lumpy reclaim
This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction.  The end result is that
stalling on dirty pages during page reclaim now depends on
wait_iff_congested().

Four kernels were compared

  3.3.0     vanilla
  3.4.0-rc2 vanilla
  3.4.0-rc2 lumpyremove-v2 is patch one from this series
  3.4.0-rc2 nosync-v2r3 is the full series

Removing lumpy reclaim saves almost 900 bytes of text whereas the full
series removes 1200 bytes.

     text     data      bss       dec     hex  filename
  6740375  1927944  2260992  10929311  a6c49f  vmlinux-3.4.0-rc2-vanilla
  6739479  1927944  2260992  10928415  a6c11f  vmlinux-3.4.0-rc2-lumpyremove-v2
  6739159  1927944  2260992  10928095  a6bfdf  vmlinux-3.4.0-rc2-nosync-v2

There are behaviour changes in the series and so tests were run with
monitoring of ftrace events.  This disrupts results so the performance
results are distorted but the new behaviour should be clearer.

fs-mark running in a threaded configuration showed little of interest as
it did not push reclaim aggressively

  FS-Mark Multi Threaded
                          3.3.0-vanilla       rc2-vanilla       lumpyremove-v2r3       nosync-v2r3
  Files/s  min           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
  Files/s  mean          3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
  Files/s  stddev        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)
  Files/s  max           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
  Overhead min      508667.00 ( 0.00%)   521350.00 (-2.49%)   544292.00 (-7.00%)   547168.00 (-7.57%)
  Overhead mean     551185.00 ( 0.00%)   652690.73 (-18.42%)   991208.40 (-79.83%)   570130.53 (-3.44%)
  Overhead stddev    18200.69 ( 0.00%)   331958.29 (-1723.88%)  1579579.43 (-8578.68%)     9576.81 (47.38%)
  Overhead max      576775.00 ( 0.00%)  1846634.00 (-220.17%)  6901055.00 (-1096.49%)   585675.00 (-1.54%)
  MMTests Statistics: duration
  Sys Time Running Test (seconds)             309.90    300.95    307.33    298.95
  User+Sys Time Running Test (seconds)        319.32    309.67    315.69    307.51
  Total Elapsed Time (seconds)               1187.85   1193.09   1191.98   1193.73

  MMTests Statistics: vmstat
  Page Ins                                       80532       82212       81420       79480
  Page Outs                                  111434984   111456240   111437376   111582628
  Swap Ins                                           0           0           0           0
  Swap Outs                                          0           0           0           0
  Direct pages scanned                           44881       27889       27453       34843
  Kswapd pages scanned                        25841428    25860774    25861233    25843212
  Kswapd pages reclaimed                      25841393    25860741    25861199    25843179
  Direct pages reclaimed                         44881       27889       27453       34843
  Kswapd efficiency                                99%         99%         99%         99%
  Kswapd velocity                            21754.791   21675.460   21696.029   21649.127
  Direct efficiency                               100%        100%        100%        100%
  Direct velocity                               37.783      23.375      23.031      29.188
  Percentage direct scans                           0%          0%          0%          0%

ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.

postmark was similar and while it was more interesting, it also did not
push reclaim heavily.

  POSTMARK
                                       3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
  Transactions per second:               16.00 ( 0.00%)    20.00 (25.00%)    18.00 (12.50%)    17.00 ( 6.25%)
  Data megabytes read per second:        18.80 ( 0.00%)    24.27 (29.10%)    22.26 (18.40%)    20.54 ( 9.26%)
  Data megabytes written per second:     35.83 ( 0.00%)    46.25 (29.08%)    42.42 (18.39%)    39.14 ( 9.24%)
  Files created alone per second:        28.00 ( 0.00%)    38.00 (35.71%)    34.00 (21.43%)    30.00 ( 7.14%)
  Files create/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)
  Files deleted alone per second:       556.00 ( 0.00%)  1224.00 (120.14%)  3062.00 (450.72%)  6124.00 (1001.44%)
  Files delete/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)

  MMTests Statistics: duration
  Sys Time Running Test (seconds)             113.34    107.99    109.73    108.72
  User+Sys Time Running Test (seconds)        145.51    139.81    143.32    143.55
  Total Elapsed Time (seconds)               1159.16    899.23    980.17   1062.27

  MMTests Statistics: vmstat
  Page Ins                                    13710192    13729032    13727944    13760136
  Page Outs                                   43071140    42987228    42733684    42931624
  Swap Ins                                           0           0           0           0
  Swap Outs                                          0           0           0           0
  Direct pages scanned                               0           0           0           0
  Kswapd pages scanned                         9941613     9937443     9939085     9929154
  Kswapd pages reclaimed                       9940926     9936751     9938397     9928465
  Direct pages reclaimed                             0           0           0           0
  Kswapd efficiency                                99%         99%         99%         99%
  Kswapd velocity                             8576.567   11051.058   10140.164    9347.109
  Direct efficiency                               100%        100%        100%        100%
  Direct velocity                                0.000       0.000       0.000       0.000

It looks like here that the full series regresses performance but as
ftrace showed no usage of wait_iff_congested() or sync reclaim I am
assuming it's a disruption due to monitoring.  Other data such as memory
usage, page IO, swap IO all looked similar.

Running a benchmark with a plain DD showed nothing very interesting.
The full series stalled in wait_iff_congested() slightly less but stall
times on vanilla kernels were marginal.

Running a benchmark that hammered on file-backed mappings showed stalls
due to congestion but not in sync writebacks

  MICRO
                                       3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
  MMTests Statistics: duration
  Sys Time Running Test (seconds)             308.13    294.50    298.75    299.53
  User+Sys Time Running Test (seconds)        330.45    316.28    318.93    320.79
  Total Elapsed Time (seconds)               1814.90   1833.88   1821.14   1832.91

  MMTests Statistics: vmstat
  Page Ins                                      108712      120708       97224      110344
  Page Outs                                  155514576   156017404   155813676   156193256
  Swap Ins                                           0           0           0           0
  Swap Outs                                          0           0           0           0
  Direct pages scanned                         2599253     1550480     2512822     2414760
  Kswapd pages scanned                        69742364    71150694    68839041    69692533
  Kswapd pages reclaimed                      34824488    34773341    34796602    34799396
  Direct pages reclaimed                         53693       94750       61792       75205
  Kswapd efficiency                                49%         48%         50%         49%
  Kswapd velocity                            38427.662   38797.901   37799.972   38022.889
  Direct efficiency                                 2%          6%          2%          3%
  Direct velocity                             1432.174     845.464    1379.807    1317.446
  Percentage direct scans                           3%          2%          3%          3%
  Page writes by reclaim                             0           0           0           0
  Page writes file                                   0           0           0           0
  Page writes anon                                   0           0           0           0
  Page reclaim immediate                             0           0           0        1218
  Page rescued immediate                             0           0           0           0
  Slabs scanned                                  15360       16384       13312       16384
  Direct inode steals                                0           0           0           0
  Kswapd inode steals                             4340        4327        1630        4323

  FTrace Reclaim Statistics: congestion_wait
  Direct number congest     waited                 0          0          0          0
  Direct time   congest     waited               0ms        0ms        0ms        0ms
  Direct full   congest     waited                 0          0          0          0
  Direct number conditional waited               900        870        754        789
  Direct time   conditional waited               0ms        0ms        0ms       20ms
  Direct full   conditional waited                 0          0          0          0
  KSwapd number congest     waited              2106       2308       2116       1915
  KSwapd time   congest     waited          139924ms   157832ms   125652ms   132516ms
  KSwapd full   congest     waited              1346       1530       1202       1278
  KSwapd number conditional waited             12922      16320      10943      14670
  KSwapd time   conditional waited               0ms        0ms        0ms        0ms
  KSwapd full   conditional waited                 0          0          0          0

Reclaim statistics are not radically changed.  The stall times in kswapd
are massive but it is clear that it is due to calls to congestion_wait()
and that is almost certainly the call in balance_pgdat().  Otherwise
stalls due to dirty pages are non-existant.

I ran a benchmark that stressed high-order allocation.  This is very
artifical load but was used in the past to evaluate lumpy reclaim and
compaction.  Generally I look at allocation success rates and latency
figures.

  STRESS-HIGHALLOC
                   3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
  Pass 1          81.00 ( 0.00%)    28.00 (-53.00%)    24.00 (-57.00%)    28.00 (-53.00%)
  Pass 2          82.00 ( 0.00%)    39.00 (-43.00%)    38.00 (-44.00%)    43.00 (-39.00%)
  while Rested    88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)

  MMTests Statistics: duration
  Sys Time Running Test (seconds)             740.93    681.42    685.14    684.87
  User+Sys Time Running Test (seconds)       2922.65   3269.52   3281.35   3279.44
  Total Elapsed Time (seconds)               1161.73   1152.49   1159.55   1161.44

  MMTests Statistics: vmstat
  Page Ins                                     4486020     2807256     2855944     2876244
  Page Outs                                    7261600     7973688     7975320     7986120
  Swap Ins                                       31694           0           0           0
  Swap Outs                                      98179           0           0           0
  Direct pages scanned                           53494       57731       34406      113015
  Kswapd pages scanned                         6271173     1287481     1278174     1219095
  Kswapd pages reclaimed                       2029240     1281025     1260708     1201583
  Direct pages reclaimed                          1468       14564       16649       92456
  Kswapd efficiency                                32%         99%         98%         98%
  Kswapd velocity                             5398.133    1117.130    1102.302    1049.641
  Direct efficiency                                 2%         25%         48%         81%
  Direct velocity                               46.047      50.092      29.672      97.306
  Percentage direct scans                           0%          4%          2%          8%
  Page writes by reclaim                       1616049           0           0           0
  Page writes file                             1517870           0           0           0
  Page writes anon                               98179           0           0           0
  Page reclaim immediate                        103778       27339        9796       17831
  Page rescued immediate                             0           0           0           0
  Slabs scanned                                1096704      986112      980992      998400
  Direct inode steals                              223      215040      216736      247881
  Kswapd inode steals                           175331       61548       68444       63066
  Kswapd skipped wait                            21991           0           1           0
  THP fault alloc                                    1         135         125         134
  THP collapse alloc                               393         311         228         236
  THP splits                                        25          13           7           8
  THP fault fallback                                 0           0           0           0
  THP collapse fail                                  3           5           7           7
  Compaction stalls                                865        1270        1422        1518
  Compaction success                               370         401         353         383
  Compaction failures                              495         869        1069        1135
  Compaction pages moved                        870155     3828868     4036106     4423626
  Compaction move failure                        26429       23865       29742       27514

Success rates are completely hosed for 3.4-rc2 which is almost certainly
due to commit fe2c2a1066 ("vmscan: reclaim at order 0 when compaction
is enabled").  I expected this would happen for kswapd and impair
allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did
not anticipate this much a difference: 80% less scanning, 37% less
reclaim by kswapd

In comparison, reclaim/compaction is not aggressive and gives up easily
which is the intended behaviour.  hugetlbfs uses __GFP_REPEAT and would
be much more aggressive about reclaim/compaction than THP allocations
are.  The stress test above is allocating like neither THP or hugetlbfs
but is much closer to THP.

Mainline is now impaired in terms of high order allocation under heavy
load although I do not know to what degree as I did not test with
__GFP_REPEAT.  Keep this in mind for bugs related to hugepage pool
resizing, THP allocation and high order atomic allocation failures from
network devices.

In terms of congestion throttling, I see the following for this test

  FTrace Reclaim Statistics: congestion_wait
  Direct number congest     waited                 3          0          0          0
  Direct time   congest     waited               0ms        0ms        0ms        0ms
  Direct full   congest     waited                 0          0          0          0
  Direct number conditional waited               957        512       1081       1075
  Direct time   conditional waited               0ms        0ms        0ms        0ms
  Direct full   conditional waited                 0          0          0          0
  KSwapd number congest     waited                36          4          3          5
  KSwapd time   congest     waited            3148ms      400ms      300ms      500ms
  KSwapd full   congest     waited                30          4          3          5
  KSwapd number conditional waited             88514        197        332        542
  KSwapd time   conditional waited            4980ms        0ms        0ms        0ms
  KSwapd full   conditional waited                49          0          0          0

The "conditional waited" times are the most interesting as this is
directly impacted by the number of dirty pages encountered during scan.
As lumpy reclaim is no longer scanning contiguous ranges, it is finding
fewer dirty pages.  This brings wait times from about 5 seconds to 0.
kswapd itself is still calling congestion_wait() so it'll still stall but
it's a lot less.

In terms of the type of IO we were doing, I see this

  FTrace Reclaim Statistics: mm_vmscan_writepage
  Direct writes anon  sync                         0          0          0          0
  Direct writes anon  async                        0          0          0          0
  Direct writes file  sync                         0          0          0          0
  Direct writes file  async                        0          0          0          0
  Direct writes mixed sync                         0          0          0          0
  Direct writes mixed async                        0          0          0          0
  KSwapd writes anon  sync                         0          0          0          0
  KSwapd writes anon  async                    91682          0          0          0
  KSwapd writes file  sync                         0          0          0          0
  KSwapd writes file  async                   822629          0          0          0
  KSwapd writes mixed sync                         0          0          0          0
  KSwapd writes mixed async                        0          0          0          0

In 3.2, kswapd was doing a bunch of async writes of pages but
reclaim/compaction was never reaching a point where it was doing sync
IO.  This does not guarantee that reclaim/compaction was not calling
wait_on_page_writeback() but I would consider it unlikely.  It indicates
that merging patches 2 and 3 to stop reclaim/compaction calling
wait_on_page_writeback() should be safe.

This patch:

Lumpy reclaim had a purpose but in the mind of some, it was to kick the
system so hard it trashed.  For others the purpose was to complicate
vmscan.c.  Over time it was giving softer shoes and a nicer attitude but
memory compaction needs to step up and replace it so this patch sends
lumpy reclaim to the farm.

The tracepoint format changes for isolating LRU pages with this patch
applied.  Furthermore reclaim/compaction can no longer queue dirty pages
in pageout() if the underlying BDI is congested.  Lumpy reclaim used
this logic and reclaim/compaction was using it in error.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:19 -07:00
Rik van Riel
e709ffd616 mm: remove swap token code
The swap token code no longer fits in with the current VM model.  It
does not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.

It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and
inactive anon LRU lists.

Last but not least, the swap token code has been broken for a year
without complaints, as reported by Konstantin Khlebnikov.  This suggests
we no longer have much use for it.

The days of sub-1G memory systems with heavy use of swap are over.  If
we ever need thrashing reducing code in the future, we will have to
implement something that does scale.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Bob Picco <bpicco@meloft.net>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:19 -07:00
Paul Gortmaker
af2e840971 pagemap.h: fix warning about possibly used before init var
Commit f56f821feb ("mm: extend prefault helpers to fault in more than
PAGE_SIZE") added in the new functions: fault_in_multipages_writeable()
and fault_in_multipages_readable().

However, we currently see:

  include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function
  include/linux/pagemap.h:492: note: 'ret' was declared here

Unlike a lot of gcc nags, this one appears somewhat legit.  i.e.  passing
in an invalid negative value of "size" does make it look like all the
conditionals in there would be bypassed and the uninitialized value would
be returned.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:18 -07:00
Linus Torvalds
4b78147468 MFD changes for 3.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2QKAAoJEIqAPN1PVmxKfv8P/2L5d38tc3+9wYuGI1l+k7Mz
 xQt2PdAx/kHQGTjLE1DSoeOD6dn4aodFbPaTcsLsU9Eo4IiJnT68b8adr/bqYHKU
 Cod6NSPJMaBxLBJZxXsA7nY69Z6O5SMjXxEQsiDc24gaP0jjwaeY35KJSfMug8nF
 DA6rvEpchkF8QXzBmkO2t2/uPYr1YWqDZQkauLDnLRG01JnGXFz5ajv9N5pYhiFt
 QyYtheg8yEnfwnQ6AlmRtGK75jZRVmrj0kOzRjE9UL7ZwtzswWJes+RE3tlgk89m
 JQ7KASRmmqLpvcVJ9fG9SlGX//yBO6OEp5Km06RTxgmt0XftBDVqBTjk1EG2tfMR
 SR0NIz6gJ0twKAe6U0d+5HMYalOU45H5ha9e3vCqZ8vl9JfmM95RS+TmWbGcRIqj
 04Y5x3I4zq6e9D0u+211BeuRfzkQiefwWJmdPpn0oac3u5LeYbRj/aQ85fqwJWzG
 f99D9VU5xgfFHPAtL3SLFiwgd9yOiMBar6eeIva+okDyOW3KaEUzs8Y4dgDyvYcg
 IU//JGK51vLVmI5kXtGCwYkgRLF/Y7WKZ8TwypT+SY6iv6tPQVvApOZljq7RC9GI
 mXx2z2slA90jlg3TlEFZfxr1WqbZ3TCbonU1riLeMEtkiXUpLtmKC8gbhOqfGvvn
 Nzgt+YqRJXafZdELb/S+
 =Rh0r
 -----END PGP SIGNATURE-----

Merge tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

Pull MFD changes from Samuel Ortiz:
 "Besides the usual cleanups, this one brings:

   * Support for 5 new chipsets: Intel's ICH LPC and SCH Centerton,
     ST-E's STAX211, Samsung's MAX77693 and TI's LM3533.

   * Device tree support for the twl6040, tps65910, da9502 and ab8500
     drivers.

   * Fairly big tps56910, ab8500 and db8500 updates.

   * i2c support for mc13xxx.

   * Our regular update for the wm8xxx driver from Mark."

Fix up various conflicts with other trees, largely due to ab5500 removal
etc.

* tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (106 commits)
  mfd: Fix build break of max77693 by adding REGMAP_I2C option
  mfd: Fix twl6040 build failure
  mfd: Fix max77693 build failure
  mfd: ab8500-core should depend on MFD_DB8500_PRCMU
  gpio: tps65910: dt: process gpio specific device node info
  mfd: Remove the parsing of dt info for tps65910 gpio
  mfd: Save device node parsed platform data for tps65910 sub devices
  mfd: Add r_select to lm3533 platform data
  gpio: Add Intel Centerton support to gpio-sch
  mfd: Emulate active low IRQs as well as active high IRQs for wm831x
  mfd: Mark two lm3533 zone registers as volatile
  mfd: Fix return type of lm533 attribute is_visible
  mfd: Enable Device Tree support in the ab8500-pwm driver
  mfd: Enable Device Tree support in the ab8500-sysctrl driver
  mfd: Add support for Device Tree to twl6040
  mfd: Register the twl6040 child for the ASoC codec unconditionally
  mfd: Allocate twl6040 IRQ numbers dynamically
  mfd: twl6040 code cleanup in interrupt initialization part
  mfd: Enable ab8500-gpadc driver for Device Tree
  mfd: Prevent unassigned pointer from being used in ab8500-gpadc driver
  ...
2012-05-29 11:53:11 -07:00
Linus Torvalds
53f2c4a8fd NFS client updates for Linux 3.5
New features include:
 - Rewrite the O_DIRECT code so that it can share the same coalescing and
   pNFS functionality as the page cache code.
 - Allow the server to provide hints as to when we should use pNFS, and
   when it is more efficient to read and write through the metadata
   server.
 - NFS cache consistency updates:
   - Use the ctime to emulate a change attribute for NFSv2/v3 so that
     all NFS versions can share the same cache management code.
   - New cache management code will only look at the change attribute
     and size attribute when deciding whether or not our cached data
     is still valid or not.
   - Don't request NFSv4 post-op attributes on writes in cases such as
     O_DIRECT, where we don't care about data cache consistency, or
     when we have a write delegation, and know that our cache is
     still consistent.
   - Don't request NFSv4 post-op attributes on operations such as
     COMMIT, where there are no expected metadata updates.
   - Don't request NFSv4 directory post-op attributes in cases where
     the operations themselves already return change attribute updates:
     i.e.  operations such as OPEN, CREATE, REMOVE, LINK and RENAME.
 - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
   if we detect no attempts to lookup filenames.
 - Improve the code sharing between NFSv2/v3 and v4 mounts
 - NFSv4.1 state management efficiency improvements
 - More patches in preparation for NFSv4/v4.1 migration functionality.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw/MNAAoJEGcL54qWCgDyxU8P/2kKqhAlhoLEArBqo9FT3/OK
 YrNs5uO/erTgnCG8L0XQvTKjHB9F7TAeFXqTmBZuPlb1afRpHHt2vzPqzIvUCeOC
 ZXm8vzZf4nxWZgEFoTDdUBvqQi9lLdIzCRhSaVCKcRnNwiuaKDd/iwykbWGcHqmv
 jtR4lzXPllJdKCUL3yb3juVrpq6Vvn254ID2pqdnYcEtIJIHgaRZpwdp4Iz9+8b5
 Moishiw2rgCBJIhf+VCYd8B2oYfMgSDPxG1o3etkwY46qo+4s+CIls9Vu/6YzGXK
 3+NdLatRDqKhQpLm0/R+dI3rntnTZ8x6LgWnTGxUsiqb6pAaHZPK284rf2eh/s7M
 Q4G4203r0uw539kIt6eKOGqC9c8kZAPCHlQSPCaImZyCJsz+6OMShNlGB5bZpFPr
 tbdxaxudrhCF7UVKXicJCWgv2nIHtek6fNwey1jqFoYgZP5ipiBKymvXQC5WAMBw
 7RHJor/JEC+UJkVg/7Mkpg0UNw3E36CTYLeRJKlNCS6YO9NJQseCDxhhMNAy/ab7
 RGO8DVMkUsOUH20S+a19LyeFQtveWFIE0DiDqRn0KnNGhGwHrv2t4xFukjlrf4Sw
 8FQUBRdtFxfmspfA1IdoTY49XZQda5eagvTy1MyaWEh+jPSJ4G5j3sSjFiaKAJqw
 79iQKFGkxPOSHx2yCdAF
 =suVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "New features include:
   - Rewrite the O_DIRECT code so that it can share the same coalescing
     and pNFS functionality as the page cache code.
   - Allow the server to provide hints as to when we should use pNFS,
     and when it is more efficient to read and write through the
     metadata server.
   - NFS cache consistency updates:
     * Use the ctime to emulate a change attribute for NFSv2/v3 so that
       all NFS versions can share the same cache management code.
     * New cache management code will only look at the change attribute
       and size attribute when deciding whether or not our cached data
       is still valid or not.
     * Don't request NFSv4 post-op attributes on writes in cases such as
       O_DIRECT, where we don't care about data cache consistency, or
       when we have a write delegation, and know that our cache is still
       consistent.
     * Don't request NFSv4 post-op attributes on operations such as
       COMMIT, where there are no expected metadata updates.
     * Don't request NFSv4 directory post-op attributes in cases where
       the operations themselves already return change attribute
       updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and
       RENAME.
   - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
     if we detect no attempts to lookup filenames.
   - Improve the code sharing between NFSv2/v3 and v4 mounts
   - NFSv4.1 state management efficiency improvements
   - More patches in preparation for NFSv4/v4.1 migration functionality."

Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache
qstr name initialization changes (that made the length/hash a 64-bit
union)

* tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits)
  NFSv4: Add debugging printks to state manager
  NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
  NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
  NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
  NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
  NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
  NFSv4.1: Handle errors in nfs4_bind_conn_to_session
  NFSv4.1: nfs4_bind_conn_to_session should drain the session
  NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
  NFSv4.1: Add DESTROY_CLIENTID
  NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
  NFSv4.1: Ensure we use the correct credentials for session create/destroy
  NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
  NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
  NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
  NFSv4: Clean up the error handling for nfs4_reclaim_lease
  NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
  nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
  nfs4.1: add BIND_CONN_TO_SESSION operation
  NFSv4.1 test the mdsthreshold hint parameters
  ...
2012-05-29 10:43:51 -07:00
Linus Torvalds
90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Linus Torvalds
1e2aec873a Merge branch 'generic-string-functions'
This makes <asm/word-at-a-time.h> actually live up to its promise of
allowing architectures to help tune the string functions that do their
work a word at a time.

David had already taken the x86 strncpy_from_user() function, modified
it to work on sparc, and then done the extra work to make it generically
useful.  This then expands on that work by making x86 use that generic
version, completing the circle.

But more importantly, it fixes up the word-at-a-time interfaces so that
it's now easy to also support things like strnlen_user(), and pretty
much most random string functions.

David reports that it all works fine on sparc, and Jonas Bonn reported
that an earlier version of this worked on OpenRISC too.  It's pretty
easy for architectures to add support for this and just replace their
private versions with the generic code.

* generic-string-functions:
  sparc: use the new generic strnlen_user() function
  x86: use the new generic strnlen_user() function
  lib: add generic strnlen_user() function
  word-at-a-time: make the interfaces truly generic
  x86: use generic strncpy_from_user routine
2012-05-26 16:57:16 -07:00
Linus Torvalds
ae32adc1e0 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux
Pull i2c-embedded changes from Wolfram Sang:
 "Major changes:

   - lots of devicetree additions for existing drivers.  I tried hard to
     make sure the bindings are proper.  In more complicated cases, I
     requested acks from people having more experience with them than
     me.  That took a bit of extra time and also some time went into
     discussions with developers about what bindings are and what not.
     I have the feeling that the workflow with bindings should be
     improved to scale better.  I will spend some more thought on
     this...

   - i2c-muxes are succesfully used meanwhile, so we dropped
     EXPERIMENTAL for them and renamed the drivers to a standard pattern
     to match the rest of the subsystem.  They can also be used with
     devicetree now.

   - ixp2000 was removed since the whole platform goes away.

   - cleanups (strlcpy instead of strcpy, NULL instead of 0)

   - The rest is typical driver fixes I assume.

  All patches have been in linux-next at least since v3.4-rc6."

Fixed up trivial conflict in arch/arm/mach-lpc32xx/common.c due to the
same patch already having come in through the arm/soc trees, with
additional patches on top of it.

* 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (35 commits)
  i2c: davinci: Free requested IRQ in remove
  i2c: ocores: register OF i2c devices
  i2c: tegra: notify transfer-complete after clearing status.
  I2C: xiic: Add OF binding support
  i2c: Rename last mux driver to standard pattern
  i2c: tegra: fix 10bit address configuration
  i2c: muxes: rename first set of drivers to a standard pattern
  of/i2c: implement of_find_i2c_adapter_by_node
  i2c: implement i2c_verify_adapter
  i2c-s3c2410: Add HDMIPHY quirk for S3C2440
  i2c-s3c2410: Rework device type handling
  i2c: muxes are not EXPERIMENTAL anymore
  i2c/of: Automatically populate i2c mux busses from device tree data.
  i2c: Add a struct device * parameter to i2c_add_mux_adapter()
  of/i2c: call i2c_verify_client from of_find_i2c_device_by_node
  i2c: designware: Add clk_{un}prepare() support
  i2c: designware: add PM support
  i2c: ixp2000: remove driver
  i2c: pnx: add device tree support
  i2c: imx: don't use strcpy but strlcpy
  ...
2012-05-26 13:35:03 -07:00
Linus Torvalds
84a442b9a1 arm-soc: device tree conversions, part 2
These continue the device tree work from part 1, this set is for the
 tegra, mxs and imx platforms, all of which have dependencies on clock
 or pinctrl changes submitted earlier.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPuex7AAoJEIwa5zzehBx3xsQP/jkyt74MvuKUi8pi2zkeMIgn
 4XieyqcA0KZjJzfB22q3GIZjNIf/mEIGE4E/3bneVMPh/E2zaiohaXFExBmjNjml
 hhzWeZlFGPBjrZsfpIXJIIUhwSI7gX2rjYh4npJmdNhZmy8Y89XnpNJhN1kOwMuV
 oN23hPWoSVGbyDMQ0fmHx9GyOL8m7yap+joG13aljDa2OKpQg+pYvdwft+k1K9di
 8yPF+qA043UUR7dSsjmTbiCcjZy2eySdCmfOAkEG4inSgxNoM7GBs3MuwZo/veCD
 v5WssJqWDbLXtqKn5Uo2bvGWiEcf0xtwOAqhSpbaup3dQFJSWMEenBTtA9UlxFhk
 6gdY62O+7k6N0thkxXyLNGkgaGzexZAsK7dM6XSDB+PqD+OSNJS7dvmxZM8tuaRn
 rvCM1XWcNeN/dpnLbgwCR12efkwWtJoqqUZUUp/tFFaTo8HriqeAIYk7obnR8s9n
 S5x9LeueQGNgaxXJzVdh481YKG/1lqjG/a06HbVgYS4XQvtdA+4khalOefJv10tm
 Nkg8+4/8pMthWJfhhlfPUgWFXOXFF2AGPG4su2XwKuFXypO8599lzi7gUQaEZu2U
 7caqoWP69KsKvK5iAAmA4DQ2rcsgHd44NXx/8Jjes9ma8knlYjrf42dBH6AZMQBG
 69I9sJ1cyqusBwx72NPN
 =WeDQ
 -----END PGP SIGNATURE-----

Merge tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc device tree conversions (part 2) from Olof Johansson:
 "These continue the device tree work from part 1, this set is for the
  tegra, mxs and imx platforms, all of which have dependencies on clock
  or pinctrl changes submitted earlier."

Fix up trivial conflicts due to nearby changes in
drivers/{gpio/gpio,i2c/busses/i2c}-mxs.c

* tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (73 commits)
  ARM: dt: tegra: invert status=disable vs status=okay
  ARM: dt: tegra: consistent basic property ordering
  ARM: dt: tegra: sort nodes based on bus order
  ARM: dt: tegra: remove duplicate device_type property
  ARM: dt: tegra: consistenly use lower-case for hex constants
  ARM: dt: tegra: format regs properties consistently
  ARM: dt: tegra: gpio comment cleanup
  ARM: dt: tegra: remove unnecessary unit addresses
  ARM: dt: tegra: whitespace cleanup
  ARM: dt: tegra cardhu: fix typo in SDHCI node name
  ARM: dt: tegra: cardhu: register core regulator tps62361
  ARM: dt: tegra30.dtsi: Add SMMU node
  ARM: dt: tegra20.dtsi: Add GART node
  ARM: dt: tegra30.dtsi: Add Memory Controller(MC) nodes
  ARM: dt: tegra20.dtsi: Add Memory Controller(MC) nodes
  ARM: dt: tegra: Add device tree support for AHB
  ARM: dts: enable audio support for imx28-evk
  ARM: dts: enable i2c device for imx28-evk
  i2c: mxs: add device tree probe support
  ARM: dts: enable mmc for imx28-evk
  ...
2012-05-26 12:57:47 -07:00
Linus Torvalds
39b6cc668c arm-soc: add stmp-dev library code
A number of devices are using a common register layout, this adds support
 code for it in lib/stmp_device.c so we do not need to duplicate it in
 each driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPuexlAAoJEIwa5zzehBx31P0P/jK7GC5Ln2gr/bV+4Kt9fStS
 VcGI/ARsyQtwaNTJQfPkg8Weg3DhbPRlUWeimVKMFo3uEle3VjnPBjdcMPUKtW3x
 SPka/W591LGEdKQRmXZrISm2OiQXVvM2zkhSJV89n/tJBdHd+tDWDDq4Y784F8Cj
 hWmcIi66G4RBPj5pplf80UhNAEg5HoZHQnlgrS1iLMpBTwXAesv7zyZpvnsMzdpg
 qSJTfcifgLULtM0WFbooNGojBn8ftuA67psrw78vgV2bz7bVBioZHYFyqPWK9Gr0
 vtiKuyXqiDA65mueXA+E5RXXLCLQSyGdV8y0xiSYjilRVkziPcMKnQT07keb8SJN
 CCDpetjEULiQpgKvVWc7sDGlb5ePd/C5rs31S0fFOKjeRJNlfG5+OuqZPiobO7hk
 F2Fx3gq4LPLel7gwjK3T4XTmmL9kNt/y1sIfXx5WybJL8N5n6TdZIfWm6yOZYwfX
 jvG/CnvVvhgdWk/ebaTEOG1MaeNAY3uwGpSBuEEoXUDHatQdOYAsgLfJJv/H4zKp
 2AY9qvXTDtFYys/hs2WhwmS7s1WFlIrA+voEPBDa3WT2qGup8HAL/C9kL3ms2zqk
 8JL/yQ/IJpTHPb4bCGo9C08qdi1YtMbylHB0/ELvG1BNoHOnCDV3wZlVG3ZTQQb5
 c/Lb2H8crk5HVbpCPLQU
 =VHLM
 -----END PGP SIGNATURE-----

Merge tag 'stmp-dev' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc stmp-dev library code from Olof Johansson:
 "A number of devices are using a common register layout, this adds
  support code for it in lib/stmp_device.c so we do not need to
  duplicate it in each driver."

Fix up trivial conflicts in drivers/i2c/busses/i2c-mxs.c and
lib/Makefile

* tag 'stmp-dev' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  i2c: mxs: use global reset function
  lib: add support for stmp-style devices
2012-05-26 12:50:04 -07:00
Linus Torvalds
2795343705 arm-soc: clock driver changes
The new clock subsystem was merged in linux-3.4 without any users, this
 now moves the first three platforms over to it: imx, mxs and spear.
 
 The series also contains the changes for the clock subsystem itself,
 since Mike preferred to have it together with the platforms that require
 these changes, in order to avoid interdependencies and conflicts.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPuexPAAoJEIwa5zzehBx3YBsP/0nFhXjb5t1PdLfFzGKtcZVB
 j4zXWXMHQ1fA7wIfEpZF3Nnco6MQkufF5wJPoPdn1+wmkzCn3D6IwNVWVtW4U5i9
 VGyShSbgusAAYXUe/9yYj8eN+bbRQSvdN4eWYWU6+rRXShGZ5dZZmp+IPNl54dnW
 6F8uCnHX0cnIMCpGqV+41zZgZ/4wL2k9gdqu0LO6pi07o4tGd0Z4gcySgUFAnn1R
 kofNHueYIP4UgOg8DREoBzVKlpRqMou3S2kSZUfMeb3Q9ryF7UIvaGqIILyi7PKL
 kWd3nptg0EPavfL21SwXHiGpnDpB/Gj/F70kcPLus5RYujB24C9bvBmc26z68NZx
 Sz9mbElkkIU5duZsl1nxBWJ8IZ/tSWdtmC2xQMznmV7gHyGgVwr4j47f4Uv5sBvM
 14JHDO7mqN6E6FnTFZu/oPAN5pDjgL+TVNK5BU6Wkq0zitrA6eyKDqCvBCqkO6Nn
 tNzOuyRDzMOwM7HzqXhxqtzJWXylO1Mldc4bM8X4Cocf4pnLna/X6uP6dgE6A+JY
 azVYx4I/0NdEPerDTzIcEhBDgZeBVROhUQr+kHxc4rf6WzUUbu/wEo1UKXWV66oW
 1jb1yAFFWqYjkQuQc2PD4JSx35sFJaoSaoneRtmzBzRDfzSr5KjKj1E0e1skyMFq
 7ZVLCqZD0cB9DhmMDkWP
 =rwFF
 -----END PGP SIGNATURE-----

Merge tag 'clock' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc clock driver changes from Olof Johansson:
 "The new clock subsystem was merged in linux-3.4 without any users,
  this now moves the first three platforms over to it: imx, mxs and
  spear.

  The series also contains the changes for the clock subsystem itself,
  since Mike preferred to have it together with the platforms that
  require these changes, in order to avoid interdependencies and
  conflicts."

Fix up trivial conflicts in arch/arm/mach-kirkwood/common.c (code
removed in one branch, added OF support in another) and
drivers/dma/imx-sdma.c (independent changes next to each other).

* tag 'clock' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (97 commits)
  clk: Fix CLK_SET_RATE_GATE flag validation in clk_set_rate().
  clk: Provide dummy clk_unregister()
  SPEAr: Update defconfigs
  SPEAr: Add SMI NOR partition info in dts files
  SPEAr: Switch to common clock framework
  SPEAr: Call clk_prepare() before calling clk_enable
  SPEAr: clk: Add General Purpose Timer Synthesizer clock
  SPEAr: clk: Add Fractional Synthesizer clock
  SPEAr: clk: Add Auxiliary Synthesizer clock
  SPEAr: clk: Add VCO-PLL Synthesizer clock
  SPEAr: Add DT bindings for SPEAr's timer
  ARM i.MX: remove now unused clock files
  ARM: i.MX6: implement clocks using common clock framework
  ARM i.MX35: implement clocks using common clock framework
  ARM i.MX5: implement clocks using common clock framework
  ARM: Kirkwood: Replace clock gating
  ARM: Orion: Audio: Add clk/clkdev support
  ARM: Orion: PCIE: Add support for clk
  ARM: Orion: XOR: Add support for clk
  ARM: Orion: CESA: Add support for clk
  ...
2012-05-26 12:42:29 -07:00
Linus Torvalds
36126f8f2e word-at-a-time: make the interfaces truly generic
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.

In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details.  For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.

NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian.  Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.

(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it.  And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)

The <asm/word-at-a-time.h> functions are as follows:

 - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
   uses.

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

   This is the "quick-and-dirty" zero tester: it's what is run inside
   the hot loops.

 - "prep_zero_mask()": take the word, the data that has_zero() produced,
   and the constant pool, and generate an *exact* mask of which byte had
   the first zero.  This is run directly *outside* the loop, and allows
   the "has_zero()" function to answer the "is there a zero byte"
   question without necessarily getting exactly *which* byte is the
   first one to contain a zero.

   If you do multiple byte lookups concurrently (eg "hash_name()", which
   looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
   phase, the result of those can be or'ed together to get the "either
   or" case.

 - The result from "prep_zero_mask()" can then be fed into "find_zero()"
   (to find the byte offset of the first byte that was zero) or into
   "zero_bytemask()" (to find the bytemask of the bytes preceding the
   zero byte).

   The existence of zero_bytemask() is optional, and is not necessary
   for the normal string routines.  But dentry name hashing needs it, so
   if you enable DENTRY_WORD_AT_A_TIME you need to expose it.

This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces.  This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-26 11:33:40 -07:00
Trond Myklebust
32b0131069 NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory
supposed to already know the correct value of the seqid, in which case
RFC5661 states that it should ignore the value returned.

Also ensure that if the sanity check in nfs4_check_cl_exchange_flags
fails, then we must not change the nfs_client fields.

Finally, clean up the code: we don't need to retest the value of
'status' unless it can change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:31 -04:00
Trond Myklebust
6624553910 NFSv4.1: Add DESTROY_CLIENTID
Ensure that we destroy our lease on last unmount

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:30 -04:00
Linus Torvalds
fa2af6e4fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile updates from Chris Metcalf:
 "These changes cover a range of new arch/tile features and
  optimizations.  They've been through LKML review and on linux-next for
  a month or so.  There's also one bug-fix that just missed 3.4, which
  I've marked for stable."

Fixed up trivial conflict in arch/tile/Kconfig (new added tile Kconfig
entries clashing with the generic timer/clockevents changes).

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: default to tilegx_defconfig for ARCH=tile
  tile: fix bug where fls(0) was not returning 0
  arch/tile: mark TILEGX as not EXPERIMENTAL
  tile/mm/fault.c: Port OOM changes to handle_page_fault
  arch/tile: add descriptive text if the kernel reports a bad trap
  arch/tile: allow querying cpu module information from the hypervisor
  arch/tile: fix hardwall for tilegx and generalize for idn and ipi
  arch/tile: support multiple huge page sizes dynamically
  mm: add new arch_make_huge_pte() method for tile support
  arch/tile: support kexec() for tilegx
  arch/tile: support <asm/cachectl.h> header for cacheflush() syscall
  arch/tile: Allow tilegx to build with either 16K or 64K page size
  arch/tile: optimize get_user/put_user and friends
  arch/tile: support building big-endian kernel
  arch/tile: allow building Linux with transparent huge pages enabled
  arch/tile: use interrupt critical sections less
2012-05-25 15:59:38 -07:00
Trond Myklebust
ad24ecfbcd NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
For backward compatibility with nfs-utils.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:09 -04:00
Chris Metcalf
d9ed9faac2 mm: add new arch_make_huge_pte() method for tile support
The tile support for multiple-size huge pages requires tagging
the hugetlb PTE with a "super" bit for PTEs that are multiples of
the basic size of a pagetable span.  To set that bit properly
we need to tweak the PTe in make_huge_pte() based on the vma.

This change provides the API for a subsequent tile-specific
change to use.

Reviewed-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:26 -04:00
Chris Metcalf
73636b1aac arch/tile: allow building Linux with transparent huge pages enabled
The change adds some infrastructure for managing tile pmd's more generally,
using pte_pmd() and pmd_pte() methods to translate pmd values to and
from ptes, since on TILEPro a pmd is really just a nested structure
holding a pgd (aka pte).  Several existing pmd methods are moved into
this framework, and a whole raft of additional pmd accessors are defined
that are used by the transparent hugepage framework.

The tile PTE now has a "client2" bit.  The bit is used to indicate a
transparent huge page is in the process of being split into subpages.

This change also fixes a generic bug where the return value of the
generic pmdp_splitting_flush() was incorrect.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:21 -04:00
Linus Torvalds
da89fb165e dma-buf updates for 3.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPvzQCAAoJEFErWKtxJpJdpNEIAI1sKDywvfuJK0Ik76ICj1Yt
 P//4/ZvROmT8w9u/Jw3BAG7K3u7NLtfht6RcrUFqMULjMUUQ/aymlY9uTbwFZ+so
 WCsVh5tHCULa1oUnAUv8fGMgvGoufD4ZqI/9qbuYLmBtUwPAatul51cEmQyWVvLa
 lJN8PzJ7whfYqNoXpR4SCp8eHY4iJ3DZFDhypdQfZbTgOTrzsoVIJnTdHUXsiRQQ
 E3gB2dRvyihzOD/UFac47af5wVUwtvo1N6NdQ5tJxOX9ZhVGdHaxAqF5FTlWpm6F
 uK100uqFHPbm/TZGtSrGD1ai8L7Hbl//LuzaODjLH9usCiYe6KzSSwf8Alg59Ws=
 =hsWu
 -----END PGP SIGNATURE-----

Merge tag 'tag-for-linus-3.5' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf

Pull dma-buf updates from Sumit Semwal:
 "Here's the first signed-tag pull request for dma-buf framework.  It
  includes the following key items:
   - mmap support
   - vmap support
   - related documentation updates

  These are needed by various drivers to allow mmap/vmap of dma-buf
  shared buffers.  Dave Airlie has some prime patches dependent on the
  vmap pull as well."

* tag 'tag-for-linus-3.5' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
  dma-buf: add initial vmap documentation
  dma-buf: minor documentation fixes.
  dma-buf: add vmap interface
  dma-buf: mmap support
2012-05-25 09:37:26 -07:00
Linus Torvalds
d5adf235ad Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine updates from Vinod Koul:
 "Nothing exciting this time, odd fixes in a bunch of drivers"

* 'next' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: at_hdmac: take maxburst from slave configuration
  dmaengine: at_hdmac: remove ATC_DEFAULT_CTRLA constant
  dmaengine: at_hdmac: remove some at_dma_slave comments
  dma: imx-sdma: make channel0 operations atomic
  dmaengine: Fixup dmaengine_prep_slave_single() to be actually useful
  dmaengine: Use dma_sg_len(sg) instead of sg->length
  dmaengine: Use sg_dma_address instead of sg_phys
  DMA: PL330: Remove duplicate header file inclusion
  dma: imx-sdma: keep the callbacks invoked in the tasklet
  dmaengine: dw_dma: add Device Tree probing capability
  dmaengine: dw_dmac: Add clk_{un}prepare() support
  dma/amba-pl08x: add support for the Nomadik variant
  dma/amba-pl08x: check for terminal count status only
2012-05-25 09:31:59 -07:00
Linus Torvalds
d484864dd9 Merge branch 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and ARM DMA-mapping updates from Marek Szyprowski:
 "These patches contain two major updates for DMA mapping subsystem
  (mainly for ARM architecture).  First one is Contiguous Memory
  Allocator (CMA) which makes it possible for device drivers to allocate
  big contiguous chunks of memory after the system has booted.

  The main difference from the similar frameworks is the fact that CMA
  allows to transparently reuse the memory region reserved for the big
  chunk allocation as a system memory, so no memory is wasted when no
  big chunk is allocated.  Once the alloc request is issued, the
  framework migrates system pages to create space for the required big
  chunk of physically contiguous memory.

  For more information one can refer to nice LWN articles:

   - 'A reworked contiguous memory allocator':
		http://lwn.net/Articles/447405/

   - 'CMA and ARM':
		http://lwn.net/Articles/450286/

   - 'A deep dive into CMA':
		http://lwn.net/Articles/486301/

   - and the following thread with the patches and links to all previous
     versions:
		https://lkml.org/lkml/2012/4/3/204

  The main client for this new framework is ARM DMA-mapping subsystem.

  The second part provides a complete redesign in ARM DMA-mapping
  subsystem.  The core implementation has been changed to use common
  struct dma_map_ops based infrastructure with the recent updates for
  new dma attributes merged in v3.4-rc2.  This allows to use more than
  one implementation of dma-mapping calls and change/select them on the
  struct device basis.  The first client of this new infractructure is
  dmabounce implementation which has been completely cut out of the
  core, common code.

  The last patch of this redesign update introduces a new, experimental
  implementation of dma-mapping calls on top of generic IOMMU framework.
  This lets ARM sub-platform to transparently use IOMMU for DMA-mapping
  calls if one provides required IOMMU hardware.

  For more information please refer to the following thread:
		http://www.spinics.net/lists/arm-kernel/msg175729.html

  The last patch merges changes from both updates and provides a
  resolution for the conflicts which cannot be avoided when patches have
  been applied on the same files (mainly arch/arm/mm/dma-mapping.c)."

Acked by Andrew Morton <akpm@linux-foundation.org>:
 "Yup, this one please.  It's had much work, plenty of review and I
  think even Russell is happy with it."

* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits)
  ARM: dma-mapping: use PMD size for section unmap
  cma: fix migration mode
  ARM: integrate CMA with DMA-mapping subsystem
  X86: integrate CMA with DMA-mapping subsystem
  drivers: add Contiguous Memory Allocator
  mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: Serialize access to min_free_kbytes
  mm: page_isolation: MIGRATE_CMA isolation functions added
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_alloc: change fallbacks array handling
  mm: page_alloc: introduce alloc_contig_range()
  mm: compaction: export some of the functions
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: introduce map_pages()
  mm: compaction: introduce isolate_migratepages_range()
  mm: page_alloc: remove trailing whitespace
  ARM: dma-mapping: add support for IOMMU mapper
  ARM: dma-mapping: use alloc, mmap, free from dma_ops
  ARM: dma-mapping: remove redundant code and do the cleanup
  ...

Conflicts:
	arch/x86/include/asm/dma-mapping.h
2012-05-25 09:18:59 -07:00
Linus Torvalds
92bf3d0941 MMC highlights for 3.5:
Drivers:
  - at91-mci: This driver will be replaced by atmel-mci in 3.7.
  - atmel-mci: Add support for old at91-mci hardware.
  - dw_mmc: Allow multiple controllers; this previously caused corruption.
  - imxmmc: Remove this driver, replaced by mxcmmc.
  - mmci: Add device tree support.
  - omap: Allow multiple controllers.
  - omap_hsmmc: Auto CMD12, DDR support.
  - tegra: Support SD 3.0 spec.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPvtlQAAoJEHNBYZ7TNxYMxr8P/0oKBszCCpvYQVH9F7WxWdcs
 haXcfvQBshKLTw8GxnJhNOuoMpDO565pLCxtL1NigFzA/nnvDrsGu03Rjy8vmeGT
 vtuda+T+OVOmHwGx868fHMMtp3OeV0cyE2I9WQe0R1M0IW5YFpOCS3zzuaXofMlA
 dYK6KJC0RlZnc/Usn4esQhXCBS3dH80ynMfORjNtHl2wDnuvp+k2DD2kB2SFmw0H
 raieXeqcfGTqK9UYWqYYDvFM1D1FZcaokyYIs4Ut8WQtnKWSCWyMEqy5JjC1xbIr
 YZs+08uIUEcGBnYAuuB6XDcmWfPInqTiStQ6lX6iO6msY8DNh9n2wjt5V30X9GWx
 fVT8a27qB7gCf7D/ACRbGq+sgRjCL/2de4UcuH/wZyFu585lohinUqpZM6ODz7wA
 7AyfPYixpegbYLrppESgzdWzDUFY5HFyhogI9JUemtI1etIvy/tH1n6+l0388Qmy
 vYYV91U3ommxw4XgyFI2psJrl5y+XveEdv4q0NuqD5nkVrTDQ3cmp/vXXMfIo7fe
 YliEQuOwV1DAUsvA+FgNKCTYDtHveKIacHVJtQqJcelUOn6Klxs0xgHvsAXgux8U
 ZDLmAPrrg7O+YmDGsHC0lU8joaXQmtfauqVGm+JSSoQv4iSNI+SqwGaSTlzRURd5
 12htXRTx0TTINOAs7vLH
 =rjPa
 -----END PGP SIGNATURE-----

Merge tag 'mmc-merge-for-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC changes from Chris Ball
 - at91-mci: This driver will be replaced by atmel-mci in 3.7.
 - atmel-mci: Add support for old at91-mci hardware.
 - dw_mmc: Allow multiple controllers; this previously caused
   corruption.
 - imxmmc: Remove this driver, replaced by mxcmmc.
 - mmci: Add device tree support.
 - omap: Allow multiple controllers.
 - omap_hsmmc: Auto CMD12, DDR support.
 - tegra: Support SD 3.0 spec.

Fix up the usual trivial conflicts in feature-removal-schedule.txt

* tag 'mmc-merge-for-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (38 commits)
  mmc: at91-mci: this driver is now deprecated
  mmc: omap_hsmmc: pass IRQF_ONESHOT to request_threaded_irq
  mmc: block: Allow disabling 512B sector size emulation
  mmc: atmel-mci: add debug logs
  mmc: atmel-mci: add support for version lower than v2xx
  mmc: atmel-mci: change the state machine for compatibility with old IP
  mmc: atmel-mci: the r/w proof capability lack was not well managed
  mmc: dw_mmc: Fixed sdio interrupt mask bit setting bug
  mmc: omap: convert to module_platform_driver
  mmc: omap: make it behave well as a module
  mmc: omap: convert to per instance workqueue
  mmc: core: Remove dead code
  mmc: card: Avoid null pointer dereference
  mmc: core: Prevent eMMC VCC supply to be cut from late init
  mmc: dw_mmc: make multiple instances of dw_mci_card_workqueue
  mmc: queue: remove redundant memsets
  mmc: queue: rename mmc_request function
  mmc: core: skip card initialization if power class selection fails
  mmc: core: fix the signaling 1.8V for HS200
  mmc: core: fix the decision of HS200/DDR card-type
  ...
2012-05-25 08:23:32 -07:00
Linus Torvalds
ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Sumit Semwal
12c4727e1d dma-buf: minor documentation fixes.
Some minor inline documentation fixes for gaps resulting from new patches.

Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-05-25 12:46:23 +05:30
Dave Airlie
98f86c9e4a dma-buf: add vmap interface
The main requirement I have for this interface is for scanning out
using the USB gpu devices. Since these devices have to read the
framebuffer on updates and linearly compress it, using kmaps
is a major overhead for every update.

v2: fix warn issues pointed out by Sylwester Nawrocki.

v3: fix compile !CONFIG_DMA_SHARED_BUFFER and add _GPL for now

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-05-25 12:35:24 +05:30
Daniel Vetter
4c78513e45 dma-buf: mmap support
Compared to Rob Clark's RFC I've ditched the prepare/finish hooks
and corresponding ioctls on the dma_buf file. The major reason for
that is that many people seem to be under the impression that this is
also for synchronization with outstanding asynchronous processsing.
I'm pretty massively opposed to this because:

- It boils down reinventing a new rather general-purpose userspace
  synchronization interface. If we look at things like futexes, this
  is hard to get right.
- Furthermore a lot of kernel code has to interact with this
  synchronization primitive. This smells a look like the dri1 hw_lock,
  a horror show I prefer not to reinvent.
- Even more fun is that multiple different subsystems would interact
  here, so we have plenty of opportunities to create funny deadlock
  scenarios.

I think synchronization is a wholesale different problem from data
sharing and should be tackled as an orthogonal problem.

Now we could demand that prepare/finish may only ensure cache
coherency (as Rob intended), but that runs up into the next problem:
We not only need mmap support to facilitate sw-only processing nodes
in a pipeline (without jumping through hoops by importing the dma_buf
into some sw-access only importer), which allows for a nicer
ION->dma-buf upgrade path for existing Android userspace. We also need
mmap support for existing importing subsystems to support existing
userspace libraries. And a loot of these subsystems are expected to
export coherent userspace mappings.

So prepare/finish can only ever be optional and the exporter /needs/
to support coherent mappings. Given that mmap access is always
somewhat fallback-y in nature I've decided to drop this optimization,
instead of just making it optional. If we demonstrate a clear need for
this, supported by benchmark results, we can always add it in again
later as an optional extension.

Other differences compared to Rob's RFC is the above mentioned support
for mapping a dma-buf through facilities provided by the importer.
Which results in mmap support no longer being optional.

Note that this dma-buf mmap patch does _not_ support every possible
insanity an existing subsystem could pull of with mmap: Because it
does not allow to intercept pagefaults and shoot down ptes importing
subsystems can't add some magic of their own at these points (e.g. to
automatically synchronize with outstanding rendering or set up some
special resources). I've done a cursory read through a few mmap
implementions of various subsytems and I'm hopeful that we can avoid
this (and the complexity it'd bring with it).

Additonally I've extended the documentation a bit to explain the hows
and whys of this mmap extension.

In case we ever want to add support for explicitly cache maneged
userspace mmap with a prepare/finish ioctl pair, we could specify that
userspace needs to mmap a different part of the dma_buf, e.g. the
range starting at dma_buf->size up to dma_buf->size*2. This works
because the size of a dma_buf is invariant over it's lifetime. The
exporter would obviously need to fall back to coherent mappings for
both ranges if a legacy clients maps the coherent range and the
architecture cannot suppor conflicting caching policies. Also, this
would obviously be optional and userspace needs to be able to fall
back to coherent mappings.

v2:
- Spelling fixes from Rob Clark.
- Compile fix for !DMA_BUF from Rob Clark.
- Extend commit message to explain how explicitly cache managed mmap
  support could be added later.
- Extend the documentation with implementations notes for exporters
  that need to manually fake coherency.

v3:
- dma_buf pointer initialization goof-up noticed by Rebecca Schultz
  Zavin.

Cc: Rob Clark <rob.clark@linaro.org>
Cc: Rebecca Schultz Zavin <rebecca@android.com>
Acked-by: Rob Clark <rob.clark@linaro.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-05-25 12:35:24 +05:30
Linus Torvalds
07acfc2a93 Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM changes from Avi Kivity:
 "Changes include additional instruction emulation, page-crossing MMIO,
  faster dirty logging, preventing the watchdog from killing a stopped
  guest, module autoload, a new MSI ABI, and some minor optimizations
  and fixes.  Outside x86 we have a small s390 and a very large ppc
  update.

  Regarding the new (for kvm) rebaseless workflow, some of the patches
  that were merged before we switch trees had to be rebased, while
  others are true pulls.  In either case the signoffs should be correct
  now."

Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.

I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
check effectively twice (it was done differently in two different
commits), but better safe than sorry ;)

* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
  KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
  KVM: s390: onereg for timer related registers
  KVM: s390: epoch difference and TOD programmable field
  KVM: s390: KVM_GET/SET_ONEREG for s390
  KVM: s390: add capability indicating COW support
  KVM: Fix mmu_reload() clash with nested vmx event injection
  KVM: MMU: Don't use RCU for lockless shadow walking
  KVM: VMX: Optimize %ds, %es reload
  KVM: VMX: Fix %ds/%es clobber
  KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
  KVM: VMX: unlike vmcs on fail path
  KVM: PPC: Emulator: clean up SPR reads and writes
  KVM: PPC: Emulator: clean up instruction parsing
  kvm/powerpc: Add new ioctl to retreive server MMU infos
  kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
  KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
  KVM: PPC: Book3S: Enable IRQs during exit handling
  KVM: PPC: Fix PR KVM on POWER7 bare metal
  KVM: PPC: Fix stbux emulation
  KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
  ...
2012-05-24 16:17:30 -07:00
Linus Torvalds
b5f4035adf Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly.
  * Fix self-ballooning code, and balloon logic when booting as initial domain.
  * Move array printing code to generic debugfs
  * Support XenBus domains.
  * Lazily free grants when a domain is dead/non-existent.
  * In M2P code use batching calls
 Bug-fixes:
  * Fix NULL dereference in allocation failure path (hvc_xen)
  * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
  * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPu9MpAAoJEFjIrFwIi8fJaNQH/RylThiO+O+LBpPrO8VRUw+2
 /Io98T7ZK2ggoUeaJx0C8irM0JMFAkxGMcfX3w9fwNt/BTec4s++4JhbN1jYN0da
 6a0PqINo+M8y73So6CBfuJDCunaRLGKVG/ibIO3Y3WAff51/H+DMvO7uYYDAE0aA
 mikyOxnaty0DiG5i4JGDHGmCzDASfK/jgGccZ03m6522mDx5ZIbTzZWONLfz8dqT
 rbxnn9vrNLgEYWuzyLMwW0GymToUtt01xBQvwJLAbhn8lr1WBRBLpxXA+5iYNQrn
 Ri25G7keYJhG4uwZfaHnR+4HTrmhlGzK1Z96dkqpGUaeIcdyWmPMp22VtBBiwG8=
 =uyRr
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen updates from Konrad Rzeszutek Wilk:
 "Features:
   * Extend the APIC ops implementation and add IRQ_WORKER vector
     support so that 'perf' can work properly.
   * Fix self-ballooning code, and balloon logic when booting as initial
     domain.
   * Move array printing code to generic debugfs
   * Support XenBus domains.
   * Lazily free grants when a domain is dead/non-existent.
   * In M2P code use batching calls
  Bug-fixes:
   * Fix NULL dereference in allocation failure path (hvc_xen)
   * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
   * Fix HVM guest resume - we would leak an PIRQ value instead of
     reusing the existing one."

Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.

* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: do not map the same GSI twice in PVHVM guests.
  hvc_xen: NULL dereference on allocation failure
  xen: Add selfballoning memory reservation tunable.
  xenbus: Add support for xenbus backend in stub domain
  xen/smp: unbind irqworkX when unplugging vCPUs.
  xen: enter/exit lazy_mmu_mode around m2p_override calls
  xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
  xen: implement IRQ_WORK_VECTOR handler
  xen: implement apic ipi interface
  xen/setup: update VA mapping when releasing memory during setup
  xen/setup: Combine the two hypercall functions - since they are quite similar.
  xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
  xen/gnttab: add deferred freeing logic
  debugfs: Add support to print u32 array in debugfs
  xen/p2m: An early bootup variant of set_phys_to_machine
  xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  xen/p2m: Move code around to allow for better re-usage.
2012-05-24 16:02:08 -07:00
Linus Torvalds
ce004178be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc changes from David S. Miller:
 "This has the generic strncpy_from_user() implementation architectures
  can now use, which we've been developing on linux-arch over the past
  few days.

  For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
  and the latter of which pointed out an adjustment I needed to make to
  sparc's user_addr_max() definition.  Linus, you were right, STACK_TOP
  was not the right thing to use, even on sparc itself :-)

  From Sam Ravnborg, we have a conversion of sparc32 over to the common
  alloc_thread_info_node(), since the aspect which originally blocked
  our doing so (sun4c) has been removed."

Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix user_addr_max() definition.
  lib: Sparc's strncpy_from_user is generic enough, move under lib/
  kernel: Move REPEAT_BYTE definition into linux/kernel.h
  sparc: Increase portability of strncpy_from_user() implementation.
  sparc: Optimize strncpy_from_user() zero byte search.
  sparc: Add full proper error handling to strncpy_from_user().
  sparc32: use the common implementation of alloc_thread_info_node()
2012-05-24 15:10:28 -07:00
Linus Torvalds
b1bf7d4d1b GPIO driver changes for v3.5 merge window
Lots of gpio changes, both to core code and drivers.  Changes do touch
 architecture code to remove the need for separate arm/gpio.h includes
 in most architectures.  Some new drivers are added, and a number of
 gpio drivers are converted to use irq_domains for gpio inputs used as
 interrupts.  Device tree support has been amended to allow multiple
 gpio_chips to use the same device tree node.  Remaining changes are
 primarily bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPvpFBAAoJEEFnBt12D9kB50EP/0q2co+Ddlz4DWM07TLMgTw8
 eCSi79+oB85RcE+0FlAo/SJu9VlYDKSLT3wMbIyycfJi3cUtOb+hay0j+wxcn4bz
 G2qXj2Het5rX6hFI2tSCvJfDqMwU0wEygn9a6a/bw3VGSOIVmMTmRswrbbBcFzVu
 8xobviN7LANLEZyhd4Ip5YfrcWH9ABmmhZX7ihn1AJubVL47xGo0uds9ZFX1sAKB
 Zyr80+BeUK7mhZ74UUfQHtS+x24JD62OLM9eaQN0/BBAqBewQJlxhMakPbTGmcuO
 Vy3CPmZiWw6tdVWgKvxE7cIXLI4YbB2B6w2TRJBBkFAlz4RsO2bFU/ibEv1vg9YE
 oxAUelMj0INdY4iRT135fDJTIGauWon22Tqd2MVtun4r6fwcL0BgFYN6yCMtEqbx
 bpYkKTi6tdyE7k2Ph+carCIuw9SwOk/4pm1xCWC0k6YdAnRE0zykCLvAuAabpmzs
 i/H1jcp/F4KSYldEoDlGYG4lFZiISthxOy9l6/d4GrBj723attrmztolMfrpFLF6
 XPTf7HODQFmZ6n7mBIjCg4hoqydAYyKcW7lROc7DKkEXIWOeeeA+EoTytkwPLLz5
 CBLoZfDoqUT8xa2vv4MZ/+G9chSDi5vMryqEYi9tXMbVEZW31xqh6hxk0xPMcY13
 qVAaRlcz49AQjWq/0vR4
 =U6hj
 -----END PGP SIGNATURE-----

Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull GPIO driver changes from Grant Likely:
 "Lots of gpio changes, both to core code and drivers.

  Changes do touch architecture code to remove the need for separate
  arm/gpio.h includes in most architectures.

  Some new drivers are added, and a number of gpio drivers are converted
  to use irq_domains for gpio inputs used as interrupts.  Device tree
  support has been amended to allow multiple gpio_chips to use the same
  device tree node.

  Remaining changes are primarily bug fixes."

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6: (33 commits)
  gpio/generic: initialize basic_mmio_gpio shadow variables properly
  gpiolib: Remove 'const' from data argument of gpiochip_find()
  gpio/rc5t583: add gpio driver for RICOH PMIC RC5T583
  gpiolib: quiet gpiochip_add boot message noise
  gpio: mpc8xxx: Prevent NULL pointer deref in demux handler
  gpio/lpc32xx: Add device tree support
  gpio: Adjust of_xlate API to support multiple GPIO chips
  gpiolib: Implement devm_gpio_request_one()
  gpio-mcp23s08: dbg_show: fix pullup configuration display
  Add support for TCA6424A
  gpio/omap: (re)fix wakeups on level-triggered GPIOs
  gpio/omap: fix broken context restore for non-OFF mode transitions
  gpio/omap: fix missing check in *_runtime_suspend()
  gpio/omap: remove cpu_is_omapxxxx() checks from *_runtime_resume()
  gpio/omap: remove suspend/resume callbacks
  gpio/omap: remove retrigger variable in gpio_irq_handler
  gpio/omap: remove saved_wakeup field from struct gpio_bank
  gpio/omap: remove suspend_wakeup field from struct gpio_bank
  gpio/omap: remove saved_fallingdetect, saved_risingdetect
  gpio/omap: remove virtual_irq_start variable
  ...

Conflicts:
	drivers/gpio/gpio-samsung.c
2012-05-24 14:01:46 -07:00