linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00

Author	SHA1	Message	Date
David Howells	27a3aadcdc	UAPI: (Scripted) Disintegrate include/linux/caif Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-09 09:48:40 +01:00
Linus Torvalds	9e2d8656f5	Merge branch 'akpm' (Andrew's patch-bomb) Merge patches from Andrew Morton: "A few misc things and very nearly all of the MM tree. A tremendous amount of stuff (again), including a significant rbtree library rework." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits) sparc64: Support transparent huge pages. mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd(). mm: Add and use update_mmu_cache_pmd() in transparent huge page code. sparc64: Document PGD and PMD layout. sparc64: Eliminate PTE table memory wastage. sparc64: Halve the size of PTE tables sparc64: Only support 4MB huge pages and 8KB base pages. memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning mm: memcg: clean up mm_match_cgroup() signature mm: document PageHuge somewhat mm: use %pK for /proc/vmallocinfo mm, thp: fix mlock statistics mm, thp: fix mapped pages avoiding unevictable list on mlock memory-hotplug: update memory block's state and notify userspace memory-hotplug: preparation to notify memory block's state at memory hot remove mm: avoid section mismatch warning for memblock_type_name make GFP_NOTRACK definition unconditional cma: decrease cc.nr_migratepages after reclaiming pagelist CMA: migrate mlocked pages kpageflags: fix wrong KPF_THP on non-huge compound pages ...	2012-10-09 16:23:15 +09:00
Johannes Weiner	587af308cc	mm: memcg: clean up mm_match_cgroup() signature It really should return a boolean for match/no match. And since it takes a memcg, not a cgroup, fix that parameter name as well. [akpm@linux-foundation.org: mm_match_cgroup() is not a macro] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:04 +09:00
David Rientjes	b676b293fb	mm, thp: fix mapped pages avoiding unevictable list on mlock When a transparent hugepage is mapped and it is included in an mlock() range, follow_page() incorrectly avoids setting the page's mlock bit and moving it to the unevictable lru. This is evident if you try to mlock(), munlock(), and then mlock() a range again. Currently: #define MAP_SIZE (4 << 30) /* 4GB / void ptr = mmap(NULL, MAP_SIZE, PROT_READ \| PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS, 0, 0); mlock(ptr, MAP_SIZE); $ grep -E "Unevictable\|Inactive\(anon" /proc/meminfo Inactive(anon): 6304 kB Unevictable: 4213924 kB munlock(ptr, MAP_SIZE); Inactive(anon): 4186252 kB Unevictable: 19652 kB mlock(ptr, MAP_SIZE); Inactive(anon): 4198556 kB Unevictable: 21684 kB Notice that less than 2MB was added to the unevictable list; this is because these pages in the range are not transparent hugepages since the 4GB range was allocated with mmap() and has no specific alignment. If posix_memalign() were used instead, unevictable would not have grown at all on the second mlock(). The fix is to call mlock_vma_page() so that the mlock bit is set and the page is added to the unevictable list. With this patch: mlock(ptr, MAP_SIZE); Inactive(anon): 4056 kB Unevictable: 4213940 kB munlock(ptr, MAP_SIZE); Inactive(anon): 4198268 kB Unevictable: 19636 kB mlock(ptr, MAP_SIZE); Inactive(anon): 4008 kB Unevictable: 4213940 kB Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:02 +09:00
Wen Congyang	e90bdb7f52	memory-hotplug: update memory block's state and notify userspace remove_memory() will be called when hot removing a memory device. But even if offlining memory, we cannot notice it. So the patch updates the memory block's state and sends notification to userspace. Additionally, the memory device may contain more than one memory block. If the memory block has been offlined, __offline_pages() will fail. So we should try to offline one memory block at a time. Thus remove_memory() also check each memory block's state. So there is no need to check the memory block's state before calling remove_memory(). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:02 +09:00
Wen Congyang	a16cee10c7	memory-hotplug: preparation to notify memory block's state at memory hot remove remove_memory() is called in two cases: 1. echo offline >/sys/devices/system/memory/memoryXX/state 2. hot remove a memory device In the 1st case, the memory block's state is changed and the notification that memory block's state changed is sent to userland after calling remove_memory(). So user can notice memory block is changed. But in the 2nd case, the memory block's state is not changed and the notification is not also sent to userspcae even if calling remove_memory(). So user cannot notice memory block is changed. For adding the notification at memory hot remove, the patch just prepare as follows: 1st case uses offline_pages() for offlining memory. 2nd case uses remove_memory() for offlining memory and changing memory block's state and notifing the information. The patch does not implement notification to remove_memory(). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:02 +09:00
Glauber Costa	3e648ebe07	make GFP_NOTRACK definition unconditional There was a general sentiment in a recent discussion (See https://lkml.org/lkml/2012/9/18/258) that the __GFP flags should be defined unconditionally. Currently, the only offender is GFP_NOTRACK, which is conditional to KMEMCHECK. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:01 +09:00
Minchan Kim	e46a28790e	CMA: migrate mlocked pages Presently CMA cannot migrate mlocked pages so it ends up failing to allocate contiguous memory space. This patch makes mlocked pages be migrated out. Of course, it can affect realtime processes but in CMA usecase, contiguous memory allocation failing is far worse than access latency to an mlocked page being variable while CMA is running. If someone wants to make the system realtime, he shouldn't enable CMA because stalls can still happen at random times. [akpm@linux-foundation.org: tweak comment text, per Mel] Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:00 +09:00
Hugh Dickins	8befedfe67	mm: remove unevictable_pgs_mlockfreed Simply remove UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed line from /proc/vmstat: Johannes and Mel point out that it was very unlikely to have been used by any tool, and of course we can restore it easily enough if that turns out to be wrong. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:59 +09:00
Minchan Kim	5a88381384	memory-hotplug: fix zone stat mismatch During memory-hotplug, I found NR_ISOLATED_[ANON\|FILE] are increasing, causing the kernel to hang. When the system doesn't have enough free pages, it enters reclaim but never reclaim any pages due to too_many_isolated()==true and loops forever. The cause is that when we do memory-hotadd after memory-remove, __zone_pcp_update() clears a zone's ZONE_STAT_ITEMS in setup_pageset() although the vm_stat_diff of all CPUs still have values. In addtion, when we offline all pages of the zone, we reset them in zone_pcp_reset without draining so we loss some zone stat item. Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:59 +09:00
Sagi Grimberg	2ec74c3ef2	mm: move all mmu notifier invocations to be done outside the PT lock In order to allow sleeping during mmu notifier calls, we need to avoid invoking them under the page table spinlock. This patch solves the problem by calling invalidate_page notification after releasing the lock (but before freeing the page itself), or by wrapping the page invalidation with calls to invalidate_range_begin and invalidate_range_end. To prevent accidental changes to the invalidate_range_end arguments after the call to invalidate_range_begin, the patch introduces a convention of saving the arguments in consistently named locals: unsigned long mmun_start; /* For mmu_notifiers / unsigned long mmun_end; / For mmu_notifiers */ ... mmun_start = ... mmun_end = ... mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); ... mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); The patch changes code to use this convention for all calls to mmu_notifier_invalidate_range_start/end, except those where the calls are close enough so that anyone who glances at the code can see the values aren't changing. This patchset is a preliminary step towards on-demand paging design to be added to the RDMA stack. Why do we want on-demand paging for Infiniband? Applications register memory with an RDMA adapter using system calls, and subsequently post IO operations that refer to the corresponding virtual addresses directly to HW. Until now, this was achieved by pinning the memory during the registration calls. The goal of on demand paging is to avoid pinning the pages of registered memory regions (MRs). This will allow users the same flexibility they get when swapping any other part of their processes address spaces. Instead of requiring the entire MR to fit in physical memory, we can allow the MR to be larger, and only fit the current working set in physical memory. Why should anyone care? What problems are users currently experiencing? This can make programming with RDMA much simpler. Today, developers that are working with more data than their RAM can hold need either to deregister and reregister memory regions throughout their process's life, or keep a single memory region and copy the data to it. On demand paging will allow these developers to register a single MR at the beginning of their process's life, and let the operating system manage which pages needs to be fetched at a given time. In the future, we might be able to provide a single memory access key for each process that would provide the entire process's address as one large memory region, and the developers wouldn't need to register memory regions at all. Is there any prospect that any other subsystems will utilise these infrastructural changes? If so, which and how, etc? As for other subsystems, I understand that XPMEM wanted to sleep in MMU notifiers, as Christoph Lameter wrote at http://lkml.indiana.edu/hypermail/linux/kernel/0802.1/0460.html and perhaps Andrea knows about other use cases. Scheduling in mmu notifications is required since we need to sync the hardware with the secondary page tables change. A TLB flush of an IO device is inherently slower than a CPU TLB flush, so our design works by sending the invalidation request to the device, and waiting for an interrupt before exiting the mmu notifier handler. Avi said: kvm may be a buyer. kvm::mmu_lock, which serializes guest page faults, also protects long operations such as destroying large ranges. It would be good to convert it into a spinlock, but as it is used inside mmu notifiers, this cannot be done. (there are alternatives, such as keeping the spinlock and using a generation counter to do the teardown in O(1), which is what the "may" is doing up there). [akpm@linux-foundation.orgpossible speed tweak in hugetlb_cow(), cleanups] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Liran Liss <liranl@mellanox.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:58 +09:00
David Rientjes	957f822a0a	mm, numa: reclaim from all nodes within reclaim distance RECLAIM_DISTANCE represents the distance between nodes at which it is deemed too costly to allocate from; it's preferred to try to reclaim from a local zone before falling back to allocating on a remote node with such a distance. To do this, zone_reclaim_mode is set if the distance between any two nodes on the system is greather than this distance. This, however, ends up causing the page allocator to reclaim from every zone regardless of its affinity. What we really want is to reclaim only from zones that are closer than RECLAIM_DISTANCE. This patch adds a nodemask to each node that represents the set of nodes that are within this distance. During the zone iteration, if the bit for a zone's node is set for the local node, then reclaim is attempted; otherwise, the zone is skipped. [akpm@linux-foundation.org: fix CONFIG_NUMA=n build] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:56 +09:00
Hugh Dickins	a0c5e813f0	mm: remove free_page_mlock We should not be seeing non-0 unevictable_pgs_mlockfreed any longer. So remove free_page_mlock() from the page freeing paths: __PG_MLOCKED is already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now be checking it, reporting "BUG: Bad page state" if it's ever found set. Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:56 +09:00
Hugh Dickins	39b5f29ac1	mm: remove vma arg from page_evictable page_evictable(page, vma) is an irritant: almost all its callers pass NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page) explicitly in the couple of places it's needed. But in those places we don't even need page_evictable() itself! They're dealing with a freshly allocated anonymous page, which has no "mapping" and cannot be mlocked yet. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:55 +09:00
Jianguo Wu	7f1290f2f2	mm: fix-up zone present pages I think zone->present_pages indicates pages that buddy system can management, it should be: zone->present_pages = spanned pages - absent pages - bootmem pages, but is now: zone->present_pages = spanned pages - absent pages - memmap pages. spanned pages: total size, including holes. absent pages: holes. bootmem pages: pages used in system boot, managed by bootmem allocator. memmap pages: pages used by page structs. This may cause zone->present_pages less than it should be. For example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's present_pages should be spanned pages - absent pages, but now it also minus memmap pages(free_area_init_core), which are actually allocated from ZONE_MOVABLE. When offlining all memory of a zone, this will cause zone->present_pages less than 0, because present_pages is unsigned long type, it is actually a very large integer, it indirectly caused zone->watermark[WMARK_MIN] becomes a large integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a large integer(calculate_totalreserve_pages()), and finally cause memory allocating failure when fork process(__vm_enough_memory()). [root@localhost ~]# dmesg -bash: fork: Cannot allocate memory I think the bug described in http://marc.info/?l=linux-mm&m=134502182714186&w=2 is also caused by wrong zone present pages. This patch intends to fix-up zone->present_pages when memory are freed to buddy system on x86_64 and IA64 platforms. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reported-by: Petr Tesarik <ptesarik@suse.cz> Tested-by: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:54 +09:00
Catalin Marinas	2d28a2275c	mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear() The CONFIG_TRANSPARENT_HUGEPAGE implementation of pmdp_get_and_clear() calls pmd_clear() with 3 arguments instead of 1. This happens only for !__HAVE_ARCH_PMDP_GET_AND_CLEAR which doesn't seem to happen because x86 defines this and it uses pmd_update. [mhocko@suse.cz: changelog addition] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:53 +09:00
Minchan Kim	723a0644a7	mm/page_alloc: refactor out __alloc_contig_migrate_alloc() __alloc_contig_migrate_alloc() can be used by memory-hotplug so refactor it out (move + rename as a common name) into page_isolation.c. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:52 +09:00
Mel Gorman	62997027ca	mm: compaction: clear PG_migrate_skip based on compaction and reclaim activity Compaction caches if a pageblock was scanned and no pages were isolated so that the pageblocks can be skipped in the future to reduce scanning. This information is not cleared by the page allocator based on activity due to the impact it would have to the page allocator fast paths. Hence there is a requirement that something clear the cache or pageblocks will be skipped forever. Currently the cache is cleared if there were a number of recent allocation failures and it has not been cleared within the last 5 seconds. Time-based decisions like this are terrible as they have no relationship to VM activity and is basically a big hammer. Unfortunately, accurate heuristics would add cost to some hot paths so this patch implements a rough heuristic. There are two cases where the cache is cleared. 1. If a !kswapd process completes a compaction cycle (migrate and free scanner meet), the zone is marked compact_blockskip_flush. When kswapd goes to sleep, it will clear the cache. This is expected to be the common case where the cache is cleared. It does not really matter if kswapd happens to be asleep or going to sleep when the flag is set as it will be woken on the next allocation request. 2. If there have been multiple failures recently and compaction just finished being deferred then a process will clear the cache and start a full scan. This situation happens if there are multiple high-order allocation requests under heavy memory pressure. The clearing of the PG_migrate_skip bits and other scans is inherently racy but the race is harmless. For allocations that can fail such as THP, they will simply fail. For requests that cannot fail, they will retry the allocation. Tests indicated that scanning rates were roughly similar to when the time-based heuristic was used and the allocation success rates were similar. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:51 +09:00
Mel Gorman	c89511ab2f	mm: compaction: Restart compaction from near where it left off This is almost entirely based on Rik's previous patches and discussions with him about how this might be implemented. Order > 0 compaction stops when enough free pages of the correct page order have been coalesced. When doing subsequent higher order allocations, it is possible for compaction to be invoked many times. However, the compaction code always starts out looking for things to compact at the start of the zone, and for free pages to compact things to at the end of the zone. This can cause quadratic behaviour, with isolate_freepages starting at the end of the zone each time, even though previous invocations of the compaction code already filled up all free memory on that end of the zone. This can cause isolate_freepages to take enormous amounts of CPU with certain workloads on larger memory systems. This patch caches where the migration and free scanner should start from on subsequent compaction invocations using the pageblock-skip information. When compaction starts it begins from the cached restart points and will update the cached restart points until a page is isolated or a pageblock is skipped that would have been scanned by synchronous compaction. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:50 +09:00
Mel Gorman	bb13ffeb9f	mm: compaction: cache if a pageblock was scanned and no pages were isolated When compaction was implemented it was known that scanning could potentially be excessive. The ideal was that a counter be maintained for each pageblock but maintaining this information would incur a severe penalty due to a shared writable cache line. It has reached the point where the scanning costs are a serious problem, particularly on long-lived systems where a large process starts and allocates a large number of THPs at the same time. Instead of using a shared counter, this patch adds another bit to the pageblock flags called PG_migrate_skip. If a pageblock is scanned by either migrate or free scanner and 0 pages were isolated, the pageblock is marked to be skipped in the future. When scanning, this bit is checked before any scanning takes place and the block skipped if set. The main difficulty with a patch like this is "when to ignore the cached information?" If it's ignored too often, the scanning rates will still be excessive. If the information is too stale then allocations will fail that might have otherwise succeeded. In this patch o CMA always ignores the information o If the migrate and free scanner meet then the cached information will be discarded if it's at least 5 seconds since the last time the cache was discarded o If there are a large number of allocation failures, discard the cache. The time-based heuristic is very clumsy but there are few choices for a better event. Depending solely on multiple allocation failures still allows excessive scanning when THP allocations are failing in quick succession due to memory pressure. Waiting until memory pressure is relieved would cause compaction to continually fail instead of using reclaim/compaction to try allocate the page. The time-based mechanism is clumsy but a better option is not obvious. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:50 +09:00
Mel Gorman	753341a4b8	revert "mm: have order > 0 compaction start off where it left" This reverts commit `7db8889ab0` ("mm: have order > 0 compaction start off where it left") and commit `de74f1cc` ("mm: have order > 0 compaction start near a pageblock with free pages"). These patches were a good idea and tests confirmed that they massively reduced the amount of scanning but the implementation is complex and tricky to understand. A later patch will cache what pageblocks should be skipped and reimplements the concept of compact_cached_free_pfn on top for both migration and free scanners. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Richard Davies <richard@arachsys.com> Cc: Shaohua Li <shli@kernel.org> Cc: Avi Kivity <avi@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:50 +09:00
Wanpeng Li	f2d52fe51c	mm/memblock: cleanup early_node_map[] related comments Commit `0ee332c145` ("memblock: Kill early_node_map[]") removed early_node_map[]. Clean up the comments to comply with that change. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:47 +09:00
Shaohua Li	45cac65b0f	readahead: fault retry breaks mmap file read random detection .fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:47 +09:00
Shaohua Li	e79bee24fd	atomic: implement generic atomic_dec_if_positive() The x86 implementation of atomic_dec_if_positive is quite generic, so make it available to all architectures. This is needed for "swap: add a simple detector for inappropriate swapin readahead". [akpm@linux-foundation.org: do the "#define foo foo" trick in the conventional manner] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:46 +09:00
Minchan Kim	435b405c06	memory-hotplug: fix pages missed by race rather than failing If race between allocation and isolation in memory-hotplug offline happens, some pages could be in MIGRATE_MOVABLE of free_list although the pageblock's migratetype of the page is MIGRATE_ISOLATE. The race could be detected by get_freepage_migratetype in __test_page_isolated_in_pageblock. If it is detected, now EBUSY gets bubbled all the way up and the hotplug operations fails. But better idea is instead of returning and failing memory-hotremove, move the free page to the correct list at the time it is detected. It could enhance memory-hotremove operation success ratio although the race is really rare. Suggested by Mel Gorman. [akpm@linux-foundation.org: small cleanup] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:46 +09:00
Minchan Kim	95e3441248	mm: remain migratetype in freed page The page allocator caches the pageblock information in page->private while it is in the PCP freelists but this is overwritten with the order of the page when freed to the buddy allocator. This patch stores the migratetype of the page in the page->index field so that it is available at all times when the page remain in free_list. This patch adds a new call site in __free_pages_ok so it might be overhead a bit but it's for high order allocation. So I believe damage isn't hurt. Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:45 +09:00
Minchan Kim	b12c4ad14e	mm: page_alloc: use get_freepage_migratetype() instead of page_private() The page allocator uses set_page_private and page_private for handling migratetype when it frees page. Let's replace them with [set\|get] _freepage_migratetype to make it more clear. Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:45 +09:00
Bartlomiej Zolnierkiewicz	d1ce749a0d	cma: count free CMA pages Add NR_FREE_CMA_PAGES counter to be later used for checking watermark in __zone_watermark_ok(). For simplicity and to avoid #ifdef hell make this counter always available (not only when CONFIG_CMA=y). [akpm@linux-foundation.org: use conventional migratetype naming] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:44 +09:00
Minchan Kim	02c6de8d75	mm: cma: discard clean pages during contiguous allocation instead of migration Drop clean cache pages instead of migration during alloc_contig_range() to minimise allocation latency by reducing the amount of migration that is necessary. It's useful for CMA because latency of migration is more important than evicting the background process's working set. In addition, as pages are reclaimed then fewer free pages for migration targets are required so it avoids memory reclaiming to get free pages, which is a contributory factor to increased latency. I measured elapsed time of __alloc_contig_migrate_range() which migrates 10M in 40M movable zone in QEMU machine. Before - 146ms, After - 7ms [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Mel Gorman <mgorman@suse.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Rik van Riel <riel@redhat.com> Tested-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:43 +09:00
Michel Lespinasse	38a76013ad	mm: avoid taking rmap locks in move_ptes() During mremap(), the destination VMA is generally placed after the original vma in rmap traversal order: in move_vma(), we always have new_pgoff >= vma->vm_pgoff, and as a result new_vma->vm_pgoff >= vma->vm_pgoff unless vma_merge() merged the new vma with an adjacent one. When the destination VMA is placed after the original in rmap traversal order, we can avoid taking the rmap locks in move_ptes(). Essentially, this reintroduces the optimization that had been disabled in "mm anon rmap: remove anon_vma_moveto_tail". The difference is that we don't try to impose the rmap traversal order; instead we just rely on things being in the desired order in the common case and fall back to taking locks in the uncommon case. Also we skip the i_mmap_mutex in addition to the anon_vma lock: in both cases, the vmas are traversed in increasing vm_pgoff order with ties resolved in tree insertion order. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:42 +09:00
Michel Lespinasse	ed8ea81501	mm: add CONFIG_DEBUG_VM_RB build option Add a CONFIG_DEBUG_VM_RB build option for the previously existing DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using recursive algorithms, we can expose it a bit more. Also extend this code to validate_mm() after stack expansion, and to check that the vma's start and last pgoffs have not changed since the nodes were inserted on the anon vma interval tree (as it is important that the nodes be reindexed after each such update). Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:42 +09:00
Michel Lespinasse	bf181b9f9d	mm anon rmap: replace same_anon_vma linked list with an interval tree. When a large VMA (anon or private file mapping) is first touched, which will populate its anon_vma field, and then split into many regions through the use of mprotect(), the original anon_vma ends up linking all of the vmas on a linked list. This can cause rmap to become inefficient, as we have to walk potentially thousands of irrelevent vmas before finding the one a given anon page might fall into. By replacing the same_anon_vma linked list with an interval tree (where each avc's interval is determined by its vma's start and last pgoffs), we can make rmap efficient for this use case again. While the change is large, all of its pieces are fairly simple. Most places that were walking the same_anon_vma list were looking for a known pgoff, so they can just use the anon_vma_interval_tree_foreach() interval tree iterator instead. The exception here is ksm, where the page's index is not known. It would probably be possible to rework ksm so that the index would be known, but for now I have decided to keep things simple and just walk the entirety of the interval tree there. When updating vma's that already have an anon_vma assigned, we must take care to re-index the corresponding avc's on their interval tree. This is done through the use of anon_vma_interval_tree_pre_update_vma() and anon_vma_interval_tree_post_update_vma(), which remove the avc's from their interval tree before the update and re-insert them after the update. The anon_vma stays locked during the update, so there is no chance that rmap would miss the vmas that are being updated. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:41 +09:00
Michel Lespinasse	108d6642ad	mm anon rmap: remove anon_vma_moveto_tail mremap() had a clever optimization where move_ptes() did not take the anon_vma lock to avoid a race with anon rmap users such as page migration. Instead, the avc's were ordered in such a way that the origin vma was always visited by rmap before the destination. This ordering and the use of page table locks rmap usage safe. However, we want to replace the use of linked lists in anon rmap with an interval tree, and this will make it harder to impose such ordering as the interval tree will always be sorted by the avc->vma->vm_pgoff value. For now, let's replace the anon_vma_moveto_tail() ordering function with proper anon_vma locking in move_ptes(). Once we have the anon interval tree in place, we will re-introduce an optimization to avoid taking these locks in the most common cases. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:41 +09:00
Michel Lespinasse	9826a516ff	mm: interval tree updates Update the generic interval tree code that was introduced in "mm: replace vma prio_tree with an interval tree". Changes: - fixed 'endpoing' typo noticed by Andrew Morton - replaced include/linux/interval_tree_tmpl.h, which was used as a template (including it automatically defined the interval tree functions) with include/linux/interval_tree_generic.h, which only defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself defines the interval tree functions when invoked. Now that is a very long macro which is unfortunate, but it does make the usage sites (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously. - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro, instead of duplicating that code in the interval tree template. - replaced vma_interval_tree_add(), which was actually handling the nonlinear and interval tree cases, with vma_interval_tree_insert_after() which handles only the interval tree case and has an API that is more consistent with the other interval tree handling functions. The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap(). Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:40 +09:00
Michel Lespinasse	9c079add0d	rbtree: move augmented rbtree functionality to rbtree_augmented.h Provide rb_insert_augmented() and rb_erase_augmented() through a new rbtree_augmented.h include file. rb_erase_augmented() is defined there as an __always_inline function, in order to allow inlining of augmented rbtree callbacks into it. Since this generates a relatively large function, each augmented rbtree user should make sure to have a single call site. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:40 +09:00
Michel Lespinasse	147e615f83	prio_tree: remove After both prio_tree users have been converted to use red-black trees, there is no need to keep around the prio tree library anymore. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:40 +09:00
Michel Lespinasse	6b2dbba8b6	mm: replace vma prio_tree with an interval tree Implement an interval tree as a replacement for the VMA prio_tree. The algorithms are similar to lib/interval_tree.c; however that code can't be directly reused as the interval endpoints are not explicitly stored in the VMA. So instead, the common algorithm is moved into a template and the details (node type, how to get interval endpoints from the node, etc) are filled in using the C preprocessor. Once the interval tree functions are available, using them as a replacement to the VMA prio tree is a relatively simple, mechanical job. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:39 +09:00
Michel Lespinasse	fff3fd8a12	rbtree: add prio tree and interval tree tests Patch 1 implements support for interval trees, on top of the augmented rbtree API. It also adds synthetic tests to compare the performance of interval trees vs prio trees. Short answers is that interval trees are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x) on search. It is debatable how realistic the synthetic test is, and I have not made such measurements yet, but my impression is that interval trees would still come out faster. Patch 2 uses a preprocessor template to make the interval tree generic, and uses it as a replacement for the vma prio_tree. Patch 3 takes the other prio_tree user, kmemleak, and converts it to use a basic rbtree. We don't actually need the augmented rbtree support here because the intervals are always non-overlapping. Patch 4 removes the now-unused prio tree library. Patch 5 proposes an additional optimization to rb_erase_augmented, now providing it as an inline function so that the augmented callbacks can be inlined in. This provides an additional 5-10% performance improvement for the interval tree insert/erase benchmark. There is a maintainance cost as it exposes augmented rbtree users to some of the rbtree library internals; however I think this cost shouldn't be too high as I expect the augmented rbtree will always have much less users than the base rbtree. I should probably add a quick summary of why I think it makes sense to replace prio trees with augmented rbtree based interval trees now. One of the drivers is that we need augmented rbtrees for Rik's vma gap finding code, and once you have them, it just makes sense to use them for interval trees as well, as this is the simpler and more well known algorithm. prio trees, in comparison, seem too clever: they impose an additional 'heap' constraint on the tree, which they use to guarantee a faster worst-case complexity of O(k+log N) for stabbing queries in a well-balanced prio tree, vs O(k*log N) for interval trees (where k=number of matches, N=number of intervals). Now this sounds great, but in practice prio trees don't realize this theorical benefit. First, the additional constraint makes them harder to update, so that the kernel implementation has to simplify things by balancing them like a radix tree, which is not always ideal. Second, the fact that there are both index and heap properties makes both tree manipulation and search more complex, which results in a higher multiplicative time constant. As it turns out, the simple interval tree algorithm ends up running faster than the more clever prio tree. This patch: Add two test modules: - prio_tree_test measures the performance of lib/prio_tree.c, both for insertion/removal and for stabbing searches - interval_tree_test measures the performance of a library of equivalent functionality, built using the augmented rbtree support. In order to support the second test module, lib/interval_tree.c is introduced. It is kept separate from the interval_tree_test main file for two reasons: first we don't want to provide an unfair advantage over prio_tree_test by having everything in a single compilation unit, and second there is the possibility that the interval tree functionality could get some non-test users in kernel over time. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:39 +09:00
Michel Lespinasse	3908836aa7	rbtree: add RB_DECLARE_CALLBACKS() macro As proposed by Peter Zijlstra, this makes it easier to define the augmented rbtree callbacks. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:38 +09:00
Michel Lespinasse	9d9e6f9703	rbtree: remove prior augmented rbtree implementation convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api and remove the old augmented rbtree implementation. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:38 +09:00
Michel Lespinasse	14b94af0b2	rbtree: faster augmented rbtree manipulation Introduce new augmented rbtree APIs that allow minimal recalculation of augmented node information. A new callback is added to the rbtree insertion and erase rebalancing functions, to be called on each tree rotations. Such rotations preserve the subtree's root augmented value, but require recalculation of the one child that was previously located at the subtree root. In the insertion case, the handcoded search phase must be updated to maintain the augmented information on insertion, and then the rbtree coloring/rebalancing algorithms keep it up to date. In the erase case, things are more complicated since it is library code that manipulates the rbtree in order to remove internal nodes. This requires a couple additional callbacks to copy a subtree's augmented value when a new root is stitched in, and to recompute augmented values down the ancestry path when a node is removed from the tree. In order to preserve maximum speed for the non-augmented case, we provide two versions of each tree manipulation function. rb_insert_augmented() is the augmented equivalent of rb_insert_color(), and rb_erase_augmented() is the augmented equivalent of rb_erase(). Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:37 +09:00
Michel Lespinasse	bf7ad8eeab	rbtree: move some implementation details from rbtree.h to rbtree.c rbtree users must use the documented APIs to manipulate the tree structure. Low-level helpers to manipulate node colors and parenthood are not part of that API, so move them to lib/rbtree.c [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field] Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:32 +09:00
Michel Lespinasse	4c199a93a2	rbtree: empty nodes have no color Empty nodes have no color. We can make use of this property to simplify the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also, we can get rid of the rb_init_node function which had been introduced by commit `88d19cf379` ("timers: Add rb_init_node() to allow for stack allocated rb nodes") to avoid some issue with the empty node's color not being initialized. I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are doing there, though. axboe introduced them in commit `10fd48f237` ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I see it, the 'empty node' abstraction is only used by rbtree users to flag nodes that they haven't inserted in any rbtree, so asking the predecessor or successor of such nodes doesn't make any sense. One final rb_init_node() caller was recently added in sysctl code to implement faster sysctl name lookups. This code doesn't make use of RB_EMPTY_NODE at all, and from what I could see it only called rb_init_node() under the mistaken assumption that such initialization was required before node insertion. [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build] Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:32 +09:00
Michel Lespinasse	1457d28778	rbtree: reference Documentation/rbtree.txt for usage instructions I recently started looking at the rbtree code (with an eye towards improving the augmented rbtree support, but I haven't gotten there yet). I noticed a lot of possible speed improvements, which I am now proposing in this patch set. Patches 1-4 are preparatory: remove internal functions from rbtree.h so that users won't be tempted to use them instead of the documented APIs, clean up some incorrect usages I've noticed (in particular, with the recently added fs/proc/proc_sysctl.c rbtree usage), reference the documentation so that people have one less excuse to miss it, etc. Patch 5 is a small module I wrote to check the rbtree performance. It creates 100 nodes with random keys and repeatedly inserts and erases them from an rbtree. Additionally, it has code to check for rbtree invariants after each insert or erase operation. Patches 6-12 is where the rbtree optimizations are done, and they touch only that one file, lib/rbtree.c . I am getting good results out of these - in my small benchmark doing rbtree insertion (including search) and erase, I'm seeing a 30% runtime reduction on Sandybridge E5, which is more than I initially thought would be possible. (the results aren't as impressive on my two other test hosts though, AMD barcelona and Intel Westmere, where I am seeing 14% runtime reduction only). The code size - both source (ommiting comments) and compiled - is also shorter after these changes. However, I do admit that the updated code is more arduous to read - one big reason for that is the removal of the tree rotation helpers, which added some overhead but also made it easier to reason about things locally. Overall, I believe this is an acceptable compromise, given that this code doesn't get modified very often, and that I have good tests for it. Upon Peter's suggestion, I added comments showing the rtree configuration before every rotation. I think they help; however it's still best to have a copy of the cormen/leiserson/rivest book when digging into this code. This patch: reference Documentation/rbtree.txt for usage instructions include/linux/rbtree.h included some basic usage instructions, while Documentation/rbtree.txt had some more complete and easier to follow instructions. Replacing the former with a reference to the latter. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:31 +09:00
Gerald Schaefer	46dcde735c	thp: introduce pmdp_invalidate() On s390, a valid page table entry must not be changed while it is attached to any CPU. So instead of pmd_mknotpresent() and set_pmd_at(), an IDTE operation would be necessary there. This patch introduces the pmdp_invalidate() function, to allow architecture-specific implementations. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:29 +09:00
Gerald Schaefer	e3ebcf6438	thp: remove assumptions on pgtable_t type The thp page table pre-allocation code currently assumes that pgtable_t is of type "struct page *". This may not be true for all architectures, so this patch removes that assumption by replacing the functions prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that can be defined architecture-specific. It also removes two VM_BUG_ON checks for page_count() and page_mapcount() operating on a pgtable_t. Apart from the VM_BUG_ON removal, there will be no functional change introduced by this patch. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:29 +09:00
Davidlohr Bueso	01dc52ebdf	oom: remove deprecated oom_adj The deprecated /proc/<pid>/oom_adj is scheduled for removal this month. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:24 +09:00
Sagi Grimberg	21a92735f6	mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely schedule With an RCU based mmu_notifier implementation, any callout to mmu_notifier_invalidate_range_{start,end}() or mmu_notifier_invalidate_page() would not be allowed to call schedule() as that could potentially allow a modification to the mmu_notifier structure while it is currently being used. Since srcu allocs 4 machine words per instance per cpu, we may end up with memory exhaustion if we use srcu per mm. So all mms share a global srcu. Note that during large mmu_notifier activity exit & unregister paths might hang for longer periods, but it is tolerable for current mmu_notifier clients. Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Haggai Eran <haggaie@mellanox.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:23 +09:00
Xiao Guangrong	48af0d7cb3	mm: mmu_notifier: fix inconsistent memory between secondary MMU and host There is a bug in set_pte_at_notify() which always sets the pte to the new page before releasing the old page in the secondary MMU. At this time, the process will access on the new page, but the secondary MMU still access on the old page, the memory is inconsistent between them The below scenario shows the bug more clearly: at the beginning: p = 0, and p is write-protected by KSM or shared with parent process CPU 0 CPU 1 write 1 to p to trigger COW, set_pte_at_notify will be called: pte = new_page + W; /* The W bit of pte is set / p = 1; /* pte is valid, so no #PF / return back to secondary MMU, then the secondary MMU read p, but get: p == 0; /* * !!!!!! * the host has already set p to 1, but the secondary * MMU still get the old value 0 */ call mmu_notifier_change_pte to release old page in secondary MMU We can fix it by release old page first, then set the pte to the new page. Note, the new page will be firstly used in secondary MMU before it is mapped into the page table of the process, but this is safe because it is protected by the page table lock, there is no race to change the pte [akpm@linux-foundation.org: add comment from Andrea] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:22 +09:00
Mel Gorman	b22d127a39	mempolicy: fix a race in shared_policy_replace() shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot be dereferenced if sp->lock is not held and 2) another thread can modify sp_node between spin_unlock for allocating a new sp node and next spin_lock. The bug was introduced before 2.6.12-rc2. Kosaki's original patch for this problem was to allocate an sp node and policy within shared_policy_replace and initialise it when the lock is reacquired. I was not keen on this approach because it partially duplicates sp_alloc(). As the paths were sp->lock is taken are not that performance critical this patch converts sp->lock to sp->mutex so it can sleep when calling sp_alloc(). [kosaki.motohiro@jp.fujitsu.com: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:22 +09:00
Mel Gorman	1fb3f8ca0e	mm: compaction: capture a suitable high-order page immediately when it is made available While compaction is migrating pages to free up large contiguous blocks for allocation it races with other allocation requests that may steal these blocks or break them up. This patch alters direct compaction to capture a suitable free page as soon as it becomes available to reduce this race. It uses similar logic to split_free_page() to ensure that watermarks are still obeyed. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:21 +09:00
Konstantin Khlebnikov	314e51b985	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:19 +09:00
Konstantin Khlebnikov	0103bd16fb	mm: prepare VM_DONTDUMP for using in drivers Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags: VM_DONTEXPAND, VM_DONTCOPY. Currently this flag used only for sys_madvise. The next patch will use it for replacing the outdated flag VM_RESERVED. Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL (VM_IO \| VM_DONTEXPAND \| VM_RESERVED \| VM_PFNMAP) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:18 +09:00
Konstantin Khlebnikov	e9714acf8c	mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas Currently the kernel sets mm->exe_file during sys_execve() and then tracks number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon as this counter drops to zero kernel resets mm->exe_file to NULL. Plus it resets mm->exe_file at last mmput() when mm->mm_users drops to zero. VMA with VM_EXECUTABLE flag appears after mapping file with flag MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma splitting, because sys_mmap ignores this flag. Usually binfmt module sets mm->exe_file and mmaps executable vmas with this file, they hold mm->exe_file while task is running. comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"), where all this stuff was introduced: > The kernel implements readlink of /proc/pid/exe by getting the file from > the first executable VMA. Then the path to the file is reconstructed and > reported as the result. > > Because of the VMA walk the code is slightly different on nommu systems. > This patch avoids separate /proc/pid/exe code on nommu systems. Instead of > walking the VMAs to find the first executable file-backed VMA we store a > reference to the exec'd file in the mm_struct. > > That reference would prevent the filesystem holding the executable file > from being unmounted even after unmapping the VMAs. So we track the number > of VM_EXECUTABLE VMAs and drop the new reference when the last one is > unmapped. This avoids pinning the mounted filesystem. exe_file's vma accounting is hooked into every file mmap/unmmap and vma split/merge just to fix some hypothetical pinning fs from umounting by mm, which already unmapped all its executable files, but still alive. Seems like currently nobody depends on this behaviour. We can try to remove this logic and keep mm->exe_file until final mmput(). mm->exe_file is still protected with mm->mmap_sem, because we want to change it via new sys_prctl(PR_SET_MM_EXE_FILE). Also via this syscall task can change its mm->exe_file and unpin mountpoint explicitly. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:18 +09:00
Konstantin Khlebnikov	0b173bc4da	mm: kill vma flag VM_CAN_NONLINEAR Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:17 +09:00
Konstantin Khlebnikov	4b6e1e3702	mm: kill vma flag VM_INSERTPAGE Merge VM_INSERTPAGE into VM_MIXEDMAP. VM_MIXEDMAP VMA can mix pure-pfn ptes, special ptes and normal ptes. Now copy_page_range() always copies VM_MIXEDMAP VMA on fork like VM_PFNMAP. If driver populates whole VMA at mmap() it probably not expects page-faults. This patch removes special check from vma_wants_writenotify() which disables pages write tracking for VMA populated via vm_instert_page(). BDI below mapped file should not use dirty-accounting, moreover do_wp_page() can handle this. vm_insert_page() still marks vma after first usage. Usually it is called from f_op->mmap() handler under mm->mmap_sem write-lock, so it able to change vma->vm_flags. Caller must set VM_MIXEDMAP at mmap time if it wants to call this function from other places, for example from page-fault handler. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:17 +09:00
Konstantin Khlebnikov	cc2383ec06	mm: introduce arch-specific vma flag VM_ARCH_1 Combine several arch-specific vma flags into one. before patch: 0x00000200 0x01000000 0x20000000 0x40000000 x86 VM_NOHUGEPAGE VM_HUGEPAGE - VM_PAT powerpc - - VM_SAO - parisc VM_GROWSUP - - - ia64 VM_GROWSUP - - - nommu - VM_MAPPED_COPY - - others - - - - after patch: 0x00000200 0x01000000 0x20000000 0x40000000 x86 - VM_PAT VM_HUGEPAGE VM_NOHUGEPAGE powerpc - VM_SAO - - parisc - VM_GROWSUP - - ia64 - VM_GROWSUP - - nommu - VM_MAPPED_COPY - - others - VM_ARCH_1 - - And voila! One completely free bit. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:16 +09:00
Konstantin Khlebnikov	b3b9c2932c	mm, x86, pat: rework linear pfn-mmap tracking Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT. We can toss mapping address from remap_pfn_range() into track_pfn_vma_new(), and collect all PAT-related logic together in arch/x86/. This patch also restores orignal frustration-free is_cow_mapping() check in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73 ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3") is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c, because it already handled by VM_PFNMAP in VM_NO_THP bit-mask. [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:16 +09:00
Suresh Siddha	5180da410d	x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn With PAT enabled, vm_insert_pfn() looks up the existing pfn memory attribute and uses it. Expectation is that the driver reserves the memory attributes for the pfn before calling vm_insert_pfn(). remap_pfn_range() (when called for the whole vma) will setup a new attribute (based on the prot argument) for the specified pfn range. This addresses the legacy usage which typically calls remap_pfn_range() with a desired memory attribute. For ranges smaller than the vma size (which is typically not the case), remap_pfn_range() will use the existing memory attribute for the pfn range. Expose two different API's for these different behaviors. track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn() and track_pfn_remap() for the remap_pfn_range(). This cleanup also prepares the ground for the track/untrack pfn vma routines to take over the ownership of setting PAT specific vm_flag in the 'vma'. [khlebnikov@openvz.org: Clear checks in track_pfn_remap()] [akpm@linux-foundation.org: tweak a few comments] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:16 +09:00
Rik van Riel	c654345924	mm: remove __GFP_NO_KSWAPD When transparent huge pages were introduced, memory compaction and swap storms were an issue, and the kernel had to be careful to not make THP allocations cause pageout or compaction. Now that we have working compaction deferral, kswapd is smart enough to invoke compaction and the quadratic behaviour around isolate_free_pages has been fixed, it should be safe to remove __GFP_NO_KSWAPD. [minchan@kernel.org: Comment fix] [mgorman@suse.de: Avoid direct reclaim for deferred compaction] Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:15 +09:00
Linus Torvalds	50e0d10232	This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIVAwUAUHLuO2CrR//JCVInAQLsKxAAoa+oSP3KGuQbLHq2wvUxAdXWDFcZgKo+ qMRejSJPI0sreJ9GJHpUjHtJ7W2gujeo9upmUIJzoWY9vrmjkhCDkaWliaQI8SmY CKB9zI2xCB9iFzHtWxocfnJzU7NvzjJm+jnIYrqkaO9HGMxL99tsv9TsBYXK/08j QmlGP5fHdGU3zZxVt5r1GL8/nfX4zn3/YEll9nJ7vqXZltIBbaksxmgPoa0QkkH8 LMeMAlgRR2DHWt58gXHyGB7Afx3QEnZBDaQpYxA446P+2gtvIhFYOnpuX14pZb7t m4IM0vOO6WzARQR6DJlRHfYJevojgGHu4Y8wkEzuWE+Hr2BqmiVct7UKqGJdqTY5 7+I7wwaJmdd3zE61LxRS9UOjJDwMh1gmsNU4+42RArQ5eLcikNR5zfYzDRLCTmnk qKZvbiaxgme2YvWazxbBT6EqmIVU6lfHHIoMLr8U0j40Cl0GCmN7EBbe7/r2Jhjs 6VnCOJ6vb4RCOJGGAcLRMQu7xEtqcCe0Zht839wl13QXewxS3QRgwg6Bjy/fwA9r jij5gf+R25J/fQW7yZv4LwcMowRE1xvpu0ebwkK3LLR8jcon71scd6f3PW/bUUpj j4tgFuJbXzOxQ4LFgBzvdVgx3wDzsQhqb/6p2l6ROdcw7xXFDdFZ4zq3h0A25wXZ J6WDO387tpg= =Aaki -----END PGP SIGNATURE----- Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file." * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: UAPI: (Scripted) Disintegrate include/asm-generic asm-generic: Add default clkdev.h asm-generic: xor: mark static functions as __maybe_unused	2012-10-09 15:58:38 +09:00
Linus Torvalds	8711798772	Merge branch 'linux-next' of git://git.open-osd.org/linux-open-osd Pull exofs update from Boaz Harrosh: "Just three one liners" * 'linux-next' of git://git.open-osd.org/linux-open-osd: pnfs_osd_xdr: Remove unused #include from pnfs_osd_xdr.h ore: signedness bug in _sp2d_min_pg() exofs: check for allocation failure in uri_store()	2012-10-09 15:54:27 +09:00
Len Brown	d1d4a81b84	Merge branches 'fixes-for-37', 'ec' and 'thermal' into release	2012-10-09 01:47:35 -04:00
Len Brown	29b19e2504	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into thermal Conflicts: drivers/staging/omap-thermal/omap-thermal-common. OMAP supplied dummy TC1 and TC2, at the same time that the thermal tree removed them from thermal_zone_device_register() drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c propogate the upstream MAX_IDR_LEVEL re-name to prevent a build failure Previously-fixed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Len Brown <len.brown@intel.com>	2012-10-09 01:35:52 -04:00
Linus Torvalds	f5a246eab9	Sound updates for 3.7-rc1 This contains pretty many small commits covering fairly large range of files in sound/ directory. Partly because of additional API support and partly because of constantly developed ASoC and ARM stuff. Some highlights: - Introduced the helper function and documentation for exposing the channel map via control API, as discussed in Plumbers; most of PCI drivers are covered, will follow more drivers later - Most of drivers have been replaced with the new PM callbacks (if the bus is supported) - HD-audio controller got the support of runtime PM and the support of D3 clock-stop. Also changing the power_save option in sysfs kicks off immediately to enable / disable the power-save mode. - Another significant code change in HD-audio is the rewrite of firmware loading code. Other than that, most of changes in HD-audio are continued cleanups and standardization for the generic auto parser and bug fixes (HBR, device-specific fixups), in addition to the support of channel-map API. - Addition of ASoC bindings for the compressed API, used by the mid-x86 drivers. - Lots of cleanups and API refreshes for ASoC codec drivers and DaVinci. - Conversion of OMAP to dmaengine. - New machine driver for Wolfson Microelectronics Bells. - New CODEC driver for Wolfson Microelectronics WM0010. - Enhancements to the ux500 and wm2000 drivers - A new driver for DA9055 and the support for regulator bypass mode. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJQcpeWAAoJEGwxgFQ9KSmkpi4P/2etDDz5aEkEHNa1l4xEmFcm ymiGTgjaalqpUAVbM/gYx9G59EFMEbzUl1BHAqE5La4wO/v9lNPb+VrdUo+B+NZ7 WSxIPWcNqdinSuoSqyYPjoPMVnhs3EMtNOqmf4jm1JOvdqA+4rO29xQVAqK/5Gfu LpMOyPiRi5ODnbQ1BOIWwpKICioY/mLwGJudK3z0i/fYVA7gLub20f+w+sOjKIA4 wmwQAMTjAR798Cg/tVy4fQmf4SLw+c2nIgGe/PD+2gVlGXLNKBrJfMonHPTbmwKu lmJO/EtnijNOnpbn6up7ryUQ9cSoZAUZOfdIOgmAeQgQ/LWR0f+zf2IQehSPwrul g6hqOnQI2DNN7ugT3cYVbYnsh56TjyhnxhhxZgkapqh706QkqHGyKJNMRetzuXmP 1O//MnZJrFQWd6sOKLlTL2ZzRvnxEJcNVGaE6bbwZTfQMtPeo9l1842uIq1dLUtG VxZb/svKUkMXv4is1dwUYUkpDsKxsgMEmabmuovceGf2N7jj/irkXgqxf6LWkaY1 JQ7ZFWUJyDzEMXRaFfzdGO15T532CfB84wvFX5xoPMwMste2AA7QuybFBVstXhKu AtKNDgRJFUTlnLIxydpPBWdWH3UJdEaFwwsSfuNKI8OmmGKhWC/aP83k4hzueu9H KYLvY/0ObMSMqiwh/ndQ =uNqD -----END PGP SIGNATURE----- Merge tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This contains pretty many small commits covering fairly large range of files in sound/ directory. Partly because of additional API support and partly because of constantly developed ASoC and ARM stuff. Some highlights: - Introduced the helper function and documentation for exposing the channel map via control API, as discussed in Plumbers; most of PCI drivers are covered, will follow more drivers later - Most of drivers have been replaced with the new PM callbacks (if the bus is supported) - HD-audio controller got the support of runtime PM and the support of D3 clock-stop. Also changing the power_save option in sysfs kicks off immediately to enable / disable the power-save mode. - Another significant code change in HD-audio is the rewrite of firmware loading code. Other than that, most of changes in HD-audio are continued cleanups and standardization for the generic auto parser and bug fixes (HBR, device-specific fixups), in addition to the support of channel-map API. - Addition of ASoC bindings for the compressed API, used by the mid-x86 drivers. - Lots of cleanups and API refreshes for ASoC codec drivers and DaVinci. - Conversion of OMAP to dmaengine. - New machine driver for Wolfson Microelectronics Bells. - New CODEC driver for Wolfson Microelectronics WM0010. - Enhancements to the ux500 and wm2000 drivers - A new driver for DA9055 and the support for regulator bypass mode." Fix up various arm soc header file reorg conflicts. * tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (339 commits) ALSA: hda - Add new codec ALC283 ALC290 support ALSA: hda - avoid unneccesary indices on "Headphone Jack" controls ALSA: hda - fix indices on boost volume on Conexant ALSA: aloop - add locking to timer access ALSA: hda - Fix hang caused by race during suspend. sound: Remove unnecessary semicolon ALSA: hda/realtek - Fix detection of ALC271X codec ALSA: hda - Add inverted internal mic quirk for Lenovo IdeaPad U310 ALSA: hda - make Realtek/Sigmatel/Conexant use the generic unsol event ALSA: hda - make a generic unsol event handler ASoC: codecs: Add DA9055 codec driver ASoC: eukrea-tlv320: Convert it to platform driver ALSA: ASoC: add DT bindings for CS4271 ASoC: wm_hubs: Ensure volume updates are handled during class W startup ASoC: wm5110: Adding missing volume update bits ASoC: wm5110: Add OUT3R support ASoC: wm5110: Add AEC loopback support ASoC: wm5110: Rename EPOUT to HPOUT3 ASoC: arizona: Add more clock rates ASoC: arizona: Add more DSP options for mixer input muxes ...	2012-10-09 07:07:14 +09:00
Julian Anastasov	c92b96553a	ipv4: Add FLOWI_FLAG_KNOWN_NH Add flag to request that output route should be returned with known rt_gateway, in case we want to use it as nexthop for neighbour resolving. The returned route can be cached as follows: - in NH exception: because the cached routes are not shared with other destinations - in FIB NH: when using gateway because all destinations for NH share same gateway As last option, to return rt_gateway!=0 we have to set DST_NOCACHE. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 17:42:36 -04:00
Julian Anastasov	155e8336c3	ipv4: introduce rt_uses_gateway Add new flag to remember when route is via gateway. We will use it to allow rt_gateway to contain address of directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches per-destination route without DST_NOCACHE flag, i.e. when routes are not used for other destinations. By this way we force the neighbour resolving to work with the routed destination but we can use different address in the packet, feature needed for IPVS-DR where original packet for virtual IP is routed via route to real IP. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 17:42:36 -04:00
Eric Dumazet	863472454c	ipv6: gro: fix PV6_GRO_CB(skb)->proto problem It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive() if a new skb is allocated (to serve as an anchor for frag_list) We copy NAPI_GRO_CB() only (not the IPV6 specific part) in : NAPI_GRO_CB(nskb) = NAPI_GRO_CB(p); So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead of IPPROTO_TCP (6) ipv6_gro_complete() isnt able to call ops->gro_complete() [ tcp6_gro_complete() ] Fix this by moving proto in NAPI_GRO_CB() and getting rid of IPV6_GRO_CB Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 15:40:43 -04:00
Florian Zumbiehl	48cc32d38a	vlan: don't deliver frames for unknown vlans to protocols `6a32e4f9dd` made the vlan code skip marking vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if there was an rx_handler, as the rx_handler could cause the frame to be received on a different (virtual) vlan-capable interface where that vlan might be configured. As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause frames for unknown vlans to be delivered to the protocol stack as if they had been received untagged. For example, if an ipv6 router advertisement that's tagged for a locally not configured vlan is received on an interface with macvlan interfaces attached, macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the macvlan interfaces, which caused it to be passed to the protocol stack, leading to ipv6 addresses for the announced prefix being configured even though those are completely unusable on the underlying interface. The fix moves marking as PACKET_OTHERHOST after the rx_handler so the rx_handler, if there is one, sees the frame unchanged, but afterwards, before the frame is delivered to the protocol stack, it gets marked whether there is an rx_handler or not. Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 15:21:55 -04:00
Eric Dumazet	2e71a6f808	net: gro: selective flush of packets Current GRO can hold packets in gro_list for almost unlimited time, in case napi->poll() handler consumes its budget over and over. In this case, napi_complete()/napi_gro_flush() are not called. Another problem is that gro_list is flushed in non friendly way : We scan the list and complete packets in the reverse order. (youngest packets first, oldest packets last) This defeats priorities that sender could have cooked. Since GRO currently only store TCP packets, we dont really notice the bug because of retransmits, but this behavior can add unexpected latencies, particularly on mice flows clamped by elephant flows. This patch makes sure no packet can stay more than 1 ms in queue, and only in stress situations. It also complete packets in the right order to minimize latencies. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jesse Gross <jesse@nicira.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-08 14:51:51 -04:00
Dmitry Torokhov	7f8d4cad1e	Input: extend the number of event (and other) devices Extend the amount of character devices, such as eventX, mouseX and jsX, from a hard limit of 32 per input handler to about 1024 shared across all handlers. To be compatible with legacy installations input handlers will start creating char devices with minors in their legacy range, however once legacy range is exhausted they will start allocating minors from the dynamic range 256-1024. Reviewed-by: David Herrmann <dh.herrmann@googlemail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2012-10-08 09:37:55 -07:00
Wolfram Sang	102084d3d3	Linux 3.6-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk= =0v2l -----END PGP SIGNATURE----- Merge tag 'v3.6-rc7' into i2c-embedded/for-next Linux 3.6-rc7 Needed to get updates from i2c-embedded/for-current into i2c-embedded/for-next	2012-10-08 12:46:32 +02:00
Kuninori Morimoto	6ccbe60713	i2c: add Renesas R-Car I2C driver R-Car I2C is similar with SH7760 I2C. But the SH7760 I2C driver had many workaround operations, since H/W had bugs. Thus, it was pointless to keep compatible between SH7760 and R-Car I2C drivers. This patch creates new Renesas R-Car I2C driver. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>	2012-10-08 12:46:25 +02:00
Linus Torvalds	1929041bd8	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pill drm updates part 2 from Dave Airlie: "This is the follow-up pull, 3 pieces a) exynos next stuff, was delayed but looks okay to me, one patch in v4l bits but it was acked by v4l person. b) UAPI disintegration bits c) intel fixes - DP fixes, hang fixes, other misc fixes." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (52 commits) drm: exynos: hdmi: remove drm common hdmi platform data struct drm: exynos: hdmi: add support for exynos5 hdmi drm: exynos: hdmi: replace is_v13 with version check in hdmi drm: exynos: hdmi: add support for exynos5 mixer drm: exynos: hdmi: add support to disable video processor in mixer drm: exynos: hdmi: add support for platform variants for mixer drm: exynos: hdmi: add support for exynos5 hdmiphy drm: exynos: hdmi: add support for exynos5 ddc drm: exynos: remove drm hdmi platform data struct drm: exynos: hdmi: turn off HPD interrupt in HDMI chip drm: exynos: hdmi: use s5p-hdmi platform data drm: exynos: hdmi: fix interrupt handling drm: exynos: hdmi: support for platform variants media: s5p-hdmi: add HPD GPIO to platform data UAPI: (Scripted) Disintegrate include/drm drm/i915: Fix GT_MODE default value drm/i915: don't frob the vblank ts in finish_page_flip drm/i915: call drm_handle_vblank before finish_page_flip drm/i915: print warning if vmi915_gem_fault error is not handled drm/i915: EBUSY status handling added to i915_gem_fault(). ...	2012-10-08 16:19:32 +09:00
David Howells	e104599294	MPILIB: Provide a function to read raw data into an MPI Provide a function to read raw data of a predetermined size into an MPI rather than expecting the size to be encoded within the data. The data is assumed to represent an unsigned integer, and the resulting MPI will be positive. The function looks like this: MPI mpi_read_raw_data(const void *, size_t); This is useful for reading ASN.1 integer primitives where the length is encoded in the ASN.1 metadata. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:21 +10:30
David Howells	42d5ec27f8	X.509: Add an ASN.1 decoder Add an ASN.1 BER/DER/CER decoder. This uses the bytecode from the ASN.1 compiler in the previous patch to inform it as to what to expect to find in the encoded byte stream. The output from the compiler also tells it what functions to call on what tags, thus allowing the caller to retrieve information. The decoder is called as follows: int asn1_decoder(const struct asn1_decoder decoder, void context, const unsigned char *data, size_t datalen); The decoder argument points to the bytecode from the ASN.1 compiler. context is the caller's context and is passed to the action functions. data and datalen define the byte stream to be decoded. Note that the decoder is currently limited to datalen being less than 64K. This reduces the amount of stack space used by the decoder because ASN.1 is a nested construct. Similarly, the decoder is limited to a maximum of 10 levels of constructed data outside of a leaf node also in an effort to keep stack usage down. These restrictions can be raised if necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:20 +10:30
David Howells	4520c6a49a	X.509: Add simple ASN.1 grammar compiler Add a simple ASN.1 grammar compiler. This produces a bytecode output that can be fed to a decoder to inform the decoder how to interpret the ASN.1 stream it is trying to parse. Action functions can be specified in the grammar by interpolating: ({ foo }) after a type, for example: SubjectPublicKeyInfo ::= SEQUENCE { algorithm AlgorithmIdentifier, subjectPublicKey BIT STRING ({ do_key_data }) } The decoder is expected to call these after matching this type and parsing the contents if it is a constructed type. The grammar compiler does not currently support the SET type (though it does support SET OF) as I can't see a good way of tracking which members have been encountered yet without using up extra stack space. Currently, the grammar compiler will fail if more than 256 bytes of bytecode would be produced or more than 256 actions have been specified as it uses 8-bit jump values and action indices to keep space usage down. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:19 +10:30
David Howells	4f73175d03	X.509: Add utility functions to render OIDs as strings Add a pair of utility functions to render OIDs as strings. The first takes an encoded OID and turns it into a "a.b.c.d" form string: int sprint_oid(const void data, size_t datasize, char buffer, size_t bufsize); The second takes an OID enum index and calls the first on the data held therein: int sprint_OID(enum OID oid, char *buffer, size_t bufsize); Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:18 +10:30
David Howells	a77ad6ea0b	X.509: Implement simple static OID registry Implement a simple static OID registry that allows the mapping of an encoded OID to an enum value for ease of use. The OID registry index enum appears in the: linux/oid_registry.h header file. A script generates the registry from lines in the header file that look like: <sp>OID_foo,<sp>/<sp>1.2.3.4<sp>/ The actual OID is taken to be represented by the numbers with interpolated dots in the comment. All other lines in the header are ignored. The registry is queries by calling: OID look_up_oid(const void *data, size_t datasize); This returns a number from the registry enum representing the OID if found or OID__NR if not. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:18 +10:30
David Howells	4ae71c1dce	KEYS: Provide signature verification with an asymmetric key Provide signature verification using an asymmetric-type key to indicate the public key to be used. The API is a single function that can be found in crypto/public_key.h: int verify_signature(const struct key key, const struct public_key_signature sig) The first argument is the appropriate key to be used and the second argument is the parsed signature data: struct public_key_signature { u8 digest; u16 digest_size; enum pkey_hash_algo pkey_hash_algo : 8; union { MPI mpi[2]; struct { MPI s; / m^d mod n */ } rsa; struct { MPI r; MPI s; } dsa; }; }; This should be filled in prior to calling the function. The hash algorithm should already have been called and the hash finalised and the output should be in a buffer pointed to by the 'digest' member. Any extra data to be added to the hash by the hash format (eg. PGP) should have been added by the caller prior to finalising the hash. It is assumed that the signature is made up of a number of MPI values. If an algorithm becomes available for which this is not the case, the above structure will have to change. It is also assumed that it will have been checked that the signature algorithm matches the key algorithm. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:15 +10:30
David Howells	a9681bf3dd	KEYS: Asymmetric public-key algorithm crypto key subtype Add a subtype for supporting asymmetric public-key encryption algorithms such as DSA (FIPS-186) and RSA (PKCS#1 / RFC1337). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:14 +10:30
David Howells	46c6f1776e	KEYS: Asymmetric key pluggable data parsers The instantiation data passed to the asymmetric key type are expected to be formatted in some way, and there are several possible standard ways to format the data. The two obvious standards are OpenPGP keys and X.509 certificates. The latter is especially useful when dealing with UEFI, and the former might be useful when dealing with, say, eCryptfs. Further, it might be desirable to provide formatted blobs that indicate hardware is to be accessed to retrieve the keys or that the keys live unretrievably in a hardware store, but that the keys can be used by means of the hardware. From userspace, the keys can be loaded using the keyctl command, for example, an X.509 binary certificate: keyctl padd asymmetric foo @s <dhowells.pem or a PGP key: keyctl padd asymmetric bar @s <dhowells.pub or a pointer into the contents of the TPM: keyctl add asymmetric zebra "TPM:04982390582905f8" @s Inside the kernel, pluggable parsers register themselves and then get to examine the payload data to see if they can handle it. If they can, they get to: (1) Propose a name for the key, to be used it the name is "" or NULL. (2) Specify the key subtype. (3) Provide the data for the subtype. The key type asks the parser to do its stuff before a key is allocated and thus before the name is set. If successful, the parser stores the suggested data into the key_preparsed_payload struct, which will be either used (if the key is successfully created and instantiated or updated) or discarded. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:13 +10:30
David Howells	964f3b3bf4	KEYS: Implement asymmetric key type Create a key type that can be used to represent an asymmetric key type for use in appropriate cryptographic operations, such as encryption, decryption, signature generation and signature verification. The key type is "asymmetric" and can provide access to a variety of cryptographic algorithms. Possibly, this would be better as "public_key" - but that has the disadvantage that "public key" is an overloaded term. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:12 +10:30
David Howells	aacf29bf1b	MPILIB: Provide count_leading/trailing_zeros() based on arch functions Provide count_leading/trailing_zeros() macros based on extant arch bit scanning functions rather than reimplementing from scratch in MPILIB. Whilst we're at it, turn count_foo_zeros(n, x) into n = count_foo_zeros(x). Also move the definition to asm-generic as other people may be interested in using it. Signed-off-by: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Cc: Arnd Bergmann <arnd@arndb.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:50:11 +10:30
David Howells	cf7f601c06	KEYS: Add payload preparsing opportunity prior to key instantiate or update Give the key type the opportunity to preparse the payload prior to the instantiation and update routines being called. This is done with the provision of two new key type operations: int (preparse)(struct key_preparsed_payload prep); void (free_preparse)(struct key_preparsed_payload prep); If the first operation is present, then it is called before key creation (in the add/update case) or before the key semaphore is taken (in the update and instantiate cases). The second operation is called to clean up if the first was called. preparse() is given the opportunity to fill in the following structure: struct key_preparsed_payload { char description; void type_data[2]; void payload; const void data; size_t datalen; size_t quotalen; }; Before the preparser is called, the first three fields will have been cleared, the payload pointer and size will be stored in data and datalen and the default quota size from the key_type struct will be stored into quotalen. The preparser may parse the payload in any way it likes and may store data in the type_data[] and payload fields for use by the instantiate() and update() ops. The preparser may also propose a description for the key by attaching it as a string to the description field. This can be used by passing a NULL or "" description to the add_key() system call or the key_create_or_update() function. This cannot work with request_key() as that required the description to tell the upcall about the key to be created. This, for example permits keys that store PGP public keys to generate their own name from the user ID and public key fingerprint in the key. The instantiate() and update() operations are then modified to look like this: int (instantiate)(struct key key, struct key_preparsed_payload prep); int (update)(struct key key, struct key_preparsed_payload prep); and the new payload data is passed in *prep, whether or not it was preparsed. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-10-08 13:49:48 +10:30
Linus Torvalds	d8dc91b753	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pul ACPI & Power Management updates from Len Brown: - acpidump utility added - intel_idle driver now supports IVB Xeon - turbostat utility can now count SMIs - ACPI can now bind to USB3 hubs - misc fixes * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (49 commits) ACPI: Add new sysfs interface to export device description ACPI: Harden acpi_table_parse_entries() against BIOS bug tools/power/turbostat: add option to count SMIs, re-name some options tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas intel_idle: enable IVB Xeon support tools/power turbostat: add [-m MSR#] option tools/power turbostat: make -M output pretty tools/power turbostat: print more turbo-limit information tools/power turbostat: delete unused line tools/power turbostat: run on IVB Xeon tools/power/acpi/acpidump: create acpidump(8), local make install targets tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs ACPI: run _OSC after ACPI_FULL_INITIALIZATION tools/power/acpi/acpidump: create acpidump(8), local make install targets tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs tools/power/acpi/acpidump: version 20071116 tools/power/acpi/acpidump: version 20070714 tools/power/acpi/acpidump: version 20060606 tools/power/acpi/acpidump: version 20051111 xo15-ebook: convert to module_acpi_driver() ...	2012-10-08 07:14:06 +09:00
Ulf Hansson	e6c085863f	mmc: core: Fixup broken suspend and eMMC4.5 power off notify This patch fixes up the broken suspend sequence for eMMC with sleep support. Additionally it reworks the eMMC4.5 Power Off Notification feature so it fits together with the existing sleep feature. The CMD0 based re-initialization of the eMMC at resume is re-introduced to maintain compatiblity for devices using sleep. A host shall use MMC_CAP2_POWEROFF_NOTIFY to enable the Power Off Notification feature. We might be able to remove this cap later on, if we think that Power Off Notification always is preferred over sleep, even if the host is not able to cut the eMMC VCCQ power. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Saugata Das <saugata.das@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2012-10-07 17:41:45 -04:00
Linus Torvalds	7035cdf36d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The bulk of this pull is a series from Alex that refactors and cleans up the RBD code to lay the groundwork for supporting the new image format and evolving feature set. There are also some cleanups in libceph, and for ceph there's fixed validation of file striping layouts and a bugfix in the code handling a shrinking MDS cluster." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits) ceph: avoid 32-bit page index overflow ceph: return EIO on invalid layout on GET_DATALOC ioctl rbd: BUG on invalid layout ceph: propagate layout error on osd request creation libceph: check for invalid mapping ceph: convert to use le32_add_cpu() ceph: Fix oops when handling mdsmap that decreases max_mds rbd: update remaining header fields for v2 rbd: get snapshot name for a v2 image rbd: get the snapshot context for a v2 image rbd: get image features for a v2 image rbd: get the object prefix for a v2 rbd image rbd: add code to get the size of a v2 rbd image rbd: lay out header probe infrastructure rbd: encapsulate code that gets snapshot info rbd: add an rbd features field rbd: don't use index in __rbd_add_snap_dev() rbd: kill create_snap sysfs entry rbd: define rbd_dev_image_id() rbd: define some new format constants ...	2012-10-08 06:38:18 +09:00
Linus Torvalds	6432f21284	The big new feature added this time is supporting online resizing using the meta_bg feature. This allows us to resize file systems which are greater than 16TB. In addition, the speed of online resizing has been improved in general. We also fix a number of races, some of which could lead to deadlocks, in ext4's Asynchronous I/O and online defrag support, thanks to good work by Dmitry Monakhov. There are also a large number of more minor bug fixes and cleanups from a number of other ext4 contributors, quite of few of which have submitted fixes for the first time. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQbxMXAAoJENNvdpvBGATwlg4QAJZ4mHNSL2eaaxjRtTbL1pAz +FVXpJ3lhw1lSfE9hJGqPVE8EfU2fWjIqxEI7dgh95Tukc5pUnPAQ2/hBz8ZA0qq o0AFMk3mRnvCEh6HsZfumsV83eqpR3k/zEy4uFH+KtxBskPe2sEKy3B7qOxvgdKW Gh8B2WqF2BpIj9WIT1P9G6xsxZW64EMHTbWcgRhuoRD7bakDNnwQ3kElz/TJQU5q bM/5wE7pqKwU2J1L0Ho0mxDi0f/BbXeJdA9k1tQy2KM1pZwHtpj4Ls0qmfoi49GE KyZqQOXlFbAz/9tidPDceY5KoRRQm1MwZ+1MimQX1P+40cs/w3pNu3yiibcaXIru UZ63AQMCj5JHMcFNVi20sVCwjU/ibNtEO75cfDD4bzPgHJvfCj73EbHTLl21nbTu izIMffhJEHmRnmRXiiortYVuI4b19oIfnXg7eclrJoUWSuGwKKsJOc5nMjDqidG4 B7Gq4TD89sGkIYzx+50E+ll2ispcBN0BQnGqp4k2BzgDyEHhuFYk7VuVQvJgCGTi eobzQJj7JUXPWxyemcAVkQTtUq4vVbkm/IwS+/GA9b9Z80X8hR8x6EVHUW5lX3qC YHoBSCU4XKZXXWqzx0fIVCXyKKFiBzM+OXcgHOKH90vK8k6kPmPODhNCxvV3pITU jfl9q+X1dY4SpybZjLt5 =iYeV -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The big new feature added this time is supporting online resizing using the meta_bg feature. This allows us to resize file systems which are greater than 16TB. In addition, the speed of online resizing has been improved in general. We also fix a number of races, some of which could lead to deadlocks, in ext4's Asynchronous I/O and online defrag support, thanks to good work by Dmitry Monakhov. There are also a large number of more minor bug fixes and cleanups from a number of other ext4 contributors, quite of few of which have submitted fixes for the first time." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits) ext4: fix ext4_flush_completed_IO wait semantics ext4: fix mtime update in nodelalloc mode ext4: fix ext_remove_space for punch_hole case ext4: punch_hole should wait for DIO writers ext4: serialize truncate with owerwrite DIO workers ext4: endless truncate due to nonlocked dio readers ext4: serialize unlocked dio reads with truncate ext4: serialize dio nonlocked reads with defrag workers ext4: completed_io locking cleanup ext4: fix unwritten counter leakage ext4: give i_aiodio_unwritten a more appropriate name ext4: ext4_inode_info diet ext4: convert to use leXX_add_cpu() ext4: ext4_bread usage audit fs: reserve fallocate flag codepoint ext4: remove redundant offset check in mext_check_arguments() ext4: don't clear orphan list on ro mount with errors jbd2: fix assertion failure in commit code due to lacking transaction credits ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails ext4: enable FITRIM ioctl on bigalloc file system ...	2012-10-08 06:36:39 +09:00
Linus Torvalds	1b033447bf	Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull i2c updates from Jean Delvare: "Most visible changes are the SMBus multiplexing support added to the i2c-i801 driver, as well as support for the VIA VX900." * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c-piix4: Fix build failure i2c: Correct struct i2c_driver doc about detection i2c-i801: Let i2c-mux-gpio find the GPIO chip i2c-mux-gpio: Update documentation i2c-mux-gpio: Add support for dynamically allocated GPIO pins i2c-mux-gpio: Use devm_kzalloc instead of kzalloc i2c-i801: Support SMBus multiplexing on Asus Z8 series i2c-viapro: Add VIA VX900 device ID i2c-parport: i2c_parport_irq can be static i2c-designware: i2c_dw_xfer_msg can be static i2c/scx200_*: Replace printks with pr_<level>s i2c: Make I2C available on UML i2c: Convert struct i2c_msg initialization to C99 format i2c-smbus: Convert kzalloc to devm_kzalloc i2c-mux: Add support for device auto-detection	2012-10-08 06:35:17 +09:00
Eric Dumazet	ca07e43e28	net: gro: fix a potential crash in skb_gro_reset_offset Before accessing skb first fragment, better make sure there is one. This is probably not needed for old kernels, since an ethernet frame cannot contain only an ethernet header, but the recent GRO addition to tunnels makes this patch needed. Also skb_gro_reset_offset() can be static, it actually allows compiler to inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 14:49:17 -04:00
Antti Palosaari	33eebec55c	[media] dvb: LNA implementation changes * use dvb property cache * implement get (thus API minor++) * PCTV 290e: 1=LNA ON, all the other values LNA OFF Also fix PCTV 290e LNA comment, it is disabled by default Hans and Mauro proposed use of cache implementation of get as they were planning to extend LNA usage for analog side too. Reported-by: Hans Verkuil <hverkuil@xs4all.nl> Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Antti Palosaari <crope@iki.fi> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-07 10:27:49 -03:00
Linus Torvalds	dc92b1f9ab	Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio changes from Rusty Russell: "New workflow: same git trees pulled by linux-next get sent straight to Linus. Git is awkward at shuffling patches compared with quilt or mq, but that doesn't happen often once things get into my -next branch." * 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits) lguest: fix occasional crash in example launcher. virtio-blk: Disable callback in virtblk_done() virtio_mmio: Don't attempt to create empty virtqueues virtio_mmio: fix off by one error allocating queue drivers/virtio/virtio_pci.c: fix error return code virtio: don't crash when device is buggy virtio: remove CONFIG_VIRTIO_RING virtio: add help to CONFIG_VIRTIO option. virtio: support reserved vqs virtio: introduce an API to set affinity for a virtqueue virtio-ring: move queue_index to vring_virtqueue virtio_balloon: not EXPERIMENTAL any more. virtio-balloon: dependency fix virtio-blk: fix NULL checking in virtblk_alloc_req() virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path virtio-blk: Add bio-based IO path for virtio-blk virtio: console: fix error handling in init() function tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace tools: Add guest trace agent as a user tool virtio/console: Allocate scatterlist according to the current pipe size ...	2012-10-07 21:04:56 +09:00
Dave Airlie	1f31c69dac	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Bigger -fixes pile, mostly because I've included Ajax' DP dongle stuff, as discussed on irc. Otherwise just small things: - regression fix to finally make 6bpc auto-dither on dp work (Jani) - reinstate an snb ctx w/a that accidentally got lost in a rework (Chris) - fixup the DP train sequence, logic-goof-up uncovered by Coverty (Chris) - fix set_caching locking (Ben) - fix spurious segfault on con-current gtt mmap faulting (Dimitry and Mika) - some pageflip correctness fixes (still hunting down some issues, but these are the worst offenders of confused code that we've tracked down thus far) from Chris and me - fixup swizzling settings on vlv (Jesse) - gt_mode w/a from Ben added, fixes snb gt1 rc6+hw ctx hangs. * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: Fix GT_MODE default value drm/i915: don't frob the vblank ts in finish_page_flip drm/i915: call drm_handle_vblank before finish_page_flip drm/i915: print warning if vmi915_gem_fault error is not handled drm/i915: EBUSY status handling added to i915_gem_fault(). drm/i915: Try harder to complete DP training pattern 1 drm/i915: set swizzling to none on VLV drm/dp: Make sink count DP 1.2 aware drm/dp: Document DP spec versions for various DPCD registers drm/i915/dp: Be smarter about connection sense for branch devices drm/i915/dp: Fetch downstream port info if needed during DPCD fetch drm/dp: Update DPCD defines drm: Export drm_probe_ddc() drm/i915: Flush the pending flips on the CRTC before modification drm/i915: Actually invalidate the TLB for the SandyBridge HW contexts w/a drm/i915: Fix set_caching locking drm/i915: use adjusted_mode instead of mode for checking the 6bpc force flag	2012-10-07 21:13:54 +10:00
Dave Airlie	a5a0fc6743	Merge branch 'exynos-drm-next' of git://git.infradead.org/users/kmpark/linux-samsung into drm-next Inki writes: "this patch set updates exynos drm framework and includes minor fixups. and this pull request except hdmi device tree support patch set posted by Rahul Sharma because that includes media side patch so for this patch set, we may have git pull one more time in addition, if we get an agreement with media guys. for this patch, you can refer to below link, http://comments.gmane.org/gmane.comp.video.dri.devel/74504 this pull request adds hdmi device tree support and includes related patch set such as disabling of hdmi internal interrupt, suppport for platform variants for hdmi and mixer, support to disable video processor based on platform type and removal of drm common platform data. as you know, this patch set was delayed because it included an media side patch. so for this, we got an ack from v4l2-based hdmi driver author, Tomasz Stanislawski." * 'exynos-drm-next' of git://git.infradead.org/users/kmpark/linux-samsung: (34 commits) drm: exynos: hdmi: remove drm common hdmi platform data struct drm: exynos: hdmi: add support for exynos5 hdmi drm: exynos: hdmi: replace is_v13 with version check in hdmi drm: exynos: hdmi: add support for exynos5 mixer drm: exynos: hdmi: add support to disable video processor in mixer drm: exynos: hdmi: add support for platform variants for mixer drm: exynos: hdmi: add support for exynos5 hdmiphy drm: exynos: hdmi: add support for exynos5 ddc drm: exynos: remove drm hdmi platform data struct drm: exynos: hdmi: turn off HPD interrupt in HDMI chip drm: exynos: hdmi: use s5p-hdmi platform data drm: exynos: hdmi: fix interrupt handling drm: exynos: hdmi: support for platform variants media: s5p-hdmi: add HPD GPIO to platform data drm/exynos: fix kcalloc size of g2d cmdlist node drm/exynos: fix to calculate CRTC shown via screen drm/exynos: fix display power call issue. drm/exynos: add platform_device_id table and driver data for drm fimd drm/exynos: Fix potential NULL pointer dereference drm/exynos: support drm_wait_vblank feature for VIDI ... Conflicts: include/drm/exynos_drm.h	2012-10-07 21:06:33 +10:00
Dave Airlie	0dbe232183	Merge branch 'disintegrate-drm' of git://git.infradead.org/users/dhowells/linux-headers into drm-next Merge the uapi bits for drm. * 'disintegrate-drm' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: (Scripted) Disintegrate include/drm	2012-10-07 21:04:20 +10:00
Yi Zou	3b64b18811	[SCSI] libfc: fix lun reset failure bugs in fc_fcp_resp handling of FCP_RSP_INFO In LUN RESET testing involving NetApp targets, it is observed that LUN RESET is failing. The fc_fcp_resp() is not completing the completion for the LUN RESET task since fc_fcp_resp assumes that the FCP_RSP_INFO is 8 bytes with the 4 byte reserved field, where in case of NetApp targets the FCP_RSP to LUN RESET only has 4 bytes of FCP_RSP_INFO. This leads fc_fcp_resp to error out w/o completing the task completion, eventually causing LUN RESET to be escalated to host reset, which is not very nice. Per FCP-3 r04, clause 9.5.15 and Table 23, the FCP_RSP_INFO field can be either 4 bytes or 8 bytes, with the last 4 bytes as "Reserved (if any)". Therefore it is valid to have 4 bytes FCP_RSP_INFO like some of the NetApp targets behave. Fixing this by validating the FCP_RSP_INFO against both the two spec allowed length. Reported-by: Frank Zhang <frank_1.zhang@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-10-07 11:52:55 +01:00
Neerav Parikh	31c37a6f21	[SCSI] fcoe: Fix write errors on NPIV ports SCSI errors were generated while writing to LUNs connected via NPIV ports. Debugging this it was found that the FCoE packets transmitted via the NPIV ports were not tagged with correct user priority as negotiated with peer by DCB agent. This resulted in FCoE traffic going with priority zero(0) that did not have priority flow control (PFC) enabled for it. The initiator after transferring data to the target never saw any reply indicating the transfer was complete. This resulted in error recovery (ABTS) and SCSI command retries by the scsi-mid layer; eventually resulting in I/O errors. This patch fixes this issue by keeping the FCoE user priority information in the fcoe_interface instance that is common for both the physical port as well as NPIV ports connected to that physical port; instead of storing it in fcoe_port structure that has a per port instance. Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Acked-by: Yi Zou <yi.zou@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-10-07 11:49:34 +01:00
Linus Torvalds	0b8e74c6f4	Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "The first part of the media updates for Kernel 3.7. This series contain: - A major tree renaming patch series: now, drivers are organized internally by their used bus, instead of by V4L2 and/or DVB API, providing a cleaner driver location for hybrid drivers that implement both APIs, and allowing to cleanup the Kconfig items and make them more intuitive for the end user; - Media Kernel developers are typically very lazy with their duties of keeping the MAINTAINERS entries for their drivers updated. As now the tree is more organized, we're doing an effort to add/update those entries for the drivers that aren't currently orphan; - Several DVB USB drivers got moved to a new DVB USB v2 core; the new core fixes several bugs (as the existing one that got bitroted). Now, suspend/resume finally started to work fine (at least with some devices - we should expect more work with regards to it); - added multistream support for DVB-T2, and unified the API for DVB-S2 and ISDB-S. Backward binary support is preserved; - as usual, a few new drivers, some V4L2 core improvements and lots of drivers improvements and fixes. There are some points to notice on this series: 1) you should expect a trivial merge conflict on your tree, with the removal of Documentation/feature-removal-schedule.txt: this series would be adding two additional entries there. I opted to not rebase it due to this recent change; 2) With regards to the PCTV 520e udev-related breakage, I opted to fix it in a way that the patches can be backported to 3.5 even without your firmware fix patch. This way, Greg doesn't need to rush backporting your patch (as there are still the firmware cache and firmware path customization issues to be addressed there). I'll send later a patch (likely after the end of the merge window) reverting the rest of the DRX-K async firmware request, fully restoring its original behaviour to allow media drivers to initialize everything serialized as before for 3.7 and upper. 3) I'm planning to work on this weekend to test the DMABUF patches for V4L2. The patches are on my queue for several Kernel cycles, but, up to now, there is/was no way to test the series locally. I have some concerns about this particular changeset with regards to security issues, and with regards to the replacement of the old VIDIOC_OVERLAY ioctl's that is broken on modern systems, due to GPU drivers change. The Overlay API allows direct PCI2PCI transfers from a media capture card into the GPU framebuffer, but its API is crappy. Also, the only existing X11 driver that implements it requires a XV extension that is not available anymore on modern drivers. The DMABUF can do the same thing, but with it is promising to be a properly-designed API. If I can successfully test this series and be happy with it, I should be asking you to pull them next week." * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (717 commits) em28xx: regression fix: use DRX-K sync firmware requests on em28xx drxk: allow loading firmware synchrousnously em28xx: Make all em28xx extensions to be initialized asynchronously [media] tda18271: properly report read errors in tda18271_get_id [media] tda18271: delay IR & RF calibration until init() if delay_cal is set [media] MAINTAINERS: add Michael Krufky as tda827x maintainer [media] MAINTAINERS: add Michael Krufky as tda8290 maintainer [media] MAINTAINERS: add Michael Krufky as cxusb maintainer [media] MAINTAINERS: add Michael Krufky as lg2160 maintainer [media] MAINTAINERS: add Michael Krufky as lgdt3305 maintainer [media] MAINTAINERS: add Michael Krufky as mxl111sf maintainer [media] MAINTAINERS: add Michael Krufky as mxl5007t maintainer [media] MAINTAINERS: add Michael Krufky as tda18271 maintainer [media] s5p-tv: Report only multi-plane capabilities in vidioc_querycap [media] s5p-mfc: Fix misplaced return statement in s5p_mfc_suspend() [media] exynos-gsc: Add missing static storage class specifiers [media] exynos-gsc: Remove <linux/version.h> header file inclusion [media] s5p-fimc: Fix incorrect condition in fimc_lite_reqbufs() [media] s5p-tv: Fix potential NULL pointer dereference error [media] s5k6aa: Fix possible NULL pointer dereference ...	2012-10-07 17:49:05 +09:00
Linus Torvalds	7f60ba388f	1. We no longer ad-hoc to the function tracer "high level" infrastructure and no longer use its debugfs knobs. The change slightly touches kernel/trace directory, but it got the needed ack from Steven Rostedt: http://lkml.org/lkml/2012/8/21/688 2. Added maintainers entry; 3. A bunch of fixes, nothing special. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJQbjoFAAoJEGgI9fZJve1bZgUP/A/ZwFGfdnochDgRhK5p7ljY baRZpgSh2B+BxIDTEPLfVh6HbOivmYJ8WF0unD9kKTzCCS71ZUMiLB25G/bV4lnZ fawAOhGfOLG3rXmldxf6nJllHr9JpoSmVEHypvjFNcbjYZ04zhe7jM+YsaWmBw68 eHXQkOSdfPPpKXZ2B0Eef/EoGWhORW0kTD7xFlorsxYAkksSheY0PC0nYgCFhvCZ 168y9pi4T4lucr4s44x8AJ/r/5BQ1jEQAY/A2qUE/iBfRFP4XyE1Oao4OHtVDYdU KjVPA1VYmwkKSfnkiVFrpb/94IyrKslblgR8nX0kK3L/ccFYjQix4nd9jR+n857s xfAuj9nfhUO6fI5qoaVSOBufxKyPp1S7X8INEAJ7WQ0c9VoMv00biK9M77ifDGZg ll/Ecq1CADtcbOnQXf6qwGwRKmpR+qgPkIzpNXcuGMuM4AEPwtckOhCyXFr37Txk 6ZoGM8IIaBJ0yXxHkfpUA7l9ZF0gXR+qHMQCwpUS8tIMx35On+IbybEaKbniKEi1 AURgQ7ZimVYAHPi0Y0L00+EKI3IPVQJvCFH7SG+wUfLWcbEtNbTv3MAer5o3DANJ GMnWBwNw9ClTydWKI0GMNmnWpFukWhd4OXleyl2+q4qRJi3HhNacrok3s/2r+CnT QRg8i/0SDvxGuXazrTZT =1HAE -----END PGP SIGNATURE----- Merge tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore Pull pstore changes from Anton Vorontsov: 1) We no longer ad-hoc to the function tracer "high level" infrastructure and no longer use its debugfs knobs. The change slightly touches kernel/trace directory, but it got the needed ack from Steven Rostedt: http://lkml.org/lkml/2012/8/21/688 2) Added maintainers entry; 3) A bunch of fixes, nothing special. * tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore: pstore: Avoid recursive spinlocks in the oops_in_progress case pstore/ftrace: Convert to its own enable/disable debugfs knob pstore/ram: Add missing platform_device_unregister MAINTAINERS: Add pstore maintainers pstore/ram: Mark ramoops_pstore_write_buf() as notrace pstore/ram: Fix printk format warning pstore/ram: Fix possible NULL dereference	2012-10-07 17:30:50 +09:00
Linus Torvalds	e665faa424	1. New drivers: - Marvell 88pm860x charger and battery drivers; - Texas Instruments LP8788 charger driver; 2. Two new power supply properties: whether a battery is authentic, and chargers' maximal currents and voltages; 3. A lot of TI LP8727 Charger cleanups; 4. New features for Charger Manager, mainly now we can disable specific regulators; 5. Random fixes and cleanups for other drivers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJQbjMFAAoJEGgI9fZJve1bR6gP/2Ri0etCU5zQMmbsFv5wxwHT BNgqFYoLmtxAJm7afM/9Qo2GxSam/Gi6vW6jfYxVOEDt9hJ5zaaJBxM4XmKEBShR PYlV9u45jGUCIHJXoi8DTND9Swz0VfVve9UtK5fVdXr73fYFemP8U86alI1MBL5Q 8U3F0V8rObjGKWZrzN4BreB8bfbZYQCUafpoGS1ids3Uwm38d4tDb/jx8guSV24D LEhXy9unV9NdPExRn1FWFAgqjtgnnBv5SmpCBd6VwKNmpOgVB+H0CjOaOPzMgd+0 X6dpJQtbjGZOhHhkcAoWsXYgxS8WMtTqlSjHSgswJJ/fdjv8YJoT7ncyu5wItY1x cN5NRBKVpnHv+fkQDE4xVWzEOH68lK8RoGksay0gvTJj3MQWwOpOANGHbsnMG6rj GBzQ7pyMbDTK7n1oynvmSskF4BdvYHM+Ns7vPUOAzoduKrOnhPVDpk3rLSFZxQNO RoqXbnNYqg4qROB9z8Drs95WD59fKjNTlfFr1moR2sr4wncEJiFduMz4dCOs7nq7 ahZxd95a2Bb6LbcN244+loGBXmx6KD6BurrDc3yTM3hl+oVx+MecVv9uoKvKlz4y YPh9XrdCSfj9Ms7TOqW8XPDl/ZREswaCIer7x8XsL+K42fYkPKlrO3OUMHIbh+3e rfkFPAR55bLcmtiaahp3 =LkdH -----END PGP SIGNATURE----- Merge tag 'for-v3.7' of git://git.infradead.org/battery-2.6 Pull battery updates from Anton Vorontsov: "1. New drivers: - Marvell 88pm860x charger and battery drivers; - Texas Instruments LP8788 charger driver; 2. Two new power supply properties: whether a battery is authentic, and chargers' maximal currents and voltages; 3. A lot of TI LP8727 Charger cleanups; 4. New features for Charger Manager, mainly now we can disable specific regulators; 5. Random fixes and cleanups for other drivers." Fix up trivial conflicts in <linux/mfd/88pm860x.h> * tag 'for-v3.7' of git://git.infradead.org/battery-2.6: (52 commits) pda_power: Remove ac_draw_failed goto and label charger-manager: Add support sysfs entry for charger charger-manager: Support limit of maximum possible charger-manager: Check fully charged state of battery periodically lp8727_charger: More pure cosmetic improvements lp8727_charger: Fix checkpatch warning lp8727_charger: Add description in the private data lp8727_charger: Fix a typo - chg_parm to chg_param lp8727_charger: Make some cosmetic changes in lp8727_delayed_func() lp8727_charger: Clean up lp8727_charger_changed() lp8727_charger: Return if the battery is discharging lp8727_charger: Make lp8727_charger_get_propery() simpler lp8727_charger: Make lp8727_ctrl_switch() inline lp8727_charger: Make lp8727_init_device() shorter lp8727_charger: Clean up lp8727_is_charger_attached() lp8727_charger: Use specific definition lp8727_charger: Clean up lp8727 definitions lp8727_charger: Use the definition rather than enum lp8727_charger: Fix code for getting battery temp lp8727_charger: Clear interrrupts at inital time ...	2012-10-07 17:29:24 +09:00
Eric Dumazet	acb600def2	net: remove skb recycling Over time, skb recycling infrastructure got litle interest and many bugs. Generic rx path skb allocation is now using page fragments for efficient GRO / TCP coalescing, and recyling a tx skb for rx path is not worth the pain. Last identified bug is that fat skbs can be recycled and it can endup using high order pages after few iterations. With help from Maxime Bizon, who pointed out that commit `87151b8689` (net: allow pskb_expand_head() to get maximum tailroom) introduced this regression for recycled skbs. Instead of fixing this bug, lets remove skb recycling. Drivers wanting really hot skbs should use build_skb() anyway, to allocate/populate sk_buff right before netif_receive_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:40:54 -04:00
Gao feng	809d5fc9bf	infiniband: pass rdma_cm module to netlink_dump_start set netlink_dump_control.module to avoid panic. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:30:56 -04:00
Gao feng	6dc878a8ca	netlink: add reference of module in netlink_dump_start I get a panic when I use ss -a and rmmod inet_diag at the same time. It's because netlink_dump uses inet_diag_dump which belongs to module inet_diag. I search the codes and find many modules have the same problem. We need to add a reference to the module which the cb->dump belongs to. Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo. Change From v3: change netlink_dump_start to inline,suggestion from Pablo and Eric. Change From v2: delete netlink_dump_done,and call module_put in netlink_dump and netlink_sock_destruct. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:30:56 -04:00
Linus Torvalds	ed5062ddaa	Merge branch 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers Pull UAPI disintegration fixes from David Howells: "There are three main parts: (1) I found I needed some more fixups in the wake of testing Arm64 (some asm/unistd.h files had weird guards that caused problems - mostly in arches for which I don't have a compiler) and some __KERNEL__ splitting needed to take place in Arm64. (2) I found that c6x was missing some __KERNEL__ guards in its asm/signal.h. Mark Salter pointed me at a tree with a patch to remove that file entirely and use the asm-generic variant instead. (3) Lastly, m68k turned out to have a header installation problem due to it lacking a kvm_para.h file. The conditional installation bits for linux/kvm_para.h, linux/kvm.h and linux/a.out.h weren't very well specified - and didn't work if an arch didn't have the asm/ version of that file, but there was an asm-generic/ version. It seems the "ifneq $((wildcard ...),)" for each of those three headers in include/kernel/Kbuild is invoked twice during header installation, and the second time it matches on the just installed asm-generic/kvm_para.h file and thus incorrectly installs linux/kvm_para.h as well. Most arches actually have an asm/kvm_para.h, so this wasn't detectable in those." * 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: Fix conditional header installation handling (notably kvm_para.h on m68k) c6x: remove c6x signal.h UAPI: Split compound conditionals containing __KERNEL__ in Arm64 UAPI: Fix the guards on various asm/unistd.h files c6x: make dsk6455 the default config	2012-10-07 07:55:10 +09:00
Linus Torvalds	125b79d74a	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB changes from Pekka Enberg: "New and noteworthy: * More SLAB allocator unification patches from Christoph Lameter and others. This paves the way for slab memcg patches that hopefully will land in v3.8. * SLAB tracing improvements from Ezequiel Garcia. * Kernel tainting upon SLAB corruption from Dave Jones. * Miscellanous SLAB allocator bug fixes and improvements from various people." * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (43 commits) slab: Fix build failure in __kmem_cache_create() slub: init_kmem_cache_cpus() and put_cpu_partial() can be static mm/slab: Fix kmem_cache_alloc_node_trace() declaration Revert "mm/slab: Fix kmem_cache_alloc_node_trace() declaration" mm, slob: fix build breakage in __kmalloc_node_track_caller mm/slab: Fix kmem_cache_alloc_node_trace() declaration mm/slab: Fix typo _RET_IP -> _RET_IP_ mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB mm, slab: Rename __cache_alloc() -> slab_alloc() mm, slab: Match SLAB and SLUB kmem_cache_alloc_xxx_trace() prototype mm, slab: Replace 'caller' type, void* -> unsigned long mm, slob: Add support for kmalloc_track_caller() mm, slab: Remove silly function slab_buffer_size() mm, slob: Use NUMA_NO_NODE instead of -1 mm, sl[au]b: Taint kernel when we detect a corrupted slab slab: Only define slab_error for DEBUG slab: fix the DEADLOCK issue on l3 alien lock slub: Zero initial memory segment for kmem_cache and kmem_cache_node Revert "mm/sl[aou]b: Move sysfs_slab_add to common" mm/sl[aou]b: Move kmem_cache refcounting to common code ...	2012-10-07 07:53:13 +09:00
Linus Torvalds	f1c6872e49	Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI= =gxJD -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...	2012-10-07 07:13:01 +09:00
Len Brown	3f44ea0d1c	Merge branches 'acpica', 'acpidump', 'intel-idle', 'misc', 'module_acpi_driver-simplify', 'turbostat' and 'usb3' into release add acpidump utility intel_idle driver now supports IVB Xeon turbostat can now count SMIs ACPI can now bind to USB3 hubs misc fixes	2012-10-06 16:00:32 -04:00
Lance Ortiz	d1efe3c324	ACPI: Add new sysfs interface to export device description Add support to export the device description obtained from the ACPI _STR method, if one exists for a device, to user-space via a sysfs interface. This new interface provides a standard and platform neutral way for users to obtain the description text stored in the ACPI _STR method. If no _STR method exists for the device, no sysfs 'description' file will be created. The 'description' file will be located in the /sys/devices/ directory using the device's path. /sys/device/<bus>/<bridge path>/<device path>.../firmware_node/description Example: /sys/devices/pci0000:00/0000:00.07.0/0000:0e:00.0/firmware_node/description It can also be located using the ACPI device path, for example: /sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/PNP0A08:00/device:13/device:15/description /sys/devices/LNXSYSTM:00/device:00/ACPI0004:00/ACPI0004:01/ACPI0007:02/description Execute the 'cat' command on the 'description' file to obtain the description string for that device. This patch also includes documentation describing how the new sysfs interface works Changes from v1-v2 based on comments by Len Brown and Fengguang Wu * Removed output "No Description" and leaving a NULL attribute if the _STR method failed to evaluate. * In acpi_device_remove_files() removed the redundent check of dev->pnp.str_obj before calling free. This check triggered a message from smatch. Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-10-06 15:52:16 -04:00
Takashi Iwai	0fd0ba5f9e	ASoC: Additional updates for v3.7 A couple more updates for 3.7, enhancements to the ux500 and wm2000 drivers, a new driver for DA9055 and the support for regulator bypass mode. With the exception of the DA9055 this has all had a chance to soak in -next (the driver was added on Friday so should be in -next today). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQaWzdAAoJEOoSHmUN5Tg4UB0P/Avr2Xbg+aH6hrqcSBe7TR1C 410uSU/4b9vz3BOoZ3ITW8+6C9chwtnOPuRsCtUQbWFy8Rd9rZvmNQL2xLYK/NMt tMvRMJXQZtwGCes8Agw5mme9Pcun7ra5vKwQ93mxvTqAQOkLTlCo43PcXSr9j8+Z QnnVlajOECm3+PjwXzQ9dwc+QhgAiLR35Xhe7CsCKvj278Ng4FenZpxj+FySaXhL rZnSnImNUn/zAuSIlsOasTOciSNwuqrGcdholpWcFJ8qHzAmntAL4VBrQQl3FNNh qTsrIzzlByEdK5qFr8/7erRlni4Xy1YmSOhtJ85fGFfo62xUK+cdLCpvPFKwqTPt GfQtsqnKTUNBUSoAHDHWCf0zP4/80ZD4XTEpt8+3Oj7QsKzwU2YS3gNf9zQktDtZ lMKo/yN0ihPAmIHLtQdXpNDCuZDyurP/r11sJku4GQXnQG302pzGo8Lc5mig7Tzw 5TDQ58OY4Gz4pJZ7y70nGn8+z3nMMBkoMFXZD1dBxgQnNdvNWrgu1jiGPOHrHXMm TkVS3i7VWXAwX5jAGnXeUOuNmlGsvHE/WO7dantGGgf1ef06oqcM/FEeew7heme0 oLYiklbhaxdWED632FVjzsOpgEwTF+QUqKpzAMFsFieK9yQzFiZ2mkuMT+1QoQaI 8ksSVvD1cP/yXkVf8h1E =nHb7 -----END PGP SIGNATURE----- Merge tag 'asoc-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next ASoC: Additional updates for v3.7 A couple more updates for 3.7, enhancements to the ux500 and wm2000 drivers, a new driver for DA9055 and the support for regulator bypass mode. With the exception of the DA9055 this has all had a chance to soak in -next (the driver was added on Friday so should be in -next today).	2012-10-06 16:33:52 +02:00
Jean Pihet	3db11feffc	ARM: OMAP: convert I2C driver to PM QoS for MPU latency constraints Convert the driver from the outdated omap_pm_set_max_mpu_wakeup_lat API to the new PM QoS API. Since the constraint is on the MPU subsystem, use the PM_QOS_CPU_DMA_LATENCY class of PM QoS. The resulting MPU constraints are used by cpuidle to decide the next power state of the MPU subsystem. The I2C device latency timing is derived from the FIFO size and the clock speed and so is applicable to all OMAP SoCs. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: Shubhrajyoti D <shubhrajyoti@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>	2012-10-06 13:43:38 +02:00
Lee Jones	43fea5813c	i2c: nomadik: Add Device Tree support to the Nomadik I2C driver Here we apply the bindings required for successful Device Tree probing of the i2c-nomadik driver. Signed-off-by: Lee Jones <lee.jones@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>	2012-10-06 13:20:33 +02:00
Thomas Kavanagh	a76e7c6821	i2c: algo: pca: Fix chip reset function for PCA9665 The parameter passed to pca9665_reset is adap->data (which is bus driver specific), not i2c_algp_pca_data adap. pca9665_reset expects it to be i2c_algp_pca_data adap. All other wrappers from the algo call back to the bus driver, which knows to handle its custom data. Only pca9665_reset resides inside the algorithm code and does not know how to handle a custom data structure. This can result in a kernel crash. Fix by re-factoring pca_reset() from a macro to a function handling chip specific code, and only call adap->reset_chip() if the chip is not PCA9665. Signed-off-by: Thomas Kavanagh <tkavanagh@juniper.net> Signed-off-by: Guenter Roeck <groeck@juniper.net> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>	2012-10-06 13:14:36 +02:00
Tomasz Stanislawski	78a0b0608b	[media] media: s5p-hdmi: add HPD GPIO to platform data This patch extends s5p-hdmi platform data by a GPIO identifier for Hot-Plug-Detection pin. Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 23:11:51 -03:00
Arun Kumar K	2e81dde943	[media] v4l: Add control definitions for new H264 encoder features New controls are added for supporting H264 encoding features like: - MVC frame packing, - flexible macroblock ordering, - arbitrary slice ordering, - hierarchical coding. Signed-off-by: Jeongtae Park <jtp.park@samsung.com> Signed-off-by: Naveen Krishna Chatradhi <ch.naveen@samsung.com> Signed-off-by: Arun Kumar K <arun.kk@samsung.com> Acked-by: Kamil Debski <k.debski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:42:17 -03:00
Arun Kumar K	4d08f670e6	[media] v4l: Add fourcc definitions for new formats Add the following new fourcc definitions, for multiplanar YCbCr: V4L2_PIX_FMT_NV21M, V4L2_PIX_FMT_NV12MT_16X16 and compressed formats: V4L2_PIX_FMT_H264_MVC, V4L2_PIX_FMT_VP8. Signed-off-by: Jeongtae Park <jtp.park@samsung.com> Signed-off-by: Naveen Krishna Chatradhi <ch.naveen@samsung.com> Signed-off-by: Arun Kumar K <arun.kk@samsung.com> Acked-by: Kamil Debski <k.debski@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:39:51 -03:00
Sylwester Nawrocki	65214a8603	[media] s5p-csis: Allow to specify pixel clock's source through platform data Depending on the sensor configuration it might be required to adjust the CSIS's output pixel clock so it is greater than its input pixel clock, in order to avoid the input data FIFO overflow. Use platform data to select SCLK_CSIS clock from CMU as a source, rather than CSI APB clock. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:35:16 -03:00
Sylwester Nawrocki	09ff034047	[media] s5p-fimc: Remove unused platform data structure fields alignment, fixed_phy_vdd and phy_enable fields are now unused so removed them. The data alignment is now derived directly from media bus pixel code, phy_enable callback has been replaced with direct function call and fixed_phy_vdd was dropped in commit `438df3ebe5` "[media] s5p-csis: Handle all available power supplies". Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:34:47 -03:00
Sylwester Nawrocki	ccbfd1d49d	[media] s5p-csis: Replace phy_enable platform data callback with direct call The phy_enable callback is common for all Samsung SoC platforms, replace it with direct function call so the MIPI-CSI2 DPHY control is also possible on device tree instantiated platforms. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:33:39 -03:00
Mauro Carvalho Chehab	eabe7b01c2	Merge branch 'samsung_platform_data' into staging/for_v3.7 * samsung_platform_data: ARM: samsung: move platform_data definitions ARM: orion: move platform_data definitions ARM: nomadik: move platform_data definitions ARM: w90x900: move platform_data definitions ARM: vt8500: move platform_data definitions ARM: tegra: move sdhci platform_data definition ARM: sa1100: move platform_data definitions ARM: pxa: move platform_data definitions ARM: netx: move platform_data definitions ARM: msm: move platform_data definitions ARM: imx: move platform_data definitions ARM: ep93xx: move platform_data definitions ARM: davinci: move platform_data definitions ARM: at91: move platform_data definitions	2012-10-05 22:32:05 -03:00
Lad, Prabhakar	2bd4e58c9d	[media] media: davinci: vpif: display: separate out subdev from output vpif_display relied on a 1-1 mapping of output and subdev. This is not necessarily the case. Separate the two. So there is a list of subdevs and a list of outputs. Each output refers to a subdev and has routing information. An output does not have to have a subdev. The initial output for each channel is set to the fist output. Currently missing is support for associating multiple subdevs with an output. Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:12:27 -03:00
Hans Verkuil	0d4f35f3f0	[media] davinci: move struct vpif_interface to chan_cfg struct vpif_interface is channel specific, not subdev specific. Move it to the channel config. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Lad, Prabhakar <prabhakar.lad@ti.com> Tested-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:09:57 -03:00
Hans Verkuil	7aaad13124	[media] vpif_capture: move routing info from subdev to input Routing information is a property of the input, not of the subdev. One subdev may provide multiple inputs, each with its own routing information. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Lad, Prabhakar <prabhakar.lad@ti.com> Tested-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:02:34 -03:00
Hans Verkuil	33bf178660	[media] vpif_capture: remove unnecessary can_route flag Calling a subdev op that isn't implemented will just return -ENOIOCTLCMD No need to have a flag for that. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Lad, Prabhakar <prabhakar.lad@ti.com> Tested-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 22:02:10 -03:00
Lad, Prabhakar	117a711a2c	[media] media: v4l2-ctrl: add a helper function to add standard control with driver specific menu Add helper function v4l2_ctrl_new_std_menu_items(), which adds a standard menu control, with driver specific menu. Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:48:03 -03:00
Lad, Prabhakar	5ebef0fbe0	[media] media: v4l2-ctrls: add control for test pattern add V4L2_CID_TEST_PATTERN of type menu, which determines the internal test pattern selected by the device. Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com> Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com> Acked-by: Sakari Ailus <sakari.ailus@iki.fi> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:44:26 -03:00
Sylwester Nawrocki	c3010097a7	[media] V4L: Add V4L2_PIX_FMT_S5C_UYVY_JPG fourcc definition This patch adds definition of the Samsung S5C73M3 camera specific image format. V4L2_PIX_FMT_S5C_UYVY_JPG is a two-planar format, the first plane contains interleaved UYVY and JPEG data followed by meta-data. The second plane contains additional meta-data needed for extracting JPEG and UYVY data stream from the first plane. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:28:03 -03:00
Sylwester Nawrocki	d2642485d6	[media] V4L: Add V4L2_MBUS_FMT_S5C_UYVY_JPEG_1X8 media bus format This patch adds media bus pixel code for the interleaved JPEG/UYVY image format used by S5C73MX Samsung cameras. This interleaved image data is transferred on MIPI-CSI2 bus as User Defined Byte-based Data. It also defines an experimental vendor and device specific media bus formats section and adds related DocBook documentation. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:28:02 -03:00
Sylwester Nawrocki	2910311924	[media] V4L: Add [get/set]_frame_desc subdev callbacks Add subdev callbacks for setting up parameters of the frame on media bus that are not exposed to user space directly. This is just an initial, mostly stub implementation. struct v4l2_mbus_frame_desc is intended to be extended with sub-structures specific to a particular hardware media bus. For now these new callbacks are used only to query or specify maximum size of a compressed or hybrid (container) media bus frame in octets. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:28:01 -03:00
Sylwester Nawrocki	a375e1dfef	[media] V4L: Add s_rx_buffer subdev video operation The s_rx_buffer callback allows the host to set buffer for a data being received by the subdev, e.g. non-image frame (meta) data. This callback can be implemented by subdevice like a MIPI CSI2 receiver, allowing the host driver to gather additional data into frame buffer passed to user space. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-05 21:28:00 -03:00
Vivien Didelot	0ec13867ef	i2c: Correct struct i2c_driver doc about detection s/address_data/address_list/ in addition to `c3813d6`. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2012-10-05 22:23:54 +02:00
Jean Delvare	e7ee514058	i2c-mux-gpio: Add support for dynamically allocated GPIO pins The code instantiating an i2c-mux-gpio platform device doesn't necessarily know in advance the GPIO pin numbers it wants to use. If pins are on a GPIO device which gets its base GPIO number assigned dynamically at run-time, the values can't be hard-coded. In that case, let the caller tell i2c-mux-gpio the name of the GPIO chip and the (relative) GPIO pin numbers to use. At probe time, the i2c-mux-gpio driver will look for the chip and apply the proper offset to turn relative GPIO pin numbers to absolute GPIO pin numbers. The same could be (and was so far) done on the caller's end, however doing it in i2c-mux-gpio has two benefits: * It avoids duplicating the code on every caller's side (about 30 lines of code.) * It allows for deferred probing for the muxed part of the I2C bus only. If finding the GPIO chip is the caller's responsibility, then deferred probing (if the GPIO chip isn't there yet) will not only affect the mux and the I2C bus segments behind it, but also the I2C bus trunk. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Peter Korsgaard <peter.korsgaard@barco.com>	2012-10-05 22:23:54 +02:00
Jean Delvare	01d56a6aa1	i2c-viapro: Add VIA VX900 device ID The SMBus controller in the VIA VX900 appears to be compatible with the VIA VX855, so just add the device ID. This closes kernel bug #43096. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2012-10-05 22:23:53 +02:00
Jean Delvare	eee543e824	i2c-mux: Add support for device auto-detection Let I2C bus segments behind multiplexers have a class. This allows for device auto-detection on these segments. As long as parent segments don't share the same class, it should be fine. I implemented support in drivers i2c-mux-gpio and i2c-mux-pca954x. I left i2c-mux-pca9541 and i2c-mux-pinctrl alone for the moment as I don't know if this feature makes sense for the use cases of these drivers. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Peter Korsgaard <peter.korsgaard@barco.com> Cc: David Daney <david.daney@cavium.com> Cc: Michael Lawnick <ml.lawnick@gmx.de> Cc: Rodolfo Giometti <giometti@linux.it>	2012-10-05 22:23:51 +02:00
Linus Torvalds	283dbd8205	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking changes from David Miller: "The most important bit in here is the fix for input route caching from Eric Dumazet, it's a shame we couldn't fully analyze this in time for 3.6 as it's a 3.6 regression introduced by the routing cache removal. Anyways, will send quickly to -stable after you pull this in. Other changes of note: 1) Fix lockdep splats in team and bonding, from Eric Dumazet. 2) IPV6 adds link local route even when there is no link local address, from Nicolas Dichtel. 3) Fix ixgbe PTP implementation, from Jacob Keller. 4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya. 5) MAC length computed improperly in VLAN demux, from Antonio Quartulli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt Remove noisy printks from llcp_sock_connect tipc: prevent dropped connections due to rcvbuf overflow silence some noisy printks in irda team: set qdisc_tx_busylock to avoid LOCKDEP splat bonding: set qdisc_tx_busylock to avoid LOCKDEP splat sctp: check src addr when processing SACK to update transport state sctp: fix a typo in prototype of __sctp_rcv_lookup() ipv4: add a fib_type to fib_info can: mpc5xxx_can: fix section type conflict can: peak_pcmcia: fix error return code can: peak_pci: fix error return code cxgb4: Fix build error due to missing linux/vmalloc.h include. bnx2x: fix ring size for 10G functions cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params() ixgbe: add support for X540-AT1 ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit ixgbe: fix PTP ethtool timestamping function ixgbe: (PTP) Fix PPS interrupt code ixgbe: Fix PTP X540 SDP alignment code for PPS signal ...	2012-10-06 03:11:59 +09:00
Linus Torvalds	11126c611e	Merge branch 'akpm' (Andrew's patch-bomb) Merge misc patches from Andrew Morton: "The MM tree is rather stuck while I wait to find out what the heck is happening with sched/numa. Probably I'll need to route around all the code which was added to -next, sigh. So this is "everything else", or at least most of it - other small bits are still awaiting resolutions of various kinds." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits) lib/decompress.c add __init to decompress_method and data kernel/resource.c: fix stack overflow in __reserve_region_with_split() omfs: convert to use beXX_add_cpu() taskstats: cgroupstats_user_cmd() may leak on error aoe: update aoe-internal version number to 50 aoe: update documentation to better reflect aoe-plus-udev usage aoe: remove unused code aoe: make dynamic block minor numbers the default aoe: update and specify AoE address guards and error messages aoe: retain static block device numbers for backwards compatibility aoe: support more AoE addresses with dynamic block device minor numbers aoe: update documentation with new URL and VM settings reference aoe: update copyright year in touched files aoe: update internal version number to 49 aoe: remove unused code and add cosmetic improvements aoe: increase net_device reference count while using it aoe: associate frames with the AoE storage target aoe: disallow unsupported AoE minor addresses aoe: do revalidation steps in order aoe: failover remote interface based on aoe_deadsecs parameter ...	2012-10-06 03:09:16 +09:00
Paul Clements	a336d29870	nbd: handle discard requests Add discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	2f01250888	nbd: add set flags ioctl Add a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:23 +09:00
Alexandre Bounine	de74e00a96	rapidio: add destination ID allocation mechanism Replace the single global destination ID counter with per-net allocation mechanism to allow independent destID management for each available RapidIO network. Using bitmap based mechanism instead of counters allows destination ID release and reuse in systems that support hot-swap. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:23 +09:00
Alexandre Bounine	a7071efc20	rapidio: use device lists handling on per-net basis Modify handling of device lists to resolve issues caused by using single global list of RIO devices during enumeration/discovery. The most common sign of existing issue is incorrect contents of switch routing tables in systems with multiple mport controllers while single-port configuration performs as expected. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:22 +09:00
Alexandre Bounine	da1589f073	rapidio: add inbound memory mapping interface Add common inbound memory mapping/unmapping interface. This allows to make local memory space accessible from the RapidIO side using hardware mapping capabilities of RapidIO bridging devices. The new interface is intended to enable data transfers between RapidIO devices in combination with DMA engine support. This patch is based on patch submitted by Li Yang <leoli@freescale.com> (https://lists.ozlabs.org/pipermail/linuxppc-dev/2009-April/071210.html) Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:21 +09:00
Alexandre Bounine	fe50c927d7	rapidio: fix kerneldoc warnings after DMA support was added Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:20 +09:00
Alexandre Bounine	ed43f44f86	rapidio/tsi721: modify mport name assignment Modify RapidIO mport device name assignment to include device name of PCIe side of Tsi721 bridge. The new name format is intended to provide definitive reference between RapidIO and PCIe sides of the bridge in systems with multiple Tsi721 bridges. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:20 +09:00
Denys Vlasenko	2aa362c49c	coredump: extend core dump note section to contain file names of mapped files This note has the following format: long count -- how many files are mapped long page_size -- units for file_ofs array of [COUNT] elements of long start long end long file_ofs followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL... Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:17 +09:00
Denys Vlasenko	49ae4d4b11	coredump: add a new elf note with siginfo of the signal Existing PRSTATUS note contains only si_signo, si_code, si_errno fields from the siginfo of the signal which caused core to be dumped. There are tools which try to analyze crashes for possible security implications, and they want to use, among other data, si_addr field from the SIGSEGV. This patch adds a new elf note, NT_SIGINFO, which contains the complete siginfo_t of the signal which killed the process. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:16 +09:00
Denys Vlasenko	751f409db6	compat: move compat_siginfo_t definition to asm/compat.h This is a preparatory patch for the introduction of NT_SIGINFO elf note. Make the location of compat_siginfo_t uniform across eight architectures which have it. Now it can be pulled in by including asm/compat.h or linux/compat.h. Most of the copies are verbatim. compat_uid[32]_t had to be replaced by __compat_uid[32]_t. compat_uptr_t had to be moved up before compat_siginfo_t in asm/compat.h on a several architectures (tile already had it moved up). compat_sigval_t had to be relocated from linux/compat.h to asm/compat.h. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:16 +09:00
Denys Vlasenko	5ab1c309b3	coredump: pass siginfo_t* to do_coredump() and below, not merely signr This is a preparatory patch for the introduction of NT_SIGINFO elf note. With this patch we pass "siginfo_t *siginfo" instead of "int signr" to do_coredump() and put it into coredump_params. It will be used by the next patch. Most changes are simple s/signr/siginfo->si_signo/. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:16 +09:00
Alex Kelly	179899fd5d	coredump: update coredump-related headers Create a new header file, fs/coredump.h, which contains functions only used by the new coredump.c. It also moves do_coredump to the include/linux/coredump.h header file, for consistency. Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:15 +09:00
Alex Kelly	046d662f48	coredump: make core dump functionality optional Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of core dump. This saves approximately 2.6k in the compiled kernel, and complements CONFIG_ELF_CORE, which now depends on it. CONFIG_COREDUMP also disables coredump-related sysctls, except for suid_dumpable and related functions, which are necessary for ptrace. [akpm@linux-foundation.org: fix binfmt_aout.c build] Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:15 +09:00
David Fries	4c24e29e65	rtc_sysfs_show_hctosys(): display 0 if resume failed Without this patch /sys/class/rtc/$CONFIG_RTC_HCTOSYS_DEVICE/hctosys contains a 1 (meaning "This rtc was used to initialize the system clock") even if setting the time by do_settimeofday() at bootup failed. The RTC can also be used to set the clock on resume, if it did 1, otherwise 0. Previously there was no indication if the RTC was used to set the clock in resume. This uses only CONFIG_RTC_HCTOSYS_DEVICE for conditional compilation instead of it and CONFIG_RTC_HCTOSYS to be more consistent. rtc_hctosys_ret was moved to class.c so class.c no longer depends on hctosys.c. [sfr@canb.auug.org.au: fix build] Signed-off-by: David Fries <David@Fries.net> Cc: Matthew Garrett <mjg@redhat.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:04 +09:00

1 2 3 4 5 ...

53858 Commits