When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon. Swap over the network is considered as an option in diskless
systems. The two likely scenarios are when blade servers are used as part
of a cluster where the form factor or maintenance costs do not allow the
use of disks and thin clients.
The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap according to the manual at
https://sourceforge.net/projects/ltsp/files/Docs-Admin-Guide/LTSPManual.pdf/download
There is also documentation and tutorials on how to setup swap over NBD at
places like https://help.ubuntu.com/community/UbuntuLTSP/EnableNBDSWAP The
nbd-client also documents the use of NBD as swap. Despite this, the fact
is that a machine using NBD for swap can deadlock within minutes if swap
is used intensively. This patch series addresses the problem.
The core issue is that network block devices do not use mempools like
normal block devices do. As the host cannot control where they receive
packets from, they cannot reliably work out in advance how much memory
they might need. Some years ago, Peter Zijlstra developed a series of
patches that supported swap over an NFS that at least one distribution is
carrying within their kernels. This patch series borrows very heavily
from Peter's work to support swapping over NBD as a pre-requisite to
supporting swap-over-NFS. The bulk of the complexity is concerned with
preserving memory that is allocated from the PFMEMALLOC reserves for use
by the network layer which is needed for both NBD and NFS.
Patch 1 adds knowledge of the PFMEMALLOC reserves to SLAB and SLUB to
preserve access to pages allocated under low memory situations
to callers that are freeing memory.
Patch 2 optimises the SLUB fast path to avoid pfmemalloc checks
Patch 3 introduces __GFP_MEMALLOC to allow access to the PFMEMALLOC
reserves without setting PFMEMALLOC.
Patch 4 opens the possibility for softirqs to use PFMEMALLOC reserves
for later use by network packet processing.
Patch 5 only sets page->pfmemalloc when ALLOC_NO_WATERMARKS was required
Patch 6 ignores memory policies when ALLOC_NO_WATERMARKS is set.
Patches 7-12 allows network processing to use PFMEMALLOC reserves when
the socket has been marked as being used by the VM to clean pages. If
packets are received and stored in pages that were allocated under
low-memory situations and are unrelated to the VM, the packets
are dropped.
Patch 11 reintroduces __skb_alloc_page which the networking
folk may object to but is needed in some cases to propogate
pfmemalloc from a newly allocated page to an skb. If there is a
strong objection, this patch can be dropped with the impact being
that swap-over-network will be slower in some cases but it should
not fail.
Patch 13 is a micro-optimisation to avoid a function call in the
common case.
Patch 14 tags NBD sockets as being SOCK_MEMALLOC so they can use
PFMEMALLOC if necessary.
Patch 15 notes that it is still possible for the PFMEMALLOC reserve
to be depleted. To prevent this, direct reclaimers get throttled on
a waitqueue if 50% of the PFMEMALLOC reserves are depleted. It is
expected that kswapd and the direct reclaimers already running
will clean enough pages for the low watermark to be reached and
the throttled processes are woken up.
Patch 16 adds a statistic to track how often processes get throttled
Some basic performance testing was run using kernel builds, netperf on
loopback for UDP and TCP, hackbench (pipes and sockets), iozone and
sysbench. Each of them were expected to use the sl*b allocators
reasonably heavily but there did not appear to be significant performance
variances.
For testing swap-over-NBD, a machine was booted with 2G of RAM with a
swapfile backed by NBD. 8*NUM_CPU processes were started that create
anonymous memory mappings and read them linearly in a loop. The total
size of the mappings were 4*PHYSICAL_MEMORY to use swap heavily under
memory pressure.
Without the patches and using SLUB, the machine locks up within minutes
and runs to completion with them applied. With SLAB, the story is
different as an unpatched kernel run to completion. However, the patched
kernel completed the test 45% faster.
MICRO
3.5.0-rc2 3.5.0-rc2
vanilla swapnbd
Unrecognised test vmscan-anon-mmap-write
MMTests Statistics: duration
Sys Time Running Test (seconds) 197.80 173.07
User+Sys Time Running Test (seconds) 206.96 182.03
Total Elapsed Time (seconds) 3240.70 1762.09
This patch: mm: sl[au]b: add knowledge of PFMEMALLOC reserve pages
Allocations of pages below the min watermark run a risk of the machine
hanging due to a lack of memory. To prevent this, only callers who have
PF_MEMALLOC or TIF_MEMDIE set and are not processing an interrupt are
allowed to allocate with ALLOC_NO_WATERMARKS. Once they are allocated to
a slab though, nothing prevents other callers consuming free objects
within those slabs. This patch limits access to slab pages that were
alloced from the PFMEMALLOC reserves.
When this patch is applied, pages allocated from below the low watermark
are returned with page->pfmemalloc set and it is up to the caller to
determine how the page should be protected. SLAB restricts access to any
page with page->pfmemalloc set to callers which are known to able to
access the PFMEMALLOC reserve. If one is not available, an attempt is
made to allocate a new page rather than use a reserve. SLUB is a bit more
relaxed in that it only records if the current per-CPU page was allocated
from PFMEMALLOC reserve and uses another partial slab if the caller does
not have the necessary GFP or process flags. This was found to be
sufficient in tests to avoid hangs due to SLUB generally maintaining
smaller lists than SLAB.
In low-memory conditions it does mean that !PFMEMALLOC allocators can fail
a slab allocation even though free objects are available because they are
being preserved for callers that are freeing pages.
[a.p.zijlstra@chello.nl: Original implementation]
[sebastian@breakpoint.cc: Correct order of page flag clearing]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When hotplug offlining happens on zone A, it starts to mark freed page as
MIGRATE_ISOLATE type in buddy for preventing further allocation.
(MIGRATE_ISOLATE is very irony type because it's apparently on buddy but
we can't allocate them).
When the memory shortage happens during hotplug offlining, current task
starts to reclaim, then wake up kswapd. Kswapd checks watermark, then go
sleep because current zone_watermark_ok_safe doesn't consider
MIGRATE_ISOLATE freed page count. Current task continue to reclaim in
direct reclaim path without kswapd's helping. The problem is that
zone->all_unreclaimable is set by only kswapd so that current task would
be looping forever like below.
__alloc_pages_slowpath
restart:
wake_all_kswapd
rebalance:
__alloc_pages_direct_reclaim
do_try_to_free_pages
if global_reclaim && !all_unreclaimable
return 1; /* It means we did did_some_progress */
skip __alloc_pages_may_oom
should_alloc_retry
goto rebalance;
If we apply KOSAKI's patch[1] which doesn't depends on kswapd about
setting zone->all_unreclaimable, we can solve this problem by killing some
task in direct reclaim path. But it doesn't wake up kswapd, still. It
could be a problem still if other subsystem needs GFP_ATOMIC request. So
kswapd should consider MIGRATE_ISOLATE when it calculate free pages BEFORE
going sleep.
This patch counts the number of MIGRATE_ISOLATE page block and
zone_watermark_ok_safe will consider it if the system has such blocks
(fortunately, it's very rare so no problem in POV overhead and kswapd is
never hotpath).
Copy/modify from Mel's quote
"
Ideal solution would be "allocating" the pageblock.
It would keep the free space accounting as it is but historically,
memory hotplug didn't allocate pages because it would be difficult to
detect if a pageblock was isolated or if part of some balloon.
Allocating just full pageblocks would work around this, However,
it would play very badly with CMA.
"
[1] http://lkml.org/lkml/2012/6/14/74
[akpm@linux-foundation.org: simplify nr_zone_isolate_freepages(), rework zone_watermark_ok_safe() comment, simplify set_pageblock_isolate() and restore_pageblock_isolate()]
[akpm@linux-foundation.org: fix CONFIG_MEMORY_ISOLATION=n build]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Aaditya Kumar <aaditya.kumar.30@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/page_alloc.c has some memory isolation functions but they are used only
when we enable CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}. So let's make
it configurable by new CONFIG_MEMORY_ISOLATION so that it can reduce
binary size and we can check it simple by CONFIG_MEMORY_ISOLATION, not if
defined CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By globally defining check_panic_on_oom(), the memcg oom handler can be
moved entirely to mm/memcontrol.c. This removes the ugly #ifdef in the
oom killer and cleans up the code.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The global oom killer is serialized by the per-zonelist
try_set_zonelist_oom() which is used in the page allocator. Concurrent
oom kills are thus a rare event and only occur in systems using
mempolicies and with a large number of nodes.
Memory controller oom kills, however, can frequently be concurrent since
there is no serialization once the oom killer is called for oom conditions
in several different memcgs in parallel.
This creates a massive contention on tasklist_lock since the oom killer
requires the readside for the tasklist iteration. If several memcgs are
calling the oom killer, this lock can be held for a substantial amount of
time, especially if threads continue to enter it as other threads are
exiting.
Since the exit path grabs the writeside of the lock with irqs disabled in
a few different places, this can cause a soft lockup on cpus as a result
of tasklist_lock starvation.
The kernel lacks unfair writelocks, and successful calls to the oom killer
usually result in at least one thread entering the exit path, so an
alternative solution is needed.
This patch introduces a seperate oom handler for memcgs so that they do
not require tasklist_lock for as much time. Instead, it iterates only
over the threads attached to the oom memcg and grabs a reference to the
selected thread before calling oom_kill_process() to ensure it doesn't
prematurely exit.
This still requires tasklist_lock for the tasklist dump, iterating
children of the selected process, and killing all other threads on the
system sharing the same memory as the selected victim. So while this
isn't a complete solution to tasklist_lock starvation, it significantly
reduces the amount of time that it is held.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Sha Zhengju <handai.szj@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_cgroup_out_of_memory() is defined in mm/oom_kill.c, so declare it in
linux/oom.h rather than linux/memcontrol.h.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a zone becomes empty after memory offlining, free zone->pageset.
Otherwise it will cause memory leak when adding memory to the empty zone
again because build_all_zonelists() will allocate zone->pageset for an
empty zone.
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Wei Wang <Bessel.Wang@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When hotadd_new_pgdat() is called to create new pgdat for a new node, a
fallback zonelist should be created for the new node. There's code to try
to achieve that in hotadd_new_pgdat() as below:
/*
* The node we allocated has no zone fallback lists. For avoiding
* to access not-initialized zonelist, build here.
*/
mutex_lock(&zonelists_mutex);
build_all_zonelists(pgdat, NULL);
mutex_unlock(&zonelists_mutex);
But it doesn't work as expected. When hotadd_new_pgdat() is called, the
new node is still in offline state because node_set_online(nid) hasn't
been called yet. And build_all_zonelists() only builds zonelists for
online nodes as:
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
build_zonelists(pgdat);
build_zonelist_cache(pgdat);
}
Though we hope to create zonelist for the new pgdat, but it doesn't. So
add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
the new pgdat too.
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a
shrinker callback which was returning -1 when nr_to_scan is zero, which
caused excessive slab scanning. But 635697c6 ("vmscan: fix initial
shrinker size handling") fixed the problem, again so we can freely return
-1 although nr_to_scan is zero. So let's revert 09f363c7 because the
comment added in 09f363c7 made an unnecessary rule.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0ee332c145 ("memblock: Kill early_node_map[]") wanted to replace
CONFIG_ARCH_POPULATES_NODE_MAP with CONFIG_HAVE_MEMBLOCK_NODE_MAP but
ended up replacing one occurence with a reference to the non-existent
symbol CONFIG_HAVE_MEMBLOCK_NODE.
The resulting omission of code would probably have been causing problems
to 32-bit machines with memory hotplug.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Order > 0 compaction stops when enough free pages of the correct page
order have been coalesced. When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.
However, the compaction code always starts out looking for things to
compact at the start of the zone, and for free pages to compact things to
at the end of the zone.
This can cause quadratic behaviour, with isolate_freepages starting at the
end of the zone each time, even though previous invocations of the
compaction code already filled up all free memory on that end of the zone.
This can cause isolate_freepages to take enormous amounts of CPU with
certain workloads on larger memory systems.
The obvious solution is to have isolate_freepages remember where it left
off last time, and continue at that point the next time it gets invoked
for an order > 0 compaction. This could cause compaction to fail if
cc->free_pfn and cc->migrate_pfn are close together initially, in that
case we restart from the end of the zone and try once more.
Forced full (order == -1) compactions are left alone.
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: s/laste/last/, use 80 cols]
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Cc: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_COMPACTION is enabled, compaction_deferred() tries to
recalculate the deferred limit again, which isn't necessary.
When CONFIG_COMPACTION is disabled, compaction_deferred() should return
"true" or "false" since it has "bool" for its return value.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With HugeTLB pages, hugetlb cgroup is uncharged in compound page
destructor. Since we are holding a hugepage reference, we can be sure
that old page won't get uncharged till the last put_page().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the hugetlb cgroup pointer to 3rd page lru.next. This limit the usage
to hugetlb cgroup to only hugepages with 3 or more normal pages. I guess
that is an acceptable limitation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement a new controller that allows us to control HugeTLB allocations.
The extension allows to limit the HugeTLB usage per control group and
enforces the controller limit during page fault. Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that,
the application will get SIGBUS signal if it tries to access HugeTLB pages
beyond its limit. This requires the application to know beforehand how
much HugeTLB pages it would require for its use.
The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.
[akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g]
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We will use them later in hugetlb_cgroup.c
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hugepage_activelist will be used to track currently used HugeTLB pages.
We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
On cgroup removal we update the page's HugeTLB cgroup to point to parent
cgroup.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we migrate only one hugepage, don't use linked list for passing the
page around. Directly pass the page that need to be migrated as argument.
This also removes the usage of page->lru in the migrate path.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use a mmu_gather instead of a temporary linked list for accumulating pages
when we unmap a hugepage range
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an inline helper and use it in the code.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since per-BDI flusher threads were introduced in 2.6, the pdflush
mechanism is not used any more. But the old interface exported through
/proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.
For back-compatibility, printk warning information and return 2 to notify
the users that the interface is removed.
Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
But we can also account for total_vm in the vm_stat_account() which makes
the code tidy.
Even for mprotect_fixup(), we can get the right result in the end.
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
all pretty straightforward, except one thing.
One of our patches added thermal support for power supply class, but
thermal/ subsystem changed under our feet. We (well, Stephen, that is)
caught the issue and it was decided[1] that I'd just delay the battery
pull request, and then will fix it up by merging upstream back into
battery tree at the specific commit.
That's not all though: another[2] small fixup for thermal subsystem was
needed to get rid of a warning in power supply subsystem (the warning
was not drivers/power's "fault", the thermal registration function just
needed a proper const annotation, which is also done by a small commit
on top of the merge.
So, to sum this up:
- The 'master' branch of the battery tree was in the -next tree for
weeks, was never rebased, altered etc. It should be all OK;
- Although, for-v3.6 tag contains the 'master' branch + merge + the
warning fix.
[1] http://lkml.org/lkml/2012/6/19/23
[2] http://lkml.org/lkml/2012/6/18/28
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQF9V8AAoJEGgI9fZJve1bLvkP/j/Nt1fBud2w5Q/NJr310hYJ
NWIMSJwFbMhPoNd7sESznogXH8eHQ6YJP+CmkA5Gxr0t8pjEEJHEEyEcf1eNv6/c
YDZfDB3TIaeYzulvRUkXMQ1f7hiA5Bq2t13yXeMM19+r9DzNZ51jZ3TXETLkpWZG
BfZPg5wmP0xssXB3fjJMWuW5hVEc503WLpLFXkWfWKMU3PGdy/8DckV/YLvf2l7K
1fkBLZry0gtruKqFbwcXhanP1JQ8FFFO8n1tSVLJhXXoym5twn/5GAgcpcKSFfJg
mkGXAQLLuXKfERBIda7qbQl74HmTzYadCcueeXy1hTpom+VwfOpG+by2t/FrXT/M
aJW6hfSLMgicG8FIuSYqbkutvijU9srU/YI00zrSGDBgi4sGKChRMf4sKQXnHO7X
Lb7csQ7hEWsfG5gkgjRkmgZdhqWYoIxxe5Gv2Z9MKF27mHoSM03KHiZGlDJMrmNs
w4KcU5H9tA62dT/UFszuu7NZenmsVS/ktiHWe5k+EXElZMZRrxKDJk2cvLPkRz/E
VkXGvAmTDYPasZm29yzTnPKcuo6pfeOVJnUWybHOYxhkqAhJu1QzHCatqapgfy8E
F2ODI5FoWtQES96B9t3tY4lTzq/0yUHcbiJt4BsyRcCGP+ggEQjCq0HGqOca12XX
gxE20O3l+YQTCQIYKH+S
=1NaS
-----END PGP SIGNATURE-----
Merge tag 'for-v3.6' of git://git.infradead.org/battery-2.6
Pull battery updates from Anton Vorontsov:
"The tag contains just a few battery-related changes for v3.6. It's is
all pretty straightforward, except one thing.
One of our patches added thermal support for power supply class, but
thermal/ subsystem changed under our feet. We (well, Stephen, that
is) caught the issue and it was decided[1] that I'd just delay the
battery pull request, and then will fix it up by merging upstream back
into battery tree at the specific commit.
That's not all though: another[2] small fixup for thermal subsystem
was needed to get rid of a warning in power supply subsystem (the
warning was not drivers/power's "fault", the thermal registration
function just needed a proper const annotation, which is also done by
a small commit on top of the merge.
So, to sum this up:
- The 'master' branch of the battery tree was in the -next tree for
weeks, was never rebased, altered etc. It should be all OK;
- Although, for-v3.6 tag contains the 'master' branch + merge + the
warning fix.
[1] http://lkml.org/lkml/2012/6/19/23
[2] http://lkml.org/lkml/2012/6/18/28"
* tag 'for-v3.6' of git://git.infradead.org/battery-2.6: (23 commits)
thermal: Constify 'type' argument for the registration routine
olpc-battery: update CHARGE_FULL_DESIGN property for BYD LiFe batteries
olpc-battery: Add VOLTAGE_MAX_DESIGN property
charger-manager: Fix build break related to EXTCON
lp8727_charger: Move header file into platform_data directory
power_supply: Add min/max alert properties for CAPACITY, TEMP, TEMP_AMBIENT
bq27x00_battery: Add support for BQ27425 chip
charger-manager: Set current limit of regulator for over current protection
charger-manager: Use EXTCON Subsystem to detect charger cables for charging
test_power: Add VOLTAGE_NOW and BATTERY_TEMP properties
test_power: Add support for USB AC source
gpio-charger: Use cansleep version of gpio_set_value
bq27x00_battery: Add support for power average and health properties
sbs-battery: Don't trigger false supply_changed event
twl4030_charger: Allow charger to control the regulator that feeds it
twl4030_charger: Add backup-battery charging
twl4030_charger: Fix some typos
max17042_battery: Support CHARGE_COUNTER power supply attribute
smb347-charger: Add constant charge and current properties
power_supply: Add constant charge_current and charge_voltage properties
...
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.
If a route is uncached, we currently have no way of accomplishing
this.
So create a global list that is scanned when a network device goes
down. This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd changes from J. Bruce Fields:
"This has been an unusually quiet cycle--mostly bugfixes and cleanup.
The one large piece is Stanislav's work to containerize the server's
grace period--but that in itself is just one more step in a
not-yet-complete project to allow fully containerized nfs service.
There are a number of outstanding delegation, container, v4 state, and
gss patches that aren't quite ready yet; 3.7 may be wilder."
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
NFSd: make boot_time variable per network namespace
NFSd: make grace end flag per network namespace
Lockd: move grace period management from lockd() to per-net functions
LockD: pass actual network namespace to grace period management functions
LockD: manage grace list per network namespace
SUNRPC: service request network namespace helper introduced
NFSd: make nfsd4_manager allocated per network namespace context.
LockD: make lockd manager allocated per network namespace
LockD: manage grace period per network namespace
Lockd: add more debug to host shutdown functions
Lockd: host complaining function introduced
LockD: manage used host count per networks namespace
LockD: manage garbage collection timeout per networks namespace
LockD: make garbage collector network namespace aware.
LockD: mark host per network namespace on garbage collect
nfsd4: fix missing fault_inject.h include
locks: move lease-specific code out of locks_delete_lock
locks: prevent side-effects of locks_release_private before file_lock is initialized
NFSd: set nfsd_serv to NULL after service destruction
NFSd: introduce nfsd_destroy() helper
...
Input path is mostly run under RCU and doesnt touch dst refcnt
But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu
Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.
24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)
before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 404e0a8b6a (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :
We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.
Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.
Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)
I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ceph changes from Sage Weil:
"Lots of stuff this time around:
- lots of cleanup and refactoring in the libceph messenger code, and
many hard to hit races and bugs closed as a result.
- lots of cleanup and refactoring in the rbd code from Alex Elder,
mostly in preparation for the layering functionality that will be
coming in 3.7.
- some misc rbd cleanups from Josh Durgin that are finally going
upstream
- support for CRUSH tunables (used by newer clusters to improve the
data placement)
- some cleanup in our use of d_parent that Al brought up a while back
- a random collection of fixes across the tree
There is another patch coming that fixes up our ->atomic_open()
behavior, but I'm going to hammer on it a bit more before sending it."
Fix up conflicts due to commits that were already committed earlier in
drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
rbd: create rbd_refresh_helper()
rbd: return obj version in __rbd_refresh_header()
rbd: fixes in rbd_header_from_disk()
rbd: always pass ops array to rbd_req_sync_op()
rbd: pass null version pointer in add_snap()
rbd: make rbd_create_rw_ops() return a pointer
rbd: have __rbd_add_snap_dev() return a pointer
libceph: recheck con state after allocating incoming message
libceph: change ceph_con_in_msg_alloc convention to be less weird
libceph: avoid dropping con mutex before fault
libceph: verify state after retaking con lock after dispatch
libceph: revoke mon_client messages on session restart
libceph: fix handling of immediate socket connect failure
ceph: update MAINTAINERS file
libceph: be less chatty about stray replies
libceph: clear all flags on con_close
libceph: clean up con flags
libceph: replace connection state bits with states
libceph: drop unnecessary CLOSED check in socket state change callback
libceph: close socket directly from ceph_con_close()
...
Remove the control.sta pointer from ieee80211_tx_info to free up
sufficient space in the TX skb control buffer for the upcoming
Transmit Power Control (TPC).
Instead, the pointer is now on the stack in a new control struct
that is passed as a function parameter to the drivers' tx method.
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[reworded commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add PCI device support for VFIO. PCI devices expose regions
for accessing config space, I/O port space, and MMIO areas
of the device. PCI config access is virtualized in the kernel,
allowing us to ensure the integrity of the system, by preventing
various accesses while reducing duplicate support across various
userspace drivers. I/O port supports read/write access while
MMIO also supports mmap of sufficiently sized regions. Support
for INTx, MSI, and MSI-X interrupts are provided using eventfds to
userspace.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel
VT-d hardware, but is potentially usable by anything supporting
similar mapping functionality. We arbitrarily call this a Type1
backend for lack of a better name. This backend has no IOVA
or host memory mapping restrictions for the user and is optimized
for relatively static mappings. Mapped areas are pinned into system
memory.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO is a secure user level driver for use with both virtual machines
and user level drivers. VFIO makes use of IOMMU groups to ensure the
isolation of devices in use, allowing unprivileged user access. It's
intended that VFIO will replace KVM device assignment and UIO drivers
(in cases where the target platform includes a sufficiently capable
IOMMU).
New in this version of VFIO is support for IOMMU groups managed
through the IOMMU core as well as a rework of the API, removing the
group merge interface. We now go back to a model more similar to
original VFIO with UIOMMU support where the file descriptor obtained
from /dev/vfio/vfio allows access to the IOMMU, but only after a
group is added, avoiding the previous privilege issues with this type
of model. IOMMU support is also now fully modular as IOMMUs have
vastly different interface requirements on different platforms. VFIO
users are able to query and initialize the IOMMU model of their
choice.
Please see the follow-on Documentation commit for further description
and usage example.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently, ps mode is indicated per device (rather than
per interface), which doesn't make a lot of sense.
Moreover, there are subtle bugs caused by the inability
to indicate ps change along with other changes
(e.g. when the AP deauth us, we'd like to indicate
CHANGED_PS | CHANGED_ASSOC, as changing PS before
notifying about disassociation will result in null-packets
being sent (if IEEE80211_HW_SUPPORTS_DYNAMIC_PS) while
the sta is already disconnected.)
Keep the current per-device notifications, and add
parallel per-vif notifications.
In order to keep it simple, the per-device ps and
the per-vif ps are orthogonal - the per-vif ps
configuration is determined only by the user
configuration (enable/disable) and the connection
state, and is not affected by other vifs state and
(temporary) dynamic_ps/offchannel operations
(unlike per-device ps).
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
thermal_zone_device_register() does not modify 'type' argument, so it is
safe to declare it as const. Otherwise, if we pass a const string, we are
getting the ugly warning:
CC drivers/power/power_supply_core.o
drivers/power/power_supply_core.c: In function 'psy_register_thermal':
drivers/power/power_supply_core.c:204:6: warning: passing argument 1 of 'thermal_zone_device_register' discards 'const' qualifier from pointer target type [enabled by default]
include/linux/thermal.h:140:29: note: expected 'char *' but argument is of type 'const char *'
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
This merge is performed to take commit c56f5c0342 ("Thermal: Make
Thermal trip points writeable") out of Linus' tree and then fixup power
supply class. This is needed since thermal stuff added a new argument:
CC drivers/power/power_supply_core.o
drivers/power/power_supply_core.c: In function ‘psy_register_thermal’:
drivers/power/power_supply_core.c:204:6: warning: passing argument 3 of ‘thermal_zone_device_register’ makes integer from pointer without a cast [enabled by default]
include/linux/thermal.h:154:29: note: expected ‘int’ but argument is of type ‘struct power_supply *’
drivers/power/power_supply_core.c:204:6: error: too few arguments to function ‘thermal_zone_device_register’
include/linux/thermal.h:154:29: note: declared here
make[1]: *** [drivers/power/power_supply_core.o] Error 1
make: *** [drivers/power/] Error 2
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
This will allow md/raid to know why the unplug was called,
and will be able to act according - if !from_schedule it
is safe to perform tasks which could themselves schedule.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Both md and umem has similar code for getting notified on an
blk_finish_plug event.
Centralize this code in block/ and allow each driver to
provide its distinctive difference.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that all users are converted, we can remove functions, variables, and
constants defined by the old freezing mechanism.
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_check_frozen() tests are racy since the filesystem can be frozen just after
the test is performed. Thus in write paths we can end up marking some pages or
inodes dirty even though the file system is already frozen. This creates
problems with flusher thread hanging on frozen filesystem.
Another problem is that exclusion between ->page_mkwrite() and filesystem
freezing has been handled by setting page dirty and then verifying s_frozen.
This guaranteed that either the freezing code sees the faulted page, writes it,
and writeprotects it again or we see s_frozen set and bail out of page fault.
This works to protect from page being marked writeable while filesystem
freezing is running but has an unpleasant artefact of leaving dirty (although
unmodified and writeprotected) pages on frozen filesystem resulting in similar
problems with flusher thread as the first problem.
This patch aims at providing exclusion between write paths and filesystem
freezing. We implement a writer-freeze read-write semaphore in the superblock.
Actually, there are three such semaphores because of lock ranking reasons - one
for page fault handlers (->page_mkwrite), one for all other writers, and one of
internal filesystem purposes (used e.g. to track running transactions). Write
paths which should block freezing (e.g. directory operations, ->aio_write(),
->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem
takes the writer side.
Only that we don't really want to bounce cachelines of the semaphores between
CPUs for each write happening. So we implement the reader side of the semaphore
as a per-cpu counter and the writer side is implemented using s_writers.frozen
superblock field.
[AV: microoptimize sb_start_write(); we want it fast in normal case]
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
which can adapt equally well to fast/slow devices.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQF0/wAAoJECvKgwp+S8JaszsP/16EO5F5mUCOFgncVRp+8R9U
BxuKJ61j2R9ckHA+ngMEg72W5vJQds64cjywZnz6HMr0/+3tXUf4QBbU4/4sCeai
0lpK8MCKgp5KHHCxgO8zyoSaboankUgoDcbSmGJREV1WXoR8VWXsO9gXqiiH9XOe
e8ADjds/YdxkQbOYDRgZKvLzwWS61K9Kwq5/56GASh2uflw7rkJZ38xqvGbo3YiQ
IJJwOUYfJjFadIewYARQmkZZyWeAmtY0ADh15Z8pJt+iY4PgcDDlaWagwUH2Oaoi
vhTFO4KnCjhSpc872et21g/jN/VrcQqzuUF/LUE9rW5irXeDZVCDrCQOuHQ+3Uo5
YuV3rpNABW/LU8AvtIwt9hUunaKUnrXaSluoL9LzH2VpH++JljQeR4yZ62Q5rpRs
z4Bow25p7tlbcIJWzueMPdOFUr5s3P6XfQHoLVRWqN94eJ+Z1DPOgBlOain6cCUN
oivPh4FxZfUscNmX8/7cHaTNEpGzJ0FzLPMNFnGQ2zwG0gk41fnMdWb19aVIE5GL
/Q96TB24k/v+o1lbxGmyPi0L+Aq+NknvNm+p/YJHAAQrIpu5t2hPvka8/m7Nj7tu
c3rM75RKiZkEI+3U6Ws1DhhQPtVcfNIVlNYQMGzlIndtK6T0ByueQ4eN5Z7ltjSE
pS89rv2hyBw7yCaTU6ui
=gZrM
-----END PGP SIGNATURE-----
Merge tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback updates from Wu Fengguang:
"Use time based periods to age the writeback proportions, which can
adapt equally well to fast/slow devices."
Fix up trivial conflict in comment in fs/sync.c
* tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Fix some comment errors
block: Convert BDI proportion calculations to flexible proportions
lib: Fix possible deadlock in flexible proportion code
lib: Proportions with flexible period
Features include:
- More preparatory patches for modularising NFSv2/v3/v4.
Split out the various NFSv2/v3/v4-specific code into separate
files
- More preparation for the NFSv4 migration code
- Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters
- pNFS fast failover when the data servers are down
- Various cleanups and debugging patches
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg
0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL
cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4
QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV
qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw
OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q
ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR
iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0
NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx
I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC
4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL
YNVVGhvNxccMU5EU9YR4
=Lc59
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Features include:
- More preparatory patches for modularising NFSv2/v3/v4. Split out
the various NFSv2/v3/v4-specific code into separate files
- More preparation for the NFSv4 migration code
- Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
parameters
- pNFS fast failover when the data servers are down
- Various cleanups and debugging patches"
* tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
nfs: fix fl_type tests in NFSv4 code
NFS: fix pnfs regression with directio writes
NFS: fix pnfs regression with directio reads
sunrpc: clnt: Add missing braces
nfs: fix stub return type warnings
NFS: exit_nfs_v4() shouldn't be an __exit function
SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
NFS: Split out NFS v4 client functions
NFS: Split out the NFS v4 filesystem types
NFS: Create a single nfs_clone_super() function
NFS: Split out NFS v4 server creating code
NFS: Initialize the NFS v4 client from init_nfs_v4()
NFS: Move the v4 getroot code to nfs4getroot.c
NFS: Split out NFS v4 file operations
NFS: Initialize v4 sysctls from nfs_init_v4()
NFS: Create an init_nfs_v4() function
NFS: Split out NFS v4 inode operations
NFS: Split out NFS v3 inode operations
NFS: Split out NFS v2 inode operations
NFS: Clean up nfs4_proc_setclientid() and friends
...
Pull media updates from Mauro Carvalho Chehab:
"This is the first part of the media patches for v3.6.
This patch series contain:
- new DVB frontend: rtl2832
- new video drivers: adv7393
- some unused files got removed
- a selection API cleanup between V4L2 and V4L2 subdev API's
- a major redesign at v4l-ioctl2, in order to clean it up
- several driver fixes and improvements."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (174 commits)
v4l: Export v4l2-common.h in include/linux/Kbuild
media: Revert "[media] Terratec Cinergy S2 USB HD Rev.2"
[media] media: Use pr_info not homegrown pr_reg macro
[media] Terratec Cinergy S2 USB HD Rev.2
[media] v4l: Correct conflicting V4L2 subdev selection API documentation
[media] Feature removal: V4L2 selections API target and flag definitions
[media] v4l: Unify selection flags documentation
[media] v4l: Unify selection flags
[media] v4l: Common documentation for selection targets
[media] v4l: Unify selection targets across V4L2 and V4L2 subdev interfaces
[media] v4l: Remove "_ACTUAL" from subdev selection API target definition names
[media] V4L: Remove "_ACTIVE" from the selection target name definitions
[media] media: dvb-usb: print mac address via native %pM
[media] s5p-tv: Use module_i2c_driver in sii9234_drv.c file
[media] media: gpio-ir-recv: add allowed_protos for platform data
[media] s5p-jpeg: Use module_platform_driver in jpeg-core.c file
[media] saa7134: fix spelling of detach in label
[media] cx88-blackbird: replace ioctl by unlocked_ioctl
[media] cx88: don't use current_norm
[media] cx88: fix a number of v4l2-compliance violations
...
Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.
Signed-off-by: Sage Weil <sage@inktank.com>
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.
This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: Sage Weil <sage@inktank.com>
There are two structures in which a count of snapshots are
maintained:
struct ceph_snap_context {
...
u32 num_snaps;
...
}
and
struct ceph_snap_realm {
...
u32 num_prior_parent_snaps; /* had prior to parent_since */
...
u32 num_snaps;
...
}
These fields never take on negative values (e.g., to hold special
meaning), and so are really inherently unsigned. Furthermore they
take their value from over-the-wire or on-disk formatted 32-bit
values.
So change their definition to have type u32, and change some spots
elsewhere in the code to account for this change.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.
Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Merge Andrew's first set of patches:
"Non-MM patches:
- lots of misc bits
- tree-wide have_clk() cleanups
- quite a lot of printk tweaks. I draw your attention to "printk:
convert the format for KERN_<LEVEL> to a 2 byte pattern" which
looks a bit scary. But afaict it's solid.
- backlight updates
- lib/ feature work (notably the addition and use of memweight())
- checkpatch updates
- rtc updates
- nilfs updates
- fatfs updates (partial, still waiting for acks)
- kdump, proc, fork, IPC, sysctl, taskstats, pps, etc
- new fault-injection feature work"
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
drivers/misc/lkdtm.c: fix missing allocation failure check
lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
fault-injection: add tool to run command with failslab or fail_page_alloc
fault-injection: add selftests for cpu and memory hotplug
powerpc: pSeries reconfig notifier error injection module
memory: memory notifier error injection module
PM: PM notifier error injection module
cpu: rewrite cpu-notifier-error-inject module
fault-injection: notifier error injection
c/r: fcntl: add F_GETOWNER_UIDS option
resource: make sure requested range is included in the root range
include/linux/aio.h: cpp->C conversions
fs: cachefiles: add support for large files in filesystem caching
pps: return PTR_ERR on error in device_create
taskstats: check nla_reserve() return
sysctl: suppress kmemleak messages
ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
ipc: compat: use signed size_t types for msgsnd and msgrcv
ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
ipc: add COMPAT_SHMLBA support
...
When we restore file descriptors we would like them to look exactly as
they were at dumping time.
With help of fcntl it's almost possible, the missing snippet is file
owners UIDs.
To be able to read their values the F_GETOWNER_UIDS is introduced.
This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise
returning -EINVAL.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert init_sync_kiocb() from a nasty macro into a nice C function. The
struct assignment trick takes care of zeroing all unmentioned fields.
Shrinks fs/read_write.o's .text from 9857 bytes to 9714.
Also demacroize is_sync_kiocb() and aio_ring_avail(). The latter fixes an
arg-referenced-multiple-times hand grenade.
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than #define the options manually in the architecture code, add
Kconfig options for them and select them there instead. This also allows
us to select the compat IPC version parsing automatically for platforms
using the old compat IPC interface.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The msgsnd and msgrcv system calls use size_t to represent the size of the
message being transferred. POSIX states that values of msgsz greater than
SSIZE_MAX cause the result to be implementation-defined. On Linux, this
equates to returning -EINVAL if (long) msgsz < 0.
For compat tasks where !CONFIG_ARCH_WANT_OLD_COMPAT_IPC and compat_size_t
is smaller than size_t, negative size values passed from userspace will be
interpreted as positive values by do_msg{rcv,snd} and will fail to exit
early with -EINVAL.
This patch changes the compat prototypes for msg{rcv,snd} so that the
message size is represented as a compat_ssize_t, which we cast to the
native ssize_t type for the core IPC code.
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 48b25c43e6 ("ipc: provide generic compat versions of IPC
syscalls") added a new ARCH_WANT_OLD_COMPAT_IPC config option for
architectures to select if their compat target requires the old IPC
syscall interface.
For architectures (such as AArch64) that do not require the internal
calling conventions provided by this option, but have a compat target
where the C library passes the IPC_64 flag explicitly,
compat_ipc_parse_version no longer strips out the flag before calling
the native system call implementation, resulting in unknown SHM/IPC
commands and -EINVAL being returned to userspace.
This patch separates the selection of the internal calling conventions
for the IPC syscalls from the version parsing, allowing architectures to
select __ARCH_WANT_COMPAT_IPC_PARSE_VERSION if they want to use version
parsing whilst retaining the newer syscall calling conventions.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the SHMLBA definition for a native task differs from the definition for
a compat task, the do_shmat() function would need to handle both.
This patch introduces COMPAT_SHMLBA, which is used by the compat shmat
syscall when calling the ipc code and allows architectures such as AArch64
(where the native SHMLBA is 64k but the compat (AArch32) definition is
16k) to provide the correct semantics for compat IPC system calls.
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memweight() is the function that counts the total number of bits set in
memory area. Unlike bitmap_weight(), memweight() takes pointer and size
in bytes to specify a memory area which does not need to be aligned to
long-word boundary.
[akpm@linux-foundation.org: rename `w' to `ret']
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Anders Larsen <al@alarsen.net>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The lp855x header is used only in the platform side, so it can be moved
into platform_data directory
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ROM boundary definitions do not need to be exported because these are
used only internally in the lp855x driver.
And few code cosmetic changes
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all KERN_<LEVEL> uses are prefixed with ASCII SOH, there is no
need for a KERN_CONT. Keep it backward compatible by adding #define
KERN_CONT ""
Reduces kernel image size a thousand bytes.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of "<.>", use an ASCII SOH for the KERN_<LEVEL> prefix initiator.
This saves 1 byte per printk, thousands of bytes in a normal kernel.
No output changes are produced as vprintk_emit converts these uses to
"<.>".
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Separate the printk.h file into 2 pieces so the definitions can be used in
asm files.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current form of a KERN_<LEVEL> is "<.>".
Add printk_get_level and printk_skip_level functions to handle these
formats.
These functions centralize tests of KERN_<LEVEL> so a future modification
can change the KERN_<LEVEL> style and shorten the number of bytes consumed
by these headers.
[akpm@linux-foundation.org: fix build error and warning]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the suspend/resume path the boot CPU does not go though an
offline->online transition. This breaks the NMI detector post-resume
since it depends on PMU state that is lost when the system gets
suspended.
Fix this by forcing a CPU offline->online transition for the lockup
detector on the boot CPU during resume.
To provide more context, we enable NMI watchdog on Chrome OS. We have
seen several reports of systems freezing up completely which indicated
that the NMI watchdog was not firing for some reason.
Debugging further, we found a simple way of repro'ing system freezes --
issuing the command 'tasket 1 sh -c "echo nmilockup > /proc/breakme"'
after the system has been suspended/resumed one or more times.
With this patch in place, the system freeze result in panics, as
expected.
These panics provide a nice stack trace for us to debug the actual issue
causing the freeze.
[akpm@linux-foundation.org: fiddle with code comment]
[akpm@linux-foundation.org: make lockup_detector_bootcpu_resume() conditional on CONFIG_SUSPEND]
[akpm@linux-foundation.org: fix section errors]
Signed-off-by: Sameer Nanda <snanda@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With addition of dummy clk_*() calls for non CONFIG_HAVE_CLK cases in
clk.h, there is no need to have clk code enclosed in #ifdef
CONFIG_HAVE_CLK, #endif macros.
Marvell usb also has these dummy macros defined locally. Remove them as
they aren't required anymore.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: viresh kumar <viresh.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many drivers are shared between architectures that may or may not have
HAVE_CLK selected for them. To remove compilation errors for them we
enclose clk_*() calls in these drivers within #ifdef CONFIG_HAVE_CLK,
#endif.
This patch removes the need of these CONFIG_HAVE_CLK statements, by
introducing dummy routines when HAVE_CLK is not selected by platforms.
So, definition of these routines will always be available. These calls
will return error for platforms that don't select HAVE_CLK.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Garzik <jgarzik@redhat.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Bhupesh Sharma <bhupesh.sharma@st.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: viresh kumar <viresh.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When suid_dumpable=2, detect unsafe core_pattern settings and warn when
they are seen.
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is simply cleanup that will keep things more closely synced with the
userland code.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
To allow apps to limit a hw-freq-seek to a specific band, for further
info see the documentation this patch adds for these new fields.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This adds the usual core support code for this new ioctl.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a new ioctl to enumerate the supported frequency bands of a tuner.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch exports symbols needed by the v4 module. In addition, I also
switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or
CONFIG_NFS_V4_MODULE are set.
The module (nfs4.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v4.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to
initialize the ACL client only when we are using v3.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
easy access to rpc_ops that used to be difficult to reach. This allows
me to set up a different mount path for NFS v2/3 and NFS v4.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds in the code to track multiple versions of the NFS
protocol. I created default structures for v2, v3 and v4 so that each
version can continue to work while I convert them into kernel modules.
I also removed the const parameter from the rpc_version array so that I
can change it at runtime.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds new V4L2_CAP_VIDEO_M2M and V4L2_CAP_VIDEO_M2M_MPLANE
capability flags that are intended to be used for memory-to-memory (M2M)
devices, instead of ORed V4L2_CAP_VIDEO_CAPTURE and V4L2_CAP_VIDEO_OUTPUT.
V4L2_CAP_VIDEO_M2M flag is added at the drivers, CAPTURE and OUTPUT
capability flags are left untouched and will be removed in future,
after a transition period required for existing applications to be
adapted to check only for V4L2_CAP_VIDEO_M2M.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Kamil Debski <k.debski@samsung.com>
Acked-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
add hardware clipping support for VPIF output data. This
is needed as it is possible that the external encoder
might get confused between the FF or 00 which are a part
of the data and that of the SAV or EAV codes.
Signed-off-by: Manjunath Hadli <manjunath.hadli@ti.com>
Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
After IP route cache removal, rt_cache_rebuild_count is no longer
used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit c6cffba4ff (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.
crashes happen on tcp workloads if routes are added/deleted at the same
time.
The dst_free() calls from free_fib_info_rcu() are clearly racy.
We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :
Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst
Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we dont increase a zero refcount (On a dst currently
waiting an rcu grace period before destruction)
rt_cache_route() must take a reference on the new cached route, and
release it if was not able to install it.
With this patch, my machines survive various benchmarks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.
CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make default vm_ops provide ->page_mkwrite handler. Currently it only updates
file's modification times and gets locked page but later it will also handle
filesystem freezing.
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some devices which use the tea575x tuner chip don't allow direct control
over the IO pins, and thus cannot mute the audio output.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Some devices which use the tea575x tuner chip don't allow bit banging the
lines, instead they offer a method to directly set / get the contents of the
25 bit shift-register in the chip. Notably the Griffin radioSHARK USB radio
receiver does this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQFgcxAAoJENkgDmzRrbjxQKkP/A0Vuz9ltiRuaMys5Q6nwPYz
pHB1VfTNhCAI+fW8AY3imZD6RngBRk9frezRGkhTwxM33QRPAfZOLUQd0d6PoVD5
rTw8uuA1PwuQIsKAHrFZ27nrdqKb6bWid7tW8ABfpxQzQrVAm3PeUFfvnTT9BsIU
OGiZXQkmojgnuIzJjd1wCBroWCrVyIyR4SJcx6ODYlUNOYdwVbi+jiohsZ23iG5h
cAYQna4y4UDK16LmTHg7xxnB7WDCD1ZZ1gBNeJBM+f0F1S2xxK50NSpqGoq367NN
6QRsZM9mqofMRZgpYSe6Yl9wd/STiMFysNiQHwhSkvhnT+Sf2gpAcZZ570gGdNMN
CR90RcaK+IDJ9i4XMs2LVcjnt3uEznpKhgpq0Lvnw2S0P7KUCbR8ti+jR8LZR/8F
eSVcvEJKOzZSWp6P6BHSHK1K/D/b31/qKXEESAnGGVNRheq4o5jA2TvyzDCAAohH
Y70KCNYO/kqiZGM1bbb/VvmiiaD/aYRHQOg8Ef7wjVE/0HoMRR5xQeW7tonw2krZ
35//eT0AJimAhdZCaw+JkANw+wI+bIQGc8IjHZhvWzSRWH0Fb5SJAWhV7urD4O6K
Nbzclzx2aB6ZNmIsh9cCaJ4a+/i0jO1ql/VJYwBMmwcRLlrYdjLLX/SXcBCY/KFj
yFPzDMqQCOJm9grTxn1m
=kIHV
-----END PGP SIGNATURE-----
Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
Pull virtio update from Rusty Russell:
"Virtio patches, mainly hotplugging fixes."
* tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
virtio-blk: return VIRTIO_BLK_F_FLUSH to header.
virtio-blk: allow toggling host cache between writeback and writethrough
virtio-blk: Use block layer provided spinlock
virtio-blk: Reset device after blk_cleanup_queue()
virtio-blk: Call del_gendisk() before disable guest kick
virtio: rng: s3/s4 support
virtio: rng: split out common code in probe / remove for s3/s4 ops
virtio: rng: don't wait on host when module is going away
virtio: rng: allow tasks to be killed that are waiting for rng input
virtio ids: fix comment for virtio-rng
We have support for a few new drivers:
- Samsung s2mps11
- Wolfson Microelectronics wm5102 and wm5110
- Marvell 88PM800 and 88PM805
- TI twl6041
We also have our regular driver improvements:
- Device tree and IRQ domain support for STE AB8500
- Regmap and devm_* API conversion for TI tps6586x
- Device tree support for Samsung max77686
- devm_* API conversion for STE AB3100
Besides that, quite a lot of fixing and cleanup for mc13xxx, tps65910,
tps65090, da9052 and twl-core.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQFpGVAAoJEIqAPN1PVmxKFNoP/1dkYngrxxV6cxdyLJ74APhG
lVKPgaDxQhdgfwCZJmMeZK1UphZo80cWnEXG6sTHZUEQdTaslSJu5SuPfUM+fo7e
52/dU0nx0ZE04pwPQLHbidS4TmHlbLg9oM2kmIf9RO5rg34GodwVgRL/4+k1qvhz
aWYJt9erFhQOpqaSX66mXHSuhvzHypbcE7d2B2Ykmh3NoYiH2w9H9KmIbbb+ZLq8
+Bp/i5Ys/vfooo+8IE2w6KZfIzMwsmmtWjjr/38yuQJaKZCh/zn23DM9HsdrVf++
RzfniRF4YBxmeKi7zi8MFIYys8filTCXA9dXbGSAKCuUCT37yZRnUxTeN1bn7Bux
A7KRpG7pUKQKVKqCTndvK5LcQKlT33XyW2ZzV1wVWX2JkCJ+gilPeykb8IabNvGX
nIp0STEGR/WdCLEAKo8pJF7Usn0RuUzAug02SG/mQ6dpnLoZqp0Od5W7gRhT7M7h
hXr/xKJ6cG5YwicpAdy5kJJ0dRgQrtaHwxrF0B68AXZ7CmAtkPuEGCYhUCFnGQUH
XJ0CodAqqVBRyYiQS4zIpIh2nqhIdsgv4OC1+kVLbubQk+PR88zG9Jvg6i1HQi/A
OHi7N5Wite3YUrs3sBzDKnEc/Il2YRhVaz2SLVNfZR0PS7hywHN3rK/tVFINTgei
jNEz1H6hu7ToNLfs0UzP
=c28c
-----END PGP SIGNATURE-----
Merge tag 'mfd-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFD bits from Samuel Ortiz:
"We have support for a few new drivers:
- Samsung s2mps11
- Wolfson Microelectronics wm5102 and wm5110
- Marvell 88PM800 and 88PM805
- TI twl6041
We also have our regular driver improvements:
- Device tree and IRQ domain support for STE AB8500
- Regmap and devm_* API conversion for TI tps6586x
- Device tree support for Samsung max77686
- devm_* API conversion for STE AB3100
Besides that, quite a lot of fixing and cleanup for mc13xxx, tps65910,
tps65090, da9052 and twl-core."
Fix up mostly trivial conflicts, with the exception of
drivers/usb/host/ehci-omap.c in particular, which had some
re-organization of the reset sequence (commit 1a49e2ac96: "EHCI:
centralize controller initialization") that clashed with commit
2761a63945 ("mfd: USB: Fix the omap-usb EHCI ULPI PHY reset fix
issues").
In particular, commit 2761a63945 moved the usb_add_hcd() to the
*middle* of the reset sequence, which clashes fairly badly with the
reset sequence re-organization (although it could have been done inside
the new omap_ehci_init() function).
I left that part of commit 2761a63945 just undone.
* tag 'mfd-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (110 commits)
mfd: Ensure AB8500 platform data is passed through db8500-prcmu to MFD Core
mfd: Arizone core should select MFD_CORE
mfd: Fix arizona-irq.c build by selecting REGMAP_IRQ
mfd: Add debug trace on entering and leaving arizone runtime suspend
mfd: Correct tps65090 cell names
mfd: Remove gpio support from tps6586x core driver
ARM: tegra: defconfig: Enable tps6586x gpio
gpio: tps6586x: Add gpio support through platform driver
mfd: Cache tps6586x register through regmap
mfd: Use regmap for tps6586x register access.
mfd: Use devm managed resources for tps6586x
input: Add onkey support for 88PM80X PMIC
mfd: Add support for twl6041
mfd: Fix twl6040 revision information
mfd: Matches should be NULL when populate anatop child devices
input: ab8500-ponkey: Create AB8500 domain IRQ mapping
mfd: Add missing out of memory check for pcf50633
Documentation: Describe the AB8500 Device Tree bindings
mfd: Add tps65910 32-kHz-crystal-input init
mfd: Drop modifying mc13xxx driver's id_table in probe
...
Pull x86 platform driver updates from Matthew Garrett:
"Nothing overly dramatic here - improved support for the Classmate,
some random small fixes and a rework of backlight management to deal
with some of the more awkward cases."
* 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86:
thinkpad_acpi: Free hotkey_keycode_map after unregistering tpacpi_inputdev
thinkpad_acpi: Fix a memory leak during module exit
thinkpad_acpi: Flush the workqueue before freeing tpacpi_leds
dell-laptop: Add 6 machines to touchpad led quirk
ACER: Fix Smatch double-free issue
ACER: Fix up sparse warning
asus-nb-wmi: add some video toggle keys
asus-nb-wmi: add wapf quirk for ASUS machines
classmate-laptop: Fix extra keys hardware id.
classmate-laptop: Add support for Classmate V4 accelerometer.
asus-wmi: enable resume on lid open
asus-wmi: control backlight power through WMI, not ACPI
samsung-laptop: support R40/R41
acpi/video_detect: blacklist samsung x360
samsung-laptop: X360 ACPI backlight device is broken
drivers-platform-x86: use acpi_video_dmi_promote_vendor()
acpi: add a way to promote/demote vendor backlight drivers
ACER: Add support for accelerometer sensor
asus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID.
Pull MIPS updates from Ralf Baechle:
"More hardware support across the field including a bunch of device
drivers. The highlight however really are further steps towards
device tree.
This has been sitting in -next for ages. All MIPS _defconfigs have
been tested to boot or where I don't have hardware available, to at
least build fine."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (77 commits)
MIPS: Loongson 1B: Add defconfig
MIPS: Loongson 1B: Add board support
MIPS: Netlogic: early console fix
MIPS: Netlogic: Fix indentation of smpboot.S
MIPS: Netlogic: remove cpu_has_dc_aliases define for XLP
MIPS: Netlogic: Remove unused pcibios_fixups
MIPS: Netlogic: Add XLP SoC devices in FDT
MIPS: Netlogic: Add IRQ mappings for more devices
MIPS: Netlogic: USB support for XLP
MIPS: Netlogic: XLP PCIe controller support.
MIPS: Netlogic: Platform changes for XLR/XLS I2C
MIPS: Netlogic: Platform NAND/NOR flash support
MIPS: Netlogic: Platform changes for XLS USB
MIPS: Netlogic: Remove NETLOGIC_ prefix
MIPS: Netlogic: SMP wakeup code update
MIPS: Netlogic: Update comments in smpboot.S
MIPS: BCM63XX: Add 96328avng reference board
MIPS: Expose PCIe drivers for MIPS
MIPS: BCM63XX: Add PCIe Support for BCM6328
MIPS: BCM63XX: Move the PCI initialization into its own function
...
Pull SLAB changes from Pekka Enberg:
"Most of the changes included are from Christoph Lameter's "common
slab" patch series that unifies common parts of SLUB, SLAB, and SLOB
allocators. The unification is needed for Glauber Costa's "kmem
memcg" work that will hopefully appear for v3.7.
The rest of the changes are fixes and speedups by various people."
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (32 commits)
mm: Fix build warning in kmem_cache_create()
slob: Fix early boot kernel crash
mm, slub: ensure irqs are enabled for kmemcheck
mm, sl[aou]b: Move kmem_cache_create mutex handling to common code
mm, sl[aou]b: Use a common mutex definition
mm, sl[aou]b: Common definition for boot state of the slab allocators
mm, sl[aou]b: Extract common code for kmem_cache_create()
slub: remove invalid reference to list iterator variable
mm: Fix signal SIGFPE in slabinfo.c.
slab: move FULL state transition to an initcall
slab: Fix a typo in commit 8c138b "slab: Get rid of obj_size macro"
mm, slab: Build fix for recent kmem_cache changes
slab: rename gfpflags to allocflags
slub: refactoring unfreeze_partials()
slub: use __cmpxchg_double_slab() at interrupt disabled place
slab/mempolicy: always use local policy from interrupt context
slab: Get rid of obj_size macro
mm, sl[aou]b: Extract common fields from struct kmem_cache
slab: Remove some accessors
slab: Use page struct fields instead of casting
...
Pull security subsystem bugfixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: fix selinux_inode_setxattr oops
KEYS: linux/key-type.h needs linux/errno.h
smack: off by one error
- Flip the thin target into new read-only or failed modes if errors
are detected;
- Handle chunk sizes that are not powers of two in the snapshot and
thin targets;
- Provide a way for userspace to avoid replacing an already-loaded
multipath hardware handler while booting;
- Reduce dm_thin_endio_hook slab size to avoid allocation failures;
- Numerous small changes and cleanups to the code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQFoJTAAoJEK2W1qbAHj1nh2MP/2c461igABbx6kAJMhSCm+Zo
xiZ97r3dFx4l4+myCfC5dQE+x07zFzPh1r/E1Dk5QXfEMRrpxzyysUGK4rtVwdPF
t4V5jgu5mOO2E1Gk4cmPjYAG+n198P7b+eiMMP0hms8t4kbXaaRfCDklY7sxvOo2
eOdUp/tAPkQWnflHEj7yPHH2WZNrD4MW8OiGnFt22kFwe9RKcm2zPavO8XrnMnco
FiRCahIW1LV+8kPtpNksy1DjTjafJ7S19lNFPC7ZChFN1MKxlIcTJ95XDKqbXAsc
eU2hTc9yBZEp1Fy3dvqoZ3g1YZB7zg2yi6wmIRLuYVuu4W7BQczfCa3VN5+s3fs9
oX2gDnu5GPDcd/FAUeDQRZLScw8dHOIaubDeqMZ4GYA1ns1BliMotArlew13D9/S
YFIrGaMmN2jcmlwNqfzjT4UJUbHnE8lr5FcLixopGmfWl1bc3FkQZ7eXnWBgGX6P
X2SaAgvvEoyEjomDbM+TdnLglBb7GLCer9grOoqw+Y2z6k9RqQTISo6LOQ0Gh1U9
nr1rAaHehFuAOl1BI8BO2CdAtjdybNXrtanfoLaHorECGHkEwojNIfvhHu3Io/YL
gbXlgY4v0j8Bb4xLJkdOsJxD2d5ACzTFPrY61/ApW5TOYn8rBYLSrmJRuIBAr8Cs
Fqd0D/wywFhK9di+4Bxw
=O9sO
-----END PGP SIGNATURE-----
Merge tag 'dm-3.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper updates from Alasdair G Kergon:
- Flip the thin target into new read-only or failed modes if errors
are detected;
- Handle chunk sizes that are not powers of two in the snapshot and
thin targets;
- Provide a way for userspace to avoid replacing an already-loaded
multipath hardware handler while booting;
- Reduce dm_thin_endio_hook slab size to avoid allocation failures;
- Numerous small changes and cleanups to the code.
* tag 'dm-3.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (63 commits)
dm thin: commit before gathering status
dm thin: add read only and fail io modes
dm thin metadata: introduce dm_pool_abort_metadata
dm thin metadata: introduce dm_pool_metadata_set_read_only
dm persistent data: introduce dm_bm_set_read_only
dm thin: reduce number of metadata commits
dm thin metadata: add dm_thin_changed_this_transaction
dm thin metadata: add format option to dm_pool_metadata_open
dm thin metadata: tidy up open and format error paths
dm thin metadata: only check incompat features on open
dm thin metadata: remove duplicate pmd initialisation
dm thin metadata: remove create parameter from __create_persistent_data_objects
dm thin metadata: move __superblock_all_zeroes to __open_or_format_metadata
dm thin metadata: remove nr_blocks arg from __create_persistent_data_objects
dm thin metadata: split __open or format metadata
dm thin metadata: use struct dm_pool_metadata members in __open_or_format_metadata
dm thin metadata: zero unused superblock uuid
dm thin metadata: lift __begin_transaction out of __write_initial_superblock
dm thin metadata: move dm_commit_pool_metadata into __write_initial_superblock
dm thin metadata: factor out __write_initial_superblock
...
Pull DMA-mapping updates from Marek Szyprowski:
"Those patches are continuation of my earlier work.
They contains extensions to DMA-mapping framework to remove limitation
of the current ARM implementation (like limited total size of DMA
coherent/write combine buffers), improve performance of buffer sharing
between devices (attributes to skip cpu cache operations or creation
of additional kernel mapping for some specific use cases) as well as
some unification of the common code for dma_mmap_attrs() and
dma_mmap_coherent() functions. All extensions have been implemented
and tested for ARM architecture."
* 'for-linus-for-3.6-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for dma_get_sgtable()
common: dma-mapping: introduce dma_get_sgtable() function
ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
common: dma-mapping: add support for generic dma_mmap_* calls
ARM: dma-mapping: fix error path for memory allocation failure
ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
ARM: dma-mapping: remove custom consistent dma region
mm: vmalloc: use const void * for caller argument
scatterlist: add sg_alloc_table_from_pages function
Pull Exynos DRM changes from Dave Airlie:
"So I totally missed Inki's pull request for -next, its fully exynos
self contained."
(I took just the actual commits, not Dave's two extraneous merges)
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (30 commits)
drm/exynos: fixed exception to page allocation failure
drm/exynos: use __free_page() to deallocate memory
drm/exynos: fixed a comment to gem size.
drm/exynos: removed unnecessary variable
drm/exynos: do not release memory region from exporter.
drm/exynos: set buffer type from exporter.
drm/exynos: use alloc_page() to allocate pages.
drm/exynos: fixed build warning.
drm/exynos: fixed edid data setting at vidi connection request
drm/exynos: check if raw edid data is fake or not for test
drm/exynos: set edid fake data only for test.
drm/exynos: removed unnecessary declaration.
drm/exynos: fix buffer pitch calculation
drm/exynos: check for null in return value of dma_buf_map_attachment()
drm/exynos: return NULL if exynos_pages_to_sg fails
drm/exynos: Use devm_* functions in exynos_mixer.c
drm/exynos: Use devm_* functions in exynos_hdmi.c
drm/exynos: Use devm_* functions in exynos_drm_fimd.c
drm/exynos: Add missing static storage class specifier
drm/exynos: add property for crtc mode
...
Pull input updates from Dmitry Torokhov:
"A new driver for FT5x06 based EDT displays and a couple of other
driver changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - handle out of bounds values from the hardware
Input: wacom - add support to Cintiq 22HD
Input: add driver for FT5x06 based EDT displays
Pull EDAC patches from Mauro Carvalho Chehab:
- the second part of the EDAC rework:
- Add the sysfs nodes that exports the real memory layout, instead
of the fake one (needed to properly represent Intel memory
controllers since 2002)
- convert EDAC MC to use "struct device" instead of creating the
sysfs nodes via the kobj API
- adds a tracepoint to represent memory errors
- some cleanup patches
- some fixes at i5000, i5400 and EDAC core
- a new EDAC driver for Caldera.
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (33 commits)
edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
edac: allow specifying the error count with fake_inject
edac: add support for Calxeda highbank L2 cache ecc
edac: add support for Calxeda highbank memory controller
edac: create top-level debugfs directory
sb_edac: properly handle error count
i7core_edac: properly handle error count
edac: edac_mc_handle_error(): add an error_count parameter
edac: remove arch-specific parameter for the error handler
amd64_edac: Don't pass driver name as an error parameter
edac_mc: check for allocation failure in edac_mc_alloc()
edac: Increase version to 3.0.0
edac_mc: Cleanup per-dimm_info debug messages
edac: Convert debugfX to edac_dbg(X,
edac: Use more normal debugging macro style
edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
Edac: Add ABI Documentation for the new device nodes
edac: move documentation ABI to ABI/testing/sysfs-devices-edac
i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
edac: change the mem allocation scheme to make Documentation/kobject.txt happy
...
- A new sysfs attribute to tell local and remote nodes apart.
Useful to set special permissions/ ownership of local nodes'
/dev/fw*, to start daemons on them (for diagnostics, management,
AV targets, VersaPHY initiator or targets...), to pick up their
GUID to use it as GUID of an SBP2 target instance, and of course
for informational purposes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQFPvkAAoJEHnzb7JUXXnQ1qkP/Akky4rnp2jtAi0f97FrrVIX
D5kf4buehiOY0NN5NupbP2ABVLG9pq6aiH0FqaUjLE/t5g0xTKUaWwIy1ybr9rNT
luN2xrAqkkA49jVtTIlE1sSn9tt7E8dRbmiggCrci2S0K0FTuG2V3EXCaACEojQQ
vPzW4Hm+PWlQRUItBYBRbMlZrxqkBAIBY+u0I8rTWMv4nwMchHzT6CeZ510fe4tA
plr3UDsbToqBRxkEIQ6E5IJe0XDP+hwcaaKwRR0v2zaZU/Xy0pJK1ZoiUJUEOmIA
c2YmbWvYQJjGawm3SFHxN+OjhrOEi4gTgwhgJ88K3MK6tCk+edB1eML+F59gYwR4
FATnNYUnAQBrcIfayxIct7aTVqmlFdtp9un0YfqQrcr9yVTHPG2RD6jc5sYW9sYG
7+Pmdkp1EzPBpgPM8/BwsffEwvDtFjkmQqru6J/nVuyTfDZnUTwncndY0iChVR2F
kVcgZicZzYK34yER+yUGcFjEEnlSuQZ/l0CXB6r4t5KHrDc3os4gHnYOFgDnppvk
nXLKyADeIyvQpDhsZlyLOwtsoYQbhqlmTc17yukW+wrgXZet8Uh4qE1Q1UW2Mm1y
OsRCO/0uNU4eg7IAWHL7+mCHGG15CWpmj1Xp6+apj9fRTXi6tcI5/M+U6yz3Nkfw
ChV+ObvejQmBQadE1g81
=8e6I
-----END PGP SIGNATURE-----
Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Stefan Richter:
- Small fixes and optimizations.
- A new sysfs attribute to tell local and remote nodes apart.
Useful to set special permissions/ ownership of local nodes'
/dev/fw*, to start daemons on them (for diagnostics, management,
AV targets, VersaPHY initiator or targets...), to pick up their
GUID to use it as GUID of an SBP2 target instance, and of course
for informational purposes.
* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: document is_local sysfs attribute
firewire: core: add is_local sysfs device attribute
firewire: ohci: initialize multiChanMode bits after reset
firewire: core: fix multichannel IR with buffers larger than 2 GB
firewire: ohci: sanity-check MMIO resource
firewire: ohci: lazy bus time initialization
firewire: core: allocate the low memory region
firewire: core: make address handler length 64 bits
This adds a new utility routine which will return a dynamically-
allocated buffer containing a string that has been decoded from ceph
over-the-wire format. It also returns the length of the string
if the address of a size variable is supplied to receive it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There is a BUG_ON() call that doesn't account for the single byte
structure version at the start of an encoded filepath in
ceph_encode_filepath(). Fix that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Add an atomic variable 'stopping' as flag in struct ceph_messenger,
set this flag to 1 in function ceph_destroy_client(), and add the condition code
in function ceph_data_ready() to test the flag value, if true(1), just return.
Signed-off-by: Guanjun He <gjhe@suse.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In ancient times, the messenger could both initiate and accept connections.
An artifact if that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.
Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retries with the same connect_seq as last
time. This bug pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.
Signed-off-by: Sage Weil <sage@inktank.com>
Initialize the type field for messages in a msgpool. The caller was doing
this for osd ops, but not for the reply messages.
Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Pull PWM subsystem from Thierry Reding:
"The new PWM subsystem aims at collecting all implementations of the
legacy PWM API and to eventually replace it completely.
The subsystem has been in development for over half a year now and
many drivers have already been converted. It has been in linux-next
for a couple of weeks and there have been no major issues so I think
it is ready for inclusion in your tree."
Arnd Bergmann <arnd@arndb.de>:
"Very much Ack on the new subsystem. It uses the interface
declarations as the previously separate pwm drivers, so nothing
changes for now in the drivers using it, although it enables us to
change those more easily in the future if we want to.
This work is also one of the missing pieces that are required to
eventually build ARM kernels for multiple platforms, which is
currently prohibited (amongs other things) by the fact that you cannot
have more than one driver exporting the pwm functions."
Tested-and-acked-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Philip, Avinash <avinashphilip@ti.com> # TI's AM33xx platforms
Acked-By: Alexandre Pereira da Silva <aletes.xgr@gmail.com> # LPC32XX
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sachin Kamat <sachin.kamat@linaro.org>
Fix up trivial conflicts with other cleanups and DT updates.
* 'for-3.6' of git://gitorious.org/linux-pwm/linux-pwm: (36 commits)
pwm: pwm-tiehrpwm: PWM driver support for EHRPWM
pwm: pwm-tiecap: PWM driver support for ECAP APWM
pwm: fix used-uninitialized warning in pwm_get()
pwm: add lpc32xx PWM support
pwm_backlight: pass correct brightness to callback
pwm: Use pr_* functions in pwm-samsung.c file
pwm: Convert pwm-samsung to use devm_* APIs
pwm: Convert pwm-tegra to use devm_clk_get()
pwm: pwm-mxs: Return proper error if pwmchip_remove() fails
pwm: pwm-bfin: Return proper error if pwmchip_remove() fails
pwm: pxa: Propagate pwmchip_remove() error
pwm: Convert pwm-pxa to use devm_* APIs
pwm: Convert pwm-vt8500 to use devm_* APIs
pwm: Convert pwm-imx to use devm_* APIs
pwm: Conflict with legacy PWM API
pwm: pwm-mxs: add pinctrl support
pwm: pwm-mxs: use devm_* managed functions
pwm: pwm-mxs: use global reset function stmp_reset_block
pwm: pwm-mxs: encode soc name in compatible string
pwm: Take over maintainership of the PWM subsystem
...
This patch adds DMA_ATTR_SKIP_CPU_SYNC attribute to the DMA-mapping
subsystem.
By default dma_map_{single,page,sg} functions family transfer a given
buffer from CPU domain to device domain. Some advanced use cases might
require sharing a buffer between more than one device. This requires
having a mapping created separately for each device and is usually
performed by calling dma_map_{single,page,sg} function more than once
for the given buffer with device pointer to each device taking part in
the buffer sharing. The first call transfers a buffer from 'CPU' domain
to 'device' domain, what synchronizes CPU caches for the given region
(usually it means that the cache has been flushed or invalidated
depending on the dma direction). However, next calls to
dma_map_{single,page,sg}() for other devices will perform exactly the
same sychronization operation on the CPU cache. CPU cache sychronization
might be a time consuming operation, especially if the buffers are
large, so it is highly recommended to avoid it if possible.
DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of
the CPU cache for the given buffer assuming that it has been already
transferred to 'device' domain. This attribute can be also used for
dma_unmap_{single,page,sg} functions family to force buffer to stay in
device domain after releasing a mapping for it. Use this attribute with
care!
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
This patch adds dma_get_sgtable() function which is required to let
drivers to share the buffers allocated by DMA-mapping subsystem. Right
now the driver gets a dma address of the allocated buffer and the kernel
virtual mapping for it. If it wants to share it with other device (= map
into its dma address space) it usually hacks around kernel virtual
addresses to get pointers to pages or assumes that both devices share
the DMA address space. Both solutions are just hacks for the special
cases, which should be avoided in the final version of buffer sharing.
To solve this issue in a generic way, a new call to DMA mapping has been
introduced - dma_get_sgtable(). It allocates a scatter-list which
describes the allocated buffer and lets the driver(s) to use it with
other device(s) by calling dma_map_sg() on it.
This patch provides a generic implementation based on virt_to_page()
call. Architectures which require more sophisticated translation might
provide their own get_sgtable() methods.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch adds DMA_ATTR_NO_KERNEL_MAPPING attribute which lets the
platform to avoid creating a kernel virtual mapping for the allocated
buffer. On some architectures creating such mapping is non-trivial task
and consumes very limited resources (like kernel virtual address space
or dma consistent address space). Buffers allocated with this attribute
can be only passed to user space by calling dma_mmap_attrs().
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Commit 9adc5374 ('common: dma-mapping: introduce mmap method') added a
generic method for implementing mmap user call to dma_map_ops structure.
This patch converts ARM and PowerPC architectures (the only providers of
dma_mmap_coherent/dma_mmap_writecombine calls) to use this generic
dma_map_ops based call and adds a generic cross architecture
definition for dma_mmap_attrs, dma_mmap_coherent, dma_mmap_writecombine
functions.
The generic mmap virt_to_page-based fallback implementation is provided for
architectures which don't provide their own implementation for mmap method.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
This patch changes dma-mapping subsystem to use generic vmalloc areas
for all consistent dma allocations. This increases the total size limit
of the consistent allocations and removes platform hacks and a lot of
duplicated code.
Atomic allocations are served from special pool preallocated on boot,
because vmalloc areas cannot be reliably created in atomic context.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
'const void *' is a safer type for caller function type. This patch
updates all references to caller function type.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
This patch adds a new constructor for an sg table. The table is constructed
from an array of struct pages. All contiguous chunks of the pages are merged
into a single sg nodes. A user may provide an offset and a size of a buffer if
the buffer is not page-aligned.
The function is dedicated for DMABUF exporters which often perform conversion
from an page array to a scatterlist. Moreover the scatterlist should be
squashed in order to save memory and to speed-up the process of DMA mapping
using dma_map_sg.
The code is based on the patch 'v4l: vb2-dma-contig: add support for
scatterlist in userptr mode' and hints from Laurent Pinchart.
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Andrew Morton <akpm@linux-foundation.org>
linux/key-type.h needs to #include linux/errno.h as it refers to ENOKEY.
Without this, with sparc's allmodconfig in one of my test trees, the following
error occurs:
include/linux/key-type.h: In function 'key_negate_and_link':
include/linux/key-type.h:122:43: error: 'ENOKEY' undeclared (first use in this function)
include/linux/key-type.h:122:43: note: each undeclared identifier is reported only once for each fun
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
which exposes the cache mode in the configuration space and lets the
driver modify it. The cache mode is exposed via sysfs.
Even if the host does not support the new feature, the cache mode is
visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* devel: (33 commits)
edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
edac: allow specifying the error count with fake_inject
edac: add support for Calxeda highbank L2 cache ecc
edac: add support for Calxeda highbank memory controller
edac: create top-level debugfs directory
sb_edac: properly handle error count
i7core_edac: properly handle error count
edac: edac_mc_handle_error(): add an error_count parameter
edac: remove arch-specific parameter for the error handler
amd64_edac: Don't pass driver name as an error parameter
edac_mc: check for allocation failure in edac_mc_alloc()
edac: Increase version to 3.0.0
edac_mc: Cleanup per-dimm_info debug messages
edac: Convert debugfX to edac_dbg(X,
edac: Use more normal debugging macro style
edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
Edac: Add ABI Documentation for the new device nodes
edac: move documentation ABI to ABI/testing/sysfs-devices-edac
i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
edac: change the mem allocation scheme to make Documentation/kobject.txt happy
...
Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This adds symlink and hardlink restrictions to the Linux VFS.
Symlinks:
A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.
Some pointers to the history of earlier discussion that I could find:
1996 Aug, Zygo Blaxell
http://marc.info/?l=bugtraq&m=87602167419830&w=2
1996 Oct, Andrew Tridgell
http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
1997 Dec, Albert D Cahalan
http://lkml.org/lkml/1997/12/16/4
2005 Feb, Lorenzo Hernández García-Hierro
http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
2010 May, Kees Cook
https://lkml.org/lkml/2010/5/30/144
Past objections and rebuttals could be summarized as:
- Violates POSIX.
- POSIX didn't consider this situation and it's not useful to follow
a broken specification at the cost of security.
- Might break unknown applications that use this feature.
- Applications that break because of the change are easy to spot and
fix. Applications that are vulnerable to symlink ToCToU by not having
the change aren't. Additionally, no applications have yet been found
that rely on this behavior.
- Applications should just use mkstemp() or O_CREATE|O_EXCL.
- True, but applications are not perfect, and new software is written
all the time that makes these mistakes; blocking this flaw at the
kernel is a single solution to the entire class of vulnerability.
- This should live in the core VFS.
- This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
- This should live in an LSM.
- This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)
Hardlinks:
On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.
The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.
Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].
This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Patch works around the below silicon errata:
During LCDC initialization, there is the potential for a FIFO
underflow condition to occur. A FIFO underflow condition
occurs when the input FIFO is completely empty and the LCDC
raster controller logic that drives data to the output pins
attempts to fetch data from the FIFO. When a FIFO underflow
condition occurs, incorrect data will be driven out on the
LCDC data pins.
Software should poll the FUF bit field in the LCD_STAT register
to check if an error condition has occurred or service the
interrupt if FUF_EN is enabled when FUF occurs. If the FUF bit
field has been set to 1, this will indicate an underflow
condition has occurred and then the software should execute a
reset of the LCDC via the LPSC.
This problem may occur if the LCDC FIFO threshold size
(LCDDMA_CTRL[TH_FIFO_READY]) is left at its default value after
reset. Increasing the FIFO threshold size will reduce or
eliminate underflows. Setting the threshold size to 256 double
words or larger is recommended.
Above issue is described in section 2.1.3 of silicon errata
http://www.ti.com/lit/er/sprz313e/sprz313e.pdf
Signed-off-by: Rajashekhara, Sudhakar <sudhakar.raj@ti.com>
Signed-off-by: Manjunathappa, Prakash <prakash.pm@ti.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Pull embedded i2c changes from Wolfram Sang:
"Changes for the "embedded" part of the I2C subsystem:
- lots of devicetree conversions of drivers (and preparations for
that)
- big cleanups for drivers for OMAP, Tegra, Nomadik, Blackfin
- Rafael's struct dev_pm_ops conversion patches for I2C
- usual driver cleanups and fixes
All patches have been in linux-next for an apropriate time and all
patches touching files outside of i2c-folders should have proper acks
from the maintainers."
* 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (60 commits)
Revert "i2c: tegra: convert normal suspend/resume to *_noirq"
I2C: MV64XYZ: Add Device Tree support
i2c: stu300: use devm managed resources
i2c: i2c-ocores: support for 16bit and 32bit IO
V4L/DVB: mfd: use reg_shift instead of regstep
i2c: i2c-ocores: Use reg-shift property
i2c: i2c-ocores: DT bindings and minor fixes.
i2c: mv64xxxx: remove EXPERIMENTAL tag
i2c-s3c2410: Use plain pm_runtime_put()
i2c: s3c2410: Fix pointer type passed to of_match_node()
i2c: mxs: Set I2C timing registers for mxs-i2c
i2c: i2c-bfin-twi: Move blackfin TWI register access Macro to head file.
i2c: i2c-bfin-twi: Move TWI peripheral pin request array to platform data.
i2c:i2c-bfin-twi: include twi head file
i2c:i2c-bfin-twi: TWI fails to restart next transfer in high system load.
i2c: i2c-bfin-twi: Tighten condition when failing I2C transfer if MEN bit is reset unexpectedly.
i2c: i2c-bfin-twi: Break dead waiting loop if i2c device misbehaves.
i2c: i2c-bfin-twi: Improve the patch for bug "Illegal i2c bus lock upon certain transfer scenarios".
i2c: i2c-bfin-twi: Illegal i2c bus lock upon certain transfer scenarios.
i2c-mv64xxxx: allow more than one driver instance
...
Conflicts:
drivers/i2c/busses/i2c-nomadik.c
Instead of adding a big blacklist in video_detect.c to set
ACPI_VIDEO_BACKLIGHT_DMI_VENDOR correctly, let external modules
promote or demote themselves when they know the generic video
module won't work.
Currently drivers where using acpi_video_unregister() directly
but:
- That didn't respect any acpi_backlight=[video|vendor] parameter
provided by the user.
- Any later call to acpi_video_register() would still re-load the
generic video module (and some gpu drivers are doing that).
This patch fix those two issues.
Signed-off-by: Corentin Chary <corentin.chary@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Pull ARM updates from Russell King:
"First ARM push of this merge window, post me coming back from holiday.
This is what has been in linux-next for the last few weeks. Not much
to say which isn't described by the commit summaries."
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (32 commits)
ARM: 7463/1: topology: Update cpu_power according to DT information
ARM: 7462/1: topology: factorize the update of sibling masks
ARM: 7461/1: topology: Add arch_scale_freq_power function
ARM: 7456/1: ptrace: provide separate functions for tracing syscall {entry,exit}
ARM: 7455/1: audit: move syscall auditing until after ptrace SIGTRAP handling
ARM: 7454/1: entry: don't bother with syscall tracing on ret_from_fork path
ARM: 7453/1: audit: only allow syscall auditing for pure EABI userspace
ARM: 7452/1: delay: allow timer-based delay implementation to be selected
ARM: 7451/1: arch timer: implement read_current_timer and get_cycles
ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs
ARM: 7449/1: use generic strnlen_user and strncpy_from_user functions
ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration
ARM: 7447/1: rwlocks: remove unused branch labels from trylock routines
ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation
ARM: 7445/1: mm: update CONTEXTIDR register to contain PID of current process
ARM: 7444/1: kernel: add arch-timer C3STOP feature
ARM: 7460/1: remove asm/locks.h
ARM: 7439/1: head.S: simplify initial page table mapping
ARM: 7437/1: zImage: Allow DTB command line concatenation with ATAG_CMDLINE
ARM: 7436/1: Do not map the vectors page as write-through on UP systems
...
Passed network namespace replaced hard-coded init_net
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is required for per-network NLM shutdown and cleanup.
This patch passes init_net for a while.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Here's the "tiny" set of patches for 3.6-rc1 for the tty layer and
serial drivers. They were cherry-picked from the tty-next branch of the
tty git tree, as they are small and "obvious" fixes. The larger
changes, as mentioned before, will be saved for the 3.7-rc1 merge
window.
All of these changes have been in the linux-next releases for quite a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAlAS5OkACgkQMUfUDdst+ykYLgCfdCz1v64yvk21r8un9fa3JXYW
dIIAoNcIeLHsUOcmMSp0JN2oZkBNZOb7
=y7gx
-----END PGP SIGNATURE-----
Merge tag 'tty-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY/Serial patches from Greg Kroah-Hartman:
"Here's the "tiny" set of patches for 3.6-rc1 for the tty layer and
serial drivers. They were cherry-picked from the tty-next branch of
the tty git tree, as they are small and "obvious" fixes. The larger
changes, as mentioned before, will be saved for the 3.7-rc1 merge
window.
All of these changes have been in the linux-next releases for quite a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'tty-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
pch_uart: Fix parity setting issue
pch_uart: Fix rx error interrupt setting issue
pch_uart: Fix missing break for 16 byte fifo
tty ldisc: Close/Reopen race prevention should check the proper flag
pch_uart: Add eg20t_port lock field, avoid recursive spinlocks
vt: fix race in vt_waitactive()
serial/of-serial: Add LPC3220 standard UART compatible string
serial/8250: Add LPC3220 standard UART type
serial_core: Update buffer overrun statistics.
serial: samsung: Fixed wrong comparison for baudclk_rate
Pull final kmap_atomic cleanups from Cong Wang:
"This should be the final round of cleanup, as the definitions of enum
km_type finally get removed from the whole tree. The patches have
been in linux-next for a long time."
* 'kmap_atomic' of git://github.com/congwang/linux:
pipe: remove KM_USER0 from comments
vmalloc: remove KM_USER0 from comments
feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
tile: remove km_type definitions
um: remove km_type definitions
asm-generic: remove km_type definitions
avr32: remove km_type definitions
frv: remove km_type definitions
powerpc: remove km_type definitions
arm: remove km_type definitions
highmem: remove the deprecated form of kmap_atomic
tile: remove usage of enum km_type
frv: remove the second parameter of kmap_atomic_primary()
jbd2: remove the second argument of kmap_atomic
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.
The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.
The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Allow targets to override the 'supports flush' calculation.
Set 'flush_supported' if a target needs to receive flushes regardless of
whether or not its underlying devices have support.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch introduces a new variable split_discard_requests. It can be
set by targets so that discard requests are split on max_io_len
boundaries.
When split_discard_requests is not set, discard requests are only split on
boundaries between targets, as was the case before this patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.
Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove unused dm_flush_fn .flush target method from header.
This was left-over from the FLUSH/FUA conversion and is no longer used.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Pull LED subsystem update from Bryan Wu.
* 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: (50 commits)
leds-lp8788: forgotten unlock at lp8788_led_work
LEDS: propagate error codes in blinkm_detect()
LEDS: memory leak in blinkm_led_common_set()
leds: add new lp8788 led driver
LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS
leds: max8997: Simplify max8997_led_set_mode implementation
leds/leds-s3c24xx: use devm_gpio_request
leds: convert Network Space v2 LED driver to devm_kzalloc() and cleanup error exit path
leds: convert DAC124S085 LED driver to devm_kzalloc()
leds: convert LM3530 LED driver to devm_kzalloc() and cleanup error exit path
leds: convert TCA6507 LED driver to devm_kzalloc()
leds: convert Freescale MC13783 LED driver to devm_kzalloc() and cleanup error exit path
leds: convert ADP5520 LED driver to devm_kzalloc() and cleanup error exit path
leds: convert PCA955x LED driver to devm_kzalloc() and cleanup error exit path
leds: convert Sun Fire LED driver to devm_kzalloc() and cleanup error exit path
leds: convert PCA9532 LED driver to devm_kzalloc()
leds: convert LT3593 LED driver to devm_kzalloc()
leds: convert Renesas TPU LED driver to devm_kzalloc() and cleanup error exit path
leds: convert LP5523 LED driver to devm_kzalloc() and cleanup error exit path
leds: convert PCA9633 LED driver to devm_kzalloc()
...
The exynos drm driver used a specific ioctl - DRM_EXYNOS_PLANE_SET_ZPOS
to set zpos of plane. It can be substitute to property of plane. This
patch adds a property for plane zpos and removes
DRM_EXYNOS_PLANE_SET_ZPOS ioctl.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Pull networking updates and fixes from David Miller:
1) Reinstate the no-ref optimization for input route lookups in ipv4 to
fix some routing cache removal perf regressions.
2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.
3) Get RX hash value from correct place in be2net driver, from
Sarveshwar Bandi.
4) Validation of FIB cached routes missing critical check, from Eric
Dumazet.
5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
ipv6: Early TCP socket demux
ipv4: Fix input route performance regression.
pch_gbe: vlan skb len fix
pch_gbe: add extra clean tx
pch_gbe: fix transmit watchdog timeout
ixgbe: fix panic while dumping packets on Tx hang with IOMMU
be2net: Fix to parse RSS hash from Receive completions correctly.
net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
hyperv: Add error handling to rndis_filter_device_add()
hyperv: Add a check for ring_size value
ipv4: rt_cache_valid must check expired routes
net/pch_gpe: Cannot disable ethernet autonegation
qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
netiucv: cleanup attribute usage
net: wiznet add missing HAS_IOMEM dependency
be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
mlx4: Add support for EEH error recovery
cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
wanmain: comparing array with NULL
caif: fix NULL pointer check
...
As introduced in Rusty's commit 29c0177e6a, the function has no
parameter @len, so need to remove it from comments to avoid kernel-doc
warning:
alexs@debian:~/linux-next$ scripts/kernel-doc -man
include/linux/cpumask.h | split-man.pl /tmp/man
....
Warning(include/linux/cpumask.h:602): Excess function parameter 'len'
description in 'cpulist_parse'
and correct the function name in comments to cpulist_parse.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
main.c has initcall_level_names[] for parse_args to print in debug messages,
add comments to keep them in sync with initcalls defined in init.h.
Also add "loadable" into comment re not using *_initcall macros in
modules, to disambiguate from kernel/params.c and other builtins.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Current few cpumask functions' purposes are not quite clear. Stupid
user like myself needs to dig into details for clear function
purpose and return value.
Add few explanation for them is helpful.
Thanks for Srivatsa's comments and correction!
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
- custom binary format support from Sjur Brændeland
- groundwork for recovery and runtime pm support
- some cleanups and API simplifications
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQD75aAAoJELLolMlTRIoMM84P/RfnGhFl/mzKmDssFtwkajIb
HRrZ1/YWjAjnSOvD5FF+LrS4TtxPgx9747I1Xh49Uk7DHb6siNTDUNbFJNifWo7g
TbKo3LgLllfNXnwbunIfiGqOW2HzaCVo6vhk59fL3QAsdtCDlROegt1NbOrV8T2S
+XZlgYhhUSlIQQCgsRBVJWbwP4k5PtAYmwL1VN7ONgj2ILJDP7MhAasDMl+KeGSG
128g+aCoIkfc9vC+ghDNAfE6DsHHeVyXAWPoWHoyjiteZ1NDkGxUBdTbsrtMYM2K
ZguEISfVNcMSk10HhoYWtqYZfZUTEM18kOt/182CEwpRVRE34Z7fhcQCiCGX6u6v
E+7tNj/0qow+dcj2OtS3NEePIHKcuvjBQ09b6GED+qsmC8lENo2Ly364T6JfriNl
tv1PShvmodrBlLFusAikJKuzYFI9xgQawpL3oV0pMrEiujHqgNhuqrCQJkIWnA8d
9At2RZAMdFBFa7gd90lPicVqPR9HcGipVk7bKRyFAoqmpPLI85Nm1r9l4TPqEXpC
Otb373gX30yqWNRD9Hmmxx/+40S2odELEDN4wrPRF+cCAbmWtFey415gdoaqBESn
BiTRrHxNy2+dKsCgSUQMyn59rF552qrCb31REYAyMMImpftAaWP35OREFNn6oqj2
vzxsc4wcWZaLflrqQGj6
=vBgg
-----END PGP SIGNATURE-----
Merge tag 'remoteproc-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc update from Ohad Ben-Cohen:
- custom binary format support from Sjur Brændeland
- groundwork for recovery and runtime pm support
- some cleanups and API simplifications
Fix up conflicts in drivers/remoteproc/remoteproc_core.c due to clashes
with earlier cleanups by Sjur Brændeland (with part of the cleanups
moved into the new remoteproc_elf_loader.c file).
* tag 'remoteproc-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
MAINTAINERS: add remoteproc's git
remoteproc: Support custom firmware handlers
remoteproc: Move Elf related functions to separate file
remoteproc: Add function rproc_get_boot_addr
remoteproc: Pass struct fw to load_segments and find_rsc_table.
remoteproc: adopt the driver core's alloc/add/del/put naming
remoteproc: remove the get_by_name/put API
remoteproc: support non-iommu carveout assignment
remoteproc: simplify unregister/free interfaces
remoteproc: remove the now-redundant kref
remoteproc: maintain a generic child device for each rproc
remoteproc: allocate vrings on demand, free when not needed
This is the IPv6 missing bits for infrastructure added in commit
41063e9dd1 (ipv4: Early TCP socket demux.)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.
Reinstate the noref path when we hit a cached route in the FIB
nexthops.
With help from Eric Dumazet.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>