Including <linux/bug.h> pulls in a lot of bloat from <asm/bug.h> and
<asm-generic/bug.h> that is not needed to call the BUILD_BUG() family of
macros. Split them out into their own header, <linux/build_bug.h>.
Also correct some checkpatch.pl errors for the BUILD_BUG_ON_ZERO() and
BUILD_BUG_ON_NULL() macros by adding parentheses around the bitfield
widths that begin with a minus sign.
Link: http://lkml.kernel.org/r/20170525120316.24473-6-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Correct these checkpatch.pl errors:
|ERROR: space required before that '-' (ctx:OxO)
|#37: FILE: include/linux/bug.h:37:
|+#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
|ERROR: space required before that '-' (ctx:OxO)
|#38: FILE: include/linux/bug.h:38:
|+#define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))
I decided to wrap the bitfield expressions that begin with minus signs
in parentheses rather than insert spaces before the minus signs.
Link: http://lkml.kernel.org/r/20170525120316.24473-5-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Correct these checkpatch.pl warnings:
|WARNING: Block comments use * on subsequent lines
|#34: FILE: include/linux/bug.h:34:
|+/* Force a compilation error if condition is true, but also produce a
|+ result (of value 0 and type size_t), so the expression can be used
|WARNING: Block comments use a trailing */ on a separate line
|#36: FILE: include/linux/bug.h:36:
|+ aren't permitted). */
Link: http://lkml.kernel.org/r/20170525120316.24473-3-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
early_pfn_to_nid will return node 0 if both HAVE_ARCH_EARLY_PFN_TO_NID
and HAVE_MEMBLOCK_NODE_MAP are disabled. It seems we are safe now
because all architectures which support NUMA define one of them (with an
exception of alpha which however has CONFIG_NUMA marked as broken) so
this works as expected. It can get silently and subtly broken too
easily, though. Make sure we fail the compilation if NUMA is enabled
and there is no proper implementation for this function. If that ever
happens we know that either the specific configuration is invalid and
the fix should either disable NUMA or enable one of the above configs.
Link: http://lkml.kernel.org/r/20170704075803.15979-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rework of the cpu hotplug locking unearthed potential deadlocks with
the memory hotplug locking code.
The solution for these is to rework the memory hotplug locking code as
well and take the cpu hotplug lock before the memory hotplug lock in
mem_hotplug_begin(), but this will cause a recursive locking of the cpu
hotplug lock when the memory hotplug code calls lru_add_drain_all().
Split out the inner workings of lru_add_drain_all() into
lru_add_drain_all_cpuslocked() so this function can be invoked from the
memory hotplug code with the cpu hotplug lock held.
Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
list_lru_count_node() iterates over all memcgs to get the total number of
entries on the node but it can race with memcg_drain_all_list_lrus(),
which migrates the entries from a dead cgroup to another. This can return
incorrect number of entries from list_lru_count_node().
Fix this by keeping track of entries per node and simply return it in
list_lru_count_node().
Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wb_stat_sum() disables interrupts and calls __wb_stat_sum() which
eventually calls __percpu_counter_sum(). However, the percpu routine is
already irq-safe. Simplify the code a bit by making wb_stat_sum()
directly call percpu_counter_sum_positive() and not disable interrupts.
Also remove the now-uneeded __wb_stat_sum() which was just a wrapper
over percpu_counter_sum_positive().
Link: http://lkml.kernel.org/r/1498230681-29103-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently pg_data_t is just a struct which describes a NUMA node memory
layout. Let's keep the comment simple and remove ambiguity.
Link: http://lkml.kernel.org/r/1498220534-22717-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_huge_page_nodemask tries to allocate from any numa node in the
allowed node mask starting from lower numa nodes. This might lead to
filling up those low NUMA nodes while others are not used. We can
reduce this risk by introducing a concept of the preferred node similar
to what we have in the regular page allocator. We will start allocating
from the preferred nid and then iterate over all allowed nodes in the
zonelist order until we try them all.
This is mimicing the page allocator logic except it operates on per-node
mempools. dequeue_huge_page_vma already does this so distill the
zonelist logic into a more generic dequeue_huge_page_nodemask and use it
in alloc_huge_page_nodemask.
This will allow us to use proper per numa distance fallback also for
alloc_huge_page_node which can use alloc_huge_page_nodemask now and we
can get rid of alloc_huge_page_node helper which doesn't have any user
anymore.
Link: http://lkml.kernel.org/r/20170622193034.28972-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm, hugetlb: allow proper node fallback dequeue".
While working on a hugetlb migration issue addressed in a separate
patchset[1] I have noticed that the hugetlb allocations from the
preallocated pool are quite subotimal.
[1] //lkml.kernel.org/r/20170608074553.22152-1-mhocko@kernel.org
There is no fallback mechanism implemented and no notion of preferred
node. I have tried to work around it but Vlastimil was right to push
back for a more robust solution. It seems that such a solution is to
reuse zonelist approach we use for the page alloctor.
This series has 3 patches. The first one tries to make hugetlb
allocation layers more clear. The second one implements the zonelist
hugetlb pool allocation and introduces a preferred node semantic which
is used by the migration callbacks. The last patch is a clean up.
This patch (of 3):
Hugetlb allocation path for fresh huge pages is unnecessarily complex
and it mixes different interfaces between layers.
__alloc_buddy_huge_page is the central place to perform a new
allocation. It checks for the hugetlb overcommit and then relies on
__hugetlb_alloc_buddy_huge_page to invoke the page allocator. This is
all good except that __alloc_buddy_huge_page pushes vma and address down
the callchain and so __hugetlb_alloc_buddy_huge_page has to deal with
two different allocation modes - one for memory policy and other node
specific (or to make it more obscure node non-specific) requests.
This just screams for a reorganization.
This patch pulls out all the vma specific handling up to
__alloc_buddy_huge_page_with_mpol where it belongs.
__alloc_buddy_huge_page will get nodemask argument and
__hugetlb_alloc_buddy_huge_page will become a trivial wrapper over the
page allocator.
In short:
__alloc_buddy_huge_page_with_mpol - memory policy handling
__alloc_buddy_huge_page - overcommit handling and accounting
__hugetlb_alloc_buddy_huge_page - page allocator layer
Also note that __hugetlb_alloc_buddy_huge_page and its cpuset retry loop
is not really needed because the page allocator already handles the
cpusets update.
Finally __hugetlb_alloc_buddy_huge_page had a special case for node
specific allocations (when no policy is applied and there is a node
given). This has relied on __GFP_THISNODE to not fallback to a different
node. alloc_huge_page_node is the only caller which relies on this
behavior so move the __GFP_THISNODE there.
Not only does this remove quite some code it also should make those
layers easier to follow and clear wrt responsibilities.
Link: http://lkml.kernel.org/r/20170622193034.28972-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 394e31d2ce ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") has duplicated a large part of
alloc_migrate_target with some hotplug specific special casing.
To be more precise it tried to enfore the allocation from a different
node than the original page. As a result the two function diverged in
their shared logic, e.g. the hugetlb allocation strategy.
Let's unify the two and express different NUMA requirements by the given
nodemask. new_node_page will simply exclude the node it doesn't care
about and alloc_migrate_target will use all the available nodes.
alloc_migrate_target will then learn to migrate hugetlb pages more
sanely and use preallocated pool when possible.
Please note that alloc_migrate_target used to call alloc_page resp.
alloc_pages_current so the memory policy of the current context which is
quite strange when we consider that it is used in the context of
alloc_contig_range which just tries to migrate pages which stand in the
way.
Link: http://lkml.kernel.org/r/20170608074553.22152-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_node_page will try to use the origin's next NUMA node as the
migration destination for hugetlb pages. If such a node doesn't have
any preallocated pool it falls back to __alloc_buddy_huge_page_no_mpol
to allocate a surplus page instead. This is quite subotpimal for any
configuration when hugetlb pages are no distributed to all NUMA nodes
evenly. Say we have a hotplugable node 4 and spare hugetlb pages are
node 0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:10000
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages:10000
/sys/devices/system/node/node5/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node6/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node7/hugepages/hugepages-2048kB/nr_hugepages:0
Now we consume the whole pool on node 4 and try to offline this node.
All the allocated pages should be moved to node0 which has enough
preallocated pages to hold them. With the current implementation
offlining very likely fails because hugetlb allocations during runtime
are much less reliable.
Fix this by reusing the nodemask which excludes migration source and try
to find a first node which has a page in the preallocated pool first and
fall back to __alloc_buddy_huge_page_no_mpol only when the whole pool is
consumed.
[akpm@linux-foundation.org: remove bogus arg from alloc_huge_page_nodemask() stub]
Link: http://lkml.kernel.org/r/20170608074553.22152-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
wrapping a critical section where struct pages can be modified without
having to worry about consistency for a concurrent fast-GUP.
Whilst page_ref_freeze has full barrier semantics due to its use of
atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
doesn't provide any barrier semantics and allows the operation to be
reordered with respect to page modifications in the critical section.
This patch ensures that page_ref_unfreeze is ordered after any critical
section updates, by invoking smp_mb() prior to the atomic_set.
Link: http://lkml.kernel.org/r/1497349722-6731-3-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The madvise policy for transparent huge pages is meant to avoid unwanted
allocations of transparent huge pages. It allows a policy of disabling
the extra memory pressure and effort to arrange for a huge page when it
is not needed.
DAX by definition never incurs this overhead since it is statically
allocated. The policy choice makes even less sense for device-dax which
tries to guarantee a given tlb-fault size. Specifically, the following
setting:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
...violates that guarantee and silently disables all device-dax
instances with a 2M or 1G alignment. So, let's avoid that non-obvious
side effect by force enabling thp for dax mappings in all cases.
It is worth noting that the reason this uses vma_is_dax(), and the
resulting header include changes, is that previous attempts to add a
VM_DAX flag were NAKd.
Link: http://lkml.kernel.org/r/149739531127.20686.15813586620597484283.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Turn the macro into a static inline and rewrite the condition checks for
better readability in preparation for adding another condition.
[ross.zwisler@linux.intel.com: fix logic to make conversion equivalent]
[akpm@linux-foundation.org: resolve vs mm-make-pr_set_thp_disable-immediately-active.patch]
[akpm@linux-foundation.org: include coredump.h for MMF_DISABLE_THP]
Link: http://lkml.kernel.org/r/149739530612.20686.14760671150202647861.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
existing mapping because it only updated mm->def_flags which is a
template for new mappings.
The mappings created after prctl(PR_SET_THP_DISABLE) have VM_NOHUGEPAGE
flag set. This can be quite surprising for all those applications which
do not do prctl(); fork() & exec() and want to control their own THP
behavior.
Another usecase when the immediate semantic of the prctl might be useful
is a combination of pre- and post-copy migration of containers with
CRIU. In this case CRIU populates a part of a memory region with data
that was saved during the pre-copy stage. Afterwards, the region is
registered with userfaultfd and CRIU expects to get page faults for the
parts of the region that were not yet populated. However, khugepaged
collapses the pages and the expected page faults do not occur.
In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
temporary mechanism for enabling/disabling THP process wide.
Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
tested when decision whether to use huge pages is taken either during
page fault of at the time of THP collapse.
It should be noted, that the new implementation makes PR_SET_THP_DISABLE
master override to any per-VMA setting, which was not the case
previously.
Fixes: a0715cc226 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
Link: http://lkml.kernel.org/r/1496415802-30944-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dequeue_hwpoisoned_huge_page() is no longer used, so let's remove it.
Link: http://lkml.kernel.org/r/1496305019-5493-9-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently hugepage migrated by soft-offline (i.e. due to correctable
memory errors) is contained as a hugepage, which means many non-error
pages in it are unreusable, i.e. wasted.
This patch solves this issue by dissolving source hugepages into buddy.
As done in previous patch, PageHWPoison is set only on a head page of
the error hugepage. Then in dissoliving we move the PageHWPoison flag
to the raw error page so that all healthy subpages return back to buddy.
[arnd@arndb.de: fix warnings: replace some macros with inline functions]
Link: http://lkml.kernel.org/r/20170609102544.2947326-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1496305019-5493-5-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We'd like to narrow down the error region in memory error on hugetlb
pages. However, currently we set PageHWPoison flags on all subpages in
the error hugepage and add # of subpages to num_hwpoison_pages, which
doesn't fit our purpose.
So this patch changes the behavior and we only set PageHWPoison on the
head page then increase num_hwpoison_pages only by 1. This is a
preparation for narrow-down part which comes in later patches.
Link: http://lkml.kernel.org/r/1496305019-5493-4-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For fast flash disk, async IO could introduce overhead because of
context switch. block-mq now supports IO poll, which improves
performance and latency a lot. swapin is a good place to use this
technique, because the task is waiting for the swapin page to continue
execution.
In my virtual machine, directly read 4k data from a NVMe with iopoll is
about 60% better than that without poll. With iopoll support in swapin
patch, my microbenchmark (a task does random memory write) is about
10%~25% faster. CPU utilization increases a lot though, 2x and even 3x
CPU utilization. This will depend on disk speed.
While iopoll in swapin isn't intended for all usage cases, it's a win
for latency sensistive workloads with high speed swap disk. block layer
has knob to control poll in runtime. If poll isn't enabled in block
layer, there should be no noticeable change in swapin.
I got a chance to run the same test in a NVMe with DRAM as the media.
In simple fio IO test, blkpoll boosts 50% performance in single thread
test and ~20% in 8 threads test. So this is the base line. In above
swap test, blkpoll boosts ~27% performance in single thread test.
blkpoll uses 2x CPU time though.
If we enable hybid polling, the performance gain has very slight drop
but CPU time is only 50% worse than that without blkpoll. Also we can
adjust parameter of hybid poll, with it, the CPU time penality is
reduced further. In 8 threads test, blkpoll doesn't help though. The
performance is similar to that without blkpoll, but cpu utilization is
similar too. There is lock contention in swap path. The cpu time
spending on blkpoll isn't high. So overall, blkpoll swapin isn't worse
than that without it.
The swapin readahead might read several pages in in the same time and
form a big IO request. Since the IO will take longer time, it doesn't
make sense to do poll, so the patch only does iopoll for single page
swapin.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Avoid quotacheck deadlocks
- Fix transaction overflows when bunmapping fragmented files
- Refactor directory readahead
- Allow admin to configure if ASSERT is fatal
- Improve transaction usage detail logging during overflows
- Minor cleanups
- Don't leak log items when the log shuts down
- Remove double-underscore typedefs
- Various preparation for online scrubbing
- Introduce new error injection configuration sysfs knobs
- Refactor dq_get_next to use extent map directly
- Fix problems with iterating the page cache for unwritten data
- Implement SEEK_{HOLE,DATA} via iomap
- Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA
- Don't use MAXPATHLEN to check on-disk symlink target lengths
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJZYDw4AAoJEPh/dxk0SrTr2IMP/3JLeygIDtKBBVRPvlCmEXQC
j8w1C/ntn46zZKQ8l14fAFV4HV2d+KJWf8+yDuPuGdMXJfPeKZf95otYhnSx/9Th
MvCH7Nzg63yjEGqXpBkfIVr/GT0KTx28lxiqNViChr7XiXWookgf3SSLINO+vU4J
L2jgLqieJfijiHTBs4qGCQPDwSXVoSOi5XCCQWDYQrXz6DI5UEJc70U53WkH4tRu
RctOgp1lralwEO0PhfomD3m/Gk94taE/4ZpX/j/5Y4tvH/yh5aY3/KTCLm6+mYT3
rgMpmg5hmm+UiCTNoTnQ5RxzGZWCfI1I9FZ3HqDsbhmFtaWh32ti0dEEDYsF8Opj
ARnTty3cRx41LH9dULrVWdwW105AHgwEz8/OZlG0JOca9qzj9GKERMg/hpHINAKN
TrBlkweg86LWZDy23udZJ/v35svNqSFsqL1yV8j5dXyBi+Yi2CGfU27zbBwnj4Jk
047l+OuRbBnEOUULqJTEVBY3euoclwl/yQrW2m409s7vPGkGQBLuFCsDKQdnvJ/A
D7frZqH8XypwnhFOkKybUnBkn4P7vZ2sEuCIZMsrH5k/ys8XyEkaBaOurjvMBOKA
vLIMD6RXDWrFbOoovfK/stEM6/UFoQkgMhBe7vB9EXk1AjM8NYyWZgp5BkHtytC7
qa6GRjtGefhc67hbwXJd
=/GZI
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS updates from Darrick Wong:
"Here are some changes for you for 4.13. For the most part it's fixes
for bugs and deadlock problems, and preparation for online fsck in
some future merge window.
- Avoid quotacheck deadlocks
- Fix transaction overflows when bunmapping fragmented files
- Refactor directory readahead
- Allow admin to configure if ASSERT is fatal
- Improve transaction usage detail logging during overflows
- Minor cleanups
- Don't leak log items when the log shuts down
- Remove double-underscore typedefs
- Various preparation for online scrubbing
- Introduce new error injection configuration sysfs knobs
- Refactor dq_get_next to use extent map directly
- Fix problems with iterating the page cache for unwritten data
- Implement SEEK_{HOLE,DATA} via iomap
- Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA
- Don't use MAXPATHLEN to check on-disk symlink target lengths"
* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
xfs: don't crash on unexpected holes in dir/attr btrees
xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
xfs: fix contiguous dquot chunk iteration livelock
xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
vfs: Add iomap_seek_hole and iomap_seek_data helpers
vfs: Add page_cache_seek_hole_data helper
xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
xfs: Check for m_errortag initialization in xfs_errortag_test
xfs: grab dquots without taking the ilock
xfs: fix semicolon.cocci warnings
xfs: Don't clear SGID when inheriting ACLs
xfs: free cowblocks and retry on buffered write ENOSPC
xfs: replace log_badcrc_factor knob with error injection tag
xfs: convert drop_writes to use the errortag mechanism
xfs: remove unneeded parameter from XFS_TEST_ERROR
xfs: expose errortag knobs via sysfs
xfs: make errortag a per-mountpoint structure
xfs: free uncommitted transactions during log recovery
xfs: don't allow bmap on rt files
...
Pull copy*_iter fix from Al Viro.
[ Al used entirely the wrong return value. Oopsie. ]
* 'fix-uio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix brown paperbag bug in inlined copy_..._iter()
Pull HID updates from Jiri Kosina:
- open/close tracking improvements from Dmitry Torokhov
- battery support improvements in Wacom driver from Jason Gerecke
- Win8 support fixes from Benjamin Tissories and Hans de Geode
- misc fixes to Intel-ISH driver from Arnd Bergmann
- support for quite a few new devices and small assorted fixes here and
there
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (35 commits)
HID: intel-ish-hid: Enable Gemini Lake ish driver
HID: intel-ish-hid: Enable Cannon Lake ish driver
HID: wacom: fix mistake in printk
HID: multitouch: optimize the sticky fingers timer
HID: multitouch: fix rare Win 8 cases when the touch up event gets missing
HID: multitouch: use BIT macro
HID: Add driver for Retrode2 joypad adapter
HID: multitouch: Add support for Google Rose Touchpad
HID: multitouch: Support PTP Stick and Touchpad device
HID: core: don't use negative operands when shift
HID: apple: Use country code to detect ISO keyboards
HID: remove no longer used hid->open field
greybus: hid: remove custom locking from gb_hid_open/close
HID: usbhid: remove custom locking from usbhid_open/close
HID: i2c-hid: remove custom locking from i2c_hid_open/close
HID: serialize hid_hw_open and hid_hw_close
HID: usbhid: do not rely on hid->open when deciding to do IO
HID: hiddev: use hid_hw_power instead of usbhid_get/put_power
HID: hiddev: use hid_hw_open/close instead of usbhid_open/close
HID: asus: Add support for Zen AiO MD-5110 keyboard
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZYseIAAoJEAx081l5xIa+85kP/0zKzKKVzZXSXG2TAGb5jNfk
Ex+TELG8tWk9KBxA7lEE5c0WEsnP79cNoXZLQu8wlUzO8+kwQK5Bz0zgNUkpSuo1
RthwdsxBQX1++UxB+HoSG+dOa7hkKVqlgQR3z9qyhsBXzetkJV0DoYcpMV0A1EWd
6Jzt+AvCShVkcW+21LqHPlc5EIVewrDMoA3oU6aYCLhyAOUTVvvQB2ML8YApH7TM
JrSrzCFHTrQEBbGUrZQhzR0sZzZzk9byntb/I/mdVbHeCyIHiL8sC4PfWSOyyazm
GkPnA8G3aFAY9haBRz9jG/VBr1yVb0mCBjkWQ1lGfIAOCDDSc+d7PDXdG+i4AewK
jZheXlrDIdGgmJLy4W3rdEqJvdf7UQHZOs8594OL19l4+FxCTrol1JSHSMeavCvr
8bUNil9Jb/ONU/wmp+q55U0k4TCTyerUA7gKnuaJAwBvd4n78/PKmQnbrWinDyJc
GQXp6zESk9bKt5DXSnVZuVf4POTzpuAsQkkfX1V2y145EHTQYfS3jLENWqEjyZUy
QtKCHZvRkJfGaFU4Pr+vBo9Iu1GlA5OiOv08QadldTT4OxUI0T6yaLDobHCQfKPE
sc3wCuCM+/dAnqoKDcGC4hAmF8zDdO0kw65P2m7uC6T9Jm1G35CioKbzo+fzUhuL
fg5TBpbp2Wwe2oPA5iBm
=2S5N
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm, I think I've got one later
driver pull for mediatek SoC driver, I'm undecided on if it needs to
go to you yet.
Otherwise summary below:
Core drm:
- Atomic add driver private objects
- Deprecate preclose hook in modern drivers
- MST bandwidth tracking
- Use kvmalloc in more places
- Add mode_valid hook for crtc/encoder/bridge
- Reduce sync_file construction time
- Documentation updates
- New DRM synchronisation object support
New drivers:
- pl111 - pl111 CLCD display controller
Panel:
- Innolux P079ZCA panel driver
- Add NL12880B20-05, NL192108AC18-02D, P320HVN03 panels
- panel-samsung-s6e3ha2: Add s6e3hf2 panel support
i915:
- SKL+ watermark fixes
- G4x/G33 reset improvements
- DP AUX backlight improvements
- Buffer based GuC/host communication
- New getparam for (sub)slice infomation
- Cannonlake and Coffeelake initial patches
- Execbuf optimisations
radeon/amdgpu:
- Lots of Vega10 bug fixes
- Preliminary raven support
- KIQ support for compute rings
- MEC queue management rework
- DCE6 Audio support
- SR-IOV improvements
- Better radeon/amdgpu selection support
nouveau:
- HDMI stereoscopic support
- Display code rework for >= GM20x GPUs
msm:
- GEM rework for fine-grained locking
- Per-process pagetable work
- HDMI fixes for Snapdragon 820.
vc4:
- Remove 256MB CMA limit from vc4
- Add out-fence support
- Add support for cygnus
- Get/set tiling ioctls support
- Add T-format tiling support for scanout
zte:
- add VGA support.
etnaviv:
- Thermal throttle support for newer GPUs
- Restore userspace buffer cache performance
- dma-buf sync fix
stm:
- add stm32f429 display support
exynos:
- Rework vblank handling
- Fixup sw-trigger code
sun4i:
- V3s display engine support
- HDMI support for older SoCs
- Preliminary work on dual-pipeline SoCs.
rcar-du:
- VSP work
imx-drm:
- Remove counter load enable from PRE
- Double read/write reduction flag support
tegra:
- Documentation for the host1x and drm driver.
- Lots of staging ioctl fixes due to grate project work.
omapdrm:
- dma-buf fence support
- TILER rotation fixes"
* tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux: (1270 commits)
drm: Remove unused drm_file parameter to drm_syncobj_replace_fence()
drm/amd/powerplay: fix bug fail to remove sysfs when rmmod amdgpu.
amdgpu: Set cik/si_support to 1 by default if radeon isn't built
drm/amdgpu/gfx9: fix driver reload with KIQ
drm/amdgpu/gfx8: fix driver reload with KIQ
drm/amdgpu: Don't call amd_powerplay_destroy() if we don't have powerplay
drm/ttm: Fix use-after-free in ttm_bo_clean_mm
drm/amd/amdgpu: move get memory type function from early init to sw init
drm/amdgpu/cgs: always set reference clock in mode_info
drm/amdgpu: fix vblank_time when displays are off
drm/amd/powerplay: power value format change for Vega10
drm/amdgpu/gfx9: support the amdgpu.disable_cu option
drm/amd/powerplay: change PPSMC_MSG_GetCurrPkgPwr for Vega10
drm/amdgpu: Make amdgpu_cs_parser_init static (v2)
drm/amdgpu/cs: fix a typo in a comment
drm/amdgpu: Fix the exported always on CU bitmap
drm/amdgpu/gfx9: gfx_v9_0_enable_gfx_static_mg_power_gating() can be static
drm/amdgpu/psp: upper_32_bits/lower_32_bits for address setup
drm/amd/powerplay/cz: print message if smc message fails
drm/amdgpu: fix typo in amdgpu_debugfs_test_ib_init
...
Pull timers fixlet from Thomas Gleixner:
"Add Frederic Weisbecker as NOHZ/dyntick maintainer"
[ And an unmentioned and unrelated typo fix in the same commit? Hmm.. ]
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add Frederic Weisbecker as nohz/dyntics maintainer
Pull scheduler fixes from Thomas Gleixner:
"This scheduler update provides:
- The (hopefully) final fix for the vtime accounting issues which
were around for quite some time
- Use types known to user space in UAPI headers to unbreak user space
builds
- Make load balancing respect the current scheduling domain again
instead of evaluating unrelated CPUs"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
sched/fair: Fix load_balance() affinity redo path
sched/cputime: Accumulate vtime on top of nsec clocksource
sched/cputime: Move the vtime task fields to their own struct
sched/cputime: Rename vtime fields
sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
vtime, sched/cputime: Remove vtime_account_user()
Revert "sched/cputime: Refactor the cputime_adjust() code"
Pull perf fixes from Thomas Gleixner:
"A couple of fixes for perf and kprobes:
- Add he missing exclude_kernel attribute for the precise_ip level so
!CAP_SYS_ADMIN users get the proper results.
- Warn instead of failing completely when perf has no unwind support
for a particular architectiure built in.
- Ensure that jprobes are at function entry and not at some random
place"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Ensure that jprobe probepoints are at function entry
kprobes: Simplify register_jprobes()
kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()
perf unwind: Do not fail due to missing unwind support
perf evsel: Set attr.exclude_kernel when probing max attr.precise_ip
Pull irq fixes from Thomas Gleixner:
- A few fixes mopping up the fallout of the big irq overhaul
- Move the interrupt resource management logic out of the spin locked,
irq disabled region to avoid unnecessary restrictions of the resource
callbacks
- Preparation for reworking the per cpu irq request function.
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Allow ACPI device nodes to be used as irqdomain identifiers
genirq/debugfs: Remove redundant NULL pointer check
genirq: Allow to pass the IRQF_TIMER flag with percpu irq request
genirq/timings: Move free timings out of spinlocked region
genirq: Move irq resource handling out of spinlocked region
genirq: Add mutex to irq desc to serialize request/free_irq()
genirq: Move bus locking into __setup_irq()
genirq: Force inlining of __irq_startup_managed to prevent build failure
genirq/debugfs: Fix build for !CONFIG_IRQ_DOMAIN
feature, which allows ext4 directories to support over 2 billion
directory entries (assuming ~64 byte file names; in practice, users
will run into practical performance limits first.) This feature was
originally written by the Lustre team, and credit goes to Artem
Blagodarenko from Seagate for getting this feature upstream.
The second major major feature allows ext4 to support extended
attribute values up to 64k. This feature was also originally from
Lustre, and has been enhanced by Tahsin Erdogan from Google with a
deduplication feature so that if multiple files have the same xattr
value (for example, Windows ACL's stored by Samba), only one copy will
be stored on disk for encoding and caching efficiency.
We also have the usual set of bug fixes, cleanups, and optimizations.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllhl5AACgkQ8vlZVpUN
gaOiNQf+L23sT9KIQmFwQP38vkBVw67Eo7gBfevmk7oqQLiRppT5mmLzW8EWEDxR
PVaDQXvSZi18wSCAAcCd1ZqeIZk0P6tst0ufnIT60tGlZdUlwSLyrqvV/30axR2g
6kcnv90ZszrQNx5U8q8bMzNrs1KtyPHFCRzavFsBX11WezNSpWnH2in/uxO+t9Jy
F2zlrLUrE2m9AVMH48Dh6LbeaB6pqgr4k3jq1jG4Iqb2h9xgU8OKhs8gL07YS+Qi
5A7s8GIvYQSoZUO9DOOie2f1zhpO0KrhXchyZTJukVQH7TsmFxoSh0vhXnP1Bohu
CNLV6dzetDT0VfmPr1WhVe7lhZeeVw==
=FFkF
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"The first major feature for ext4 this merge window is the largedir
feature, which allows ext4 directories to support over 2 billion
directory entries (assuming ~64 byte file names; in practice, users
will run into practical performance limits first.) This feature was
originally written by the Lustre team, and credit goes to Artem
Blagodarenko from Seagate for getting this feature upstream.
The second major major feature allows ext4 to support extended
attribute values up to 64k. This feature was also originally from
Lustre, and has been enhanced by Tahsin Erdogan from Google with a
deduplication feature so that if multiple files have the same xattr
value (for example, Windows ACL's stored by Samba), only one copy will
be stored on disk for encoding and caching efficiency.
We also have the usual set of bug fixes, cleanups, and optimizations"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
ext4: fix spelling mistake: "prellocated" -> "preallocated"
ext4: fix __ext4_new_inode() journal credits calculation
ext4: skip ext4_init_security() and encryption on ea_inodes
fs: generic_block_bmap(): initialize all of the fields in the temp bh
ext4: change fast symlink test to not rely on i_blocks
ext4: require key for truncate(2) of encrypted file
ext4: don't bother checking for encryption key in ->mmap()
ext4: check return value of kstrtoull correctly in reserved_clusters_store
ext4: fix off-by-one fsmap error on 1k block filesystems
ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry()
ext4: return EIO on read error in ext4_find_entry
ext4: forbid encrypting root directory
ext4: send parallel discards on commit completions
ext4: avoid unnecessary stalls in ext4_evict_inode()
ext4: add nombcache mount option
ext4: strong binding of xattr inode references
ext4: eliminate xattr entry e_hash recalculation for removes
ext4: reserve space for xattr entries/names
quota: add get_inode_usage callback to transfer multi-inode charges
ext4: xattr inode deduplication
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllhktgACgkQ8vlZVpUN
gaOQIQf+KM2s46sxxEl0/hjdBXR4OxTmSS2/0900NPyg7JHKlL8PdYslOyvMiKjo
wEi+YPwwQgbHtxhI1VINfV/q12MZHwvmFOfD9NzjrISwfmfsKj0dBgZDAfBH82sK
12wKgUxA8xJ4P+Xdvnz2PokRcFCsh1YUr5IUQkP3JR2RZOxNFUj42QwPJ2yWzqxO
MsnepMjIHsxvXZi0E7sPjRaoFsh3DDeLmNl8sX6INodC7hxJ1LotYKqJhA4stQpB
ezXY2tabwg3gaOWvWH7THyHhGntbZVDga3iRrKdNLahXN8OBdHktmG75ubiN6tEg
x80pqQLgr41yIQuJVOuyeh5jLYZrww==
=i4r9
-----END PGP SIGNATURE-----
Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Add support for 128-bit AES and some cleanups to fscrypt"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: make ->dummy_context() return bool
fscrypt: add support for AES-128-CBC
fscrypt: inline fscrypt_free_filename()
Pull input updates from Dmitry Torokhov:
- a new driver for STM FingerTip touchscreen
- a new driver for D-Link DIR-685 touch keys
- updated list of supported devices in xpad driver
- other assorted updates and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (23 commits)
MAINTAINERS: update input subsystem patterns
Input: introduce KEY_ASSISTANT
Input: xpad - sync supported devices with XBCD
Input: xpad - sync supported devices with 360Controller
Input: xen-kbdfront - use string constants from PV protocol
Input: stmfts - mark all PM functions as __maybe_unused
Input: add support for the STMicroelectronics FingerTip touchscreen
Input: add D-Link DIR-685 touchkeys driver
Input: s3c2410_ts - handle return value of clk_prepare_enable
Input: axp20x-pek - add wakeup support
Input: synaptics-rmi4 - use %phN to form F34 configuration ID
Input: synaptics-rmi4 - change a char type to u8
Input: sparse-keymap - remove sparse_keymap_free()
Input: tsc2007 - move header file out of I2C realm
Input: mms114 - move header file out of I2C realm
Input: mcs - move header file out of I2C realm
Input: lm8323 - move header file out of I2C realm
Input: elantech - force relative mode on a certain module
Input: elan_i2c - add support for fetching chip type on newer hardware
Input: elan_i2c - check if device is there before really probing
...
- removal of AVR32 support in dw driver as AVR32 is gone
- new driver for Broadcom stream buffer accelerator (SBA) RAID driver
- add support for Faraday Technology FTDMAC020 in amba-pl08x driver
- IOMMU support in pl330 driver
- updates to bunch of drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZYG73AAoJEHwUBw8lI4NH+DoP/1f0TsYrQFCjNqa4nybjU1Sd
bbqnpouuJscwt8Qk2LGuSimi0QG91gQOLvrmueFbXtEg86nPOfa0RnWGNF4qwYFK
oliDlXF2PnV65J5kl7CvqXCj6bFiXCULVdO9JD2HFoFB1+lzXN9JQqOG5ne29BQ6
g3HNRlUTXNzQXWisgbAOLxuuvyfv68Zo3wCLYLkd4vC/C4zmxM+KXUG8+s0hS7t3
AOUpYW6F/C+y1Ax+SiACm0QGNZ4rc6/+ZUIIXUO5CfTYGjv6QUdzxiLHtc4br25l
2CoN9IP4V/OxHaW9T1jA61TeAAFr63oXYfDMBBzclzVryZRAIU72ups31uRQXpFz
99zUQ0OsdOCvy0oPInhNd8u+cpyh/4e2RDgSZ9rxw3xVaKFh8lsw5OtcCBQzCMeI
xgFCUBHsLjEi4uafJcl6n2T7+Y4Y0KgOmxPHZo3tpq/2a5M6tVy8k68m3afCQylF
1SOxzVZdDRUutPpviQWop6RgP0EcVuzaUJ0vO4nat4j77vuimaPqdk+oLV46XP2d
5I52kcvbVI4BbJavTjVs3FRdcez0pW37iOw+5MOxHE3dnBp4X/3btFzBY4aOsdg0
wVut3B+9U4WHDBF2ConBxxMvGqMYmcssOQ096GdC6oBHHS7x6n7tEVPiZ5iUacn5
LB8k9AZtpBC7nUWPH7FS
=srPZ
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-4.13-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
- removal of AVR32 support in dw driver as AVR32 is gone
- new driver for Broadcom stream buffer accelerator (SBA) RAID driver
- add support for Faraday Technology FTDMAC020 in amba-pl08x driver
- IOMMU support in pl330 driver
- updates to bunch of drivers
* tag 'dmaengine-4.13-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (36 commits)
dmaengine: qcom_hidma: correct API violation for submit
dmaengine: zynqmp_dma: Remove max len check in zynqmp_dma_prep_memcpy
dmaengine: tegra-apb: Really fix runtime-pm usage
dmaengine: fsl_raid: make of_device_ids const.
dmaengine: qcom_hidma: allow ACPI/DT parameters to be overridden
dmaengine: fsldma: set BWC, DAHTS and SAHTS values correctly
dmaengine: Kconfig: Simplify the help text for MXS_DMA
dmaengine: pl330: Delete unused functions
dmaengine: Replace WARN_TAINT_ONCE() with pr_warn_once()
dmaengine: Kconfig: Extend the dependency for MXS_DMA
dmaengine: mxs: Use %zu for printing a size_t variable
dmaengine: ste_dma40: Cleanup scatterlist layering violations
dmaengine: imx-dma: cleanup scatterlist layering violations
dmaengine: use proper name for the R-Car SoC
dmaengine: imx-sdma: Fix compilation warning.
dmaengine: imx-sdma: Handle return value of clk_prepare_enable
dmaengine: pl330: Add IOMMU support to slave tranfers
dmaengine: DW DMAC: Handle return value of clk_prepare_enable
dmaengine: pl08x: use GENMASK() to create bitmasks
dmaengine: pl08x: Add support for Faraday Technology FTDMAC020
...
Pull networking fixes from David Miller:
"Mostly fixing some light fallout from the changes that went into the
merge window.
1) Fix memory leaks on network namespace teardown in netfilter, from
Liping Zhang.
2) When comparing ipv6 nexthops, we have to take the lightweight
tunnel state into account as well. From David Ahern.
3) Fix socket option object length check in the new TLS code, from
Matthias Rosenfelder.
4) Fix memory leak in nfp driver flower support, from Jakub Kicinski.
5) Several netlink attribute validation fixes in cfg80211, from
Srinivas Dasari.
6) Fix context array leak in virtio_net, from Jason Wang.
7) SKB use after free in hns driver, from Yusheng Lin.
8) Fix socket leak on accept() in RDS, from Sowmini Varadhan. Also
add a WARN_ON() to sock_graft() so other protocol stacks don't
trip over this as well"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
net: ethernet: mediatek: remove useless code in mtk_probe()
mpls: fix uninitialized in_label var warning in mpls_getroute
doc: SKB_GSO_[IPIP|SIT] have been replaced
bonding: avoid NETDEV_CHANGEMTU event when unregistering slave
net/sock: add WARN_ON(parent->sk) in sock_graft()
rds: tcp: use sock_create_lite() to create the accept socket
net: hns: Fix a skb used after free bug
net: hns: Fix a wrong op phy C45 code
net: macb: Adding Support for Jumbo Frames up to 10240 Bytes in SAMA5D3
net: Update networking MAINTAINERS entry.
virtio-net: fix leaking of ctx array
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
cfg80211: Check if NAN service ID is of expected size
cfg80211: Check if PMKID attribute is of expected size
arcnet: com20020-pci: Fix an error handling path in 'com20020pci_probe()'
nfp: flower: add missing clean up call to avoid memory leaks
vrf: fix bug_on triggered by rx when destroying a vrf
ptp: dte: Use LL suffix for 64-bit constants
sctp: set the value of flowi6_oif to sk_bound_dev_if to make sctp_v6_get_dst to find the correct route entry.
...
Pull misc filesystem updates from Al Viro:
"Assorted normal VFS / filesystems stuff..."
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dentry name snapshots
Make statfs properly return read-only state after emergency remount
fs/dcache: init in_lookup_hashtable
minix: Deinline get_block, save 2691 bytes
fs: Reorder inode_owner_or_capable() to avoid needless
fs: warn in case userspace lied about modprobe return
Pull __copy_in_user removal from Al Viro:
"There used to be 6 places in the entire tree calling __copy_in_user(),
all of them bogus.
Four got killed off in work.drm branch, this takes care of the
remaining ones and kills the definition of that sucker"
* 'work.__copy_in_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill __copy_in_user()
sanitize do_i2c_smbus_ioctl()
As Hongjun/Nicolas summarized in their original patch:
"
When a device changes from one netns to another, it's first unregistered,
then the netns reference is updated and the dev is registered in the new
netns. Thus, when a slave moves to another netns, it is first
unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
the bonding driver. The driver calls bond_release(), which calls
dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
the old netns).
"
This is a very special case, because the device is being unregistered
no one should still care about the NETDEV_CHANGEMTU event triggered
at this point, we can avoid broadcasting this event on this path,
and avoid touching inetdev_event()/addrconf_notify() path.
It requires to export __dev_set_mtu() to bonding driver.
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename function_offset_within_entry() to scope it to kprobe namespace by
using kprobe_ prefix, and to also simplify it.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3aa6c7e2e4fb6e00f3c24fa306496a66edb558ea.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull iov_iter hardening from Al Viro:
"This is the iov_iter/uaccess/hardening pile.
For one thing, it trims the inline part of copy_to_user/copy_from_user
to the minimum that *does* need to be inlined - object size checks,
basically. For another, it sanitizes the checks for iov_iter
primitives. There are 4 groups of checks: access_ok(), might_fault(),
object size and KASAN.
- access_ok() had been verified by whoever had set the iov_iter up.
However, that has happened in a function far away, so proving that
there's no path to actual copying bypassing those checks is hard
and proving that iov_iter has not been buggered in the meanwhile is
also not pleasant. So we want those redone in actual
copyin/copyout.
- might_fault() is better off consolidated - we know whether it needs
to be checked as soon as we enter iov_iter primitive and observe
the iov_iter flavour. No need to wait until the copyin/copyout. The
call chains are short enough to make sure we won't miss anything -
in fact, it's more robust that way, since there are cases where we
do e.g. forced fault-in before getting to copyin/copyout. It's not
quite what we need to check (in particular, combination of
iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a
cause to skip checks), but that's for later series. For now let's
keep might_fault().
- KASAN checks belong in copyin/copyout - at the same level where
other iov_iter flavours would've hit them in memcpy().
- object size checks should apply to *all* iov_iter flavours, not
just iovec-backed ones.
There are two groups of primitives - one gets the kernel object
described as pointer + size (copy_to_iter(), etc.) while another gets
it as page + offset + size (copy_page_to_iter(), etc.)
For the first group the checks are best done where we actually have a
chance to find the object size. In other words, those belong in inline
wrappers in uio.h, before calling into iov_iter.c. Same kind as we
have for inlined part of copy_to_user().
For the second group there is no object to look at - offset in page is
just a number, it bears no type information. So we do them in the
common helper called by iov_iter.c primitives of that kind. All it
currently does is checking that we are not trying to access outside of
the compound page; eventually we might want to add some sanity checks
on the page involved.
So the things we need in copyin/copyout part of iov_iter.c do not
quite match anything in uaccess.h (we want no zeroing, we *do* want
access_ok() and KASAN and we want no might_fault() or object size
checks done on that level). OTOH, these needs are simple enough to
provide a couple of helpers (static in iov_iter.c) doing just what we
need..."
* 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
iov_iter: saner checks on copyin/copyout
iov_iter: sanity checks for copy to/from page primitives
iov_iter/hardening: move object size checks to inlined part
copy_{to,from}_user(): consolidate object size checks
copy_{from,to}_user(): move kasan checks and might_fault() out-of-line
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZXhmCAAoJEAAOaEEZVoIVpRkP/1qlYn3pq6d5Kuz84pejOmlL
5jbkS/cOmeTxeUU4+B1xG8Lx7bAk8PfSXQOADbSJGiZd0ug95tJxplFYIGJzR/tG
aNMHeu/BVKKhUKORGuKR9rJKtwC839L/qao+yPBo5U3mU4L73rFWX8fxFuhSJ8HR
hvkgBu3Hx6GY59CzxJ8iJzj+B+uPSFrNweAk0+0UeWkBgTzEdiGqaXBX4cHIkq/5
hMoCG+xnmwHKbCBsQ5js+YJT+HedZ4lvfjOqGxgElUyjJ7Bkt/IFYOp8TUiu193T
tA4UinDjN8A7FImmIBIftrECmrAC9HIGhGZroYkMKbb8ReDR2ikE5FhKEpuAGU3a
BXBgX2mPQuArvZWM7qeJCkxV9QJ0u/8Ykbyzo30iPrICyrzbEvIubeB/mDA034+Z
Z0/z8C3v7826F3zP/NyaQEojUgRq30McMOIS8GMnx15HJwRsRKlzjfy9Wm4tWhl0
t3nH1jMqAZ7068s6rfh/oCwdgGOwr5o4hW/bnlITzxbjWQUOnZIe7KBxIezZJ2rv
OcIwd5qE8PNtpagGj5oUbnjGOTkERAgsMfvPk5tjUNt28/qUlVs2V0aeo47dlcsh
oYr8WMOIzw98Rl7Bo70mplLrqLD6nGl0LfXOyUlT4STgLWW4ksmLVuJjWIUxcO/0
yKWjj9wfYRQ0vSUqhsI5
=3Z93
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.
The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.
For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.
Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may even not be possible.
But wait...it gets worse!
The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.
This pile aims to do three things:
1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing
2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.
3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.
To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.
Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:
1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can in via individual
filesystem trees.
2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.
This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:
https://lwn.net/Articles/718734/https://lwn.net/Articles/724307/"
* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZXXLdAAoJEAAOaEEZVoIVtBIP/2BMtyDB5IVaxUuYc9LiFxCZ
Y6W4aYEBgPhrct6epV3pnV+SXuzov9F5QZWe1P+lB3e30JHvPhO52OUIT7gSbFbv
kKCh+p7Q1vLqaKxONPQpJI5LjlB6e6GIekrI4woA2RWVw+6cUyP0oQTVhsSsgnj/
/GMo2pAqlhR3vnn9cWG93vl+xnrtmckpwFe0g5Jhdp/cVQBrqwxG+1W9rEsJf0nx
RN29E7+CyxI3x2KkVdmgsMQkpkM2ooopn//1QDmS3M2sbCrJrLSTRG8LBEcs8fi8
pQZcgW6uHXDH2I0hews1vhJRA38TeXoQfj9OZoFGQcVpbP3ZnjASKioRoQiSsHyQ
QRDxUw6C45tjWT0HZ1GaCDMuTMs0z2/zF/E7TaOX6zB2LS/NuIluoVAMkYVyXY3a
L39flIddnDaga1ojL+tQK5hhSl9C66++/FsFa2FZ0hLkeXA5WDLhRy0ODW3NaYg8
89pPJDfiocEiI7ULht2Bkk88zFe+K07bQRQ5eoFtSOAxOnWGJCbxn8G8dFZZDHnO
XZe3gscbR3DCMJ+agb4V/YOyqCHAJMA/lcnP9v7P+QnrEXSV5yrblk1Gx442xMhv
tANcCUI3nb/b2Ma3DW3iZS/iYmhmy/baBSV3n65K9NqtkkIbnqSXxk+5RJd5eKsS
8Y5nyu+6mlcOOxBMkmRo
=jRrj
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling fixes from Jeff Layton:
"The main rationale for all of these changes is to tighten up writeback
error reporting to userland. There are many ways now that writeback
errors can be lost, such that fsync/fdatasync/msync return 0 when
writeback actually failed.
This pile contains a small set of cleanups and writeback error
handling fixes that I was able to break off from the main pile (#2).
Two of the patches in this pile are trivial. The exceptions are the
patch to fix up error handling in write_one_page, and the patch to
make JFS pay attention to write_one_page errors"
* tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: remove call_fsync helper function
mm: clean up error handling in write_one_page
JFS: do not ignore return code from write_one_page()
mm: drop "wait" parameter from write_one_page()
take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified). In either case the pointer to stable
string is stored into the same structure.
dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().
Intended use:
struct name_snapshot s;
take_dentry_name_snapshot(&s, dentry);
...
access s.name
...
release_dentry_name_snapshot(&s);
Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull security layer fixes from James Morris:
"Bugfixes for TPM and SELinux"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
IB/core: Fix static analysis warning in ib_policy_change_task
IB/core: Fix uninitialized variable use in check_qp_port_pkey_settings
tpm: do not suspend/resume if power stays on
tpm: use tpm2_pcr_read() in tpm2_do_selftest()
tpm: use tpm_buf functions in tpm2_pcr_read()
tpm_tis: make ilb_base_addr static
tpm: consolidate the TPM startup code
tpm: Enable CLKRUN protocol for Braswell systems
tpm/tpm_crb: fix priv->cmd_size initialisation
tpm: fix a kernel memory leak in tpm-sysfs.c
tpm: Issue a TPM2_Shutdown for TPM2 devices.
Add "shutdown" to "struct class".
Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to:
Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
Thiago Jung Bauermann, Yang Li.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
bBDvrRnvupusz8qCWrxe
=w8rj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
Bauermann, Yang Li"
* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
powerpc/mm/hash: Implement mark_rodata_ro() for hash
powerpc/vmlinux.lds: Align __init_begin to 16M
powerpc/lib/code-patching: Use alternate map for patch_instruction()
powerpc/xmon: Add patch_instruction() support for xmon
powerpc/kprobes/optprobes: Use patch_instruction()
powerpc/kprobes: Move kprobes over to patch_instruction()
powerpc/mm/radix: Fix execute permissions for interrupt_vectors
powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
powerpc/64s: Blacklist rtas entry/exit from kprobes
powerpc/64s: Blacklist functions invoked on a trap
powerpc/64s: Un-blacklist system_call() from kprobes
powerpc/64s: Move system_call() symbol to just after setting MSR_EE
powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
powerpc/64s: Convert .L__replay_interrupt_return to a local label
powerpc64/elfv1: Only dereference function descriptor for non-text symbols
cxl: Export library to support IBM XSL
powerpc/dts: Use #include "..." to include local DT
powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
...
- Report correct error status to user
- Fix-ups
- Move Backlight headers out of I2C; adp8860, adp8870
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZXgAWAAoJEFGvii+H/HdhlrQP/j+PXYo6jgYg+uoNvlKpgBWD
bkx2JM9fXA6ZFaEhcvU1LWCt+CqnBeL6ujDb7GBJTRrazws1vZ2Snjrbl/go+6Y2
h4thCecO9AJG1YaDXsWdDjLu9Flx+ZqKS7r+fyp/0O+s+G78SiQZHfoPKNplfrD3
TQpw4MIMRPqeYILgIzOXkBg+HEa2L5oQD1zyr6wIUyqNQIQIR7cUHdVutd241BTD
JitNk39e8CW8jw9zpYJKb7KxdwLtgm7iU6Lf+DWBffVtzh4CY7vJC0WZla7W6lhL
uzBH5mgvmfpRg8/FL37dgQrmal1MHzkXJ5EHQk+08D+Yk6b2vqw6PBDSQ2ZZJmr5
SLyLZK1Y1s2+wX+X7fit71QrcUWDUSYOhh+9GKDwcVfNqacWzro1i2p2ovOTntW4
JbGx8u+UU9pcz1sY1qljjYKxi6G1EAvHbwwobe3syleqh7I09RsnSKqi6ZEXbLAE
Nfxe4K+djnoirtjFL8vpIy0rXFGQKccIAisid5rFnXjL9rVB/7FiQWGovav+9YuG
rpr+dQJe5FlLaeTx9CDZeFeefwVyaDOBVJsVC0yghJoz3o2tz/et5zhBEAtLtMQP
DkS+mEFYLv3ZkXy2Wu4jU7y4jPoeRwTQrGaaUnsh/H+/dAmY6LH6Ccl2My+H0RBi
wAFUl1yW8G+qgr8mhT7y
=QD8S
-----END PGP SIGNATURE-----
Merge tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Core Framework:
- Report correct error status to user
Fix-ups:
- Move Backlight headers out of I2C (adp8860, adp8870)"
* tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
video: adp8870: move header file out of I2C realm
backlight: adp8860: Move header file out of I2C realm
backlight: Report error on failure
Pull MFD updates from Lee Jones:
"New Drivers:
- Intel Cherry Trail Whiskey Cove PMIC
- TI LP87565 PMIC
New Device Support:
- Add support for Cannonlake to intel-lpss-pci
- Add support for Simatic IOT2000 to intel_quark_i2c_gpio
New Functionality:
- Add Regulator support (axp20x)
Fix-ups:
- Rework IRQ handling (intel_soc_pmic_bxtwc, rtsx_pcr, cros_ec)
- Remove unused/unwelcome code (ipaq-micro, wm831x-core, da9062-core)
- Provide deregistration on unbind (rn5t618)
- Rework DT code/documentation (arizona)
- Constify things (fsl-imx25-tsadc)
- MAINTAINERS updates (DA9062/61)
- Kconfig configuration adaptions (INTEL_SOC_PMIC, MFD_AXP20X_I2C)
- Switch to DMI matching (intel_quark_i2c_gpio)
- Provide an appropriate level of error checking (wm831x-{i2c,spi},
twl4030-irq, tc6393xb)
- Make use of devm_* (resource handling) calls (intel_soc_pmic_bxtwc,
stm32-timers, atmel-flexcom, cros_ec, fsl-imx25-tsadc,
exynos-lpass, palmas, qcom-spmi-pmic, smsc-ece1099,
motorola-cpcap)"
[ Skipped the last commit in that series that added eight thousand
lines of pointless repeated register definitions. - Linus ]
* tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (38 commits)
mfd: Add LP87565 PMIC support
mfd: cros_ec: Free IRQ on exit
dt-bindings: vendor-prefixes: Add arctic to vendor prefix
mfd: da9061: Fix to remove BBAT_CONT register from chip model
mfd: da9061: Fix to remove BBAT_CONT register from chip model
mfd: axp20x-i2c: Document that this must be builtin on x86
mfd: Add Cherry Trail Whiskey Cove PMIC driver
mfd: tc6393xb: Handle return value of clk_prepare_enable
mfd: intel_quark_i2c_gpio: Add support for SIMATIC IOT2000 platform
mfd: intel_quark_i2c_gpio: Use dmi_system_id table for retrieving frequency
mfd: motorola-cpcap: Use devm_of_platform_populate()
mfd: smsc-ece: Use devm_of_platform_populate()
mfd: qcom-spmi-pmic: Use devm_of_platform_populate()
mfd: palmas: Use devm_of_platform_populate()
mfd: exynos: Use devm_of_platform_populate()
mfd: fsl-imx25: Use devm_of_platform_populate()
mfd: cros_ec: Use devm_of_platform_populate()
mfd: atmel: Use devm_of_platform_populate()
mfd: stm32-timers: Use devm_of_platform_populate()
mfd: intel_soc_pmic: Select designware i2c-bus driver
...