This patch move pgtable_t into platform headers.
It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The purpose of this patch is to move platform specific
mmu-xxx.h files in platform directories like pte-xxx.h files.
In the meantime this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull XArray conversion from Matthew Wilcox:
"The XArray provides an improved interface to the radix tree data
structure, providing locking as part of the API, specifying GFP flags
at allocation time, eliminating preloading, less re-walking the tree,
more efficient iterations and not exposing RCU-protected pointers to
its users.
This patch set
1. Introduces the XArray implementation
2. Converts the pagecache to use it
3. Converts memremap to use it
The page cache is the most complex and important user of the radix
tree, so converting it was most important. Converting the memremap
code removes the only other user of the multiorder code, which allows
us to remove the radix tree code that supported it.
I have 40+ followup patches to convert many other users of the radix
tree over to the XArray, but I'd like to get this part in first. The
other conversions haven't been in linux-next and aren't suitable for
applying yet, but you can see them in the xarray-conv branch if you're
interested"
* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
radix tree: Remove multiorder support
radix tree test: Convert multiorder tests to XArray
radix tree tests: Convert item_delete_rcu to XArray
radix tree tests: Convert item_kill_tree to XArray
radix tree tests: Move item_insert_order
radix tree test suite: Remove multiorder benchmarking
radix tree test suite: Remove __item_insert
memremap: Convert to XArray
xarray: Add range store functionality
xarray: Move multiorder_check to in-kernel tests
xarray: Move multiorder_shrink to kernel tests
xarray: Move multiorder account test in-kernel
radix tree test suite: Convert iteration test to XArray
radix tree test suite: Convert tag_tagged_items to XArray
radix tree: Remove radix_tree_clear_tags
radix tree: Remove radix_tree_maybe_preload_order
radix tree: Remove split/join code
radix tree: Remove radix_tree_update_node_t
page cache: Finish XArray conversion
dax: Convert page fault handlers to XArray
...
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
hugetlbfs: dirty pages as they are added to pagecache
mm: export add_swap_extent()
mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
mm/gup: cache dev_pagemap while pinning pages
Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
mm: return zero_resv_unavail optimization
mm: zero remaining unavailable struct pages
tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
mm/gup_benchmark.c: add additional pinning methods
mm/gup_benchmark.c: time put_page()
mm: don't raise MEMCG_OOM event due to failed high-order allocation
mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
...
arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
huge_ptep_set_wrprotect, so move this generic implementation into
asm-generic/hugetlb.h.
Link: http://lkml.kernel.org/r/20180920060358.16606-10-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 40xx defines _PAGE_HWWRITE while others don't.
The 8xx defines _PAGE_RO instead of _PAGE_RW.
The 8xx defines _PAGE_PRIVILEGED instead of _PAGE_USER.
The 8xx defines _PAGE_HUGE and _PAGE_NA while others don't.
Lets those platforms redefine pte_write(), pte_wrprotect() and
pte_mkwrite() and get _PAGE_RO and _PAGE_HWWRITE off the common
helpers.
Lets the 8xx redefine pte_user(), pte_mkprivileged() and pte_mkuser()
and get rid of _PAGE_PRIVILEGED and _PAGE_USER default values.
Lets the 8xx redefine pte_mkhuge() and get rid of
_PAGE_HUGE default value.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
nohash/64 only uses book3e PTE flags, so it doesn't need pte-common.h
This also allows to drop PAGE_SAO and H_PAGE_4K_PFN from pte_common.h
as they are only used by PPC64
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to avoid using generic _PAGE_XXX flags in powerpc
core functions, define helpers for all needed flags:
- pte_mkuser() and pte_mkprivileged() to set/unset and/or
unset/set _PAGE_USER and/or _PAGE_PRIVILEGED
- pte_hashpte() to check if _PAGE_HASHPTE is set.
- pte_ci() check if cache is inhibited (already existing on book3s/64)
- pte_exprotect() to protect against execution
- pte_exec() and pte_mkexec() to query and set page execution
- pte_mkpte() to set _PAGE_PTE flag.
- pte_hw_valid() to check _PAGE_PRESENT since pte_present does
something different on book3s/64.
On book3s/32 there is no exec protection, so pte_mkexec() and
pte_exprotect() are nops and pte_exec() returns always true.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to allow their use in nohash/32/pgtable.h, we have to move the
following helpers in nohash/[32:64]/pgtable.h:
- pte_mkwrite()
- pte_mkdirty()
- pte_mkyoung()
- pte_wrprotect()
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.
Do the same for __ioremap_caller() and __ioremap_at().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.
Update pgtable_free to handle hugetlb hugepd directory table.
Fixes: 0c4d268029 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When relaxing access (read -> read_write update), pte needs to be marked invalid
to handle a nest MMU bug. We also need to do a tlb flush after the pte is
marked invalid before updating the pte with new access bits.
We also move tlb flush to platform specific __ptep_set_access_flags. This will
help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that
in this patch. This also helps in avoiding multiple tlbies with coprocessor
attached.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In later patch, we use the vma and psize to do tlb flush. Do the prototype
update in separate patch to make the review easy.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix the below crash on Book3E 64. pgtable_page_dtor expects struct
page *arg.
Also call the destructor on non book3s platforms correctly. This frees
up the split PTL locks correctly if we had allocated them before.
Call Trace:
.kmem_cache_free+0x9c/0x44c (unreliable)
.ptlock_free+0x1c/0x30
.tlb_remove_table+0xdc/0x224
.free_pgd_range+0x298/0x500
.shift_arg_pages+0x10c/0x1e0
.setup_arg_pages+0x200/0x25c
.load_elf_binary+0x450/0x16c8
.search_binary_handler.part.11+0x9c/0x248
.do_execveat_common.isra.13+0x868/0xc18
.run_init_process+0x34/0x4c
.try_to_run_init_process+0x1c/0x68
.kernel_init+0xdc/0x130
.ret_from_kernel_thread+0x58/0x7c
Fixes: 702346768 ("powerpc/mm/nohash: Remove pte fragment dependency from nohash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have removed 64K page size support, the RCU page table free can
be much simpler for nohash. Make a copy of the the rcu callback to pgalloc.h
header similar to nohash 32. We could possibly merge 32 and 64 bit there. But
that is for a later patch
We also move the book3s specific handler to pgtable_book3s64.c. This will be
updated in a later patch to handle split pmd ptlock.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have in Kconfig
config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
Only supported BOOK3E 64 bit platforms is FSL_BOOK3E. Remove the dead 64k page
support code from 64bit nohash.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When nohash and book3s header were split, some hash related stuff
remained in the nohash header. This patch removes them.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Duplicate pte_young() to avoid circular header dependency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation for the following patch which will enhance 'slices'
for supporting PPC32 in order to fix an issue on hugepages on 8xx,
this patch takes out of page*.h all bits related to 'slices' and put
them into newly created slice.h header files.
While common parts go into asm/slice.h, subarch specific
parts go into respective books3s/64/slice.c and nohash/64/slice.c
'slices'
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To support memory keys, we moved the hash pte slot information to the
second half of the page table. This was ok with PTE entries at level
4 (PTE page) and level 3 (PMD). We already allocate larger page table
pages at those levels to accomodate extra details. For level 4 we
already have the extra space which was used to track 4k hash page
table entry details and at level 3 the extra space was allocated to
track the THP details.
With hugetlbfs PTE, we used this extra space at the PMD level to store
the slot details. But we also support hugetlbfs PTE at PUD level for
16GB pages and PUD level page didn't allocate extra space. This
resulted in memory corruption.
Fix this by allocating extra space at PUD level when HUGETLB is
enabled.
Fixes: bf9a95f9a6 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs in our
implementation and to bring the semantics exactly into line with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a true NMI
(ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors can be
reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM to notify
the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on some Power9
processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on some
Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a CONFIG), we
believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting for long
running operations performed by OPAL firmware, and changes to the
powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are using
transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on Power9.
- Improvements to the VAS facility used to access coprocessors on Power9, and
related improvements to the way the NX crypto driver handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh
Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao,
Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R.
Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren
Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami
Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de
Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen
Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, William A. Kennington III.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJaDXGuAAoJEFHr6jzI4aWAEqwP/0TA35KFAK6wqfkCf67z4q+O
I+5piI4eDV4jdCakfoIN1JfjhQRULNePSoCHTccan30mu/bm30p69xtOLL2/h5xH
Mhz/eDBAOo0lrT20nyZfYMW3FnM66wnNf++qJ0O+8L052r4WOB02J0k1uM1ST01D
5Lb5mUoxRLRzCgKRYAYWJifn+IFPUB9NMsvMTym94krAFlIjIzMEQXhDoln+jJMr
QmY5f1BTA/fLfXobn0zwoc/C1oa2PUtxd+rxbwGrLoZ6G843mMqUi90SMr5ybhXp
RzepnBTj4by3vOsnk/X1mANyaZfLsunp75FwnjHdPzKrAS/TuPp8D/iSxxE/PzEq
cLwJFBnFXSgQMefDErXxhHSDz2dAg5r14rsTpDcq2Ko8TPV4rPsuSfmbd9Txekb0
yWHsjoJUBBMl2QcWqIHl+AlV8j1RklF6solcTBcGnH1CZJMfa05VKXV7xGEvOHa0
RJ+/xPyR9KjoB/SUp++9Vmx/M6SwQYFOJlr3Zpg9LNtR8WpoPYu1E6eO+u1Hhzny
eJqaNstH+i+VdY9eqszkAsEBh8o9M/+b+7Wx7TetvU+v368CbXtgFYs9qy2oZjPF
t9sY/BHaHZ8eZ7I00an77a0fVV5B1PVASUtIz5CqkwGpMvX6Z6W2K/XUUFI61kuu
E06HS6Ht8UPJAzrAPUMl
=Rq81
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on
some Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a
CONFIG), we believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on
Power9.
- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"
* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
found in "server" processors, from IBM mainly.
Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
annoying to have two symbols that always have the same value, it's not
quite annoying enough to bother removing one.
However with the arrival of Power9, we now have the situation where
CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
Radix MMU - *not* the "standard" MMU. So it is now actively confusing
to use it, because it implies that code is disabled or inactive when
the Radix MMU is in use, however that is not necessarily true.
So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
formatting updates of some of the affected lines.
This will be a pain for backports, but c'est la vie.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce a helper pgtable_gfp_flags() which
just returns the current gfp flags and adds
__GFP_ACCOUNT to account for page table allocation.
The generic helper is added to include/asm/pgalloc.h
and has two variants - WARNING ugly bits ahead
1. If the header is included from a module, no check
for mm == &init_mm is done, since init_mm is not
exported
2. For kernel includes, the check is done and required
see (3e79ec7 arch: x86: charge page tables to kmemcg)
The fundamental assumption is that no module should be
doing pgd/pud/pmd and pte alloc's on behalf of init_mm
directly.
NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86. The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.
For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BOOKE code is dead code as per the Kconfig details. So make it simpler
by enabling MM_SLICE only for book3s_64. The changes w.r.t nohash is just
removing deadcode. W.r.t ppc64, 4k without hugetlb will now enable MM_SLICE.
But that is good, because we reduce one extra variant which probably is not
getting tested much.
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.
If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.
If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
standard pages when using 4k pages and a single pgtable_cache
if using other size pages.
In preparation of implementing huge pages on the 8xx, this patch
replaces the specific powerpc32 handling by the 64 bits approach.
This is done by:
* moving 64 bits pgtable_cache_add() and pgtable_cache_init()
in a new file called init-common.c
* modifying pgtable_cache_init() to also handle the case
without PMD
* removing the 32 bits version of pgtable_cache_add() and
pgtable_cache_init()
* copying related header contents from 64 bits into both the
book3s/32 and nohash/32 header files
On the 8xx, the following cache sizes will be used:
* 4k pages mode:
- PGT_CACHE(10) for PGD
- PGT_CACHE(3) for 512k hugepage tables
* 16k pages mode:
- PGT_CACHE(6) for PGD
- PGT_CACHE(7) for 512k hugepage tables
- PGT_CACHE(3) for 8M hugepage tables
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
When we are updating a pte, we just need to flush the tlb mapping
that pte. Right now we do a full mm flush because we don't track the page
size. Now that we have page size details in pte use that to do the
optimized flush
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
MIN_HUGEPTE_SHIFT hasn't been used since commit d1837cba5d
("powerpc/mm: Cleanup initialization of hugepages on powerpc")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h.
And there should be no configurations possible which use nohash/64/pgtable.h
but don't also enable CONFIG_PPC_BOOK3E.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 DD1 requires pte to be marked invalid (V=0) before updating
it with the new value. This makes this distinction for the different
revisions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Highlights:
- PowerNV PCI hotplug support.
- Lots more Power9 support.
- eBPF JIT support on ppc64le.
- Lots of cxl updates.
- Boot code consolidation.
Bug fixes:
- Fix spin_unlock_wait() from Boqun Feng
- Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling
- Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
- mm: Ensure "special" zones are empty from Oliver O'Halloran
- ftrace: Separate the heuristics for checking call sites from Michael Ellerman
- modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman
- Fix endianness when reading TCEs from Alexey Kardashevskiy
- start rtasd before PCI probing from Greg Kurz
- PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler
- powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu
Cleanups & fixes:
- Drop support for MPIC in pseries from Rashmica Gupta
- Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
- Remove unused symbols in asm-offsets.c from Rashmica Gupta
- Fix SRIOV not building without EEH enabled from Russell Currey
- Remove kretprobe_trampoline_holder. from Thiago Jung Bauermann
- Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt
- Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh
- Avoid -maltivec when using clang integrated assembler from Anton Blanchard
- Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
- Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes
- export cpu_to_core_id() from Mauricio Faria de Oliveira
- Remove old symbols from defconfigs from Andrew Donnellan
- Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt
- Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt
- Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao
- Remove RELOCATABLE_PPC32 from Kevin Hao
- Fix .long's in tlb-radix.c to more meaningful from Balbir Singh
Minor cleanups & fixes:
- Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang
Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman,
Stephen Rothwell, Stewart Smith.
Freescale updates from Scott:
- "Highlights include more 8xx optimizations, device tree updates,
and MVME7100 support."
PowerNV PCI hotplug from Gavin Shan:
- PCI: Add pcibios_setup_bridge()
- Override pcibios_setup_bridge()
- Remove PCI_RESET_DELAY_US
- Move pnv_pci_ioda_setup_opal_tce_kill() around
- Increase PE# capacity
- Allocate PE# in reverse order
- Create PEs in pcibios_setup_bridge()
- Setup PE for root bus
- Extend PCI bridge resources
- Make pnv_ioda_deconfigure_pe() visible
- Dynamically release PE
- Update bridge windows on PCI plug
- Delay populating pdn
- Support PCI slot ID
- Use PCI slot reset infrastructure
- Introduce pnv_pci_get_slot_id()
- Functions to get/set PCI slot state
- PCI/hotplug: PowerPC PowerNV PCI hotplug driver
- Print correct PHB type names
Power9 idle support from Shreyas B. Prabhu:
- set power_save func after the idle states are initialized
- Use PNV_THREAD_WINKLE macro while requesting for winkle
- make hypervisor state restore a function
- Rename idle_power7.S to idle_book3s.S
- Rename reusable idle functions to hardware agnostic names
- Make pnv_powersave_common more generic
- abstraction for saving SPRs before entering deep idle states
- Add platform support for stop instruction
- cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
- cpuidle/powernv: cleanup cpuidle-powernv.c
- cpuidle/powernv: Add support for POWER ISA v3 idle states
- Use deepest stop state when cpu is offlined
Power9 PMU from Madhavan Srinivasan:
- factor out power8 pmu macros and defines
- factor out power8 pmu functions
- factor out power8 __init_pmu code
- Add power9 event list macros for generic and cache events
- Power9 PMU support
- Export Power9 generic and cache events to sysfs
Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
- Add XICS emulation APIs
- Move a few exception common handlers to make room
- Add support for HV virtualization interrupts
- Add mechanism to force a replay of interrupts
- Add ICP OPAL backend
- Discover IODA3 PHBs
- pci: Remove obsolete SW invalidate
- opal: Add real mode call wrappers
- Rename TCE invalidation calls
- Remove SWINV constants and obsolete TCE code
- Rework accessing the TCE invalidate register
- Fallback to OPAL for TCE invalidations
- Use the device-tree to get available range of M64's
- Check status of a PHB before using it
- pci: Don't try to allocate resources that will be reassigned
Other Power9:
- Send SIGBUS on unaligned copy and paste from Chris Smart
- Large Decrementer support from Oliver O'Halloran
- Load Monitor Register Support from Jack Miller
Performance improvements from Anton Blanchard:
- Avoid load hit store in __giveup_fpu() and __giveup_altivec()
- Avoid load hit store in setup_sigcontext()
- Remove assembly versions of strcpy, strcat, strlen and strcmp
- Align hot loops of some string functions
eBPF JIT from Naveen N. Rao:
- Fix/enhance 32-bit Load Immediate implementation
- Optimize 64-bit Immediate loads
- Introduce rotate immediate instructions
- A few cleanups
- Isolate classic BPF JIT specifics into a separate header
- Implement JIT compiler for extended BPF
Operator Panel driver from Suraj Jitindar Singh:
- devicetree/bindings: Add binding for operator panel on FSP machines
- Add inline function to get rc from an ASYNC_COMP opal_msg
- Add driver for operator panel on FSP machines
Sparse fixes from Daniel Axtens:
- make some things static
- Introduce asm-prototypes.h
- Include headers containing prototypes
- Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
- kvm: Clarify __user annotations
- Pass endianness to sparse
- Make ppc_md.{halt, restart} __noreturn
MM fixes & cleanups from Aneesh Kumar K.V:
- radix: Update LPCR HR bit as per ISA
- use _raw variant of page table accessors
- Compile out radix related functions if RADIX_MMU is disabled
- Clear top 16 bits of va only on older cpus
- Print formation regarding the the MMU mode
- hash: Update SDR1 size encoding as documented in ISA 3.0
- radix: Update PID switch sequence
- radix: Update machine call back to support new HCALL.
- radix: Add LPID based tlb flush helpers
- radix: Add a kernel command line to disable radix
- Cleanup LPCR defines
Boot code consolidation from Benjamin Herrenschmidt:
- Move epapr_paravirt_early_init() to early_init_devtree()
- cell: Don't use flat device-tree after boot
- ge_imp3a: Don't use the flat device-tree after boot
- mpc85xx_ds: Don't use the flat device-tree after boot
- mpc85xx_rdb: Don't use the flat device-tree after boot
- Don't test for machine type in rtas_initialize()
- Don't test for machine type in smp_setup_cpu_maps()
- dt: Add of_device_compatible_match()
- Factor do_feature_fixup calls
- Move 64-bit feature fixup earlier
- Move 64-bit memory reserves to setup_arch()
- Use a cachable DART
- Move FW feature probing out of pseries probe()
- Put exception configuration in a common place
- Remove early allocation of the SMU command buffer
- Move MMU backend selection out of platform code
- pasemi: Remove IOBMAP allocation from platform probe()
- mm/hash: Don't use machine_is() early during boot
- Don't test for machine type to detect HEA special case
- pmac: Remove spurrious machine type test
- Move hash table ops to a separate structure
- Ensure that ppc_md is empty before probing for machine type
- Move 64-bit probe_machine() to later in the boot process
- Move 32-bit probe() machine to later in the boot process
- Get rid of ppc_md.init_early()
- Move the boot time info banner to a separate function
- Move setting of {i,d}cache_bsize to initialize_cache_info()
- Move the content of setup_system() to setup_arch()
- Move cache info inits to a separate function
- Re-order the call to smp_setup_cpu_maps()
- Re-order setup_panic()
- Make a few boot functions __init
- Merge 32-bit and 64-bit setup_arch()
Other new features:
- tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
- tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
- powerpc: Add module autoloading based on CPU features from Alastair D'Silva
- crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
- Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
- xmon: Dump ISA 2.06 SPRs from Michael Ellerman
- xmon: Dump ISA 2.07 SPRs from Michael Ellerman
- Add a parameter to disable 1TB segs from Oliver O'Halloran
- powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
- Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
- pseries: Add pseries hotplug workqueue from John Allen
- pseries: Add support for hotplug interrupt source from John Allen
- pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
- pseries: Move property cloning into its own routine from Nathan Fontenot
- pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
- pseries: Auto-online hotplugged memory from Nathan Fontenot
- pseries: Remove call to memblock_add() from Nathan Fontenot
cxl:
- Add set and get private data to context struct from Michael Neuling
- make base more explicitly non-modular from Paul Gortmaker
- Use for_each_compatible_node() macro from Wei Yongjun
- Frederic Barrat
- Abstract the differences between the PSL and XSL
- Make vPHB device node match adapter's
- Philippe Bergheaud
- Add mechanism for delivering AFU driver specific events
- Ignore CAPI adapters misplaced in switched slots
- Refine slice error debug messages
- Andrew Donnellan
- static-ify variables to fix sparse warnings
- PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
- PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
- Add cxl_check_and_switch_mode() API to switch bi-modal cards
- remove dead Kconfig options
- fix potential NULL dereference in free_adapter()
- Ian Munsie
- Update process element after allocating interrupts
- Add support for CAPP DMA mode
- Fix allowing bogus AFU descriptors with 0 maximum processes
- Fix allocating a minimum of 2 pages for the SPA
- Fix bug where AFU disable operation had no effect
- Workaround XSL bug that does not clear the RA bit after a reset
- Fix NULL pointer dereference on kernel contexts with no AFU interrupts
- powerpc/powernv: Split cxl code out into a separate file
- Add cxl_slot_is_supported API
- Enable bus mastering for devices using CAPP DMA mode
- Move cxl_afu_get / cxl_afu_put to base
- Allow a default context to be associated with an external pci_dev
- Do not create vPHB if there are no AFU configuration records
- powerpc/powernv: Add support for the cxl kernel api on the real phb
- Add support for using the kernel API with a real PHB
- Add kernel APIs to get & set the max irqs per context
- Add preliminary workaround for CX4 interrupt limitation
- Add support for interrupts on the Mellanox CX4
- Workaround PE=0 hardware limitation in Mellanox CX4
- powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
selftests:
- Test unaligned copy and paste from Chris Smart
- Load Monitor Register Tests from Jack Miller
- Cyril Bur
- exec() with suspended transaction
- Use signed long to read perf_event_paranoid
- Fix usage message in context_switch
- Fix generation of vector instructions/types in context_switch
- Michael Ellerman
- Use "Delta" rather than "Error" in normal output
- Import Anton's mmap & futex micro benchmarks
- Add a test for PROT_SAO
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXnWchAAoJEFHr6jzI4aWAe64P/36Vd9yJLptjkoyZp8/IQtu1
Cv8buQwGdKuSMzdkcUAOXcC3fe2u70ZWXMKKLfY3koIV1IAiqdWk5/XWRKMP2XmE
dG0LhSf0uu7uh+mE0WvQnRu46ImeKtQ+mPp4Hbs/s9SxMSeYjruv3vdWWmgUq0cl
Gac2qJSRtAMmgLuHWMjf7N5mxOTOnKejU4o2i9cJ+YHmWKOdCigv2Ge1UadOQFlC
E7tRPiUR3asfDfj+e+LVTTdToH6p8pk+mOUzIoZ8jIkQ+IXzi62UDl5+Rw9mqiuX
1CtqEMUXxo2qwX+d4TcV/QUOp0YKPuIcUZ9NMMS+S3lOyJ4NFt+j2Izk7QJp5kNP
gKVqB68TjDQsBuDr3P9ynlHbduxTIhZAqopbTrLe0FIg48nUe4n1yHJBVzqaVajX
rFBJSsSUffBLAARNPSXJJhIgc2C1/qOC8dgMeDMcR2kPirDHaQZ/lY1yEpq1yiqR
q6e3v5hvIAm4IjbYk0mF7TUxBrPGVE/ExyBINyASRoYxAJ1PyeD/iljZ9vI3asRA
s+hhxT8H3f7lnqTrmJqMjHgAdGkmag07EdmvFNX4xK4aADSy7Y6g4dw25ffRopo9
p9Jf9HX+dZv65Y3UjbV/6HuXcaSEBJJLSVWvii65PebqSN0LuHEFvNeIJ6Iblx0B
AWh/hd0Iin2gdkcG39Mr
=Z5kM
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- PowerNV PCI hotplug support.
- Lots more Power9 support.
- eBPF JIT support on ppc64le.
- Lots of cxl updates.
- Boot code consolidation.
Bug fixes:
- Fix spin_unlock_wait() from Boqun Feng
- Fix stack pointer corruption in __tm_recheckpoint() from Michael
Neuling
- Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
- mm: Ensure "special" zones are empty from Oliver O'Halloran
- ftrace: Separate the heuristics for checking call sites from
Michael Ellerman
- modules: Never restore r2 for a mprofile-kernel style mcount() call
from Michael Ellerman
- Fix endianness when reading TCEs from Alexey Kardashevskiy
- start rtasd before PCI probing from Greg Kurz
- PCI: rpaphp: Fix slot registration for multiple slots under a PHB
from Tyrel Datwyler
- powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev
Bhattiprolu
Cleanups & fixes:
- Drop support for MPIC in pseries from Rashmica Gupta
- Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
- Remove unused symbols in asm-offsets.c from Rashmica Gupta
- Fix SRIOV not building without EEH enabled from Russell Currey
- Remove kretprobe_trampoline_holder from Thiago Jung Bauermann
- Reduce log level of PCI I/O space warning from Benjamin
Herrenschmidt
- Add array bounds checking to crash_shutdown_handlers from Suraj
Jitindar Singh
- Avoid -maltivec when using clang integrated assembler from Anton
Blanchard
- Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
- Fix error return value in cmm_mem_going_offline() from Rasmus
Villemoes
- export cpu_to_core_id() from Mauricio Faria de Oliveira
- Remove old symbols from defconfigs from Andrew Donnellan
- Update obsolete comments in setup_32.c about entry conditions from
Benjamin Herrenschmidt
- Add comment explaining the purpose of setup_kdump_trampoline() from
Benjamin Herrenschmidt
- Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin
Hao
- Remove RELOCATABLE_PPC32 from Kevin Hao
- Fix .long's in tlb-radix.c to more meaningful from Balbir Singh
Minor cleanups & fixes:
- Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King,
Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman,
Michael Ellerman, Stephen Rothwell, Stewart Smith.
Freescale updates from Scott:
- "Highlights include more 8xx optimizations, device tree updates,
and MVME7100 support."
PowerNV PCI hotplug from Gavin Shan:
- PCI: Add pcibios_setup_bridge()
- Override pcibios_setup_bridge()
- Remove PCI_RESET_DELAY_US
- Move pnv_pci_ioda_setup_opal_tce_kill() around
- Increase PE# capacity
- Allocate PE# in reverse order
- Create PEs in pcibios_setup_bridge()
- Setup PE for root bus
- Extend PCI bridge resources
- Make pnv_ioda_deconfigure_pe() visible
- Dynamically release PE
- Update bridge windows on PCI plug
- Delay populating pdn
- Support PCI slot ID
- Use PCI slot reset infrastructure
- Introduce pnv_pci_get_slot_id()
- Functions to get/set PCI slot state
- PCI/hotplug: PowerPC PowerNV PCI hotplug driver
- Print correct PHB type names
Power9 idle support from Shreyas B. Prabhu:
- set power_save func after the idle states are initialized
- Use PNV_THREAD_WINKLE macro while requesting for winkle
- make hypervisor state restore a function
- Rename idle_power7.S to idle_book3s.S
- Rename reusable idle functions to hardware agnostic names
- Make pnv_powersave_common more generic
- abstraction for saving SPRs before entering deep idle states
- Add platform support for stop instruction
- cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
- cpuidle/powernv: cleanup cpuidle-powernv.c
- cpuidle/powernv: Add support for POWER ISA v3 idle states
- Use deepest stop state when cpu is offlined
Power9 PMU from Madhavan Srinivasan:
- factor out power8 pmu macros and defines
- factor out power8 pmu functions
- factor out power8 __init_pmu code
- Add power9 event list macros for generic and cache events
- Power9 PMU support
- Export Power9 generic and cache events to sysfs
Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
- Add XICS emulation APIs
- Move a few exception common handlers to make room
- Add support for HV virtualization interrupts
- Add mechanism to force a replay of interrupts
- Add ICP OPAL backend
- Discover IODA3 PHBs
- pci: Remove obsolete SW invalidate
- opal: Add real mode call wrappers
- Rename TCE invalidation calls
- Remove SWINV constants and obsolete TCE code
- Rework accessing the TCE invalidate register
- Fallback to OPAL for TCE invalidations
- Use the device-tree to get available range of M64's
- Check status of a PHB before using it
- pci: Don't try to allocate resources that will be reassigned
Other Power9:
- Send SIGBUS on unaligned copy and paste from Chris Smart
- Large Decrementer support from Oliver O'Halloran
- Load Monitor Register Support from Jack Miller
Performance improvements from Anton Blanchard:
- Avoid load hit store in __giveup_fpu() and __giveup_altivec()
- Avoid load hit store in setup_sigcontext()
- Remove assembly versions of strcpy, strcat, strlen and strcmp
- Align hot loops of some string functions
eBPF JIT from Naveen N. Rao:
- Fix/enhance 32-bit Load Immediate implementation
- Optimize 64-bit Immediate loads
- Introduce rotate immediate instructions
- A few cleanups
- Isolate classic BPF JIT specifics into a separate header
- Implement JIT compiler for extended BPF
Operator Panel driver from Suraj Jitindar Singh:
- devicetree/bindings: Add binding for operator panel on FSP machines
- Add inline function to get rc from an ASYNC_COMP opal_msg
- Add driver for operator panel on FSP machines
Sparse fixes from Daniel Axtens:
- make some things static
- Introduce asm-prototypes.h
- Include headers containing prototypes
- Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
- kvm: Clarify __user annotations
- Pass endianness to sparse
- Make ppc_md.{halt, restart} __noreturn
MM fixes & cleanups from Aneesh Kumar K.V:
- radix: Update LPCR HR bit as per ISA
- use _raw variant of page table accessors
- Compile out radix related functions if RADIX_MMU is disabled
- Clear top 16 bits of va only on older cpus
- Print formation regarding the the MMU mode
- hash: Update SDR1 size encoding as documented in ISA 3.0
- radix: Update PID switch sequence
- radix: Update machine call back to support new HCALL.
- radix: Add LPID based tlb flush helpers
- radix: Add a kernel command line to disable radix
- Cleanup LPCR defines
Boot code consolidation from Benjamin Herrenschmidt:
- Move epapr_paravirt_early_init() to early_init_devtree()
- cell: Don't use flat device-tree after boot
- ge_imp3a: Don't use the flat device-tree after boot
- mpc85xx_ds: Don't use the flat device-tree after boot
- mpc85xx_rdb: Don't use the flat device-tree after boot
- Don't test for machine type in rtas_initialize()
- Don't test for machine type in smp_setup_cpu_maps()
- dt: Add of_device_compatible_match()
- Factor do_feature_fixup calls
- Move 64-bit feature fixup earlier
- Move 64-bit memory reserves to setup_arch()
- Use a cachable DART
- Move FW feature probing out of pseries probe()
- Put exception configuration in a common place
- Remove early allocation of the SMU command buffer
- Move MMU backend selection out of platform code
- pasemi: Remove IOBMAP allocation from platform probe()
- mm/hash: Don't use machine_is() early during boot
- Don't test for machine type to detect HEA special case
- pmac: Remove spurrious machine type test
- Move hash table ops to a separate structure
- Ensure that ppc_md is empty before probing for machine type
- Move 64-bit probe_machine() to later in the boot process
- Move 32-bit probe() machine to later in the boot process
- Get rid of ppc_md.init_early()
- Move the boot time info banner to a separate function
- Move setting of {i,d}cache_bsize to initialize_cache_info()
- Move the content of setup_system() to setup_arch()
- Move cache info inits to a separate function
- Re-order the call to smp_setup_cpu_maps()
- Re-order setup_panic()
- Make a few boot functions __init
- Merge 32-bit and 64-bit setup_arch()
Other new features:
- tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
- tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
- powerpc: Add module autoloading based on CPU features from Alastair D'Silva
- crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
- Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
- xmon: Dump ISA 2.06 SPRs from Michael Ellerman
- xmon: Dump ISA 2.07 SPRs from Michael Ellerman
- Add a parameter to disable 1TB segs from Oliver O'Halloran
- powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
- Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
- pseries: Add pseries hotplug workqueue from John Allen
- pseries: Add support for hotplug interrupt source from John Allen
- pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
- pseries: Move property cloning into its own routine from Nathan Fontenot
- pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
- pseries: Auto-online hotplugged memory from Nathan Fontenot
- pseries: Remove call to memblock_add() from Nathan Fontenot
cxl:
- Add set and get private data to context struct from Michael Neuling
- make base more explicitly non-modular from Paul Gortmaker
- Use for_each_compatible_node() macro from Wei Yongjun
- Frederic Barrat
- Abstract the differences between the PSL and XSL
- Make vPHB device node match adapter's
- Philippe Bergheaud
- Add mechanism for delivering AFU driver specific events
- Ignore CAPI adapters misplaced in switched slots
- Refine slice error debug messages
- Andrew Donnellan
- static-ify variables to fix sparse warnings
- PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
- PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
- Add cxl_check_and_switch_mode() API to switch bi-modal cards
- remove dead Kconfig options
- fix potential NULL dereference in free_adapter()
- Ian Munsie
- Update process element after allocating interrupts
- Add support for CAPP DMA mode
- Fix allowing bogus AFU descriptors with 0 maximum processes
- Fix allocating a minimum of 2 pages for the SPA
- Fix bug where AFU disable operation had no effect
- Workaround XSL bug that does not clear the RA bit after a reset
- Fix NULL pointer dereference on kernel contexts with no AFU interrupts
- powerpc/powernv: Split cxl code out into a separate file
- Add cxl_slot_is_supported API
- Enable bus mastering for devices using CAPP DMA mode
- Move cxl_afu_get / cxl_afu_put to base
- Allow a default context to be associated with an external pci_dev
- Do not create vPHB if there are no AFU configuration records
- powerpc/powernv: Add support for the cxl kernel api on the real phb
- Add support for using the kernel API with a real PHB
- Add kernel APIs to get & set the max irqs per context
- Add preliminary workaround for CX4 interrupt limitation
- Add support for interrupts on the Mellanox CX4
- Workaround PE=0 hardware limitation in Mellanox CX4
- powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
selftests:
- Test unaligned copy and paste from Chris Smart
- Load Monitor Register Tests from Jack Miller
- Cyril Bur
- exec() with suspended transaction
- Use signed long to read perf_event_paranoid
- Fix usage message in context_switch
- Fix generation of vector instructions/types in context_switch
- Michael Ellerman
- Use "Delta" rather than "Error" in normal output
- Import Anton's mmap & futex micro benchmarks
- Add a test for PROT_SAO"
* tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits)
powerpc/mm: Parenthesise IS_ENABLED() in if condition
tty/hvc: Use opal irqchip interface if available
tty/hvc: Use IRQF_SHARED for OPAL hvc consoles
selftests/powerpc: exec() with suspended transaction
powerpc: Improve comment explaining why we modify VRSAVE
powerpc/mm: Drop unused externs for hpte_init_beat[_v3]()
powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header
powerpc/mm: Fix build break when PPC_NATIVE=n
crypto: vmx - Convert to CPU feature based module autoloading
powerpc: Add module autoloading based on CPU features
powerpc/powernv/ioda: Fix endianness when reading TCEs
powerpc/mm: Add memory barrier in __hugepte_alloc()
powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call
powerpc/ftrace: Separate the heuristics for checking call sites
powerpc: Merge 32-bit and 64-bit setup_arch()
powerpc/64: Make a few boot functions __init
powerpc: Re-order setup_panic()
powerpc: Re-order the call to smp_setup_cpu_maps()
powerpc/32: Move cache info inits to a separate function
powerpc/64: Move the content of setup_system() to setup_arch()
...
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.
{pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
size and that fits into !costly allocation request size.
PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
or order-4 requests. The first one doesn't need the flag while the
second does. Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
order-4 one.
This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.
Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the third version of the patchset previously sent [1]. I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree. I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2. I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.
Motivation:
While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree. It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often. It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.
I think it makes some sense to get rid of them because they are just
making the semantic more unclear. Please note that GFP_REPEAT is
documented as
* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
* _might_ fail. This depends upon the particular VM implementation.
while !costly requests have basically nofail semantic. So one could
reasonably expect that order-0 request with __GFP_REPEAT will not loop
for ever. This is not implemented right now though.
I would like to move on with __GFP_REPEAT and define a better semantic
for it.
$ git grep __GFP_REPEAT origin/master | wc -l
111
$ git grep __GFP_REPEAT | wc -l
36
So we are down to the third after this patch series. The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests. This still needs some double checking which I will do later
after all the simple ones are sorted out.
I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them. Patches should be quite trivial to
review for stupid compile mistakes though. The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.
[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
This patch (of 19):
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations. Yet we
have the full kernel tree with its usage for apparently order-0
allocations. This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).
Let's simply drop __GFP_REPEAT from those places. This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.
Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 74701d5947 "powerpc/mm: Rename function to indicate we are
allocating fragments" renamed page_table_free() to pte_fragment_free().
One occurrence was mistyped as pte_fragment_fre().
This only breaks the nohash 64K page build, which is not the default or
enabled in any defconfig.
Fixes: 74701d5947 ("powerpc/mm: Rename function to indicate we are allocating fragments")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch switches 4K Linux page size config to use pte_t * type
instead of struct page * for pgtable_t. This simplifies the code a lot
and helps in consolidating both 64K and 4K page allocator routines. The
changes should not have any impact, because we already store physical
address in the upper level page table tree and that implies we already
do struct page * to physical address conversion.
One change to note here is we move the pgtable_page_dtor() call for
nohash to pte_fragment_free_mm(). The nohash related change is due to
the related changes in pgtable_64.c.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Only code cleanup. No functionality change.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts pgalloc related changes WRT implementing 4-level page
table for 64K Linux page size and storing of physical address in higher
level page tables since they are only applicable to book3s64 variant
and we now have a separate copy for book3s64. This helps to keep these
headers simpler.
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This moves the nohash variant of pgalloc headers to nohash/ directory
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For hash we create vmemmap mapping using bolted hash page table entries.
For radix we fill the radix page table. The next patch will add the
radix details for creating vmemmap mappings.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This changes the Linux page tables to store physical addresses
rather than kernel virtual addresses in the upper levels of the
tree (pgd, pud and pmd) for 64-bit Book 3S machines.
This also changes the hugepd pointers used to implement hugepages
when the base page size is 4k to store physical addresses rather than
virtual addresses (again just for 64-bit Book3S machines).
This frees up some high order bits, and will be needed with
PowerISA v3.0 machines which read the page table tree in hardware
in radix mode.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove the related functions and #defines
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
They don't need to track 4k subpage slot details and hence don't need
second half of pgtable_t.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the #define instead of open-coding the same
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pte and pmd table size are dependent on config items. Don't
hard code the same. This make sure we use the right value
when masking pmd entries and also while checking pmd_bad
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>