Commit Graph

984 Commits

Author SHA1 Message Date
Thomas Gleixner
cd1a41ceba softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig
To prepare for inlining do_softirq_own_stack() replace
__ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected
architectures.

This allows in the next step to move the function prototype and the inline
stub into a seperate asm-generic header file which is required to avoid
include recursion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de
2021-02-10 23:34:16 +01:00
Nicholas Piggin
2a06bf3e95 powerpc/64: context tracking remove _TIF_NOHZ
Add context tracking to the system call handler explicitly, and remove
_TIF_NOHZ.

This improves system call performance when nohz_full is enabled. On a
POWER9, gettid scv system call cost on a nohz_full CPU improves from
1129 cycles to 1004 cycles and on a housekeeping CPU from 550 cycles
to 430 cycles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-31-npiggin@gmail.com
2021-02-09 00:02:12 +11:00
Christophe Leroy
4eeef098b4 powerpc/44x: Remove STDBINUTILS kconfig option
STDBINUTILS is just a toggle to allow 256k page size to appear in the
possible page sizes list for the 44x.

Make 256k page size option appear all the time with an explicit
warning about binutils, and remove this unnecessary STDBINUTILS
config option.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Incorporate help text changes from David Laight]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f9981e819009aa121a998dc483052ec76f78f991.1611128938.git.christophe.leroy@csgroup.eu
2021-01-30 11:39:33 +11:00
Christophe Leroy
910a0cb6d2 powerpc/47x: Disable 256k page size
PPC47x_TLBE_SIZE isn't defined for 256k pages, leading to a build
break if 256k pages is selected.

So change the kconfig so that 256k pages can't be selected for 47x.

Fixes: e7f75ad01d ("powerpc/47x: Base ppc476 support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Expand change log to mention build break]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2fed79b1154c872194f98bac4422c23918325e61.1611128938.git.christophe.leroy@csgroup.eu
2021-01-30 11:39:33 +11:00
Nicholas Piggin
c9f3401313 powerpc: Always enable queued spinlocks for 64s, disable for others
Queued spinlocks have shown to have good performance and fairness
properties even on smaller (2 socket) POWER systems. This selects them
automatically for 64s. For other platforms they are de-selected, the
standard spinlock is far simpler and smaller code, and single chips
with a handful of cores is unlikely to show any improvement.

CONFIG_EXPERT still allows this to be changed, e.g., to help debug
performance or correctness issues.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210118123451.1452206-1-npiggin@gmail.com
2021-01-30 11:39:32 +11:00
Viresh Kumar
7a3c90df20 arch: powerpc: Stop building and using oprofile
The "oprofile" user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.

This commits stops building oprofile for powerpc and removes any
reference to it from directories in arch/powerpc/ apart from
arch/powerpc/oprofile, which will be removed in the next commit (this is
broken into two commits as the size of the commit became very big, ~5k
lines).

Note that the member "oprofile_cpu_type" in "struct cpu_spec" isn't
removed as it was also used by other parts of the code.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: William Cohen <wcohen@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2021-01-29 10:05:51 +05:30
Al Viro
41026c3435 Kconfig: regularize selection of CONFIG_BINFMT_ELF
with mips converted to use of fs/config_binfmt_elf.c, there's no
need to keep selects of that thing all over arch/* - we can simply
turn into def_bool y if COMPAT && BINFMT_ELF (in fs/Kconfig.binfmt)
and get rid of all selects.

Several architectures got those selects wrong (e.g. you could
end up with sparc64 sans BINFMT_ELF, with select violating
dependencies, etc.)

Randy Dunlap has spotted some of those; IMO this is simpler than
his fix, but it depends upon the stuff that would need to be
backported, so we might end up using his variant for -stable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-06 08:42:49 -05:00
Linus Torvalds
347d81b68b dma-mapping updates for 5.11:
- support for a partial IOMMU bypass (Alexey Kardashevskiy)
  - add a DMA API benchmark (Barry Song)
  - misc fixes (Tiezhu Yang, tangjianqiang)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl/iF+wLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYP/HQ//beE+HGi0+5yiWdLY/Q3nqT/VExgdY2CAE2en0jcs
 kpUEZPfhE2dlKPf9nBl+ZsLIgqIwVP+oSawxZ65r0z/w95vgWANAmBg2m/FcnEUx
 Vl6jUUPmViY0AGzMR1+55voNNor4aX9rLUdZEN+vBy62Z6fvQVmKOtVPUf0ekkUh
 n3factrKIt7mplsIgJPO3v6G2XHugBaNmcrp9LkkmhniYkH8S31l4uKCSKI+6atT
 CsTTVqpQ+qVAyrgp30Xs+N9QoOefI4tFdXmvXzIIFe3JyDvIpniaiT+HHVXuEQJR
 5Yukj9sKJXS1ipjc+yWsjvn4Wax/tubJ7eqIo9alsIkcJXI3AlKljwBtFrKPMq+L
 NK54N3N31A+/hrxOLQyfqz5vH6cUicFr7STIedb8pD3y3/f/tiP+/qown6Wj3rB/
 E0IlCK5w87OVrJT1f5fHxh9Xe4R2W9rpeU/v/AZ/DrQllwp3PbG0kWqu3m4mujih
 CF3D+rOoB5tNtBxrNU+TpOVm+OOdX1IDtBleKePHTTIQeBY3p3UZuwINO0axFr9p
 oQndj1fiYvRwI6GS31GPpduFG7PrrwCb5zD0Rm07+aNCIoSmu+ADEngfgxnj/rGz
 q+c9Te1dDn58OktkiEONPNu+iuGBxhBR7AeP6UDdcC1vkO/hvW2HnppSY0qiWthU
 IFY=
 =dOug
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.11' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - support for a partial IOMMU bypass (Alexey Kardashevskiy)

 - add a DMA API benchmark (Barry Song)

 - misc fixes (Tiezhu Yang, tangjianqiang)

* tag 'dma-mapping-5.11' of git://git.infradead.org/users/hch/dma-mapping:
  selftests/dma: add test application for DMA_MAP_BENCHMARK
  dma-mapping: add benchmark support for streaming DMA APIs
  dma-contiguous: fix a typo error in a comment
  dma-pool: no need to check return value of debugfs_create functions
  powerpc/dma: Fallback to dma_ops when persistent memory present
  dma-mapping: Allow mixing bypass and mapped DMA operation
2020-12-22 13:19:43 -08:00
Linus Torvalds
8a5be36b93 powerpc updates for 5.11
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
    setup/handling code.
 
  - Support for KUAP (Kernel User Access Prevention) on systems using the hashed
    page table MMU, using memory protection keys.
 
  - Better handling of PowerVM SMT8 systems where all threads of a core do not
    share an L2, allowing the scheduler to make better scheduling decisions.
 
  - Further improvements to our machine check handling.
 
  - Show registers when unwinding interrupt frames during stack traces.
 
  - Improvements to our pseries (PowerVM) partition migration code.
 
  - Several series from Christophe refactoring and cleaning up various parts of
    the 32-bit code.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard
   Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater,
   Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David
   Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert
   Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan
   Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh
   Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov,
   Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
   Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior ,
   Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König,
   Vincent Stehlé, Youling Tang, Zhang Xiaoxu.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V
 h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz
 gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a
 EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl
 lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms
 +5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi
 iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg
 MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J
 CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv
 Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD
 GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2
 azmcaOyLdb+HXA==
 =Ga3K
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Switch to the generic C VDSO, as well as some cleanups of our VDSO
   setup/handling code.

 - Support for KUAP (Kernel User Access Prevention) on systems using the
   hashed page table MMU, using memory protection keys.

 - Better handling of PowerVM SMT8 systems where all threads of a core
   do not share an L2, allowing the scheduler to make better scheduling
   decisions.

 - Further improvements to our machine check handling.

 - Show registers when unwinding interrupt frames during stack traces.

 - Improvements to our pseries (PowerVM) partition migration code.

 - Several series from Christophe refactoring and cleaning up various
   parts of the 32-bit code.

 - Other smaller features, fixes & cleanups.

Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.

* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
  powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
  powerpc: Add config fragment for disabling -Werror
  powerpc/configs: Add ppc64le_allnoconfig target
  powerpc/powernv: Rate limit opal-elog read failure message
  powerpc/pseries/memhotplug: Quieten some DLPAR operations
  powerpc/ps3: use dma_mapping_error()
  powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
  powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
  KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
  KVM: PPC: fix comparison to bool warning
  KVM: PPC: Book3S: Assign boolean values to a bool variable
  powerpc: Inline setup_kup()
  powerpc/64s: Mark the kuap/kuep functions non __init
  KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
  powerpc/xive: Improve error reporting of OPAL calls
  powerpc/xive: Simplify xive_do_source_eoi()
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
  ...
2020-12-17 13:34:25 -08:00
Linus Torvalds
7a932e5702 asm-generic: cross-architecture timer cleanup
This cleans up two ancient timer features that were never completed in
 the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET.
 
 There was only one user left for the ARCH_USES_GETTIMEOFFSET variant
 of clocksource implementations, the ARM EBSA110 platform. Rather than
 changing to use modern timekeeping, we remove the platform entirely as
 Russell no longer uses his machine and nobody else seems to have one
 any more.
 
 The conditional code for using arch_gettimeoffset() is removed as
 a result.
 
 For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms
 not using clockevent drivers: parisc, ia64, most of m68k, and one
 Arm platform. These all do timer ticks slighly differently, and this
 gets cleaned up to the point they at least all call the same helper
 function. Instead of most platforms using 'select GENERIC_CLOCKEVENTS'
 in Kconfig, the polarity is now reversed, with the few remaining ones
 selecting LEGACY_TIMER_TICK instead.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl/Y1v8ACgkQmmx57+YA
 GNmCvQ/9EDlgCt92r8SB+LGafDtgB8TUQZeIrs9S2mByzdxwnw0lxObIXFCnhQgh
 RpG3dR+ONRDnC5eI149B377JOEFMZWe2+BtYHUHkFARtUEWatslQcz7yAGvVRK/l
 TS/qReb6piKltlzuanF1bMZbjy2OhlaDRcm+OlC3y5mALR33M4emb+rJ6cSdfk3K
 v1iZhrxtfQT77ztesh/oPkPiyQ6kNcz7SfpyYOb6f5VLlml2BZ7YwBSVyGY7urHk
 RL3XqOUP4KKlMEAI8w0E2nvft6Fk+luziBhrMYWK0GvbmI1OESENuX/c6tgT2OQ1
 DRaVHvcPG/EAY8adOKxxVyHhEJDSoz5GJV/EtjlOegsJk6RomczR1uuiT3Kvm7Ah
 PktMKv4xQht1E15KPSKbOvNIEP18w2s5z6gw+jVDv8pw42pVEQManm1D+BICqrhl
 fcpw6T1drf9UxAjwX4+zXtmNs+a+mqiFG8puU4VVgT4GpQ8umHvunXz2WUjZO0jc
 3m8ErJHBvtJwW5TOHGyXnjl9SkwPzHOfF6IcXTYWEDU4/gQIK9TwUvCjLc0lE27t
 FMCV2ds7/K1CXwRgpa5IrefSkb8yOXSbRZ56NqqF7Ekxw4J5bYRSaY7jb+qD/e+3
 5O1y+iPxFrpH+16hSahvzrtcdFNbLQvBBuRtEQOYuHLt2UJrNoU=
 =QpNs
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cross-architecture timer cleanup from Arnd Bergmann:
 "This cleans up two ancient timer features that were never completed in
  the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET.

  There was only one user left for the ARCH_USES_GETTIMEOFFSET variant
  of clocksource implementations, the ARM EBSA110 platform. Rather than
  changing to use modern timekeeping, we remove the platform entirely as
  Russell no longer uses his machine and nobody else seems to have one
  any more.

  The conditional code for using arch_gettimeoffset() is removed as a
  result.

  For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms
  not using clockevent drivers: parisc, ia64, most of m68k, and one Arm
  platform. These all do timer ticks slighly differently, and this gets
  cleaned up to the point they at least all call the same helper
  function.

  Instead of most platforms using 'select GENERIC_CLOCKEVENTS' in
  Kconfig, the polarity is now reversed, with the few remaining ones
  selecting LEGACY_TIMER_TICK instead"

* tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  timekeeping: default GENERIC_CLOCKEVENTS to enabled
  timekeeping: remove xtime_update
  m68k: remove timer_interrupt() function
  m68k: change remaining timers to legacy_timer_tick
  m68k: m68328: use legacy_timer_tick()
  m68k: sun3/sun3c: use legacy_timer_tick
  m68k: split heartbeat out of timer function
  m68k: coldfire: use legacy_timer_tick()
  parisc: use legacy_timer_tick
  ARM: rpc: use legacy_timer_tick
  ia64: convert to legacy_timer_tick
  timekeeping: add CONFIG_LEGACY_TIMER_TICK
  timekeeping: remove arch_gettimeoffset
  net: remove am79c961a driver
  ARM: remove ebsa110 platform
2020-12-16 00:07:17 -08:00
Linus Torvalds
ac73e3dc8a Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few random little subsystems

 - almost all of the MM patches which are staged ahead of linux-next
   material. I'll trickle to post-linux-next work in as the dependents
   get merged up.

Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
  mm: cleanup kstrto*() usage
  mm: fix fall-through warnings for Clang
  mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
  mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
  mm:backing-dev: use sysfs_emit in macro defining functions
  mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
  mm: use sysfs_emit for struct kobject * uses
  mm: fix kernel-doc markups
  zram: break the strict dependency from lzo
  zram: add stat to gather incompressible pages since zram set up
  zram: support page writeback
  mm/process_vm_access: remove redundant initialization of iov_r
  mm/zsmalloc.c: rework the list_add code in insert_zspage()
  mm/zswap: move to use crypto_acomp API for hardware acceleration
  mm/zswap: fix passing zero to 'PTR_ERR' warning
  mm/zswap: make struct kernel_param_ops definitions const
  userfaultfd/selftests: hint the test runner on required privilege
  userfaultfd/selftests: fix retval check for userfaultfd_open()
  userfaultfd/selftests: always dump something in modes
  userfaultfd: selftests: make __{s,u}64 format specifiers portable
  ...
2020-12-15 12:53:37 -08:00
Mike Rapoport
5d6ad668f3 arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC
The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
never fail.  With this assumption is wouldn't be safe to allow general
usage of this function.

Moreover, some architectures that implement __kernel_map_pages() have this
function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
pages when page allocation debugging is disabled at runtime.

As all the users of __kernel_map_pages() were converted to use
debug_pagealloc_map_pages() it is safe to make it available only when
DEBUG_PAGEALLOC is set.

Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:43 -08:00
Linus Torvalds
edd7ab7684 The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic implementation
     which builds the base for the kmap_local() API and make the
     kmap_atomic() interface wrappers which handle the disabling/enabling of
     preemption and pagefaults.
 
   - Switch the storage from per-CPU to per task and provide scheduler
     support for clearing mapping when scheduling out and restoring them
     when scheduling back in.
 
   - Merge the migrate_disable/enable() code, which is also part of the
     scheduler pull request. This was required to make the kmap_local()
     interface available which does not disable preemption when a mapping
     is established. It has to disable migration instead to guarantee that
     the virtual address of the mapped slot is the same accross preemption.
 
   - Provide better debug facilities: guard pages and enforced utilization
     of the mapping mechanics on 64bit systems when the architecture allows
     it.
 
   - Provide the new kmap_local() API which can now be used to cleanup the
     kmap_atomic() usage sites all over the place. Most of the usage sites
     do not require the implicit disabling of preemption and pagefaults so
     the penalty on 64bit and 32bit non-highmem systems is removed and quite
     some of the code can be simplified. A wholesale conversion is not
     possible because some usage depends on the implicit side effects and
     some need to be cleaned up because they work around these side effects.
 
     The migrate disable side effect is only effective on highmem systems
     and when enforced debugging is enabled. On 64bit and 32bit non-highmem
     systems the overhead is completely avoided.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XyQwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUolD/9+R+BX96fGir+I8rG9dc3cbLw5meSi
 0I/Nq3PToZMs2Iqv50DsoaPYHHz/M6fcAO9LRIgsE9jRbnY93GnsBM0wU9Y8yQaT
 4wUzOG5WHaLDfqIkx/CN9coUl458oEiwOEbn79A2FmPXFzr7IpkufnV3ybGDwzwP
 p73bjMJMPPFrsa9ig87YiYfV/5IAZHi82PN8Cq1v4yNzgXRP3Tg6QoAuCO84ZnWF
 RYlrfKjcJ2xPdn+RuYyXolPtxr1hJQ0bOUpe4xu/UfeZjxZ7i1wtwLN9kWZe8CKH
 +x4Lz8HZZ5QMTQ9sCHOLtKzu2MceMcpISzoQH4/aFQCNMgLn1zLbS790XkYiQCuR
 ne9Cua+IqgYfGMG8cq8+bkU9HCNKaXqIBgPEKE/iHYVmqzCOqhW5Cogu4KFekf6V
 Wi7pyyUdX2en8BAWpk5NHc8de9cGcc+HXMq2NIcgXjVWvPaqRP6DeITERTZLJOmz
 XPxq5oPLGl7wdm7z+ICIaNApy8zuxpzb6sPLNcn7l5OeorViORlUu08AN8587wAj
 FiVjp6ZYomg+gyMkiNkDqFOGDH5TMENpOFoB0hNNEyJwwS0xh6CgWuwZcv+N8aPO
 HuS/P+tNANbD8ggT4UparXYce7YCtgOf3IG4GA3JJYvYmJ6pU+AZOWRoDScWq4o+
 +jlfoJhMbtx5Gg==
 =n71I
 -----END PGP SIGNATURE-----

Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull kmap updates from Thomas Gleixner:
 "The new preemtible kmap_local() implementation:

   - Consolidate all kmap_atomic() internals into a generic
     implementation which builds the base for the kmap_local() API and
     make the kmap_atomic() interface wrappers which handle the
     disabling/enabling of preemption and pagefaults.

   - Switch the storage from per-CPU to per task and provide scheduler
     support for clearing mapping when scheduling out and restoring them
     when scheduling back in.

   - Merge the migrate_disable/enable() code, which is also part of the
     scheduler pull request. This was required to make the kmap_local()
     interface available which does not disable preemption when a
     mapping is established. It has to disable migration instead to
     guarantee that the virtual address of the mapped slot is the same
     across preemption.

   - Provide better debug facilities: guard pages and enforced
     utilization of the mapping mechanics on 64bit systems when the
     architecture allows it.

   - Provide the new kmap_local() API which can now be used to cleanup
     the kmap_atomic() usage sites all over the place. Most of the usage
     sites do not require the implicit disabling of preemption and
     pagefaults so the penalty on 64bit and 32bit non-highmem systems is
     removed and quite some of the code can be simplified. A wholesale
     conversion is not possible because some usage depends on the
     implicit side effects and some need to be cleaned up because they
     work around these side effects.

     The migrate disable side effect is only effective on highmem
     systems and when enforced debugging is enabled. On 64bit and 32bit
     non-highmem systems the overhead is completely avoided"

* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  ARM: highmem: Fix cache_is_vivt() reference
  x86/crashdump/32: Simplify copy_oldmem_page()
  io-mapping: Provide iomap_local variant
  mm/highmem: Provide kmap_local*
  sched: highmem: Store local kmaps in task struct
  x86: Support kmap_local() forced debugging
  mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
  microblaze/mm/highmem: Add dropped #ifdef back
  xtensa/mm/highmem: Make generic kmap_atomic() work correctly
  mm/highmem: Take kmap_high_get() properly into account
  highmem: High implementation details and document API
  Documentation/io-mapping: Remove outdated blurb
  io-mapping: Cleanup atomic iomap
  mm/highmem: Remove the old kmap_atomic cruft
  highmem: Get rid of kmap_types.h
  xtensa/mm/highmem: Switch to generic kmap atomic
  sparc/mm/highmem: Switch to generic kmap atomic
  powerpc/mm/highmem: Switch to generic kmap atomic
  nds32/mm/highmem: Switch to generic kmap atomic
  ...
2020-12-14 18:35:53 -08:00
Cédric Le Goater
7b3b3de3b0 powerpc: Increase NR_IRQS range to support more KVM guests
PowerNV systems can handle up to 4K guests and 1M interrupt numbers
per chip. Increase the range of allowed interrupts to support a larger
number of guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-8-clg@kaod.org
2020-12-11 09:53:10 +11:00
Christophe Leroy
bccc58986a powerpc/8xx: Always pin kernel text TLB
There is no big poing in not pinning kernel text anymore, as now
we can keep pinned TLB even with things like DEBUG_PAGEALLOC.

Remove CONFIG_PIN_TLB_TEXT, making it always right.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop ifdef around mmu_pin_tlb() to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/203b89de491e1379f1677a2685211b7c32adfff0.1606231483.git.christophe.leroy@csgroup.eu
2020-12-09 23:47:45 +11:00
Colin Ian King
f8a4b277c3 powerpc: fix spelling mistake in Kconfig "seleted" -> "selected"
There is a spelling mistake in the help text of the Kconfig. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207155420.172370-1-colin.king@canonical.com
2020-12-08 21:41:03 +11:00
Jordan Niethe
1baa1f70ef powerpc: Allow relative pointers in bug table entries
This enables GENERIC_BUG_RELATIVE_POINTERS on Power so that 32-bit
offsets are stored in the bug entries rather than 64-bit pointers.
While this doesn't save space for 32-bit machines, use it anyway so
there is only one code path.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201201005203.15210-1-jniethe5@gmail.com
2020-12-04 01:01:20 +11:00
Christophe Leroy
b6254ced4d powerpc/signal: Don't manage floating point regs when no FPU
There is no point in copying floating point regs when there
is no FPU and MATH_EMULATION is not selected.

Create a new CONFIG_PPC_FPU_REGS bool that is selected by
CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to
opt out everything related to fp_state in thread_struct.

The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU
as fpu.S build is conditionnal to CONFIG_PPC_FPU.

The following app spends approx 8.1 seconds system time on an 8xx
without the patch, and 7.0 seconds with the patch (13.5% reduction).

On an 832x, it spends approx 2.6 seconds system time without
the patch and 2.1 seconds with the patch (19% reduction).

	void sigusr1(int sig) { }

	int main(int argc, char **argv)
	{
		int i = 100000;

		signal(SIGUSR1, sigusr1);
		for (;i--;)
			raise(SIGUSR1);
		exit(0);
	}

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
2020-12-04 01:01:11 +11:00
Christophe Leroy
ab037dd87a powerpc/vdso: Switch VDSO to generic C implementation.
With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
  gettimeofday:    		  vdso:  828 nsec/call
  clock-getres-realtime-coarse:   vdso:  391 nsec/call
  clock-gettime-realtime-coarse:  vdso:  614 nsec/call
  clock-getres-realtime:    	  vdso:  460 nsec/call
  clock-gettime-realtime:    	  vdso:  876 nsec/call
  clock-getres-monotonic-coarse:  vdso:  399 nsec/call
  clock-gettime-monotonic-coarse: vdso:  691 nsec/call
  clock-getres-monotonic:    	  vdso:  460 nsec/call
  clock-gettime-monotonic:    	  vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
  gettimeofday:    		  vdso:  955 nsec/call
  clock-getres-realtime-coarse:   vdso:  545 nsec/call
  clock-gettime-realtime-coarse:  vdso:  592 nsec/call
  clock-getres-realtime:          vdso:  545 nsec/call
  clock-gettime-realtime:    	  vdso:  941 nsec/call
  clock-getres-monotonic-coarse:  vdso:  545 nsec/call
  clock-gettime-monotonic-coarse: vdso:  591 nsec/call
  clock-getres-monotonic:         vdso:  545 nsec/call
  clock-gettime-monotonic:        vdso:  940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
  clock-gettime-boottime:         vdso: 3851 nsec/call
  clock-gettime-tai:      	  vdso: 3852 nsec/call
  clock-gettime-monotonic-raw:    vdso: 3396 nsec/call

Same clocks with C VDSO:
  clock-gettime-tai:              vdso:  941 nsec/call
  clock-gettime-monotonic-raw:    vdso: 1001 nsec/call
  clock-gettime-monotonic-coarse: vdso:  591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
  gettimeofday:     		  vdso: 220 nsec/call
  clock-getres-realtime-coarse:   vdso: 102 nsec/call
  clock-gettime-realtime-coarse:  vdso: 178 nsec/call
  clock-getres-realtime:          vdso: 129 nsec/call
  clock-gettime-realtime:    	  vdso: 235 nsec/call
  clock-getres-monotonic-coarse:  vdso: 105 nsec/call
  clock-gettime-monotonic-coarse: vdso: 208 nsec/call
  clock-getres-monotonic:         vdso: 129 nsec/call
  clock-gettime-monotonic:        vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
  gettimeofday:    		  vdso: 272 nsec/call
  clock-getres-realtime-coarse:   vdso: 160 nsec/call
  clock-gettime-realtime-coarse:  vdso: 184 nsec/call
  clock-getres-realtime:          vdso: 166 nsec/call
  clock-gettime-realtime:         vdso: 281 nsec/call
  clock-getres-monotonic-coarse:  vdso: 160 nsec/call
  clock-gettime-monotonic-coarse: vdso: 184 nsec/call
  clock-getres-monotonic:         vdso: 169 nsec/call
  clock-gettime-monotonic:        vdso: 275 nsec/call

On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO:
  clock-gettime-monotonic:    	  vdso:  35 nsec/call
  clock-getres-monotonic:    	  vdso:  16 nsec/call
  clock-gettime-monotonic-coarse: vdso:  18 nsec/call
  clock-getres-monotonic-coarse:  vdso: 522 nsec/call
  clock-gettime-monotonic-raw:    vdso: 598 nsec/call
  clock-getres-monotonic-raw:     vdso: 520 nsec/call
  clock-gettime-realtime:    	  vdso:  34 nsec/call
  clock-getres-realtime:    	  vdso:  16 nsec/call
  clock-gettime-realtime-coarse:  vdso:  18 nsec/call
  clock-getres-realtime-coarse:   vdso: 517 nsec/call
  getcpu:    			  vdso:   8 nsec/call
  gettimeofday:    		  vdso:  25 nsec/call

And with the C VDSO:
  clock-gettime-monotonic:    	  vdso:  37 nsec/call
  clock-getres-monotonic:    	  vdso:  20 nsec/call
  clock-gettime-monotonic-coarse: vdso:  21 nsec/call
  clock-getres-monotonic-coarse:  vdso:  19 nsec/call
  clock-gettime-monotonic-raw:    vdso:  38 nsec/call
  clock-getres-monotonic-raw:     vdso:  20 nsec/call
  clock-gettime-realtime:    	  vdso:  37 nsec/call
  clock-getres-realtime:    	  vdso:  20 nsec/call
  clock-gettime-realtime-coarse:  vdso:  20 nsec/call
  clock-getres-realtime-coarse:   vdso:  19 nsec/call
  getcpu:    			  vdso:   8 nsec/call
  gettimeofday:    		  vdso:  28 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
2020-12-04 01:01:10 +11:00
Michael Ellerman
bae80c27fc powerpc: Update NUMA Kconfig description & help text
Update the NUMA Kconfig description to match other architectures, and
add some help text. Shamelessly borrowed from x86/arm64.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-3-mpe@ellerman.id.au
2020-12-04 01:01:09 +11:00
Michael Ellerman
4c28b32b88 powerpc: Make NUMA default y for powernv
Our NUMA option is default y for pseries, but not powernv. The bulk of
powernv systems are NUMA, so make NUMA default y for powernv also.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20201124120547.1940635-2-mpe@ellerman.id.au
2020-12-04 01:01:09 +11:00
Michael Ellerman
25395cd2f8 powerpc: Make NUMA depend on SMP
Our Kconfig allows NUMA to be enabled without SMP, but none of
our defconfigs use that combination. This means it can easily be
broken inadvertently by code changes, which has happened recently.

Although it's theoretically possible to have a machine with a single
CPU and multiple memory nodes, I can't think of any real systems where
that's the case. Even so if such a system exists, it can just run an
SMP kernel anyway.

So to avoid the need to add extra #ifdefs and/or build breaks, make
NUMA depend on SMP.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-1-mpe@ellerman.id.au
2020-12-04 01:01:09 +11:00
Nathan Chancellor
59612b24f7 kbuild: Hoist '--orphan-handling' into Kconfig
Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).

To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinx) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-01 22:45:36 +09:00
Alexey Kardashevskiy
bf6e2d562b powerpc/dma: Fallback to dma_ops when persistent memory present
So far we have been using huge DMA windows to map all the RAM available.
The RAM is normally mapped to the VM address space contiguously, and
there is always a reasonable upper limit for possible future hot plugged
RAM which makes it easy to map all RAM via IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike
normal RAM) can map anywhere in the VM space beyond the maximum RAM size
and since it can be used for DMA, it requires extending the huge window
up to MAX_PHYSMEM_BITS which requires hypervisor support for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do either so the only option left is
restricting the huge DMA window to include only RAM and fallback to
the default DMA window for persistent memory.

This defines arch_dma_map_direct/etc to allow generic DMA code perform
additional checks on whether direct DMA is still possible.

This checks if the system has persistent memory. If it does not,
the DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoid calling dma_ops for mapping.

If there is such memory, this creates identity mapping only for RAM and
sets the dev->bus_dma_limit to let the generic code decide whether to
call into the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when no persistent memory
as dev->dma_ops_bypass is expected to be set.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-11-27 10:33:42 +01:00
Thomas Gleixner
47da42b27a powerpc/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201103095858.087635810@linutronix.de
2020-11-06 23:14:57 +01:00
Arnd Bergmann
0774a6ed29 timekeeping: default GENERIC_CLOCKEVENTS to enabled
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
require each one to select that symbol manually.

Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
a simplification. It should be possible to select both
GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
and decide at runtime between the two.

For the clockevents arch-support.txt file, this means that additional
architectures are marked as TODO when they have at least one machine
that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
at least one machine has been converted. This means that both m68k and
arm (for riscpc) revert to TODO.

At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
rather than leaving it off when not needed. I built an m68k
defconfig kernel (using gcc-10.1.0) and found that this would add
around 5.5KB in kernel image size:

   text	   data	    bss	    dec	    hex	filename
3861936	1092236	 196656	5150828	 4e986c	obj-m68k/vmlinux-no-clockevent
3866201	1093832	 196184	5156217	 4ead79	obj-m68k/vmlinux-clockevent

On Arm (MACH_RPC), that difference appears to be twice as large,
around 11KB on top of an 6MB vmlinux.

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-10-30 21:57:07 +01:00
Linus Torvalds
96685f8666 powerpc updates for 5.10
- A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for
    powerpc, as well as a related fix for sparc.
 
  - Remove support for PowerPC 601.
 
  - Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA
    v3.1 (Power10) watchpoint features.
 
  - A fix for kernels using 4K pages and the hash MMU on bare metal Power9
    systems with > 16TB of RAM, or RAM on the 2nd node.
 
  - A basic idle driver for shallow stop states on Power10.
 
  - Tweaks to our sched domains code to better inform the scheduler about the
    hardware topology on Power9/10, where two SMT4 cores can be presented by
    firmware as an SMT8 core.
 
  - A series doing further reworks & cleanups of our EEH code.
 
  - Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to
    prevent root from overwriting kernel memory.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen
   Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig,
   Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham
   R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley,
   Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo
   Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar,
   Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver
   O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai,
   Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott
   Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
   Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
   Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
   Yingliang, zhengbin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl+JBQoTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJJAD/0e3tsFP+9rFlxKSJlDcMW3w7kXDRXE
 tG40F1ubYFLU8wtFVR0De3njTRsz5HyaNU6SI8CwPq48mCa7OFn1D1OeHonHXDX9
 w6v3GE2S1uXXQnjm+czcfdjWQut0IwWBLx007/S23WcPff3Abc2irupKLNu+Gx29
 b/yxJHZSRJVX59jSV94HkdJS75mDHQ3oUOlFGXtuGcUZDufpD1ynRcQOjr0V/8JU
 F4WAblFSe7hiczHGqIvfhFVJ+OikEhnj2aEMAL8U7vxzrAZ7RErKCN9s/0Tf0Ktx
 FzNEFNLHZGqh+qNDpKKmM+RnaeO2Lcoc9qVn7vMHOsXPzx9F5LJwkI/DgPjtgAq/
 mFvGnQB/FapATnQeMluViC/qhEe5bQXLUfPP5i2+QOjK0QqwyFlUMgaVNfsY8jRW
 0Q/sNA72Opzst4WUTveCd4SOInlUuat09e5nLooCRLW7u7/jIiXNRSFNvpOiwkfF
 EcIPJsi6FUQ4SNbqpRSNEO9fK5JZrrUtmr0pg8I7fZhHYGcxEjqPR6IWCs3DTsak
 4/KhjhhTnP/IWJRw6qKAyNhEyEwpWqYZ97SIQbvSb1g/bS47AIdQdJRb0eEoRjhx
 sbbnnYFwPFkG4c1yQSIFanT9wNDQ2hFx/c/mRfbd7J+ordx9JsoqXjqrGuhsU/pH
 GttJLmkJ5FH+pQ==
 =akeX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
   it for powerpc, as well as a related fix for sparc.

 - Remove support for PowerPC 601.

 - Some fixes for watchpoints & addition of a new ptrace flag for
   detecting ISA v3.1 (Power10) watchpoint features.

 - A fix for kernels using 4K pages and the hash MMU on bare metal
   Power9 systems with > 16TB of RAM, or RAM on the 2nd node.

 - A basic idle driver for shallow stop states on Power10.

 - Tweaks to our sched domains code to better inform the scheduler about
   the hardware topology on Power9/10, where two SMT4 cores can be
   presented by firmware as an SMT8 core.

 - A series doing further reworks & cleanups of our EEH code.

 - Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
   to prevent root from overwriting kernel memory.

 - Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
Yingliang, zhengbin.

* tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
  Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
  selftests/powerpc: Fix eeh-basic.sh exit codes
  cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
  powerpc/time: Make get_tb() common to PPC32 and PPC64
  powerpc/time: Make get_tbl() common to PPC32 and PPC64
  powerpc/time: Remove get_tbu()
  powerpc/time: Avoid using get_tbl() and get_tbu() internally
  powerpc/time: Make mftb() common to PPC32 and PPC64
  powerpc/time: Rename mftbl() to mftb()
  powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
  powerpc/32s: Rename head_32.S to head_book3s_32.S
  powerpc/32s: Setup the early hash table at all time.
  powerpc/time: Remove ifdef in get_dec() and set_dec()
  powerpc: Remove get_tb_or_rtc()
  powerpc: Remove __USE_RTC()
  powerpc: Tidy up a bit after removal of PowerPC 601.
  powerpc: Remove support for PowerPC 601
  powerpc: Remove PowerPC 601
  powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
  powerpc: Remove CONFIG_PPC601_SYNC_FIX
  ...
2020-10-16 12:21:15 -07:00
Linus Torvalds
8b05418b25 seccomp updates for v5.10-rc1
- heavily refactor seccomp selftests (and clone3 selftests dependency) to
   fix powerpc (Kees Cook, Thadeu Lima de Souza Cascardo)
 - fix style issue in selftests (Zou Wei)
 - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich Felker)
 - replace task_pt_regs(current) with current_pt_regs() (Denis Efremov)
 - fix corner-case race in USER_NOTIF (Jann Horn)
 - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl+E1LAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgRfD/0cq7W51+o34719vefC+oZaMjJJ
 Bd5HYshmr6NRpMqn0OhtT9kVi6OeV0sK0VJeNxSISDIaGNJ8xCI9YhnXwzY+7myK
 +IQu3i2Hv7dlWvTaXWFLL+mvfk6WopLntFGGJQ8KPMnP2gcfH2AZmOeAKGFGhBDe
 NwpAUZ9zriXg9JCQp6u0FzPJgk8KfgfHjUY6Hsa095gg0aPSJhc8bWEUNBQwjCe6
 uIcxDP/zK2WWaEhO9BfHt6/VTcXw7QgTLS3yM+pwBCgR1JHs7HMhtgcwPT410qES
 LmYD8OiHmv5AZhDjcCcNipKEv3ZnxkLnpU/6hfaKM4zn/DoaR/zbfjO9U017rcNV
 9gf7k5siAP7DH48IFlqf4Erzd3xyF0OJDnVfC7NiPtggPfO9aWOHJJZCuJRQOdrN
 qPMjkaQzFb02qb501PLEn55F24OLDjz1vFOqpkJm2/XamOBVV4uiRKmfpNEo/MOf
 QkhSvzvwEFErWwzPH95uFyVhs42stwnM3ppnwtya2+U5kxXdNvbAR8N5leH7siaU
 ab+YJIHW59+BxXTlKgXIcqBP/6RqJWJtuT9OqGs0K2A7FhQSexh5MOm+9vvGgIwZ
 Qjyijku8dB3aV94BNGnlJq6BV+4Hc6EGadh7h3b8GiRAUTYo0pk5G/iKL6Ii+R6p
 0msJENqalKFtNCr70w==
 =a4u2
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "The bulk of the changes are with the seccomp selftests to accommodate
  some powerpc-specific behavioral characteristics. Additional cleanups,
  fixes, and improvements are also included:

   - heavily refactor seccomp selftests (and clone3 selftests
     dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
     Cascardo)

   - fix style issue in selftests (Zou Wei)

   - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
     Felker)

   - replace task_pt_regs(current) with current_pt_regs() (Denis
     Efremov)

   - fix corner-case race in USER_NOTIF (Jann Horn)

   - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

* tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
  seccomp: Make duplicate listener detection non-racy
  seccomp: Move config option SECCOMP to arch/Kconfig
  selftests/clone3: Avoid OS-defined clone_args
  selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
  selftests/seccomp: Allow syscall nr and ret value to be set separately
  selftests/seccomp: Record syscall during ptrace entry
  selftests/seccomp: powerpc: Fix seccomp return value testing
  selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
  selftests/seccomp: Avoid redundant register flushes
  selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Remove syscall setting #ifdefs
  selftests/seccomp: mips: Remove O32-specific macro
  selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
  selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
  selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
  selftests/seccomp: Provide generic syscall setting macro
  selftests/seccomp: Refactor arch register macros to avoid xtensa special case
  selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
  selftests/seccomp: Use bitwise instead of arithmetic operator for flags
  ...
2020-10-13 16:33:43 -07:00
Linus Torvalds
cc7343724e Surgery of the MSI interrupt handling to prepare the support of upcoming
devices which require non-PCI based MSI handling.
 
   - Cleanup historical leftovers all over the place
 
   - Rework the code to utilize more core functionality
 
   - Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
     assignment to PCI devices possible.
 
   - Assign irqdomains to PCI devices at initialization time which allows
     to utilize the full functionality of hierarchical irqdomains.
 
   - Remove arch_.*_msi_irq() functions from X86 and utilize the irqdomain
     which is assigned to the device for interrupt management.
 
   - Make the arch_.*_msi_irq() support conditional on a config switch and
     let the last few users select it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+EUxcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoagLEACGp5U7a4mk24GsOZJDhrua1PHR/fhb
 enn/5yOPpxDXdYmtFHIjV5qrNjDTV/WqDlI96KOi+oinG1Eoj0O/MA1AcSRhp6nf
 jVdAuK1X0DHDUTEeTAP0JFwqd2j0KlIOphBrIMgeWIf1CRKlYiJaO+ioF9fKgwZ/
 /HigOTSykGYMPggm3JXnWTWtJkKSGFxeADBvVHt5RpVmbWtrI4YoSBxKEMtvjyeM
 5+GsqbCad1CnFYTN74N+QWVGmgGnUWGEzWsPYnJ9hW+yyjad1kWx3n6NcCWhssaC
 E4vAXl6JuCPntL7jBFkbfUkQsgq12ThMZYWpCq8pShJA9O2tDKkxIGasHWrIt4cz
 nYrESiv6hM7edjtOvBc086Gd0A2EyGOM879goHyaNVaTO4rI6jfZG7PlW1HHWibS
 mf/bdTXBtULGNgEt7T8Qnb8sZ+D01WqzLrq/wm645jIrTzvNHUEpOhT1aH/g4TFQ
 cNHD5PcM9OTmiBir9srNd47+1s2mpfwdMYHKBt2QgiXMO8fRgdtr6WLQE4vJjmG8
 sA0yGGsgdTKeg2wW1ERF1pWL0Lt05Iaa42Skm0D3BwcOG2n5ltkBHzVllto9cTUh
 kIldAOgxGE6QeCnnlrnbHz5mvzt/3Ih/PIKqPSUAC94Kx1yvVHRYuOvDExeO8DFB
 P+f0TkrscZObSg==
 =JlqV
 -----END PGP SIGNATURE-----

Merge tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq updates from Thomas Gleixner:
 "Surgery of the MSI interrupt handling to prepare the support of
  upcoming devices which require non-PCI based MSI handling:

   - Cleanup historical leftovers all over the place

   - Rework the code to utilize more core functionality

   - Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
     assignment to PCI devices possible.

   - Assign irqdomains to PCI devices at initialization time which
     allows to utilize the full functionality of hierarchical
     irqdomains.

   - Remove arch_.*_msi_irq() functions from X86 and utilize the
     irqdomain which is assigned to the device for interrupt management.

   - Make the arch_.*_msi_irq() support conditional on a config switch
     and let the last few users select it"

* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
  x86/apic/msi: Unbreak DMAR and HPET MSI
  iommu/amd: Remove domain search for PCI/MSI
  iommu/vt-d: Remove domain search for PCI/MSI[X]
  x86/irq: Make most MSI ops XEN private
  x86/irq: Cleanup the arch_*_msi_irqs() leftovers
  PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
  x86/pci: Set default irq domain in pcibios_add_device()
  iommm/amd: Store irq domain in struct device
  iommm/vt-d: Store irq domain in struct device
  x86/xen: Wrap XEN MSI management into irqdomain
  irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
  x86/xen: Consolidate XEN-MSI init
  x86/xen: Rework MSI teardown
  x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
  PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
  PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
  irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
  x86/irq: Initialize PCI/MSI domain at PCI init time
  x86/pci: Reducde #ifdeffery in PCI init code
  ...
2020-10-12 11:40:41 -07:00
Linus Torvalds
ca1b66922a * Extend the recovery from MCE in kernel space also to processes which
encounter an MCE in kernel space but while copying from user memory by
 sending them a SIGBUS on return to user space and umapping the faulty
 memory, by Tony Luck and Youquan Song.
 
 * memcpy_mcsafe() rework by splitting the functionality into
 copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
 support for new hardware which can recover from a machine check
 encountered during a fast string copy and makes that the default and
 lets the older hardware which does not support that advance recovery,
 opt in to use the old, fragile, slow variant, by Dan Williams.
 
 * New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.
 
 * Do not use MSR-tracing accessors in #MC context and flag any fault
 while accessing MCA architectural MSRs as an architectural violation
 with the hope that such hw/fw misdesigns are caught early during the hw
 eval phase and they don't make it into production.
 
 * Misc fixes, improvements and cleanups, as always.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EIpUACgkQEsHwGGHe
 VUouoBAAgwb+NkWZtIqGImV4f+LOyFjhTR/r/7ZyiijXdbhOIuAdc/jQM31mQxug
 sX2jxaRYnf1n6SLA0ggX99gwr2deRQ/hsNf5Abw55GC+Z1dOxpGL0k59A3ELl1IR
 H9KYmCAFQIHvzfk38qcdND73XHcgthQoXFBOG9wAPAdgDWnaiWt6lcLAq8OiJTmp
 D8pInAYhcnL8YXwMGyQQ1KkFn9HwydoWDsK5Ff2shaw2/+dMQqd1zetenbVtjhLb
 iNYGvV7Bi/RQ8PyMbzmtTWa4kwQJAHC2gptkGxty//2ADGVBbqUQdqF9TjIWCNy5
 V6Ldv5zo0/1s7DOzji3htzqkSs/K1Ea6d2LtZjejkJipHKV5x068UC6Fu+PlfS2D
 VZfcICeapU4G2F3Zvks2DlZ7dVTbHCvoI78Qi7bBgczPUVmk6iqah4xuQaiHyBJc
 kTFDA4Nnf/026GpoWRiFry9vqdnHBZyLet5A6Y+SoWF0FbhYnCVPpq4MnussYoav
 lUIi9ZZav6X2RZp9DDM1f9d5xubtKq0DKt93wvzqAhjK0T2DikckJ+riOYkI6N8t
 fHCBNUkdfgyMzJUTBPAzYQ7RmjbjKWJi7xWP0oz6+GqOJkQfSTVC5/2yEffbb3ya
 whYRS6iklbl7yshzaOeecXsZcAeK2oGPfoHg34WkHFgXdF5mNgA=
 =u1Wg
 -----END PGP SIGNATURE-----

Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Extend the recovery from MCE in kernel space also to processes which
   encounter an MCE in kernel space but while copying from user memory
   by sending them a SIGBUS on return to user space and umapping the
   faulty memory, by Tony Luck and Youquan Song.

 - memcpy_mcsafe() rework by splitting the functionality into
   copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
   support for new hardware which can recover from a machine check
   encountered during a fast string copy and makes that the default and
   lets the older hardware which does not support that advance recovery,
   opt in to use the old, fragile, slow variant, by Dan Williams.

 - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

 - Do not use MSR-tracing accessors in #MC context and flag any fault
   while accessing MCA architectural MSRs as an architectural violation
   with the hope that such hw/fw misdesigns are caught early during the
   hw eval phase and they don't make it into production.

 - Misc fixes, improvements and cleanups, as always.

* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
  x86/mce: Decode a kernel instruction to determine if it is copying from user
  x86/mce: Recover from poison found while copying from user space
  x86/mce: Avoid tail copy when machine check terminated a copy from user
  x86/mce: Add _ASM_EXTABLE_CPY for copy user access
  x86/mce: Provide method to find out the type of an exception handler
  x86/mce: Pass pointer to saved pt_regs to severity calculation routines
  x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
  x86/mce: Add Skylake quirk for patrol scrub reported errors
  RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
  x86/mce: Annotate mce_rd/wrmsrl() with noinstr
  x86/mce/dev-mcelog: Do not update kflags on AMD systems
  x86/mce: Stop mce_reign() from re-computing severity for every CPU
  x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  x86/mce: Increase maximum number of banks to 64
  x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
  x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
  RAS/CEC: Fix cec_init() prototype
2020-10-12 10:14:38 -07:00
YiFei Zhu
282a181b1a seccomp: Move config option SECCOMP to arch/Kconfig
In order to make adding configurable features into seccomp easier,
it's better to have the options at one single location, considering
especially that the bulk of seccomp code is arch-independent. An quick
look also show that many SECCOMP descriptions are outdated; they talk
about /proc rather than prctl.

As a result of moving the config option and keeping it default on,
architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
on by default prior to this and SECCOMP will be default in this change.

Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
outdated depend on PROC_FS and this dependency is removed in this change.

Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
[kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
2020-10-08 13:17:47 -07:00
Andrew Donnellan
bd59380c5b powerpc/rtas: Restrict RTAS requests from userspace
A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.

The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).

Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.

In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point, however we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.

The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.

Cc: stable@vger.kernel.org
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200820044512.7543-1-ajd@linux.ibm.com
2020-10-06 23:22:27 +11:00
Daniel Axtens
5c5e46dad9 powerpc: PPC_SECURE_BOOT should not require PowerNV
In commit 61f879d97c ("powerpc/pseries: Detect secure and trusted
boot state of the system.") we taught the kernel how to understand the
secure-boot parameters used by a pseries guest.

However, CONFIG_PPC_SECURE_BOOT still requires PowerNV. I didn't
catch this because pseries_le_defconfig includes support for
PowerNV and so everything still worked. Indeed, most configs will.
Nonetheless, technically PPC_SECURE_BOOT doesn't require PowerNV
any more.

The secure variables support (PPC_SECVAR_SYSFS) doesn't do anything
on pSeries yet, but I don't think it's worth adding a new condition -
at some stage we'll want to add a backend for pSeries anyway.

Fixes: 61f879d97c ("powerpc/pseries: Detect secure and trusted boot state of the system.")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200924014922.172914-1-dja@axtens.net
2020-10-06 23:22:24 +11:00
Dan Williams
ec6347bb43 x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

  On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
  >
  > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
  > >
  > > However now I see that copy_user_generic() works for the wrong reason.
  > > It works because the exception on the source address due to poison
  > > looks no different than a write fault on the user address to the
  > > caller, it's still just a short copy. So it makes copy_to_user() work
  > > for the wrong reason relative to the name.
  >
  > Right.
  >
  > And it won't work that way on other architectures. On x86, we have a
  > generic function that can take faults on either side, and we use it
  > for both cases (and for the "in_user" case too), but that's an
  > artifact of the architecture oddity.
  >
  > In fact, it's probably wrong even on x86 - because it can hide bugs -
  > but writing those things is painful enough that everybody prefers
  > having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

 [ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-06 11:18:04 +02:00
Thomas Gleixner
981aa1d366 PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
The unconditional selection of PCI_MSI_ARCH_FALLBACKS has an unmet
dependency because PCI_MSI_ARCH_FALLBACKS is defined in a 'if PCI' clause.

As it is only relevant when PCI_MSI is enabled, update the affected
architecture Kconfigs to make the selection of PCI_MSI_ARCH_FALLBACKS
depend on 'if PCI_MSI'.

Fixes: 077ee78e39 ("PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Links: https://lore.kernel.org/r/cdfd63305caa57785b0925dd24c0711ea02c8527.camel@redhat.com
2020-09-28 16:03:10 +02:00
Michael Ellerman
0b30191b27 Merge branch 'topic/irqs-off-activate-mm' into next
Merge Nick's series to add ARCH_WANT_IRQS_OFF_ACTIVATE_MM.
2020-09-18 18:14:06 +10:00
Thomas Gleixner
077ee78e39 PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
requires them or not. Architectures which are fully utilizing hierarchical
irq domains should never call into that code.

It's not only architectures which depend on that by implementing one or
more of the weak functions, there is also a bunch of drivers which relies
on the weak functions which invoke msi_controller::setup_irq[s] and
msi_controller::teardown_irq.

Make the architectures and drivers which rely on them select them in Kconfig
and if not selected replace them by stub functions which emit a warning and
fail the PCI/MSI interrupt allocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112333.992429909@linutronix.de
2020-09-16 16:52:37 +02:00
Nicholas Piggin
66acd46080 powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
powerpc uses IPIs in some situations to switch a kernel thread away
from a lazy tlb mm, which is subject to the TLB flushing race
described in the changelog introducing ARCH_WANT_IRQS_OFF_ACTIVATE_MM.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200914045219.3736466-3-npiggin@gmail.com
2020-09-16 12:24:37 +10:00
Aneesh Kumar K.V
eb553f1697 powerpc/64/mm: implement page mapping percpu first chunk allocator
Implement page mapping percpu first chunk allocator as a fallback to
the embedding allocator. With 4K hash translation we limit our page
table range to 64TB and commit: 0034d395f8 ("powerpc/mm/hash64: Map all the
kernel regions in the same 0xc range") moved all kernel mapping to
that 64TB range. In-order to support sparse memory layout we need
to increase our linear mapping space and reduce other mappings.

With such a layout percpu embedded first chunk allocator will fail
because of small vmalloc range. Add a fallback to page mapping
percpu first chunk allocator for such failures.

The below dmesg output can be observed in such case.

 percpu: max_distance=0x1ffffef00000 too large for vmalloc space 0x10000000000
 PERCPU: auto allocator failed (-22), falling back to page size
 percpu: 40 4K pages/cpu s148816 r0 d15024

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200608070904.387440-2-aneesh.kumar@linux.ibm.com
2020-09-15 22:13:22 +10:00
Aneesh Kumar K.V
675bceb097 powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc
The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
    pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
    lr: c0000000005eeeec: pte_update+0x11c/0x190
    sp: c000000c6d1e7950
   msr: 8000000002029033
  current = 0xc000000c6d172c80
  paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
    pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c0000000005eeeec pte_update+0x11c/0x190
[c000000c6d1e7950] 0000000000000001 (unreliable)
[c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
[c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
[c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
[c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
[c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
[c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
[   20.530183] Faulting instruction address: 0xc0000000000df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
    pc: c0000000000df330: memset+0x68/0x104
    lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
    sp: c000000c6d19f990
   msr: 8000000002009033
   dar: 0
  current = 0xc000000c6d177480
  paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
    pid   = 1, comm = swapper/0
[link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
[c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
[c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
[c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
[c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
[c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
[c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902040122.136414-1-aneesh.kumar@linux.ibm.com
2020-09-02 22:14:26 +10:00
Shawn Anastasio
9b725a90a8 powerpc/64s: Disallow PROT_SAO in LPARs by default
Since migration of guests using SAO to ISA 3.1 hosts may cause issues,
disable PROT_SAO in LPARs by default and introduce a new Kconfig option
PPC_PROT_SAO_LPAR to allow users to enable it if desired.

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200821185558.35561-3-shawn@anastas.io
2020-08-24 14:12:54 +10:00
Linus Torvalds
25d8d4eeca powerpc updates for 5.9
- Add support for (optionally) using queued spinlocks & rwlocks.
 
  - Support for a new faster system call ABI using the scv instruction on Power9
    or later.
 
  - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on
    Power10 and future processors, leaving us with no way to implement the
    functionality it requests. This risks breaking userspace, though we believe
    it is unused in practice.
 
  - A bug fix for, and then the removal of, our custom stack expansion checking.
    We now allow stack expansion up to the rlimit, like other architectures.
 
  - Remove the remnants of our (previously disabled) topology update code, which
    tried to react to NUMA layout changes on virtualised systems, but was prone
    to crashes and other problems.
 
  - Add PMU support for Power10 CPUs.
 
  - A change to our signal trampoline so that we don't unbalance the link stack
    (branch return predictor) in the signal delivery path.
 
  - Lots of other cleanups, refactorings, smaller features and so on as usual.
 
 Thanks to:
   Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy,
   Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton
   Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill
   Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy,
   Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A.
   Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar,
   Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini,
   Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe,
   Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li
   RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal
   Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan
   Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver
   O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe
   Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
   Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
   Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju,
   Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung
   Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong,
   YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8tOxATHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDQfEAClXHWf6hnxB84bEu39D51NkVotL1IG
 BRWFvyix+xHuUkHIouBPAAMl6ngY5X6wkYd+Z+CY9zHNtdSDoVlJE30YXdMQA/dE
 L/rYxR1884yGR/uU/3wusboO68ReXwcKQPmKOymUfh0zH7ujyJsSWLpXFK1YDC5d
 2TVVTi0Q+P5ucMHDh0L+AHirIxZvtZSp43+J7xLtywsj+XAxJWCTGo5WCJbdgbCA
 Qbv3aOkVyUa3EgsbdM/STPpv82ebqT+PHxeSIO4Jw6ZODtKRH0R5YsWCApuY9eZ+
 ebY9RLmgv9ZAhJqB2fv9A5NDcMoGpZNmjM7HrWpXwULKQpkBGHCzJ9FcSdHVMOx8
 nbVMFjt4uzLwV1w8lFYslQ2tNH/uH2o9BlryV1RLpiiKokDAJO/NOsWN9y0u/I4J
 EmAM5DSX2LgVvvas96IlGK8KX4xkOkf8FLX/H5UDvvAfloH8J4CZXk/CWCab/nqY
 KEHPnMmYvQZ1w9SzyZg9sO/1p6Bl1Gmm75Jv2F1lBiRW/42VcGBI/qLsJ4lC59Fc
 KbwufYNYYG38wbxDLW1HAPJhRonxIcaZj3EEqk7aTiLZ55nNbu8e2k32CpNXTGqt
 npOhzJHimcq7L6+878ZW+xpbZwogIEUdRSsmwb6aT8za3ShnYwSA2Q3LYxh9xyGH
 j3GifvPq6Efp3Q==
 =QMY1
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add support for (optionally) using queued spinlocks & rwlocks.

 - Support for a new faster system call ABI using the scv instruction on
   Power9 or later.

 - Drop support for the PROT_SAO mmap/mprotect flag as it will be
   unsupported on Power10 and future processors, leaving us with no way
   to implement the functionality it requests. This risks breaking
   userspace, though we believe it is unused in practice.

 - A bug fix for, and then the removal of, our custom stack expansion
   checking. We now allow stack expansion up to the rlimit, like other
   architectures.

 - Remove the remnants of our (previously disabled) topology update
   code, which tried to react to NUMA layout changes on virtualised
   systems, but was prone to crashes and other problems.

 - Add PMU support for Power10 CPUs.

 - A change to our signal trampoline so that we don't unbalance the link
   stack (branch return predictor) in the signal delivery path.

 - Lots of other cleanups, refactorings, smaller features and so on as
   usual.

Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.

* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
  selftests/powerpc: Fix pkey syscall redefinitions
  powerpc: Fix circular dependency between percpu.h and mmu.h
  powerpc/powernv/sriov: Fix use of uninitialised variable
  selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
  powerpc/40x: Fix assembler warning about r0
  powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
  powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
  cpuidle: pseries: Fixup exit latency for CEDE(0)
  cpuidle: pseries: Add function to parse extended CEDE records
  cpuidle: pseries: Set the latency-hint before entering CEDE
  selftests/powerpc: Fix online CPU selection
  powerpc/perf: Consolidate perf_callchain_user_[64|32]()
  powerpc/pseries/hotplug-cpu: Remove double free in error path
  powerpc/pseries/mobility: Add pr_debug() for device tree changes
  powerpc/pseries/mobility: Set pr_fmt()
  powerpc/cacheinfo: Warn if cache object chain becomes unordered
  powerpc/cacheinfo: Improve diagnostics about malformed cache lists
  powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
  powerpc/cacheinfo: Set pr_fmt()
  powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
  ...
2020-08-07 10:33:50 -07:00
Linus Torvalds
2ed90dbbf7 dma-mapping updates for 5.9
- make support for dma_ops optional
  - move more code out of line
  - add generic support for a dma_ops bypass mode
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8oGscLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNfEhAAmFwd6BBHGwAhXUchoIue5vdNnuY3GiBFRzUdz67W
 zRYYgZYiPjl+MwflRmwPcoWEnGzmweRa2s6OnyDostiCRauioa8BuQfGqJasf1yZ
 D36dFNVHGW0o6pRDUQkd688k/4A6szwuwpq83qi4e8X2I9QzAITHtW8izjfPM923
 FlJzxEFggbB2TvwfUXOZhmpuG4Dog8S7VZ1Uz4QAg0Z/5FDqIKAAG2aZMqCXBbiX
 01E8tr0AqU/jn2xpc8O+DJGFiYIRhqhyNxQbH6qz1Q3xGFSokcLYm3YqkqVOgpn1
 DLs2UFDxWkly/F+wGnYtju7OD9VGPywzOcW125/LIsApYN5R/rYrtQzK41eq7Mp5
 HY3tqgNTIMdnl4so7QXeU4Vxj+lUdPlI26NZGszcM5AVftdTX8KjGdS+0+PBza6i
 i7trwG7J5/DnwiBCvEKoul7Ul1psUMTSvYwINTXRqsU4mZXhhx/mwyXbtruELnkj
 3agM98u6hoalLNjd2aueh+NjMZi1r+MchTrfRvTcxJ+yQ5BoR5kF+iz7eT/LtZ72
 AqWwimsPGNkLHUa0TrqWql5tv90cdDkBZzWXVbixwxRfgynWYLE6jugeIy8hwjFf
 GjO5XKbBwnWPjdSzFsVMPeuNpmr7ZjVHHewy2Q/jWQAIOyeof0VztEl23LN5yUkx
 pc8=
 =90UK
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - make support for dma_ops optional

 - move more code out of line

 - add generic support for a dma_ops bypass mode

 - misc cleanups

* tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping:
  dma-contiguous: cleanup dma_alloc_contiguous
  dma-debug: use named initializers for dir2name
  powerpc: use the generic dma_ops_bypass mode
  dma-mapping: add a dma_ops_bypass flag to struct device
  dma-mapping: make support for dma ops optional
  dma-mapping: inline the fast path dma-direct calls
  dma-mapping: move the remaining DMA API calls out of line
2020-08-04 17:29:57 -07:00
Christophe Leroy
6ca055322d powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX
When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc
segments. But modules require exec.

Use a dedicated segment for modules. There is not much space
above kernel, and we don't waste vmalloc space to do alignment.
Therefore, we take the segment before PAGE_OFFSET for modules.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eb8faba9148b6cf17c696ba776b4e8ee2f6313bf.1593428200.git.christophe.leroy@csgroup.eu
2020-07-27 00:01:30 +10:00
Nicholas Piggin
aa65ff6b18 powerpc/64s: Implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.

With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.

Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).

On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:

https://github.com/linuxppc/issues/issues/305#issuecomment-663487453

Observations are:

- Queued spinlocks are equal when contention is insignificant, as
  expected and as measured with microbenchmarks.

- When there is contention, on bare metal queued spinlocks have better
  throughput and max latency at all points.

- When virtualised, queued spinlocks are slightly worse approaching
  peak throughput, but significantly better throughput and max latency
  at all points beyond peak, until queued spinlock maximum latency
  rises when clients are 2x vCPUs.

The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
2020-07-27 00:01:23 +10:00
Nicholas Piggin
2384b36f91 powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORE
powerpc return from interrupt and return from system call sequences
are context synchronising.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200716013522.338318-1-npiggin@gmail.com
2020-07-23 17:43:23 +10:00
Nicholas Piggin
63396ada80 powerpc/64s/hash: Disable subpage_prot syscall by default
The subpage_prot syscall was added for specialised system software
(Lx86) that has been discontinued for about 7 years, and is not thought
to be used elsewhere, so disable it by default.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703011958.1166620-4-npiggin@gmail.com
2020-07-22 00:01:25 +10:00
Christoph Hellwig
f1565c24b5 powerpc: use the generic dma_ops_bypass mode
Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04b ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19 09:29:29 +02:00
Christoph Hellwig
2f9237d4f6 dma-mapping: make support for dma ops optional
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19 09:29:23 +02:00
Christian Brauner
140c8180eb
arch: remove HAVE_COPY_THREAD_TLS
All architectures support copy_thread_tls() now, so remove the legacy
copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
uses the same process creation calling convention based on
copy_thread_tls() and struct kernel_clone_args. This will make it easier to
maintain the core process creation code under kernel/, simplifies the
callpaths and makes the identical for all architectures.

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-04 23:41:37 +02:00
Chris Packham
f134a7cef1 powerpc: Remove inaccessible CMDLINE default
Since commit cbe46bd4f5 ("powerpc: remove CONFIG_CMDLINE #ifdef mess")
CONFIG_CMDLINE has always had a value regardless of CONFIG_CMDLINE_BOOL.

For example:

 $ make ARCH=powerpc defconfig
 $ cat .config
 # CONFIG_CMDLINE_BOOL is not set
 CONFIG_CMDLINE=""

When enabling CONFIG_CMDLINE_BOOL this value is kept making the 'default
"..." if CONFIG_CMDLINE_BOOL' ineffective.

 $ ./scripts/config --enable CONFIG_CMDLINE_BOOL
 $ cat .config
 CONFIG_CMDLINE_BOOL=y
 CONFIG_CMDLINE=""

Remove CONFIG_CMDLINE_BOOL and the inaccessible default.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611224220.25066-2-chris.packham@alliedtelesis.co.nz
2020-06-22 10:37:56 +10:00
Linus Torvalds
7ae77150d9 powerpc updates for 5.8
- Support for userspace to send requests directly to the on-chip GZIP
    accelerator on Power9.
 
  - Rework of our lockless page table walking (__find_linux_pte()) to make it
    safe against parallel page table manipulations without relying on an IPI for
    serialisation.
 
  - A series of fixes & enhancements to make our machine check handling more
    robust.
 
  - Lots of plumbing to add support for "prefixed" (64-bit) instructions on
    Power10.
 
  - Support for using huge pages for the linear mapping on 8xx (32-bit).
 
  - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
 
  - Removal of some obsolete 40x platforms and associated cruft.
 
  - Initial support for booting on Power10.
 
  - Lots of other small features, cleanups & fixes.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
   Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
   Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
   Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
   George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
   Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
   Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
   Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
   Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
   Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
   Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
   Sang, Xiongfeng Wang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
 GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
 v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
 kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
 yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
 2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
 CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
 rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
 FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
 65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
 rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
 G3PXIBOI0jRgRw==
 =o0WU
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Support for userspace to send requests directly to the on-chip GZIP
   accelerator on Power9.

 - Rework of our lockless page table walking (__find_linux_pte()) to
   make it safe against parallel page table manipulations without
   relying on an IPI for serialisation.

 - A series of fixes & enhancements to make our machine check handling
   more robust.

 - Lots of plumbing to add support for "prefixed" (64-bit) instructions
   on Power10.

 - Support for using huge pages for the linear mapping on 8xx (32-bit).

 - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
   driver.

 - Removal of some obsolete 40x platforms and associated cruft.

 - Initial support for booting on Power10.

 - Lots of other small features, cleanups & fixes.

Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.

* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
  powerpc/pseries: Make vio and ibmebus initcalls pseries specific
  cxl: Remove dead Kconfig options
  powerpc: Add POWER10 architected mode
  powerpc/dt_cpu_ftrs: Add MMA feature
  powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
  powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
  powerpc: Add support for ISA v3.1
  powerpc: Add new HWCAP bits
  powerpc/64s: Don't set FSCR bits in INIT_THREAD
  powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
  powerpc/64s: Don't let DT CPU features set FSCR_DSCR
  powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
  powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
  powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
  powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
  powerpc/module_64: Consolidate ftrace code
  powerpc/32: Disable KASAN with pages bigger than 16k
  powerpc/uaccess: Don't set KUEP by default on book3s/32
  powerpc/uaccess: Don't set KUAP by default on book3s/32
  powerpc/8xx: Reduce time spent in allow_user_access() and friends
  ...
2020-06-05 12:39:30 -07:00
Anshuman Khandual
399145f9eb mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments.  The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol.  This test gets called
via late_initcall().

This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE.  For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot.  Any non conformity here
will be reported as an warning which would need to be fixed.  This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.

[anshuman.khandual@arm.com: v17]
  Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
  Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>	[ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:21 -07:00
Mike Rapoport
acd3f5c441 mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES
The memmap_init() function was made to iterate over memblock regions and
as the result the early_pfn_in_nid() function became obsolete.  Since
CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
implementation of early_pfn_in_nid(), it is also not needed anymore.

Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.

Co-developed-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:44 -07:00
Mike Rapoport
3f08a302f5 mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option
CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.

Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.

The remaining few architectures that use DISCONTIGMEM without NUMA are
easily updated to use memblock_add_node() instead of memblock_add() and
thus have proper correspondence of memblock regions to NUMA nodes.

Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
different.  Once all the architectures will use the new semantics, the
entire compatibility layer can be dropped.

To avoid addition of extra run time memory to store node id for
architectures that keep memblock but have only a single node, the node id
field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
the corresponding accessors presume that in those cases it is always 0.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:43 -07:00
Christophe Leroy
888468ce72 powerpc/32: Disable KASAN with pages bigger than 16k
Mapping of early shadow area is implemented by using a single static
page table having all entries pointing to the same early shadow page.
The shadow area must therefore occupy full PGD entries.

The shadow area has a size of 128MB starting at 0xf8000000.
With 4k pages, a PGD entry is 4MB
With 16k pages, a PGD entry is 64MB
With 64k pages, a PGD entry is 1GB which is too big.

Until we rework the early shadow mapping, disable KASAN when the page
size is too big.

Fixes: 2edb16efc8 ("powerpc/32: Add KASAN support")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7195fcde7314ccbf7a081b356084a69d421b10d4.1590660977.git.christophe.leroy@csgroup.eu
2020-06-02 20:59:14 +10:00
David S. Miller
f9e0ce3ddc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-05-29

The following pull-request contains BPF updates for your *net* tree.

We've added 6 non-merge commits during the last 7 day(s) which contain
a total of 4 files changed, 55 insertions(+), 34 deletions(-).

The main changes are:

1) minor verifier fix for fmod_ret progs, from Alexei.

2) af_xdp overflow check, from Bjorn.

3) minor verifier fix for 32bit assignment, from John.

4) powerpc has non-overlapping addr space, from Petr.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29 15:59:08 -07:00
Petr Mladek
d195b1d1d1 powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc again
The commit 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}() only
to archs where they work") caused that bpf_probe_read{, str}() functions
were not longer available on architectures where the same logical address
might have different content in kernel and user memory mapping. These
architectures should use probe_read_{user,kernel}_str helpers.

For backward compatibility, the problematic functions are still available
on architectures where the user and kernel address spaces are not
overlapping. This is defined CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

At the moment, these backward compatible functions are enabled only on x86_64,
arm, and arm64. Let's do it also on powerpc that has the non overlapping
address space as well.

Fixes: 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/20200527122844.19524-1-pmladek@suse.com
2020-05-28 17:12:18 +02:00
Michael Ellerman
baddc87d68 Merge branch 'fixes' into next
Merge our fixes branch from this cycle. It contains several important
fixes we need in next for testing purposes, and also some that will
conflict with upcoming changes.
2020-05-26 22:56:03 +10:00
Christophe Leroy
2b279c0348 powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC
DEBUG_PAGEALLOC only manages RW data.

Text and RO data can still be mapped with BATs.

In order to map with BATs, also enforce data alignment. Set
by default to 256M which is a good compromise for keeping
enough BATs for also KASAN and IMMR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fd29c1718ee44d82115d0e835ced808eb4ccbf51.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26 22:22:23 +10:00
Christophe Leroy
fcdafd10a3 powerpc/8xx: Allow large TLBs with DEBUG_PAGEALLOC
DEBUG_PAGEALLOC only manages RW data.

Text and RO data can still be mapped with hugepages and pinned TLB.

In order to map with hugepages, also enforce a 512kB data alignment
minimum. That's a trade-off between size of speed, taking into
account that DEBUG_PAGEALLOC is a debug option. Anyway the alignment
is still tunable.

We also allow tuning of alignment for book3s to limit the complexity
of the test in Kconfig that will anyway disappear in the following
patches once DEBUG_PAGEALLOC is handled together with BATs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c13256f2d356a316715da61fe089b3623ef217a5.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26 22:22:23 +10:00
Christophe Leroy
da1adea075 powerpc/8xx: Allow STRICT_KERNEL_RwX with pinned TLB
Pinned TLB are 8M. Now that there is no strict boundary anymore
between text and RO data, it is possible to use 8M pinned executable
TLB that covers both text and RO data.

When PIN_TLB_DATA or PIN_TLB_TEXT is selected, enforce 8M RW data
alignment and allow STRICT_KERNEL_RWX.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c535fc97bf0dd8693192e25feeed8088701e00c6.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26 22:22:23 +10:00
Christophe Leroy
a0591b60ee powerpc/mm: Don't be too strict with _etext alignment on PPC32
Similar to PPC64, accept to map RO data as ROX as a trade off between
between security and memory usage.

Having RO data executable is not a high risk as RO data can't be
modified to forge an exploit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8c4a0d89d944eed984dd941e509614031a5ace2b.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26 22:22:22 +10:00
Christophe Leroy
5d4656696c powerpc/8xx: Move PPC_PIN_TLB options into 8xx Kconfig
PPC_PIN_TLB options are dedicated to the 8xx, move them into
the 8xx Kconfig.

While we are at it, add some text to explain what it does.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1ece39fac6312e1d14e6a67b3f9d9f9f91990a7b.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26 22:22:21 +10:00
Michael Ellerman
8659a0e0ef powerpc/64s: Disable STRICT_KERNEL_RWX
Several strange crashes have been eventually traced back to
STRICT_KERNEL_RWX and its interaction with code patching.

Various paths in our ftrace, kprobes and other patching code need to
be hardened against patching failures, otherwise we can end up running
with partially/incorrectly patched ftrace paths, kprobes or jump
labels, which can then cause strange crashes.

Although fixes for those are in development, they're not -rc material.

There also seem to be problems with the underlying strict RWX logic,
which needs further debugging.

So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from
enabling the option and tripping over the bugs.

Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520133605.972649-1-mpe@ellerman.id.au
2020-05-22 00:04:51 +10:00
Christophe Leroy
edbadaf067 powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
When CONFIG_KASAN is selected, the stack usage is increased.

In the same way as x86 and arm64 architectures, increase
THREAD_SHIFT when CONFIG_KASAN is selected.

Fixes: 2edb16efc8 ("powerpc/32: Add KASAN support")
Reported-by: <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207129
Link: https://lore.kernel.org/r/2c50f3b1c9bbaa4217c9a98f3044bd2a36c46a4f.1586361277.git.christophe.leroy@c-s.fr
2020-05-11 23:15:16 +10:00
Naveen N. Rao
57b3ed941b powerpc/64: Have MPROFILE_KERNEL depend on FUNCTION_TRACER
Currently, it is possible to have CONFIG_FUNCTION_TRACER disabled, but
CONFIG_MPROFILE_KERNEL enabled. Though all existing users of
MPROFILE_KERNEL are doing the right thing, it is weird to have
MPROFILE_KERNEL enabled when the function tracer isn't. Fix this by
making MPROFILE_KERNEL depend on FUNCTION_TRACER.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200422092612.514301-1-naveen.n.rao@linux.vnet.ibm.com
2020-04-30 12:56:28 +10:00
Linus Torvalds
e4da01d833 powerpc updates for 5.7 #2
- A fix for a crash in machine check handling on pseries (ie. guests)
 
  - A small series to make it possible to disable CONFIG_COMPAT, and turn it off
    by default for ppc64le where it's not used.
 
  - A few other miscellaneous fixes and small improvements.
 
 Thanks to:
   Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan
   Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar,
   Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6O8LgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMcYEACbGf+Z9brLSasYajoqU6QdqGPacHEN
 1a9TEmUnN+HWgtfkkoEBFbyuYHnhyhYuf7hvNccDjDA91ESVylO+Wq7Q+v/xoz29
 LVyb3V6uuVMLHnoqwP5jpr0lS0aOpuu3Nc2SpfBuolDtJqeMxpVEEK3Ln3uATlq6
 SnEAxQEKb2x4Y4Tfuq5A3txupj0s/UYrmeR6GAdkN3Oapbb9uQl8Ql2smqrZo0cq
 6TLxtoFzbfXkV6NmY2V0OKMTeXt0fhrvdDhFEBckCUpRZLv4Fd7CwPWNER2bUVs6
 04kg87BAO8qRyfr3G93oP0mWgi65kpXI8yN6Vt5Lig+5PnxNh7swvEqDpQR1s9se
 uVoeN7RBHOKQppZRkpzf6yZbelCcMuwTeIBQ9XOx/jYFAHJ2nUkJm9TYgKckVvNY
 E4shiM1eoOVMrEmODFBCUmUkJLyn5jU1+r5mn708v2Nb5E1XgoTejitB6bHyL+Aa
 zo/0DdsZO86iNE7th94oHQRgVbx1vtP9kV6vK6BLB5M95RSGEdVAEMo5CTL70wr+
 hz7suOaijPi+TeCW9YPrcOHcHOQE9pzTFQoVZHuv8egbvd9Rni3aBAMMqHx/lQdE
 Hk4zN7IytOLqQTfINNYgoo0kUKI1ipJQOBfR5jyeLhgg6yp36CnO/m+0QGvrILbi
 18tlf2qkD75Z3g==
 =72sb
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "The bulk of this is the series to make CONFIG_COMPAT user-selectable,
  it's been around for a long time but was blocked behind the
  syscall-in-C series.

  Plus there's also a few fixes and other minor things.

  Summary:

   - A fix for a crash in machine check handling on pseries (ie. guests)

   - A small series to make it possible to disable CONFIG_COMPAT, and
     turn it off by default for ppc64le where it's not used.

   - A few other miscellaneous fixes and small improvements.

  Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
  Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
  Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
  Nicholas Piggin, Stephen Boyd, Wen Xiong"

* tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Always build the tm-poison test 64-bit
  powerpc: Improve ppc_save_regs()
  Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
  powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
  powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
  powerpc/perf: split callchain.c by bitness
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
  powerpc/perf: consolidate read_user_stack_32
  powerpc: move common register copy functions from signal_32.c to signal.c
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
  powerpc/ps3: Remove an unneeded NULL check
  powerpc/ps3: Remove duplicate error message
  powerpc/powernv: Re-enable imc trace-mode in kernel
  powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
  powerpc/pseries: Fix MCE handling on pseries
  selftests/eeh: Skip ahci adapters
  powerpc/64s: Fix doorbell wakeup msgclr optimisation
2020-04-09 11:01:42 -07:00
Linus Torvalds
9b06860d7c libnvdimm for 5.7
- Add support for region alignment configuration and enforcement to
   fix compatibility across architectures and PowerPC page size
   configurations.
 
 - Introduce 'zero_page_range' as a dax operation. This facilitates
   filesystem-dax operation without a block-device.
 
 - Introduce phys_to_target_node() to facilitate drivers that want to
   know resulting numa node if a given reserved address range was
   onlined.
 
 - Advertise a persistence-domain for of_pmem and papr_scm. The
   persistence domain indicates where cpu-store cycles need to reach in
   the platform-memory subsystem before the platform will consider them
   power-fail protected.
 
 - Promote numa_map_to_online_node() to a cross-kernel generic facility.
 
 - Save x86 numa information to allow for node-id lookups for reserved
   memory ranges, deploy that capability for the e820-pmem driver.
 
 - Pick up some miscellaneous minor fixes, that missed v5.6-final,
   including a some smatch reports in the ioctl path and some unit test
   compilation fixups.
 
 - Fixup some flexible-array declarations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9
 iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k
 5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq
 BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot
 gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO
 G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM
 5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO
 e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE
 x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG
 5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef
 7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv
 qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA=
 =sY4N
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and dax updates from Dan Williams:
 "There were multiple touches outside of drivers/nvdimm/ this round to
  add cross arch compatibility to the devm_memremap_pages() interface,
  enhance numa information for persistent memory ranges, and add a
  zero_page_range() dax operation.

  This cycle I switched from the patchwork api to Konstantin's b4 script
  for collecting tags (from x86, PowerPC, filesystem, and device-mapper
  folks), and everything looks to have gone ok there. This has all
  appeared in -next with no reported issues.

  Summary:

   - Add support for region alignment configuration and enforcement to
     fix compatibility across architectures and PowerPC page size
     configurations.

   - Introduce 'zero_page_range' as a dax operation. This facilitates
     filesystem-dax operation without a block-device.

   - Introduce phys_to_target_node() to facilitate drivers that want to
     know resulting numa node if a given reserved address range was
     onlined.

   - Advertise a persistence-domain for of_pmem and papr_scm. The
     persistence domain indicates where cpu-store cycles need to reach
     in the platform-memory subsystem before the platform will consider
     them power-fail protected.

   - Promote numa_map_to_online_node() to a cross-kernel generic
     facility.

   - Save x86 numa information to allow for node-id lookups for reserved
     memory ranges, deploy that capability for the e820-pmem driver.

   - Pick up some miscellaneous minor fixes, that missed v5.6-final,
     including a some smatch reports in the ioctl path and some unit
     test compilation fixups.

   - Fixup some flexible-array declarations"

* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
  dax: Move mandatory ->zero_page_range() check in alloc_dax()
  dax,iomap: Add helper dax_iomap_zero() to zero a range
  dax: Use new dax zero page method for zeroing a page
  dm,dax: Add dax zero_page_range operation
  s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  dax, pmem: Add a dax operation zero_page_range
  pmem: Add functions for reading/writing page to/from pmem
  libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  tools/test/nvdimm: Fix out of tree build
  libnvdimm/region: Fix build error
  libnvdimm/region: Replace zero-length array with flexible-array member
  libnvdimm/label: Replace zero-length array with flexible-array member
  ACPI: NFIT: Replace zero-length array with flexible-array member
  libnvdimm/region: Introduce an 'align' attribute
  libnvdimm/region: Introduce NDD_LABELING
  libnvdimm/namespace: Enforce memremap_compat_align()
  libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
  libnvdimm: Out of bounds read in __nd_ioctl()
  acpi/nfit: improve bounds checking for 'func'
  mm/memremap_pages: Introduce memremap_compat_align()
  ...
2020-04-08 21:03:40 -07:00
Linus Torvalds
7f218319ca Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Just a couple of updates for linux-5.7:

   - A new Kconfig option to enable IMA architecture specific runtime
     policy rules needed for secure and/or trusted boot, as requested.

   - Some message cleanup (eg. pr_fmt, additional error messages)"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: add a new CONFIG for loading arch-specific policies
  integrity: Remove duplicate pr_fmt definitions
  IMA: Add log statements for failure conditions
  IMA: Update KBUILD_MODNAME for IMA files to ima
2020-04-02 14:49:46 -07:00
Michal Suchanek
6e944aed88 powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
On bigendian ppc64 it is common to have 32bit legacy binaries but much
less so on littleendian.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/41393d6e895b0d3a47ee62f8f51e1cf888ad6226.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Nayna Jain
9e2b4be377 ima: add a new CONFIG for loading arch-specific policies
Every time a new architecture defines the IMA architecture specific
functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA
include file needs to be updated. To avoid this "noise", this patch
defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing
the different architectures to select it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Philipp Rudo <prudo@linux.ibm.com> (s390)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2020-03-12 07:43:57 -04:00
Dan Williams
9ffc1d19fc mm/memremap_pages: Introduce memremap_compat_align()
The "sub-section memory hotplug" facility allows memremap_pages() users
like libnvdimm to compensate for hardware platforms like x86 that have a
section size larger than their hardware memory mapping granularity.  The
compensation that sub-section support affords is being tolerant of
physical memory resources shifting by units smaller (64MiB on x86) than
the memory-hotplug section size (128 MiB). Where the platform
physical-memory mapping granularity is limited by the number and
capability of address-decode-registers in the memory controller.

While the sub-section support allows memremap_pages() to operate on
sub-section (2MiB) granularity, the Power architecture may still
require 16MiB alignment on "!radix_enabled()" platforms.

In order for libnvdimm to be able to detect and manage this per-arch
limitation, introduce memremap_compat_align() as a common minimum
alignment across all driver-facing memory-mapping interfaces, and let
Power override it to 16MiB in the "!radix_enabled()" case.

The assumption / requirement for 16MiB to be a viable
memremap_compat_align() value is that Power does not have platforms
where its equivalent of address-decode-registers never hardware remaps a
persistent memory resource on smaller than 16MiB boundaries. Note that I
tried my best to not add a new Kconfig symbol, but header include
entanglements defeated the #ifndef memremap_compat_align design pattern
and the need to export it defeats the __weak design pattern for arch
overrides.

Based on an initial patch by Aneesh.

Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-02-20 16:58:55 -08:00
Frederic Weisbecker
490f561b78 context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ
A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call
into context tracking on user entry/exit but instead use static keys
(or not) to optimize those calls. Ideally every arch should migrate to
that behaviour in the long run.

Settle a config option to let those archs remove their TIF_NOHZ
definitions.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
2020-02-14 16:05:04 +01:00
Linus Torvalds
71c3a888cb powerpc updates for 5.6
- Implement user_access_begin() and friends for our platforms that support
    controlling kernel access to userspace.
 
  - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.
 
  - Some tweaks to our pseries IOMMU code to allow SVMs ("secure" virtual
    machines) to use the IOMMU.
 
  - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit VDSO, and
    some other improvements.
 
  - A series to use the PCI hotplug framework to control opencapi card's so that
    they can be reset and re-read after flashing a new FPGA image.
 
 As well as other minor fixes and improvements as usual.
 
 Thanks to:
  Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy, Andrew Donnellan,
  Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen Zhou, Christophe Leroy,
  Frederic Barrat, Greg Kurz, Jason A. Donenfeld, Joel Stanley, Jordan Niethe,
  Julia Lawall, Krzysztof Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus
  Walleij, Michael Bringmann, Nathan Chancellor, Nicholas Piggin, Nick
  Desaulniers, Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy
  Dunlap, Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago Jung
  Bauermann, Tyrel Datwyler, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl44uJgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGIcD/9U3R2BK3trEPOStcUbYPte9sMqkyYq
 bcq4o2qrVc5deMvPhcHOQ4j28RUZOKoRODvSbXzGEGKIDlesmKjuP7AicE5qUjjV
 jRtsSOlRElXmPojAgrrlWrFDJOKbW5mFSj2TY/0sjVa06Wcu1Oi6WiQs/TazvZV/
 yzKh5lBL6xyQrmgH0h1VWWbblMbsA1bAL/D7m9Pgimpz0W6fOSRWgXILDUXPLBAy
 Rtt7p1218xPfhe66EgbLhWLIBJb70r+Z9yJNuVbp9NMJbDAhpfOuyMNXpRCELzXD
 5hwm0mFLOwxfSyBgIyIGokLRGFO6XL0uiZIG1Kp+tMxjgnNCmLlRs2R3EF1hoIWi
 49DHRAdK+IEggi6S4dXG5aglz6Rsun8pb/lN7uW+M68t3wp2IYQ+H8MQh4cxPTLu
 wX6KZr28lNG25yyp97nJq2Vld0xTxSSty92P8f588rkolyxzggUy0Xfen41szNrW
 9/bu8NWgt7qVtHmeUoCdWqiIiuMT1k3Of7AN4uAuS6aJHx2Fxr+03ZU5yNr8WIkm
 IOf27z8sUx3F8JL9cIuwAIPB0lSDPw1owvfiTYQ1VkzJa4Ko+kgv5wQ5Ors6V+ve
 XspE4osSP9T9PoHK2MVlu8mOjLpoo3Ibr849J0lGHQZDP6U3kHNILGfcXA8WP/9b
 Fgfh5Wj22cQe8A==
 =xpG+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "A pretty small batch for us, and apologies for it being a bit late, I
  wanted to sneak Christophe's user_access_begin() series in.

  Summary:

   - Implement user_access_begin() and friends for our platforms that
     support controlling kernel access to userspace.

   - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.

   - Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
     virtual machines) to use the IOMMU.

   - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
     VDSO, and some other improvements.

   - A series to use the PCI hotplug framework to control opencapi
     card's so that they can be reset and re-read after flashing a new
     FPGA image.

  As well as other minor fixes and improvements as usual.

  Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
  Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
  Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
  Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
  Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
  Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
  Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
  Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"

* tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
  powerpc: configs: Cleanup old Kconfig options
  powerpc/configs/skiroot: Enable some more hardening options
  powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
  powerpc/configs/skiroot: Enable security features
  powerpc/configs/skiroot: Update for symbol movement only
  powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
  powerpc/configs/skiroot: Drop HID_LOGITECH
  powerpc/configs: Drop NET_VENDOR_HP which moved to staging
  powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
  powerpc/configs: Drop CONFIG_QLGE which moved to staging
  powerpc: Do not consider weak unresolved symbol relocations as bad
  powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
  powerpc: indent to improve Kconfig readability
  powerpc: Provide initial documentation for PAPR hcalls
  powerpc: Implement user_access_save() and user_access_restore()
  powerpc: Implement user_access_begin and friends
  powerpc/32s: Prepare prevent_user_access() for user_access_end()
  powerpc/32s: Drop NULL addr verification
  powerpc/kuap: Fix set direction in allow/prevent_user_access()
  powerpc/32s: Fix bad_kuap_fault()
  ...
2020-02-04 13:06:46 +00:00
Peter Zijlstra
3af4bd0337 asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE
Towards a more consistent naming scheme.

Link: http://lkml.kernel.org/r/20200116064531.483522-8-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Peter Zijlstra
ff2e6d7259 asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
Towards a more consistent naming scheme.

[akpm@linux-foundation.org: fix sparc64 Kconfig]
Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Peter Zijlstra
0ed1325967 mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
Architectures for which we have hardware walkers of Linux page table
should flush TLB on mmu gather batch allocation failures and batch flush.
Some architectures like POWER supports multiple translation modes (hash
and radix) and in the case of POWER only radix translation mode needs the
above TLBI.  This is because for hash translation mode kernel wants to
avoid this extra flush since there are no hardware walkers of linux page
table.  With radix translation, the hardware also walks linux page table
and with that, kernel needs to make sure to TLB invalidate page walk cache
before page table pages are freed.

More details in commit d86564a2f0 ("mm/tlb, x86/mm: Support invalidating
TLB caches for RCU_TABLE_FREE")

The changes to sparc are to make sure we keep the old behavior since we
are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value for
tlb_needs_table_invalidate is to always force an invalidate and sparc can
avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
false for sparc architecture.

Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
Fixes: a46cc7a90f ("powerpc/mm/radix: Improve TLB/PWC flushes")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Aneesh Kumar K.V
12e4d53f3f powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Patch series "Fixup page directory freeing", v4.

This is a repost of patch series from Peter with the arch specific changes
except ppc64 dropped.  ppc64 changes are added here because we are redoing
the patch series on top of ppc64 changes.  This makes it easy to backport
these changes.  Only the first 2 patches need to be backported to stable.

The thing is, on anything SMP, freeing page directories should observe the
exact same order as normal page freeing:

 1) unhook page/directory
 2) TLB invalidate
 3) free page/directory

Without this, any concurrent page-table walk could end up with a
Use-after-Free.  This is esp.  trivial for anything that has software
page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware
caches partial page-walks (ie.  caches page directories).

Even on UP this might give issues since mmu_gather is preemptible these
days.  An interrupt or preempted task accessing user pages might stumble
into the free page if the hardware caches page directories.

This patch series fixes ppc64 and add generic MMU_GATHER changes to
support the conversion of other architectures.  I haven't added patches
w.r.t other architecture because they are yet to be acked.

This patch (of 9):

A followup patch is going to make sure we correctly invalidate page walk
cache before we free page table pages.  In order to keep things simple
enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the
!SMP case differently in the followup patch

!SMP case is right now broken for radix translation w.r.t page walk
cache flush.  We can get interrupted in between page table free and
that would imply we have page walk cache entries pointing to tables
which got freed already.  Michael said "both our platforms that run on
Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
problem for anyone in practice, unless they've hacked their kernel to
build it !SMP."

Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:25 +00:00
Linus Torvalds
893e591b59 Devicetree updates for v5.6:
- Update dtc to upstream v1.5.1-22-gc40aeb60b47a (plus 1 revert)
 
 - Fix for DMA coherent devices on Power
 
 - Rework and simplify the DT phandle cache code
 
 - DT schema conversions for LEDS, gpio-leds, STM32 dfsdm, STM32 UART,
   STM32 ROMEM, STM32 watchdog, STM32 DMAs, STM32 mlahb, STM32 RTC,
   STM32 RCC, STM32 syscon, rs485, Renesas rCar CSI2, Faraday FTIDE010,
   DWC2, Arm idle-states, Allwinner legacy resets, PRCM and clocks,
   Allwinner H6 OPP, Allwinner AHCI, Allwinner MBUS, Allwinner A31 CSI,
   Allwinner h/w codec, Allwinner A10 system ctrl, Allwinner SRAM,
   Allwinner USB PHY, Renesas CEU, generic PCI host, Arm Versatile PCI
 
 - New binding schemas for SATA and PATA controllers, TI and Infineon VR
   controllers, MAX31730
 
 - New compatible strings for i.MX8QM, WCN3991, renesas,r8a77961-wdt,
   renesas,etheravb-r8a77961
 
 - Add USB 'super-speed-plus' as a documented speed
 
 - Vendor prefixes for broadmobi, calaosystems, kam, and mps
 
 - Clean-up the multiple flavors of ST-Ericsson vendor prefixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl4x8eIQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw01FD/9MTmExKdgOc2I0u5MKIa+VXGeJKqfR0Liv
 irQBPVljPUAr7cDZJ4XX4YIuJ0Gh+MTLSF0QFp7kHyeVnfq+6rNMjwtUTpJSKCZb
 7nXQPguTDd2Guc0ycz1AG+nm2ZsFkaxg2GVFtDwA4aLGTho0oKYqFg8KUpZDAU9K
 HkDDyvDSFLZj48yn1yggs3d6kWh9ERTwK2ilDyHEsfJsZuD694X7GtSm0Vwlddti
 iuV/dTIsDfLPYFhDyz0sy21Kei/EWzMkB9sPIX/AhFQSd5/kgDqj1gL4SswbiV2a
 +pdjb0hcDpcpq4AxqGd5L4Jou4Zf9Agy+FNFkgia77MKZSwJo3lGq6GxZfXlu+L3
 XIV+lmjFILh6OtxHJBItDofqSO3jiGhj5THy16x0j6iwa/IHS3iGmp6MV4D75gvt
 +fvjRPrTwNmiaT8ZsjKG9EjnidLAWdwuhOrpqYi4hkQNRhwbA9pSr5ZUFz3G1LRS
 hLUXcBnZaQlfs8pgbzdv2n727WrGsUe1xFRj69nvl1GVIWPu2J14D1c4LU0+uCxH
 3kYcdyulTqwymRfDGBPEn1llp4N5sekJh0MYuYvNY3YtM/AQTE7FoKfR69xOk3Xa
 /Y1GddE0+suHAby7n2RLGGgpSv4notANYaEEsw0yiywgqkSr2kU67zI7hwU03Kux
 KAJZr365aQ==
 =dsEz
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Update dtc to upstream v1.5.1-22-gc40aeb60b47a (plus 1 revert)

 - Fix for DMA coherent devices on Power

 - Rework and simplify the DT phandle cache code

 - DT schema conversions for LEDS, gpio-leds, STM32 dfsdm, STM32 UART,
   STM32 ROMEM, STM32 watchdog, STM32 DMAs, STM32 mlahb, STM32 RTC,
   STM32 RCC, STM32 syscon, rs485, Renesas rCar CSI2, Faraday FTIDE010,
   DWC2, Arm idle-states, Allwinner legacy resets, PRCM and clocks,
   Allwinner H6 OPP, Allwinner AHCI, Allwinner MBUS, Allwinner A31 CSI,
   Allwinner h/w codec, Allwinner A10 system ctrl, Allwinner SRAM,
   Allwinner USB PHY, Renesas CEU, generic PCI host, Arm Versatile PCI

 - New binding schemas for SATA and PATA controllers, TI and Infineon VR
   controllers, MAX31730

 - New compatible strings for i.MX8QM, WCN3991, renesas,r8a77961-wdt,
   renesas,etheravb-r8a77961

 - Add USB 'super-speed-plus' as a documented speed

 - Vendor prefixes for broadmobi, calaosystems, kam, and mps

 - Clean-up the multiple flavors of ST-Ericsson vendor prefixes

* tag 'devicetree-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (66 commits)
  scripts/dtc: Revert "yamltree: Ensure consistent bracketing of properties with phandles"
  of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
  dt-bindings: leds: Convert gpio-leds to DT schema
  dt-bindings: leds: Convert common LED binding to schema
  dt-bindings: PCI: Convert generic host binding to DT schema
  dt-bindings: PCI: Convert Arm Versatile binding to DT schema
  dt-bindings: Be explicit about installing deps
  dt-bindings: stm32: convert dfsdm to json-schema
  dt-bindings: serial: Convert STM32 UART to json-schema
  dt-bindings: serial: Convert rs485 bindings to json-schema
  dt-bindings: timer: Use non-empty ranges in example
  dt-bindings: arm-boards: typo fix
  dt-bindings: Add TI and Infineon VR Controllers as trivial devices
  dt-binding: usb: add "super-speed-plus"
  dt-bindings: rcar-csi2: Convert bindings to json-schema
  dt-bindings: iio: adc: ad7606: Fix wrong maxItems value
  dt-bindings: Convert Faraday FTIDE010 to DT schema
  dt-bindings: Create DT bindings for PATA controllers
  dt-bindings: Create DT bindings for SATA controllers
  dt: bindings: add vendor prefix for Kamstrup A/S
  ...
2020-01-30 07:47:58 -08:00
Randy Dunlap
76be4414be powerpc: indent to improve Kconfig readability
Indent a Kconfig continuation line to improve readability.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ff8729c1-3a4b-c720-48ba-a1a42b0ef892@infradead.org
2020-01-30 13:20:50 +11:00
Linus Torvalds
c677124e63 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "These were the main changes in this cycle:

   - More -rt motivated separation of CONFIG_PREEMPT and
     CONFIG_PREEMPTION.

   - Add more low level scheduling topology sanity checks and warnings
     to filter out nonsensical topologies that break scheduling.

   - Extend uclamp constraints to influence wakeup CPU placement

   - Make the RT scheduler more aware of asymmetric topologies and CPU
     capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y

   - Make idle CPU selection more consistent

   - Various fixes, smaller cleanups, updates and enhancements - please
     see the git log for details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  sched/fair: Define sched_idle_cpu() only for SMP configurations
  sched/topology: Assert non-NUMA topology masks don't (partially) overlap
  idle: fix spelling mistake "iterrupts" -> "interrupts"
  sched/fair: Remove redundant call to cpufreq_update_util()
  sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
  sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
  sched/fair: calculate delta runnable load only when it's needed
  sched/cputime: move rq parameter in irqtime_account_process_tick
  stop_machine: Make stop_cpus() static
  sched/debug: Reset watchdog on all CPUs while processing sysrq-t
  sched/core: Fix size of rq::uclamp initialization
  sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
  sched/fair: Load balance aggressively for SCHED_IDLE CPUs
  sched/fair : Improve update_sd_pick_busiest for spare capacity case
  watchdog: Remove soft_lockup_hrtimer_cnt and related code
  sched/rt: Make RT capacity-aware
  sched/fair: Make EAS wakeup placement consider uclamp restrictions
  sched/fair: Make task_fits_capacity() consider uclamp restrictions
  sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
  sched/uclamp: Make uclamp util helpers use and return UL values
  ...
2020-01-28 10:07:09 -08:00
Linus Torvalds
8b561778f2 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
 "The main changes are to move the ORC unwind table sorting from early
  init to build-time - this speeds up booting.

  No change in functionality intended"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix !CONFIG_MODULES build warning
  x86/unwind/orc: Remove boot-time ORC unwind tables sorting
  scripts/sorttable: Implement build-time ORC unwind table sorting
  scripts/sorttable: Rename 'sortextable' to 'sorttable'
  scripts/sortextable: Refactor the do_func() function
  scripts/sortextable: Remove dead code
  scripts/sortextable: Clean up the code to meet the kernel coding style better
  scripts/sortextable: Rewrite error/success handling
2020-01-28 08:38:25 -08:00
Michael Ellerman
dabf6b36b8 of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
There's an OF helper called of_dma_is_coherent(), which checks if a
device has a "dma-coherent" property to see if the device is coherent
for DMA.

But on some platforms devices are coherent by default, and on some
platforms it's not possible to update existing device trees to add the
"dma-coherent" property.

So add a Kconfig symbol to allow arch code to tell
of_dma_is_coherent() that devices are coherent by default, regardless
of the presence of the property.

Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
when the system has a coherent cache.

Fixes: 92ea637ede ("of: introduce of_dma_is_coherent() helper")
Cc: stable@vger.kernel.org # v3.16+
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
2020-01-28 08:26:20 -06:00
Christophe Leroy
af1725d249 powerpc/kconfig: Move CONFIG_PPC32 into Kconfig.cputype
Move CONFIG_PPC32 at the same place as CONFIG_PPC64 for consistency.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6f28085c2a1aa987093d50db17586633bbf8e206.1579024426.git.christophe.leroy@c-s.fr
2020-01-27 22:37:37 +11:00
Christophe Leroy
3d4247fcc9 powerpc/32: Add support of KASAN_VMALLOC
Add support of KASAN_VMALLOC on PPC32.

To allow this, the early shadow covering the VMALLOC space
need to be removed once high_memory var is set and before
freeing memblock.

And the VMALLOC area need to be aligned such that boundaries
are covered by a full shadow page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/031dec5487bde9b2181c8b3c9800e1879cf98c1a.1579024426.git.christophe.leroy@c-s.fr
2020-01-27 22:37:33 +11:00
Bharata B Rao
a2db55dda9 powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV
Let PPC_UV depend only on DEVICE_PRIVATE which in turn
will satisfy all the other required dependencies

Fixes: 013a53f2d2 ("powerpc: Ultravisor: Add PPC_UV config option")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200109092047.24043-1-bharata@linux.ibm.com
2020-01-22 20:31:40 +11:00
Russell Currey
c55d7b5e64 powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE
I have tested this with the Radix MMU and everything seems to work, and
the previous patch for Hash seems to fix everything too.
STRICT_KERNEL_RWX should still be disabled by default for now.

Please test STRICT_KERNEL_RWX + RELOCATABLE!

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191224064126.183670-2-ruscur@russell.cc
2020-01-16 14:59:36 +10:00
Ingo Molnar
1e5f8a3085 Linux 5.5-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS
 ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1
 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y
 oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG
 j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I
 jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf
 33WGSyI=
 =iEwx
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc3' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:41:37 +01:00
Shile Zhang
1091670637 scripts/sorttable: Rename 'sortextable' to 'sorttable'
Use a more generic name for additional table sorting usecases,
such as the upcoming ORC table sorting feature. This tool is
not tied to exception table sorting anymore.

No functional changes intended.

[ mingo: Rewrote the changelog. ]

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-13 10:47:58 +01:00
Thomas Gleixner
fdc5569eab sched/rt, powerpc: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

[bigeasy: +Kconfig]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20191024160458.vlnf3wlcyjl2ich7@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:32 +01:00
Linus Torvalds
aedc0650f9 * PPC secure guest support
* small x86 cleanup
 * fix for an x86-specific out-of-bounds write on a ioctl (not guest triggerable,
   data not attacker-controlled)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJd551cAAoJEL/70l94x66D+JkH/R3eEOyvckPmYmzd0lnV8mQ/
 7e0n2G/aD+iLZkcCbUnMaImdmSJmoEEJCPjgPk/5nJ3zUi5b/ABWyidEM5uf19Hl
 rzKBg0DR7BiQptPnZv2JMwEVKu3JOTchMykqu9xXChQlICocZ0xjdOA6nQ19p0Lv
 FulDw5MUaWrXevIzCBskQ38zJejRQA6CpD1lQkHn7LKS9p3p+BsAOd/Ouy87RfWG
 b3ktECNbXyO6KStrrhgm+z8pviWY+kqYklyBlDOOwxWif0x8WvNDpQLoVo+ZuhLU
 Me8YJ1BN75vFlxzh6ZK5exBUnm9E3fGVKIaaF+dpuds2x+j4HnYl+lZCm89MdqY=
 =Q4v7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:

 - PPC secure guest support

 - small x86 cleanup

 - fix for an x86-specific out-of-bounds write on a ioctl (not guest
   triggerable, data not attacker-controlled)

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: vmx: Stop wasting a page for guest_msrs
  KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
  Documentation: kvm: Fix mention to number of ioctls classes
  powerpc: Ultravisor: Add PPC_UV config option
  KVM: PPC: Book3S HV: Support reset of secure guest
  KVM: PPC: Book3S HV: Handle memory plug/unplug to secure VM
  KVM: PPC: Book3S HV: Radix changes for secure guest
  KVM: PPC: Book3S HV: Shared pages support for secure guests
  KVM: PPC: Book3S HV: Support for running secure guests
  mm: ksm: Export ksm_madvise()
  KVM x86: Move kvm cpuid support out of svm
2019-12-04 11:08:30 -08:00
Anshuman Khandual
013a53f2d2 powerpc: Ultravisor: Add PPC_UV config option
CONFIG_PPC_UV adds support for ultravisor.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[ Update config help and commit message ]
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-11-28 17:02:40 +11:00
Christophe Leroy
265c3491c4 powerpc: Add support for GENERIC_EARLY_IOREMAP
Add support for GENERIC_EARLY_IOREMAP.

Let's define 16 slots of 256Kbytes each for early ioremap.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/412c7eaa6a373d8f82a3c3ee01e6a65a1a6589de.1568295907.git.christophe.leroy@c-s.fr
2019-11-19 19:38:38 +11:00
Michael Ellerman
3df191118b Merge branch 'topic/kaslr-book3e32' into next
This is a slight rebase of Scott's next branch, which contained the
KASLR support for book3e 32-bit, to squash in a couple of small fixes.

See the	original pull request:
  https://lore.kernel.org/r/20191022232155.GA26174@home.buserror.net
2019-11-14 19:23:33 +11:00
Jason Yan
2b0e86cc5d powerpc/fsl_booke/32: implement KASLR infrastructure
This patch add support to boot kernel from places other than KERNELBASE.
Since CONFIG_RELOCATABLE has already supported, what we need to do is
map or copy kernel to a proper place and relocate. Freescale Book-E
parts expect lowmem to be mapped by fixed TLB entries(TLB1). The TLB1
entries are not suitable to map the kernel directly in a randomized
region, so we chose to copy the kernel to a proper place and restart to
relocate.

The offset of the kernel was not randomized yet(a fixed 64M is set). We
will randomize it in the next patch.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
[mpe: Use PTRRELOC() in early_init()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:40 +11:00
Michael Ellerman
d34a5709be Merge branch 'topic/secureboot' into next
Merge the secureboot support, as well as the IMA changes needed to
support it.

From Nayna's cover letter:
  In order to verify the OS kernel on PowerNV systems, secure boot
  requires X.509 certificates trusted by the platform. These are
  stored in secure variables controlled by OPAL, called OPAL secure
  variables. In order to enable users to manage the keys, the secure
  variables need to be exposed to userspace.

  OPAL provides the runtime services for the kernel to be able to
  access the secure variables. This patchset defines the kernel
  interface for the OPAL APIs. These APIs are used by the hooks, which
  load these variables to the keyring and expose them to the userspace
  for reading/writing.

  Overall, this patchset adds the following support:
    * expose secure variables to the kernel via OPAL Runtime API interface
    * expose secure variables to the userspace via kernel sysfs interface
    * load kernel verification and revocation keys to .platform and
      .blacklist keyring respectively.

  The secure variables can be read/written using simple linux
  utilities cat/hexdump.

  For example:
  Path to the secure variables is: /sys/firmware/secvar/vars

    Each secure variable is listed as directory.
    $ ls -l
    total 0
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 db
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 KEK
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 PK

  The attributes of each of the secure variables are (for example: PK):
    $ ls -l
    total 0
    -r--r--r--. 1 root root  4096 Oct  1 15:10 data
    -r--r--r--. 1 root root 65536 Oct  1 15:10 size
    --w-------. 1 root root  4096 Oct  1 15:12 update

  The "data" is used to read the existing variable value using
  hexdump. The data is stored in ESL format. The "update" is used to
  write a new value using cat. The update is to be submitted as AUTH
  file.
2019-11-13 16:55:50 +11:00
Nayna Jain
bd5d9c743d powerpc: expose secure variables to userspace via sysfs
PowerNV secure variables, which store the keys used for OS kernel
verification, are managed by the firmware. These secure variables need to
be accessed by the userspace for addition/deletion of the certificates.

This patch adds the sysfs interface to expose secure variables for PowerNV
secureboot. The users shall use this interface for manipulating
the keys stored in the secure variables.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-3-git-send-email-nayna@linux.ibm.com
2019-11-13 00:33:22 +11:00
Nayna Jain
4238fad366 powerpc/ima: Add support to initialize ima policy rules
PowerNV systems use a Linux-based bootloader, which rely on the IMA
subsystem to enforce different secure boot modes. Since the
verification policy may differ based on the secure boot mode of the
system, the policies must be defined at runtime.

This patch implements arch-specific support to define IMA policy rules
based on the runtime secure boot mode of the system.

This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
config is enabled.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-3-git-send-email-zohar@linux.ibm.com
2019-11-12 12:25:49 +11:00
Nayna Jain
1a8916ee3a powerpc: Detect the secure boot mode of the system
This patch defines a function to detect the secure boot state of a
PowerNV system.

The PPC_SECURE_BOOT config represents the base enablement of secure
boot for powerpc.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Fold in change from Nayna to add "ibm,secureboot" to ids]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/46b003b9-3225-6bf7-9101-ed6580bb748c@linux.ibm.com
2019-11-12 12:25:02 +11:00