Currently there are many platforms that dont enable ARCH_HAS_PTE_SPECIAL
but required to define quite similar fallback stubs for special page
table entry helpers such as pte_special() and pte_mkspecial(), as they
get build in generic MM without a config check. This creates two
generic fallback stub definitions for these helpers, eliminating much
code duplication.
mips platform has a special case where pte_special() and pte_mkspecial()
visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires.
This restricts those symbol visibility in order to avoid redefinitions
which is now exposed through this new generic stubs and subsequent build
failure. arm platform set_pte_at() definition needs to be moved into a
C file just to prevent a build failure.
[anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Stafford Horne <shorne@gmail.com> [openrisc]
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS. While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms. Apart from simplification, this
reduces code duplication as well.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generated files are also checked by sparse that's why add newline to
remove sparse (C=1) warning.
The issue was found on Microblaze and reported like this:
./arch/microblaze/include/generated/uapi/asm/unistd_32.h:438:45: warning:
no newline at end of file
Mips and PowerPC have it already but let's align with style used by m68k.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com> (xtensa)
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/4d32ab4e1fb2edb691d2e1687e8fb303c09fd023.1581504803.git.michal.simek@xilinx.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
driver which affected the PPC users.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J15gTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocISD/9zcqwlSAHidpVuNRxlPW/iOeQy+D0T
tERfzeZpuZSSWBNw0xHdsi8mXSdc6turjDvPQJP1W8sAZcpVkdsY7FTXdrnX1N6J
9nBvExvot0tdJ0VoE0iSYdIztCBDnm+IwfLHwSTzWskt1ppve9XXs/wST+zVD10C
ZTFQ0APSszH/DjioubhJ1girwNsxbdb97e35d+QbHUKnTftreH6mN2LySwvj8rkm
/06TayAP3Jus0yPEkauCdh7EBljXQgXawrKUTsaeXsViMjY82a75lTRK+uJ041vN
SlnUGJ/BTNwYl02T13M/TNYZtunOUWGklP5ZuRl7ISWMbbOn7rssdKVzJTrCeB7u
88X1m8WxxBeyUBwpEIoX6Q820YlCxFYl0FGtMFbBtrsyEpoJAnbEAVDbBq8IQgCF
dxN/VOo118KNEGq0ihIjYkaqZB+O+dYUXg2koV2u5N1MFy6/Wy6RHoriasU/Dz37
ihkyXh7U08Mp0EVTUW/MDh8HLO7pary/fy17Mez3ys9DmUF2KUWL2Pt5SWPYj3Df
dPOM7MIpwEd+KtND+HKT9gvWUU5Vtg/unYada8nAcCZpbtJ3ebwlINWHPsnTSfOq
R73jTVM6bn2hzmv+/TDgyGDtoJhniz3PTtW2428dEGC8GSSEaQHe74wHceE7iZNZ
UsUDHPt16CkOqg==
=pZAC
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two reverts addressing regressions of the Xilinx interrupt controller
driver which affected the PPC users"
* tag 'irq-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "irqchip/xilinx: Enable generic irq multi handler"
Revert "irqchip/xilinx: Do not call irq_set_default_host()"
- fix an integer overflow in the coherent pool (Kevin Grandemange)
- provide support for in-place uncached remapping and use that
for openrisc
- fix the arm coherent allocator to take the bus limit into account
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl6IL38LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN8Pg/9FerEL9GoU7lSeXvvklAED5Ro3JVyaZdA4eQaYzG6
PnUYs+lFAsKGoE8qWbvue+pJs1JZtZ+oSp9uVrLGbPaMS+71iy/hCS+Cv9Ym0Y0u
lciBlFEkN9qM1srBMn4vdBo4gIyailcylznQxzw/ZjU90vU4uUJj37YrfGxQnOd8
pHrD24wutwECksEp6nLx3/Yt2xW92j9/oH+FnEK5mfaA0ATAQMz51L9veyU6liV4
1A8jbi0diskAIqn4uyO5SuVg7s7C90HOe3JUBtk+oZvUXFlr9WvrDjCqBp6rcSiH
XS+Z2RBomMSrjEHOfETcFu8JSyjY1eKu/a1rvEbmUc6bE/gKrEGgPiQwSLa+1Aty
qy3I24uSF7xcs5yngnjzIQ/BizKFk/wzja15c4sfUNKiXLI6FqwwHL34Dg+Nv7UG
A/eCXePzOGPVANcIU0Zh68epEfCJRqJtqy2BDrWisqRfhxd3rRgl9gNeS1JwR0El
9T5c+CKfXn1IVA3YhMABUYh1JJ9bXrlZIOd3PEPwvwYRBnIYxP6JK2R+4BYjsMHy
Y90QyAUUsJKMWYq4p4EpSCUGSlnzl9I2QH3ItUHvo+T9NcT6Vo4J6tCTQZu5tUGM
SPiV49Gxz3u2+5VBmolWixO6JpRBv+92gowWdxRULkFpMaOw8mPInWW5cWPWc2MY
u/Y=
=DrK0
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.7' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- fix an integer overflow in the coherent pool (Kevin Grandemange)
- provide support for in-place uncached remapping and use that for
openrisc
- fix the arm coherent allocator to take the bus limit into account
* tag 'dma-mapping-5.7' of git://git.infradead.org/users/hch/dma-mapping:
ARM/dma-mapping: merge __dma_supported into arm_dma_supported
ARM/dma-mapping: take the bus limit into account in __dma_alloc
ARM/dma-mapping: remove get_coherent_dma_mask
openrisc: use the generic in-place uncached DMA allocator
dma-direct: provide a arch_dma_clear_uncached hook
dma-direct: make uncached_kernel_address more general
dma-direct: consolidate the error handling in dma_direct_alloc_pages
dma-direct: remove the cached_kernel_address hook
dma-coherent: fix integer overflow in the reserved-memory dma allocation
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
- Unit test for overlays with GPIO hogs
- Improve dma-ranges parsing to handle dma-ranges with multiple entries
- Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
- Improve overlay error reporting
- Device link support for power-domains and hwlocks bindings
- Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
ITE Tech
- Add deprecated Marvell vendor prefix 'mrvl'
- A bunch of binding conversions to DT schema continues. Of note, the
common serial and USB connector bindings are converted.
- Add more Arm CPU compatibles
- Drop Mark Rutland as DT maintainer :(
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl6GTJUQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw8vaD/9HDT0EEDRdfr8s5OiLHYQMSXlSExJgR+ON
iLI4v19apWwyycil176RXuWL5fFxh9HHOKbxOL27mmfqSLdv/xe/ev6Z4oNLVZzm
YiUalm37K+R9yWsWTqxJKtQxWqSBBtOlyJmUYnvxI1bjfF4PANKF+IMpQVDflj1w
9UMixoPRFdlOoMXVmC7yT0sfE1B6REJIh0Qfa4pqDwnHRhs8imgoIYA+JwFnmFPr
k5Q8xS2YCwopH/7W5aRLzyEI2dF6Rfy9FID8jfyNoKZLtBW6cIEkA4zTv41BZSuj
4Zj2etELXAnkIWMZXXdeKSz8IoSOlgjsCR482QmzTRuoFdmVLoJgsgh9hO2+oK07
vJDEVJnXKxq1pF6DVbYwuzNgPB1pRzg6Y8EitdH3SBgjsOQ/m4RpKaqN0MkQhYrB
BI9PCiF7GKCzNfNl3b/Op+HSsZ6aJkOsQPzC89/Ww0mB8azypgAwvLsjte5R3hib
fS8z+V+lxAAC0/M981TsPKJ96HNv4VXbLI3bY9qJBd2LB+NDortsF/OkUuObsh+6
hVQCWnae5V5iq8WmMTFABfBAK5lHafop87edn6X33LOyZZQCT9svNz/zrpakXh0e
/SBGTUudtjCOy/BLj1VQrb6tczEzos7Ij+bszeIBAcSzrWg8gtVPjw9TvuAo2b/b
9nwR4yrm0Q==
=db/l
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
- Unit test for overlays with GPIO hogs
- Improve dma-ranges parsing to handle dma-ranges with multiple entries
- Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
- Improve overlay error reporting
- Device link support for power-domains and hwlocks bindings
- Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
ITE Tech
- Add deprecated Marvell vendor prefix 'mrvl'
- A bunch of binding conversions to DT schema continues. Of note, the
common serial and USB connector bindings are converted.
- Add more Arm CPU compatibles
- Drop Mark Rutland as DT maintainer :(
* tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
MAINTAINERS: drop an old reference to stm32 pwm timers doc
MAINTAINERS: dt: update etnaviv file reference
dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
dt-bindings: uniphier-system-bus: fix warning in the example
dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
dt-bindings: power: Fix example for power-domain
dt-bindings: arm: Add some constraints for PSCI nodes
of: some unittest overlays not untracked
of: gpio unittest kfree() wrong object
dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
dt-bindings: serial: sh-sci: Convert to json-schema
dt-bindings: serial: Document serialN aliases
dt-bindings: thermal: tsens: Set 'additionalProperties: false'
dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
dt-bindings: vendor-prefixes: Add Beacon vendor prefix
dt-bindings: vendor-prefixes: Add Topwise
of: of_private.h: Replace zero-length array with flexible-array member
docs: dt: fix a broken reference to input.yaml
docs: dt: fix references to ap806-system-controller.txt
...
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.
Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags. Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.
Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over. With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For most architectures, we've got a quick path to detect fatal signal
after a handle_mm_fault(). Introduce a helper for that quick path.
It cleans the current codes a bit so we don't need to duplicate the same
check across archs. More importantly, this will be an unified place that
we handle the signal immediately right after an interrupted page fault, so
it'll be much easier for us if we want to change the behavior of handling
signals later on for all the archs.
Note that currently only part of the archs are using this new helper,
because some archs have their own way to handle signals. In the follow up
patches, we'll try to apply this helper to all the rest of archs.
Another note is that the "regs" parameter in the new helper is not used
yet. It'll be used very soon. Now we kept it in this patch only to avoid
touching all the archs again in the follow up patches.
[peterx@redhat.com: fix sparse warnings]
Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change a header to mandatory-y if both of the following are met:
[1] At least one architecture (except um) specifies it as generic-y in
arch/*/include/asm/Kbuild
[2] Every architecture (except um) either has its own implementation
(arch/*/include/asm/*.h) or specifies it as generic-y in
arch/*/include/asm/Kbuild
This commit was generated by the following shell script.
----------------------------------->8-----------------------------------
arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')
tmpfile=$(mktemp)
grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile
find arch -path 'arch/*/include/asm/Kbuild' |
xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
mandatory=yes
for arch in $arches
do
if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
! [ -f arch/$arch/include/asm/$header ]; then
mandatory=no
break
fi
done
if [ "$mandatory" = yes ]; then
echo "mandatory-y += $header" >> $tmpfile
for arch in $arches
do
sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
done
fi
done
sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild
LANG=C sort $tmpfile >> include/asm-generic/Kbuild
----------------------------------->8-----------------------------------
One obvious benefit is the diff stat:
25 files changed, 52 insertions(+), 557 deletions(-)
It is tedious to list generic-y for each arch that needs it.
So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.
See the following commits:
def3f7cefea1b39bae16
It is tedious to convert headers one by one, so I processed by a shell
script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Convert license headers to SPDX
- Cleanup header handling and use asm-generic one
- Get rid of earlyprintk residues
- Define barriers and use it in the code
- Get rid of setup_irq() for timer
- Various small addons and fixes
-----BEGIN PGP SIGNATURE-----
iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCXoNXfwAKCRDKSWXLKUoM
IapQAJ4g6yXuEbFrXmyidwthkY0mqGuXWwCgmZPjLDeFGHypYPUN9IUAB7rr/k8=
=pnz2
-----END PGP SIGNATURE-----
Merge tag 'microblaze-v5.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze updates from Michal Simek:
- convert license headers to SPDX
- cleanup header handling and use asm-generic one
- get rid of earlyprintk residues
- define barriers and use it in the code
- get rid of setup_irq() for timer
- various small addons and fixes
* tag 'microblaze-v5.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Replace setup_irq() by request_irq()
microblaze: Stop printing the virtual memory layout
microblaze: Use asm generic cmpxchg.h for !SMP case
microblaze: Define percpu sestion in linker file
microblaze: Remove unused boot_cpuid variable
microblaze: Add missing irqflags.h header
microblaze: Add sync to tlb operations
microblaze: Define microblaze barrier
microblaze: Remove empty headers
microblaze: Remove early printk setup
microblaze: Remove architecture tlb.h and use generic one
microblaze: Convert headers to SPDX license
microblaze: Fix _reset() function
microblaze: Kernel parameters should be parsed earlier
Treewide:
- Cleanup of setup_irq() which is not longer required because the
memory allocator is available early. Most cleanup changes come
through the various maintainer trees, so the final removal of
setup_irq() is postponed towards the end of the merge window.
Core:
- Protection against unsafe invocation of interrupt handlers and unsafe
interrupt injection including a fixup of the offending PCI/AER error
injection mechanism.
Invoking interrupt handlers from arbitrary contexts, i.e. outside of
an actual interrupt, can cause inconsistent state on the fragile
x86 interrupt affinity changing hardware trainwreck.
Drivers:
- Second wave of support for the new ARM GICv4.1
- Multi-instance support for Xilinx and PLIC interrupt controllers
- CPU-Hotplug support for PLIC
- The obligatory new driver for X1000 TCU
- Enhancements, cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B888THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeMJD/9v8GcI/DSY87Fmo7s4odLFVU0J8zZ6
7QlYjSPm4yWv4pqn1TEnEF2pKz5X9Euhoh8BmdMKtdXBqlS4Ix9N+pH8ModcxyQo
aX97zuRUxvqfeeVE+yQRwbbMREj9jj9RW8FRtA39+l5H3uC1GDcc+2aAMIaykQ7+
8lo/6wBd8ZrZ0gsNf4KjlBwMDYAlQSRWxrff38PQ2XRpGKowdp8JFYZuq5Vp0ljJ
r2cE75ldmFSfmtuhhVroBRY0GAqW4/8v8/syAN3Q9jOEII60qhA0dqR085B9veWa
DHSqgLmzyUFFXN7Ntzt/fDirJVsIM4BE9qGu3ftCYHMaPB8hG+xqjbZe9E3D2e/d
+0Pb3TG8EHVOIwzv1t9+6462qYGkBhmBXtbj6GptPYk2Ai4HZlNaSsa8jUNyHvGz
WDegdRjt7O5RjqDH/VwrQxW/AEp05f/1egweBXbq9aF6j9nqeOur75c/PdxZxAX5
WUMtouXP2WN+sMW8k1T5cmVMGWxLGBB0wwG4LC/mXzHnkDiN1+2wEUHmhS8Voi3q
3HXeYBJeukUYbVvMKRvWVAD330TxFjAyd6pPwCdoNY2ZngJnQWlDD9vbYYX2osoW
kP+KhIANNBVqdK7NqlLoqcr3SdHn01pQYuVHejNzxb7E6/mmpMlaYDJc/rMPi/eM
0/rzl8fAj/WyBQ==
=DZ/G
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Treewide:
- Cleanup of setup_irq() which is not longer required because the
memory allocator is available early.
Most cleanup changes come through the various maintainer trees, so
the final removal of setup_irq() is postponed towards the end of
the merge window.
Core:
- Protection against unsafe invocation of interrupt handlers and
unsafe interrupt injection including a fixup of the offending
PCI/AER error injection mechanism.
Invoking interrupt handlers from arbitrary contexts, i.e. outside
of an actual interrupt, can cause inconsistent state on the
fragile x86 interrupt affinity changing hardware trainwreck.
Drivers:
- Second wave of support for the new ARM GICv4.1
- Multi-instance support for Xilinx and PLIC interrupt controllers
- CPU-Hotplug support for PLIC
- The obligatory new driver for X1000 TCU
- Enhancements, cleanups and fixes all over the place"
* tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
unicore32: Replace setup_irq() by request_irq()
sh: Replace setup_irq() by request_irq()
hexagon: Replace setup_irq() by request_irq()
c6x: Replace setup_irq() by request_irq()
alpha: Replace setup_irq() by request_irq()
irqchip/gic-v4.1: Eagerly vmap vPEs
irqchip/gic-v4.1: Add VSGI property setup
irqchip/gic-v4.1: Add VSGI allocation/teardown
irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
irqchip/gic-v4.1: Add initial SGI configuration
irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
irqchip/stm32: Retrigger both in eoi and unmask callbacks
irqchip/gic-v3: Move irq_domain_update_bus_token to after checking for NULL domain
irqchip/xilinx: Do not call irq_set_default_host()
irqchip/xilinx: Enable generic irq multi handler
irqchip/xilinx: Fill error code when irq domain registration fails
irqchip/xilinx: Add support for multiple instances
...
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Register default arch handler via driver instead of directly pointing to
xilinx intc controller. This patch makes architecture code more generic.
Driver calls generic domain specific irq handler which does the most of
things self. Also get rid of concurrent_irq counting which hasn't been
exported anywhere.
Based on this loop was also optimized by using do/while loop instead of
goto loop.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Link: https://lore.kernel.org/r/20200317125600.15913-4-mubin.usman.sayyed@xilinx.com
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/microblaze/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.719022171@linutronix.de
Rename the symbol to arch_dma_set_uncached, and pass a size to it as
well as allow an error return. That will allow reusing this hook for
in-place pagetable remapping.
As the in-place remap doesn't always require an explicit cache flush,
also detangle ARCH_HAS_DMA_PREP_COHERENT from ARCH_HAS_DMA_SET_UNCACHED.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
dma-direct now finds the kernel address for coherent allocations based
on the dma address, so the cached_kernel_address hooks is unused and
can be removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
request_irq() is preferred over setup_irq(). Invocations of setup_irq()
occur after memory allocators are ready.
Per tglx[1], setup_irq() existed in olden days when allocators were not
ready by the time early interrupts were initialized.
Hence replace setup_irq() by request_irq().
[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos
Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
For security, don't display the kernel's virtual memory layout.
Kees Cook points out:
"These have been entirely removed on other architectures, so let's
just do the same for ia32 and remove it unconditionally."
071929dbdd ("arm64: Stop printing the virtual memory layout")
1c31d4e96b ("ARM: 8820/1: mm: Stop printing the virtual memory layout")
31833332f7 ("m68k/mm: Stop printing the virtual memory layout")
fd8d0ca256 ("parisc: Hide virtual kernel memory layout")
adb1fe9ae2 ("mm/page_alloc: Remove kernel address exposure in free_reserved_area()")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Acked-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
The whole implementation is done in asm-generic/cmpxchg.h file and there is
no reason to duplicate it.
Also do not include asm-generic/cmpxchg-local.h because it is already
included from asm-generic/cmpxchg.h
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Adding SMP requires to have percpu section defined.
Signed-off-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
boot_cpuid is not used on uni processor system that's why can be removed.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Without this header local_save_flags() is not defined.
Signed-off-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Do the real sync by using mbar instruction.
Signed-off-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
There is no reason to keep empty headers if asm-generic one can be used
instead. That's why remove empty file and use asm-generic equivalent.
cputable.h is completely unused that's why remove it.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Early printk has been removed already that's why this setting doesn't make
any sense. Also change printk level from pr_info to pr_err.
Fixes: 96f0e6fcc9 ("microblaze: remove redundant early_printk support")
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
There is no reason to have this empty file if it is just link to
asm-generic/tlb.h.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Covert all headers to SPDX.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
There is a need to disable VM before jump to zero reset vector.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Kernel command line should be parsed before cma initialization to be able
to get cma sizes from command line. That's why call parse_early_param()
before dma_continugous_reserve().
Unfortunately it can't be called earlier in machine_early_init() because
if earlycon is passed in the command line the parse_early_param() attempts
an ioremap which fails as the memory params are not set yet.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Let's use the struct of_pci_range.flags field instead so we can remove
the pci_space field.
Just drop the debug prints as there's plenty of debug output in
drivers/of/address.c which can be enabled.
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Rob Herring <robh@kernel.org>
- Enable CMA
- Add support for MB v11
- Defconfig updates
- Minor fixes
-----BEGIN PGP SIGNATURE-----
iF0EABECAB0WIQQbPNTMvXmYlBPRwx7KSWXLKUoMIQUCXjlJ1gAKCRDKSWXLKUoM
IWy9AJ4tauV9sUb+zNadrYxI+2zemRstUwCfQ49LG4kHpFCv8ldSTmhBPJY/3MI=
=QpT4
-----END PGP SIGNATURE-----
Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze update from Michal Simek:
- enable CMA
- add support for MB v11
- defconfig updates
- minor fixes
* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add ID for Microblaze v11
microblaze: Prevent the overflow of the start
microblaze: Wire CMA allocator
asm-generic: Make dma-contiguous.h a mandatory include/asm header
microblaze: Sync defconfig with latest Kconfig layout
microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
microblaze: Align comments with register usage
In case the start + cache size is more than the max int the
start overflows.
Prevent the same.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Based on commit 04e3543e22 ("microblaze: use the generic dma coherent
remap allocator")
CMA can be easily enabled by calling dma_contiguous_reserve() at the end of
mmu_init(). High limit is end of lowmem space which is completely unused at
this point of time.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Layout was changed by commit 6210b6402f
("kernel-hacking: group sysrq/kgdb/ubsan into 'Generic Kernel Debugging Instruments'")
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
As EXT4 filesystem driver is used for handling EXT2 file systems as
well. There is no need to enable EXT2 driver. This patch disables EXT2
and enables EXT3/EXT4 drivers.
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
=uqTN
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
"Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
syscall.
This syscall allows for the retrieval of file descriptors of a process
based on its pidfd. A task needs to have ptrace_may_access()
permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
Andy) on the target.
One of the main use-cases is in combination with seccomp's user
notification feature. As a reminder, seccomp's user notification
feature was made available in v5.0. It allows a task to retrieve a
file descriptor for its seccomp filter. The file descriptor is usually
handed of to a more privileged supervising process. The supervisor can
then listen for syscall events caught by the seccomp filter of the
supervisee and perform actions in lieu of the supervisee, usually
emulating syscalls. pidfd_getfd() is needed to expand its uses.
There are currently two major users that wait on pidfd_getfd() and one
future user:
- Netflix, Sargun said, is working on a service mesh where users
should be able to connect to a dns-based VIP. When a user connects
to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
redirected to an envoy process. This service mesh uses seccomp user
notifications and pidfd to intercept all connect calls and instead
of connecting them to 1.2.3.4:80 connects them to e.g.
127.0.0.1:8080.
- LXD uses the seccomp notifier heavily to intercept and emulate
mknod() and mount() syscalls for unprivileged containers/processes.
With pidfd_getfd() more uses-cases e.g. bridging socket connections
will be possible.
- The patchset has also seen some interest from the browser corner.
Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
broker process. In the future glibc will start blocking all signals
during dlopen() rendering this type of sandbox impossible. Hence,
in the future Firefox will switch to a seccomp-user-nofication
based sandbox which also makes use of file descriptor retrieval.
The thread for this can be found at
https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html
With pidfd_getfd() it is e.g. possible to bridge socket connections
for the supervisee (binding to a privileged port) and taking actions
on file descriptors on behalf of the supervisee in general.
Sargun's first version was using an ioctl on pidfds but various people
pushed for it to be a proper syscall which he duely implemented as
well over various review cycles. Selftests are of course included.
I've also added instructions how to deal with merge conflicts below.
There's also a small fix coming from the kernel mentee project to
correctly annotate struct sighand_struct with __rcu to fix various
sparse warnings. We've received a few more such fixes and even though
they are mostly trivial I've decided to postpone them until after -rc1
since they came in rather late and I don't want to risk introducing
build warnings.
Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
needed to avoid allocation recursions triggerable by storage drivers
that have userspace parts that run in the IO path (e.g. dm-multipath,
iscsi, etc). These allocation recursions deadlock the device.
The new prctl() allows such privileged userspace components to avoid
allocation recursions by setting the PF_MEMALLOC_NOIO and
PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
relevant maintainers and is routed here as part of prctl()
thread-management."
* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
sched.h: Annotate sighand_struct with __rcu
test: Add test for pidfd getfd
arch: wire up pidfd_getfd syscall
pid: Implement pidfd_getfd syscall
vfs, fdtable: Add fget_task helper
Pull openat2 support from Al Viro:
"This is the openat2() series from Aleksa Sarai.
I'm afraid that the rest of namei stuff will have to wait - it got
zero review the last time I'd posted #work.namei, and there had been a
leak in the posted series I'd caught only last weekend. I was going to
repost it on Monday, but the window opened and the odds of getting any
review during that... Oh, well.
Anyway, openat2 part should be ready; that _did_ get sane amount of
review and public testing, so here it comes"
From Aleksa's description of the series:
"For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown
flags are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road
to being added to openat(2).
Furthermore, the need for some sort of control over VFS's path
resolution (to avoid malicious paths resulting in inadvertent
breakouts) has been a very long-standing desire of many userspace
applications.
This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
(which was a variant of David Drysdale's O_BENEATH patchset[4] which
was a spin-off of the Capsicum project[5]) with a few additions and
changes made based on the previous discussion within [6] as well as
others I felt were useful.
In line with the conclusions of the original discussion of
AT_NO_JUMPS, the flag has been split up into separate flags. However,
instead of being an openat(2) flag it is provided through a new
syscall openat2(2) which provides several other improvements to the
openat(2) interface (see the patch description for more details). The
following new LOOKUP_* flags are added:
LOOKUP_NO_XDEV:
Blocks all mountpoint crossings (upwards, downwards, or through
absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is
also blocked (though magic-link jumps on the same vfsmount are
permitted).
LOOKUP_NO_MAGICLINKS:
Blocks resolution through /proc/$pid/fd-style links. This is done
by blocking the usage of nd_jump_link() during resolution in a
filesystem. The term "magic-links" is used to match with the only
reference to these links in Documentation/, but I'm happy to change
the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
LOOKUP_BENEATH:
Disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed.
Conceptually this flag is to ensure you "stay below" a certain
point in the filesystem tree -- but this requires some additional
to protect against various races that would allow escape using
"..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done
as in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
LOOKUP_NO_SYMLINKS:
Does what it says on the tin. No symlink resolution is allowed at
all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
can still be used with NOFOLLOW to open an fd for the symlink as
long as no parent path had a symlink component.
LOOKUP_IN_ROOT:
This is an extension of LOOKUP_BENEATH that, rather than blocking
attempts to move past the root, forces all such movements to be
scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that
chroot(2) is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to
cross magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container.
There is a long list of CVEs that could have bene mitigated by
having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
few).
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution.
It features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Future work would include implementing things like
RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
programs to be sure they don't hit DoSes though stale NFS handles)"
* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Documentation: path-lookup: include new LOOKUP flags
selftests: add openat2(2) selftests
open: introduce openat2(2) syscall
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
namei: LOOKUP_NO_XDEV: block mountpoint crossing
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
namei: allow set_root() to produce errors
namei: allow nd_jump_link() to produce errors
nsfs: clean-up ns_get_path() signature to return int
namei: only return -ECHILD from follow_dotdot_rcu()
Here are the big set of tty and serial driver updates for 5.6-rc1
Included in here are:
- dummy_con cleanups (touches lots of arch code)
- sysrq logic cleanups (touches lots of serial drivers)
- samsung driver fixes (wasn't really being built)
- conmakeshash move to tty subdir out of scripts
- lots of small tty/serial driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXjFRBg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn2VACgkge7vTeUNeZFc+6F4NWphAQ5tCQAoK/MMbU6
0O8ef7PjFwCU4s227UTv
=6m40
-----END PGP SIGNATURE-----
Merge tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here are the big set of tty and serial driver updates for 5.6-rc1
Included in here are:
- dummy_con cleanups (touches lots of arch code)
- sysrq logic cleanups (touches lots of serial drivers)
- samsung driver fixes (wasn't really being built)
- conmakeshash move to tty subdir out of scripts
- lots of small tty/serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
tty: n_hdlc: Use flexible-array member and struct_size() helper
tty: baudrate: SPARC supports few more baud rates
tty: baudrate: Synchronise baud_table[] and baud_bits[]
tty: serial: meson_uart: Add support for kernel debugger
serial: imx: fix a race condition in receive path
serial: 8250_bcm2835aux: Document struct bcm2835aux_data
serial: 8250_bcm2835aux: Use generic remapping code
serial: 8250_bcm2835aux: Allocate uart_8250_port on stack
serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER
serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER
serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
serial_core: Remove unused member in uart_port
vt: Correct comment documenting do_take_over_console()
vt: Delete comment referencing non-existent unbind_con_driver()
arch/xtensa/setup: Drop dummy_con initialization
arch/x86/setup: Drop dummy_con initialization
arch/unicore32/setup: Drop dummy_con initialization
arch/sparc/setup: Drop dummy_con initialization
arch/sh/setup: Drop dummy_con initialization
arch/s390/setup: Drop dummy_con initialization
...
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Cleanup of the GOP [graphics output] handling code in the EFI stub
- Complete refactoring of the mixed mode handling in the x86 EFI stub
- Overhaul of the x86 EFI boot/runtime code
- Increase robustness for mixed mode code
- Add the ability to disable DMA at the root port level in the EFI
stub
- Get rid of RWX mappings in the EFI memory map and page tables,
where possible
- Move the support code for the old EFI memory mapping style into its
only user, the SGI UV1+ support code.
- plus misc fixes, updates, smaller cleanups.
... and due to interactions with the RWX changes, another round of PAT
cleanups make a guest appearance via the EFI tree - with no side
effects intended"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
efi/x86: Disable instrumentation in the EFI runtime handling code
efi/libstub/x86: Fix EFI server boot failure
efi/x86: Disallow efi=old_map in mixed mode
x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
efi: Fix handling of multiple efi_fake_mem= entries
efi: Fix efi_memmap_alloc() leaks
efi: Add tracking for dynamically allocated memmaps
efi: Add a flags parameter to efi_memory_map
efi: Fix comment for efi_mem_type() wrt absent physical addresses
efi/arm: Defer probe of PCIe backed efifb on DT systems
efi/x86: Limit EFI old memory map to SGI UV machines
efi/x86: Avoid RWX mappings for all of DRAM
efi/x86: Don't map the entire kernel text RW for mixed mode
x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
efi/libstub/x86: Fix unused-variable warning
efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
efi/libstub/x86: Use const attribute for efi_is_64bit()
efi: Allow disabling PCI busmastering on bridges during boot
efi/x86: Allow translating 64-bit arguments for mixed mode calls
...
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset.
Drop it from arch setup code.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20191218214506.49252-12-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>