Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some
configurations).
Revert a recent change to our IOMMU code which broke some devices.
Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu.
Fix kprobes validation of prefixed instructions across page boundary.
Thanks to: Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen
N. Rao, Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmC8wi8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN42D/4vHCHX4T0CZ/5bwh1RMOoGKM+PFyLe
BoA2i8lvUILG1+LOiRJuBnVZiWwKYBqfkkfY4BmQpU3Oe3gjbJJwc9QGGHUDarWn
NmMPqVgaO5qXObObKXzBU1Ihq4UQwMhK044srzXcgMYyTnSFNgWQAsvO0+0Cl4K4
uT100AFV4tps8dLCHCq2XVHuQALnHzZah4yQ8i6u1TMN/TK+kXyONrMSCgsQ1mrM
dDsT1zVeegj8EuW/n9kXkLNp2YZeatptZB7cPDtojlhCQTsZBcKnYtDq5ScASuwy
7hGjzA2SyWsa6l0Iejoj8tr/ZS8Nutftz3izuhDNLEf4foz0tOWqxbXJayOA5J7w
vzs9OSFbT6z/svELSIkRCvfePqUdDdC2MthWoShgv0SoIXj+Y7ABKQRW9B5rLeF5
RiB2kCB+7S/03qjDtn57IlJC6aVoHzglTAdYXuj7guUEsZQrmtsdm1IM4eB0XYyx
A9/AMCGSbswT0/IUriO4b9FtWGOJJf1vWv3WeqE63gPxqhyTz1ACqMT/0HLrARJZ
/QLZrbuOSMBSGDnmJxy3vzb+3fxGxSGrUcoYc6MiSODuRgf7zHuRJsSDwoftnOTW
PXVWPVz9ef0OEmuBJyEgTrO+/g9jjCPw8UJz9EaFzkMHbaoHRuZdo2m8X6zrXQLh
AUVlDkkSmblY9w==
=KkfQ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix our KVM reverse map real-mode handling since we enabled huge
vmalloc (in some configurations).
Revert a recent change to our IOMMU code which broke some devices.
Fix KVM handling of FSCR on P7/P8, which could have possibly let a
guest crash it's Qemu.
Fix kprobes validation of prefixed instructions across page boundary.
Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
* tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
powerpc: Fix reverse map real-mode address lookup with huge vmalloc
powerpc/kprobes: Fix validation of prefixed instructions across page boundary
This reverts commit 3c0468d445.
That commit was breaking alignment guarantees for the DMA address when
allocating coherent mappings, as described in
Documentation/core-api/dma-api-howto.rst
It was also noticed by Mellanox' driver:
[ 1515.763621] mlx5_core c002:01:00.0: mlx5_frag_buf_alloc_node:146:(pid 13402): unexpected map alignment: 0x0800000000c61000, page_shift=16
[ 1515.763635] mlx5_core c002:01:00.0: mlx5_cqwq_create:181:(pid
13402): mlx5_frag_buf_alloc_node() failed, -12
Fixes: 3c0468d445 ("powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526144540.117795-1-fbarrat@linux.ibm.com
Pull i2c fixes from Wolfram Sang:
"This is a bit larger than usual at rc4 time. The reason is due to
Lee's work of fixing newly reported build warnings.
The rest is fixes as usual"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (22 commits)
MAINTAINERS: adjust to removing i2c designware platform data
i2c: s3c2410: fix possible NULL pointer deref on read message after write
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
i2c: i801: Don't generate an interrupt on bus reset
i2c: mpc: implement erratum A-004447 workaround
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
dt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag
i2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name
i2c: busses: i2c-st: Fix copy/paste function misnaming issues
i2c: busses: i2c-pnx: Provide descriptions for 'alg_data' data structure
i2c: busses: i2c-ocores: Place the expected function names into the documentation headers
i2c: busses: i2c-eg20t: Fix 'bad line' issue and provide description for 'msgs' param
i2c: busses: i2c-designware-master: Fix misnaming of 'i2c_dw_init_master()'
i2c: busses: i2c-cadence: Fix incorrectly documented 'enum cdns_i2c_slave_mode'
i2c: busses: i2c-ali1563: File headers are not good candidates for kernel-doc
i2c: muxes: i2c-arb-gpio-challenge: Demote non-conformant kernel-doc headers
i2c: busses: i2c-nomadik: Fix formatting issue pertaining to 'timeout'
i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
i2c: I2C_HISI should depend on ACPI
...
* Another state update on exit to userspace fix
* Prevent the creation of mixed 32/64 VMs
* Fix regression with irqbypass not restarting the guest on failed connect
* Fix regression with debug register decoding resulting in overlapping access
* Commit exception state on exit to usrspace
* Fix the MMU notifier return values
* Add missing 'static' qualifiers in the new host stage-2 code
x86 fixes:
* fix guest missed wakeup with assigned devices
* fix WARN reported by syzkaller
* do not use BIT() in UAPI headers
* make the kvm_amd.avic parameter bool
PPC fixes:
* make halt polling heuristics consistent with other architectures
selftests:
* various fixes
* new performance selftest memslot_perf_test
* test UFFD minor faults in demand_paging_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW
hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru
zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3
RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5
3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn
5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q==
=ollO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM fixes:
- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs
- Fix regression with irqbypass not restarting the guest on failed
connect
- Fix regression with debug register decoding resulting in
overlapping access
- Commit exception state on exit to usrspace
- Fix the MMU notifier return values
- Add missing 'static' qualifiers in the new host stage-2 code
x86 fixes:
- fix guest missed wakeup with assigned devices
- fix WARN reported by syzkaller
- do not use BIT() in UAPI headers
- make the kvm_amd.avic parameter bool
PPC fixes:
- make halt polling heuristics consistent with other architectures
selftests:
- various fixes
- new performance selftest memslot_perf_test
- test UFFD minor faults in demand_paging_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
selftests: kvm: fix overlapping addresses in memslot_perf_test
KVM: X86: Kill off ctxt->ud
KVM: X86: Fix warning caused by stale emulation context
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
KVM: x86/mmu: Fix comment mentioning skip_4k
KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
KVM: x86: add start_assignment hook to kvm_x86_ops
KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
selftests: kvm: do only 1 memslot_perf_test run by default
KVM: X86: Use _BITUL() macro in UAPI headers
KVM: selftests: add shared hugetlbfs backing source type
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: add shmem backing source type
KVM: selftests: refactor vm_mem_backing_src_type flags
KVM: selftests: allow different backing source types
KVM: selftests: compute correct demand paging size
KVM: selftests: simplify setup_demand_paging error handling
KVM: selftests: Print a message if /dev/kvm is missing
...
Similar to commit 25edcc50d7 ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch actually applies there to the
old path well: a context switch can be made before kvmppc_vcpu_run_hv
restores the host FSCR and returns.
Now both the p9 and the p7/8 paths now save and restore their FSCR, it
no longer needs to be restored at the end of kvmppc_vcpu_run_hv
Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
real_vmalloc_addr() does not currently work for huge vmalloc, which is
what the reverse map can be allocated with for radix host, hash guest.
Extract the hugepage aware equivalent from eeh code into a helper, and
convert existing sites including this one to use it.
Fixes: 8abddd968a ("powerpc/64s/radix: Enable huge vmalloc mappings")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page.
Fix this by skipping the check for all instructions at the beginning of
a page. Prefixed instructions cannot cross a 64-byte boundary and as
such, we don't expect to encounter a suffix as the very first word in a
page for kernel text. Even if there are prefixed instructions crossing
a page boundary (from a module, for instance), the instruction will be
illegal, so preventing probing on the suffix of such prefix instructions
isn't worthwhile.
Fixes: b4657f7650 ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
The i2c controllers on the P1010 have an erratum where the documented
scheme for i2c bus recovery will not work (A-004447). A different
mechanism is needed which is documented in the P1010 Chip Errata Rev L.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The i2c controllers on the P2040/P2041 have an erratum where the
documented scheme for i2c bus recovery will not work (A-004447). A
different mechanism is needed which is documented in the P2040 Chip
Errata Rev Q (latest available at the time of writing).
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
KVM_REQ_UNBLOCK will be used to exit a vcpu from
its inner vcpu halt emulation loop.
Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch
PowerPC to arch specific request bit.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20210525134321.303768132@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is inspired by commit 262de4102c (kvm: exit halt polling on
need_resched() as well). Due to PPC implements an arch specific halt
polling logic, we have to the need_resched() check there as well. This
patch adds a helper function that can be shared between book3s and generic
halt-polling loops.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Venkatesh Srinivas <venkateshs@chromium.org>
Cc: Jim Mattson <jmattson@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1621339235-11131-1-git-send-email-wanpengli@tencent.com>
[Make the function inline. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix breakage of strace (and other ptracers etc.) when using the new scv ABI (Power9 or
later with glibc >= 2.33).
Fix early_ioremap() on 64-bit, which broke booting on some machines.
Thanks to: Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, Christophe Leroy.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmCqKaoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgER4D/9Nqbw1u16uoBrIyHaI4Q6UasXIcktc
ghFs0tOKNawNUyJUcl8/utH8ilpUTOnZPLeYWX9wP/KZFzHhEoWTmUZI5wcX+hkO
V0ZabIsJ9+mKZXffSqBliehRQpqQAS5vlpJOWN0WFUx2Jaqv+QAfGLuPMAvvpqx1
5yis2wVyC0ooo03TiaD2SjK2axzDa3Z+QOwcbAFYrb9/c2THU5J4y3+JeicHIZqi
pySwBE5INa25zjqgDxw6ONMNpdflQvB4i06rnGlkTnUbqtUW4oGVyE3cLTwkcL+j
zz6jN27jP0am6pM3+1JTIJcvyUETheMYmL5MPa7yzQqngD4egdNMl62p0WYLIgYo
LRvPpkF0mfgt9RdIbvCo5+dhni0FcCdqTJcCfmUG6ndQ9vCYFCtCvnRrl/9iqqLJ
B38Kjaad2T7oFmLBRKOHYVf5p77g1i37xiMcHu0m2Emrbi5ftenLnlOQ9Xk/xW/v
cp7e0o/D3PJjqy9EsZ+o0DiZq1AZe0dg8nKCVIXXF6UaLNb2copP0ylplBF7aefs
PW3Fkbq4zjRxE5UYBaz9BZmijtxH9IKywkaCS1/K+EgGjfhIP+XsmH0+qdd1JDqW
M47B8Bl8ucdOA9eD48GeOY9KBSbvR5sK83NibGAEMRfyNSDZPE7Z3OzI9goeWfCG
R6LDOridKGOuNQ==
=qeQq
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix breakage of strace (and other ptracers etc.) when using the new
scv ABI (Power9 or later with glibc >= 2.33).
- Fix early_ioremap() on 64-bit, which broke booting on some machines.
Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and
Christophe Leroy.
* tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc: Fix early setup to make early_ioremap() work
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCmN9AACgkQnJ2qBz9k
QNn5ZwgAwnLdgBuILDqJwPaYpXOzvMhjjG8AwBDzhMYhhpt+OOCUevoRm7mDU7J2
t/DlwWGMhpp80ku+x+AURR/ltOfFvw4QAHeIXPWjkoieFKcLOEvAjWWZP6oIFC12
5e/QVXqK58fuRJwveYp4jZ+AXvDMoHJrDXsoTFezjBDIQQgzlIlrMzPavS/6UzUN
mAF2sapE9lcQoRMfU8kktBWPVM/GpFkus2Q48EYFCZ1rp3aRyw/aahTVuvSUZCV0
XiY6f2F7qgFLtomK6UurlxTc7rPsrG+UmNvGWuXf3R81UawegmKQeG5zcaMGrZs1
kHyJQcP9nGYPLDXt/4kW9cY0s8oOKg==
=RbOE
-----END PGP SIGNATURE-----
Merge tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota fixes from Jan Kara:
"The most important part in the pull is disablement of the new syscall
quotactl_path() which was added in rc1.
The reason is some people at LWN discussion pointed out dirfd would be
useful for this path based syscall and Christian Brauner agreed.
Without dirfd it may be indeed problematic for containers. So let's
just disable the syscall for now when it doesn't have users yet so
that we have more time to mull over how to best specify the filesystem
we want to work on"
* tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Disable quotactl_path syscall
quota: Use 'hlist_for_each_entry' to simplify code
The scv implementation missed updating syscall return value and error
value get/set functions to deal with the changed register ABI. This
broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing
and tracing functions.
Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when
scv is used.
Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: "Dmitry V. Levin" <ldv@altlinux.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com
The immediate problem is that after commit
0bd3f9e953 ("powerpc/legacy_serial: Use early_ioremap()") the kernel
silently reboots on some systems.
The reason is that early_ioremap() returns broken addresses as it uses
slot_virt[] array which initialized with offsets from FIXADDR_TOP ==
IOREMAP_END+FIXADDR_SIZE == KERN_IO_END - FIXADDR_SIZ + FIXADDR_SIZE ==
__kernel_io_end which is 0 when early_ioremap_setup() is called.
__kernel_io_end is initialized little bit later in early_init_mmu().
This fixes the initialization by swapping early_ioremap_setup() and
early_init_mmu().
Fixes: 265c3491c4 ("powerpc: Add support for GENERIC_EARLY_IOREMAP")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop unrelated cleanup & cleanup change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210520032919.358935-1-aik@ozlabs.ru
In commit fa8b90070a ("quota: wire up quotactl_path") we have wired up
new quotactl_path syscall. However some people in LWN discussion have
objected that the path based syscall is missing dirfd and flags argument
which is mostly standard for contemporary path based syscalls. Indeed
they have a point and after a discussion with Christian Brauner and
Sascha Hauer I've decided to disable the syscall for now and update its
API. Since there is no userspace currently using that syscall and it
hasn't been released in any major release, we should be fine.
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
Signed-off-by: Jan Kara <jack@suse.cz>
Some interrupt handlers have an "extra" that saves 1 or 2
registers (r14, r15) in the paca save area and makes them available to
use by the handler.
The change to always save nvgprs in exception handlers lead to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.
Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.
Fixes: 4228b2c3d2 ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.
See the previous commit for more details.
Fix it by changing the order in which we patch the instructions.
Note the stf barrier fallback is only used on Power6 or earlier.
Fixes: bd573a8131 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.
With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.
This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.
On machines with updated firmware the entry flush is a series of special
nops, and it's safe to to execute in a semi-patched state.
However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so leads to us corrputing LR, leading
to an oops, for example:
# echo 0 > /sys/kernel/debug/powerpc/entry_flush
kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002971000
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40
REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce)
MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000
...
NIP 0xc000000002971000
LR 0xc000000002971000
Call Trace:
do_patch_instruction+0xc4/0x340 (unreliable)
do_entry_flush_fixups+0x100/0x3b0
entry_flush_set+0x50/0xe0
simple_attr_write+0x160/0x1a0
full_proxy_write+0x8c/0x110
vfs_write+0xf0/0x340
ksys_write+0x84/0x140
system_call_exception+0x164/0x2d0
system_call_common+0xec/0x278
The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.
Fixes: bd573a8131 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
The entry flush mitigation can be enabled/disabled at runtime via a
debugfs file (entry_flush), which causes the kernel to patch itself to
enable/disable the relevant mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20
Shows that we returned to userspace with a corrupted LR that points into
the kernel, due to executing the partially patched call to the fallback
entry flush (ie. we missed the LR restore).
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: f79643787e ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c
Shows that we returned to userspace without restoring the user r13
value, due to executing the partially patched STF exit code.
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: a048a07d7f ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au
Commit 32b48bf851 ("KVM: PPC: Book3S HV: Fix conversion to gfn-based
MMU notifier callbacks") fixed kvm_unmap_gfn_range_hv() by adding a for
loop over each gfn in the range.
But for the Hash MMU it repeatedly calls kvm_unmap_rmapp() with the
first gfn of the range, rather than iterating through the range.
This exhibits as strange guest behaviour, sometimes crashing in firmare,
or booting and then guest userspace crashing unexpectedly.
Fix it by passing the iterator, gfn, to kvm_unmap_rmapp().
Fixes: 32b48bf851 ("KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210511105459.800788-1-mpe@ellerman.id.au
UBSAN complains when a pointer is calculated with invalid
'legacy_serial_console' index, allthough the index is verified
before dereferencing the pointer.
Fix it by checking 'legacy_serial_console' validity before
calculating pointers.
Fixes: 0bd3f9e953 ("powerpc/legacy_serial: Use early_ioremap()")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210511010712.750096-1-mpe@ellerman.id.au
When neither CONFIG_VSX nor CONFIG_PPC_FPU_REGS are selected,
unsafe_copy_fpr_to_user() and unsafe_copy_fpr_from_user() are
doing nothing.
Then, unless the 'label' operand is used elsewhere, GCC complains
about it being defined but not used.
To fix that, add an impossible 'goto label'.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cadc0a328bc8e6c5bf133193e7547d5c10ae7895.1620465920.git.christophe.leroy@csgroup.eu
Building kernel mainline with GCC 11 leads to following failure
when starting 'init':
init[1]: bad frame in sys_sigreturn: 7ff5a900 nip 001083cc lr 001083c4
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
This is an issue due to a segfault happening in
__unsafe_restore_general_regs() in a loop copying registers from user
to kernel:
10: 7d 09 03 a6 mtctr r8
14: 80 ca 00 00 lwz r6,0(r10)
18: 80 ea 00 04 lwz r7,4(r10)
1c: 90 c9 00 08 stw r6,8(r9)
20: 90 e9 00 0c stw r7,12(r9)
24: 39 0a 00 08 addi r8,r10,8
28: 39 29 00 08 addi r9,r9,8
2c: 81 4a 00 08 lwz r10,8(r10) <== r10 is clobbered here
30: 81 6a 00 0c lwz r11,12(r10)
34: 91 49 00 08 stw r10,8(r9)
38: 91 69 00 0c stw r11,12(r9)
3c: 39 48 00 08 addi r10,r8,8
40: 39 29 00 08 addi r9,r9,8
44: 42 00 ff d0 bdnz 14 <__unsafe_restore_general_regs+0x14>
As shown above, this is due to r10 being re-used by GCC. This didn't
happen with CLANG.
This is fixed by tagging 'x' output as an earlyclobber operand in
__get_user_asm2_goto().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cf0a050d124d4f426cdc7a74009d17b01d8d8969.1620465917.git.christophe.leroy@csgroup.eu
The hcall tracing code has a recursion check built in, which skips
tracing if we are already tracing an hcall.
However if the tracing code has problems with recursion, this check
may not catch all cases because the tracing code could be invoked from
a different tracepoint first, then make an hcall that gets traced,
then recurse.
Add an explicit warning if recursion is detected here, which might help
to notice tracing code making hcalls. Really the core trace code should
have its own recursion checking and warnings though.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-5-npiggin@gmail.com
Rather than special-case H_CEDE in the hcall trace wrappers, make the
idle H_CEDE call use plpar_hcall_norets_notrace().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-4-npiggin@gmail.com
This doesn't seem very useful to trace before the recursion check, even
if the ftrace code has any recursion checks of its own. Be on the safe
side and don't trace the hcall trace wrappers.
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-3-npiggin@gmail.com
The paravit queued spinlock slow path adds itself to the queue then
calls pv_wait to wait for the lock to become free. This is implemented
by calling H_CONFER to donate cycles.
When hcall tracing is enabled, this H_CONFER call can lead to a spin
lock being taken in the tracing code, which will result in the lock to
be taken again, which will also go to the slow path because it queues
behind itself and so won't ever make progress.
An example trace of a deadlock:
__pv_queued_spin_lock_slowpath
trace_clock_global
ring_buffer_lock_reserve
trace_event_buffer_lock_reserve
trace_event_buffer_reserve
trace_event_raw_event_hcall_exit
__trace_hcall_exit
plpar_hcall_norets_trace
__pv_queued_spin_lock_slowpath
trace_clock_global
ring_buffer_lock_reserve
trace_event_buffer_lock_reserve
trace_event_buffer_reserve
trace_event_raw_event_rcu_dyntick
rcu_irq_exit
irq_exit
__do_irq
call_do_irq
do_IRQ
hardware_interrupt_common_virt
Fix this by introducing plpar_hcall_norets_notrace(), and using that to
make SPLPAR virtual processor dispatching hcalls by the paravirt
spinlock code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-2-npiggin@gmail.com
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config is
really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmCWrucVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRLkQAJ8t7PfMJLSh/VcgDXp3Z7fZ/V2M
RUGbOeRYErR1gylejuip/R19mS5MiBNecU60VrugZyDOMf98+mx61mI/ykpPeX92
sE3VU5MPXEwmv758QUr4gH014TZshMtHHo+tXA+NVUbqFp7RTnkZMDjOXGthYDHG
NhDou4LZ2P0CUKm8vb58SJPqB7ZdYOT9eEQEdHevm18Gx0KProCxRziup7loldy7
ET770okQ23if90ufCSVmnM6Ee6opoKYvXS5lv8V/a4xV/VbicbUclpzIZsHF7L2i
mIfr6dy480ncOaQlfWnX9ACgIeeqiFPOeZbAu7HAtwXzP5vCahgQ9FKVC7KPt+BP
Lf3LgdBrfSP5A7f7FrtkkPmP7pl1j6/Bq3+PhCur9XimtRIsvTOx7m7nuvsY4yHC
/wmBXFZgqE5DGyzpHXz1az8JHWw2AesP9L2f536BhfvRtdXaoOxPtZ/rmO1lfcMV
fWMa9f1em8lXwCiD1dR8UkBrIxItty+qqPffu2S/DlEepbiZrCg1gD827Fy7Mm3n
5rvrzYMOY2YK0yW1jtm+w3NlPlmG91BDUTP8tEcDxrTOIXezwqJf7fw8qIgGIy7W
3WzuBfgSvpT977ByMsB0YYugo2Xie+R1jpOWt7tv6KHM4varNBu0WpVhQhrKQr5o
agJiuvzsf3b+64oP
=935P
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config
is really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang
as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
* tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
kbuild: Don't remove link-vmlinux temporary files on exit/signal
kbuild: remove the unneeded comments for external module builds
kbuild: make distclean remove tag files in sub-directories
kbuild: make distclean work against $(objtree) instead of $(srctree)
kbuild: refactor modname-multi by using suffix-search
kbuild: refactor fdtoverlay rule
kbuild: parameterize the .o part of suffix-search
arch: use cross_compiling to check whether it is a cross build or not
kbuild: remove ARCH=sh64 support from top Makefile
.gitignore: prefix local generated files with a slash
kbuild: replace LANG=C with LC_ALL=C
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
kbuild: add a script to remove stale generated files
kbuild: update config_data.gz only when the content of .config is changed
.gitignore: ignore only top-level modules.builtin
.gitignore: move tags and TAGS close to other tag files
kernel/.gitgnore: remove stale timeconst.h and hz.bc
usr/include: refactor .gitignore
genksyms: fix stale comment
...
Merge master back into next, this allows us to resolve some conflicts in
arch/powerpc/Kconfig, and also re-sort the symbols under config PPC so
that they are in alphabetical order again.
Merge more updates from Andrew Morton:
"The remainder of the main mm/ queue.
143 patches.
Subsystems affected by this patch series (all mm): pagecache, hugetlb,
userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
kfence"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
kfence: use power-efficient work queue to run delayed work
kfence: maximize allocation wait timeout duration
kfence: await for allocation using wait_event
kfence: zero guard page after out-of-bounds access
mm/process_vm_access.c: remove duplicate include
mm/mempool: minor coding style tweaks
mm/highmem.c: fix coding style issue
btrfs: use memzero_page() instead of open coded kmap pattern
iov_iter: lift memzero_page() to highmem.h
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
mm/zswap.c: switch from strlcpy to strscpy
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
mm,memory_hotplug: allocate memmap from the added memory range
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
mm,memory_hotplug: relax fully spanned sections check
drivers/base/memory: introduce memory_block_{online,offline}
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
...
SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms
that subscribe it. Instead, just make it a generic option which can be
selected on applicable platforms.
Also rename it as ARCH_SUPPORTS_HUGETLBFS instead. This reduces code
duplication and makes it cleaner.
Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HUGETLB_PAGE_SIZE_VARIABLE need not be defined for each individual
platform subscribing it. Instead just make it generic.
Link: https://lkml.kernel.org/r/1614914928-22039-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
This series tries to disable huge pmd unshare of hugetlbfs backed memory
for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage,
the idea of this series may be needed for multiple tasks (Axel's uffd
minor fault series, and Mike's soft dirty series), so I picked it out
from the larger series.
This patch (of 4):
It is a preparation work to be able to behave differently in the per
architecture huge_pte_alloc() according to different VMA attributes.
Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call.
[peterx@redhat.com: build fix]
Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit b1c5356e87 ("KVM: PPC: Convert to the gfn-based MMU notifier
callbacks") causes unmap_gfn_range and age_gfn callbacks to only work
on the first gfn in the range. It also makes the aging callbacks call
into both radix and hash aging functions for radix guests. Fix this.
Add warnings for the single-gfn calls that have been converted to range
callbacks, in case they ever receieve ranges greater than 1.
Fixes: b1c5356e87 ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210505121509.1470207-1-npiggin@gmail.com
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"Christoph Hellwig has taken a cleaver and trimmed off the not-needed
code and nicely folded duplicate code in the generic framework.
This lays the groundwork for more work to add extra DMA-backend-ish in
the future. Along with that some bug-fixes to make this a nice working
package"
* 'stable/for-linus-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: don't override user specified size in swiotlb_adjust_size
swiotlb: Fix the type of index
swiotlb: Make SWIOTLB_NO_FORCE perform no allocation
ARM: Qualify enabling of swiotlb_init()
swiotlb: remove swiotlb_nr_tbl
swiotlb: dynamically allocate io_tlb_default_mem
swiotlb: move global variables into a new io_tlb_mem structure
xen-swiotlb: remove the unused size argument from xen_swiotlb_fixup
xen-swiotlb: split xen_swiotlb_init
swiotlb: lift the double initialization protection from xen-swiotlb
xen-swiotlb: remove xen_io_tlb_start and xen_io_tlb_nslabs
xen-swiotlb: remove xen_set_nslabs
xen-swiotlb: use io_tlb_end in xen_swiotlb_dma_supported
xen-swiotlb: use is_swiotlb_buffer in is_xen_swiotlb_buffer
swiotlb: split swiotlb_tbl_sync_single
swiotlb: move orig addr and size validation into swiotlb_bounce
swiotlb: remove the alloc_size parameter to swiotlb_tbl_unmap_single
powerpc/svm: stop using io_tlb_start
Commit a7d2475af7 ("powerpc: Sort the selects under CONFIG_PPC")
sorted all selects under CONFIG_PPC.
4 years later, several items have been introduced at wrong place,
a few other have been renamed without moving them to their correct
place.
Reorder them now.
While we are at it, simplify the test for a couple of them:
- PPC_64 && PPC_PSERIES is simplified in PPC_PSERIES
- PPC_64 && PPC_BOOK3S is simplified in PPC_BOOK3S_64
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/361ee3fc5009c709ae0ca592249bb0702c6ef073.1619024780.git.christophe.leroy@csgroup.eu
Commit 7c95d8893f ("powerpc: Change calling convention for
create_branch() et. al.") complexified the frame of function
do_feature_fixups(), leading to GCC setting up a stack
guard when CONFIG_STACKPROTECTOR is selected.
The problem is that do_feature_fixups() is called very early
while 'current' in r2 is not set up yet and the code is still
not at the final address used at link time.
So, like other instrumentation, stack protection needs to be
deactivated for feature-fixups.c and code-patching.c
Fixes: 7c95d8893f ("powerpc: Change calling convention for create_branch() et. al.")
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Jonathan Neuschaefer <j.neuschaefer@gmx.net>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Jonathan Neuschaefer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b688fe82927b330349d9e44553363fa451ea4d95.1619715114.git.christophe.leroy@csgroup.eu
Trace memory is cleared and the corresponding dcache lines
are flushed after allocation. However, this should not be
done using the PFN. This adds the missing conversion to
virtual address.
Fixes: 2ac02e5ece ("powerpc/mm: Remove dcache flush from memory remove.")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210501160254.1179831-1-sandipan@linux.ibm.com
kexec_file_load() uses initial_boot_params in setting up the device tree
for the kernel to be loaded. Though initial_boot_params holds info about
CPUs at the time of boot, it doesn't account for hot added CPUs.
So, kexec'ing with kexec_file_load() syscall leaves the kexec'ed kernel
with inaccurate CPU info.
If kdump kernel is loaded with kexec_file_load() syscall and the system
crashes on a hot added CPU, the capture kernel hangs failing to identify
the boot CPU, with no output.
To avoid this from happening, extract current CPU info from of_root
device node and use it for setting up the fdt in kexec_file_load case.
Fixes: 6ecd0163d3 ("powerpc/kexec_file: Add appropriate regions for memory reserve map")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210429060256.199714-1-sourabhjain@linux.ibm.com
This reduces TLB misses by nearly 30x on a `git diff` workload on a
2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
to vfs hashes being allocated with 2MB pages.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503091755.613393-1-npiggin@gmail.com