Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com
Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.
Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-13-qperret@google.com
__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.
Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-12-qperret@google.com
By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.
Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-11-qperret@google.com
In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and bit more difficult
to take the wrong lock (or even take them in the wrong order).
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-10-qperret@google.com
Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.
This state will be used for permission checking during page transitions
in later patches.
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-9-qperret@google.com
In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times on the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com
The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-6-qperret@google.com
kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.
Wire up this callback for the hypervisor stage-1 page-table.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com
In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com
To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com
In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmmemap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.
In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com
* kvm-arm64/fpsimd-tracking:
: .
: Simplify the handling of both the FP/SIMD and SVE state by
: removing the need for mapping the thread at EL2, and by
: dropping the tracking of the host's SVE state which is
: always invalid by construction.
: .
arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM
KVM: arm64: Stop mapping current thread_info at EL2
KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE
KVM: arm64: Remove unused __sve_save_state
KVM: arm64: Get rid of host SVE tracking/saving
KVM: arm64: Reorder vcpu flag definitions
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/vcpu-first-run:
: Rework the "vcpu first run" sequence to be driven by KVM's
: "PID change" callback, removing the need for extra state.
KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid
KVM: arm64: Merge kvm_arch_vcpu_run_pid_change() and kvm_vcpu_first_run_init()
KVM: arm64: Restructure the point where has_run_once is advertised
KVM: arm64: Move kvm_arch_vcpu_run_pid_change() out of line
KVM: arm64: Move SVE state mapping at HYP to finalize-time
Signed-off-by: Marc Zyngier <maz@kernel.org>
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.
Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID
change. The kvm_vcpu_first_run_init() helper gets run on the...
first run(!) of a vcpu.
As it turns out, the first run of a vcpu also triggers a PID change
event (vcpu->pid is initially NULL).
Use this property to merge these two helpers and get rid of another
arm64-specific oddity.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Restructure kvm_vcpu_first_run_init() to set the has_run_once
flag after having completed all the "run once" activities.
This includes moving the flip of the userspace irqchip static key
to a point where nothing can fail.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything
to the table. Move it next to kvm_vcpu_first_run_init(), which will
be convenient for what is next to come.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
We currently map the SVE state to HYP on detection of a PID change.
Although this matches what we do for FPSIMD, this is pretty pointless
for SVE, as the buffer is per-vcpu and has nothing to do with the
thread that is being run.
Move the mapping of the SVE state to finalize-time, which is where
we allocate the state memory, and thus the most logical place to
do this.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken does
not seem to work, breaking too many guests in the process. We'll need to
do length validation using some other approach.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGe0sEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp8WEH/imDIq1iduDeAuvFnmrm5eEO9w3wzXCT4NiG
8Pla241FzQ1pEFEAne16KP0+SlLhj7P0oc5FR8vkYvxxuyneDbCzcS2M1kYMOpA1
ry28PuObAnekzE/WXxvC031ozB5Zb/FL54gmw+/1EdAOdMGL0CdQ1aJxREBHRTBo
p4ZHr83GA2D2C/IyKCsgQ8cB9ZrMqImTQQ4vRD89HoFBp+GH2u2Di1iyXEWuOqdI
n1+7M9jjbyW8A+N1bkOicpShS/6UcyJQOOcg8kvUQOV6srVkYhfaiWC/CbOP2g73
8PKK+/K2Htf92s6RdvDUPSKmvqGR/4KPZWPtWThXBYXGgWul0uI=
=q6tO
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
"Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken
does not seem to work, breaking too many guests in the process. We'll
need to do length validation using some other approach"
[ This merge also ends up reverting commit f7a36b03a7 ("vsock/virtio:
suppress used length validation"), which came in through the
networking tree in the meantime, and was part of that whole used
length validation series - Linus ]
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim: avoid putting an uninitialized iova_domain
vhost-vdpa: clean irqs before reseting vdpa device
virtio-blk: modify the value type of num in virtio_queue_rq()
vhost/vsock: cleanup removing `len` variable
vhost/vsock: fix incorrect used length reported to the guest
Revert "virtio_ring: validate used buffer length"
Revert "virtio-net: don't let virtio core to validate used length"
Revert "virtio-blk: don't let virtio core to validate used length"
Revert "virtio-scsi: don't let virtio core to validate used buffer length"
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGjr4UTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXEXEAC79s58FkEV5LsKnmNe5pr/XG47Fgbz
MMTX+P4IUwUrvRHPzEKPdO4lR8ZSFefhSHcPA06cWYyNyeh/UHq/sB4JGysuH7bw
JAIJhJ3PsLv19TWvIKN3WHB7R3gwqKYzzzsKjKJHfepHv7kzidYz1fu380/bA2Zw
LDjlMaQFSIA4rk7sBB/FFC3wsCvmU69iwG82x+QXX3ZWk0eB8bmw4Vsz6tsjo1D2
EtEyxmjh2HjI+AbTORUsqucWvoKq3PvA9CX4SJdT/u+/ZX1zYR6iwsBkB+oiDNFx
mxN15QKUA0t5NmF3dtoftj0lLM7iT7rd9PAdyPbWDm1o02z7Q780tsv7S8A2njMp
ACwvCkupE9LSmEpFH9YyQeCBtyppXd0laN0mcL3h1MmSQbZs3KrtqyQDJ6McjPwL
TiDFa8PKCBejGms7P6gPBv0DCsNNBQ0alAioi+zVsn8WbnMKGBpb0nplAdn1Cxt0
GuJR63oEpX8Jtejb/018cEdDBiEmGh+7HXVXy4giAKnn92Pd7AhpWXwUYGuoD7j3
1nRI9TOEoWS9ZcK7goaVp9D42YCEPsn66RPGyg25EaTlWJ0g9tmpbXhGzFPYg8FS
tGIBw7CwlGMIKng+MWwuGu9hJYk0Tc1NcvrkxYTSVFNDYo4AQDpqwLlGuJsO893c
8tFApEuQ+w5zkg==
=QG5U
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build fix from Thomas Gleixner:
"A single fix for a missing __init annotation of prepare_command_line()"
* tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Mark prepare_command_line() __init
left on the idle task's stack when a CPU is brought up after it was brought
down before.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGjr0UTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoadaD/0Q3hMjI+N3AigZiBToGccafOfsmiMH
fJ6fUM7gh4pTrGuoDQSGt02zYNYx9Zx7X8PpiuWAAIKbppiKmvniCgPMgMGARUBn
UQ/W2XWUiu/wtleRf4JtE6VwHciNVgLdnWIazRWsjDryUXVcJwhn8J1o5K6LnwjD
Rof/aYuVR47DprYG03OI0FD1GwlSPWMbAgB6OlJS6ZRvpq+7ergVKA0PQAY7ZZko
vBlDU7Sq4dJ2CE4aiRGLyLNhZfrubmfeMP2UVmVSpMBta7zs+YmaYjZvKfgO3KZT
OVbyFfDbL8FJgUmTSI1WBKq+W44o1D1e8VrKiCFj+y5w9diHW9OQEg2wqQdsMB6a
QgNgDZjg8UHancF5O2kNYjnUVGgxUww7PftWbxkg4VAUmlCzhbZAAegspZHow0mU
zcqDaMTky0FbcbB/Ukik/HG6J3KrR34GYjui3fe0wZHZlDim6azZucRTd+x9jRsB
jPUlE3DW0JfNFKcMnlLLNvS8h3j7iCbb3XDv1y4BW0+EB76IsCThjqFO0dIPpiju
T9ituTr6p4+B4U37Cz5qOMgUSha+f9/6blYG8NgCeHyD2l5HDnavO9lGhoP3jsZJ
LJRa8mWd+oZbZlpBtTkaSOA55cTxonsIuCseTdXlfsVtzuJBmLKwdRPuDSRCEo0G
xH1vNNUba86+6A==
=ne0K
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix to ensure that there is no stale KASAN shadow
state left on the idle task's stack when a CPU is brought up after it
was brought down before"
* tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/scs: Reset task stack state in bringup_cpu()
a trace point event as it's not possible to deliver a synchronous signal to
a different task from there.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGjrj0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRHRD/9T8sQw4arpmaFvB76m1LijsGrAuoXv
XH/gTcUupCdo0J1X8iEZfuGKx3C89BqLFaGpucK+9TCl6VMKHqtDTunciKV79tVQ
TcaTKYFwCwNrAQ0eATNzuM4RzzHGx0TK6u1DB0iFTSUJfAQ/EUE4/+yau2qDVfql
Pud/Fm5uHtqxDq5T9XqG3w324e8HWJr2johGMeg4ukbuKppRoNWlZcm75HndyK4m
OT8svA9Yg8GhSZNQ3q4HQTwof4zcGyaln+wxf7GWr9oryBPiqhHQuvWKXqDXLCVb
SbhsYmYcHEQgM3wpNaNqSf1LV1RoPuhFhgWB0te5SoVzoF7KpJLs+VIP/0q27Mcu
6aF7eTUG92NkR1uvSQ2d62UBE4EM0bFBvPaD4A5hLX1JAkVxHi+vxRFf5q0bUliO
Yybia4bv1WYwCVajBbpgwNDMKb4qacoIcXPlsjkRqkxk/vedOBkJadJnIEqc1iOl
Ld70jylQmj/TxmFM3iGk+QyFwFNpPnUxu0wws7A4YxYFknrhW+/8pcVTsUApBuYN
LWWiC08QelvQucCYGqpbEX37WA3DFXj4AHDp7nCJBkweMGhcgIBvZbz8yz/mgT7T
CTMkT5ZZY93mAWiXdagNJI4EWnjHZgeVtSlKRvF1D0J49SyKepqogOxNgi7KnW+/
tbCmxOTH9eA2Eg==
=yMum
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for perf to prevent it from sending SIGTRAP to another
task from a trace point event as it's not possible to deliver a
synchronous signal to a different task from there"
* tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Ignore sigtrap for tracepoints destined for other tasks
- Plug a race in the lock handoff which is caused by inconsistency of the
reader and writer path and can lead to corruption of the underlying
counter.
- down_read_trylock() is suboptimal when the lock is contended and
multiple readers trylock concurrently. That's due to the initial value
being read non-atomically which results in at least two compare exchange
loops. Making the initial readout atomic reduces this significantly.
Whith 40 readers by 11% in a benchmark which enforces contention on
mmap_sem.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGjrRITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodsdEACRDUU5tkNVIgNTsGrO4IUhNW9fxyfG
3dCAzcQx9w1UjjBn23/B0c6rPsVqEv6hKouBGXqdOHj0kLx6Xn0IPMTvqycPL+mp
OyDzx+t773BlvTZyaYFa6vBiWbEVGzedDp6uLsYaBNo//4yN1WZY3mevTwzKVceX
WOoobHjsoh5Wfwr1XmNw+7HVhPaY0E50DaIuRQrJjNj1zsUhzJsjr/M1NpiqCaSm
PleDum3Dg0PD/pxdWtm34teuGQur0QknqPc2I6sZGnX0UMsCozeZAuH/MGnwwXec
fsweMXBVyDngOIZbFX/tPbVTocOpfxkYgJKXwIrlmVwHzFeT6KFfpEPXxVhUj6ao
3KNqD+V5VL2zdMF11WB2lVQaX2/48WIXz23ppiUA5R7tJTPr+yAIYIUzT2GFkMTr
u//41pxnoXlm9RCjANrbzGSl049exf01mMFVzm6zGt6PZqTE/kaBuklRy6Vibk/C
cSB7Iy/iVaySunmF6X5RuBT7HsKrIN6SgYRCHZ7BI9aelQpHztJuy4LZAbgRPZZU
/VKB2BKLx1KeRNfn6ScvF1uSSLmXoFVs0PP7HwMrPs3AdI+KaHmYLqZf+Bf4W1q2
5bAfj2x5qWwvMrV4RnwLltWAASw1G/o5fs8WhPA6cZkG9iZCB5EBCnHv4B0pm+oq
xw8RPYImZFzK8w==
=dKz+
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two regression fixes for reader writer semaphores:
- Plug a race in the lock handoff which is caused by inconsistency of
the reader and writer path and can lead to corruption of the
underlying counter.
- down_read_trylock() is suboptimal when the lock is contended and
multiple readers trylock concurrently. That's due to the initial
value being read non-atomically which results in at least two
compare exchange loops. Making the initial readout atomic reduces
this significantly. Whith 40 readers by 11% in a benchmark which
enforces contention on mmap_sem"
* tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Optimize down_read_trylock() under highly contended case
locking/rwsem: Make handoff bit handling more consistent
- The setting of the pid filtering flag tested the "trace only this
pid" case twice, and ignored the "trace everything but this pid" case.
Note, the 5.15 kernel does things a little differently due to the new
sparse pid mask introduced in 5.16, and as the bug was discovered
running the 5.15 kernel, and the first fix was initially done for
that kernel, that fix handled both cases (only pid and all but pid),
but the forward port to 5.16 created this bug.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYaOnPxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqUTAP9KCOe2rZBjbn14xiCm/wbECjox58Uf
PrJ3fCDBVt8E0gEAjHkR3ybVE4xYLKj4RrO5GJ/pk/x1NeMmHdi+ls5hOQg=
=MZso
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull another tracing fix from Steven Rostedt:
"Fix the fix of pid filtering
The setting of the pid filtering flag tested the "trace only this pid"
case twice, and ignored the "trace everything but this pid" case.
The 5.15 kernel does things a little differently due to the new sparse
pid mask introduced in 5.16, and as the bug was discovered running the
5.15 kernel, and the first fix was initially done for that kernel,
that fix handled both cases (only pid and all but pid), but the
forward port to 5.16 created this bug"
* tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Test the 'Do not trace this pid' case in create event
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGiaB4ACgkQiiy9cAdy
T1H4PAv+OG94BZe+MyMdXgoRbOLiEeye7dm/TZnlpdtV96WmyrMkA4/1nyOc8k7F
bNRn3ocuH3YCjjUB2kU6MZW5Kh9aSxULYgEC0zxuPcA9q4Ig4FehZD2U3r6fztFM
U5Do9n1xRXjIgS/gI7DfZHSQ+SYVwBMAzRB3opplcs1CW68N7+WmKa5CQ6wH5vZM
GqhU4+yFLhZGiXdxXTFF/bLKWer2PF9p5J4mLzRH0ugTG26tY7WnJcQIj+XrNTrN
ccBSXr+9fHFFa9iGcLY08pghk8s1F6dXl/BLv7DFswKOoje7HsDLcWpNSVtVDojO
0Fg5vVtuDKapCOrpmtPt8Khc4qgkxY6VpJyELMCwJuxrSzTE59C54gQcgJfZO97d
bpGeV9L6D6VTLTe1LhUFCRQbc0FFO3daPWRrxrZnvtJeef70xVSZPtrQqNtpnAtM
RwkN2Rf/Enl03w9U25nv3ymlKoHERiR2XZADLfWc5XdB40Lxc+fDccyoVP1wmxT8
uicPpySr
=bEt3
-----END PGP SIGNATURE-----
Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Five ksmbd server fixes, four of them for stable:
- memleak fix
- fix for default data stream on filesystems that don't support xattr
- error logging fix
- session setup fix
- minor doc cleanup"
* tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix memleak in get_file_stream_info()
ksmbd: contain default data stream even if xattr is empty
ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
docs: filesystem: cifs: ksmbd: Fix small layout issues
ksmbd: Fix an error handling path in 'smb2_sess_setup()'
Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB
to indicate that VMXNET3 requires a page size smaller than 64kB.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.
fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
fs/ntfs/aops.c:1311:1: error:
the frame size of 2240 bytes is larger than 2048 bytes
Since commit f22969a660 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.
Increasing the maximum frame size for affected architectures just to
silence this error does not really help. The frame size would have to
be set to a really large value for 256k pages. Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable. Make NTFS_RW dependent on page sizes
smaller than 64k instead.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NTFS_RW and VMXNET3 require a page size smaller than 64kB. Add generic
Kconfig option for use outside architecture code to avoid architecture
specific Kconfig options in that code.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.
The "Only trace this pid" case was added, but a cut and paste error made
that case checked twice, instead of checking the "Trace all but this pid"
case.
Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/
Fixes: 6cb206508b ("tracing: Check pid filtering when creating events")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Fix buffer resource leak that could lead to livelock on corrupt fs.
- Remove unused function xfs_inew_wait to shut up the build robots.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGef+UACgkQ+H93GTRK
tOtAQQ/9EAzGgADQR2dcCXoZwRP3LKz5WGops0qCqywvH/BfbLmKmqzgfUXaf026
dMmrnP9+d6BFytoLk8IXmpydML8qK2/k8lmwJRUG8arPRbHQwVSckDM4vXVrI2X2
K4f8nu1CBR8MDavVS7cR8CZWO3XJMLKTZtxCTdOQlRAw+m9P1+S00LWkiuDTPTTX
YRAGkYEVCtCQLuqJgClf267/5+MaZXRJfFVh1hBQkUtPFXhu1LXfJqZ/thgacmlD
D6/tfwt6Ad7iKg4LjtoJC6zkvoTFN7rB39PGPGILWgS3Nimp0xWgKoTnJ6VLRblY
FwItA6zERXQoHRse7eGMfQ4ZnGT40pvIiN+JVZ4jju4hElY7dkigBfJv8oLkVm3D
xV2dA7YO4DcYS4UAifZ+C00T3pYo/rQnsIwfAGsbh28Xlshyi4+cEqGRTkqrJHnx
gkE4uzp0acs6HVTMC3S0pL6oTirPbHAQtt5tjS8ZtxALlYqF3+Y8xzQhMyB7Fo0b
is213My5aP6VSr2UZajxpkIOl+5OvQ4v37fhmBXMGKmiz7XCzatyvtUjEBjahFfF
FJeug4hRDom/uCKPM3eHfuZxGorlIB1GIeSe7+0ZIXcn+Wuo5seWfJrNP+1H/Asb
Vp6TNueGiCKLBD/+MLWp65zyIZu+b2gNEyYCuhwnOf2sQflhZmI=
=Fa6X
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Fixes for a resource leak and a build robot complaint about totally
dead code:
- Fix buffer resource leak that could lead to livelock on corrupt fs.
- Remove unused function xfs_inew_wait to shut up the build robots"
* tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove xfs_inew_wait
xfs: Fix the free logic of state in xfs_attr_node_hasname
- Fix an accounting problem where unaligned inline data reads can run
off the end of the read iomap iterator. iomap has historically
required that inline data mappings only exist at the end of a file,
though this wasn't documented anywhere.
- Document iomap_read_inline_data and change its return type to be
appropriate for the information that it's actually returning.
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGehZoACgkQ+H93GTRK
tOu7ig/44n/PzNzFv0dLAadoanBH/d3uMDs/s9DBWw6s7RYU0wjUMHgGyla9vFgT
aw0xxLZfppvk59Gkme5WPiv/ksRwB9ZcWvEFUOMX/zt55uSNueCXCVpckduf9j29
gKUzFvRGVssBQ2ACHvuH/s6c6hF9EOhCHadREHinqemU3zg/+eH/+L+dgHIitzMg
WiGdWEaojQX8brxD4kH7xfsNUbuFwCNO5UbGtndzsK/5b/8QGXIXOUrJ0JbefoBf
Kscz4opZjfJIuGjczhIhollgV0jihMOH3OIJYfHQHVOUGVwQ/2epHo2cwq6Wujk0
3qXsjuYloR5xkyQwoLfr382BBO4teQW75nNUt+ez4tvwYs7Ck3U1oOFG+j3KiU/P
gsMcSPzgQIiFdU1DRR5r6li6daLJJWK34PeZ4DtE0zFUKwslSUKytv+pT99yNPTG
xkhvdU6R4jchUOPJCZCh8zdARhofTiaxrLPlZ0xKenqxlLxdMm+W0fCkuGFrLHSq
g39CGJhuPe7OexK8lBY9fQ08zwI7LJIx/vQGQ3hbqZaxq14uCZVr/nGwcXlzBhzl
BufZslO9Aj/eG1/MzwSN/xDTo3Jl+RuCs9lBgpg8pu6WNzueXbRMQnURy6Vf47Ua
U6Awq0OrGbippg4Y7P/IiyYDKjFWPjgftYRJhoosxxfbO00BSA==
=hIEG
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
"A single iomap bug fix and a cleanup for 5.16-rc2.
The bug fix changes how iomap deals with reading from an inline data
region -- whereas the current code (incorrectly) lets the iomap read
iter try for more bytes after reading the inline region (which zeroes
the rest of the page!) and hopes the next iteration terminates, we
surveyed the inlinedata implementations and realized that all
inlinedata implementations also require that the inlinedata region end
at EOF, so we can simply terminate the read.
The second patch documents these assumptions in the code so that
they're not subtle implications anymore, and cleans up some of the
grosser parts of that function.
Summary:
- Fix an accounting problem where unaligned inline data reads can run
off the end of the read iomap iterator. iomap has historically
required that inline data mappings only exist at the end of a file,
though this wasn't documented anywhere.
- Document iomap_read_inline_data and change its return type to be
appropriate for the information that it's actually returning"
* tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: iomap_read_inline_data cleanup
iomap: Fix inline extent handling in iomap_readpage
- Have created events reflect the current state of pid filtering
- Test pid filtering on discard test of recorded logic.
(Also clean up the if statement to be cleaner).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYaJ3ZhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhusAQC3nj0Xj4LRJXJtH4ALoJuthoBNoRHN
SslcuItuFLheyQD/URecPD2h4O+u/GQs1rjEUJ3B/mdzXojIrTz6Stagkwg=
=QCQF
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two fixes to event pid filtering:
- Make sure newly created events reflect the current state of pid
filtering
- Take pid filtering into account when recording trigger events.
(Also clean up the if statement to be cleaner)"
* tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix pid filtering when triggers are attached
tracing: Check pid filtering when creating events
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGiPsAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpr6oD/9i1PrY0nhHbNdpEKC5ZB2TeX8inxvCQj5i
HmAm421s9umLjUXrRZQH6VMsSOWAS6viaXA9MdeWkxXSA9950Stx4Tr5LpujN/iJ
hZneHgX1V3kfGkIOWdstPE62QTKCoMDtBFx1Jk+XZfZ7N9ogWHlmvr3C7iD3QLId
+RODIIZlOZXYP5cYIUon9hK4ydMRjAh5jGik73ckN6gG+vsEWlsVZPlKLgqQ+AyT
CXkXHh5Ad5tQt1vqio+PH2Fk42Ce/+CeY0vhNS2ZUWDHwQEKSz8qIHz3kx+zaCCU
Y3e5CVe8MW64MOZIQPayEeNyposT0YNxTDgOMZxRi2J3IMeYFluJTm57zMNnnH/N
sgKZHZDItAfsgkptVkptoIEeg+2nus3GWmhAKVWu7zPaS8LfAVROt+OCx6+2CUvr
DHJwuGbkG3XR4vwO69ugTbjgRh97P3VJJfg4t6QZZNE9JDSAviHUrZDu9X53xHN5
hAsxg98QRz5BqvqPq99KIB6lq2zQ8LBHLZnnjEXhF3q031LpxCAhpVNBNt+LgQo2
MiGyx4lgznAl6P7/2uXaq5VxCLhWWHI5BtLdlGvJH3v8Uckci/JQKYoO65RNL4D8
gzdGnLQwJ80mDVlHcoN/9vUIYirru4+E+BiW6ua1/5MGmTAQvt6zw4Lo88MZMa5i
mKOxZxL30g==
=h6A3
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
"The locking fixup that was applied earlier this rc has both a deadlock
and IRQ safety issue, let's get that ironed out before -rc3. This
contains:
- Link traversal locking fix (Pavel)
- Cancelation fix (Pavel)
- Relocate cond_resched() for huge buffer chain freeing, avoiding a
softlockup warning (Ye)
- Fix timespec validation (Ye)"
* tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
io_uring: Fix undefined-behaviour in io_issue_sqe
io_uring: fix soft lockup when call __io_remove_buffers
io_uring: fix link traversal locking
io_uring: fail cancellation for EXITING tasks
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGiPq4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoyLD/96XwppHepSKHPRAo5A37XobgUWlWZ8c5MM
goKaZOKFCht9EVHaao5OmB+XoPQR8vbJPxuVbVg66FQ4C3Yq3/LlG7rGgXcPuvSB
gX82KHyoy2aqmeEIjbMiVpGMBfkQChuxCHvUVyApZY6iUYCCxN1cfY24WMGUxFG5
t5gSo0qR4YlI4RM66J/79dfmAz6hCkTqNmuZvyUUwwDeQMt6/djTKAO5X2x9epBu
JUgOcgEbvEsh/gtrFMx1MYIT1mYZ+f5BMSAudSp9q6E+kNgOPdGJPC5h3RIXsAii
JlwPqYsy0Cgnzq9lwkwh/lNHdDcqqRzwFo3B3AySYMfY5dfhZWjt1tQgBPKaGVPH
fw7memTI/Z1Ht5o6bC7Lo+PsTnW33T9gXgLRwD5oApN39vBQkyhB9Nap8fBY4ZR4
dwQcz5DFoTw58PEAKnm4q36V8T2W2wJC9wv/GMyqkcMmpUKbd38ZVPiLLO5AP6e4
HNox3FWfhvZrhzD3OHB0ujYOvo/WrIBT9RChSf9VAux+E7dcQxZdq/FQktIEb5Lp
kPArj9etxS9AoHoU2XMEknFkoQBlR73zeaf+3gB8nBpkwIg+H69LDgsq1sfd5wN5
VyMIU84no9H1uazZnUbrTuroPy9ItQ21Ee5Y+2rO5E1q2L4LkU5lXay+JP8t9JIo
JRcfNiH5YQ==
=xrY+
-----END PGP SIGNATURE-----
Merge tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more block fixes from Jens Axboe:
"Turns out that the flushing out of pending fixes before the
Thanksgiving break didn't quite work out in terms of timing, so here's
a followup set of fixes:
- rq_qos_done() should be called regardless of whether or not we're
the final put of the request, it's not related to the freeing of
the state. This fixes an IO stall with wbt that a few users have
reported, a regression in this release.
- Only define zram_wb_devops if it's used, fixing a compilation
warning for some compilers"
* tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK
block: call rq_qos_done() before ref check in batch completions
Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas,
ufs). The core fix is a minor correction to the previous state update
fix for the iscsi daemons.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYaIz7yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXbyAQCozhIX
sWfrJJJClKF/1WlDuGIU9hc2hCn/2Cod2wxDLQEAs1FQwNyA++KHys167PJAYkSe
Bayq8IoS9eL8z8ZKt+g=
=+sFg
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas,
ufs). The core fix is a minor correction to the previous state update
fix for the iscsi daemons"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_debug: Zero clear zones at reset write pointer
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
scsi: target: configfs: Delete unnecessary checks for NULL
scsi: target: core: Use RCU helpers for INQUIRY t10_alua_tg_pt_gp
scsi: mpt3sas: Fix incorrect system timestamp
scsi: mpt3sas: Fix system going into read-only mode
scsi: mpt3sas: Fix kernel panic during drive powercycle test
scsi: ufs: ufs-mediatek: Add put_device() after of_find_device_by_node()
scsi: scsi_debug: Fix type in min_t to avoid stack OOB
scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
scsi: ufs: ufshpb: Fix warning in ufshpb_set_hpb_read_to_upiu()
Highlights include:
Stable fixes:
- NFSv42: Fix pagecache invalidation after COPY/CLONE
Bugfixes:
- NFSv42: Don't fail clone() just because the server failed to return
post-op attributes
- SUNRPC: use different lockdep keys for INET6 and LOCAL
- NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
- SUNRPC: fix header include guard in trace header
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmGiTnMACgkQZwvnipYK
APJoJQ//VZYSCx/mGaTIj5oUwjBKE/n/9rz2EUGS1cfYjZjPpb5Xgm1tn+1A4c01
Ztu9/hKgwrDqknqkmtKvP1GsX5vYUqgfqAlc880Q2nXAqaLJBBZgB6BFMmTtcoQx
C24L0tgxlZVD9Vw0DJEVDVgDxXA/9VmdSQK6uptQRQhcYf4VtR1wAzELHWdkdkfq
1WrREeAwGWw1BaPTrlPn9XwW9qTaMlBH05XRHh6dM7gFmoIe3td7kq7BOnFxFsnA
AQZ/nCgeMTE04kQQMzYqUc4YqZvnzUHxueZ6q8s0K1RKJBpNIQNgdWUMa285Qo4a
JA9oBCPPo0JjmsEge2Km12zyBJoA7lLQDfc6UQJON50ADF0sTu3wszgGuC63KkhE
V+kUogK2WOlnGky2yYrHmv43mcCcyoJ/g+g+38GXNYGorsFi/XUhvctEpFFFPF71
0umQwWhA6Dhc52hMj5DN4nfspp//hEuV9o7/zJzlPi0elC+xtVaBWZCorNvznlXW
C/O5yobJVd89PuIE17Stg+c0Rq3k7RVPPoyS2IaMkZ5cs7DtT5Tz3nKClPAA8Bur
mPLAMkHSOSLO26cy30SVZCIx1JDMJU9dx/PkFemCexkzYXQxgp9px/8wmM2Xe6Oc
/hDqi8V7ayJUuYpYuJ6sA8oUqj3j+NkoP4w5HhlnmZDBhFMEBhM=
=jeHm
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- NFSv42: Fix pagecache invalidation after COPY/CLONE
Bugfixes:
- NFSv42: Don't fail clone() just because the server failed to return
post-op attributes
- SUNRPC: use different lockdep keys for INET6 and LOCAL
- NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
- SUNRPC: fix header include guard in trace header"
* tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: use different lock keys for INET6 and LOCAL
sunrpc: fix header include guard in trace header
NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
NFSv42: Fix pagecache invalidation after COPY/CLONE
NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
- Fix an ABBA deadlock introduced by XArray convention.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYaG1ZxEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBFQqAP0Zk3Nozb1BnXnQ/xGZR+liyTimNlujsWqQ
YtHIwB4EiwEAopRnuZRY00txMqzYZNz3sLEV6ylATbN0i/EJ2brOpA8=
=ZHsQ
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix an ABBA deadlock introduced by XArray conversion"
* tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix deadlock when shrink erofs slab
Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being
incorrectly invalidated and a subsequent host crash.
Fix kernel hardlockup on vmap stack overflow on 32-bit.
Thanks to: Christophe Leroy, Nicholas Piggin, Fabiano Rosas.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGh96MTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGHDEACu37n/NLFp3qzHyrk4fHN6sUQ0wGsE
oroKeIrTP2+JPsoEDd6Jd9oQmqmVLmVTp4y/Lu6AAJbK4p1He1iPIxIbZn12NwqY
/siCihV76PXR8748g1j8r3a3UYDmmvmllAIf9JdAn2I4kKPE3J37/1NEGSqZyJ80
HzkheTczUweANhVuG1RDh3WvnAKMuOvtrT/YgetVILig23oiUORQkZH5wcpScaCN
qShnqKSuRt8HWaF3uRh6D0sf4o0BWfZ6Pjyi7R/T7s2dQVgz0uCtu4MuyGirx/0J
uU3yDEZD24y6doyi6HUgAFGeGYqJXky2XDOMrgV6CNGFty9HFp6GgN6gtX353pZa
9Xplw4kg9UJwGcyk1BvgdQBHdc83y5zmr8c2PfhgiP/5rfEyCTYtHgZKUu5LLW8Y
ECYkSNnlAT68SancCmiXgos1heIiwB7922cRWJANfGEjItNBFldtb2QZio2bWOUX
g4gZ8HN2P0OmOxV1nnNkmIe+ZKLeHPN2N/FR7pUdkFk2+BLWFx+SbrceCCicwMF4
CUUzll8FMFmVAGjFivG/axzqlDWMY5lrDsFZamaWN3VQEo7TwL6l8/cDcFZF4QBq
viLWbi7sfQm6M3tZXFS465sG4SpUQ4lI8PKptuz5DMdr6WKisTLR7OF0klxsEwZj
ZcT8a/wzUIZ53w==
=MBp+
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix KVM using a Power9 instruction on earlier CPUs, which could lead
to the host SLB being incorrectly invalidated and a subsequent host
crash.
Fix kernel hardlockup on vmap stack overflow on 32-bit.
Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"
* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32: Fix hardlockup on vmap stack overflow
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
We got issue as follows:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x170/0x1dc lib/dump_stack.c:118
ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
ktime_set include/linux/ktime.h:42 [inline]
timespec64_to_ktime include/linux/ktime.h:78 [inline]
io_timeout fs/io_uring.c:5153 [inline]
io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
__io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
io_submit_sqe fs/io_uring.c:6137 [inline]
io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
__do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
__arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================
As ktime_set only judge 'secs' if big than KTIME_SEC_MAX, but if we pass
negative value maybe lead to overflow.
To address this issue, we must check if 'sec' is negative.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a event is filtered by pid and a trigger that requires processing of
the event to happen is a attached to the event, the discard portion does
not take the pid filtering into account, and the event will then be
recorded when it should not have been.
Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.
However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.
As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop an pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a results of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.
The fix for this seems relatively simple, if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
Fixes: 3f34f12597 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the submission of iommu driver for RK3568 a subtle bug was
introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
the other way arround - that leads to random errors, especially when
addresses beyond 32 bit are used.
Fix it.
Fixes: c55356c534 ("iommu: rockchip: Add support for iommu v2")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Dan Johansen <strit@manjaro.org>
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>