virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-10-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-9-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-8-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-7-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-6-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-5-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-4-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-3-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-2-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Treat stats requests as wakeup events to ensure that the driver responds
to device requests in a timely manner.
Signed-off-by: David Stevens <stevensd@chromium.org>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240321012445.1593685-3-stevensd@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Wakeup sources don't support nesting multiple events, so sharing a
single object between multiple drivers can result in one driver
overriding the wakeup event processing period specified by another
driver. Have the virtio balloon driver use the wakeup source of the
device it is bound to rather than the wakeup source of the parent
device, to avoid conflicts with the transport layer.
Note that although the virtio balloon's virtio_device itself isn't what
actually wakes up the device, it is responsible for processing wakeup
events. In the same way that EPOLLWAKEUP uses a dedicated wakeup_source
to prevent suspend when userspace is processing wakeup events, a
dedicated wakeup_source is necessary when processing wakeup events in a
higher layer in the kernel.
Fixes: b12fbc3f78 ("virtio_balloon: stay awake while adjusting balloon")
Signed-off-by: David Stevens <stevensd@chromium.org>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240321012445.1593685-2-stevensd@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With virtio-mem, primarily hibernation is problematic: as the machine shuts
down, the virtio-mem device loses its state. Powering the machine back up
is like losing a bunch of DIMMs. While there would be ways to add limited
support, suspend+resume is more commonly used for VMs and "easier" to
support cleanly.
s2idle can be supported without any device dependencies. Similarly, one
would expect suspend-to-ram (i.e., S3) to work out of the box. However,
QEMU currently unplugs all device memory when resuming the VM, using a
cold reset on the "wakeup" path. In order to support S3, we need a feature
flag for the device to tell us if memory remains plugged when waking up. In
the future, QEMU will implement this feature.
So let's always support s2idle and support S3 with plugged memory only if
the device indicates support. Block hibernation early using the PM
notifier.
Trying to hibernate now fails early:
# echo disk > /sys/power/state
[ 26.455369] PM: hibernation: hibernation entry
[ 26.458271] virtio_mem virtio0: hibernation is not supported.
[ 26.462498] PM: hibernation: hibernation exit
-bash: echo: write error: Operation not permitted
s2idle works even without the new feature bit:
# echo s2idle > /sys/power/mem_sleep
# echo mem > /sys/power/state
[ 52.083725] PM: suspend entry (s2idle)
[ 52.095950] Filesystems sync: 0.010 seconds
[ 52.101493] Freezing user space processes
[ 52.104213] Freezing user space processes completed (elapsed 0.001 seconds)
[ 52.106520] OOM killer disabled.
[ 52.107655] Freezing remaining freezable tasks
[ 52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 52.113296] printk: Suspending console(s) (use no_console_suspend to debug)
S3 does not work without the feature bit when memory is plugged:
# echo deep > /sys/power/mem_sleep
# echo mem > /sys/power/state
[ 32.788281] PM: suspend entry (deep)
[ 32.816630] Filesystems sync: 0.027 seconds
[ 32.820029] Freezing user space processes
[ 32.823870] Freezing user space processes completed (elapsed 0.001 seconds)
[ 32.827756] OOM killer disabled.
[ 32.829608] Freezing remaining freezable tasks
[ 32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 32.837953] printk: Suspending console(s) (use no_console_suspend to debug)
[ 32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported
[ 32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1
[ 32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1
[ 32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1
But S3 works with the new feature bit when memory is plugged (patched
QEMU):
# echo deep > /sys/power/mem_sleep
# echo mem > /sys/power/state
[ 33.983694] PM: suspend entry (deep)
[ 34.009828] Filesystems sync: 0.024 seconds
[ 34.013589] Freezing user space processes
[ 34.016722] Freezing user space processes completed (elapsed 0.001 seconds)
[ 34.019092] OOM killer disabled.
[ 34.020291] Freezing remaining freezable tasks
[ 34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 34.026090] printk: Suspending console(s) (use no_console_suspend to debug)
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240318120645.105664-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This removes the signal/coredump hacks added for vhost_tasks in:
Commit f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
When that patch was added vhost_tasks did not handle SIGKILL and would
try to ignore/clear the signal and continue on until the device's close
function was called. In the previous patches vhost_tasks and the vhost
drivers were converted to support SIGKILL by cleaning themselves up and
exiting. The hacks are no longer needed so this removes them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-10-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of lingering until the device is closed, this has us handle
SIGKILL by:
1. marking the worker as killed so we no longer try to use it with
new virtqueues and new flush operations.
2. setting the virtqueue to worker mapping so no new works are queued.
3. running all the exiting works.
Suggested-by: Edward Adam Davis <eadavis@qq.com>
Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com
Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-9-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In the next patches where the worker can be killed while in use, we
need to be able to take the worker mutex and kill queued works for
new IO and flushes, and set some new flags to prevent new
__vhost_vq_attach_worker calls from swapping in/out killed workers.
If we are holding the worker mutex during a flush and the flush's work
is still in the queue, the worker code that will handle the SIGKILL
cleanup won't be able to take the mutex and perform it's cleanup. So
this patch has us drop the worker mutex while waiting for the flush
to complete.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the
swapping of a virtqueue's worker. This was done for simplicity because
we are already holding that mutex.
In the next patches where the worker can be killed while in use, we need
finer grained locking because some drivers will hold the vhost_dev mutex
while flushing. However in the SIGKILL handler in the next patches, we
will need to be able to swap workers (set current one to NULL), kill
queued works and stop new flushes while flushes are in progress.
To prepare us, this has us use the virtqueue mutex for swapping workers
instead of the vhost_dev one.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_vq_work_queue will never fail when queueing the TMF's response
handling because a guest can only send us TMFs when the device is fully
setup so there is always a worker at that time. In the next patches we
will modify the worker code so it handles SIGKILL by exiting before
outstanding commands/TMFs have sent their responses. In that case
vhost_vq_work_queue can fail when we try to send a response.
This has us just free the TMF's resources since at this time the guest
won't be able to get a response even if we could send it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_vq_flush is no longer used so remove it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We flush all the workers that are not also used by the ctl vq to make
sure that responses queued by LIO before the TMF response are sent
before the TMF response. This requires a special vhost_vq_flush
function which, in the next patches where we handle SIGKILL killing
workers while in use, will require extra locking/complexity. To avoid
that, this patch has us flush the entire device from the system work
queue, then queue up sending the response from there.
This is a little less optimal since we now flush all workers but this
will be ok since commands have already timed out and perf is not a
concern.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In the next patches we will support the vhost_task being killed while in
use. The problem for vhost-scsi is that we can't free some structs until
we get responses for commands we have submitted to the target layer and
we currently process the responses from the vhost_task.
This has just drop the responses and free the command's resources. When
all commands have completed then operations like flush will be woken up
and we can complete device release and endpoint cleanup.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently, we can try to queue an event's work before the vhost_task is
created. When this happens we just drop it in vhost_scsi_do_plug before
even calling vhost_vq_work_queue. During a device shutdown we do the
same thing after vhost_scsi_clear_endpoint has cleared the backends.
In the next patches we will be able to kill the vhost_task before we
have cleared the endpoint. In that case, vhost_vq_work_queue can fail
and we will leak the event's memory. This has handle the failure by
just freeing the event. This is safe to do, because
vhost_vq_work_queue will only return failure for us when the vhost_task
is killed and so userspace will not be able to handle events if we
sent them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.
coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().
sprintf() will be converted as weel if they have.
Generally, this patch is generated by
make coccicheck M=<path/to/file> MODE=patch \
COCCI=scripts/coccinelle/api/device_attr_show.cocci
No functional change intended
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
CC: virtualization@lists.linux.dev
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In the vp_vdpa_set_status function, when setting the device status to
VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail.
In such cases, the device status should not be set to DRIVER_OK. Add
exception printing to remind the user.
Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
TIF_FOREIGN_FPSTATE is a 'convenience' flag that should reflect whether
the current CPU holds the most recent user mode FP/SIMD state of the
current task. It combines two conditions:
- whether the current CPU's FP/SIMD state belongs to the task;
- whether that state is the most recent associated with the task (as a
task may have executed on other CPUs as well).
When a task is scheduled in and TIF_KERNEL_FPSTATE is set, it means the
task was in a kernel mode NEON section when it was scheduled out, and so
the kernel mode FP/SIMD state is restored. Since this implies that the
current CPU is *not* holding the most recent user mode FP/SIMD state of
the current task, the TIF_FOREIGN_FPSTATE flag is set too, so that the
user mode FP/SIMD state is reloaded from memory when returning to
userland.
However, the task may be scheduled out after completing the kernel mode
NEON section, but before returning to userland. When this happens, the
TIF_FOREIGN_FPSTATE flag will not be preserved, but will be set as usual
the next time the task is scheduled in, and will be based on the above
conditions.
This means that, rather than setting TIF_FOREIGN_FPSTATE when scheduling
in a task with TIF_KERNEL_FPSTATE set, the underlying state should be
updated so that TIF_FOREIGN_FPSTATE will assume the expected value as a
result.
So instead, call fpsimd_flush_cpu_state(), which takes care of this.
Closes: https://lore.kernel.org/all/cb8822182231850108fa43e0446a4c7f@kernel.org
Reported-by: Johannes Nixdorf <mixi@shadowice.org>
Fixes: aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")
Cc: Mark Brown <broonie@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Janne Grunau <j@jannau.net>
Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Janne Grunau <j@jannau.net>
Tested-by: Johannes Nixdorf <mixi@shadowice.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240522091335.335346-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit b8995a1841.
Ard managed to reproduce the dm-crypt corruption problem and got to the
bottom of it, so re-apply the problematic patch in preparation for
fixing things properly.
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Code in v6.9 arch/x86/kernel/smpboot.c was changed by commit
4db64279bc ("x86/cpu: Switch to new Intel CPU model defines") from:
static const struct x86_cpu_id intel_cod_cpu[] = {
X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0), /* COD */
X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0), /* COD */
X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */ <--- 443
{}
};
static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
to:
static const struct x86_cpu_id intel_cod_cpu[] = {
X86_MATCH_VFM(INTEL_HASWELL_X, 0), /* COD */
X86_MATCH_VFM(INTEL_BROADWELL_X, 0), /* COD */
X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */
{}
};
static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
On an Intel CPU with SNC enabled this code previously matched the rule on line
443 to avoid printing messages about insane cache configuration. The new code
did not match any rules.
Expanding the macros for the intel_cod_cpu[] array shows that the old is
equivalent to:
static const struct x86_cpu_id intel_cod_cpu[] = {
[0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 },
[1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 },
[2] = { .vendor = 0, .family = 6, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 },
[3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 }
}
while the new code expands to:
static const struct x86_cpu_id intel_cod_cpu[] = {
[0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 },
[1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 },
[2] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 },
[3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 }
}
Looking at the code for x86_match_cpu():
const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match)
{
const struct x86_cpu_id *m;
struct cpuinfo_x86 *c = &boot_cpu_data;
for (m = match;
m->vendor | m->family | m->model | m->steppings | m->feature;
m++) {
...
}
return NULL;
it is clear that there was no match because the ANY entry in the table (array
index 2) is now the loop termination condition (all of vendor, family, model,
steppings, and feature are zero).
So this code was working before because the "ANY" check was looking for any
Intel CPU in family 6. But fails now because the family is a wild card. So the
root cause is that x86_match_cpu() has never been able to match on a rule with
just X86_VENDOR_INTEL and all other fields set to wildcards.
Add a new flags field to struct x86_cpu_id that has a bit set to indicate that
this entry in the array is valid. Update X86_MATCH*() macros to set that bit.
Change the end-marker check in x86_match_cpu() to just check the flags field
for this bit.
Backporter notes: The commit in Fixes is really the one that is broken:
you can't have m->vendor as part of the loop termination conditional in
x86_match_cpu() because it can happen - as it has happened above
- that that whole conditional is 0 albeit vendor == 0 is a valid case
- X86_VENDOR_INTEL is 0.
However, the only case where the above happens is the SNC check added by
4db64279bc so you only need this fix if you have backported that
other commit
4db64279bc ("x86/cpu: Switch to new Intel CPU model defines")
Fixes: 644e9cbbe3 ("Add driver auto probing for x86 features v4")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable+noautosel@kernel.org> # see above
Link: https://lore.kernel.org/r/20240517144312.GBZkdtAOuJZCvxhFbJ@fat_crate.local
New CPU #defines encode vendor and family as well as model.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20240520224620.9480-2-tony.luck@intel.com
We used '#ifdef MODULE' for judging whether the system supports the
sound module or not, and /proc/asound/modules is created only when
'#ifdef MODULE' is true. The check is not really appropriate, though,
because the flag means only for the sound core and the drivers are
still allowed to be built as modules even if 'MODULE' is not set in
sound/core/init.c.
For fixing the inconsistency, replace those ifdefs with 'ifdef
CONFIG_MODULES'. One place for a NULL module check is rewritten with
IS_MODULE(CONFIG_SND) to be more intuitive. It can't be changed to
CONFIG_MODULES; otherwise it would hit a WARN_ON() incorrectly.
This is a slight behavior change; the modules proc entry appears now
no matter whether the sound core is built-in or not as long as modules
are enabled on the kernel in general. This can't be avoided due to
the nature of kernel builds.
Link: https://lore.kernel.org/r/20240520170349.2417900-1-xu.yang_2@nxp.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240522070442.17786-2-tiwai@suse.de
The commit 81033c6b58 ("ALSA: core: Warn on empty module")
introduced a WARN_ON() for a NULL module pointer passed at snd_card
object creation, and it also wraps the code around it with '#ifdef
MODULE'. This works in most cases, but the devils are always in
details. "MODULE" is defined when the target code (i.e. the sound
core) is built as a module; but this doesn't mean that the caller is
also built-in or not. Namely, when only the sound core is built-in
(CONFIG_SND=y) while the driver is a module (CONFIG_SND_USB_AUDIO=m),
the passed module pointer is ignored even if it's non-NULL, and
card->module remains as NULL. This would result in the missing module
reference up/down at the device open/close, leading to a race with the
code execution after the module removal.
For addressing the bug, move the assignment of card->module again out
of ifdef. The WARN_ON() is still wrapped with ifdef because the
module can be really NULL when all sound drivers are built-in.
Note that we keep 'ifdef MODULE' for WARN_ON(), otherwise it would
lead to a false-positive NULL module check. Admittedly it won't catch
perfectly, i.e. no check is performed when CONFIG_SND=y. But, it's no
real problem as it's only for debugging, and the condition is pretty
rare.
Fixes: 81033c6b58 ("ALSA: core: Warn on empty module")
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Closes: https://lore.kernel.org/r/20240520170349.2417900-1-xu.yang_2@nxp.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240522070442.17786-1-tiwai@suse.de
Lan966x is adding ptp traps to redirect the ptp frames to the CPU such
that the HW will not forward these frames anywhere. The issue is that in
case ptp is not enabled and the timestamping source is et to
HWTSTAMP_SOURCE_NETDEV then these traps would not be removed on the
error path.
Fix this by removing the traps in this case as they are not needed.
Fixes: 54e1ed69c4 ("net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240517135808.3025435-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With the following two conditions, bio will be lost:
1) blk plug is not enabled, for example, __blkdev_direct_IO_simple() and
__blkdev_direct_IO_async();
2) bio plug is enabled, for example write IO for raid1/raid10 while
bitmap is enabled;
Root cause is that blk_finish_plug() will add the bio to
curent->bio_list, while such bio will not be handled:
__submit_bio_noacct
current->bio_list = bio_list_on_stack;
blk_start_plug
do {
dm_submit_bio
md_handle_request
raid10_write_request
-> generate new bio for underlying disks
raid1_add_bio_to_plug -> bio is added to plug
} while ((bio = bio_list_pop(&bio_list_on_stack[0])))
-> previous bio are all handled
blk_finish_plug
raid10_unplug
raid1_submit_write
submit_bio_noacct
if (current->bio_list)
bio_list_add(¤t->bio_list[0], bio)
-> add new bio
current->bio_list = NULL
-> new bio is lost
Fix the problem by moving the plug into the while loop, so that
current->bio_list will still be handled after blk_finish_plug().
By the way, enable plug for raid1/raid10 in this case will also prevent
delay IO handling into daemon thread, which should also improve IO
performance.
Fixes: 060406c61c ("block: add plug while submitting IO")
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+Xsmzy2G9YuEatfMT6qv1M--YdOCQ0g7z7OVmcTbBxQAg@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Link: https://lore.kernel.org/r/20240521200308.983986-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.
The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:
[ 190.271762] ------------[ cut here ]------------
[ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[ 190.271789] Modules linked in: [...]
[ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[ 190.272023] Code: [...]
[ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 190.272077] Call Trace:
[ 190.272098] <TASK>
[ 190.272189] ring_buffer_resize+0x2ab/0x460
[ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0
[ 190.272206] tracing_resize_ring_buffer+0x65/0x90
[ 190.272216] tracing_entries_write+0x74/0xc0
[ 190.272225] vfs_write+0xf5/0x420
[ 190.272248] ksys_write+0x67/0xe0
[ 190.272256] do_syscall_64+0x82/0x170
[ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 190.272373] RIP: 0033:0x7f1bd657d263
[ 190.272381] Code: [...]
[ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[ 190.272412] </TASK>
[ 190.272414] ---[ end trace 0000000000000000 ]---
Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab79270
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.
The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():
ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
if (!ret)
goto spin;
for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */
__asm__ __volatile__ ("" : : : "memory");
rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
.. and then run the following commands on the target system:
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
while true; do
echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
done &
while true; do
for i in /sys/kernel/tracing/per_cpu/*; do
timeout 0.1 cat $i/trace_pipe; sleep 0.2
done
done
To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff2 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Adjust the following code documentation:
* Kernel-doc comments for ring_buffer_read_prepare() and
ring_buffer_read_finish() mention that recording to the ring buffer is
disabled when the read is active. Remove mention of this restriction
because it was already lifted in commit 1039221cc2 ("ring-buffer: Do
not disable recording when there is an iterator").
* Function ring_buffer_read_finish() performs a self-check of the
ring-buffer by locking cpu_buffer->reader_lock and then calling
rb_check_pages(). The preceding comment explains that the lock is
needed because rb_check_pages() clears the HEAD flag required by
readers which might be running in parallel. Remove this explanation
because commit 8843e06f67 ("ring-buffer: Handle race between
rb_move_tail and rb_check_pages") simplified the function so it no
longer resets the mentioned flag. Nonetheless, the lock is still
needed because a reader swapping a page into the ring buffer can make
the underlying doubly-linked list temporarily inconsistent.
This is a non-functional change.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-2-petr.pavlu@suse.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
- Add support for using the 'capstone' disassembler library in various tools,
such as 'perf script' and 'perf annotate'. This is an alternative for the
use of the 'xed' and 'objdump' disassemblers.
- Data-type profiling improvements:
Resolve types for a->b->c by backtracking the assignments until it finds
DWARF info for one of those members
Support for global variables, keeping a cache to speed up lookups.
Handle the 'call' instruction, dealing with effects on registers and handling
its return when tracking register data types.
Handle x86's segment based addressing like %gs:0x28, to support things like
per CPU variables, the stack canary, etc.
Data-type profiling got big speedups when using capstone for disassembling.
The objdump outoput parsing method is left as a fallback when capstone fails or
isn't available. There are patches posted for 6.11 that to use a LLVM
disassembler.
Support event group display in the TUI when annotating types with --data-type,
for instance to show memory load and store events for the data type fields.
Optimize the 'perf annotate' data structures, reducing memory usage.
Add a initial 'perf test' for 'perf annotate', checking that a target symbol
appears on the output, specifying objdump via the command line, etc.
- Integrate the shellcheck utility with the build of perf to allow catching
shell problems early in areas such as 'perf test', 'perf trace' scrape
scripts, etc.
- Add 'uretprobe' variant in the 'perf bench uprobe' tool.
- Add script to run instances of 'perf script' in parallel.
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
where those tracepoints aren't available.
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1.
- Add AMD's Zen 5 core and uncore events and metrics. Those come from the
"Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
document, with events that capture information on op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
Miscellaneous:
- Sync header copies with the kernel sources.
- Move some header copies used only for generating translation string tables
for ioctl cmds and other syscall integer arguments to a new directory under
tools/perf/beauty/, to separate from copies in tools/include/ that are used
to build the tools.
- Introduce scrape script for several syscall 'flags'/'mask' arguments.
- Improve cpumap utilization, fixing up pairing of refcounts, using the right
iterators (perf_cpu_map__for_each_cpu), etc.
- Give more details about raw event encodings in 'perf list', show tracepoint
encoding in the detailed output.
- Refactor the DSOs handling code, reducing memory usage.
- Document the BPF event modifier and add a 'perf test' for it.
- Improve the event parser, better error messages and add further 'perf test's
for it.
- Add reference count checking to 'struct comm_str' and 'struct mem_info'.
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
- Tweak the ARM64's Coresight 'perf test's.
- Improve ARM64's CoreSight ETM version detection and error reporting.
- Fix handling of symbols when using kcore.
- Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
machines in 'perf report'.
- Fix -g/--call-graph option failure in 'perf sched timehist'.
- Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
installed in non-standard directories, such as when doing cross builds.
- Various 'perf test' and 'perf bench' fixes.
- Improve 'perf probe' error message for long C++ probe names.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZkzdjgAKCRCyPKLppCJ+
J8ZYAP46rcbOuDWol6tjD9FDXd+spkWc40bnqeSnOR+TWlmJXwEA87XU4+3LAh6p
HQxKXehJRh90I90yn954mK2NuN+58Q0=
=sie+
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:
- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc
- Add 'uretprobe' variant in the 'perf bench uprobe' tool
- Add script to run instances of 'perf script' in parallel
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available
- Add Kan Liang to MAINTAINERS as a perf tools reviewer
- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers
Data-type profiling improvements:
- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members
- Support for global variables, keeping a cache to speed up lookups
- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types
- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc
- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler
- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields
- Optimize the 'perf annotate' data structures, reducing memory usage
- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1
- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX
Miscellaneous:
- Sync header copies with the kernel sources
- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools
- Introduce scrape script for several syscall 'flags'/'mask'
arguments
- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc
- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output
- Refactor the DSOs handling code, reducing memory usage
- Document the BPF event modifier and add a 'perf test' for it
- Improve the event parser, better error messages and add further
'perf test's for it
- Add reference count checking to 'struct comm_str' and 'struct
mem_info'
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust
- Tweak the ARM64's Coresight 'perf test's
- Improve ARM64's CoreSight ETM version detection and error reporting
- Fix handling of symbols when using kcore
- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'
- Fix -g/--call-graph option failure in 'perf sched timehist'
- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds
- Various 'perf test' and 'perf bench' fixes
- Improve 'perf probe' error message for long C++ probe names"
* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...
Hi Linus,
Please pull patches for 6.10. This includes:
- topology_span_sane() optimization from Kyle Meyer;
- fns() rework from Kuan-Wei Chiu (used in
cpumask_local_spread() and other places); and
- headers cleanup from Andy.
This also adds a MAINTAINERS record for bitops API as it's unattended,
and I'd like to follow it closer.
Thanks,
Yury
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmZKh/kACgkQsUSA/Tof
vshtSQv/eT5+KyXg5qCY3fLaIjWYD0uch5jxkdqtib5BncfIrUMsFpZBon+E2x9C
fWu7K/nfxUjKZF0Sfgl9gVns6K0rC4F24WzHjzWRVVV7+g4idXwMC1kxSX733KQC
o+D2065Dx9EmhnzypBbmNsGQsQ09WXP1GsJLf8qSGCw0lT1zNtgqsAD5sSogFGGn
ca9ZsndThuzTst5lXPXipt1W/c26frchh6SgjVTPjzALCDAf5r9Ls5np3AL1AW8X
yR8cuV9UphT1ysBplzPbBET/Fy/AGbZl1g4u72M6NvGy/nVkQ5Ic4HZj0zIem0Ic
C60PokY8lg6hQ7tWN8da12/g6WZINgZcfUfuodKiQAzryBGUJlW0aDzDUZPcCqB/
gmV/Op4RPJeQr9sibQ6nIFx73ydKVQEmZRliahzXR0p33HJCOLTATOeYqLTXQMdi
ZwhYCqG5fNEUK0VMBy8S4+tEsUAoykU21hFD04b/Ur8A49bxxJ9RDlAUC0IEc1Pj
fiU0VPFx
=H6BQ
-----END PGP SIGNATURE-----
Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- topology_span_sane() optimization from Kyle Meyer
- fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and
other places)
- headers cleanup from Andy
- add a MAINTAINERS record for bitops API
* tag 'bitmap-for-6.10v2' of https://github.com/norov/linux:
usercopy: Don't use "proxy" headers
bitops: Move aligned_byte_mask() to wordpart.h
MAINTAINERS: add BITOPS API record
bitmap: relax find_nth_bit() limitation on return value
lib: make test_bitops compilable into the kernel image
bitops: Optimize fns() for improved performance
lib/test_bitops: Add benchmark test for fns()
Compiler Attributes: Add __always_used macro
sched/topology: Optimize topology_span_sane()
cpumask: Add for_each_cpu_from()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkzp/gAKCRBZ7Krx/gZQ
63KFAQCsKv3XdcF+2BO+QuwPvR6eAvDxFjrFEcQFyyOXgFVLaAD/UMM0HcEFWxBb
PCPvyKVP22wF9PbodkrKJn8DRdtRZwM=
=jvWv
-----END PGP SIGNATURE-----
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted commits that had missed the last merge window..."
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
remove call_{read,write}_iter() functions
do_dentry_open(): kill inode argument
kernel_file_open(): get rid of inode argument
get_file_rcu(): no need to check for NULL separately
fd_is_open(): move to fs/file.c
close_on_exec(): pass files_struct instead of fdtable
We can easily have up to 24 flags with sane
atomicity, _without_ pushing anything out
of the first cacheline of struct block_device.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkznRwAKCRBZ7Krx/gZQ
69XpAQDOZCyvYOZ/dlMOKKLf2vAojC/h++E/NjvGt3erbvVN2wEArXMi13ECsoCw
JYJA3MsmvjuY6VNcm24icf2/p4TMIgo=
=JyYi
-----END PGP SIGNATURE-----
Merge tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull bdev flags update from Al Viro:
"Compactifying bdev flags.
We can easily have up to 24 flags with sane atomicity, _without_
pushing anything out of the first cacheline of struct block_device"
* tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
bdev: move ->bd_make_it_fail to ->__bd_flags
bdev: move ->bd_ro_warned to ->__bd_flags
bdev: move ->bd_has_subit_bio to ->__bd_flags
bdev: move ->bd_write_holder into ->__bd_flags
bdev: move ->bd_read_only to ->__bd_flags
bdev: infrastructure for flags
wrapper for access to ->bd_partno
Use bdev_is_paritition() instead of open-coding it
With the move to private task_work, SQPOLL neglected to also run the
normal task_work, if any is pending. This will eventually get run, but
we should run it with the private task_work to ensure that things like
a final fput() is processed in a timely fashion.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/313824bc-799d-414f-96b7-e6de57c7e21d@gmail.com/
Reported-by: Andrew Udvare <audvare@gmail.com>
Fixes: af5d68f889 ("io_uring/sqpoll: manage task_work privately")
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: Andrew Udvare <audvare@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Switch read and write software bits for PUDs
- Add missing hardware bits for PUDs and PMDs
- Generate unwind information for C modules to fix GDB unwind
error for vDSO functions
- Create .build-id links for unstripped vDSO files to enable
vDSO debugging with symbols
- Use standard stack frame layout for vDSO generated stack frames
to manually walk stack frames without DWARF information
- Rework perf_callchain_user() and arch_stack_walk_user() functions
to reduce code duplication
- Skip first stack frame when walking user stack
- Add basic checks to identify invalid instruction pointers when
walking stack frames
- Introduce and use struct stack_frame_vdso_wrapper within vDSO user
wrapper code to automatically generate an asm-offset define. Also
use STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to
document that the code works with user space stack
- Clear the backchain of the extra stack frame added by the vDSO user
wrapper code. This allows the user stack walker to detect and skip
the non-standard stack frame. Without this an incorrect instruction
pointer would be added to stack traces.
- Rewrite psw_idle() function in C to ease maintenance and further
enhancements
- Remove get_vtimer() function and use get_cpu_timer() instead
- Mark psw variable in __load_psw_mask() as __unitialized to avoid
superfluous clearing of PSW
- Remove obsolete and superfluous comment about removed TIF_FPU flag
- Replace memzero_explicit() and kfree() with kfree_sensitive() to
fix warnings reported by Coccinelle
- Wipe sensitive data and all copies of protected- or secure-keys
from stack when an IOCTL fails
- Both do_airq_interrupt() and do_io_interrupt() functions set
CIF_NOHZ_DELAY flag. Move it in do_io_irq() to simplify the code
- Provide iucv_alloc_device() and iucv_release_device() helpers,
which can be used to deduplicate more or less identical IUCV
device allocation and release code in four different drivers
- Make use of iucv_alloc_device() and iucv_release_device()
helpers to get rid of quite some code and also remove a
cast to an incompatible function (clang W=1)
- There is no user of iucv_root outside of the core IUCV code left.
Therefore remove the EXPORT_SYMBOL
- __apply_alternatives() contains a runtime check which verifies
that the size of the to be patched code area is even. Convert
this to a compile time check
- Increase size of buffers for sending z/VM CP DIAGNOSE X'008'
commands from 128 to 240
- Do not accept z/VM CP DIAGNOSE X'008' commands longer than
maximally allowed
- Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead
of IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe
reIPL block on 'scp_data' sysfs attribute update
- Initialize the correct fields of the NVMe dump block, which
were confused with FCP fields
- Refactor macros for 'scp_data' (re-)IPL sysfs attribute to
reduce code duplication
- Introduce 'scp_data' sysfs attribute for dump IPL to allow tools
such as dumpconf passing additional kernel command line parameters
to a stand-alone dumper
- Rework the CPACF query functions to use the correct RRE or RRF
instruction formats and set instruction register fields correctly
- Instead of calling BUG() at runtime force a link error during
compile when a unsupported opcode is used with __cpacf_query()
or __cpacf_check_opcode() functions
- Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask
or /sys/bus/ap/aqmask sysfs file update with a relative mask value
- Fix "bindings complete" udev event which should be sent once all AP
devices have been bound to device drivers and again when unbind/bind
actions take place and all AP devices are bound again
- Facility list alt_stfle_fac_list is nowhere used in the decompressor,
therefore remove it there
- Remove custom kprobes insn slot allocator in favour of the standard
module_alloc() one, since kernel image and module areas are located
within 4GB
- Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid
calling memset() with a large byte count and get rid of the sparse
warning as result
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZkyx2BccYWdvcmRlZXZA
bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8PYZAP9KxEfTyUmIh61Gx8+m3BW5dy7p
E2Q8yotlUpGj49ul+AD8CEAyTiWR95AlMOVZZLV/0J7XIjhALvpKAGfiJWkvXgc=
=pife
-----END PGP SIGNATURE-----
Merge tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Switch read and write software bits for PUDs
- Add missing hardware bits for PUDs and PMDs
- Generate unwind information for C modules to fix GDB unwind error for
vDSO functions
- Create .build-id links for unstripped vDSO files to enable vDSO
debugging with symbols
- Use standard stack frame layout for vDSO generated stack frames to
manually walk stack frames without DWARF information
- Rework perf_callchain_user() and arch_stack_walk_user() functions to
reduce code duplication
- Skip first stack frame when walking user stack
- Add basic checks to identify invalid instruction pointers when
walking stack frames
- Introduce and use struct stack_frame_vdso_wrapper within vDSO user
wrapper code to automatically generate an asm-offset define. Also use
STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document
that the code works with user space stack
- Clear the backchain of the extra stack frame added by the vDSO user
wrapper code. This allows the user stack walker to detect and skip
the non-standard stack frame. Without this an incorrect instruction
pointer would be added to stack traces.
- Rewrite psw_idle() function in C to ease maintenance and further
enhancements
- Remove get_vtimer() function and use get_cpu_timer() instead
- Mark psw variable in __load_psw_mask() as __unitialized to avoid
superfluous clearing of PSW
- Remove obsolete and superfluous comment about removed TIF_FPU flag
- Replace memzero_explicit() and kfree() with kfree_sensitive() to fix
warnings reported by Coccinelle
- Wipe sensitive data and all copies of protected- or secure-keys from
stack when an IOCTL fails
- Both do_airq_interrupt() and do_io_interrupt() functions set
CIF_NOHZ_DELAY flag. Move it in do_io_irq() to simplify the code
- Provide iucv_alloc_device() and iucv_release_device() helpers, which
can be used to deduplicate more or less identical IUCV device
allocation and release code in four different drivers
- Make use of iucv_alloc_device() and iucv_release_device() helpers to
get rid of quite some code and also remove a cast to an incompatible
function (clang W=1)
- There is no user of iucv_root outside of the core IUCV code left.
Therefore remove the EXPORT_SYMBOL
- __apply_alternatives() contains a runtime check which verifies that
the size of the to be patched code area is even. Convert this to a
compile time check
- Increase size of buffers for sending z/VM CP DIAGNOSE X'008' commands
from 128 to 240
- Do not accept z/VM CP DIAGNOSE X'008' commands longer than maximally
allowed
- Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead of
IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe reIPL
block on 'scp_data' sysfs attribute update
- Initialize the correct fields of the NVMe dump block, which were
confused with FCP fields
- Refactor macros for 'scp_data' (re-)IPL sysfs attribute to reduce
code duplication
- Introduce 'scp_data' sysfs attribute for dump IPL to allow tools such
as dumpconf passing additional kernel command line parameters to a
stand-alone dumper
- Rework the CPACF query functions to use the correct RRE or RRF
instruction formats and set instruction register fields correctly
- Instead of calling BUG() at runtime force a link error during compile
when a unsupported opcode is used with __cpacf_query() or
__cpacf_check_opcode() functions
- Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask
or /sys/bus/ap/aqmask sysfs file update with a relative mask value
- Fix "bindings complete" udev event which should be sent once all AP
devices have been bound to device drivers and again when unbind/bind
actions take place and all AP devices are bound again
- Facility list alt_stfle_fac_list is nowhere used in the decompressor,
therefore remove it there
- Remove custom kprobes insn slot allocator in favour of the standard
module_alloc() one, since kernel image and module areas are located
within 4GB
- Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid
calling memset() with a large byte count and get rid of the sparse
warning as result
* tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/zcrypt: Use kvcalloc() instead of kvmalloc_array()
s390/kprobes: Remove custom insn slot allocator
s390/boot: Remove alt_stfle_fac_list from decompressor
s390/ap: Fix bind complete udev event sent after each AP bus scan
s390/ap: Fix crash in AP internal function modify_bitmap()
s390/cpacf: Make use of invalid opcode produce a link error
s390/cpacf: Split and rework cpacf query functions
s390/ipl: Introduce sysfs attribute 'scp_data' for dump ipl
s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data'
s390/ipl: Fix incorrect initialization of nvme dump block
s390/ipl: Fix incorrect initialization of len fields in nvme reipl block
s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length
s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmds
s390/alternatives: Convert runtime sanity check into compile time check
s390/iucv: Unexport iucv_root
tty: hvc-iucv: Make use of iucv_alloc_device()
s390/smsgiucv_app: Make use of iucv_alloc_device()
s390/netiucv: Make use of iucv_alloc_device()
s390/vmlogrdr: Make use of iucv_alloc_device()
s390/iucv: Provide iucv_alloc_device() / iucv_release_device()
...
- Followup fix for the EFI boot sequence refactor, which may result in
physical KASLR putting the kernel in a region which is being used for
a special purpose via a command line argument.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZks6+wAKCRAwbglWLn0t
XCJzAQCOgl1bhfwU14U0KNjboeCcQKYrvD/6AmqM/UBsQyfB4gD+NuEAh2iKldeS
+5DuaWE3/sMPrZlTpatNF5E99mu2uAQ=
=Um+v
-----END PGP SIGNATURE-----
Merge tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
- Followup fix for the EFI boot sequence refactor, which may result in
physical KASLR putting the kernel in a region which is being used for
a special purpose via a command line argument.
* tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Omit physical KASLR when memory reservations exist
queue_limits_set() without DM core and targets first being updated
to set (and stack) discard limits in terms of max_hw_discard_sectors
and not max_discard_sectors.
- Fix stable@ DM integrity discard support to set device's
discard_granularity limit to the device's logical block size.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmZMv3kACgkQxSPxCi2d
A1pX2gf+NjaTP+6otMJ44sYwUGHuWtgwDy7NcoTiF4RvLQjjrUh8Lgm2p3i4evFs
4AzMNXy5V6/mEx/AZb7sjCtPkIGOykkBud3+jhStohQKvXEJQcYtwACNv151NW2c
Fw2tcPZbPKH8P8UkDkegLaCu+4DotYjhuw44dfHFVZ95Wlhcm2UmcjWQEf/LtYW0
Si2FE0z2n1yi1mY6cExuL0bJ+LaMrXRQzHE3ZPaPRn4PFNUTY1juKsHYbgyL7cZ+
xQM1W/KBrKzDztCJGKH4Cl86L3kNPRkiQ7BQ/2Wtse20o10EGT0+lPvevCNcPNpi
gbjAo7OPy9WPA9N/AR8Wfj06YQFaGA==
=Pp+x
-----END PGP SIGNATURE-----
Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM discard regressions due to DM core switching over to using
queue_limits_set() without DM core and targets first being updated to
set (and stack) discard limits in terms of max_hw_discard_sectors and
not max_discard_sectors
- Fix stable@ DM integrity discard support to set device's
discard_granularity limit to the device's logical block size
* tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: always manage discard support in terms of max_hw_discard_sectors
dm-integrity: set discard_granularity to logical block size
- Fix a memory leak in the exit path of amd-pstate (Peng Ma).
- Fix required_opp_tables handling in the cases when multiple generic
PM domains share one OPP table (Viresh Kumar).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZMYk8SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxdUIP/31SNqWSNni7aiUirnMIg60UbpM3jTMm
apkqHaSqdpL4ExM/Y8caSL9rIQD3P1GhWHy+0JVuK5tNiFH84oyMmKYd0v01QVy/
DLDs55XpPAgbkEjCcloe3QaR8mWgnQIXatjQkVoXZu3LAH6QLnDzT5w7C4HFXFEL
4gVlBvM/S69pf6FBG0SZhKLON3zryGHCBQMqOFcowJ8QGnfQsC+YzEWwvyncsDv4
nFV/lje/LwARlsxUfjc7EZmf8tRtfgkgSFm+3pI/EtQikKwI0ttheF8eE4MEtpxe
0A3xCV1aDq+tQEXgU7IIqVvsuu5xOcdfJ3rfYm5GmgZSh1RU1b80S7MJPWBJn0ip
PTjVZ3UyaW+YhGw5br2uXWK427PRhPHUyfCPO1UiSSDu0qZPNNHIjv5RQk336Sbt
NQUl7XO45Lv5XwGq8uqCXxnxN8LaNiOd+JZ6xm3R34Pc/vR6zEptOc/748WS2cxV
O2h1F6u0MiFqP9ieTcY23anrZqU//a61P8xz9e6oRfieUylOExG/aAsNrZ9Q2J6Z
T9mMj9OQlcbXIu7X2Ai6Fn3ppWR8/rVJu2i2WwrsGYkdnWuJ3BqdMOpnKw1438bb
2AC2Pc0hC22p8/tpqckcwSHJOOYCynOc7y/2/X7skcY+jkXg5zhPtBCx3GhCsEjV
RUj9L2CTorvS
=mGfT
-----END PGP SIGNATURE-----
Merge tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the amd-pstate driver and the operating performance point
(OPP) handling related to generic PM domains.
Specifics:
- Fix a memory leak in the exit path of amd-pstate (Peng Ma)
- Fix required_opp_tables handling in the cases when multiple generic
PM domains share one OPP table (Viresh Kumar)"
* tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
OPP: Fix required_opp_tables for multiple genpds using same table
cpufreq: amd-pstate: fix memory leak on CPU EPP exit
These make the ACPI EC driver always install the EC address space handler
at the root of the ACPI namespace which causes it to take care of all EC
operation regions everywhere, so in particular the custom EC address
space handler in the WMI driver is not needed any more and accordingly
it gets removed altogether.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZMYfoSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxyQEQAJst7Tn/3kMwGQXnalapUC/sKmabNNqq
9E6hlFMlvkev36QNEmPg102Bs+qzr3JVzydyoP+9P/rarw3pa1f2ab7JSQbV77HX
s+EJ75NKT4GFGx2uq3UGArFqLaPhm+jT4dqKES0f+Niw1urgkpCXl//j9CpbdgWM
6oIlkDePUZqZ6mleFgYI99jhqtwI3yWcHVNHO7Iw9F6fA6A+qmMLELJvgsyW4Yaw
E3CBhszMu4YeX5ovQxze/UYYdUMBf/JWel3iuzu9Gpt3IdWpocIjCgMXKp0/eekG
wR1PuGA7X/4Tv4pJ2KDMMGD9w5ff6hBj9iI35BACxjuje4mVQf3LLBKmfphOaTVo
NBFLTJW8gVyFkudEwmvuNBOezG/ltIXqLVI79qwWkYcLagSQeg4HiKXkv7pKzVwA
1kAcG3Vg3lrg3dO09Nu9xWzOHoLiJSROCqDA1onR+cmrvb1cJB05PGS3lvHwAQgJ
+dv1/+iIur3R0Yisc3pXKWz4q4wARjjafL/8QWBXlwNlo1PBVqv5Hbf3l1N15jnZ
lkl21RmoTRzw7p/yqiz0JIWXL+xqDLRQhko1wVAbdBSTEtlhCaBQZKx67rZ849Je
KtbSwupI5u++QuB2uYGJ+9+XqRoZ4m7LZOGGXXPjDrkJdBE72T9MJKiZohAMptVa
uOvE0AMszRiz
=Ho4A
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These make the ACPI EC driver always install the EC address space
handler at the root of the ACPI namespace which causes it to take care
of all EC operation regions everywhere.
This means that the custom EC address space handler in the WMI driver
is not needed any more and accordingly it gets removed altogether"
* tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
platform/x86: wmi: Remove custom EC address space handler
ACPI: EC: Install address space handler at the namespace root
- Fix and clean up the MediaTek lvts_thermal driver (Julien Panis).
- Prevent invalid trip point handling from triggering spurious trip
point crossing events and allow passive polling to stop when a
passive trip point involved in it becomes invalid (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZMYXoSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxWKsP/R4PjdTgXYKg1NOGjsBJB5kYUVJZZkgZ
Kr123fp2pmJZ7MI7AsoD3I6d0Z9Kfj2MEHgUYyU7anN+Ub+2JpQufK48CV/KoQSW
LSC2hJsgHqRaOmP9i+tNGvKY3lnRTCKlLGhK+uSNC5ghHaFJuF/3ihEXHm0RWm4N
w4VBN3y721wcHGNXCOeQRxM4dq6Z2Z2L5qWsgvEszPz9PwxflJ5ek2KfCyZDjtWu
zh7xhk6WQ8kHkft2XhrCD9/AVIESeBLO9arvHh3TXUiJHHijyY/oTOHFP6VZ2n/A
HZiJeRGKMNaBktkkuhCzQQbu0PpZ97GqCyrFc18wDFzdeiFSwd4vNYeeOBK+RyNo
tUrlfm9X1O5e4Rary9mQ7DW9LJJ6DJSao4FFH0+wh2cSfJtl2/03lg6/obmb8hXh
uuvKCLJUqoE6IF5bXL1943QWENGyCEHjvJWS46NuGSy+Ly8+khS49rPEfJg1Qj7g
F12+1GL0uryTDLbL5V+IOtjFtUE786j5bHrxHTsasIvzbNvbm/fHbP/7JJW3l/Wf
MDwMx7MaijNeHeO4cLGWnaWA4U9kaxsphYii7icuUjw55khqBbt+EmE3Npc/fVlM
mWVkEZBGWKKpZb2zww7WX6FgBSZb/SZjMoxTUu/2YQgrwwiMe1LIHUFC3U5OATVG
vKXn8ALYHSly
=EjP7
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These fix the MediaTek lvts_thermal driver and the handling of trip
points that start as invalid and are adjusted later by user space via
sysfs.
Specifics:
- Fix and clean up the MediaTek lvts_thermal driver (Julien Panis)
- Prevent invalid trip point handling from triggering spurious trip
point crossing events and allow passive polling to stop when a
passive trip point involved in it becomes invalid (Rafael Wysocki)"
* tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Fix the handling of invalid trip points
thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index
thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data
thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data