- Ensure perf events programmed to count during guest execution
are actually enabled before entering the guest in the nVHE
configuration.
- Restore out-of-range handler for stage-2 translation faults.
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches.
- Fix early handling of architectural VHE-only systems to ensure E2H is
appropriately set.
- Correct a format specifier warning in the arch_timer selftest.
- Make the KVM banner message correctly handle all of the possible
configurations.
RISC-V:
- Remove redundant semicolon in num_isa_ext_regs().
- Fix APLIC setipnum_le/be write emulation.
- Fix APLIC in_clrip[x] read emulation.
x86:
- Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID entries (old
vs. new) and ultimately neglects to clear PV_UNHALT from vCPUs with HLT-exiting
disabled.
- Documentation fixes for SEV.
- Fix compat ABI for KVM_MEMORY_ENCRYPT_OP.
- Fix a 14-year-old goof in a declaration shared by host and guest; the enabled
field used by Linux when running as a guest pushes the size of "struct
kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is really unconsequential
because KVM never consumes anything beyond the first 64 bytes, but the
resulting struct does not match the documentation.
Selftests:
- Fix spelling mistake in arch_timer selftest.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmYMOJYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP2zAf/Z7/cK0+yFSvm7/tsbWtjnWofad/p
82puu0V+8lZSjGVs3AydiDCV+FahvLS0QIwgrffVr4XA10Km5ZZMjZyJ3uH4xki/
VFFsDnZPdKuj55T0wwN7JFn0YVOMdtgcP0b+F8aMbkL0uoJXjutOMKNhssuW12kw
9cmPjaBWm/bfrfoTUUB9mCh0Ub3HKpguYwTLQuf6Fyn2FK7oORpt87Zi+oIKUn6H
pFXFtZYduLg6M2LXvZqsXZLXnvABPjANNWEhiiwrvuF/wmXXTwTpvRXlYXhCvpAN
q0AhxPhPm3NnsmRhEB6SmoMjXyZIByezcEiqAspBrUvEqs/2u6VyzFMrXw==
=PlsI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Ensure perf events programmed to count during guest execution are
actually enabled before entering the guest in the nVHE
configuration
- Restore out-of-range handler for stage-2 translation faults
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches
- Fix early handling of architectural VHE-only systems to ensure E2H
is appropriately set
- Correct a format specifier warning in the arch_timer selftest
- Make the KVM banner message correctly handle all of the possible
configurations
RISC-V:
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
x86:
- Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID
entries (old vs. new) and ultimately neglects to clear PV_UNHALT
from vCPUs with HLT-exiting disabled
- Documentation fixes for SEV
- Fix compat ABI for KVM_MEMORY_ENCRYPT_OP
- Fix a 14-year-old goof in a declaration shared by host and guest;
the enabled field used by Linux when running as a guest pushes the
size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is
really unconsequential because KVM never consumes anything beyond
the first 64 bytes, but the resulting struct does not match the
documentation
Selftests:
- Fix spelling mistake in arch_timer selftest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Rationalise KVM banner output
arm64: Fix early handling of FEAT_E2H0 not being implemented
KVM: arm64: Ensure target address is granule-aligned for range TLBI
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test
KVM: arm64: Fix out-of-IPA space translation fault handling
KVM: arm64: Fix host-programmed guest events in nVHE
RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
RISC-V: KVM: Remove second semicolon
KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
Documentation: kvm/sev: separate description of firmware
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled
KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES
...
Commit 08abce60d6 ("security: Introduce path_post_mknod hook")
introduced security_path_post_mknod(), to replace the IMA-specific call
to ima_post_path_mknod().
For symmetry with security_path_mknod(), security_path_post_mknod() was
called after a successful mknod operation, for any file type, rather
than only for regular files at the time there was the IMA call.
However, as reported by VFS maintainers, successful mknod operation does
not mean that the dentry always has an inode attached to it (for
example, not for FIFOs on a SAMBA mount).
If that condition happens, the kernel crashes when
security_path_post_mknod() attempts to verify if the inode associated to
the dentry is private.
Move security_path_post_mknod() where the ima_post_path_mknod() call was,
which is obviously correct from IMA/EVM perspective. IMA/EVM are the only
in-kernel users, and only need to inspect regular files.
Reported-by: Steve French <smfrench@gmail.com>
Closes: https://lore.kernel.org/linux-kernel/CAH2r5msAVzxCUHHG8VKrMPUKQHmBpE6K9_vjhgDa1uAvwx4ppw@mail.gmail.com/
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 08abce60d6 ("security: Introduce path_post_mknod hook")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
case is there only for the altenative in CALL_UNTRAIN_RET to have
a symbol to resolve.
However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
enabled, leads to the warning in patch_return() to fire:
missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
Put in a plain "ret" there so that gcc doesn't put a return thunk in
in its place which special and gets checked.
In addition:
ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
make: *** [Makefile:240: __sub-make] Chyba 2
since !SRSO builds would use the dummy return thunk as reported by
petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ice_port_vlan_on/off() was introduced in commit 2946204b3f ("ice:
implement bridge port vlan"). But ice_port_vlan_on() incorrectly assigns
ena_rx_filtering to inner_vlan_ops in DVM mode.
This causes an error when rx_filtering cannot be enabled in legacy mode.
Reproducer:
echo 1 > /sys/class/net/$PF/device/sriov_numvfs
ip link set $PF vf 0 spoofchk off trust on vlan 3
dmesg:
ice 0000:41:00.0: failed to enable Rx VLAN filtering for VF 0 VSI 9 during VF rebuild, error -95
Fixes: 2946204b3f ("ice: implement bridge port vlan")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Automatically cleaned up pointers need to be initialized before exiting
their scope. In this case, they need to be initialized to NULL before
any return statement.
Fixes: 90f821d72e ("ice: avoid unnecessary devm_ usage")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
IPA probe function was recently refactored to perform extra error checks
and make sure the thermal zone has trip points necessary for the IPA
operation. With this change, if a thermal zone is probed such that it
has no trip points that IPA can use, IPA will fail and the TZ won't be
created. This is the case if a platform defines a TZ without cooling
devices and only with "hot"/"critical" trip points, often found on some
Qualcomm devices [1].
Documentation across IPA code (notably get_governor_trips() kerneldoc)
suggests that IPA is supposed to handle such TZ even if it won't
actually do anything.
This commit partially reverts the previous change to allow IPA to bind
to such "empty" thermal zones.
Fixes: e83747c2f8 ("thermal: gov_power_allocator: Set up trip points earlier")
Link: arch/arm64/boot/dts/qcom/sc7180.dtsi#n4776 # [1]
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
IPA was recently refactored to split out memory allocation into a
separate funciton. That funciton was made to return -EINVAL if there is
zero power_actors and thus no memory to allocate. This causes IPA to
fail probing when the thermal zone has no attached cooling devices.
Since cooling devices can attach after the thermal zone is created and
the governer is attached to it, failing probe due to the lack of cooling
devices is incorrect.
Change the allocate_actors_buffer() to return success when there is no
cooling devices present.
Fixes: 912e97c67c ("thermal: gov_power_allocator: Move memory allocation out of throttle()")
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
vboxsf does not break leases on its own, so it can't properly handle the
case where the hypervisor changes the data. Don't allow file leases on
vboxsf.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240319-setlease-v1-1-5997d67e04b3@kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is
still 0.
So, in the error handling path, we will call ida_simple_remove(..., 0)
which is not allocated yet.
In order to prevent a spurious "ida_free called for id=0 which is not
allocated." message, tweak the error handling path and add a new label.
Fixes: 0fd1695766 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The SVE register sets have two different formats, one of which is a wrapped
version of the standard FPSIMD register set and another with actual SVE
register data. At present we check TIF_SVE to see if full SVE register
state should be provided when reading the SVE regset but if we were in a
syscall we may have saved only floating point registers even though that is
set.
Fix this and simplify the logic by checking and using the format which we
recorded when deciding if we should use FPSIMD or SVE format.
Fixes: 8c845e2731 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Cc: <stable@vger.kernel.org> # 6.2.x
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240325-arm64-ptrace-fp-type-v1-1-8dc846caf11f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The variable out_len is being used to accumulate the number of
bytes but it is not being used for any other purpose. The variable
is redundant and can be removed.
Cleans up clang scan build warning:
fs/vboxsf/utils.c:443:9: warning: variable 'out_len' set but not
used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240229225138.351909-1-colin.i.king@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Regular expression used to match the unit address part should not allow
non-hex numbers. Expect at least one hex digit as well.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240325104833.33372-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Regular expression used to match the unit address part should not allow
non-hex numbers.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240325104833.33372-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
TI Davinci remoteproc bindings were marked as work-in-progress /
unstable in 2017 in commit ae67b80078 ("dt-bindings: remoteproc: Add
bindings for Davinci DSP processors"). Almost seven years is enough, so
drop the "unstable" remark and expect usual ABI rules.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240224091236.10146-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Several TI SoC clock bindings were marked as work-in-progress / unstable
between 2013-2016, for example in commit f60b1ea5ea ("CLK: TI: add
support for gate clock"). It was enough of time to consider them stable
and expect usual ABI rules.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20240224091236.10146-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Keystone clock controller bindings were marked as work-in-progress /
unstable in 2013 in commit b9e0d40c0d ("clk: keystone: add Keystone
PLL clock driver") and commit 7affe5685c ("clk: keystone: Add gate
control clock driver") Almost eleven years is enough, so drop the
"unstable" remark and expect usual ABI rules.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240224091236.10146-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Align system call table on 8 bytes. With sys_call_table entry size
of 8 bytes that eliminates the possibility of a system call pointer
crossing cache line boundary.
Cc: stable@kernel.org
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In case of a sampling event, the PAI PMU device drivers need a
reference to this event. Currently to PMU device driver reference
is removed when a sampling event is destroyed. This may lead to
situations where the reference of the PMU device driver is removed
while being used by a different sampling event.
Reset the event reference pointer of the PMU device driver when
a sampling event is deleted and before the next one might be added.
Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
preempt_count-related functions are quite ubiquitous and may be called
by noinstr ones, introducing unwanted instrumentation. Here is one
example call chain:
irqentry_nmi_enter() # noinstr
lockdep_hardirqs_enabled()
this_cpu_read()
__pcpu_size_call_return()
this_cpu_read_*()
this_cpu_generic_read()
__this_cpu_generic_read_nopreempt()
preempt_disable_notrace()
__preempt_count_inc()
__preempt_count_add()
They are very small, so there are no significant downsides to
force-inlining them.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20240320230007.4782-3-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Atomic functions are quite ubiquitous and may be called by noinstr
ones, introducing unwanted instrumentation. They are very small, so
there are no significant downsides to force-inlining them.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20240320230007.4782-2-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The recently added check to figure out if a fault happened on gmap ASCE
dereferences the gmap pointer in lowcore without checking that it is not
NULL. For all non-KVM processes the pointer is NULL, so that some value
from lowcore will be read. With the current layouts of struct gmap and
struct lowcore the read value (aka ASCE) is zero, so that this doesn't lead
to any observable bug; at least currently.
Fix this by adding the missing NULL pointer check.
Fixes: 64c3431808 ("s390/entry: compare gmap asce to determine guest/host fault")
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When a gpiochip gets added by loading a module, then another driver may
be waiting for that gpiochip to load on the deferred-probe list.
If the deferred-probe for the consumer of gpiochip then triggers between
the gpiodev_add_to_list_unlocked() calls which makes gpio_device_find()
see the chip and the gpiochip_setup_dev() later then gpio_device_find()
does a kobject_get() on an uninitialized kobject since the kobject is
initialized by gpiochip_setup_dev() calling device_initialize():
arizona spi-10WM5102:00: cannot find GPIO chip arizona, deferring
arizona spi-10WM5102:00: cannot find GPIO chip arizona, deferring
------------[ cut here ]------------
kobject: 'gpiochip5' (00000000241466f2): is not initialized, yet kobject_get() is being called.
WARNING: CPU: 3 PID: 42 at lib/kobject.c:640 kobject_get+0x43/0x70
Call Trace:
kobject_get
gpio_device_find
gpiod_find_and_request
gpiod_get
snd_byt_wm5102_mc_probe
Not only is the device not initialized yet, but when the gpio-device is
added to the list things like the irqchip also have not been initialized
yet.
So gpio_device_find() should really ignore the gpio-device until
gpiochip_add_data_with_key() is fully done. Add a device_is_registered()
check to gpio_device_find() to ignore gpio-devices on the list which are
not yet fully initialized.
Fixes: aab5c6f200 ("gpio: set device type for GPIO chips")
Suggested-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
[Bartosz: fix a typo in commit message]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
If the RBUF logic is not reset when the kernel starts then there
may be some data left over from any network boot loader. If the
64-byte packet headers are enabled then this can be fatal.
Extend bcmgenet_dma_disable to do perform the reset, but not when
called from bcmgenet_resume in order to preserve a wake packet.
N.B. This different handling of resume is just based on a hunch -
why else wouldn't one reset the RBUF as well as the TBUF? If this
isn't the case then it's easy to change the patch to make the RBUF
reset unconditional.
See: https://github.com/raspberrypi/linux/issues/3850
See: https://github.com/raspberrypi/firmware/issues/1882
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Maarten Vanraes <maarten@rmail.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function pci1xxxx_spi_probe, there is a potential null pointer that
may be caused by a failed memory allocation by the function devm_kzalloc.
Hence, a null pointer check needs to be added to prevent null pointer
dereferencing later in the code.
To fix this issue, spi_bus->spi_int[iter] should be checked. The memory
allocated by devm_kzalloc will be automatically released, so just directly
return -ENOMEM without worrying about memory leaks.
Fixes: 1cc0cbea71 ("spi: microchip: pci1xxxx: Add driver for SPI controller of PCI1XXXX PCIe switch")
Signed-off-by: Huai-Yuan Liu <qq810974084@gmail.com>
Link: https://msgid.link/r/20240403014221.969801-1-qq810974084@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
devm_spi_alloc_controller will allocate an SPI controller and
automatically release a reference on it when dev is unbound from
its driver. It doesn't need to call spi_controller_put explicitly
to put the reference when lpspi driver failed initialization.
Fixes: 2ae0ab0143 ("spi: lpspi: Avoid potential use-after-free in probe()")
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://msgid.link/r/20240403084029.2000544-1-carlos.song@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
Fix this bug by adding 'iter' value check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 91c6945ea1 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MSR_PEBS_DATA_CFG MSR register is used to configure which data groups
should be generated into a PEBS record, and it's shared among all counters.
If there are different configurations among counters, perf combines all the
configurations.
The first perf command as below requires a complete PEBS record
(including memory info, GPRs, XMMs, and LBRs). The second perf command
only requires a basic group. However, after the second perf command is
running, the MSR_PEBS_DATA_CFG register is cleared. Only a basic group is
generated in a PEBS record, which is wrong. The required information
for the first perf command is missed.
$ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp &
$ sleep 5
$ perf record --per-thread -c 1 -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000
The first PEBS event is a system-wide PEBS event. The second PEBS event
is a per-thread event. When the thread is scheduled out, the
intel_pmu_pebs_del() function is invoked to update the PEBS state.
Since the system-wide event is still available, the cpuc->n_pebs is 1.
The cpuc->pebs_data_cfg is cleared. The data configuration for the
system-wide PEBS event is lost.
The (cpuc->n_pebs == 1) check was introduced in commit:
b6a32f023f ("perf/x86: Fix PEBS threshold initialization")
At that time, it indeed didn't hurt whether the state was updated
during the removal, because only the threshold is updated.
The calculation of the threshold takes the last PEBS event into
account.
However, since commit:
b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
we delay the threshold update, and clear the PEBS data config, which triggers
the bug.
The PEBS data config update scope should not be shrunk during removal.
[ mingo: Improved the changelog & comments. ]
Fixes: b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240401133320.703971-1-kan.liang@linux.intel.com
Tony encountered this OOPS when the last CPU of a domain goes
offline while running a kernel built with CONFIG_NO_HZ_FULL:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
...
RIP: 0010:__find_nth_andnot_bit+0x66/0x110
...
Call Trace:
<TASK>
? __die()
? page_fault_oops()
? exc_page_fault()
? asm_exc_page_fault()
cpumask_any_housekeeping()
mbm_setup_overflow_handler()
resctrl_offline_cpu()
resctrl_arch_offline_cpu()
cpuhp_invoke_callback()
cpuhp_thread_fun()
smpboot_thread_fn()
kthread()
ret_from_fork()
ret_from_fork_asm()
</TASK>
The NULL pointer dereference is encountered while searching for another
online CPU in the domain (of which there are none) that can be used to
run the MBM overflow handler.
Because the kernel is configured with CONFIG_NO_HZ_FULL the search for
another CPU (in its effort to prefer those CPUs that aren't marked
nohz_full) consults the mask representing the nohz_full CPUs,
tick_nohz_full_mask. On a kernel with CONFIG_CPUMASK_OFFSTACK=y
tick_nohz_full_mask is not allocated unless the kernel is booted with
the "nohz_full=" parameter and because of that any access to
tick_nohz_full_mask needs to be guarded with tick_nohz_full_enabled().
Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) with tick_nohz_full_enabled().
The latter ensures tick_nohz_full_mask can be accessed safely and can be
used whether kernel is built with CONFIG_NO_HZ_FULL enabled or not.
[ Use Ingo's suggestion that combines the two NO_HZ checks into one. ]
Fixes: a4846aaf39 ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow")
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com
Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmYMx28UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXO+CBAAuVp9uAfD7E/RTvcu9/uA1Y59LpFE
DzzwNyfOAXw1ZWMyVzwaKEEtlwB4PPC1Ojo0Jkxoctz8gKADb46ze3EZWTr0y9Lt
nbF4rQJDJUU2WVqQwzeJsYNCrxTmjQfgrxL+9tbrouFhikmKI0k0ogijz1aVTyWP
yG0v8gpvfNdHxwm05yXv5x+Fr4DyeHsV1AobHxu58X/NVGla0hb4XdfYViZRWYTB
/lySy/6hRooIxRxC+ruE4lLknQJbZz9nxyJcujAy2ylld52vVlyZSIrxglDi3ux7
CJOqZ4paxJWhFNRd2PbJVy8lnJYo6iJve/LpYNvaqzrba4+Ginn08u3LIwmiztno
iJpH1TcYf9oVwefbQaXU0q0jHNyc6o/W/LISAMdcT0cMIO+gBg1jXpAsVoZBcK7u
cXkJKRiWKKz3D2UX+Aky9e0GxFTyMhyku1d6pJ7lY82lNzwCqGR9skN+A+OYAv0F
S8KSsigh2cWrDtRKrJnnpGj09cODhgRW6bityXjdZ5+b/m2TFcfMKtWW/P6WxNfW
Lh00feiZxB8WC+h5D1KmEDlyC77Eo4pGlR9JfwFznYJR5W8yiomLyy3pY/qXB5EO
57Bq9rniEVv/tl64kXORpPBZzaC/ApxAtUhExwzdz/zpe9yozyVp0bQ1zEvGEXZF
HXTAzisjG8ecXmU=
=B8HM
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single patch for SELinux to fix a problem where we could potentially
dereference an error pointer if we failed to successfully mount
selinuxfs"
* tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: avoid dereference of garbage after mount failure
On some boards with this chip version the BIOS is buggy and misses
to reset the PHY page selector. This results in the PHY ID read
accessing registers on a different page, returning a more or
less random value. Fix this by resetting the page selector first.
Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/64f2055e-98b8-45ec-8568-665e3d54d4e6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we look up the kbuf, ensure that it doesn't get unregistered until
after we're done with it. Since we're inside mmap, we cannot safely use
the io_uring lock. Rely on the fact that we can lookup the buffer list
under RCU now and grab a reference to it, preventing it from being
unregistered until we're done with it. The lookup returns the
io_buffer_list directly with it referenced.
Cc: stable@vger.kernel.org # v6.4+
Fixes: 5cf4f52e6d ("io_uring: free io_buffer_list entries via RCU")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No functional changes in this patch, just in preparation for being able
to keep the buffer list alive outside of the ctx->uring_lock.
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that xarray is being exclusively used for the buffer_list lookup,
this check is no longer needed. Get rid of it and the is_ready member.
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just rely on the xarray for any kind of bgid. This simplifies things, and
it really doesn't bring us much, if anything.
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 82dfb540ae ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
Fixes: 82dfb540ae ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: stable@vge.kernel.org
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge series from Zhang Yi <zhangyi@everest-semi.com>:
We solved some issues related to headphone detection.And for using
the same configuration in different power conditions,we modified the
clock table
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmYMH1YPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YgacH/3YREzLrjQz3JMBg1Hm4E+isKJZwfPYBc1I6
jpo5s8mdaJeOevByXYPy1ckiXLtH2OJZaUzTKCyvwS4GBlvw2/ipgdbiR3vY1sXR
XJVUX+88KNWcH0XVDb0aB1tMGo/Wtx6YQ5Zlt+5SQ3aMqwJYzCMIIfjzOq7OJl42
lezW7c6MM+8xhEAfISaaGgtAM/2yuRA41cPs4AxEXH0hT9YPQs7/qy+yN7DDTO1r
E3PVZjW6Cx1/gS0C7DsxTFrh+OjluE54SCQTJO0hi//hqvHBhA1iDEDy/wo9wvLS
G+NYqpTKcb0x9PX3cEH9EsX5ZYUNCrL4qyvMtWEzEy3Z7dmhm8Y=
=6UcE
-----END PGP SIGNATURE-----
Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Four small documentation fixes"
* tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
docs: zswap: fix shell command format
tracing: Fix documentation on tp_printk cmdline option
docs: Fix bitfield handling in kernel-doc
Documentation: dev-tools: Add link to RV docs
Some laptops where the thermal control is handled by the EC may
provide trip points that fail the kernels new validation, but still have
working temperature sensors. An example of this is the Framework 13 AMD.
This patch allows the thermal zone to still be registered without trip
points if the trip points fail validation, allowing the temperature
sensor to be viewed and used by the user.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218586
Fixes: 9c8647224e ("ACPI: thermal: Use library functions to obtain trip point temperature values")
Signed-off-by: Stephen Horvath <s.horvath@outlook.com.au>
[ rjw: Subject edits, remove redundant braces ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lots of fixes for situations with extreme filesystem damage. One fix
("Fix journal pins in btree write buffer") applicable to normal usage;
also a dio performance fix.
New repair/construction code is in the final stages, should be ready in
about a week. Anyone that lost btree interior nodes (or a variety of
other damage) as a result of the splitbrain bug will be able to repair
then.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYLMPsACgkQE6szbY3K
bnYVvxAAhNgoYTsjPbA8sjCtLIsEflz76BvNT7CAVB9QaF0Em/UJvKpIJ30JkNTj
j7N8XxvRJmreSKbKGeWHRcAejHvu7bky+SCHKDHyYHxmPLlcEkwSuXcR0fYnMAQ8
Ne4ELpL0jmWOS1QHds8v8O0SP+SgYEe8E1Pryz88kLL1eWJz348RWQkg6DtsVAyO
DySr8NRntZQyRo5C9H6iEcnLdG2snhKy+AOVDIySn9P5mLuaPRSANPNNT+Kss79p
z62ZwB7So6SE23LPAUQ4HaJoGtaJlB/gxNd8J8ma3JybbEcz4PmcyVIfN3A62FVi
gOUzd1pi8/NjOvtzojghvJ1+8zxD4kmZnoX5qu+Jx3rIICplQ6u9rYUiwTQRxYbw
QDeJkwmBdQFosl6iG+ji26ui0yZO1GNQpu2XCCv7JSVLddgNZLRb1v+b7uQzuYLA
7gQTYYXF+1g/WK3se3NlFVsPV+keqPFX2pYX1ySptLLr3QD5SX6d2SJIkNb4oV6c
T+1YA7BjGIzgSy4ZE/Q1jVQCKnIYYsW5bL9mvh/q2SSUfMc3uSUMRM4zsRCW6djB
SQKehKVAZBGUNgB5WOFslEUKwUPnGGfO1YAXyqumf1tkSs59CI5NLZfTQFaDFOND
2iS9HmxE4zdOckaM0eBkhAN349YJSaVZwD3C4Nb+qHjzT50ly7s=
=OLvh
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of fixes for situations with extreme filesystem damage.
One fix ("Fix journal pins in btree write buffer") applicable to
normal usage; also a dio performance fix.
New repair/construction code is in the final stages, should be ready
in about a week. Anyone that lost btree interior nodes (or a variety
of other damage) as a result of the splitbrain bug will be able to
repair then"
* tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
bcachefs: On emergency shutdown, print out current journal sequence number
bcachefs: Fix overlapping extent repair
bcachefs: Fix remove_dirent()
bcachefs: Logged op errors should be ignored
bcachefs: Improve -o norecovery; opts.recovery_pass_limit
bcachefs: bch2_run_explicit_recovery_pass_persistent()
bcachefs: Ensure bch_sb_field_ext always exists
bcachefs: Flush journal immediately after replay if we did early repair
bcachefs: Resume logged ops after fsck
bcachefs: Add error messages to logged ops fns
bcachefs: Split out recovery_passes.c
bcachefs: fix backpointer for missing alloc key msg
bcachefs: Fix bch2_btree_increase_depth()
bcachefs: Kill bch2_bkey_ptr_data_type()
bcachefs: Fix use after free in check_root_trans()
bcachefs: Fix repair path for missing indirect extents
bcachefs: Fix use after free in bch2_check_fix_ptrs()
bcachefs: Fix btree node keys accounting in topology repair path
bcachefs: Check btree ptr min_key in .invalid
bcachefs: add REQ_SYNC and REQ_IDLE in write dio
...
mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.
Expected mean_and_variance_get_mean(mv) == mean[i], but
mean_and_variance_get_mean(mv) == 22 (0x16)
mean[i] == 10 (0xa)
Drop the bad tests.
Fixes: 65bc410907 ("mean and variance: More tests")
Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>