Conntrack assumes an unconfirmed entry (not yet committed to global hash
table) has a refcount of 1 and is not visible to other cores.
With multicast forwarding this assumption breaks down because such
skbs get cloned after being picked up, i.e. ct->use refcount is > 1.
Likewise, bridge netfilter will clone broad/mutlicast frames and
all frames in case they need to be flood-forwarded during learning
phase.
For ip multicast forwarding or plain bridge flood-forward this will
"work" because packets don't leave softirq and are implicitly
serialized.
With nfqueue this no longer holds true, the packets get queued
and can be reinjected in arbitrary ways.
Disable this feature, I see no other solution.
After this patch, nfqueue cannot queue packets except the last
multicast/broadcast packet.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Fix failure to start guests with kvm.use_gisa=0
* Panic if (un)share fails to maintain security.
ARM:
* Use kvfree() for the kvmalloc'd nested MMUs array
* Set of fixes to address warnings in W=1 builds
* Make KVM depend on assembler support for ARMv8.4
* Fix for vgic-debug interface for VMs without LPIs
* Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
* Minor code / comment cleanups for configuring PAuth traps
* Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
* Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
* Fix smatch issues
* Small cleanups
* Make x2APIC ID 100% readonly
* Fix typo in uapi constant
Generic:
* Use synchronize_srcu_expedited() on irqfd shutdown
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAma85nQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPq5gf8DVZjs2yNwdPLAUN7AOpElsobFnVN
etJ1V09XsfSYMIAV6ksvC1VBzHpyOJN2QKuF0nfIiISsV8W9xLcV2rKIodsGBvdV
K5ODL/yxeYI27t6Uferra1AGlchtn3tlpZzVarZIgRvZa3NMXaQPYJdcQr2Oybou
7hZsboMTl6jaSl6NELzcBRksfkcOqQLQoUqVBqlkBTM3yyFRmV85BisRkOWCIBfA
9c+kn7ZWBfOqYaXjiLeCrdCKrUuFLfR3ejJibYFan5MULYHIL95W/WJo9uQJ1QBr
BNjMfmtVZ2JOWya40uUSKrvxJ0IErAlMNgmnpjeA4cBqYK5GRHUKvO5jNw==
=A8SH
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix failure to start guests with kvm.use_gisa=0
- Panic if (un)share fails to maintain security.
ARM:
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
- Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
- Fix smatch issues
- Small cleanups
- Make x2APIC ID 100% readonly
- Fix typo in uapi constant
Generic:
- Use synchronize_srcu_expedited() on irqfd shutdown"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
KVM: x86: Make x2APIC ID 100% readonly
KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
KVM: SVM: Fix an error code in sev_gmem_post_populate()
KVM: SVM: Fix uninitialized variable bug
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
s390/uv: Panic for set and remove shared access UVC errors
KVM: s390: fix validity interception issue when gisa is switched off
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
...
Commit 264640fc2c ("ipv6: distinguish frag queues by device
for multicast and link-local packets") modified the ipv6 fragment
reassembly logic to distinguish frag queues by device for multicast
and link-local packets but in fact only the main reassembly code
limits the use of the device to those address types and the netfilter
reassembly code uses the device for all packets.
This means that if fragments of a packet arrive on different interfaces
then netfilter will fail to reassemble them and the fragments will be
expired without going any further through the filters.
Fixes: 648700f76b ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
the UAPI, keep the current definition and add a new one with the fix.
Fix-suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Amit Shah <amit.shah@amd.com>
Message-ID: <20240814083113.21622-1-amit@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation. Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes. Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC. And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.
Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0. But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.
Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots is really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.
Fixes: 26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
Fixes: 1dfe571c12 ("KVM: SEV: Add initial SEV-SNP support")
Cc: Peter Gonda <pgonda@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerly Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240809190319.1710470-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAma8yoAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOJWw/8D8ZhOLdK9HFKpywIA855CHXbTVHz
VSM9avEykSHTPqWhGx9dqaS8hG/xG7Fh5t3Me2goh0E7dAZjJQCGf/PXyh8Ntotj
5QJB3CEzi9o4yM5yCgNgEYtnVdP52Y8SxfzE9TQWST7EbWsrWDjE68a/08h0cRkr
XTScFIB+SVHOJGBcdSKdYyVqnpgoE8OeWOxVa2OnThug8VPB5WMmJIPkMQ2ykOR4
4dGykI8ULNSiZ+0skfdhxXaykYDdajzCSrk21OP2iaD/41X+I627HaJzCxa3m9s0
z04XtaOV31Jye8McPF6dCHtloFQvZ/deXXeow2yytR7WnT5/KDwBt+8A7mDcKL0V
uhW9DiwxCit6Pao4a/YuEerXqxSH4DdJmTCbqrHITnQWdAW9UQ0jsY1msxwal/aq
l9kK1SJoB3/5A0Dcsc57TFCvh2dhwRnAU3Ik00SCW2LlZrsCmBSPMRCsTHDBVg49
R5ya6qFB9+KEcAjZgJqa15X8WC8cmx+/96mDScd6VjS2Eyh57tZmJgonjlUwJplH
A1erboJtlo2NHzMaxmIXEsaxJYvrOcwKDeQ80H81SBIDPLweW2FYDVZpp4/nx++2
+8myHz9KM7iWFVceJuqvNRix4iR3RkXxvgocm1CzUedgXYUJOVM/li+0pjgJ/bUM
gPo7Sr3BAk9T43k=
=B712
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Fix a xperms counting problem where we adding to the xperms count
even if we failed to add the xperm.
- Propogate errors from avc_add_xperms_decision() back to the caller so
that we can trigger the proper cleanup and error handling.
- Revert our use of vma_is_initial_heap() in favor of our older logic
as vma_is_initial_heap() doesn't correctly handle the no-heap case
and it is causing issues with the SELinux process/execheap access
control. While the older SELinux logic may not be perfect, it
restores the expected user visible behavior.
Hopefully we will be able to resolve the problem with the
vma_is_initial_heap() macro with the mm folks, but we need to fix
this in the meantime.
* tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: revert our use of vma_is_initial_heap()
selinux: add the processing of the failure of avc_add_xperms_decision()
selinux: fix potential counting error in avc_add_xperms_decision()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZrym4AAKCRCRxhvAZXjc
oqT3AP9ydoUNavaZcRayH8r3ybvz9+aJGJ6Q7NznFVCk71vn0gD/buLzmq96Muns
M5DWHbft2AFwK0Rz2nx8j5OXUeHwrQg=
=HZBL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"VFS:
- Fix the name of file lease slab cache. When file leases were split
out of file locks the name of the file lock slab cache was used for
the file leases slab cache as well.
- Fix a type in take_fd() helper.
- Fix infinite directory iteration for stable offsets in tmpfs.
- When the icache is pruned all reclaimable inodes are marked with
I_FREEING and other processes that try to lookup such inodes will
block.
But some filesystems like ext4 can trigger lookups in their inode
evict callback causing deadlocks. Ext4 does such lookups if the
ea_inode feature is used whereby a separate inode may be used to
store xattrs.
Introduce I_LRU_ISOLATING which pins the inode while its pages are
reclaimed. This avoids inode deletion during inode_lru_isolate()
avoiding the deadlock and evict is made to wait until
I_LRU_ISOLATING is done.
netfs:
- Fault in smaller chunks for non-large folio mappings for
filesystems that haven't been converted to large folios yet.
- Fix the CONFIG_NETFS_DEBUG config option. The config option was
renamed a short while ago and that introduced two minor issues.
First, it depended on CONFIG_NETFS whereas it wants to depend on
CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
does. Second, the documentation for the config option wasn't fixed
up.
- Revert the removal of the PG_private_2 writeback flag as ceph is
using it and fix how that flag is handled in netfs.
- Fix DIO reads on 9p. A program watching a file on a 9p mount
wouldn't see any changes in the size of the file being exported by
the server if the file was changed directly in the source
filesystem. Fix this by attempting to read the full size specified
when a DIO read is requested.
- Fix a NULL pointer dereference bug due to a data race where a
cachefiles cookies was retired even though it was still in use.
Check the cookie's n_accesses counter before discarding it.
nsfs:
- Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
the kernel is writing to userspace.
pidfs:
- Prevent the creation of pidfds for kthreads until we have a
use-case for it and we know the semantics we want. It also confuses
userspace why they can get pidfds for kthreads.
squashfs:
- Fix an unitialized value bug reported by KMSAN caused by a
corrupted symbolic link size read from disk. Check that the
symbolic link size is not larger than expected"
* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Squashfs: sanity check symbolic link size
9p: Fix DIO read through netfs
vfs: Don't evict inode under the inode lru traversing context
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
file: fix typo in take_fd() comment
pidfd: prevent creation of pidfds for kthreads
netfs: clean up after renaming FSCACHE_DEBUG config
libfs: fix infinite directory reads for offset dir
nsfs: fix ioctl declaration
fs/netfs/fscache_cookie: add missing "n_accesses" check
filelock: fix name of file_lease slab cache
netfs: Fault in smaller chunks for non-large folio mappings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAma75lcACgkQ6rmadz2v
bToAARAAjGtU7TXWr7mRO9IuwjhbnIv4STWsfZQAOm9qIVxl72vqE6YQVISSI6+l
lBgVbrzvrd5DhO3y/xz5/4e/uS2zrS566jiwI+g7rvHYTI4CWBXrfewXIwPEjprT
nXntq+tRM7DpdiZk3KmO3nf/CQyj0Pft5AJQR3B3fmehoqjrhDNk6YnSum1SoRPx
yBparmWpIn4ysk2y0RCB4bpzjQIxCQwhYAVpQeiFar7blfWMKGYi+/ti0fPnvWFG
atUU7JZhIBaXcDfVEC82vECd/CkxZpWrgtv9kgp3b75ylE3vhR6C9Kq2nd0zBOFU
+wqrAwStLRDFnv+WkAHkPH2vIXQ+9ACwefINYrzzjFmoQr75jEYlo7oal2GU7sQI
Y5M9qGW6HcVUtw9soGT5eoqlwK2Uot/FpvsAVKCZq6TcniCGDeQomXEy6Rmc5aSz
HHIbmYSsSwuoWkXZ8Gn2He2QKTmGSG9d3EII8Ge6C7Ev2Rx1ARLZKG+jAy+igMfn
QDL9capp3LSTvbPGYU1z0llN6nW4u9JrkCPjFdopwM8RZoXphArtZMki87StXmH/
eZxA6PEJcFdBhCUkge14s/D7YW+dw05YBRsJ6QnEQULHx6pd77jlhT4Sv+U/SGL/
ghtmqIdT8JreL4e4dCS/sH7fkaT9ZmO12Awi3dPz8cs/B5LwdWY=
=sGTX
-----END PGP SIGNATURE-----
Merge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix bpftrace regression from Kyle Huey.
Tracing bpf prog was called with perf_event input arguments causing
bpftrace produce garbage output.
- Fix verifier crash in stacksafe() from Yonghong Song.
Daniel Hodges reported verifier crash when playing with sched-ext.
The stack depth in the known verifier state was larger than stack
depth in being explored state causing out-of-bounds access.
- Fix update of freplace prog in prog_array from Leon Hwang.
freplace prog type wasn't recognized correctly.
* tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
perf/bpf: Don't call bpf_overflow_handler() for tracing events
selftests/bpf: Add a test to verify previous stacksafe() fix
bpf: Fix a kernel verifier crash in stacksafe()
bpf: Fix updating attached freplace prog in prog_array map
This reverts commit 28ab976911.
Sense data can be in either fixed format or descriptor format.
SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."
The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)
Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).
Commit 28ab976911 ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.
However, after commit 28ab976911 ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.
These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).
Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.
Cc: stable@vger.kernel.org # 4.19+
Reported-by: Stephan Eisvogel <eisvogel@seitics.de>
Reported-by: Christian Heusel <christian@heusel.eu>
Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heusel.eu/
Fixes: 28ab976911 ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240813131900.1285842-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
This patch is based on the discussions between Neal Cardwell and
Eric Dumazet in the link
https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
It was correctly pointed out that tp->window_clamp would not be
updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
(copied <= tp->rcvq_space.space). While it is expected for most
setups to leave the sysctl enabled, the latter condition may
not end up hitting depending on the TCP receive queue size and
the pattern of arriving data.
The updated check should be hit only on initial MSS update from
TCP_MIN_MSS to measured MSS value and subsequently if there was
an update to a larger value.
Fixes: 05f76b2d63 ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a0821ca14b ("media: atomisp: Remove test pattern generator (TPG)
support") broke BYT support because it removed a seemingly unused field
from struct sh_css_sp_config and a seemingly unused value from enum
ia_css_input_mode.
But these are part of the ABI between the kernel and firmware on ISP2400
and this part of the TPG support removal changes broke ISP2400 support.
ISP2401 support was not affected because on ISP2401 only a part of
struct sh_css_sp_config is used.
Restore the removed field and enum value to fix this.
Fixes: a0821ca14b ("media: atomisp: Remove test pattern generator (TPG) support")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Keep the URL in a single line for easier copy-pasting.
Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://patch.msgid.link/20240812124346.597702-1-enguerrand.de-ribaucourt@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The value passed as the l3_proto argument of mvneta_txq_desc_csum()
is __be16. And mvneta_txq_desc_csum uses this parameter as a __be16
value. So use __be16 as the type for the parameter, rather than
type with host byte order.
Flagged by Sparse as:
.../mvneta.c:1796:25: warning: restricted __be16 degrades to integer
.../mvneta.c:1979:45: warning: incorrect type in argument 2 (different base types)
.../mvneta.c:1979:45: expected int l3_proto
.../mvneta.c:1979:45: got restricted __be16 [usertype] l3_proto
No functional change intended.
Flagged by Sparse.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Wojtas <marcin.s.wojtas@gmail.com>
Link: https://patch.msgid.link/20240812-mvneta-be16-v1-1-e1ea12234230@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
`fec_ptp_pps_perout()` reimplements logic already
in `fec_ptp_read()`. Replace with function call.
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240812094713.2883476-2-csokas.bence@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ssn_offset field is u32 and is placed into the netlink response with
nla_put_u32(), but only 2 bytes are reserved for the attribute payload
in subflow_get_info_size() (even though it makes no difference
in the end, as it is aligned up to 4 bytes). Supply the correct
argument to the relevant nla_total_size() call to make it less
confusing.
Fixes: 5147dfb508 ("mptcp: allow dumping subflow context to userspace")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240812065024.GA19719@asgard.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'struct config_item_type' is not modified in this driver.
This structure is only used with config_group_init_type_name() which takes
a const struct config_item_type* as a 3rd argument.
This also makes things consistent with 'netconsole_target_type' witch is
already const.
Constifying this structure moves some data to a read-only section, so
increase overall security, especially when the structure holds some
function pointers.
On a x86_64, with allmodconfig:
Before:
======
text data bss dec hex filename
33007 3952 1312 38271 957f drivers/net/netconsole.o
After:
=====
text data bss dec hex filename
33071 3888 1312 38271 957f drivers/net/netconsole.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/9c205b2b4bdb09fc9e9d2cb2f2936ec053da1b1b.1723325900.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a device option to inform the driver about the hash key size and
hash table size used by the device. This information will be stored and
made available for RSS ethtool operations.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240812222013.1503584-2-pkaligineedi@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- binfmt_flat: Fix corruption when not offsetting data start
- exec: Fix ToCToU between perm check and set-uid/gid usage
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZrvBmwAKCRA2KwveOeQk
u9Z2AQDgtypF4Kficiwn9BwZL5OxCxr9XnFuCYUue8Ufzm2WdQEAwhl1tcbs+Auf
VAqpr6gZTRlpBjbl55LHeyMdxIbwhg4=
=L28M
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- binfmt_flat: Fix corruption when not offsetting data start
- exec: Fix ToCToU between perm check and set-uid/gid usage
* tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Replace additionalProperties with unevaluatedProperties because it have
allOf: $ref: ethernet-controller.yaml#.
Remove all properties, which already defined in ethernet-controller.yaml.
Fixed below CHECK_DTBS warnings:
arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dtb:
fsl-mc@80c000000: dpmacs:ethernet@11: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/misc/fsl,qoriq-mc.yaml#
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240811184049.3759195-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct spelling problems for Documentation/networking/ as reported
by ispell.
Signed-off-by: Jing-Ping Jan <zoo868e@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240812170910.5760-1-zoo868e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for offloading cls U32 filters. Only "skbedit queue_mapping"
and "drop" actions are supported. Also, only "ip" and "802_3" tc
protocols are allowed. The PF must advertise the VIRTCHNL_VF_OFFLOAD_TC_U32
capability flag.
Since the filters will be enabled via the FD stage at the PF, a new type
of FDIR filters is added and the existing list and state machine are used.
The new filters can be used to configure flow directors based on raw
(binary) pattern in the rx packet.
Examples:
0. # tc qdisc add dev enp175s0v0 ingress
1. Redirect UDP from src IP 192.168.2.1 to queue 12:
# tc filter add dev <dev> protocol ip ingress u32 \
match u32 0x45000000 0xff000000 at 0 \
match u32 0x00110000 0x00ff0000 at 8 \
match u32 0xC0A80201 0xffffffff at 12 \
match u32 0x00000000 0x00000000 at 24 \
action skbedit queue_mapping 12 skip_sw
2. Drop all ICMP:
# tc filter add dev <dev> protocol ip ingress u32 \
match u32 0x45000000 0xff000000 at 0 \
match u32 0x00010000 0x00ff0000 at 8 \
match u32 0x00000000 0x00000000 at 24 \
action drop skip_sw
3. Redirect ICMP traffic from MAC 3c:fd:fe:a5:47:e0 to queue 7
(note proto: 802_3):
# tc filter add dev <dev> protocol 802_3 ingress u32 \
match u32 0x00003CFD 0x0000ffff at 4 \
match u32 0xFEA547E0 0xffffffff at 8 \
match u32 0x08004500 0xffffff00 at 12 \
match u32 0x00000001 0x000000ff at 20 \
match u32 0x0000 0x0000 at 40 \
action skbedit queue_mapping 7 skip_sw
Notes on matches:
1 - All intermediate fields that are needed to parse the correct PTYPE
must be provided (in e.g. 3: Ethernet Type 0x0800 in MAC, IP version
and IP length: 0x45 and protocol: 0x01 (ICMP)).
2 - The last match must provide an offset that guarantees all required
headers are accounted for, even if the last header is not matched.
For example, in #2, the last match is 4 bytes at offset 24 starting
from IP header, so the total is 14 (MAC) + 24 + 4 = 42, which is the
sum of MAC+IP+ICMP headers.
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation for a second type of FDIR filters that can be added by
tc-u32, move the add/del of the FDIR logic to be entirely contained in
iavf_fdir.c.
The iavf_find_fdir_fltr_by_loc() is renamed to iavf_find_fdir_fltr()
to be more agnostic to the filter ID parameter (for now @loc, which is
relevant only to current FDIR filters added via ethtool).
The FDIR filter deletion is moved from iavf_del_fdir_ethtool() in
ethtool.c to iavf_fdir_del_fltr(). While at it, fix a minor bug where
the "fltr" is accessed out of the fdir_fltr_lock spinlock protection.
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable VFs to create FDIR filters from raw binary patterns.
The corresponding processes for raw flow are added in the
Parse / Create / Destroy stages.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The SWAP Flag in the FDIR Programming Descriptor doesn't work properly,
it is always set and cannot be unset (hardware bug). Thus, add a method
to effectively disable the FDIR SWAP option by setting the FDSWAP instead
of FDINSET registers.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The patch extends existing virtchnl_proto_hdrs structure to allow VF
to pass a pair of buffers as packet data and mask that describe
a match pattern of a filter rule. Then the kernel PF driver is requested
to parse the pair of buffer and figure out low level hardware metadata
(ptype, profile, field vector.. ) to program the expected FDIR or RSS
rules.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add API ice_parser_profile_init() to init a parser profile based on
a parser result and a mask buffer. The ice_parser_profile struct is used
by the low level FXP engine to create HW profile/field vectors.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for the vxlan, geneve, ecpri UDP tunnels through the
following APIs:
- ice_parser_vxlan_tunnel_set()
- ice_parser_geneve_tunnel_set()
- ice_parser_ecpri_tunnel_set()
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add API ice_parser_dvm_set() to support turning on/off the parser's double
vlan mode.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the following internal helper functions:
- ice_bst_tcam_match():
to perform ternary match on boost TCAM.
- ice_pg_cam_match():
to perform parse graph key match in cam table.
- ice_pg_nm_cam_match():
to perform parse graph key no match in cam table.
- ice_ptype_mk_tcam_match():
to perform ptype markers match in tcam table.
- ice_flg_redirect():
to redirect parser flags to packet flags.
- ice_xlt_kb_flag_get():
to aggregate 64 bit packet flag into 16 bit key builder flags.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Parse the following DDP sections:
- ICE_SID_RXPARSER_IMEM into an array of struct ice_imem_item
- ICE_SID_RXPARSER_METADATA_INIT into an array of struct ice_metainit_item
- ICE_SID_RXPARSER_CAM or ICE_SID_RXPARSER_PG_SPILL into an array of
struct ice_pg_cam_item
- ICE_SID_RXPARSER_NOMATCH_CAM or ICE_SID_RXPARSER_NOMATCH_SPILL into an
array of struct ice_pg_nm_cam_item
- ICE_SID_RXPARSER_CAM into an array of ice_bst_tcam_item
- ICE_SID_LBL_RXPARSER_TMEM into an array of ice_lbl_item
- ICE_SID_RXPARSER_MARKER_PTYPE into an array of ice_ptype_mk_tcam_item
- ICE_SID_RXPARSER_MARKER_GRP into an array of ice_mk_grp_item
- ICE_SID_RXPARSER_PROTO_GRP into an array of ice_proto_grp_item
- ICE_SID_RXPARSER_FLAG_REDIR into an array of ice_flg_rd_item
- ICE_SID_XLT_KEY_BUILDER_SW, ICE_SID_XLT_KEY_BUILDER_ACL,
ICE_SID_XLT_KEY_BUILDER_FD and ICE_SID_XLT_KEY_BUILDER_RSS into
struct ice_xlt_kb
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new parser module which can parse a packet in binary and generate
information like ptype, protocol/offset pairs and flags which can be later
used to feed the FXP profile creation directly.
Add skeleton of the create and destroy APIs:
ice_parser_create()
ice_parser_destroy()
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start with being
world-executable but then are adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches brpm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against dead-locks, but hardly optimal.
Reported-by: Marco Vanotti <mvanotti@google.com>
Tested-by: Marco Vanotti <mvanotti@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
The regressing commit is new in 6.10. It assumed that anytime event->prog
is set bpf_overflow_handler() should be invoked to execute the attached bpf
program. This assumption is false for tracing events, and as a result the
regressing commit broke bpftrace by invoking the bpf handler with garbage
inputs on overflow.
Prior to the regression the overflow handlers formed a chain (of length 0,
1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
(the tracing case) did not. Both set event->prog. The chain of overflow
handlers was replaced by a single overflow handler slot and a fixed call to
bpf_overflow_handler() when appropriate. This modifies the condition there
to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
previous behavior and fixing bpftrace.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Joe Damato <jdamato@fastly.com>
Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/
Fixes: f11f10bfa1 ("perf/bpf: Call BPF handler directly, not through overflow machinery")
Cc: stable@vger.kernel.org
Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When hot-unplug a device which has many queues, and guest CPU will has
huge jitter, and unplugging is very slow.
It turns out synchronize_srcu() in irqfd_shutdown() caused the guest
jitter and unplugging latency, so replace synchronize_srcu() with
synchronize_srcu_expedited(), to accelerate the unplugging, and reduce
the guest OS jitter, this accelerates the VM reboot too.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-ID: <20240711121130.38917-1-lirongqing@baidu.com>
[Call it just once in irqfd_resampler_shutdown. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAma6liAACgkQiiy9cAdy
T1Eh4wwAuTQDHjehfvCDspMn6lG8IXAtb3oio2cntkII3warxxQ/dRiIyG1JcG5Z
38e+dokvRkaUF6ntrmudUbHOerw+NRl2ozYF5pQv0+ECyJLXHDqVGnuxNvNPAsD7
RtHfFf50PdgzGKmXjmUg0GbXMgA6eLSHe9r+wwDkqmIwZHMxaJ2nGuwVjHoO/+uJ
oynxpYHIUROa2DeQiQKZAz/KHwpdSAGR4+KJRutvVCjInlb9bmSGp//BG34W4vva
nyQIpnqskmlFg4elV/ktOgCp1rbHc4lgQwsWoCDYrNOyKX83HEIRRWHUEIi7fi+Y
PBcFgTblrnuhYbUL4Z+rSmHB3YuUkvMLeKkSWSJm2M2qAZzoZWTUNLpzOcAOAcIF
uhkt1+GUuLsZu3ZoDbolMZl477DtBsbBOKsM0DZ5IMji3MRu8GpvhmOfGOAdVRpT
msTWfUoWvrc2CM09v3HBtnsAfjDXb/4ebztZxGTGVFk0uYJA1Zg655bHbYbw3tWr
jXKVa805
=Q9Qj
-----END PGP SIGNATURE-----
Merge tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two smb3 server fixes for access denied problem on share path checks"
* tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: override fsids for smb2_query_info()
ksmbd: override fsids for share path check
Add a test to verify that userspace can't change a vCPU's x2APIC ID by
abusing KVM_SET_LAPIC. KVM models the x2APIC ID (and x2APIC LDR) as
readonly, and silently ignores userspace attempts to change the x2APIC ID
for backwards compatibility.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
[sean: write changelog, add to existing test]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240802202941.344889-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ignore the userspace provided x2APIC ID when fixing up APIC state for
KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM. Commit
a92e2543d6 ("KVM: x86: use hardware-compatible format for APIC ID
register"), which added the fixup, didn't intend to allow userspace to
modify the x2APIC ID. In fact, that commit is when KVM first started
treating the x2APIC ID as readonly, apparently to fix some race:
static inline u32 kvm_apic_id(struct kvm_lapic *apic)
{
- return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+ /* To avoid a race between apic_base and following APIC_ID update when
+ * switching to x2apic_mode, the x2apic mode returns initial x2apic id.
+ */
+ if (apic_x2apic_mode(apic))
+ return apic->vcpu->vcpu_id;
+
+ return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
}
Furthermore, KVM doesn't support delivering interrupts to vCPUs with a
modified x2APIC ID, but KVM *does* return the modified value on a guest
RDMSR and for KVM_GET_LAPIC. I.e. no remotely sane setup can actually
work with a modified x2APIC ID.
Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map
calculation, which expects the LDR to align with the x2APIC ID.
WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm]
CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm]
Call Trace:
<TASK>
kvm_apic_set_state+0x1cf/0x5b0 [kvm]
kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm]
kvm_vcpu_ioctl+0x663/0x8a0 [kvm]
__x64_sys_ioctl+0xb8/0xf0
do_syscall_64+0x56/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fade8b9dd6f
Unfortunately, the WARN can still trigger for other CPUs than the current
one by racing against KVM_SET_LAPIC, so remove it completely.
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/all/814baa0c-1eaa-4503-129f-059917365e80@rbox.co
Reported-by: Haoyu Wu <haoyuwu254@gmail.com>
Closes: https://lore.kernel.org/all/20240126161633.62529-1-haoyuwu254@gmail.com
Reported-by: syzbot+545f1326f405db4e1c3e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c2a6b9061cbca3c3@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240802202941.344889-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use this_cpu_ptr() instead of open coding the equivalent in various
user return MSR helpers.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Message-ID: <20240802201630.339306-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A recent change to the driver exposed a bug where the MAC RX
filters (unicast MAC, broadcast MAC, and multicast MAC) are
configured and enabled before the RX path is fully initialized.
The result of this bug is that after the PHY is started packets
that match these MAC RX filters start to flow into the RX FIFO.
And then, after rx_init() is completed, these packets will go
into the driver RX ring as well. If enough packets are received
to fill the RX ring (default size is 128 packets) before the call
to request_irq() completes, the driver RX function becomes stuck.
This bug is intermittent but is most likely to be seen where the
oob_net0 interface is connected to a busy network with lots of
broadcast and multicast traffic.
All the MAC RX filters must be disabled until the RX path is ready,
i.e. all initialization is done and all the IRQs are installed.
Fixes: f7442a634a ("mlxbf_gige: call request_irq() after NAPI initialized")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240809163612.12852-1-davthompson@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is no caller in tree since introduction in commit b4f69df0f6 ("KVM:
x86: Make Hyper-V emulation optional")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Message-ID: <20240803113233.128185-1-yuehaibing@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value
3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which being a
signed int, overflows producing a negative number.
3. The following loop that fills in the page contents checks that
the copied bytes is less than length, which being negative means
the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.
--
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Link: https://lore.kernel.org/r/20240811232821.13903-1-phillip@squashfs.org.uk
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Reported-by: syzbot+24ac24ff58dc5b0d26b9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/
V2: fix spelling mistake.
Signed-off-by: Christian Brauner <brauner@kernel.org>