arch_dma_prep_coherent() as it caused a regression in the qcom_q6v5_mss
remoteproc driver. The driver is already buggy but the original arm64
change made the problem obvious. The change will be re-introduced once
the driver is fixed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmOPfrIACgkQa9axLQDI
XvH2SxAAolN3oris9RYVlVsldd38XSklgEjKhotJulYUzK2+9u1Chv757sX1UlyN
YkZD2uZfHYIVMXzgEVlkecepMRNj7Vp8i4fzTRqUcYBeRjhNz8dfkAwS5L5ezA9f
NFYnh3tv4YYn5LNv03Gd2VQMrniOHYfyZKLextZyJC0OuabIHXGAgbR8vt0cJeAe
CzZkoGzmtv1IPd81DEeUZYsW+KGOHqKHj8aI+0DWHNGLmc3H/VNoQ5JlNOZSgR8F
DOAmM+UgeTVNJfTv81u4skTcBm3Dr8aDyIUlHTrjXXRPAmpeFoy1WwN22jeq0P2z
6nnbHGufgqVfhebd1Wy5sKQHg3tVZ8FEXJW9Tpw85m019v4jaPF3Pmz6e1Lvlzuz
pUX5fbLkwNHBJMUkkw5iVR9W5P3vrKkcq/XGa2y8sMPUDK7IaiMGQhVTT724oHgy
UW9j0G7iUguXNii6ZHyUPOdaA2Vrjj3AjmJX6bHhwDSvvkNZGZgA4iBY+yRoUU2V
xBuOkLE9oa5QE62jGoipajgi3CIB3cXwbNZyfPuZkkosHuqqRDkuLX5TJ2snAHhs
lSSwDlAGY+9p4jiTd8/36E2GPXZOGPDn3L/3UdYsi9ScNKYFNJREKL8b2ZBlTo9X
P0qIeCmiyMPp5s6i3D9M8xRbVySgu542tcQG3+KhbGUsl3XElr0=
=l6Q7
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Revert the dropping of the cache invalidation from the arm64
arch_dma_prep_coherent() as it caused a regression in the
qcom_q6v5_mss remoteproc driver.
The driver is already buggy but the original arm64 change made
the problem obvious. The change will be re-introduced once the
driver is fixed"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
* s390: fix multi-epoch extension in nested guests
* x86: fix uninitialized variable on nested triple fault
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOPcBoUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroN9sAgArbccE7GCXUP3snd08rdV2Y1efKq3
SsZWlze/esVlsRBr4Z+lzMamCIdMwUT242XEq50pOjQeqYB0PjaIraDhd/lnhbk/
pYITFCoOmRtT4PyUrfYZIFeQBo+yhuhrYN8WCLFCatqY4ec+9p5nCiaFk1Wn49ZD
zFm+uVXhlTa+DGNjGzjBi3mAPqTpRNAxdlPAEHv+kyaQgfRlTSo37H8Xs6PfcLVe
YnsGG47Ozx3pPe5Zo3Sr1k56CQLjNcY77lVWXskKIiABsM3G5YJ2BlGgG007zlSB
4G0/GyJhb351c1MHk1O+AfMNPfD4shJfLdIm5bJNGdiFPz9XFdpqa49BNQ==
=SKVi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Unless anything comes from the ARM side, this should be the last pull
request for this release - and it's mostly documentation:
- Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns
- s390: fix multi-epoch extension in nested guests
- x86: fix uninitialized variable on nested triple fault"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns
KVM: Move halt-polling documentation into common directory
KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULT
KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY49degAKCRCAXGG7T9hj
vjt3APwJ2Xi240fsa8LRd0vvVgqU47DEw9EV2VdFs7NUchsp/gEA6PzZRAoTQS9z
x7ZQdFjf2dQSVNZM3mry4fL18r1oAwc=
=+GZe
-----END PGP SIGNATURE-----
Merge tag 'for-linus-xsa-6.1-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two zero-day fixes for the xen-netback driver (XSA-423 and XSA-424)"
* tag 'for-linus-xsa-6.1-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netback: don't call kfree_skb() with interrupts disabled
xen/netback: Ensure protocol headers don't fall in the non-linear area
This reverts commit c44094eee3.
Although the semantics of the DMA API require only a clean operation
here, it turns out that the Qualcomm 'qcom_q6v5_mss' remoteproc driver
(ab)uses the DMA API for transferring the modem firmware to the secure
world via calls to Trustzone [1].
Once the firmware buffer has changed hands, _any_ access from the
non-secure side (i.e. Linux) will be detected on the bus and result in a
full system reset [2]. Although this is possible even with this revert
in place (due to speculative reads via the cacheable linear alias of
memory), anecdotally the problem occurs considerably more frequently
when the lines have not been invalidated, assumedly due to some
micro-architectural interactions with the cache hierarchy.
Revert the offending change for now, along with a comment, so that the
Qualcomm developers have time to fix the driver [3] to use a firmware
buffer which does not have a cacheable alias in the linear map.
Link: https://lore.kernel.org/r/20221114110329.68413-1-manivannan.sadhasivam@linaro.org [1]
Link: https://lore.kernel.org/r/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/20221206092152.GD15486@thinkpad [2]
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Sibi Sankar <quic_sibis@quicinc.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20221206103403.646-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So remove kfree_skb()
from the spin_lock_irqsave() section and use the already existing
"drop" label in xenvif_start_xmit() for dropping the SKB. At the
same time replace the dev_kfree_skb() call there with a call of
dev_kfree_skb_any(), as xenvif_start_xmit() can be called with
disabled interrupts.
This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
Fixes: be81992f90 ("xen/netback: don't queue unlimited number of packages")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
In some cases, the frontend may send a packet where the protocol headers
are spread across multiple slots. This would result in netback creating
an skb where the protocol headers spill over into the non-linear area.
Some drivers and NICs don't handle this properly resulting in an
interface reset or worse.
This issue was introduced by the removal of an unconditional skb pull in
the tx path to improve performance. Fix this without reintroducing the
pull by setting up grant copy ops for as many slots as needed to reach
the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle
multiple copy operations per skb.
This is XSA-423 / CVE-2022-3643.
Fixes: 7e5d775395 ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Create general function that sets miss group and rule to forward all
not-matched traffic to the next table.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Move miss handles into dedicated struct, so we can reuse it in next
patch when creating IPsec policy flow table.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reuse existing struct what holds all information about modify
header pointer and rule. This helps to reduce ambiguity from the
name _err_ that doesn't describe the real purpose of that flow
table, rule and function - to copy status result from HW to
the stack.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Rewrote the IPsec RX add rule path to be less convoluted
and don't rely on pre-initialized variables. The code now has clean
linear flow with clean separation between error and success paths.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The policy offload logic needs to set flow steering rule that match
on saddr and daddr too, so factor out this code to separate functions,
together with code alignment to netdev coding pattern of relying on
family type.
As part of this change, let's separate more logic from setup_fte_common
to make sure that the function names describe that is done in the
function better than general *common* name.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Even now, to support IPsec crypto, the RX and TX paths use same
logic to create flow tables. In the following patches, we will
add more tables to support IPsec packet offload. So reuse existing
code and rewrite it to support IPsec packet offload from the beginning.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Create initial hardware IPsec packet offload object and connect it
to advanced steering operation (ASO) context and queue, so the data
path can communicate with the stack.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Setup the ASO (Advanced Steering Operation) object that is needed
for IPsec to interact with SW stack about various fast changing
events: replay window, lifetime limits, e.t.c
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
mlx5 priv structure is driver main structure that holds high level data.
That information is not needed for IPsec flow steering logic and the
pointer to mlx5e_priv was not supposed to be passed in the first place.
This change "cleans" the logic to rely on internal to IPsec structures
without touching global mlx5e_priv.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Low level mlx5 code needs to use mlx5_core print routines and
not netdev ones, as the failures are relevant to the HW itself
and not to its netdev.
This change allows us to remove access to mlx5 priv structure, which
holds high level driver data that isn't needed for mlx5 IPsec code.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Remove AF family obfuscation by creating symmetric structs for RX and
TX IPsec flow steering chains. This simplifies to us low level IPsec
FS creation logic without need to dig into multiple levels of structs.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Instead of performing redefinition of XFRM core defines to same
values but with MLX5_* prefix, cache the input values as is by making
sure that the proper storage objects are used.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
As a preparation for future extension of IPsec hardware object to allow
configuration of packet offload mode, extend the XFRM validator to check
replay window values.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
There is no need in hiding returned ASO WQE type by providing void*,
use the real type instead. Do it together with zeroing that memory,
so ASO WQE will be ready to use immediately.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Leon Romanovsky says:
============
The following series extends XFRM core code to handle a new type of IPsec
offload - packet offload.
In this mode, the HW is going to be responsible for the whole data path,
so both policy and state should be offloaded.
IPsec packet offload is an improved version of IPsec crypto mode,
In packet mode, HW is responsible to trim/add headers in addition to
decrypt/encrypt. In this mode, the packet arrives to the stack as already
decrypted and vice versa for TX (exits to HW as not-encrypted).
Devices that implement IPsec packet offload mode offload policies too.
In the RX path, it causes the situation that HW can't effectively
handle mixed SW and HW priorities unless users make sure that HW offloaded
policies have higher priorities.
It means that we don't need to perform any search of inexact policies
and/or priority checks if HW policy was discovered. In such situation,
the HW will catch the packets anyway and HW can still implement inexact
lookups.
In case specific policy is not found, we will continue with packet lookup
and check for existence of HW policies in inexact list.
HW policies are added to the head of SPD to ensure fast lookup, as XFRM
iterates over all policies in the loop.
This simple solution allows us to achieve same benefits of separate HW/SW
policies databases without over-engineering the code to iterate and manage
two databases at the same path.
To not over-engineer the code, HW policies are treated as SW ones and
don't take into account netdev to allow reuse of the same priorities for
policies databases without over-engineering the code to iterate and manage
two databases at the same path.
To not over-engineer the code, HW policies are treated as SW ones and
don't take into account netdev to allow reuse of the same priorities for
different devices.
* No software fallback
* Fragments are dropped, both in RX and TX
* No sockets policies
* Only IPsec transport mode is implemented
================================================================================
Rekeying:
In order to support rekeying, as XFRM core is skipped, the HW/driver should
do the following:
* Count the handled packets
* Raise event that limits are reached
* Drop packets once hard limit is occurred.
The XFRM core calls to newly introduced xfrm_dev_state_update_curlft()
function in order to perform sync between device statistics and internal
structures. On HW limit event, driver calls to xfrm_state_check_expire()
to allow XFRM core take relevant decisions.
This separation between control logic (in XFRM) and data plane allows us
to packet reuse SW stack.
================================================================================
Configuration:
iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/
Packet offload mode:
ip xfrm state offload packet dev <if-name> dir <in|out>
ip xfrm policy .... offload packet dev <if-name>
Crypto offload mode:
ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
ip xfrm state offload dev <if-name> dir <in|out>
================================================================================
Performance results:
TCP multi-stream, using iperf3 instance per-CPU.
+----------------------+--------+--------+--------+--------+---------+---------+
| | 1 CPU | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
| +--------+--------+--------+--------+---------+---------+
| | BW (Gbps) |
+----------------------+--------+--------+-------+---------+---------+---------+
| Baseline | 27.9 | 59 | 93.1 | 92.8 | 93.7 | 94.4 |
+----------------------+--------+--------+-------+---------+---------+---------+
| Software IPsec | 6 | 11.9 | 23.3 | 45.9 | 83.8 | 91.8 |
+----------------------+--------+--------+-------+---------+---------+---------+
| IPsec crypto offload | 15 | 29.7 | 58.5 | 89.6 | 90.4 | 90.8 |
+----------------------+--------+--------+-------+---------+---------+---------+
| IPsec packet offload | 28 | 57 | 90.7 | 91 | 91.3 | 91.9 |
+----------------------+--------+--------+-------+---------+---------+---------+
IPsec packet offload mode behaves as baseline and reaches linerate with same amount
of CPUs.
Setups details (similar for both sides):
* NIC: ConnectX6-DX dual port, 100 Gbps each.
Single port used in the tests.
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
================================================================================
Series together with mlx5 part:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next
================================================================================
Changelog:
v10:
* Added forgotten xdo_dev_state_del. Patch #4.
* Moved changelog in cover letter to the end.
* Added "if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {" line to newly
added netronome IPsec support. Patch #2.
v9: https://lore.kernel.org/all/cover.1669547603.git.leonro@nvidia.com
* Added acquire support
v8: https://lore.kernel.org/all/cover.1668753030.git.leonro@nvidia.com
* Removed not-related blank line
* Fixed typos in documentation
v7: https://lore.kernel.org/all/cover.1667997522.git.leonro@nvidia.com
As was discussed in IPsec workshop:
* Renamed "full offload" to be "packet offload".
* Added check that offloaded SA and policy have same device while sending packet
* Added to SAD same optimization as was done for SPD to speed-up lookups.
v6: https://lore.kernel.org/all/cover.1666692948.git.leonro@nvidia.com
* Fixed misplaced "!" in sixth patch.
v5: https://lore.kernel.org/all/cover.1666525321.git.leonro@nvidia.com
* Rebased to latest ipsec-next.
* Replaced HW priority patch with solution which mimics separated SPDs
for SW and HW. See more description in this cover letter.
* Dropped RFC tag, usecase, API and implementation are clear.
v4: https://lore.kernel.org/all/cover.1662295929.git.leonro@nvidia.com
* Changed title from "PATCH" to "PATCH RFC" per-request.
* Added two new patches: one to update hard/soft limits and another
initial take on documentation.
* Added more info about lifetime/rekeying flow to cover letter, see
relevant section.
* perf traces for crypto mode will come later.
v3: https://lore.kernel.org/all/cover.1661260787.git.leonro@nvidia.com
* I didn't hear any suggestion what term to use instead of
"packet offload", so left it as is. It is used in commit messages
and documentation only and easy to rename.
* Added performance data and background info to cover letter
* Reused xfrm_output_resume() function to support multiple XFRM transformations
* Add PMTU check in addition to driver .xdo_dev_offload_ok validation
* Documentation is in progress, but not part of this series yet.
v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
* Rebased to latest 6.0-rc1
* Add an extra check in TX datapath patch to validate packets before
forwarding to HW.
* Added policy cleanup logic in case of netdev down event
v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
* Moved comment to be before if (...) in third patch.
v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
-----------------------------------------------------------------------
============
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Horatiu Vultur says:
====================
net: lan966x: Enable PTP on bridge interfaces
Before it was not allowed to run ptp on ports that are part of a bridge
because in case of transparent clock the HW will still forward the frames
so there would be duplicate frames.
Now that there is VCAP support, it is possible to add entries in the VCAP
to trap frames to the CPU and the CPU will forward these frames.
The first part of the patch series, extends the VCAP support to be able to
modify and get the rule, while the last patch uses the VCAP to trap the ptp
frames.
====================
Link: https://lore.kernel.org/r/20221203104348.1749811-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently lan966x, doesn't allow to run PTP over interfaces that are
part of the bridge. The reason is when the lan966x was receiving a
PTP frame (regardless if L2/IPv4/IPv6) the HW it would flood this
frame.
Now that it is possible to add VCAP rules to the HW, such to trap these
frames to the CPU, it is possible to run PTP also over interfaces that
are part of the bridge.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add the function vcap_rule_get_key_u32 which allows to get the value and
the mask of a key that exist on the rule. If the key doesn't exist,
it would return error.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add the function vcap_mod_rule which allows to update an existing rule
in the vcap. It is required for the rule to exist in the vcap to be able
to modify it.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add function vcap_get_rule which returns a rule based on the internal
rule id.
The entire functionality of reading and decoding the rule from the VCAP
was inside vcap_api_debugfs file. So move the entire implementation in
vcap_api as this is used also by vcap_get_rule.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix the potential risk of OOB if skb_linearize() fails in
tipc_link_proto_rcv().
Fixes: 5cbb28a4bf ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20221203094635.29024-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The skb is delivered to napi_gro_receive() which may free it, after
calling this, dereferencing skb may trigger use-after-free.
Fixes: 57c5bc9ad7 ("net: hisilicon: add hix5hd2 mac driver")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Link: https://lore.kernel.org/r/20221203094240.1240211-2-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is warning report about of_node refcount leak
while probing mdio device:
OF: ERROR: memory leak, expected refcount 1 instead of 2,
of_node_get()/of_node_put() unbalanced - destroy cset entry:
attach overlay node /spi/soc@0/mdio@710700c0/ethernet@4
In of_mdiobus_register_device(), we increase fwnode refcount
by fwnode_handle_get() before associating the of_node with
mdio device, but it has never been decreased in normal path.
Since that, in mdio_device_release(), it needs to call
fwnode_handle_put() in addition instead of calling kfree()
directly.
After above, just calling mdio_device_free() in the error handle
path of of_mdiobus_register_device() is enough to keep the
refcount balanced.
Fixes: a9049e0c51 ("mdio: Add support for mdio drivers.")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20221203073441.3885317-1-zengheng4@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The skb is delivered to napi_gro_receive() which may free it, after
calling this, dereferencing skb may trigger use-after-free.
Fixes: 542ae60af2 ("net: hisilicon: Add Fast Ethernet MAC driver")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Link: https://lore.kernel.org/r/20221203094240.1240211-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The nicvf_probe() won't destroy workqueue when register_netdev()
failed. Add destroy_workqueue err handle case to fix this issue.
Fixes: 2ecbe4f4a0 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20221203094125.602812-1-liuyongqiang13@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The skb is delivered to napi_gro_receive() which may free it, after calling this,
dereferencing skb may trigger use-after-free.
Fixes: 1c59eb678c ("ravb: Fillup ravb_rx_gbeth() stub")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20221203092941.10880-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The mchp_sparx5_probe() won't destroy workqueue created by
create_singlethread_workqueue() in sparx5_start() when later
inits failed. Add destroy_workqueue in the cleanup_ports case,
also add it in mchp_sparx5_remove()
Fixes: b37a1bae74 ("net: sparx5: add mactable support")
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Link: https://lore.kernel.org/r/20221203070259.19560-1-linqiheng@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Although the type I ERSPAN is based on the barebones IP + GRE
encapsulation and no extra ERSPAN header. Report erspan version on GRE
interface looks unreasonable. Fix this by separating the erspan and gre
fill info.
IPv6 GRE does not have this info as IPv6 only supports erspan version
1 and 2.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: f989d546a2 ("erspan: Add type I version 0 support.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/r/20221203032858.3130339-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After calling napi_complete_done(), the NAPIF_STATE_SCHED bit may be
cleared, and another CPU can start napi thread and access per-CQ variable,
cq->work_done. If the other thread (for example, from busy_poll) sets
it to a value >= budget, this thread will continue to run when it should
stop, and cause memory corruption and panic.
To fix this issue, save the per-CQ work_done variable in a local variable
before napi_complete_done(), so it won't be corrupted by a possible
concurrent thread after napi_complete_done().
Also, add a flag bit to advertise to the NIC firmware: the NAPI work_done
variable race is fixed, so the driver is able to reliably support features
like busy_poll.
Cc: stable@vger.kernel.org
Fixes: e1b5683ff6 ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1670010190-28595-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In dt-binding snps,dwmac.yaml, some properties under "snps,axi-config"
node are named without "axi_" prefix, but the driver expects the
prefix. Since the dt-binding has been there for a long time, we'd
better make driver match the binding for compatibility.
Fixes: afea03656a ("stmmac: rework DMA bus setting and introduce new platform AXI structure")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221202161739.2203-1-jszhang@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The node returned by of_get_parent() with refcount incremented,
of_node_put() needs be called when finish using it. So add it in the
end of of_pinctrl_get().
Fixes: 936ee2675e ("gpio/rockchip: add driver for rockchip gpio")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.
This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
proc_skip_spaces() seems to think it is working on C strings, and ends
up being just a wrapper around skip_spaces() with a really odd calling
convention.
Instead of basing it on skip_spaces(), it should have looked more like
proc_skip_char(), which really is the exact same function (except it
skips a particular character, rather than whitespace). So use that as
inspiration, odd coding and all.
Now the calling convention actually makes sense and works for the
intended purpose.
Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_get_long() is passed a size_t, but then assigns it to an 'int'
variable for the length. Let's not do that, even if our IO paths are
limited to MAX_RW_COUNT (exactly because of these kinds of type errors).
So do the proper test in the rigth type.
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When __do_semtimedop() goes to sleep because it has to wait for a
semaphore value becoming zero or becoming bigger than some threshold, it
links the on-stack sem_queue to the sem_array, then goes to sleep
without holding a reference on the sem_array.
When __do_semtimedop() comes back out of sleep, one of two things must
happen:
a) We prove that the on-stack sem_queue has been disconnected from the
(possibly freed) sem_array, making it safe to return from the stack
frame that the sem_queue exists in.
b) We stabilize our reference to the sem_array, lock the sem_array, and
detach the sem_queue from the sem_array ourselves.
sem_array has RCU lifetime, so for case (b), the reference can be
stabilized inside an RCU read-side critical section by locklessly
checking whether the sem_queue is still connected to the sem_array.
However, the current code does the lockless check on sem_queue before
starting an RCU read-side critical section, so the result of the
lockless check immediately becomes useless.
Fix it by doing rcu_read_lock() before the lockless check. Now RCU
ensures that if we observe the object being on our queue, the object
can't be freed until rcu_read_unlock().
This bug is only hittable on kernel builds with full preemption support
(either CONFIG_PREEMPT or PREEMPT_DYNAMIC with preempt=full).
Fixes: 370b262c89 ("ipc/sem: avoid idr tree lookup for interrupted semop")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return -EOPNOTSUPP, when user requests l4_4_bytes for raw IP4 or
IP6 flow director filters. Flow director does not support filtering
on l4 bytes for PCTYPEs used by IP4 and IP6 filters.
Without this patch, user could create filters with l4_4_bytes fields,
which did not do any filtering on L4, but only on L3 fields.
Fixes: 36777d9fa2 ("i40e: check current configured input set when adding ntuple filters")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After spawning max VFs on a PF, some VFs were not getting resources and
their MAC addresses were 0. This was caused by PF sleeping before flushing
HW registers which caused VIRTCHNL_VFR_VFACTIVE to not be set in time for
VF.
Fix by adding a sleep after hw flush.
Fixes: e4b433f4a7 ("i40e: reset all VFs in parallel when rebuilding PF")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
During tx rings configuration default XPS queue config is set and
__I40E_TX_XPS_INIT_DONE is locked. __I40E_TX_XPS_INIT_DONE state is
cleared and set again with default mapping only during queues build,
it means after first setup or reset with queues rebuild. (i.e.
ethtool -L <interface> combined <number>) After other resets (i.e.
ethtool -t <interface>) XPS_INIT_DONE is not cleared and those default
maps cannot be set again. It results in cleared xps_cpus mapping
until queues are not rebuild or mapping is not set by user.
Add clearing __I40E_TX_XPS_INIT_DONE state during reset to let
the driver set xps_cpus to defaults again after it was cleared.
Fixes: 6f853d4f8e ("i40e: allow XPS with QoS enabled")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>