bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix
umem creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose queue
should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking
(staging: rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets
are validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDNP7EACgkQMUZtbf5S
IrvTmxAAgOAM9MdRl9wnYtqXKPXJ1JJtenozwt1yX6b6OG+Ns7cm6YYafU3KoZWR
KlzpvP90vRrER3RqksbMngHzvGjZKDS4LWRur7sRlJ1TBQoLrQCIbriAh07d7wlU
0nnS4J8mczTCKx78QCUYy1QBIX5TQrUbx0JQZDPoIPBjFeILW+Gx/Ghg5tUR4mhf
6icYqwIPocTXO37ZmWOzezZNVOXJF4kaQUZeuOHNe5hOtm6EeIpZbW1Xx3DIr5bd
80a/uNU7nVyos0n7jxnfVE/oelTnYbT5scZeV/PPVqZ4U113f7uex2QP23/XhGSX
lK1EhwPqPOyaNhQoihLM6Xzd4o7aZOcmF8NY96xqjC+DqdN+juvfJU+ClCZojGIj
H4bwCSaj3y2PiimfQdBiIKvYMc5d4zBdw/Dpk/gLDp4d5N638TAtuunK4Mj+TEuT
QF1qkBLIB4HFtLS0M35/twk93md/5GUdSTij2GB3fOkAWRu2m266P5m+4DigW/TB
Xm8FgKdetvxVP0Qv/p49nPEn24Ny8wCafH1x1wVTmoda2qi6j1EXMuSa0PlCdz70
Sl5FrlxdEkOpC4p+Aoc8APSoBXnOriAlpU+z/EVb8Co4JR/+Ge5zBWpsiZDVD0/K
Ay0FW3I87iyn9tw1H1Fzr9GBlVl5vWRauZFHjzl90fWakCrCzJE=
=xxUe
-----END PGP SIGNATURE-----
Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
bluetooth, netfilter and can.
Current release - regressions:
- mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
to fix modifying offloaded qdiscs
- lantiq: net: fix duplicated skb in rx descriptor ring
- rtnetlink: fix regression in bridge VLAN configuration, empty info
is not an error, bot-generated "fix" was not needed
- libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
creation
Current release - new code bugs:
- ethtool: fix NULL pointer dereference during module EEPROM dump via
the new netlink API
- mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
queue should not be visible to the stack
- mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
- mlx5e: verify dev is present in get devlink port ndo, avoid a panic
Previous releases - regressions:
- neighbour: allow NUD_NOARP entries to be force GCed
- further fixes for fallout from reorg of WiFi locking (staging:
rtl8723bs, mac80211, cfg80211)
- skbuff: fix incorrect msg_zerocopy copy notifications
- mac80211: fix NULL ptr deref for injected rate info
- Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
Previous releases - always broken:
- bpf: more speculative execution fixes
- netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
- udp: fix race between close() and udp_abort() resulting in a panic
- fix out of bounds when parsing TCP options before packets are
validated (in netfilter: synproxy, tc: sch_cake and mptcp)
- mptcp: improve operation under memory pressure, add missing
wake-ups
- mptcp: fix double-lock/soft lookup in subflow_error_report()
- bridge: fix races (null pointer deref and UAF) in vlan tunnel
egress
- ena: fix DMA mapping function issues in XDP
- rds: fix memory leak in rds_recvmsg
Misc:
- vrf: allow larger MTUs
- icmp: don't send out ICMP messages with a source address of 0.0.0.0
- cdc_ncm: switch to eth%d interface naming"
* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
net: ethernet: fix potential use-after-free in ec_bhf_remove
selftests/net: Add icmp.sh for testing ICMP dummy address responses
icmp: don't send out ICMP messages with a source address of 0.0.0.0
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
net: ll_temac: Fix TX BD buffer overwrite
net: ll_temac: Add memory-barriers for TX BD access
net: ll_temac: Make sure to free skb when it is completely used
MAINTAINERS: add Guvenc as SMC maintainer
bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
bnxt_en: Fix TQM fastpath ring backing store computation
bnxt_en: Rediscover PHY capabilities after firmware reset
cxgb4: fix wrong shift.
mac80211: handle various extensible elements correctly
mac80211: reset profile_periodicity/ema_ap
cfg80211: avoid double free of PMSR request
cfg80211: make certificate generation more robust
mac80211: minstrel_ht: fix sample time check
net: qed: Fix memcpy() overflow of qed_dcbx_params()
net: cdc_eem: fix tx fixup skb leak
net: hamradio: fix memory leak in mkiss_close
...
static void ec_bhf_remove(struct pci_dev *dev)
{
...
struct ec_bhf_priv *priv = netdev_priv(net_dev);
unregister_netdev(net_dev);
free_netdev(net_dev);
pci_iounmap(dev, priv->dma_io);
pci_iounmap(dev, priv->io);
...
}
priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after pci_iounmap()
calls.
Fixes: 6af55ff52b ("Driver for Beckhoff CX5020 EtherCAT master module.")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just as the initial check, we need to ensure num_frag+1 buffers available,
as that is the number of buffers we are going to use.
This fixes a buffer overflow, which might be seen during heavy network
load. Complete lockup of TEMAC was reproducible within about 10 minutes of
a particular load.
Fixes: 84823ff80f ("net: ll_temac: Fix race condition causing TX hang")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.
In xmit_done, we should ensure that reading the additional BD fields are
only done after STS_CTRL_APP0_CMPLT bit is set.
When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.
Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the skb pointer piggy-backed on the TX BD, we have a simple and
efficient way to free the skb buffer when the frame has been transmitted.
But in order to avoid freeing the skb while there are still fragments from
the skb in use, we need to piggy-back on the TX BD of the skb, not the
first.
Without this, we are doing use-after-free on the DMA side, when the first
BD of a multi TX BD packet is seen as completed in xmit_done, and the
remaining BDs are still being processed.
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_ethtool_init() may have allocated some memory and we need to
call bnxt_ethtool_free() to properly unwind if bnxt_init_one()
fails.
Fixes: 7c38091814 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TQM fastpath ring needs to be sized to store both the requester
and responder side of RoCE QPs in TQM for supporting bi-directional
tests. Fix bnxt_alloc_ctx_mem() to multiply the RoCE QPs by a factor of
2 when computing the number of entries for TQM fastpath ring. This
fixes an RX pipeline stall issue when running bi-directional max
RoCE QP tests.
Fixes: c7dd7ab4b2 ("bnxt_en: Improve TQM ring context memory sizing formulas.")
Signed-off-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a missing bnxt_probe_phy() call in bnxt_fw_init_one() to
rediscover the PHY capabilities after a firmware reset. This can cause
some PHY related functionalities to fail after a firmware reset. For
example, in multi-host, the ability for any host to configure the PHY
settings may be lost after a firmware reset.
Fixes: ec5d31e3c1 ("bnxt_en: Handle firmware reset status during IF_UP.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The source (&dcbx_info->operational.params) and dest
(&p_hwfn->p_dcbx_info->set.config.params) are both struct qed_dcbx_params
(560 bytes), not struct qed_dcbx_admin_params (564 bytes), which is used
as the memcpy() size.
However it seems that struct qed_dcbx_operational_params
(dcbx_info->operational)'s layout matches struct qed_dcbx_admin_params
(p_hwfn->p_dcbx_info->set.config)'s 4 byte difference (3 padding, 1 byte
for "valid").
On the assumption that the size is wrong (rather than the source structure
type), adjust the memcpy() size argument to be 4 bytes smaller and add
a BUILD_BUG_ON() to validate any changes to the structure sizes.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDKfQMACgkQSD+KveBX
+j7uBAgAuwValxUKIGqJBoIVkj910r2AAyFoPrKNOyIX9d6saGHigMDHgrovIFlz
epZ1i4FLYFJTB96YDFh/P+gE6vh8d13vc0GQgpuyymDRn6aKuF01bwZupASxnH9L
TkiSq1MsLB5Nilfue/MfM20uBqRRRvwTLTeW5jWr3YqmaB1o/1sZzk47S0np7H/P
kNcLwZH49ujn99EHpoDDvHPJqW31qKfKwrnNiT2U7069vHspvSV9OWNwI/gz2TOY
geADSSieLo//zebjn88SFGWR0cPW8VQSWmU7BqcKAb+9piWfM3h25xDNpSsxhhcD
C0+d8i48il20ySuR2DuciZJ91iQQeg==
=4kRA
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2021-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-06-16
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: d6b6d98778 ("be2net: use PCIe AER capability")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset only the index part of the mkey and keep the variant part. On
devlink reload, driver recreates mkeys, so the mkey index may change.
Trying to preserve the variant part of the mkey, driver mistakenly
merged the mkey index with current value. In case of a devlink reload,
current value of index part is dirty, so the index may be corrupted.
Fixes: 54c62e13ad ("{IB,net}/mlx5: Setup mkey variant before mr create command invocation")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Running devlink reload command for port in switchdev mode cause
resources to corrupt: driver can't release allocated EQ and reclaim
memory pages, because "rdma" auxiliary device had add CQs which blocks
EQ from deletion.
Erroneous sequence happens during reload-down phase, and is following:
1. detach device - suspends auxiliary devices which support it, destroys
others. During this step "eth-rep" and "rdma-rep" are destroyed,
"eth" - suspended.
2. disable SRIOV - moves device to legacy mode; as part of disablement -
rescans drivers. This step adds "rdma" auxiliary device.
3. destroy EQ table - <failure>.
Driver shouldn't create any device during unload flows. To handle that
implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset
on device attach. If flag is set do no-op on drivers rescan.
Fixes: a925b5e309 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Decapsulation L3 on small inner packets which are less than
64 Bytes was done incorrectly. In small packets there is an
extra padding added in L2 which should not be included in L3
length. The issue was that after decapL3 the extra L2 padding
caused an update on the L3 length.
To avoid this issue the new header is pushed to the beginning
of the packet (offset 0) which should not cause a HW reparse
and update the L3 length.
Fixes: c349b4137c ("net/mlx5: DR, Add STEv1 modify header logic")
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When auxiliary bus autoprobe is disabled and SF is in ACTIVE state,
on SF port deletion it transitions from ACTIVE->ALLOCATED->INVALID.
When VHCA event handler queries the state, it is already transition
to INVALID state.
In this scenario, event handler missed to delete the SF device.
Fix it by deleting the SF when SF state is INVALID.
Fixes: 90d010b863 ("net/mlx5: SF, Add auxiliary device support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
E-switch should be able to set the GUID of host PF vport.
Currently it returns an error. This results in below error
when user attempts to configure MAC address of the PF of an
external controller.
$ devlink port function set pci/0000:03:00.0/196608 \
hw_addr 00:00:00:11:22:33
mlx5_core 0000:03:00.0: mlx5_esw_set_vport_mac_locked:1876:(pid 6715):\
"Failed to set vport 0 node guid, err = -22.
RDMA_CM will not function properly for this VF."
Check for zero vport is no longer needed.
Fixes: 330077d14d ("net/mlx5: E-switch, Supporting setting devlink port function mac address")
Signed-off-by: Yuval Avnery <yuvalav@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
External controller PF's MAC address is not read from the device during
vport setup. Fail to read this results in showing all zeros to user
while the factory programmed MAC is a valid value.
$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Hence, read it when enabling a vport.
After the fix,
$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "98:03:9b:a0:60:11"
}
}
}
}
Fixes: f099fde16d ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In the case of the failure to execute mlx5_core_set_hca_defaults(),
we used wrong goto label to execute error unwind flow.
Fixes: 5bef709d76 ("net/mlx5: Enable host PF HCA after eswitch is initialized")
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit da722186f6 ("net: fec: set GPR bit on suspend by DT configuration.")
refactor the fec_devtype, need adjust ptp driver accordingly.
Fixes: da722186f6 ("net: fec: set GPR bit on suspend by DT configuration.")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add clock rate zero check to fix coverity issue of "divide by 0".
Fixes: commit 85bd1798b2 ("net: fec: fix spin_lock dead lock")
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Platform drivers may call stmmac_probe_config_dt() to parse dt, could
call stmmac_remove_config_dt() in error handing after dt parsed, so need
disable clocks in stmmac_remove_config_dt().
Go through all platforms drivers which use stmmac_probe_config_dt(),
none of them disable clocks manually, so it's safe to disable them in
stmmac_remove_config_dt().
Fixes: commit d2ed0a7755 ("net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous commit didn't fix the bug properly. By mistake, it replaces
the pointer of the next skb in the descriptor ring instead of the current
one. As a result, the two descriptors are assigned the same SKB. The error
is seen during the iperf test when skb_put tries to insert a second packet
and exceeds the available buffer.
Fixes: c7718ee96d ("net: lantiq: fix memory corruption in RX ring ")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TID returned during successful filter creation is relative to
the region in which the filter is created. Using it directly always
returns Hi Prio/Normal filter region's entry for the first couple of
entries, even though the rule is actually inserted in Hash region.
Fix by analyzing in which region the filter has been inserted and
save the absolute TID to be used for lookup later.
Fixes: db43b30cd8 ("cxgb4: add ethtool n-tuple filter deletion")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: e87ad55393 ("netxen: support pci error handlers")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: 451724c821 ("qlcnic: aer support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of the loop using u64_stats_fetch_*_irq() is to ensure
statistics on a given CPU are collected atomically. If one of the
statistics values gets updated within the begin/retry window, the
loop will run again.
Currently the statistics totals are updated inside that window.
This means that if the loop ever retries, the statistics for the
CPU will be counted more than once.
Fix this by taking a snapshot of a CPU's statistics inside the
protected window, and then updating the counters with the snapshot
values after exiting the loop.
(Also add a newline at the end of this file...)
Fixes: 192c4b5d48 ("net: qualcomm: rmnet: Add support for 64 bit stats")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register starts from 0x800 is the 16th MAC address register rather
than the first one.
Fixes: cffb13f4d6 ("stmmac: extend mac addr reg and fix perfect filering")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using firmware-assisted PHY firmware image write to flash,
halt the chip before beginning the flash write operation to allow
the running firmware to store the image persistently. Otherwise,
the running firmware will only store the PHY image in local on-chip
RAM, which will be lost after next reset.
Fixes: 4ee339e1e9 ("cxgb4: add support to flash PHY image")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before writing new PHY firmware to on-chip memory, driver queries
firmware for current running PHY firmware version, which can result
in sleep waiting for reply. So, move spinlock closer to the actual
on-chip memory write operation, instead of taking it at the callers.
Fixes: 5fff701c83 ("cxgb4: always sync access when flashing PHY firmware")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boot images are copied to memory and updated with current underlying
device ID before flashing them to adapter. Ensure the updated images
are always flashed in Big Endian to allow the firmware to read the
new images during boot properly.
Fixes: 550883558f ("cxgb4: add support to flash boot image")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: ab69bde6b2 ("alx: add a simple AR816x/AR817x device driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
A mixture of small bug fixes and a small security issue:
- WARN_ON when IPoIB is automatically moved between namespaces
- Long standing bug where mlx5 would use the wrong page for the doorbell
recovery memory if fork is used
- Security fix for mlx4 that disables the timestamp feature
- Several crashers for mlx5
- Plug a recent mlx5 memory leak for the sig_mr
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmDCAxYACgkQOG33FX4g
mxqL4Q/9FOOS+Q0O2nOtkxzenqB931w46Q4kca1m6RZcdJI97P/tpF+SigQoUwV+
qiuJV4CThkidqWjxxfesX4uXyj6mc8yW4ux57c2JAMiS5iGIsKEPCavNvzcWRZKJ
rlMQg0yi7KeDwJ8XC2nw/Ajl1ujtxh569AkaqFVMMJer6jSa048TU14iulOOlcpZ
VGmF0/sCSY+PzyEOycr0LxGfUImCdD/spvF1RDbCNtQUcQwg41LUUkR+wvrqp8eR
KmuU7i+NLbcGyCZou16r6su9mMRYU5ZuFN5JMtjrmeqfdOi6deb7StyCgQFmRuac
Yw9Lgw91JUNphZp9v//sw6UDfyZaRMdsSW4796jiEPjnxZK7tzx+klhFLpO3WPkh
3VaZGY5nkcGcaRfqGD0PUHcHNjPr18rCXHz+JIovNLwIIJDmR4iUnZOs/JgOkvvd
bh4p4O/3xhXT57FoyBb/MhYgILAVHJ3Od6Dab3uJNx7ZaHAngtVHhzykm8PP4t/h
sHfd5W494jgec5RicJBQQfjZ4YUdSFMKjqLchKaSkdIsv/Wi+3idh+561ucmkMwI
JnIVZV/0739JUKeXhOJkxQkc1SKjr79e7+JUlrEgVFC0lJ8srzUD0f9a0L5txgt4
2MqQ9CSGljhiUpby0urFPb/vznQ3OQoZVwXOxj1TKtr0rrS3nuE=
=crsk
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A mixture of small bug fixes and a small security issue:
- WARN_ON when IPoIB is automatically moved between namespaces
- Long standing bug where mlx5 would use the wrong page for the
doorbell recovery memory if fork is used
- Security fix for mlx4 that disables the timestamp feature
- Several crashers for mlx5
- Plug a recent mlx5 memory leak for the sig_mr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix initializing CQ fragments buffer
RDMA/mlx5: Delete right entry from MR signature database
RDMA: Verify port when creating flow rule
RDMA/mlx5: Block FDB rules when not in switchdev mode
RDMA/mlx4: Do not map the core_clock page to user space unless enabled
RDMA/mlx5: Use different doorbell memory for different processes
RDMA/ipoib: Fix warning caused by destroying non-initial netns
The device is able to offload either the outer header csum or inner
header csum. The driver utilizes the inner csum offload. So, prohibit
setting of tx-gre-csum-segmentation and let it be: off[fixed].
Fixes: 2729984149 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The device is able to offload either the outer header csum or inner
header csum. The driver utilizes the inner csum offload. Hence, block
setting of tx-udp_tnl-csum-segmentation and set it to off[fixed].
Fixes: b49663c8fb ("net/mlx5e: Add support for UDP tunnel segmentation with outer checksum offload")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In the scenario described below, an EQ can remain in FIRED state which
can result in missing an interrupt generation.
The scenario:
device mlx5_core driver
------ ----------------
EQ1.eqe generated
EQ1.MSI-X sent
EQ1.state = FIRED
EQ2.eqe generated
mlx5_irq()
polls - eq1_eqes()
arm eq1
polls - eq2_eqes()
arm eq2
EQ2.MSI-X sent
EQ2.state = FIRED
mlx5_irq()
polls - eq2_eqes() -- no eqes found
driver skips EQ arming;
->EQ2 remains fired, misses generating interrupt.
Hence, always arm the EQ by reverting the cited commit in fixes tag.
Fixes: d894892dda ("net/mlx5: Arm only EQs with EQEs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Steering packets to PTP-SQ should be done only if the SKB has
SKBTX_HW_TSTAMP set in the tx_flags. While here, take the function into
a header and inline it.
Set the whole condition to select the PTP-SQ to unlikely.
Fixes: 24c22dd091 ("net/mlx5e: Add states to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since the driver opens the PTP-RQ under channel 0, it appears to the
stack as if the SKB was received on rxq0. So from thew stack POV there
are still the same number of RX queues.
Fixes: 960fbfe222 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SW steering uses RC QP to write/read to/from ICM, hence it's not
supported when RoCE is not supported as well.
Fixes: 70605ea545 ("net/mlx5: DR, Expose APIs for direct rule managing")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Check if RoCE is supported by the device before enable it in
the vport context and create all the RDMA steering objects.
Fixes: 80f09dfc23 ("net/mlx5: Eswitch, enable RoCE loopback traffic")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, IPsec feature is disabled because mlx5e_build_nic_netdev
is required to be called after mlx5e_ipsec_init. This requirement is
invalid as mlx5e_build_nic_netdev and mlx5e_ipsec_init initialize
independent resources.
Remove ipsec pointer check in mlx5e_build_nic_netdev so that the
two functions can be called at any order.
Fixes: 547eede070 ("net/mlx5e: IPSec, Innova IPSec offload infrastructure")
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Function mlx5e_rep_neigh_update() wasn't updated to accommodate rtnl lock
removal from TC filter update path and properly handle concurrent encap
entry insertion/deletion which can lead to following use-after-free:
[23827.464923] ==================================================================
[23827.469446] BUG: KASAN: use-after-free in mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.470971] Read of size 4 at addr ffff8881d132228c by task kworker/u20:6/21635
[23827.472251]
[23827.472615] CPU: 9 PID: 21635 Comm: kworker/u20:6 Not tainted 5.13.0-rc3+ #5
[23827.473788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[23827.475639] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[23827.476731] Call Trace:
[23827.477260] dump_stack+0xbb/0x107
[23827.477906] print_address_description.constprop.0+0x18/0x140
[23827.478896] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.479879] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.480905] kasan_report.cold+0x7c/0xd8
[23827.481701] ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.482744] kasan_check_range+0x145/0x1a0
[23827.493112] mlx5e_encap_take+0x72/0x140 [mlx5_core]
[23827.494054] ? mlx5e_tc_tun_encap_info_equal_generic+0x140/0x140 [mlx5_core]
[23827.495296] mlx5e_rep_neigh_update+0x41e/0x5e0 [mlx5_core]
[23827.496338] ? mlx5e_rep_neigh_entry_release+0xb80/0xb80 [mlx5_core]
[23827.497486] ? read_word_at_a_time+0xe/0x20
[23827.498250] ? strscpy+0xa0/0x2a0
[23827.498889] process_one_work+0x8ac/0x14e0
[23827.499638] ? lockdep_hardirqs_on_prepare+0x400/0x400
[23827.500537] ? pwq_dec_nr_in_flight+0x2c0/0x2c0
[23827.501359] ? rwlock_bug.part.0+0x90/0x90
[23827.502116] worker_thread+0x53b/0x1220
[23827.502831] ? process_one_work+0x14e0/0x14e0
[23827.503627] kthread+0x328/0x3f0
[23827.504254] ? _raw_spin_unlock_irq+0x24/0x40
[23827.505065] ? __kthread_bind_mask+0x90/0x90
[23827.505912] ret_from_fork+0x1f/0x30
[23827.506621]
[23827.506987] Allocated by task 28248:
[23827.507694] kasan_save_stack+0x1b/0x40
[23827.508476] __kasan_kmalloc+0x7c/0x90
[23827.509197] mlx5e_attach_encap+0xde1/0x1d40 [mlx5_core]
[23827.510194] mlx5e_tc_add_fdb_flow+0x397/0xc40 [mlx5_core]
[23827.511218] __mlx5e_add_fdb_flow+0x519/0xb30 [mlx5_core]
[23827.512234] mlx5e_configure_flower+0x191c/0x4870 [mlx5_core]
[23827.513298] tc_setup_cb_add+0x1d5/0x420
[23827.514023] fl_hw_replace_filter+0x382/0x6a0 [cls_flower]
[23827.514975] fl_change+0x2ceb/0x4a51 [cls_flower]
[23827.515821] tc_new_tfilter+0x89a/0x2070
[23827.516548] rtnetlink_rcv_msg+0x644/0x8c0
[23827.517300] netlink_rcv_skb+0x11d/0x340
[23827.518021] netlink_unicast+0x42b/0x700
[23827.518742] netlink_sendmsg+0x743/0xc20
[23827.519467] sock_sendmsg+0xb2/0xe0
[23827.520131] ____sys_sendmsg+0x590/0x770
[23827.520851] ___sys_sendmsg+0xd8/0x160
[23827.521552] __sys_sendmsg+0xb7/0x140
[23827.522238] do_syscall_64+0x3a/0x70
[23827.522907] entry_SYSCALL_64_after_hwframe+0x44/0xae
[23827.523797]
[23827.524163] Freed by task 25948:
[23827.524780] kasan_save_stack+0x1b/0x40
[23827.525488] kasan_set_track+0x1c/0x30
[23827.526187] kasan_set_free_info+0x20/0x30
[23827.526968] __kasan_slab_free+0xed/0x130
[23827.527709] slab_free_freelist_hook+0xcf/0x1d0
[23827.528528] kmem_cache_free_bulk+0x33a/0x6e0
[23827.529317] kfree_rcu_work+0x55f/0xb70
[23827.530024] process_one_work+0x8ac/0x14e0
[23827.530770] worker_thread+0x53b/0x1220
[23827.531480] kthread+0x328/0x3f0
[23827.532114] ret_from_fork+0x1f/0x30
[23827.532785]
[23827.533147] Last potentially related work creation:
[23827.534007] kasan_save_stack+0x1b/0x40
[23827.534710] kasan_record_aux_stack+0xab/0xc0
[23827.535492] kvfree_call_rcu+0x31/0x7b0
[23827.536206] mlx5e_tc_del_fdb_flow+0x577/0xef0 [mlx5_core]
[23827.537305] mlx5e_flow_put+0x49/0x80 [mlx5_core]
[23827.538290] mlx5e_delete_flower+0x6d1/0xe60 [mlx5_core]
[23827.539300] tc_setup_cb_destroy+0x18e/0x2f0
[23827.540144] fl_hw_destroy_filter+0x1d2/0x310 [cls_flower]
[23827.541148] __fl_delete+0x4dc/0x660 [cls_flower]
[23827.541985] fl_delete+0x97/0x160 [cls_flower]
[23827.542782] tc_del_tfilter+0x7ab/0x13d0
[23827.543503] rtnetlink_rcv_msg+0x644/0x8c0
[23827.544257] netlink_rcv_skb+0x11d/0x340
[23827.544981] netlink_unicast+0x42b/0x700
[23827.545700] netlink_sendmsg+0x743/0xc20
[23827.546424] sock_sendmsg+0xb2/0xe0
[23827.547084] ____sys_sendmsg+0x590/0x770
[23827.547850] ___sys_sendmsg+0xd8/0x160
[23827.548606] __sys_sendmsg+0xb7/0x140
[23827.549303] do_syscall_64+0x3a/0x70
[23827.549969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[23827.550853]
[23827.551217] The buggy address belongs to the object at ffff8881d1322200
[23827.551217] which belongs to the cache kmalloc-256 of size 256
[23827.553341] The buggy address is located 140 bytes inside of
[23827.553341] 256-byte region [ffff8881d1322200, ffff8881d1322300)
[23827.555747] The buggy address belongs to the page:
[23827.556847] page:00000000898762aa refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d1320
[23827.558651] head:00000000898762aa order:2 compound_mapcount:0 compound_pincount:0
[23827.559961] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[23827.561243] raw: 002ffff800010200 dead000000000100 dead000000000122 ffff888100042b40
[23827.562653] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[23827.564112] page dumped because: kasan: bad access detected
[23827.565439]
[23827.565932] Memory state around the buggy address:
[23827.566917] ffff8881d1322180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.568485] ffff8881d1322200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[23827.569818] >ffff8881d1322280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[23827.571143] ^
[23827.571879] ffff8881d1322300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.573283] ffff8881d1322380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[23827.574654] ==================================================================
Most of the necessary logic is already correctly implemented by
mlx5e_get_next_valid_encap() helper that is used in neigh stats update
handler. Make the handler generic by renaming it to
mlx5e_get_next_matching_encap() and use callback to test whether flow is
matching instead of hardcoded check for 'valid' flag value. Implement
mlx5e_get_next_valid_encap() by calling mlx5e_get_next_matching_encap()
with callback that tests encap MLX5_ENCAP_ENTRY_VALID flag. Implement new
mlx5e_get_next_init_encap() helper by calling
mlx5e_get_next_matching_encap() with callback that tests encap completion
result to be non-error and use it in mlx5e_rep_neigh_update() to safely
iterate over nhe->encap_list.
Remove encap completion logic from mlx5e_rep_update_flows() since the encap
entries passed to this function are already guaranteed to be properly
initialized by similar code in mlx5e_get_next_init_encap().
Fixes: 2a1f1768fa ("net/mlx5e: Refactor neigh update for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the code execute 'if (!priv->fs.arfs->wq)', the value of err is 0.
So, we use -ENOMEM to indicate that the function
create_singlethread_workqueue() return NULL.
Clean up smatch warning:
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c:373
mlx5e_arfs_create_tables() warn: missing error code 'err'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: f6755b80d6 ("net/mlx5e: Dynamic alloc arfs table for netdev when needed")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>