The usage of in_interrupt() in non-core code is phased out. Ideally the
information of the calling context should be passed by the callers or the
functions be split as appropriate.
The attempt to consolidate the code by passing an arguemnt or by
distangling it failed due lack of knowledge about this driver and because
the call chains are hard to follow.
As a stop gap use netif_rx_any_context() which invokes the correct code path
depending on context and confines the in_interrupt() usage to core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->name is determined only after calling register_hdlc_device(),
however ,it is used by printk before the name is fully determined.
[ 4.565137] hdlc%d: detected at e8000000, irq 11
Instead of printing out a %d, print hdlc directly
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation for the PHY update [1] states:
Loop 4 times with index i
If PHY Revision >= 3
Copy table[i] to coef[i]
Otherwise
Set coef[i] to 0
the copy of the table to coef is currently implemented the wrong way
around, table is being updated from uninitialized values in coeff.
Fix this by swapping the assignment around.
[1] https://bcm-v4.sipsolutions.net/802.11/PHY/N/RestoreCal/
Fixes: 2f258b74d1 ("b43: N-PHY: implement restoring general configuration")
Addresses-Coverity: ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Max imm data size in cxgb4 is not similar to the max imm data size
in the chtls. This caused an mismatch in output of is_ofld_imm() of
cxgb4 and chtls. So fixed this by keeping the max wreq size of imm data
same in both chtls and cxgb4 as MAX_IMM_OFLD_TX_DATA_WR_LEN.
As cxgb4's max imm. data value for ofld packets is changed to
MAX_IMM_OFLD_TX_DATA_WR_LEN. Using the same in cxgbit also.
Fixes: 36bedb3f2e ("crypto: chtls - Inline TLS record Tx")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
idt77252 is broken and wont load on amd64 systems
modprobe idt77252 shows the following
idt77252_init: skb->cb is too small (48 < 56)
Add packed attribute to struct idt77252_skb_prv and struct atm_skb_data
so that the total size can be <= sizeof(skb->cb)
Also convert runtime size check to buildtime size check in
idt77252_init()
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver is set up to use a clock mapping in the device tree if it is
present, but still work without one for backward compatibility. However,
if getting the clock returns -EPROBE_DEFER, then we need to abort and
return that error from our driver initialization so that the probe can
be retried later after the clock is set up.
Move clock initialization to earlier in the process so we do not waste as
much effort if the clock is not yet available. Switch to use
devm_clk_get_optional and abort initialization on any error reported.
Also enable the clock regardless of whether the controller is using an MDIO
bus, as the clock is required in any case.
Fixes: 09a0354cad ("net: axienet: Use clock framework to get device clock rate")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The supported indirect subcrq entries on Power8 is 16. Power9
supports 128. Redefined this value to 16 to minimize the driver from
having to reset when migrating between Power9 and Power8. In our rx/tx
performance testing, we found no performance difference between 16 and
128 at this time.
Fixes: f019fb6392 ("ibmvnic: Introduce indirect subordinate Command Response Queue buffer")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the following command:
# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
$tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop
doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have non
zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply
ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while
at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the
correct behavior.
Fixes: 518d8a2e9b ("net/flow_dissector: add support for dissection of misc ip header fields")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a DDP broadcast packet is sent out to a non-gateway target, it is
also looped back. There is a potential for the loopback device to have a
longer hardware header length than the original target route's device,
which can result in the skb not being created with enough room for the
loopback device's hardware header. This patch fixes the issue by
determining that a loopback will be necessary prior to allocating the
skb, and if so, ensuring the skb has enough room.
This was discovered while testing a new driver that creates a LocalTalk
network interface (LTALK_HLEN = 1). It caused an skb_under_panic.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2021-02-13
The following pull-request contains BPF updates for your *net* tree.
We've added 2 non-merge commits during the last 3 day(s) which contain
a total of 2 files changed, 9 insertions(+), 11 deletions(-).
The main changes are:
1) Fix mod32 truncation handling in verifier, from Daniel Borkmann.
2) Fix XDP redirect tests to explicitly use bash, from Björn Töpel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently noticed that when mod32 with a known src reg of 0 is performed,
then the dst register is 32-bit truncated in verifier:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 0
1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=inv0 R1_w=inv-1 R10=fp0
2: (b4) w2 = -1
3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0
3: (9c) w1 %= w0
4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
4: (b7) r0 = 1
5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
5: (1d) if r1 == r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: (b7) r0 = 2
7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
7: (95) exit
7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0
7: (95) exit
However, as a runtime result, we get 2 instead of 1, meaning the dst
register does not contain (u32)-1 in this case. The reason is fairly
straight forward given the 0 test leaves the dst register as-is:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+1
4: (9c) w1 %= w0
5: (b7) r0 = 1
6: (1d) if r1 == r2 goto pc+1
7: (b7) r0 = 2
8: (95) exit
This was originally not an issue given the dst register was marked as
completely unknown (aka 64 bit unknown). However, after 468f6eafa6
("bpf: fix 32-bit ALU op verification") the verifier casts the register
output to 32 bit, and hence it becomes 32 bit unknown. Note that for
the case where the src register is unknown, the dst register is marked
64 bit unknown. After the fix, the register is truncated by the runtime
and the test passes:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+2
4: (9c) w1 %= w0
5: (05) goto pc+1
6: (bc) w1 = w1
7: (b7) r0 = 1
8: (1d) if r1 == r2 goto pc+1
9: (b7) r0 = 2
10: (95) exit
Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div
has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows:
mod32: mod64:
(16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1
(9c) w1 %= w0 (9f) r1 %= r0
(05) goto pc+1
(bc) w1 = w1
Fixes: 468f6eafa6 ("bpf: fix 32-bit ALU op verification")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmAl7OgACgkQSD+KveBX
+j72XQgAkwXPdkVm36mUP4vnjZhxCIOG/8KHp50iVJAFKRBukEGgSV3yGt8Srjaj
fCyqevg/ncOq61mmo6KE2v16/oI5Mh7fzJ+Q0+MXdYp5VVgtyUIil2dnfgPWnYfm
Ahsc4TaRfU3YUBnMz8MhgAhmRih24+dyW+YtOj3pzwxNRjTMxseLI9S1tXu9/a7P
BLWBka7EZroI+mArNtGXqlN8bPt+IigrysDixvEEQvNoUxSbEwjoPjwHLrukWjwB
FE530QXjvzsJH8gFeH0QnG432w/ZYcDlBqmlr4qRwR7k89/ClH0GjA2fqDXejX5u
xI/bx8dH5YgPKjpNPdsfhupe902dag==
=Z+Z+
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2021-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-fixes-2021-02-11
Saeed Mahameed says:
====================
mlx5 fixes 2021-02-11
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v5.4
('net/mlx5e: E-switch, Fix rate calculation for overflow')i
For -stable v5.10
('net/mlx5: Disallow RoCE on multi port slave device')
('net/mlx5: Disable devlink reload for multi port slave device')
('net/mlx5e: Don't change interrupt moderation params when DIM is enabled')
('net/mlx5e: Replace synchronize_rcu with synchronize_net')
('net/mlx5e: Enable XDP for Connect-X IPsec capable devices')
('net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context')
('net/mlx5e: Check tunnel offload is required before setting SWP')
('net/mlx5: Fix health error state handling')
('net/mlx5: Disable devlink reload for lag devices')
('net/mlx5e: CT: manage the lifetime of the ct entry object')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that tunnel offload is required before setting Software Parser
offsets to get Geneve HW offload. In case of Geneve packet we check HW
offload support of SWP in mlx5e_tunnel_features_check() and set features
accordingly, this should be reflected in skb offload requested by the
kernel and we should add the Software Parser offsets only if requested.
Otherwise, in case HW doesn't support SWP for Geneve, data path will
mistakenly try to offload Geneve SKBs with skb->encapsulation set,
regardless of whether offload was requested or not on this specific SKB.
Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ct entry object is accessed by the ct add, del, stats and restore
methods. In addition, it is referenced from several hash tables.
The lifetime of the ct entry object was not managed which triggered race
conditions as in the following kasan dump:
[ 3374.973945] ==================================================================
[ 3374.988552] BUG: KASAN: use-after-free in memcmp+0x4c/0x98
[ 3374.999590] Read of size 1 at addr ffff00036129ea55 by task ksoftirqd/1/15
[ 3375.016415] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.4.31+ #1
[ 3375.055301] Call trace:
[ 3375.060214] dump_backtrace+0x0/0x238
[ 3375.067580] show_stack+0x24/0x30
[ 3375.074244] dump_stack+0xe0/0x118
[ 3375.081085] print_address_description.isra.9+0x74/0x3d0
[ 3375.091771] __kasan_report+0x198/0x1e8
[ 3375.099486] kasan_report+0xc/0x18
[ 3375.106324] __asan_load1+0x60/0x68
[ 3375.113338] memcmp+0x4c/0x98
[ 3375.119409] mlx5e_tc_ct_restore_flow+0x3a4/0x6f8 [mlx5_core]
[ 3375.131073] mlx5e_rep_tc_update_skb+0x1d4/0x2f0 [mlx5_core]
[ 3375.142553] mlx5e_handle_rx_cqe_rep+0x198/0x308 [mlx5_core]
[ 3375.154034] mlx5e_poll_rx_cq+0x2a0/0x1060 [mlx5_core]
[ 3375.164459] mlx5e_napi_poll+0x1d4/0xa78 [mlx5_core]
[ 3375.174453] net_rx_action+0x28c/0x7a8
[ 3375.182004] __do_softirq+0x1b4/0x5d0
Manage the lifetime of the ct entry object by using synchornization
mechanisms for concurrent access.
Fixes: ac991b48d4 ("net/mlx5e: CT: Offload established flows")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Devlink reload can't be allowed on lag devices since reloading one lag
device will cause traffic on the bond to get stucked.
Users who wish to reload a lag device, need to remove the device from
the bond, and only then reload it.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In lag mode, setting roce enabled/disable of lag device have no effect.
e.g.: bond device (roce/vf_lag) roce status remain unchanged.
Therefore disable it and add an error message.
Fixes: cc9defcbb8 ("net/mlx5: Handle "enable_roce" devlink param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In dual port mode, setting roce enabled/disable for the slave device
have no effect. e.g.: the slave device roce status remain unchanged.
Therefore disable it and add an error message.
Enable or disable roce of the master device affect both master and slave
devices.
Fixes: cc9defcbb8 ("net/mlx5: Handle "enable_roce" devlink param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Devlink reload can't be allowed on a multi port slave device, because
reload of slave device doesn't take effect.
The right flow is to disable devlink reload for multi port slave
device. Hence, disabling it in mlx5_core probing.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
wait_for_resync is unreliable - if it timeouts, priv_rx will be freed
anyway. However, mlx5e_ktls_handle_get_psv_completion will be called
sooner or later, leading to use-after-free. For example, it can happen
if a CQ error happened, and ICOSQ stopped, but later on the queues are
destroyed, and ICOSQ is flushed with mlx5e_free_icosq_descs.
This patch converts the lifecycle of priv_rx to fully refcount-based, so
that the struct won't be freed before the refcount goes to zero.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The commit mentioned below has split the parameters of ICOSQ and async
ICOSQ, but it contained a typo: the CQ parameters were swapped for ICOSQ
and async ICOSQ. Async ICOSQ is longer than the normal ICOSQ, and the CQ
size must be the same as the size of the corresponding SQ, but due to
this bug, the CQ of async ICOSQ was much shorter than async ICOSQ
itself. It led to overflows of the CQ with such messages in dmesg, in
particular, when running multiple kTLS-offloaded streams:
mlx5_core 0000:08:00.0: cq_err_event_notifier:529:(pid 9422): CQ error
on CQN 0x406, syndrome 0x1
mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000406
event=0x04
This commit fixes the issue by using the corresponding parameters for
ICOSQ and async ICOSQ.
Fixes: c293ac927f ("net/mlx5e: Refactor build channel params")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The commit cited below switched from using napi_synchronize to
synchronize_rcu to have a guarantee that it will finish in finite time.
However, on average, synchronize_rcu takes more time than
napi_synchronize. Given that it's called multiple times per channel on
deactivation, it accumulates to a significant amount, which causes
timeouts in some applications (for example, when using bonding with
NetworkManager).
This commit replaces synchronize_rcu with synchronize_net, which is
faster when called under rtnl_lock, allowing to speed up the described
flow.
Fixes: 9c25a22dfb ("net/mlx5e: Use synchronize_rcu to sync with NAPI")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when we discover a fatal error, we are queueing a work that
will wait for a lock in order to enter the device to error state.
Meanwhile, FW commands are still being processed, and gets timeouts.
This can block the driver for few minutes before the work will manage
to get the lock and enter to error state.
Setting the device to error state before queueing health work, in order
to avoid FW commands being processed while the work is waiting for the
lock.
Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
struct mlx5e_params contains fields ({rx,tx}_cq_moderation) that depend
on two things: whether DIM is enabled and the state of a private flag
(MLX5E_PFLAG_{RX,TX}_CQE_BASED_MODER). Whenever the DIM state changes,
mlx5e_reset_{rx,tx}_moderation is called to update the fields, however,
only if the channels are open. The flow where the channels are closed
misses the required update of the fields. This commit moves the calls of
mlx5e_reset_{rx,tx}_moderation, so that they run in both flows.
Fixes: ebeaf084ad ("net/mlx5e: Properly set default values when disabling adaptive moderation")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When mlx5e_ethtool_set_coalesce doesn't change DIM state
(enabled/disabled), it calls mlx5e_set_priv_channels_coalesce
unconditionally, which in turn invokes a firmware command to set
interrupt moderation parameters. It shouldn't happen while DIM manages
those parameters dynamically (it might even be happening at the same
time).
This patch fixes it by splitting mlx5e_set_priv_channels_coalesce into
two functions (for RX and TX) and calling them only when DIM is disabled
(for RX and TX respectively).
Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This limitation was inherited by previous Innova (FPGA) IPsec
implementation, it uses its private set of RQ handlers which
does not support XDP, for Connect-X this is no longer true.
Fix by keeping this limitation only for Innova IPsec supporting devices,
as otherwise this limitation effectively wrongly blocks XDP for all
future Connect-X devices for all flows even if IPsec offload is not
used.
Fixes: 2d64663cd5 ("net/mlx5: IPsec: Add HW crypto offload support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This limitation was inherited by previous Innova (FPGA) IPsec
implementation, it uses its private set of RQ handlers which does
not support striding rq, for Connect-X this is no longer true.
Fix by keeping this limitation only for Innova IPsec supporting devices,
as otherwise this limitation effectively wrongly blocks striding RQs for
all future Connect-X devices for all flows even if IPsec offload is not
used.
Fixes: 2d64663cd5 ("net/mlx5: IPsec: Add HW crypto offload support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.
Fix it by performing 64-bit calculation.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mat Martineau says:
====================
mptcp: Miscellaneous fixes
Here are some MPTCP fixes for the -net tree, addressing various issues
we have seen thanks to syzkaller and other testing:
Patch 1 correctly propagates errors at connection time and for TCP
fallback connections.
Patch 2 sets the expected poll() events on SEND_SHUTDOWN.
Patch 3 fixes a retranmit crash and unneeded retransmissions.
Patch 4 fixes possible uninitialized data on the error path during
socket creation.
Patch 5 addresses a problem with MPTCP window updates.
Patch 6 fixes a case where MPTCP retransmission can get stuck.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we do not schedule the MPTCP retransmission
timer after pushing the data when such action happens
in the subflow context.
This may cause hang-up on active-backup scenarios, or
even when only single subflow msks are involved, if we lost
some peer's ack.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move mptcp_cleanup_rbuf() related checks inside the mentioned
helper and extend them to mirror TCP checks more closely.
Additionally drop the 'rmem_pending' hack, since commit 879526030c
("mptcp: protect the rx path with the msk socket spinlock") we
can use instead 'rmem_released'.
Fixes: ea4ca586b1 ("mptcp: refine MPTCP-level ack scheduling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mptcp subflow route_req() callback performs the subflow
req initialization after the route_req() check. If the latter
fails, mptcp-specific bits of the current request sockets
are left uninitialized.
The above causes bad things at req socket disposal time, when
the mptcp resources are cleared.
This change addresses the issue by splitting subflow_init_req()
into the actual initialization and the mptcp-specific checks.
The initialization is moved before any possibly failing check.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 7ea851d19b ("tcp: merge 'init_req' and 'route_req' functions")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller was able to trigger the following splat again:
WARNING: CPU: 1 PID: 12512 at net/mptcp/protocol.c:761 mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Modules linked in:
CPU: 1 PID: 12512 Comm: kworker/1:6 Not tainted 5.10.0-rc6 #52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Code: e8 4b 0c ad ff e8 56 21 88 fe 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00 00 00 00 48 83 c4 40 5b 5d 41 5c c3 e8 36 21 88 fe <0f> 0b 41 bc c8 00 00 00 eb 98 e8 e7 b1 af fe e9 30 ff ff ff 48 c7
RSP: 0018:ffffc900018c7c68 EFLAGS: 00010293
RAX: ffff888108cb1c80 RBX: 1ffff92000318f8d RCX: ffffffff82ad0307
RDX: 0000000000000000 RSI: ffffffff82ad036a RDI: 0000000000000007
RBP: ffff888113e2d000 R08: ffff888108cb1c80 R09: ffffed10227c5ab7
R10: ffff888113e2d5b7 R11: ffffed10227c5ab6 R12: 0000000000000000
R13: ffff88801f100000 R14: ffff888113e2d5b0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd76a874ef8 CR3: 000000001689c005 CR4: 0000000000170ee0
Call Trace:
mptcp_worker+0xaa4/0x1560 net/mptcp/protocol.c:2334
process_one_work+0x8d3/0x1200 kernel/workqueue.c:2272
worker_thread+0x9c/0x1090 kernel/workqueue.c:2418
kthread+0x303/0x410 kernel/kthread.c:292
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
The mptcp_worker tries to update the MPTCP retransmission timer
even if such timer is not currently scheduled.
The mptcp_rtx_head() return value is bogus: we can have enqueued
data not yet transmitted. The above may additionally cause spurious,
unneeded MPTCP-level retransmissions.
Fix the issue adding an explicit clearing of the rtx queue before
trying to retransmit and checking for unacked data.
Additionally drop an unneeded timer stop call and the unused
mptcp_rtx_tail() helper.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current mptcp_poll() implementation gives unexpected
results after shutdown(SEND_SHUTDOWN) and when the msk
status is TCP_CLOSE.
Set the correct mask.
Fixes: 8edf08649e ("mptcp: rework poll+nospace handling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all errors received on msk subflows are ignored.
We need to catch at least the errors on connect() and
on fallback sockets.
Use a custom sk_error_report callback at subflow level,
and do the real action under the msk socket lock - via
the usual sock_owned_by_user()/release_callback() schema.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu reported that on his system S2R cuts off power to the PHY and
after resuming certain PHY settings are lost. The PM folks confirmed
that cutting off power to selected components in S2R is a valid case.
Therefore resuming from S2R, same as from hibernation, has to assume
that the PHY has power-on defaults. As a consequence use the restore
callback also as resume callback.
In addition make sure that the interrupt configuration is restored.
Let's do this in phy_init_hw() and ensure that after this call
actual interrupt configuration is in sync with phydev->interrupts.
Currently, if interrupt was enabled before hibernation, we would
resume with interrupt disabled because that's the power-on default.
This fix applies cleanly only after the commit marked as fixed.
I don't have an affected system, therefore change is compile-tested
only.
[0] https://lore.kernel.org/netdev/1610120754-14331-1-git-send-email-claudiu.beznea@microchip.com/
Fixes: 611d779af7 ("net: phy: fix MDIO bus PM PHY resuming")
Reported-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If xdp_do_redirect() fails, the calling driver should handle recycling
or freeing of the page associated with the frame. The dpaa2-eth driver
didn't do either of them and just incremented a counter.
Fix this by trying to DMA map back the page and recycle it or, if the
mapping fails, just free it.
Fixes: d678be1dc1 ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FSL_ENETC_MDIO use symbols from PHYLIB (MDIO_BUS) and MDIO_DEVRES,
however there are no dependency specified in Kconfig
ERROR: modpost: "__mdiobus_register" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined!
ERROR: modpost: "mdiobus_unregister" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined!
ERROR: modpost: "devm_mdiobus_alloc_size" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined!
add depends on MDIO_DEVRES && MDIO_BUS
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The aq_nic_start function can fail in a variety of cases which leaves
the device in broken state.
An example case where the start function fails is the
request_threaded_irq which can be interrupted, resulting in a EINTR
result. This can be manually triggered by bringing the link up (e.g. ip
link set up) and triggering a SIGINT on the initiating process (e.g.
Ctrl+C). This would put the device into a half configured state.
Subsequently bringing the link up again would cause the napi_enable to
BUG.
In order to correctly clean up the failed attempt to start a device call
aq_nic_stop.
Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: 2 bug fixes.
Two unrelated fixes. The first one fixes intermittent false TX timeouts
during ring reconfigurations. The second one fixes a formatting
discrepancy between the stored and the running FW versions.
Please also queue these for -stable. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The running fw.psid version is in decimal format but the stored
fw.psid is in hex format. This can mislead the user to reset the
NIC to activate the stored version to become the running version.
Fix it to display the stored fw.psid in decimal format.
Fixes: 1388875b39 ("bnxt_en: Add stored FW version info to devlink info_get cb.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A TX queue can potentially immediately timeout after it is stopped
and the last TX timestamp on that queue was more than 5 seconds ago with
carrier still up. Prevent these intermittent false TX timeouts
by bringing down carrier first before calling netif_tx_disable().
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If set_link_state() fails for any reason, we still cleanup the adapter
state and cannot recover from a partial close anyway. So set the adapter
to CLOSED state. That way if a new soft/hard reset is processed, the
adapter will remain in the CLOSED state until the next ibmvnic_open().
Fixes: 01d9bd792d ("ibmvnic: Reorganize device close")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_rmem[1] has been changed to 131072, we should update the documentation
to reflect this.
Fixes: a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Zhibin Liu <zhibinliu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Point out where patchwork bot's code lives, and that we don't want
people posting stuff that doesn't build.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test_xdp_redirect.sh script uses a bash feature, '&>'. On systems,
e.g. Debian, where '/bin/sh' is dash, this will not work as
expected. Use bash in the shebang to get the expected behavior.
Further, using 'set -e' means that the error of a command cannot be
captured without the command being executed with '&&' or '||'. Let us
restructure the ping-commands, and use them as an if-expression, so
that we can capture the return value.
v4: Added missing Fixes:, and removed local variables. (Andrii)
v3: Reintroduced /bin/bash, and kept 'set -e'. (Andrii)
v2: Kept /bin/sh and removed bashisms. (Randy)
Fixes: 996139e801 ("selftests: bpf: add a test for XDP redirect")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210211082029.1687666-1-bjorn.topel@gmail.com
Reject the unsupported and invalid ct_state flags of cls flower rules.
Fixes: e0ace68af2 ("net/sched: cls_flower: Add matching on conntrack info")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Address a performance regression related to scale-invariance on x86
that may prevent turbo CPU frequencies from being used in certain
workloads on systems using acpi-cpufreq as the CPU performance
scaling driver and schedutil as the scaling governor.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmAkGloSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxCNkP/3uQly0iE4WdsxiBlWgF6zQH5PkezzSu
vXY0E2NbzAlUpke3zDIZ6zkN6DfG1yAl4vpsVzy5N/kTzFnaFPbPH3ylZ2x/oKBZ
Rd9yl0uz13UR4txkY49ZRF3c3vMhFoGzNfIXYjMQDGevyfarMLdpR96GFbpFTCT5
I5gZfDtOuAwpXY+Mr+UplTu7PTrmkf2jfQ/T/b+jog3NAjqODwnLT8dwIOTZnuPk
vbCOhb5vsiUHaqilrKkuGS5TzGsb/KCa6k4kaf7WhoFiU99KKcaZRLca/4FlJuVj
Q4rgSrtPsbvhG2vmucprunrsyt21JQMDnERqMlPcEls/c0ONgS4fMc5YJlO6KgZZ
Mlu01f/oE84jQ//0Y3LVi6v6w+yOiBi1Ie9yD8wnOkn6c+r6sWWbvd5Kg/guGnwi
LLSdemslw4r0ltimFmWD5I86ZXDJ1gwU9iuv+SdxoyppHHwOOAu5l/FhEgNvuWbl
LeuLrl7BhYTbN40ouKivoQ8smTpI0EmZX2MRm+l5NV4hHQ+8df16Rt7hoFvAdI2L
VJe/i0sgOcdCVnSovxQ8WeuSMGQtqrgFC4B/9+q6WOgAGIAFtyHQUtpHzB+XsIRo
P2VAmxcdJsqrmnbtoxxopMcqov5cAYI5PPJO/4yR+Z+gOBjeBvEJaxDF/shFT2NO
iaAQXEYLIPW9
=iOYP
-----END PGP SIGNATURE-----
Merge tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Address a performance regression related to scale-invariance on x86
that may prevent turbo CPU frequencies from being used in certain
workloads on systems using acpi-cpufreq as the CPU performance scaling
driver and schedutil as the scaling governor"
* tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
cpufreq: ACPI: Extend frequency tables to cover boost frequencies
Revert a problematic ACPICA commit that changed the code to attempt
to update memory regions which may be read-only on some systems (Ard
Biesheuvel).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmAkGeESHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx90EQAJvCLtbXhf+AsQbYiDhGwH0VE4EvDjqJ
M3qZL8QDeXx+HRqicdCevpdukk3CAosm468yudwrEIpk7vUTtW2vgjTeVNuH3O4n
Yz7Aw5EXfOO7LGXpZ+oyYPNpPZ9DK5BugpZW6C85qlBPHGzTem7IkoEOEem7lvUV
QwahrwZ4D20/blgunGGt5jx8ilp9YA9wQl0FUaoiAfH7ydoA4YTbFqdcG2jEe9Xb
yQvpLPoD50QTO0Gc1R86C0rehNfVQaXI9L5Nf8w5LU0UPGL2fTWkNwDFga8XvrH/
ZkIVWIFYBF6cgiOaB5TaZ/ufWw7BXe8c2X8lEdn2uJjZ4CXyKBVxAtdiqk+Bf57o
mMszQDwf9hot+fXpq3TDeDhu8KwoLMpHITGxUzZ2Y50wtsb/vwkT45x9PWmUObTz
fXvKbvGWZ/QFABsAGlBCAXUsTsUhkCnBuZ7icbbdG+HAqLg3/QJHvORg7pFYNCJH
wlfChKpoH9n69VIo5Ae5NFxp678rW616E3N1yTF+0OK9uKhVqu7PxqRd9qWLOG4F
Fm/fJB9XtIOzNaVmSqMd8YONpFv1Kf450d8uC3D0QPrCSE9OPl3b+JnqEfhxx2xl
sjcT0UKewwm+H0pTzk/FdYqW1GpTV/5uWu+onhfZvVGePuJCRSJ+URw7FuhUfmiE
iTAT8mV3zo6u
=0hUo
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a problematic ACPICA commit that changed the code to attempt to
update memory regions which may be read-only on some systems (Ard
Biesheuvel)"
* tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"