As suggested by David, document a somewhat unexpected behavior that results
from net.ipv4.tcp_l3mdev_accept=1. This behavior was encountered while
debugging FRR, a VRF-aware application, on a system which used
net.ipv4.tcp_l3mdev_accept=1 and where TCP connections for BGP with MD5
keys were failing to establish.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two new methods have been introduced, add some verbiage about what they do.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function has disappeared in commit edac6f6332 ("Revert "net: dsa:
Allow drivers to filter packets they can decode source port from"").
Also, since commit 4e50025129 ("net: dsa: generalize overhead for
taggers that use both headers and trailers"), the next paragraph is no
longer true (it is still discouraged to do that, but it is now
supported, so no point in mentioning it). Delete.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the paragraphs that talk about the various modes of traffic
support, bridging with foreign interfaces, etc etc. There is nothing
that the user needs to know now, it should all work out of the box as
expected.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sja1105 driver has removed its devlink params, so there is nothing
to see here.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series introduces the support for two new mlx5 features:
1) Sample offload for tunneled traffic
2) devlink rate objects support
1) From Chris Mi: Sample offload for tunneled traffic
=====================================================
Background and solution
-----------------------
Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.
If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.
Post action infrastructure
--------------------------
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.
Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.
This series also introduces post action infrastructure. Both ct and
sample use it.
Sample for tunnel in TC SW
--------------------------
tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \
flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \
enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \
enc_dst_port 4789 enc_key_id 4 \
action sample rate 1 group 6 \
action tunnel_key unset \
action mirred egress redirect dev enp4s0f0_1
MLX5 sample HW offload
----------------------
For the following typical flow table:
+-------------------------------+
+ original flow table +
+-------------------------------+
+ original match +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+
We translate the tc filter with sample action to the following HW model:
+---------------------+
+ original flow table +
+---------------------+
+ original match +
+---------------------+
| set fte_id (if reg_c preserve cap)
| do decap
v
+------------------------------------------------+
+ Flow Sampler Object +
+------------------------------------------------+
+ sample ratio +
+------------------------------------------------+
+ sample table id | default table id +
+------------------------------------------------+
| |
v v
+-----------------------------+ +-------------------+
+ sample table + + default table +
+-----------------------------+ +-------------------+
+ forward to management vport + |
+-----------------------------+ |
+-------+------+
| |reg_c preserve cap
| |or decap action
v v
+-----------------+ +-------------+
+ per vport table + + post action +
+-----------------+ +-------------+
+ original match +
+-----------------+
+ other actions +
+-----------------+
2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================
HIGH-LEVEL OVERVIEW
Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.
Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.
IMPLEMENTATION DETAILS
Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:
Created four rate groups with tx_share limits:
$ devlink port function rate add \
pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_4 tx_share 10gbit
Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:
<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33
Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
= ~18.75 Gbit/sec. Normilized bandwidth values for groups:
<group_1> 18.75 * 1.0 = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec
If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:
<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec
Same normalization applied to each vport in the group.
Normalized values are internal, therefore driver provides QoS
tracepoints for next events:
* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.
PATCHES OVERVIEW
1 - Moving and isolation of eswitch QoS logic in separate file;
2 - Implement devlink leaf rate object support for vports;
3 - Implement rate groups creation/deletion;
4 - Implement TX rate management for the groups;
5 - Implement parent set for vports;
6 - Eswitch QoS tracepoints.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmEfNKEACgkQSD+KveBX
+j4DVgf/ZTX3n7xDVrqNgM3hkUOT7QKVCq5zUlDKw1IizE6I+8xB6LO3KmUPyWIn
+VBXE3c+aqUNZu4XlqdkVn1JskVdfTdAGXIIeBgMQskJ/VFCNqU4E9uieEmpiHnI
DUUkEI6eBURJSu1KPD0xdMqtdGJE+/KjwmfZFnCsa4uxmRuV7B0BdxzGIA6AMFKn
+jNS/PFbQM6bUDqP2UUwd97sThtTzDIVH86gu36yK/mdcwdLreqKeuxoHJWePWHC
qLReBC5OQ9zXD2F1Dv2u3WU7EJT7qyCLNrBUrTwHcR9N0Di+2a6lGvjRL5tjWKKC
KOrNkkviurmPp+VieJU+rHHYQwjoGQ==
=XfOY
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-08-19
This series introduces the support for two new mlx5 features:
1) Sample offload for tunneled traffic
2) devlink rate objects support
1) From Chris Mi: Sample offload for tunneled traffic
=====================================================
Background and solution
-----------------------
Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.
If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.
Post action infrastructure
--------------------------
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.
Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.
This series also introduces post action infrastructure. Both ct and
sample use it.
Sample for tunnel in TC SW
--------------------------
tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \
flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \
enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \
enc_dst_port 4789 enc_key_id 4 \
action sample rate 1 group 6 \
action tunnel_key unset \
action mirred egress redirect dev enp4s0f0_1
MLX5 sample HW offload
----------------------
For the following typical flow table:
+-------------------------------+
+ original flow table +
+-------------------------------+
+ original match +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+
We translate the tc filter with sample action to the following HW model:
+---------------------+
+ original flow table +
+---------------------+
+ original match +
+---------------------+
| set fte_id (if reg_c preserve cap)
| do decap
v
+------------------------------------------------+
+ Flow Sampler Object +
+------------------------------------------------+
+ sample ratio +
+------------------------------------------------+
+ sample table id | default table id +
+------------------------------------------------+
| |
v v
+-----------------------------+ +-------------------+
+ sample table + + default table +
+-----------------------------+ +-------------------+
+ forward to management vport + |
+-----------------------------+ |
+-------+------+
| |reg_c preserve cap
| |or decap action
v v
+-----------------+ +-------------+
+ per vport table + + post action +
+-----------------+ +-------------+
+ original match +
+-----------------+
+ other actions +
+-----------------+
2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================
HIGH-LEVEL OVERVIEW
Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.
Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.
IMPLEMENTATION DETAILS
Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:
Created four rate groups with tx_share limits:
$ devlink port function rate add \
pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
pci/0000:06:00.0/group_4 tx_share 10gbit
Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:
<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33
Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
= ~18.75 Gbit/sec. Normilized bandwidth values for groups:
<group_1> 18.75 * 1.0 = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec
If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:
<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec
Same normalization applied to each vport in the group.
Normalized values are internal, therefore driver provides QoS
tracepoints for next events:
* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.
PATCHES OVERVIEW
1 - Moving and isolation of eswitch QoS logic in separate file;
2 - Implement devlink leaf rate object support for vports;
3 - Implement rate groups creation/deletion;
4 - Implement TX rate management for the groups;
5 - Implement parent set for vports;
6 - Eswitch QoS tracepoints.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- update docs about move IRC channel away from freenode,
by Sven Eckelmann
- Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
- Update NULL checks, by Sven Eckelmann (2 patches)
- remove remaining skb-copy calls for broadcast packets,
by Linus Lüssing
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmEeeK8WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeofhWD/0YwNndM/FFo/NHcO3GFDZx9eLM
dFuO7zdMilzgg462q7+mgi0jXA2Kp50Y+JcCqS2XVRIsMgKTVABflgmSlUIOdDoC
A3KKRVgQ1HNPD4WREaEV2CLvBdhR9wEI0jRHvZou7n/VWrfJcUHgdl9aDA2/ptlP
NcuYCKC99HCQmvaBt4GZgOunYDeplmo2qLip2gpwJWf9/vkL7HiBe3HtQSh1HI2y
EIn4SExZOFcxMmKeJMsYl35OZh9oFv7nTnpZBGyKjA+HS0pu03aaPNRGMjW/pdhF
f7V61aDJBU0xU6PjWvUegY4VMInrjW8F10EEJck461J/B9PXjUHUaH8BXXuGBkRM
0kU0Cv21a3Ovz23lgnXSnXu/xjqq5/zZHjnGvyPAMMppAI5f73q/0THtv9iOu+Cz
Qf/tYl0BIRir20ZWtddQ9x2W3+cBYPOYrf/tnmWqFhPddenn+xitwTysVA6fOykQ
pVksQ5UVpDZasZI9Al+R2M0CBttn7tS/iu95PV9CMST8aRgUuU90yd2Ocg3rRDNQ
iEor0AozmO879W460BFQcTILw+D7OdlErUV8H8VW4507imZ7JXGPwZTFxhjM2Xhx
wUXo/o2sxt/ITSdtZAeQj8zOXQMtOi3KlXtTl8ZzyRT//YLWah0j4oBf0a8K62/y
i1Pd5MgXDQAm8fHkBg==
=sHz+
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- update docs about move IRC channel away from freenode,
by Sven Eckelmann
- Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
- Update NULL checks, by Sven Eckelmann (2 patches)
- remove remaining skb-copy calls for broadcast packets,
by Linus Lüssing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn BPF_PROG_RUN into a proper always inlined function. No functional and
performance changes are intended, but it makes it much easier to understand
what's going on with how BPF programs are actually get executed. It's more
obvious what types and callbacks are expected. Also extra () around input
parameters can be dropped, as well as `__` variable prefixes intended to avoid
naming collisions, which makes the code simpler to read and write.
This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes naming conflict compilation error. So
rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
Add documentation for two bad signal integrity substates:
ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST
ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The msk can use backup subflows to transmit in-sequence data
only if there are no other active subflow. On active backup
scenario, the MPTCP connection can do forward progress only
due to MPTCP retransmissions - rtx can pick backup subflows.
This patch introduces a new flag flow MPTCP subflows: if the
underlying TCP connection made no progresses for long time,
and there are other less problematic subflows available, the
given subflow become stale.
Stale subflows are not considered active: if all non backup
subflows become stale, the MPTCP scheduler can pick backup
subflows for plain transmissions.
Stale subflows can return in active state, as soon as any reply
from the peer is observed.
Active backup scenarios can now leverage the available b/w
with no restrinction.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new device generic parameter to enable/disable creation of
VDPA net auxiliary device and associated device functionality
in the devlink instance.
User who prefers to disable such functionality can disable it using below
example.
$ devlink dev param set pci/0000:06:00.0 \
name enable_vnet value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create auxiliary device for the
VDPA net functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new device generic parameter to enable/disable creation of
RDMA auxiliary device and associated device functionality
in the devlink instance.
User who prefers to disable such functionality can disable it using below
example.
$ devlink dev param set pci/0000:06:00.0 \
name enable_rdma value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create auxiliary device for the
RDMA functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new device generic parameter to enable/disable creation of
Ethernet auxiliary device and associated device functionality
in the devlink instance.
User who prefers to disable such functionality can disable it using below
example.
$ devlink dev param set pci/0000:06:00.0 \
name enable_eth value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create auxiliary device for the
Ethernet functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to recent developments around the Freenode.org IRC network, the
opinions about the usage of this service shifted dramatically. The majority
of the still active users of the #batman channel prefers a move to the
hackint.org network.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Restrict range element expansion in ipset to avoid soft lockup,
from Jozsef Kadlecsik.
2) Memleak in error path for nf_conntrack_bridge for IPv4 packets,
from Yajun Deng.
3) Simplify conntrack garbage collection strategy to avoid frequent
wake-ups, from Florian Westphal.
4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name.
5) Missing chain family netlink attribute in chain description
in nfnetlink_hook.
6) Incorrect sequence number on nfnetlink_hook dumps.
7) Use netlink request family in reply message for consistency.
8) Remove offload_pickup sysctl, use conntrack for established state
instead, from Florian Westphal.
9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since
NFPROTO_INET is not exposed through nfnetlink_hook.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nfnetlink_hook: translate inet ingress to netdev
netfilter: conntrack: remove offload_pickup sysctl again
netfilter: nfnetlink_hook: Use same family as request message
netfilter: nfnetlink_hook: use the sequence number of the request message
netfilter: nfnetlink_hook: missing chain family
netfilter: nfnetlink_hook: strip off module name from hookfn
netfilter: conntrack: collect all entries in one cycle
netfilter: nf_conntrack_bridge: Fix memory leak when error
netfilter: ipset: Limit the maximal range of consecutive elements to add/delete
====================
Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These two sysctls were added because the hardcoded defaults (2 minutes,
tcp, 30 seconds, udp) turned out to be too low for some setups.
They appeared in 5.14-rc1 so it should be fine to remove it again.
Marcelo convinced me that there should be no difference between a flow
that was offloaded vs. a flow that was not wrt. timeout handling.
Thus the default is changed to those for TCP established and UDP stream,
5 days and 120 seconds, respectively.
Marcelo also suggested to account for the timeout value used for the
offloading, this avoids increase beyond the value in the conntrack-sysctl
and will also instantly expire the conntrack entry with altered sysctls.
Example:
nf_conntrack_udp_timeout_stream=60
nf_flowtable_udp_timeout=60
This will remove offloaded udp flows after one minute, rather than two.
An earlier version of this patch also cleared the ASSURED bit to
allow nf_conntrack to evict the entry via early_drop (i.e., table full).
However, it looks like we can safely assume that connection timed out
via HW is still in established state, so this isn't needed.
Quoting Oz:
[..] the hardware sends all packets with a set FIN flags to sw.
[..] Connections that are aged in hardware are expected to be in the
established state.
In case it turns out that back-to-sw-path transition can occur for
'dodgy' connections too (e.g., one side disappeared while software-path
would have been in RETRANS timeout), we can adjust this later.
Cc: Oz Shlomo <ozsh@nvidia.com>
Cc: Paul Blakey <paulb@nvidia.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are aspects of netdevsim which are commonly
misunderstood and pointed out in review. Cong
suggest we document them.
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an option lacp_active, which is similar with team's runner.active.
This option specifies whether to send LACPDU frames periodically. If set
on, the LACPDU frames are sent along with the configured lacp_rate
setting. If set off, the LACPDU frames acts as "speak when spoken to".
Note, the LACPDU state frames still will be sent when init or unbind port.
v2: remove module parameter
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IF_OPER_TESTING is in fact used today.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko says:
====================
bpf-next 2021-07-30
We've added 64 non-merge commits during the last 15 day(s) which contain
a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
The main changes are:
1) BTF-guided binary data dumping libbpf API, from Alan.
2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
7) BPF map pinning improvements in libbpf, from Martynas.
8) Improved module BTF support in libbpf and bpftool, from Quentin.
9) Bpftool cleanups and documentation improvements, from Quentin.
10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
11) Increased maximum cgroup storage size, from Stanislav.
12) Small fixes and improvements to BPF tests and samples, from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
tools: bpftool: Complete metrics list in "bpftool prog profile" doc
tools: bpftool: Document and add bash completion for -L, -B options
selftests/bpf: Update bpftool's consistency script for checking options
tools: bpftool: Update and synchronise option list in doc and help msg
tools: bpftool: Complete and synchronise attach or map types
selftests/bpf: Check consistency between bpftool source, doc, completion
tools: bpftool: Slightly ease bash completion updates
unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
tools: bpftool: Support dumping split BTF by id
libbpf: Add split BTF support for btf__load_from_kernel_by_id()
tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
tools: Free BTF objects at various locations
libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
libbpf: Rename btf__load() as btf__load_into_kernel()
libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
bpf: Increase supported cgroup storage value size
libbpf: Fix race when pinning maps in parallel
...
====================
Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Document the mirroring capabilities of the dpaa2-switch driver,
any restrictions that are imposed and some example commands.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a brief document about the sockets API provided for
sending and receiving MCTP messages from userspace.
This is roughly based on the OpenBMC design document, at:
https://github.com/openbmc/docs/blob/master/designs/mctp/mctp-kernel.md
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Append ioam6-sysctl to toctree in order to get rid of building warnings.
Signed-off-by: Hu Haowen <src.res@email.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
All other user triggered operations are gone from ndo_ioctl, so move
the SIOCBOND family into a custom operation as well.
The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now,
but there are still a few definitions in obsolete wireless drivers as well
as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR
helpers from inside the kernel.
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to further reduce the scope of ndo_do_ioctl(), move
out the SIOCWANDEV handling into a new network device operation
function.
Adjust the prototype to only pass the if_settings sub-structure
in place of the ifreq, and remove the redundant 'cmd' argument
in the process.
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz>
Cc: Kevin Curtis <kevin.curtis@farsite.co.uk>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Cc: Martin Schiller <ms@dev.tdt.de>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: linux-x25@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
This is a purely cosmetic change intended to help readers find
their way through the implementation.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCDEVPRIVATE ioctl commands are mainly used in really old
drivers, and they have a number of problems:
- They hide behind the normal .ndo_do_ioctl function that
is also used for other things in modern drivers, so it's
hard to spot a driver that actually uses one of these
- Since drivers use a number different calling conventions,
it is impossible to support compat mode for them in
a generic way.
- With all drivers using the same 16 commands codes, there
is no way to introspect the data being passed through
things like strace.
Add a new net_device_ops callback pointer, to address the
first two of these. Separating them from .ndo_do_ioctl
makes it easy to grep for drivers with a .ndo_siocdevprivate
callback, and the unwieldy name hopefully makes it easier
to spot in code review.
By passing the ifreq structure and the ifr_data pointer
separately, it is no longer necessary to overload these,
and the driver can use either one for a given command.
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a documentation entry for the DPAA2 switch listing its
requirements, features and some examples to go along them.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a file to document devlink support for hns3 driver, now support devlink
info and devlink reload.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the DPAA2 DPIO driver chapter title by adding the necessary
overline. Without this, the index page of the DPAA2 documentation
doesn't display properly.
Fixes: d8e516bac7 ("soc: fsl: dpio: Convert DPIO documentation to .rst")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210722100356.635078-5-ciorneiioana@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Multiple complaints have been raised from the TFO users on the internet
stating that the TFO blackhole logic is too aggressive and gets falsely
triggered too often.
(e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/)
Considering that most middleboxes no longer drop TFO packets, we decide
to disable the blackhole logic by setting
/proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_set to 0 by default.
Fixes: cf1ef3f071 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation for new IOAM sysctls:
- ioam6_id and ioam6_id_wide: two per-namespace sysctls
- ioam6_enabled, ioam6_id and ioam6_id_wide: three per-interface sysctls
Example of IOAM configuration based on the following simple topology:
_____ _____ _____
| | eth0 eth0 | | eth1 eth0 | |
| A |.----------.| B |.----------.| C |
|_____| |_____| |_____|
1) Node and interface IDs can be configured for IOAM:
# IOAM ID of A = 1, IOAM ID of A.eth0 = 11
(A) sysctl -w net.ipv6.ioam6_id=1
(A) sysctl -w net.ipv6.conf.eth0.ioam6_id=11
# IOAM ID of B = 2, IOAM ID of B.eth0 = 21, IOAM ID of B.eth1 = 22
(B) sysctl -w net.ipv6.ioam6_id=2
(B) sysctl -w net.ipv6.conf.eth0.ioam6_id=21
(B) sysctl -w net.ipv6.conf.eth1.ioam6_id=22
# IOAM ID of C = 3, IOAM ID of C.eth0 = 31
(C) sysctl -w net.ipv6.ioam6_id=3
(C) sysctl -w net.ipv6.conf.eth0.ioam6_id=31
Note that "_wide" IDs equivalents can be configured the same way.
2) Each node can be configured to form an IOAM domain. For instance,
we allow IOAM from A to C only (not the reverse path), i.e. enable
IOAM on ingress for B.eth0 and C.eth0:
(B) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1
(C) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1
3) An IOAM domain (e.g. ID=123) is defined and made known to each node:
(A) ip ioam namespace add 123
(B) ip ioam namespace add 123
(C) ip ioam namespace add 123
4) Finally, an IOAM Pre-allocated Trace can be inserted in traffic sent
by A when C (e.g. db02::2) is the destination:
(A) ip -6 route add db02::2/128 encap ioam6 trace type 0x800000 ns 123
size 12 dev eth0
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new heading for extensions to make it more readable. Also, add one
more example of filtering interface index for better understanding.
Signed-off-by: Roy, UjjaL <royujjal@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/CAADnVQJ=DoRDcVkaXmY3EmNdLoO7gq1mkJOn5G=00wKH8qUtZQ@mail.gmail.com
Andrii Nakryiko says:
====================
pull-request: bpf 2021-07-15
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 5 day(s) which contain
a total of 9 files changed, 37 insertions(+), 15 deletions(-).
The main changes are:
1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and
BPF_XDP_CPUMAP programs, from Xuan Zhuo.
2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo.
3) Follow-up fix to subprog poke descriptor use-after-free problem, from
Daniel Borkmann and John Fastabend.
4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King.
5) Fix memory leak in BPF sockmap, from John Fastabend.
6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend
and Jakub Sitnicki.
7) Fix NULL pointer dereference in bpftool, from Tobias Klauser.
8) AF_XDP documentation fixes, from Baruch Siach.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
This patch adds a new sysctl tcp_ignore_invalid_rst to disable marking
out of segments RSTs as INVALID.
Signed-off-by: Ali Abdallah <aabdallah@suse.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups. Highlights
are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM4qQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylKvQCfbh+OmTkDlDlDhSWlxuV05M1XTXoAoLUcLZru
s5JCnwSZztQQLMDHj7Pd
=Zupm
-----END PGP SIGNATURE-----
Merge tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver patches for 5.14-rc1.
A bit more than normal, but nothing major, lots of cleanups.
Highlights are:
- lots of tty api cleanups and mxser driver cleanups from Jiri
- build warning fixes
- various serial driver updates
- coding style cleanups
- various tty driver minor fixes and updates
- removal of broken and disable r3964 line discipline (finally!)
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (227 commits)
serial: mvebu-uart: remove unused member nb from struct mvebu_uart
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
dt-bindings: mvebu-uart: fix documentation
serial: mvebu-uart: correctly calculate minimal possible baudrate
serial: mvebu-uart: do not allow changing baudrate when uartclk is not available
serial: mvebu-uart: fix calculation of clock divisor
tty: make linux/tty_flip.h self-contained
serial: Prefer unsigned int to bare use of unsigned
serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs
serial: qcom_geni_serial: use DT aliases according to DT bindings
Revert "tty: serial: Add UART driver for Cortina-Access platform"
tty: serial: Add UART driver for Cortina-Access platform
MAINTAINERS: add me back as mxser maintainer
mxser: Documentation, fix typos
mxser: Documentation, make the docs up-to-date
mxser: Documentation, remove traces of callout device
mxser: introduce mxser_16550A_or_MUST helper
mxser: rename flags to old_speed in mxser_set_serial_info
mxser: use port variable in mxser_set_serial_info
mxser: access info->MCR under info->slock
...
kernel-doc for TIPC is too simple, we need to add more information for it.
This patch is to extend the abstract, and add the Features and Links items.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener
to another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance
(for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require
jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address
allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks
in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
(our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm 60GHz WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
=LvtX
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
We want to add reference counting for FDB entries in cross-chip
topologies, and in order for that to have any chance of working and not
be unbalanced (leading to entries which are never deleted), we need to
ensure that higher layers are sane, because if they aren't, it's garbage
in, garbage out.
For example, if we add a bridge FDB entry twice, the bridge properly
errors out:
$ bridge fdb add dev swp0 00:01:02:03:04:07 master static
$ bridge fdb add dev swp0 00:01:02:03:04:07 master static
RTNETLINK answers: File exists
However, the same thing cannot be said about the bridge bypass
operations:
$ bridge fdb add dev swp0 00:01:02:03:04:07
$ bridge fdb add dev swp0 00:01:02:03:04:07
$ bridge fdb add dev swp0 00:01:02:03:04:07
$ bridge fdb add dev swp0 00:01:02:03:04:07
$ echo $?
0
But one 'bridge fdb del' is enough to remove the entry, no matter how
many times it was added.
The bridge bypass operations are impossible to maintain in these
circumstances and lack of support for reference counting the cross-chip
notifiers is holding us back from making further progress, so just drop
support for them. The only way left for users to install static bridge
FDB entries is the proper one, using the "master static" flags.
With this change, rtnl_fdb_add() falls back to calling
ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of
dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare
IFF_UNICAST_FLT, this results in us going to promiscuous mode:
$ bridge fdb add dev swp0 00:01:02:03:04:05
[ 28.206743] device swp0 entered promiscuous mode
$ bridge fdb add dev swp0 00:01:02:03:04:05
RTNETLINK answers: File exists
So even if it does not completely fail, there is at least some indication
that it is behaving differently from before, and closer to user space
expectations, I would argue (the lack of a "local|static" specifier
defaults to "local", or "host-only", so dev_uc_add() is a reasonable
default implementation). If the generic implementation of .ndo_fdb_add
provided by Vlad Yasevich is a proof of anything, it only proves that
the implementation provided by DSA was always wrong, by not looking at
"ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB
entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local"
flag which means that the FDB entry points towards the host). It all
used to mean the same thing to DSA.
Update the documentation so that the users are not confused about what's
going on.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).
The main changes are:
1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These is no need to wait for 'interval' period for the next probe
if the last probe is already acked in search state. The 'interval'
period waiting should be only for probe failure timeout and the
current pmtu check when it's in search complete state.
This change will shorten the probe time a lot in search state, and
also fix the document accordingly.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denote that the new switch generation is supported, detail its pin
strapping options (with differences compared to SJA1105) and explain how
MDIO access to the internal 100base-T1 and 100base-TX PHYs is performed.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DQO is a new descriptor format for our next generation virtual NIC.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Skip non-SCTP packets in the new SCTP chunk support for nft_exthdr,
from Phil Sutter.
2) Simplify TCP option sanity check for TCP packets, also from Phil.
3) Add a new expression to store when the rule has been used last time.
4) Pass the hook state object to log function, from Florian Westphal.
5) Document the new sysctl knobs to tune the flowtable timeouts,
from Oz Shlomo.
6) Fix snprintf error check in the new nfnetlink_hook infrastructure,
from Dan Carpenter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Examples in this document use all kinds of indentation from 3 to 5
spaces and even mixed with tabs. Making them all even and equal to
4 spaces.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210622185647.3705104-1-i.maximets@ovn.org
This patch added a new sysctl, named allow_join_initial_addr_port, to
control whether allow peers to send join requests to the IP address and
port number used by the initial subflow.
Suggested-by: Florian Westphal <fw@strlen.de>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'.
'n' is the interval for PLPMTUD probe timer in milliseconds, and it
can't be less than 5000 if it's not 0.
All asoc/transport's PLPMTUD in a new socket will be enabled by default.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel assumes bank 0 when 'ETHTOOL_MSG_MODULE_EEPROM_GET' is sent
without 'ETHTOOL_A_MODULE_EEPROM_BANK'.
Document it as part of the interface documentation.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The command is called 'ETHTOOL_MSG_MODULE_EEPROM_GET', not
'ETHTOOL_MSG_MODULE_EEPROM'.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new sysctl, named checksum_enabled, to control
whether DSS checksum can be enabled.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conversion tools used during DocBook/LaTeX/html/Markdown->ReST
conversion and some cut-and-pasted text contain some characters that
aren't easily reachable on standard keyboards and/or could cause
troubles when parsed by the documentation build system.
Replace the occurences of the following characters:
- U+00a0 (' '): NO-BREAK SPACE
as it can cause lines being truncated on PDF output
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/9bd9f5c067c4b068a974730a14fe8d68e1be0c9a.1623826294.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-17
The following pull-request contains BPF updates for your *net-next* tree.
We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).
The main changes are:
1) BPF infrastructure to migrate TCP child sockets from a listener to another
in the same reuseport group/map, from Kuniyuki Iwashima.
2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.
3) Streamline error reporting changes in libbpf as planned out in the
'libbpf: the road to v1.0' effort, from Andrii Nakryiko.
4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.
5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.
6) Support new LLVM relocations in libbpf to make them more linker friendly,
also add a doc to describe the BPF backend relocations, from Yonghong Song.
7) Silence long standing KUBSAN complaints on register-based shifts in
interpreter, from Daniel Borkmann and Eric Biggers.
8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
target arch cannot be determined, from Lorenz Bauer.
9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.
10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.
11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
programs to bpf_helpers.h header, from Florent Revest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes .rst file warnings seen on linux-next build.
Fixes: f7af616c63 ("net: iosm: infrastructure")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this
option is enabled or eBPF program is attached, we will be able to migrate
child sockets from a listener to another in the same reuseport group after
close() or shutdown() syscalls.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-2-kuniyu@amazon.co.jp
Add documentation for the devlink feature prestera switchdev driver supports:
add description for the support of the driver-specific devlink traps
(include both traps with action TRAP and action DROP);
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 25gbase-r phy interface mode
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some really really weird switches just couldn't decide whether to use a
normal or a tail tagger, so they just did both.
This creates problems for DSA, because we only have the concept of an
'overhead' which can be applied to the headroom or to the tailroom of
the skb (like for example during the central TX reallocation procedure),
depending on the value of bool tail_tag, but not to both.
We need to generalize DSA to cater for these odd switches by
transforming the 'overhead / tail_tag' pair into 'needed_headroom /
needed_tailroom'.
The DSA master's MTU is increased to account for both.
The flow dissector code is modified such that it only calls the DSA
adjustment callback if the tagger has a non-zero header length.
Taggers are trivially modified to declare either needed_headroom or
needed_tailroom, based on the tail_tag value that they currently
declare.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement support for pushing vlan header into untagged packet on ingress
of port that has pvid configured and support for popping vlan on egress of
port that has the matching vlan configured as untagged. To support such
configurations packet reformat contexts of {INSERT|REMOVE}_HEADER types are
created per such vlan and saved to struct mlx5_esw_bridge_vlan which allows
all FDB entries on particular vlan to share single packet reformat
instance. When initializing FDB entries with pvid or untagged vlan type set
its mlx5_flow_act->pkt_reformat action accordingly.
Flush all flows when removing vlan from port. This is necessary because
even though software bridge removes all FDB entries before removing their
vlan, mlx5 bridge implementation deletes their corresponding flow entries
from hardware in asynchronous workqueue task, which will cause firmware
error if vlan packet reformat context is deleted before all flows that
point to it.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for FDB vlan-tagged entries. Extend ingress and egress flow
tables with flow groups to match packet vlan tag. Modify the flow creation
code to include vlan tag, if vlan is configured on port and vlan
configuration is supported for offload.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Hardware supported by mlx5 driver doesn't provide learning and requires the
driver to emulate all switch-like behavior in software. As such, all
packets by default go through miss path, appear on representor and get to
software bridge, if it is the upper device of the representor. This causes
bridge to process packet in software, learn the MAC address to FDB and send
SWITCHDEV_FDB_ADD_TO_DEVICE event to all subscribers.
In order to offload FDB entries in mlx5, register switchdev notifier
callback and implement support for both 'added_by_user' and dynamic FDB
entry SWITCHDEV_FDB_ADD_TO_DEVICE events asynchronously using new
mlx5_esw_bridge_offloads->wq ordered workqueue. In workqueue callback
offload the ingress rule (matching FDB entry MAC as packet source MAC) and
egress table rule (matching FDB entry MAC as destination MAC). For ingress
table rule also match source vport to ensure that only traffic coming from
expected bridge port is matched by offloaded rule. Save all the relevant
FDB entry data in struct mlx5_esw_bridge_fdb_entry instance and insert the
instance in new mlx5_esw_bridge->fdb_list list (for traversing all entries
by software ageing implementation in following patch) and in new
mlx5_esw_bridge->fdb_ht hash table for fast retrieval. Notify the bridge
that FDB entry has been offloaded by sending SWITCHDEV_FDB_OFFLOADED
notification.
Delete FDB entry on reception of SWITCHDEV_FDB_DEL_TO_DEVICE event.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The documentation file used to be written in markdown format but was
converted to reStructuredText (rst).
The converted file doesn't keep up with rst format requirements which
results in hard-to-read text.
This patch fixes the formatting of the file. The patch also
* Highlights and emphasizes some lines to improve readability
* Rephrases some hard-to-understand text
* Updates outdated function descriptions.
* Removes TSO description which falsely claims the driver supports it
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink rate objects section at devlink port documentation.
Add devlink rate support info at netdevsim devlink documentation.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding documentation explaining the new MAPv4/v5 packet formats
and the corresponding checksum offload headers.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid confusions, it seems better to parse this sysctl parameter as a
boolean. We use it as a boolean, no need to parse an integer and bring
confusions if we see a value different from 0 and 1, especially with
this parameter name: enabled.
It seems fine to do this modification because the default value is 1
(enabled). Then the only other interesting value to set is 0 (disabled).
All other values would not have changed the default behaviour.
Suggested-by: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.
The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.
In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.
The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:
# sysctl -w net.ipv6.fib_multipath_hash_fields=0x0037
net.ipv6.fib_multipath_hash_fields = 0x0037
To avoid introducing holes in 'struct netns_sysctl_ipv6', move the
'bindv6only' field after the multipath hash fields.
The kernel rejects unknown fields, for example:
# sysctl -w net.ipv6.fib_multipath_hash_fields=0x1000
sysctl: setting key "net.ipv6.fib_multipath_hash_fields": Invalid argument
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.
The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.
In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.
The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:
# sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037
net.ipv4.fib_multipath_hash_fields = 0x0037
The kernel rejects unknown fields, for example:
# sysctl -w net.ipv4.fib_multipath_hash_fields=0x1000
sysctl: setting key "net.ipv4.fib_multipath_hash_fields": Invalid argument
More fields can be added in the future, if needed.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Group the flow flags under a single struct called flow. The new struct
contains 'stopped' and 'tco_stopped' bools which used to be bits in a
bitfield. The struct also contains the lock protecting them to
potentially share the same cache line.
Note that commit c545b66c69 (tty: Serialize tcflow() with other tty
flow control changes) added a padding to the original bitfield. It was
for the bitfield to occupy a whole 64b word to avoid interferring stores
on Alpha (cannot we evaporate this arch with weird implications to C
code yet?). But it doesn't work as expected as the padding
(tty_struct::unused) is aligned to a 8B boundary too and occupies some
bytes from the next word.
So make it reliable by:
1) setting __aligned of the struct -- that aligns the start, and
2) making 'unsigned long unused[0]' as the last member of the struct --
pads the end.
This is also the perfect time to start the documentation of tty_struct
where all this lives. So we start by documenting what these bools
actually serve for. And why we do all the alignment dances. Only the few
up-to-date information from the Theodore's comment made it into this new
Kerneldoc comment.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/20210505091928.22010-13-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Probably because the original file was pre-processed by some
tool, both i40e.rst and iavf.rst files are using this character:
- U+2013 ('–'): EN DASH
meaning an hyphen when calling a command line application, which
is obviously wrong. So, replace them by an hyphen, ensuring
that it will be properly displayed as literals when building
the documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/95eb2a48d0ca3528780ce0dfce64359977fa8cb3.1620744606.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-04-28
The main changes are:
1) Add link detach and following re-attach for trampolines, from Jiri Olsa.
2) Use kernel's "binary printf" lib for formatted output BPF helpers (which
avoids the needs for variadic argument handling), from Florent Revest.
3) Fix verifier 64 to 32 bit min/max bound propagation, from Daniel Borkmann.
4) Convert cpumap to use netif_receive_skb_list(), from Lorenzo Bianconi.
5) Add generic batched-ops support to percpu array map, from Pedro Tammela.
6) Various CO-RE relocation BPF selftests fixes, from Andrii Nakryiko.
7) Misc doc rst fixes, from Hengqi Chen.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
bpf, docs: Fix literal block for example code
bpf, cpumap: Bulk skb using netif_receive_skb_list
bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds
bpf: Lock bpf_trace_printk's tmp buf before it is written to
selftests/bpf: Fix core_reloc test runner
selftests/bpf: Fix field existence CO-RE reloc tests
selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro
libbpf: Support BTF_KIND_FLOAT during type compatibility checks in CO-RE
selftests/bpf: Add remaining ASSERT_xxx() variants
selftests/bpf: Use ASSERT macros in lsm test
selftests/bpf: Test that module can't be unloaded with attached trampoline
selftests/bpf: Add re-attach test to lsm test
selftests/bpf: Add re-attach test to fexit_test
selftests/bpf: Add re-attach test to fentry_test
bpf: Allow trampoline re-attach for tracing and lsm programs
====================
Link: https://lore.kernel.org/r/20210427233740.22238-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update timestamping doc for DSA switches to describe current
implementation accurately. On TX, the skb cloning is no longer
in DSA generic code.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some parts of the documentation may lead the reader to think that the
socket's own frames are always received when CAN_RAW_RECV_OWN_MSGS is
enabled, but all frames are subject to filtering.
As explained by Marc Kleine-Budde:
On TX complete of a CAN frame it's pushed into the RX path of the
networking stack, along with the information of the originating socket.
Then the CAN frame is delivered into AF_CAN, where it is passed on to
all registered receivers depending on filters. One receiver is the
sending socket in CAN_RAW. Then in CAN_RAW the it is checked if the
sending socket has RECV_OWN_MSGS enabled.
Link: https://lore.kernel.org/r/20210420191212.42753-1-erik@flodin.me
Signed-off-by: Erik Flodin <erik@flodin.me>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
ETHTOOL_MSG_MODULE_EEPROM_GET is missing from the list of messages.
ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY is sadly a rather long name
so we need to adjust column length.
v2: use spaces (Andrew)
Fixes: c781ff12a2 ("ethtool: Allow network drivers to dump arbitrary EEPROM data")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make the lack of expectations for switching NICs explicit,
describe the new stats.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to pause statistics add stats for FEC.
The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).
Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.
This set adds the 2 standard-based block counters and a extra
one for corrected bits.
Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.
Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's have all seg6 sysctl at the same place.
Fixes: a6dc6670cd ("ipv6: sr: Add documentation for seg_flowlabel sysctl")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.
Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.
Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.
----------------------------------------------
| steering | metadata | pkt size | rx pps |
| mode | | | (million) |
----------------------------------------------
| smfs | disabled | 128Bytes | 42 |
----------------------------------------------
| smfs | enabled | 128Bytes | 36 |
----------------------------------------------
| dmfs | disabled | 128Bytes | 42 |
----------------------------------------------
| dmfs | enabled | 128Bytes | 36 |
----------------------------------------------
Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.
Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
name esw_port_metadata type driver-specific
values:
cmode runtime value true
$ devlink dev param set pci/0000:06:00.0 \
name esw_port_metadata value false cmode runtime
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
- added performance numbers in commit log
- updated commit log and documentation for switchdev mode
- added explicit note on when user can disable metadata in
documentation
Define get_module_eeprom_by_page() ethtool callback and implement
netlink infrastructure.
get_module_eeprom_by_page() allows network drivers to dump a part of
module's EEPROM specified by page and bank numbers along with offset and
length. It is effectively a netlink replacement for get_module_info()
and get_module_eeprom() pair, which is needed due to emergence of
complex non-linear EEPROM layouts.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Quotes to backticks. All commands use backticks since the names
are constants.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect documentation. Mostly referring to other objects,
likely because the text was copied and not adjusted.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
X.25 Layer 3 (the Packet Layer) expects layer 2 to provide a reliable
datalink service such that no packets are reordered or dropped. And
X.25 Layer 2 (the LAPB layer) is indeed designed to provide such service.
However, this reliability is not preserved when a driver calls "netif_rx"
to deliver the received packets to layer 3, because "netif_rx" will put
the packets into per-CPU queues before they are delivered to layer 3.
If there are multiple CPUs, the order of the packets may not be preserved.
The per-CPU queues may also drop packets if there are too many.
Therefore, we should not call "netif_rx" to let it queue the packets.
Instead, we should use our own queue that won't reorder or drop packets.
This patch changes all X.25 drivers to use their own queues instead of
calling "netif_rx". The patch also documents this requirement in the
"x25-iface" documentation.
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is overlapp between ip_local_port_range and ip_local_reserved_ports with a huge reserved block, it will affect probability of selecting ephemeral ports, see file net/ipv4/inet_hashtables.c:723
int __inet_hash_connect(
...
for (i = 0; i < remaining; i += 2, port += 2) {
if (unlikely(port >= high))
port -= remaining;
if (inet_is_local_reserved_port(net, port))
continue;
E.g. if there is reserved block of 10000 ports, two ports right after this block will be 5000 more likely selected than others.
If this was intended, we can/should add note into documentation as proposed in this commit, otherwise we should think about different solution. One option could be mapping table of continuous port ranges. Second option could be letting user to modify step (port+=2) in above loop, e.g. using new sysctl parameter.
Signed-off-by: Otto Hollmann <otto.hollmann@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FEC API to netlink.
This is not a 1-to-1 conversion.
FEC settings already depend on link modes to tell user which
modes are supported. Take this further an use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.
Enforce only one active FEC bit (by using a bit position
rather than another mask).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a document describing the principles behind resilient next-hop groups,
and some notes about how to configure and offload them.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... cannot be used in block quote, it breaks compilation, remove it.
Fix warnings due to missing blank line such as:
net-next/Documentation/networking/nf_flowtable.rst:142: WARNING: Block quote ends without a blank line; unexpected unindent.
Fixes: 143490cde5 ("docs: nf_flowtable: update documentation with enhancements")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the flowtable documentation to describe recent
enhancements:
- Offload action is available after the first packets go through the
classic forwarding path.
- IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at
this stage.
- Tuple has been augmented to track VLAN id and PPPoE session id.
- Bridge and IP forwarding integration, including bridge VLAN filtering
support.
- Hardware offload support.
- Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table
listing.
- Replace 'flow offload' by 'flow add' in example rulesets (preferred
syntax).
- Describe existing cache limitations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/subsytem/subsystem/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 9d5ef190e5 ("net: dsa: automatically bring up DSA master
when opening user port"), DSA manages the administrative status of the
host port automatically. Update the configuration steps to reflect this.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"make htmldocs" complains:
configuration.rst:165: WARNING: duplicate label networking/dsa/configuration:single port, other instance in (...)
configuration.rst:212: WARNING: duplicate label networking/dsa/configuration:bridge, other instance in (...)
configuration.rst:252: WARNING: duplicate label networking/dsa/configuration:gateway, other instance in (...)
And for good reason, because the "single port", "bridge" and "gateway"
use cases are replicated twice, once for normal taggers and twice for
DSA_TAG_PROTO_NONE. So when trying to reference these sections via a
hyperlink such as:
https://www.kernel.org/doc/html/latest/networking/dsa/configuration.html#single-port
it will always reference the first occurrence, and never the second one.
This change makes the "single port", "bridge" and "gateway"
configuration examples consistent with the formatting used in the
"Configuration showcases" subsection.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"make htmldocs" produces these warnings:
Documentation/networking/dsa/dsa.rst:468: WARNING: Unexpected indentation.
Documentation/networking/dsa/dsa.rst:477: WARNING: Block quote ends without a blank line; unexpected unindent.
Fixes: 8411abbcad ("Documentation: networking: dsa: mention integration with devlink")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though this is clear from the context, it is nice to actually be
grammatically correct.
Fixes: 0f22ad45f4 ("Documentation: networking: switchdev: clarify device driver behavior")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like "make htmldocs" produces this warning:
Documentation/networking/switchdev.rst:482: WARNING: Unexpected indentation.
Fixes: 0f22ad45f4 ("Documentation: networking: switchdev: clarify device driver behavior")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "bridge fdb add" command provided in the switchdev documentation is
junk now, not only because it is syntactically incorrect and rejected by
the iproute2 bridge program, but also because it was not updated in
light of Arkadi Sharshevsky's radical switchdev refactoring in commit
29ab586c3d ("net: switchdev: Remove bridge bypass support from
switchdev"). Try to explain what the intended usage pattern is with the
new kernel implementation.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides details on the expected behavior of switchdev
enabled network devices when operating in a "stand alone" mode, as well
as when being bridge members. This clarifies a number of things that
recently came up during a bug fixing session on the b53 DSA switch
driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a short summary of the methods that a driver writer must implement
for offloading a HSR/PRP network interface.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a short summary of the methods that a driver writer must implement
for getting an MRP instance to work on top of a DSA switch.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a short summary of the methods that a driver writer must implement
for offloading a link aggregation group, and what is still missing.
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a short summary of the devlink features supported by the DSA core.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation was already lagging behind by not mentioning the old
version of port_bridge_flags (port_set_egress_floods). So now we are
skipping one step and just explaining how a DSA driver should configure
address learning and flooding settings.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
On one hand, the link is dead and therefore useless.
On the other hand, there are always more drivers to port, but at this
stage, DSA does not need to affirm itself as the driver model to use for
Ethernet-connected switches (since we already have 15 tagging protocols
supported and probably more switch families from various vendors), so
there is nothing actionable to do.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the recent series containing commit bae33f2b5a ("net: switchdev:
remove the transaction structure from port attributes"), there aren't
prepare/commit transactional phases anymore in most of the switchdev
objects/attributes, and as a result, there aren't any in the DSA driver
API either. So remove this piece.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
After Vivien's series from 2019 containing commits 27d4d19d7c ("net:
dsa: remove limitation of switch index value") and ab8ccae122 ("net:
dsa: add ports list in the switch fabric"), this is basically no longer
true.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chapter about tagging protocols is out of date because it doesn't
mention all taggers that have been added since last documentation
update. But judging based on that, it will always tend to lag behind,
and there's no good reason why we would enumerate the supported
hardware. Instead we could do something more useful and explain what
there is to know about tagging protocols instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While preparing some slides for a customer presentation, I found the
existing high-level view to be a bit confusing, so I modified it a
little bit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ena.rst documentation referred to end_start_xmit() when it should refer
to ena_start_xmit(). Fix the typo.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The batching logic in netvsc_send is non-trivial, due to
a combination of the Linux API and the underlying hypervisor
interface. Add a comment explaining why the code is written this
way.
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation is missing and it's not very clear what
this callback is for - presumably testing the recovery?
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor tweaks and improvement of wording about the diagnose callback.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit fixes three spelling typos in devlink-dpipe.rst and
devlink-port.rst.
Signed-off-by: Eva Dengler <eva.dengler@fau.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"either" is outside the parentheses, so the matching "or" should be too.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the recent update to MAINTAINERS update my e-mail address.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a spelling typo in bonding.rst.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the "big" set of staging and IIO driver patches for 5.12-rc1.
Nothing really huge in here, the number of staging tree patches has gone
down for a bit, maybe there's only so much churn to happen in here at
the moment.
The IIO changes are:
- new drivers
- new DT bindings
- new iio driver features
with full details in the shortlog.
The staging driver patches are just a lot of tiny coding style cleanups,
along with some semi-larger hikey driver cleanups as those are _almost_
good enough to get out of the staging tree, but will probably have to
wait until 5.13 to have happen.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYCqelQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymIDQCguWtPGy6U1sgaL3GAK/ROt2aet3wAn3TP1WgB
GeKAKKPshu3cskYQzlou
=UPZR
-----END PGP SIGNATURE-----
Merge tag 'staging-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here is the "big" set of staging and IIO driver patches for 5.12-rc1.
Nothing really huge in here, the number of staging tree patches has
gone down for a bit, maybe there's only so much churn to happen in
here at the moment.
The IIO changes are:
- new drivers
- new DT bindings
- new iio driver features
with full details in the shortlog.
The staging driver patches are just a lot of tiny coding style
cleanups, along with some semi-larger hikey driver cleanups as those
are _almost_ good enough to get out of the staging tree, but will
probably have to wait until 5.13 to have happen.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (189 commits)
staging: hikey9xx: Fix alignment of function parameters
staging: greybus: Fixed a misspelling in hid.c
staging: wimax/i2400m: fix some byte order issues found by sparse
staging: wimax: i2400m: fix some incorrect type warnings
staging: greybus: minor code style fix
staging:wlan-ng: use memdup_user instead of kmalloc/copy_from_user
staging:r8188eu: use IEEE80211_FCTL_* kernel definitions
staging: rtl8192e: remove multiple blank lines
staging: greybus: Fixed alignment issue in hid.c
staging: wfx: remove unused included header files
staging: nvec: minor coding style fix
staging: wimax: Fix some coding style problem
staging: fbtft: add tearing signal detect
staging: vt6656: Fixed issue with alignment in rf.c
staging: qlge: Remove duplicate word in comment
staging: rtl8723bs: remove obsolete commented out code
staging: rtl8723bs: fix function comments to follow kernel-doc
staging: wfx: avoid defining array of flexible struct
staging: rtl8723bs: Replace one-element array with flexible-array member in struct ndis_80211_var_ie
staging: Replace lkml.org links with lore
...
Here is the big set of tty/serial driver changes for 5.12-rc1.
Nothing huge, just lots of good cleanups and additions:
- Your n_tty line discipline cleanups
- vt core cleanups and reworks to make the code more "modern"
- stm32 driver additions
- tty led support added to the tty core and led layer
- minor serial driver fixups and additions
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYCqgqw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymJYQCgnxHmkhzJ2VarTDR3cWm1gu0NU7AAoNe5wWUh
4TQbhB9LSNo78HnIVze0
=Chcg
-----END PGP SIGNATURE-----
Merge tag 'tty-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty/serial driver changes for 5.12-rc1.
Nothing huge, just lots of good cleanups and additions:
- n_tty line discipline cleanups
- vt core cleanups and reworks to make the code more "modern"
- stm32 driver additions
- tty led support added to the tty core and led layer
- minor serial driver fixups and additions
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (54 commits)
serial: core: Remove BUG_ON(in_interrupt()) check
vt_ioctl: Remove in_interrupt() check
dt-bindings: serial: imx: Switch to my personal address
vt: keyboard, use new API for keyboard_tasklet
serial: stm32: improve platform_get_irq condition handling in init_port
serial: ifx6x60: Remove driver for deprecated platform
tty: fix up iterate_tty_read() EOVERFLOW handling
tty: fix up hung_up_tty_read() conversion
tty: fix up hung_up_tty_write() conversion
tty: teach the n_tty ICANON case about the new "cookie continuations" too
tty: teach n_tty line discipline about the new "cookie continuations"
tty: clean up legacy leftovers from n_tty line discipline
tty: implement read_iter
tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
serial: remove sirf prima/atlas driver
serial: mxs-auart: Remove <asm/cacheflush.h>
serial: mxs-auart: Remove serial_mxs_probe_dt()
serial: fsl_lpuart: Use of_device_get_match_data()
dt-bindings: serial: renesas,hscif: Add r8a779a0 support
tty: serial: Drop unused efm32 serial driver
...
We have no in-tree users, also update the sfp-phylink.rst documentation
to indicate that phy_attach_direct() is used instead of of_phy_attach().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-02-16
The following pull-request contains BPF updates for your *net-next* tree.
There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:
[...]
lock_sock(sk);
err = tcp_zerocopy_receive(sk, &zc, &tss);
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
&zc, &len, err);
release_sock(sk);
[...]
We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).
The main changes are:
1) Adds support of pointers to types with known size among global function
args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.
2) Add bpf_iter for task_vma which can be used to generate information similar
to /proc/pid/maps, from Song Liu.
3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
rewriting bind user ports from BPF side below the ip_unprivileged_port_start
range, both from Stanislav Fomichev.
4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
as well as per-cpu maps for the latter, from Alexei Starovoitov.
5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
for sleepable programs, both from KP Singh.
6) Extend verifier to enable variable offset read/write access to the BPF
program stack, from Andrei Matei.
7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
query device MTU from programs, from Jesper Dangaard Brouer.
8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
tracing programs, from Florent Revest.
9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
otherwise too many passes are required, from Gary Lin.
10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
verification both related to zero-extension handling, from Ilya Leoshkevich.
11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.
12) Batch of AF_XDP selftest cleanups and small performance improvement
for libbpf's xsk map redirect for newer kernels, from Björn Töpel.
13) Follow-up BPF doc and verifier improvements around atomics with
BPF_FETCH, from Brendan Jackman.
14) Permit zero-sized data sections e.g. if ELF .rodata section contains
read-only data from local variables, from Yonghong Song.
15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some internal PHY's have their events like link change reported by the
MAC interrupt. We have PHY_IGNORE_INTERRUPT to deal with this scenario.
I'm not too happy with this name. We don't ignore interrupts, typically
there is no interrupt exposed at a PHY level. So let's rename it to
PHY_MAC_INTERRUPT. This is in line with phy_mac_interrupt(), which is
called from the MAC interrupt handler to handle PHY events.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
J721e, J7200 and AM64 have multi port switches which can work in multi
mac mode and in switch mode. Add documentation explaining how to use
different modes.
Borrowed from:
Documentation/networking/device_drivers/ethernet/ti/cpsw_switchdev.rst
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AM65 NUSS ethernet switch on K3 devices can be configured to work either
in independent mac mode where each port acts as independent network
interface (multi mac) or switch mode.
Add devlink hooks to provide a way to switch b/w these modes.
Rationale to use devlink instead of defaulting to bridge mode is that
SoC use cases require to support multiple independent MAC ports with no
switching so that users can use software bridges with multi-mac
configuration (e.g: to support LAG, HSR/PRP, etc). Also, switching
between multi mac and switch mode requires significant Port and ALE
reconfiguration, therefore is easier to be made as part of mode change
devlink hooks. It also allows to keep user interface similar to what
was implemented for the previous generation of TI CPSW IP
(on AM33/AM43/AM57 SoCs).
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_rmem[1] has been changed to 131072, we should update the documentation
to reflect this.
Fixes: a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Zhibin Liu <zhibinliu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding.
For HSR, insertion involves the switch adding a 6 byte HSR header after
the 14 byte Ethernet header. For PRP it adds a 6 byte trailer.
Tag removal involves automatically stripping the HSR/PRP header/trailer
in the switch. This is possible when the switch also performs auto
deduplication using the HSR/PRP header/trailer (making it no longer
required).
Forwarding involves automatically forwarding between redundant ports in
an HSR. This is crucial because delay is accumulated as a frame passes
through each node in the ring.
Duplication involves the switch automatically sending a single frame
from the CPU port to both redundant ports. This is required because the
inserted HSR/PRP header/trailer must contain the same sequence number
on the frames sent out both redundant ports.
Export is_hsr_master so DSA can tell them apart from other devices in
dsa_slave_changeupper.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>