Commit Graph

6093 Commits

Author SHA1 Message Date
Nikolay Aleksandrov
6e1ca489c5 bridge: fdb: add new flush command
Add support for fdb bulk delete (aka flush) command. Currently it only
supports the self and master flags with the same semantics as fdb
add/del. The device is a mandatory argument.

Example:
$ bridge fdb flush dev br0
This will delete *all* fdb entries in br0's fdb table.

$ bridge fdb flush dev swp1 master
This will delete all fdb entries pointing to swp1.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-06-10 09:00:31 -06:00
David Ahern
cef46213d5 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-06-09 09:12:36 -06:00
Petr Machata
11e41a635c ip: Convert non-constant initializers to macros
As per the C standard, "expressions in an initializer for an object that
has static or thread storage duration shall be constant expressions".
Aggregate objects are not constant expressions. Newer GCC doesn't mind, but
older GCC and LLVM do.

Therefore convert to a macro. And since all these macros will look very
similar, extract a generic helper, IPSTATS_STAT_DESC_XSTATS_LEAF, which
takes the leaf name as an argument and initializes the rest as appropriate
for an xstats descriptor.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-06-01 16:01:14 -07:00
David Ahern
d889b151f9 Merge branch 'ss-threads' into next
Peilin Ye  says:

====================

From: Peilin Ye <peilin.ye@bytedance.com>

This patchset adds a new ss option, -T (--threads), to show thread
information.  It extends the -p (--processes) option, and should be useful
for debugging, monitoring multi-threaded applications.  Example output:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

It implies -p i.e. it outputs all threads in the thread group, including
the thread group leader.  When -T is used, -Z and -z also show SELinux
contexts for threads.

[1-5/7] are small clean-ups for the user_ent_hash_build() function.  [6/7]
factors out logic iterating $PROC_ROOT/$PID/fd/ from user_ent_hash_build()
to make [7/7] easier.  [7/7] actually implements the feature.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:59 -06:00
Peilin Ye
e2267e68b9 ss: Introduce -T, --threads option
The -p, -Z and -z options only show process (thread group leader)
information.  For example, if the thread group leader has exited, but
another thread in the group is still using a socket, ss -[pZz] does not
show it.

Add a new option, -T (--threads), to show thread information.  It implies
the -p option.  For example, imagine process A and thread B (in the same
group) using the same socket.  ss -p only shows A:

  $ ss -ltp "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,fd=3))

ss -T shows A and B:

  $ ss -ltT "sport = 1234"
  State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port       Process
  LISTEN  0       100           0.0.0.0:1234           0.0.0.0:*           users:(("test",pid=2932547,tid=2932548,fd=3),("test",pid=2932547,tid=2932547,fd=3))

If -T is used, -Z and -z also show SELinux contexts for threads.

Rename some variables (from "process" to "task", for example) since we
use them for both processes and threads.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:28 -06:00
Peilin Ye
12d491e58f ss: Factor out fd iterating logic from user_ent_hash_build()
We are planning to add a thread version of the -p, --process option.
Move the logic iterating $PROC_ROOT/$PID/fd/ into a new function,
user_ent_hash_build_task(), to make it easier.

Since we will use this function for both processes and threads, rename
local variables as such (e.g. from "process" to "task").

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:23 -06:00
Peilin Ye
210018bfe9 ss: Fix coding style issues in user_ent_hash_build()
Make checkpatch.pl --strict happy about user_ent_hash_build().

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:18 -06:00
Peilin Ye
ea3b57ec39 ss: Delete unnecessary call to snprintf() in user_ent_hash_build()
'name' is already $PROC_ROOT/$PID/fd/$FD there, no need to rebuild the
string.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:12 -06:00
Peilin Ye
cd845a8568 ss: Do not call user_ent_hash_build() more than once
Call user_ent_hash_build() once after the getopt_long() loop if -p, -z
or -Z is used.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:54:01 -06:00
Peilin Ye
b38831bc23 ss: Remove unnecessary stack variable 'p' in user_ent_hash_build()
Commit 116ac9270b ("ss: Add support for retrieving SELinux contexts")
added an unnecessary stack variable, 'char *p', in
user_ent_hash_build().  Delete it for readability.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:53:55 -06:00
Peilin Ye
2d866c6d93 ss: Use assignment-suppression character in sscanf()
Use the '*' assignment-suppression character, instead of an
inappropriately named temporary variable.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:53:50 -06:00
Maxim Mikityanskiy
21c07b4568 ss: Show zerocopy sendfile status of TLS sockets
Print the activation status of zerocopy sendfile on TLS sockets.
Zerocopy sendfile was recently added to Linux and exposed via sock_diag.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:52:55 -06:00
Eric Dumazet
2a0541810c iplink: report tso_max_size and tso_max_segs
New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report device TSO limits to user-space.

ip -d link sh dev eth0
...
   tso_max_size 65536 tso_max_segs 65535

ip -d link sh dev lo
...
   tso_max_size 524280 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:51:53 -06:00
David Ahern
555a775012 Merge branch 'main' into next
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-30 09:51:08 -06:00
Stephen Hemminger
b1521ec002 Merge git://git.kernel.org/pub/scm/network/iproute2/iproute2-next 2022-05-26 17:09:59 -07:00
Stephen Hemminger
6474b7c865 v5.18.0 2022-05-26 15:51:48 -07:00
Andrea Claudi
4429a6c9b4 tipc: fix keylen check
Key length check in str2key() is wrong for hex. Fix this using the
proper hex key length.

Fixes: 28ee49e515 ("tipc: bail out if key is abnormally long")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-17 12:57:45 -07:00
Eric Dumazet
6b6979b9d4 iplink: remove GSO_MAX_SIZE definition
David removed the check using GSO_MAX_SIZE
in commit f1d18e2e6e ("Update kernel headers").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-16 10:21:03 -07:00
Andrea Claudi
19c3e009e3 doc: fix 'infact' --> 'in fact' typo
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-13 16:34:09 -07:00
Andrea Claudi
ed706c78ca man: fix some typos
In dcb-app man page, 'direcly' should be 'directly'
In dcb-dcbx man page, 'respecively' should be 'respectively'
In devlink-dev man page, 'unspecificed' should be 'unspecified'

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-13 16:33:50 -07:00
Andrea Claudi
03589beb45 man: devlink-region: fix typo in example
devlink-region does not accept the legth param, but the length one.

Fixes: 8b4fbf0bed ("devlink: Add support for devlink-region access")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-13 16:33:16 -07:00
Andrea Claudi
b84fc3321c tc: em_u32: fix offset parsing
tc u32 ematch offset parsing might fail even if nexthdr offset is
aligned to 4. The issue can be reproduced with the following script:

tc qdisc del dev dummy0 root
tc qdisc add dev dummy0 root handle 1: htb r2q 1 default 1
tc class add dev dummy0 parent 1:1 classid 1:108 htb quantum 1000000 \
	rate 1.00mbit ceil 10.00mbit burst 6k

while true; do
if ! tc filter add dev dummy0 protocol all parent 1: prio 1 basic match \
	"meta(vlan mask 0xfff eq 1)" and "u32(u32 0x20011002 0xffffffff \
	at nexthdr+8)" flowid 1:108; then
		exit 0
fi
done

which we expect to produce an endless loop.
With the current code, instead, this ends with:

u32: invalid offset alignment, must be aligned to 4.
... meta(vlan mask 0xfff eq 1) and >>u32(u32 0x20011002 0xffffffff at nexthdr+8)<< ...
... u32(u32 0x20011002 0xffffffff at >>nexthdr+8<<)...
Usage: u32(ALIGN VALUE MASK at [ nexthdr+ ] OFFSET)
where: ALIGN  := { u8 | u16 | u32 }

Example: u32(u16 0x1122 0xffff at nexthdr+4)
Illegal "ematch"

This is caused by memcpy copying into buf an unterminated string.

Fix it using strncpy instead of memcpy.

Fixes: commit 311b41454d ("Add new extended match files.")
Reported-by: Alfred Yang <alf.redyoung@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-13 16:32:45 -07:00
Stephen Hemminger
b6d1708663 uapi: update of virtio_ids
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-13 14:34:43 -07:00
David Ahern
8d3977ef81 Update kernel headers
Update kernel headers to commit:
    a65cc8435540 ("Merge branch 'bnxt_en-next'")

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-13 09:01:28 -06:00
David Ahern
5a179c7217 Merge branch 'support-xstats-afstats' into next
Petr Machata  says:

====================

The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. IFLA_STATS_AF_SPEC are similarly used to
carry statistics specific to a certain address family.

In this patch set, add support for three new stats groups that cover the
above attributes: xstats, xstats_slave and afstats. Add bridge and bond
subgroups to the former two groups, and mpls subgroup to the latter one.

Now "group" is used for selecting the top-level attribute, and subgroup
for the link-type or address-family nest below it (bridge, bond, mpls in
this patchset). But xstats (both master and slave) are further
subdivided. E.g. in the case of bridge statistics, the two subdivisions
are called "stp" and "mcast". To make it possible to pick these sets,
add to the two selector levels of group and subgroup a third level,
suite, which is filtered in the userspace.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:09:29 -06:00
Petr Machata
5a1ad9f8c1 man: ip-stats.8: Describe groups xstats, xstats_slave and afstats
Add description of the newly-added statistics groups.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:35 -06:00
Petr Machata
d9976d671c ipstats: Expose bond stats in ipstats
Describe xstats and xstats_slave subgroups for bond netdevices.

For example:

 # ip stats show dev swp1 group xstats_slave subgroup bond
 56: swp1: group xstats_slave subgroup bond suite 802.3ad
                     LACPDU Rx 0
                     LACPDU Tx 0
                     LACPDU Unknown type Rx 0
                     LACPDU Illegal Rx 0
                     Marker Rx 0
                     Marker Tx 0
                     Marker response Rx 0
                     Marker response Tx 0
                     Marker unknown type Rx 0

 # ip -j stats show dev swp1 group xstats_slave subgroup bond | jq
 [
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bond",
     "suite": "802.3ad",
     "802.3ad": {
       "lacpdu_rx": 0,
       "lacpdu_tx": 0,
       "lacpdu_unknown_rx": 0,
       "lacpdu_illegal_rx": 0,
       "marker_rx": 0,
       "marker_tx": 0,
       "marker_response_rx": 0,
       "marker_response_tx": 0,
       "marker_unknown_rx": 0
     }
   }
 ]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:30 -06:00
Petr Machata
36e10429da ipstats: Expose bridge stats in ipstats
Bridge supports two suites, STP and IGMP, carried by attributes
BRIDGE_XSTATS_STP and BRIDGE_XSTATS_MCAST. Expose them as suites "stp" and
"mcast" (to correspond to the attribute name).

For example:

 # ip stats show dev swp1 group xstats_slave subgroup bridge
 56: swp1: group xstats_slave subgroup bridge suite mcast
                     IGMP queries:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP reports:
                       RX: v1 0 v2 0 v3 0
                       TX: v1 0 v2 0 v3 0
                     IGMP leaves: RX: 0 TX: 0
                     IGMP parse errors: 0
                     MLD queries:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD reports:
                       RX: v1 0 v2 0
                       TX: v1 0 v2 0
                     MLD leaves: RX: 0 TX: 0
                     MLD parse errors: 0

 56: swp1: group xstats_slave subgroup bridge suite stp
                     STP BPDU:  RX: 0 TX: 0
                     STP TCN:   RX: 0 TX: 0
                     STP Transitions: Blocked: 1 Forwarding: 0

 # ip -j stats show dev swp1 group xstats_slave subgroup bridge | jq
 [
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "mcast",
     "multicast": {
       "igmp_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "rx_v3": 0,
         "tx_v1": 0,
         "tx_v2": 0,
         "tx_v3": 0
       },
       "igmp_leaves": {
         "rx": 0,
         "tx": 0
       },
       "igmp_parse_errors": 0,
       "mld_queries": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_reports": {
         "rx_v1": 0,
         "rx_v2": 0,
         "tx_v1": 0,
         "tx_v2": 0
       },
       "mld_leaves": {
         "rx": 0,
         "tx": 0
       },
       "mld_parse_errors": 0
     }
   },
   {
     "ifindex": 56,
     "ifname": "swp1",
     "group": "xstats_slave",
     "subgroup": "bridge",
     "suite": "stp",
     "stp": {
       "rx_bpdu": 0,
       "tx_bpdu": 0,
       "rx_tcn": 0,
       "tx_tcn": 0,
       "transition_blk": 1,
       "transition_fwd": 0
     }
   }
 ]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:26 -06:00
Petr Machata
79f5ad95c1 iplink_bridge: Split bridge_print_stats_attr()
Extract from bridge_print_stats_attr() two helpers, one for dumping the
multicast attribute, one for dumping the STP attribute.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:23 -06:00
Petr Machata
1247ed51e9 ipstats: Add groups "xstats", "xstats_slave"
The RTM_GETSTATS response attributes IFLA_STATS_LINK_XSTATS and
IFLA_STATS_LINK_XSTATS_SLAVE are used to carry statistics related to,
respectively, netdevices of a certain type, and netdevices enslaved to
netdevices of a certain type. Inside the nest is then link-type specific
attribute (e.g. LINK_XSTATS_TYPE_BRIDGE), and inside that nest further
attributes for individual type-specific statistical suites.

Under the "ip stats" model, that corresponds to groups "xstats" and
"xstats_slave", link-type specific subgroup, e.g. "bridge", and one or more
link-type specific suites, such as "stp".

Link-type specific stats are currently supported through struct link_util
and in particular the callbacks parse_ifla_xstats and print_ifla_xstats.

The role of parse_ifla_xstats is to establish which statistical suite to
display, and on which device. "ip stats" has framework for both of these
tasks, which obviates the need for custom parsing. Therefore the module
should instead provide a subgroup descriptor, which "ip stats" will then
use as any other.

The second link_util callback, print_ifla_xstats, is for response
dissection. In "ip stats" model, this belongs to leaf descriptors.

Eventually, the link-specific leaf descriptors will be similar to each
other: either master or slave top-level nest needs to be parsed, and
link-type attribute underneath that, and suite attribute underneath that.

To support this commonality, add struct ipstats_stat_desc_xstats to
describe the xstats suites. Further, expose ipstats_stat_desc_pack_xstats()
and ipstats_stat_desc_show_xstats(), which can be used at leaf descriptors
and do the appropriate thing according to the configuration in
ipstats_stat_desc_xstats.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:20 -06:00
Petr Machata
c6900b79b1 ipstats: Add a third level of stats hierarchy, a "suite"
To show statistics nested under IFLA_STATS_LINK_XSTATS_SLAVE or
IFLA_STATS_LINK_XSTATS, one would use "group" to select the top-level
attribute, then "subgroup" to select the link type, which is itself a nest,
and then would lack a way to denote which attribute to select out of the
link-type nest.

To that end, add the selector level "suite", which is filtered in the
userspace.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:16 -06:00
Petr Machata
2ed73b9a80 iplink: Add JSON support to MPLS stats formatter
MPLS stats currently do not support dumping in JSON format. Recognize when
JSON is requested and dump in an obvious manner:

 # ip -n ns0-2G8Ozd9z -j stats show dev veth01 group afstats | jq
 [
   {
     "ifindex": 3,
     "ifname": "veth01",
     "group": "afstats",
     "subgroup": "mpls",
     "mpls_stats": {
       "rx": {
         "bytes": 0,
         "packets": 0,
         "errors": 0,
         "dropped": 0,
         "noroute": 0
       },
       "tx": {
         "bytes": 216,
         "packets": 2,
         "errors": 0,
         "dropped": 0
       }
     }
   }
 ]

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:13 -06:00
Petr Machata
5ed8fd9d51 ipstats: Add a group "afstats", subgroup "mpls"
Add a new group, "afstats", for showing counters from the
IFLA_STATS_AF_SPEC nest, and a subgroup "mpls" for the AF_MPLS
specifically.

For example:

 # ip -n ns0-NrdgY9sx stats show dev veth01 group afstats
 3: veth01: group afstats subgroup mpls
     RX: bytes packets errors dropped noroute
             0       0      0       0       0
     TX: bytes packets errors dropped
           108       1      0       0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:10 -06:00
Petr Machata
dff392fd86 iplink: Publish a function to format MPLS stats
Extract from print_mpls_stats() a new function, print_mpls_link_stats(),
make it non-static and publish in the header file.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:08:06 -06:00
Petr Machata
72623b73c4 iplink: Fix formatting of MPLS stats
Currently, MPLS stats are formatted this way:

 # ip -n ns0-DBZpxj8I link afstats dev veth01
 3: veth01
     mpls:
         RX: bytes  packets  errors  dropped  noroute
                  0        0       0        0       0
         TX: bytes  packets  errors  dropped
                216        2       0       0

Note how most numbers are not aligned properly under their column headers.
Fix by converting the code to use size_columns() to dynamically determine
the necessary width of individual columns, which also takes care of
formatting the table properly in case the counter values are high.
After the fix, the formatting looks as follows:

 # ip -n ns0-Y1PyEc55 link afstats dev veth01
 3: veth01
     mpls:
         RX: bytes packets errors dropped noroute
                 0       0      0       0       0
         TX: bytes packets errors dropped
               108       1      0       0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-12 11:07:46 -06:00
Petr Machata
ce41750fcc ip: ipstats: Do not assume length of response attribute payload
In Linux kernel commit 794c24e9921f ("net-core: rx_otherhost_dropped to
core_stats"), struct rtnl_link_stats64 got a new member. This change got to
iproute2 through commit bba9583752 ("Update kernel headers").

"ip stats" makes the assumption that the payload of attributes that carry
structures is at least as long as the size of the given structure as
iproute2 knows it. But that will not hold when a newer iproute2 is used
against an older kernel: since such kernel misses some fields on the tail
end of the structure, "ip stats" bails out:

 # ip stats show group link
 1: lo: group link
 Error: attribute payload too shortDump terminated

Instead, be tolerant of responses that are both longer and shorter than
what is expected. Instead of forming a pointer directly into the payload,
allocate the stats structure on the stack, zero it, and then copy over the
portion from the response.

Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-08 09:54:05 -06:00
David Ahern
2c09c7622a Merge branch 'bridge-vxlan-vni-filtering' into next
Roopa Prabhu  says:

====================

This series adds bridge command to manage
recently added vnifilter on a collect metadata
vxlan (external) device. Also includes per vni stats
support.

examples:
$bridge vni add dev vxlan0 vni 400

$bridge vni add dev vxlan0 vni 200 group 239.1.1.101

$bridge vni del dev vxlan0 vni 400

$bridge vni show

$bridge -s vni show

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-08 09:51:02 -06:00
Nikolay Aleksandrov
40b50f153c bridge: vni: add support for stats dumping
Add support for "-s" option which causes bridge vni to dump per-vni
statistics. Note that it disables vni range compression.

Example:
$ bridge -s vni | more
 dev               vni              group/remote
 vxlan0             1024  239.1.1.1
                     RX: bytes 0 pkts 0 drops 0 errors 0
                     TX: bytes 0 pkts 0 drops 0 errors 0
                    1025  239.1.1.1
                     RX: bytes 0 pkts 0 drops 0 errors 0
                     TX: bytes 0 pkts 0 drops 0 errors 0

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-08 09:50:34 -06:00
Roopa Prabhu
c7f12a156b ip: iplink_vxlan: add support to set vnifiltering flag on vxlan device
This patch adds option to set vnifilter flag on a vxlan device. vnifilter is
only supported on a collect metadata device.

example: set vnifilter flag
$ ip link add vxlan0 type vxlan external vnifilter local 172.16.0.1

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-08 09:50:33 -06:00
Roopa Prabhu
45cd32f9f7 bridge: vxlan device vnifilter support
This patch adds bridge command to manage
recently added vnifilter on a collect metadata
vxlan device.

examples:
$bridge vni add dev vxlan0 vni 400

$bridge vni add dev vxlan0 vni 200 group 239.1.1.101

$bridge vni del dev vxlan0 vni 400

$bridge vni show

$bridge -s vni show

Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-05-08 09:50:32 -06:00
David Ahern
17bf51b74d libbpf: Remove use of bpf_map_is_offload_neutral
bpf_map_is_offload_neutral is deprecated as of v0.8+;
import definition to maintain backwards compatibility.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-04 20:20:21 -07:00
David Ahern
fa30592512 libbpf: Remove use of bpf_program__set_priv and bpf_program__priv
bpf_program__set_priv and bpf_program__priv are deprecated as of
libbpf v0.7+. Rather than store the map as priv on the program,
change find_legacy_tail_calls to take an argument to return a reference
to the map.

find_legacy_tail_calls is invoked twice from load_bpf_object - the
first time to check for programs that should be loaded. In this case
a reference to the map is not needed, but it does validate the map
exists. The second is invoked from update_legacy_tail_call_maps where
the map pointer is needed.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-04 20:20:21 -07:00
David Ahern
9e0057b48d libbpf: Use bpf_object__load instead of bpf_object__load_xattr
bpf_object__load_xattr is deprecated as of v0.8+; remove it
in favor of bpf_object__load.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-05-04 20:20:21 -07:00
David Ahern
837294e452 libbpf: Remove use of bpf_map_is_offload_neutral
bpf_map_is_offload_neutral is deprecated as of v0.8+;
import definition to maintain backwards compatibility.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
2022-05-02 14:46:11 -07:00
David Ahern
64e5ed779f libbpf: Remove use of bpf_program__set_priv and bpf_program__priv
bpf_program__set_priv and bpf_program__priv are deprecated as of
libbpf v0.7+. Rather than store the map as priv on the program,
change find_legacy_tail_calls to take an argument to return a reference
to the map.

find_legacy_tail_calls is invoked twice from load_bpf_object - the
first time to check for programs that should be loaded. In this case
a reference to the map is not needed, but it does validate the map
exists. The second is invoked from update_legacy_tail_call_maps where
the map pointer is needed.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
2022-05-02 14:46:10 -07:00
David Ahern
ba6519cbcb libbpf: Use bpf_object__load instead of bpf_object__load_xattr
bpf_object__load_xattr is deprecated as of v0.8+; remove it
in favor of bpf_object__load.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
2022-05-02 14:46:07 -07:00
Boris Sukholitko
a6eb654d1c f_flower: add number of vlans man entry
The documentation was missing in the number of vlans commit.

Fixes: 5ba31bcf (f_flower: Add num of vlans parameter)
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-04-29 11:22:18 -06:00
David Ahern
b4f8055477 Merge branch 'flower-vlans' into next
Boris Sukholitko  says:

====================

Our customers in the fiber telecom world have network configurations
where they would like to control their traffic according to the number
of tags appearing in the packet.

For example, TR247 GPON conformance test suite specification mostly
talks about untagged, single, double tagged packets and gives lax
guidelines on the vlan protocol vs. number of vlan tags.

This is different from the common IT networks where 802.1Q and 802.1ad
protocols are usually describe single and double tagged packet. GPON
configurations that we work with have arbitrary mix the above protocols
and number of vlan tags in the packet.

The following patch series implement number of vlans flower filter. They
add num_of_vlans flower filter as an alternative to vlan ethtype protocol
matching. The end result is that the following command becomes possible:

tc filter add dev eth1 ingress flower \
  num_of_vlans 1 vlan_prio 5 action drop

Also, from our logs, we have redirect rules such that:

tc filter add dev $GPON ingress flower num_of_vlans $N \
     action mirred egress redirect dev $DEV

where N can range from 0 to 3 and $DEV is the function of $N.

Also there are rules setting skb mark based on the number of vlans:

tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
    $P action skbedit mark $M

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-04-27 20:18:25 -06:00
Boris Sukholitko
5788732e38 f_flower: Check args with num_of_vlans
Having more than one vlan allows matching on the vlan tag parameters.
This patch changes vlan key validation to take number of vlan tags into
account.

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-04-27 20:18:12 -06:00
Boris Sukholitko
5ba31bcf44 f_flower: Add num of vlans parameter
Our customers in the fiber telecom world have network configurations
where they would like to control their traffic according to the number
of tags appearing in the packet.

For example, TR247 GPON conformance test suite specification mostly
talks about untagged, single, double tagged packets and gives lax
guidelines on the vlan protocol vs. number of vlan tags.

This is different from the common IT networks where 802.1Q and 802.1ad
protocols are usually describe single and double tagged packet. GPON
configurations that we work with have arbitrary mix the above protocols
and number of vlan tags in the packet.

This patch adds num_of_vlans flower key and associated print and parse
routines. The following command becomes possible:

tc filter add dev eth1 ingress flower num_of_vlans 1 action drop

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-04-27 20:16:16 -06:00