the function has the same definition in ifstat and ss
v2: fix the typo in the chagelog
v3: rebase on master
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
The 32 bit statistics are problematic since 32 bit value can
easily wraparound at high speed. Use 64 bit stats if available.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Use snprintf to print only valid data.
That's the similar change done for ifstat.
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
as the name doesn't require a lot of storage put
it on the stack. Moreover the memory allocated via
malloc wasn't returned.
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
the argument passed to the function
is always a constant value
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add a sentence in the doc to describe what the new "monitor" command
does.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add the "ip ioam monitor" command to be able to read all IOAM data
received. This is based on a netlink multicast group.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David Ahern <dsahern@kernel.org>
Quentin Deslandes says:
====================
BPF allows programs to store socket-specific data using
BPF_MAP_TYPE_SK_STORAGE maps. The data is attached to the socket itself,
and Martin added INET_DIAG_REQ_SK_BPF_STORAGES, so it can be fetched
using the INET_DIAG mechanism.
Currently, ss doesn't request the socket-local data, this patch aims to
fix this.
The first patch requests the socket-local data for the requested map ID
(--bpf-map-id=) or all the maps (--bpf-maps). It then prints the map_id
in COL_EXT.
Patch #2 uses libbpf and BTF to pretty print the map's content, like
`bpftool map dump` would do.
Patch #3 updates ss' man page to explain new options.
While I think it makes sense for ss to provide the socket-local storage
content for the sockets, it's difficult to conciliate the column-based
output of ss and having readable socket-local data. Hence, the
socket-local data is printed in a readable fashion over multiple lines
under its socket statistics, independently of the column-based approach.
Here is an example of ss' output with --bpf-maps:
[...]
ESTAB 340116 0 [...]
map_id: 114 [
(struct my_sk_storage){
.field_hh = (char)3,
(union){
.a = (int)17,
.b = (int)17,
},
}
]
Changed this series to an RFC as the merging window for net-next is
closed.
Changes from v8:
* Remove usage of libbpf_bpf_map_type_str() which requires libbpf-1.0+
and provide very little added value (David).
* Use ENABLE_BPF_SKSTORAGE_SUPPORT to gate the BPF socket-local storage
support, instead of HAVE_LIBBPF. iproute2 depends on libbpf-0.1, but
this change needs libbpf-0.5+. If the requirements are not met, ss can
still be compiled and used without BPF socket-local storage support, but
a warning will be printed at compile time.
Changes from v7:
* Fix comment format and checkpatch warnings (Stephen, David).
* Replaced Co-authored-by with Co-developed-by + Signed-off-by for
Martin's contribution on patch #1 to follow checkpatch requirements,
with Martin's approval.
Changes from v6:
* Remove column dedicated to BPF socket-local storage (COL_SKSTOR),
use COL_EXT instead (Matthieu).
Changes from v5:
* Add support for --oneline when printing socket-local data.
* Use \t to indent instead of " " to be consistent with other columns.
* Removed Martin's ack on patch #2 due to amount of lines changed.
Changes from v4:
* Fix return code for 2 calls.
* Fix issue when inet_show_netlink() retries a request.
* BPF dump object is created in bpf_map_opts_load_info().
Changes from v3:
* Minor refactoring to reduce number of HAVE_LIBBF usage.
* Update ss' man page.
* btf_dump structure created to print the socket-local data is cached
in bpf_map_opts. Creation of the btf_dump structure is performed if
needed, before printing the data.
* If a map can't be pretty-printed, print its ID and a message instead
of skipping it.
* If show_all=true, send an empty message to the kernel to retrieve all
the maps (as Martin suggested).
Changes from v2:
* bpf_map_opts_is_enabled is not inline anymore.
* Add more #ifdef HAVE_LIBBPF to prevent compilation error if
libbpf support is disabled.
* Fix erroneous usage of args instead of _args in vout().
* Add missing btf__free() and close(fd).
Changes from v1:
* Remove the first patch from the series (fix) and submit it separately.
* Remove double allocation of struct rtattr.
* Close BPF map FDs on exit.
* If bpf_map_get_fd_by_id() fails with ENOENT, print an error message
and continue to the next map ID.
* Fix typo in new command line option documentation.
* Only use bpf_map_info.btf_value_type_id and ignore
bpf_map_info.btf_vmlinux_value_type_id (unused for socket-local storage).
* Use btf_dump__dump_type_data() instead of manually using BTF to
pretty-print socket-local storage data. This change alone divides the size
of the patch series by 2.
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
Document new --bpf-maps and --bpf-map-id= options.
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
ss is able to print the map ID(s) for which a given socket has BPF
socket-local storage defined (using --bpf-maps or --bpf-map-id=). However,
the actual content of the map remains hidden.
This change aims to pretty-print the socket-local storage content following
the socket details, similar to what `bpftool map dump` would do. The exact
output format is inspired by drgn, while the BTF data processing is similar
to bpftool's.
ss will use libbpf's btf_dump__dump_type_data() to ease pretty-printing
of binary data. This requires out_bpf_sk_storage_print_fn() as a print
callback function used by btf_dump__dump_type_data(). vout() is also
introduced, which is similar to out() but accepts a va_list as
parameter.
ss' output remains unchanged unless --bpf-maps or --bpf-map-id= is used,
in which case each socket containing BPF local storage will be followed by
the content of the storage before the next socket's info is displayed.
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
While sock_diag is able to return BPF socket-local storage in response
to INET_DIAG_REQ_SK_BPF_STORAGES requests, ss doesn't request it.
This change introduces the --bpf-maps and --bpf-map-id= options to request
BPF socket-local storage for all SK_STORAGE maps, or only specific ones.
The bigger part of this change will check the requested map IDs and
ensure they are valid. The column COL_EXT is used to print the
socket-local data into.
When --bpf-maps is used, ss will send an empty
INET_DIAG_REQ_SK_BPF_STORAGES request, in return the kernel will send
all the BPF socket-local storage entries for a given socket. The BTF
data for each map is loaded on demand, as ss can't predict which map ID
are used.
When --bpf-map-id=ID is used, a file descriptor to the requested maps is
open to 1) ensure the map doesn't disappear before the data is printed,
and 2) ensure the map type is BPF_MAP_TYPE_SK_STORAGE. The BTF data for
each requested map is loaded before the request is sent to the kernel.
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
As Paolo noticed, a skb->len check against gso_max_size was added in:
https://lore.kernel.org/netdev/20231219125331.4127498-1-edumazet@google.com/
gso_max_size needs to be set to a value greater than or equal to
gso_ipv4_max_size to make BIG TCP IPv4 work properly.
To not break the current setup, this patch just adds a note into its
man doc for this.
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
The usage in the man page was out of date with the usage help, fix it.
Also sort the commands alphabetically, the same as the command usage.
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix json corruption when using the "-json" option in some cases
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In the case of a process such as mapping a json to a structure,
it can be difficult if the keys have the same name but different types.
Since handle is used in hex string, change it to fw.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Use snprintf to print only valid data
v2: adjust formatting
v3: fix the issue with a buffer length
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
The kernel will now send missing type information in error response.
Print it if present.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Throughout ifstat.c, ifstat_ent.val is accessed as a long long unsigned
type, however it is defined as __u64. This works by coincidence on many
systems, however on ppc64le, __u64 is a long unsigned.
This patch makes the type definition consistent with all of the places
where it is accessed.
Fixes: 5a52102b7c ("ifstat: Add extended statistics to ifstat")
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add assertion to check for case of snprintf failing (bad format?)
or buffer getting full.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix json corruption when using the "-json" option in cases where tc-fw is set.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix some typos and spelling errors in iproute2 documentation.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix various typos and spelling errors in some iproute2 comments.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If rtnl_listen detects error (such as netlink socket EOF),
then exit with status 2 like other iproute2 monitor commands.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If rtnl_listen() returns error while looking for netconf events,
then exit with status of 2 as other iproute2 monitor actions do.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Use the same pattern for handling rtnl_listen() errors that
is used across other iproute2 commands. All other commands
exit with status of 2 if rtnl_listen fails.
Reported-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
if ctrl_list is called with get operation and wrong number
of parameters, it would forget to close the local netlink
handle.
Signed-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
coupled_control specifies whether the LACP state machine's MUX in the
802.3ad mode should have separate Collecting and Distributing states per
IEEE 802.1AX-2008 5.4.15 for coupled and independent control state.
By default this setting is on and does not separate the Collecting and
Distributing states, maintaining the bond in coupled control. If set off,
will toggle independent control state machine which will seperate
Collecting and Distributing states.
Signed-off-by: Aahil Awatramani <aahila@google.com>
v2:
Dropped uapi header change
Use of print_on_off and parse_on_off
Signed-off-by: David Ahern <dsahern@kernel.org>
In commit b264b4c656 ("ip: add NLM_F_ECHO support") the "-echo" option
was added, but not to the options in the usage. Add it.
Note there doesn't seem to be any praticular order for the options here,
so it's placed kind of randomly.
Fixes: b264b4c656 ("ip: add NLM_F_ECHO support")
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The stats command was added in 54d82b0699 ("ip: Add a new family of
commands, "stats""), but wasn't included in the subcommand list in the
help usage.
Add it in the right position alphabetically.
Fixes: 54d82b0699 ("ip: Add a new family of commands, "stats"")
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit 6e15d27aae ("ip: add AMT support") added "amt" to the list
of "first level" commands list, which isn't correct, as it isn't present
in the cmds list. remove it from the usage help.
Fixes: 6e15d27aae ("ip: add AMT support")
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
convert frprintf calls to perror() so the caller
can see the reason of an error
Signed-off-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Victor Nogueira says:
====================
Continuing on what Hangbin Liu started [1], this patch set adds support for
the NLM_F_ECHO flag for tc actions and filters. For qdiscs it will require
some kernel surgery, and we'll send it soon after this surgery is merged.
When user space configures the kernel with netlink messages, it can set
NLM_F_ECHO flag to request the kernel to send the applied configuration
back to the caller. This allows user space to receive back configuration
information that is populated by the kernel. Often because there are
parameters that can only be set by the kernel which become visible with the
echo, or because user space lets the kernel choose a default value.
To illustrate a use case where the kernel will give us a default value,
the example below shows the user not specifying the action index:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 1 ref 1 bind 0
not_in_hw
Note that the echoed response indicates that the kernel gave us a value
of index 1
[1] https://lore.kernel.org/netdev/20220916033428.400131-2-liuhangbin@gmail.com/
====================
Signed-off-by: David Ahern <dsahern@kernel.org>
If the user specifies this flag for a filter command the kernel will
return the command's result back to user space.
For example:
tc -echo filter add dev lo parent ffff: protocol ip matchall action ok
added filter dev lo parent ffff: protocol ip pref 49152 matchall chain 0
As illustrated above, the kernel will give us a pref of 491252
The same can be done for other filter commands (replace, delete, and
change). For example:
tc -echo filter del dev lo parent ffff: pref 49152 protocol ip matchall
deleted filter dev lo parent ffff: protocol ip pref 49152 matchall chain 0
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch adds the -echo flag to tc command line and support for it in
tc actions. If the user specifies this flag for an action command, the
kernel will return the command's result back to user space.
For example:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 1 bind 0
not_in_hw
As illustrated above, the kernel will give us an index of 10
The same can be done for other action commands (replace, change, and
delete). For example:
tc -echo actions delete action mirred index 10
total acts 0
Deleted action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 0 bind 0
not_in_hw
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The function basename() expects a mutable character string,
which now causes a warning:
bpf_legacy.c: In function ‘bpf_load_common’:
bpf_legacy.c:975:38: warning: passing argument 1 of ‘__xpg_basename’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
975 | basename(cfg->object), cfg->mode == EBPF_PINNED ?
| ~~~^~~~~~~~
In file included from bpf_legacy.c:21:
/usr/include/libgen.h:34:36: note: expected ‘char *’ but argument is of type ‘const char *’
34 | extern char *__xpg_basename (char *__path) __THROW;
Fixes: f20ff2f195 ("bpf: keep parsed program mode in struct bpf_cfg_in")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
So far the mirred action has dealt with syntax that handles
mirror/redirection for netdev. A matching packet is redirected or mirrored
to a target netdev.
In this patch we enable mirred to mirror to a tc block as well.
IOW, the new syntax looks as follows:
... mirred <ingress | egress> <mirror | redirect> [index INDEX] < <blockid BLOCKID> | <dev <devname>> >
Examples of mirroring or redirecting to a tc block:
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 10.10.10.10/32 action mirred egress redirect blockid 22
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In musl basename() is only available via libgen.h
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There are cases where NULL is passed as format string when
nothing is to be printed. This is commonly done in the print_bool
function when a flag is false. Glibc seems to handle this case nicely
but for musl it will cause a segmentation fault
Since nothing needs to be printed, in this case; just check
for NULL and return.
Reported-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There was a word too much in the documentation of tc-htb
Signed-off-by: Simon Egli <simon@egli.online>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>