Commit Graph

6093 Commits

Author SHA1 Message Date
Stephen Hemminger
07b65f312f netem: add SPDX license header
The netem directory contains code to generate tables for netem.
This code came from NISTnet which was public domain.
Add appropriate license tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
6af63cc732 misc: use SPDX
Use SPDX tag instead of GPL boilerplate.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
b3a091d100 tc: use SPDX
Replace GPL boilerplate with SPDX.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
01bdedff99 tc: replace GPL-BSD boilerplate in codel and fq
Replace legal boilerplate with SPDX instead.
These algorithms are dual licensed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
d2c5676eec tipc: use SPDX
Replace boilerplate GPL text with SPDX

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
2a5fb175fa testsuite: use SPDX
Replace boilerplate with SPDX tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
c37d21944b ip: use SPDX
Use SPDX instead of boilerplate text for ip and related
sub commands.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
18a13ec516 devlink: use SPDX
Add SPDX tag instead of GPL 2.0 or later boilerplate

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
9e7e786ae4 lib: replace GPL boilerplate with SPDX
Replace standard GPL 2.0 or later text with SPDX tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
8aa217546e genl: use SPDX
Replace GPL 2.0 or later boilerplate.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Stephen Hemminger
9458a2c1f5 bridge: use SPDX
Replace GPL 2.0 or later boilerplate text.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:34 -08:00
Jakub Wilk
221da6275e man: ss: remove duplicated option name
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-14 09:00:08 -08:00
Stephen Hemminger
4dc60c0179 tc: remove support for rr qdisc
The Round-Robin qdisc was removed in kernel version 2.6.27.
Remove code and man page references from iproute.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-11 09:14:29 -08:00
Matthieu Baerts
e2e81aa20f mptcp: add new listener events
These new events have been added in kernel commit f8c9dfbd875b ("mptcp:
add pm listener events") by Geliang Tang.

Two new MPTCP Netlink event types for PM listening socket creation and
closure have been recently added. They will be available in the future
v6.2 kernel.

They have been added because MPTCP for Linux, when not using the
in-kernel PM, depends on the userspace PM to create extra listening
sockets -- called "PM listeners" -- before announcing addresses and
ports. With the existing MPTCP Netlink events, a userspace PM can create
PM listeners at startup time, or in response to an incoming connection.
Creating sockets in response to connections is not optimal: ADD_ADDRs
can't be sent until the sockets are created and listen()ed, and if all
connections are closed then it may not be clear to the userspace PM
daemon that PM listener sockets should be cleaned up. Hence these new
events: PM listening sockets can be managed based on application
activity.

Note that the maximum event string size has to be increased by 2 to be
able to display LISTENER_CREATED without truncated it.

Also, as pointed by Mat, this event doesn't have any "token" attribute
so this attribute is now printed only if it is available.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/313
Cc: Geliang Tang <geliang.tang@suse.com>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-10 09:18:29 -08:00
Stephen Hemminger
1ba8021057 tc/htb: add SPDX comment
The standard way is to use SPDX to refer to license,
instead of per-file boilerplate text.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-09 13:39:27 -08:00
Stephen Hemminger
62f6e2fcd7 tc/htb: break long lines
Style guidelines is 100 characters

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-09 13:36:34 -08:00
Max Tottenham
010a8388ae tc: Add JSON output to tc-class
* Add JSON formatted output to the `tc class show ...` command.
  * Add JSON formatted output for the htb qdisc classes.

Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-09 13:27:41 -08:00
Stephen Hemminger
b16d14aaad uapi: update vdpa.h
Upstream 6.2-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2023-01-09 13:21:53 -08:00
Ido Schimmel
d0e02f35af dcb: Do not leave ACKs in socket receive buffer
Originally, the dcb utility only stopped receiving messages from a
socket when it found the attribute it was looking for. Cited commit
changed that, so that the utility will also stop when seeing an ACK
(NLMSG_ERROR message), by setting the NLM_F_ACK flag on requests.

This is problematic because it means a successful request will leave an
ACK in the socket receive buffer, causing the next request to bail
before reading its response.

Fix that by not stopping when finding the required attribute in a
response. Instead, stop on the subsequent ACK.

Fixes: 84c0369726 ("dcb: unblock mnl_socket_recvfrom if not message received")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-29 08:44:23 -08:00
Hauke Mehrtens
22c877d93e configure: Remove include <sys/stat.h>
The check_name_to_handle_at() function in the configure script is
including sys/stat.h. This include fails with glibc 2.36 like this:
````
In file included from /linux-5.15.84/include/uapi/linux/stat.h:5,
                 from /toolchain-x86_64_gcc-12.2.0_glibc/include/bits/statx.h:31,
                 from /toolchain-x86_64_gcc-12.2.0_glibc/include/sys/stat.h:465,
                 from config.YExfMc/name_to_handle_at_test.c:3:
/linux-5.15.84/include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
   10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
      |  ^~~~~~~
In file included from /linux-5.15.84/include/uapi/linux/posix_types.h:5,
                 from /linux-5.15.84/include/uapi/linux/types.h:14:
/linux-5.15.84/include/uapi/linux/stddef.h:5:10: fatal error: linux/compiler_types.h: No such file or directory
    5 | #include <linux/compiler_types.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
````

Just removing the include works, the manpage of name_to_handle_at() says
only fcntl.h is needed.

Fixes: c5b72cc56b ("lib/fs: fix issue when {name,open}_to_handle_at() is not implemented")
Tested-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-29 08:43:52 -08:00
Stephen Hemminger
8d7c60b4dd uapi: update headers to 6.2-rc1
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-26 17:57:36 -08:00
Jiri Pirko
1e994cf69c devlink: fix mon json output for trap-policer
There is a json footer missed for trap-policer output in "devlink mon".
So add it and fix the json output.

Fixes: a66af55693 ("devlink: Add devlink trap policer set and show commands")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-19 08:35:48 -08:00
David Ahern
3715a146e8 Merge branch 'main' into next
Conflicts:
	devlink/devlink.c

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-16 09:12:38 -07:00
Stephen Hemminger
83699d3d50 fix version # 2022-12-14 09:42:22 -08:00
David Ahern
19b358e1de Merge branch 'new-ipsec-offload-type' into next
Leon Romanovsky  says:

====================

From: Leon Romanovsky <leonro@nvidia.com>

Extend ip tool to support new IPsec offload mode.
Followup of the recently accepted series to netdev.
https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com

Changelog:
v1:
 * Changed "full offload" to "packet offload" to be aligned with kernel names.
 * Rebase to latest iproute2-next
v0: https://lore.kernel.org/all/cover.1652179360.git.leonro@nvidia.com

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 09:04:31 -07:00
Leon Romanovsky
3422c62d58 xfrm: add an interface to offload policy
Extend at "ip xfrm policy" to allow policy offload to specific device.
The syntax and the code follow already established pattern from the
state offload.

The only difference between them is that direction was already mandatory
argument in policy configuration commands, so don't need to add direction
handling logic like it was done for the state offload.

The syntax is as follows:
 $ ip xfrm policy .... offload packet dev <if-name>

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 09:02:34 -07:00
Leon Romanovsky
a6e740ff40 xfrm: add packet offload mode to xfrm state
Allow users to configure xfrm states with packet offload type.

Packet offload mode:
  ip xfrm state offload packet dev <if-name> dir <in|out>
Crypto offload mode:
  ip xfrm state offload crypto dev <if-name> dir <in|out>
  ip xfrm state offload dev <if-name> dir <in|out>

The latter variant configures crypto offload mode and is needed
to provide backward compatibility.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 09:02:29 -07:00
Leon Romanovsky
bdd19b1ede xfrm: prepare state offload logic to set mode
The offload in xfrm state requires to provide device and direction
in order to activate it. However, in the help section, device and
direction were displayed as an optional.

As a preparation to addition of packet offload, let's fix the help
section and refactor the code to be more clear.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 09:02:05 -07:00
David Ahern
dc768a4585 Merge branch 'devlink-port-function' into next
Shay Drory  says:

====================

Patch implementing new netlink attribute for devlink-port function got
merged to net-next.
https://lore.kernel.org/netdev/20221206185119.380138-1-shayd@nvidia.com/

Now there is a need to support these new attribute in the userspace
tool. Implement roce and migratable port function attributes in devlink
userspace tool. Update documentation.

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 09:01:13 -07:00
Shay Drory
fe036c3666 devlink: Add documentation for roce and migratable port function attributes
New port function attributes roce and migratable were added.
Update the man page for devlink-port to account for new attributes.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 08:59:38 -07:00
Shay Drory
32168d8a88 devlink: Support setting port function migratable cap
Suppor port function commands to enable / disable migratable
capability, this is used to set the port function as migratable.

Live migration is the process of transferring a live virtual machine
from one physical host to another without disrupting its normal
operation.

In order for a VM to be able to perform LM, all the VM components must
be able to perform migration. e.g.: to be migratable.
In order for VF to be migratable, VF must be bound to VFIO driver with
migration support.

When migratable capability is enable for a function of the port, the
device is making the necessary preparations for the function to be
migratable, which might include disabling features which cannot be
migrated.

Example of LM with migratable function configuration:
Set migratable of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 migratable disable

$ devlink port function set pci/0000:06:00.0/2 migratable enable

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 migratable enable

Bind VF to VFIO driver with migration support:
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind

Attach VF to the VM.
Start the VM.
Perform LM.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 08:59:33 -07:00
Shay Drory
bb2eea918b devlink: Support setting port function roce cap
Support port function commands to enable / disable RoCE, this is used to
control the port RoCE device capabilities.

When RoCE is disabled for a function of the port, function cannot create
any RoCE specific resources (e.g GID table).
It also saves system memory utilization. For example disabling RoCE
enable a VF/SF to save 1 Mbytes of system memory per function.

Example of a PCI VF port which supports a port function:

$ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum
    0 vfnum 1
      function:
        hw_addr 00:00:00:00:00:00 roce enabled

$ devlink port function set pci/0000:06:00.0/2 roce disable

$ devlink port show pci/0000:06:00.0/2
    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum
    0 vfnum 1
      function:
        hw_addr 00:00:00:00:00:00 roce disabled

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 08:55:10 -07:00
David Ahern
3f1e064ea4 Update kernel headers
Update kernel headers to commit:
    7e68dd7d07a2 ("Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-14 08:54:03 -07:00
Stephen Hemminger
83de2e8005 v6.1.0 2022-12-12 16:11:57 -08:00
Stephen Hemminger
1e344491f9 man: add missing tc class show
"tc class show" is valid sub command but missing from synopsis.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-12 14:32:51 -08:00
Stephen Hemminger
d0de986119 tc: make prefix const
Tcstats functions have prefix argument that can be made const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-12 13:30:49 -08:00
Stephen Hemminger
6c9940eca1 ip: print mpls errors on stderr
Error messages should go on stderr.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-12 08:37:40 -08:00
Stephen Hemminger
523692fa17 tc: print errors on stderr
Don't mix output and errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-09 19:47:03 -08:00
Stephen Hemminger
13cd02228f iplink: support JSON in MPLS output
The MPLS statistics did not support oneline or JSON
in current code.

Fixes: 837552b445 ("iplink: add support for afstats subcommand")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-09 19:45:47 -08:00
Daniel Xu
1a408bda2e ip-link: man: Document existence of netns argument in add command
ip-link-add supports netns argument just like ip-link-set. This commit
documents the existence of netns in help text and man page.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-09 10:05:04 -08:00
Lahav Schlesinger
0faec4d050 libnetlink: Fix memory leak in __rtnl_talk_iov()
If `__rtnl_talk_iov` fails then callers are not expected to free `answer`.

Currently if `NLMSG_ERROR` was received with an error then the netlink
buffer was stored in `answer`, while still returning an error

This leak can be observed by running this snippet over time.
This triggers an `NLMSG_ERROR` because for each neighbour update, `ip`
will try to query for the name of interface 9999 in the wrong netns.
(which in itself is a separate bug)

 set -e

 ip netns del test-a || true
 ip netns add test-a
 ip netns del test-b || true
 ip netns add test-b

 ip -n test-a netns set test-b auto
 ip -n test-a link add veth_a index 9999 type veth \
  peer name veth_b netns test-b
 ip -n test-b link set veth_b up

 ip -n test-a monitor link address prefix neigh nsid label all-nsid \
  > /dev/null &
 monitor_pid=$!
 clean() {
  kill $monitor_pid
  ip netns del test-a
  ip netns del test-b
 }
 trap clean EXIT

 while true; do
  ip -n test-b neigh add dev veth_b 1.2.3.4 lladdr AA:AA:AA:AA:AA:AA
  ip -n test-b neigh del dev veth_b 1.2.3.4
 done

Fixes: 55870dfe7f ("Improve batch and dump times by caching link lookups")
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-12-09 10:03:45 -08:00
Jiri Pirko
42b27dfc6e devlink: update ifname map when message contains DEVLINK_ATTR_PORT_NETDEV_NAME
Recent kernels send PORT_NEW message with when ifname changes,
so benefit from that by having ifnames updated.

Whenever there is a message containing DEVLINK_ATTR_PORT_NETDEV_NAME
attribute, use it to update ifname map.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 10:47:48 -07:00
Jiri Pirko
18ff3ccbc8 devlink: push common code to __pr_out_port_handle_start_tb()
There is a common code in pr_out_port_handle_start() and
pr_out_port_handle_start_arr(). As the next patch is going to extend it
even more, push the code into common helper.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 10:47:44 -07:00
Jiri Pirko
d5ae4c3fdb devlink: get devlink port for ifname using RTNL get link command
Currently, when user specifies ifname as a handle on command line of
devlink, the related devlink port is looked-up in previously taken dump
of all devlink ports on the system. There are 3 problems with that:
1) The dump iterates over all devlink instances in kernel and takes a
   devlink instance lock for each.
2) Dumping all devlink ports would not scale.
3) Alternative ifnames are not exposed by devlink netlink interface.

Instead, benefit from RTNL get link command extension and get the
devlink port handle info from IFLA_DEVLINK_PORT attribute, if supported.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 10:47:37 -07:00
Jiri Pirko
f04f5e1d08 devlink: add ifname_map_add/del() helpers
Add couple of helpers to alloc/free of map object alongside with list
addition/removal.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 10:46:59 -07:00
David Ahern
3b88380b07 Merge branch 'pcp-prio-apptrust' into next
Daniel Machon  says:

====================

This patch series makes use of the newly introduced [1] DCB_APP_SEL_PCP
selector, for PCP/DEI prioritization, and DCB_ATTR_IEEE_APP_TRUST
attribute for configuring per-selector trust and trust-order.

========================================================================
New parameter "pcp-prio" to existing "app" subcommand:
========================================================================

A new pcp-prio parameter has been added to the app subcommand, which can
be used to classify traffic based on PCP and DEI from the VLAN header.
PCP and DEI is specified in a combination of numerical and symbolic
form, where 'de' (drop-eligible) means DEI=1 and 'nd' (not-drop-eligible)
means DEI=0.

Map PCP 1 and DEI 0 to priority 1
$ dcb app add dev eth0 pcp-prio 1nd:1

Map PCP 1 and DEI 1 to priority 1
$ dcb app add dev eth0 pcp-prio 1de:1

========================================================================
New apptrust subcommand for configuring per-selector trust and trust
order:
========================================================================

This new command currently has a single parameter, which lets you
specify an ordered list of trusted selectors. The microchip sparx5
driver is already enabled to offload said list of trusted selectors. The
new command has been given the name apptrust, to indicate that the trust
covers APP table selectors only. I found that 'apptrust' was better than
plain 'trust' as the latter does not indicate the scope of what is to be
trusted.

Example:

Trust selectors dscp and pcp, in that order:
$ dcb apptrust set dev eth0 order dscp pcp

Trust selectors ethtype, stream-port and pcp, in that order
$ dcb apptrust set dev eth0 order ethtype stream-port pcp

Show the trust order
$ dcb apptrust show dev eth0 order order: ethtype stream-port pcp

A concern was raised here [2], that 'apptrust' would not work well with
matches(), so instead strcmp() has been used to match for the new
subcommand, as suggested here [3]. Same goes with pcp-prio parameter for
dcb app.

The man page for dcb_app has been extended to cover the new pcp-prio
parameter, and a new man page for dcb_apptrust has been created.

[1] https://lore.kernel.org/netdev/20221101094834.2726202-1-daniel.machon@microchip.com/
[2] https://lore.kernel.org/netdev/20220909080631.6941a770@hermes.local/
[3] https://lore.kernel.org/netdev/Y0fP+9C0tE7P2xyK@shredder/

====================

Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 09:23:56 -07:00
Daniel Machon
21f88a0759 dcb: add new subcommand for apptrust
Add new apptrust subcommand for the dcbnl apptrust extension object.

The apptrust command lets you specify a consecutive ordered list of
trusted selectors, which can be used by drivers to determine which
selectors are eligible (trusted) for packet prioritization, and in which
order.

Selectors are sent in a new nested attribute:
DCB_ATTR_IEEE_APP_TRUST_TABLE.  The nest contains trusted selectors
encapsulated in either DCB_ATTR_IEEE_APP or DCB_ATTR_DCB_APP attributes,
for standard and non-standard selectors, respectively.

Example:

Trust selectors dscp and pcp, in that order
$ dcb apptrust set dev eth0 order dscp pcp

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 09:22:47 -07:00
Daniel Machon
3e2a96c9f4 dcb: add new pcp-prio parameter to dcb app
Add new pcp-prio parameter to the app subcommand, which can be used to
classify traffic based on PCP and DEI from the VLAN header. PCP and DEI
is specified in a combination of numerical and symbolic form, where 'de'
(drop-eligible) means DEI=1 and 'nd' (not-drop-eligible) means DEI=0.

Map PCP 1 and DEI 0 to priority 1
$ dcb app add dev eth0 pcp-prio 1nd:1

Map PCP 1 and DEI 1 to priority 1
$ dcb app add dev eth0 pcp-prio 1de:1

Internally, PCP and DEI is encoded in the protocol field of the dcb_app
struct. Each combination of PCP and DEI maps to a priority, thus needing
a range of  0-15. A well formed dcb_app entry for PCP/DEI
prioritization, could look like:

    struct dcb_app pcp = {
        .selector = DCB_APP_SEL_PCP,
	.priority = 7,
        .protocol = 15
    }

For mapping PCP=7 and DEI=1 to Prio=7.

Also, three helper functions for translating between std and non-std APP
selectors, have been added to dcb_app.c and exposed through dcb.h.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 09:22:26 -07:00
Jacob Keller
a74e7181c6 devlink: support direct region read requests
The kernel has gained support for reading from regions without needing to
create a snapshot. To use this support, the DEVLINK_ATTR_REGION_DIRECT
attribute must be added to the command.

For the "read" command, if the user did not specify a snapshot, add the new
attribute to request a direct read. The "dump" command will still require a
snapshot. While technically a dump could be performed without a snapshot it
is not guaranteed to be atomic unless the region size is no larger than
256 bytes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 09:21:10 -07:00
Ido Schimmel
fa94a97921 libnetlink: Fix wrong netlink header placement
The netlink header must be first in the netlink message, so move it
there.

Fixes: fee4a56f01 ("Update kernel headers")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
2022-12-08 09:20:27 -07:00