Added support for filtering based on port ranges.
UAPI changes have been accepted into net-next.
Example:
1. Match on a port range:
-------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower ip_proto tcp dst_port 20-30 skip_hw\
action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_port 20-30
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
2. Match on IP address and port range:
--------------------------------------
$ tc filter add dev enp4s0 protocol ip parent ffff:\
prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port 100-200\
skip_hw action drop
$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto tcp
dst_ip 192.168.1.1
dst_port 100-200
skip_hw
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
v6:
Modified to change json output format as object for sport/dport.
"dst_port":{
"start":2000,
"end":6000
},
"src_port":{
"start":50,
"end":60
}
v5:
Simplified some code and used 'sscanf' for parsing. Removed
space in output format.
v4:
Added man updates explaining filtering based on port ranges.
Removed 'range' keyword.
v3:
Modified flower_port_range_attr_type calls.
v2:
Addressed Jiri's comment to sync output format with input
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The different encapsulation types are described in ENCAP_*
non-terminals, but ENCAP definition lists them without the ENCAP_
prefix. Fix this for consistency.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
All rdma-related man pages list each other in SEE ALSO section, only
rdma-resource.8 is missing. Add it for the sake of consistency.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
IPX has been depracted then removed from upstream kernels.
Drop support from ip route as well.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
When disabling a flag, one needs to AND with the inverse not the flag
itself. Otherwise specifying for instance 'home -nodad' will effectively
clear the flags variable.
While being at it, simplify the code a bit by merging common parts of
negated and non-negated case branches. Also allow for the "special
cases" to be inverted, too.
Fixes: f73ac674d0 ("ip: change flag names to an array")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.
v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
(David Ahern)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Allow to set the DF bit behaviour for outgoing IPv4 packets: it can be
always on, inherited from the inner header, or, by default, always off,
which is the current behaviour.
v2:
- Indicate in the man page what DF refers to, using RFC 791 wording
(David Ahern)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
The region field was not added to the devlink man page.
Fixes: 8b4fbf0bed ("devlink: Add support for devlink-region access")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
". If" gets interpreted as a macro, so move the period to the previous
line:
33: warning: macro `If' not defined
Fixes: 141b55f854 ("Add SKB Priority qdisc support in tc(8)")
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes groff warning:
ss.8 92: warning [p 2, 2.8i]: can't break line
And makes the line also more readable.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently when we add geneve with "ttl inherit", we only set ttl to 0, which
is actually use whatever default value instead of inherit the inner protocol's
ttl value.
To make a difference with ttl inherit and ttl == 0, we add an attribute
IFLA_GENEVE_TTL_INHERIT in kernel commit 52d0d404d39dd ("geneve: add ttl
inherit support"). Now let's use "ttl inherit" to inherit the inner
protocol's ttl, and use "ttl auto" to means "use whatever default value",
the same behavior with ttl == 0.
v2:
1) remove IFLA_GENEVE_TTL_INHERIT defination in if_link.h as it's already
updated.
2) Still use addattr8() so we can enable/disable ttl inherit, as Michal
suggested.
v3: Update man page
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This patch adds support for the new backup port option that can be set
on a bridge port. If the port's carrier goes down all of the traffic
gets redirected to the configured backup port. We add the following new
arguments:
$ ip link set dev brport type bridge_slave backup_port brport2
$ ip link set dev brport type bridge_slave nobackup_port
$ bridge link set dev brport backup_port brport2
$ bridge link set dev brport nobackup_port
The man pages are updated respectively.
Also 2 minor style adjustments:
- add missing space to bridge man page's state argument
- use lower starting case for vlan_tunnel in ip-link man page (to be
consistent with the rest)
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Adds new option extern_learn to set NTF_EXT_LEARNED flag
on neigh entries.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This documents the parameters and provides an example of usage.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Allow matching on options in Geneve tunnel headers.
The options can be described in the form
CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
e.g.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev geneve0 ingress
# tc filter add dev geneve0 protocol ip parent ffff: \
flower \
enc_src_ip 10.0.99.192 \
enc_dst_ip 10.0.99.193 \
enc_key_id 11 \
geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
ip_proto udp \
action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
While at it also add missing text for proxy in the man page.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add support for the new sticky flag that can be set on fdbs and update the
man page.
CC: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
IPVLAN and IPVTAP are using the same functions and parameters. So we can
just add a new link_util with id ipvtap. Others are the same.
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Conflicts:
ip/iproute_lwtunnel.c
In addition to merge conflict between bd59e5b151 and 94a8722f2f,
updated the code added by the latter commit based on the change of the
former (ie., added ret = to the new rta_addattr_l).
Signed-off-by: David Ahern <dsahern@gmail.com>
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.
Syntax:
slot distribution DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
[bytes MAX_BYTES]
The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.
Examples:
Normal distribution delay with mean = 800us and stdev = 100us.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us
Optionally set the max slot size in bytes and/or packets.
> tc qdisc add dev eth0 root netem slot distribution normal \
800us 100us bytes 64k packets 42
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.
It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.
The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.
The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.
Examples of use:
tc qdisc add dev eth0 root netem delay 200us \
slot 800us 10ms bytes 64k packets 42
A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:
tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
slot 800us 10ms bytes 64k packets 42 limit 512
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Since CAKE now has three different settings that can be overridden by tc
filters (priority and host and flow hashes), documenting how they work is
probably a good idea.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add missing --pretty and --json options, correct --zero to --zeros and
correct the mess around --scan/--interval including broken man page
formatting.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This was the only bit missing in comparison to devlink help text.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Versioning scheme of Linux and iproute2 is similar, therefore the
referenced kernel versions are likely to confuse readers. Clarify this
by prefixing each kernel version by 'Linux' prefix.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.
Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David Ahern <dsahern@gmail.com>
Update the list of architectures supporting eBPF JIT as of Linux 4.18.
Also mention the Linux version where support for a particular
architecture was introduced. Finally, reformat the list of architectures
as a bullet list in order to make it more readable.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This patch makes sch_cake's gso/gro splitting configurable
from userspace.
To disable breaking apart superpackets in sch_cake:
tc qdisc replace dev whatever root cake no-split-gso
to enable:
tc qdisc replace dev whatever root cake split-gso
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add matching on tos/ttl of the IP tunnel headers.
For example, here's decap rule that matches on the tunnel tos:
tc filter add dev vxlan_sys_4789 protocol ip parent ffff: prio 10 flower \
enc_src_ip 192.168.10.2 enc_dst_ip 192.168.10.1 enc_key_id 100 enc_dst_port 4789 enc_tos 0x30 \
src_mac e4:11:22:33:44:70 dst_mac e4:11:22:33:44:50 \
action tunnel_key unset \
action mirred egress redirect dev eth0_0
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Allow to set tos and ttl for the tunnel.
For example, here's encap rule that sets tos to the tunnel:
tc filter add dev eth0_0 protocol ip parent ffff: prio 10 flower \
src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:70 \
action tunnel_key set src_ip 192.168.10.1 dst_ip 192.168.10.2 id 100 dst_port 4789 tos 0x30 \
action mirred egress redirect dev vxlan_sys_4789
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
This is consistent with the other multi-word parameters. Also change the
JSON output to be consistent with way it is formatted for the other
options.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.
Example of use on a cable ISP uplink:
tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter
To shape a cable download link (ifb and tc-mirred setup elided)
tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort
Cake is filled with:
* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.
Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.
sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.
Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.
Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David Ahern <dsahern@gmail.com>
Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.
This support allows reading and dumping regions snapshots as well
as presenting information such as region size and current available
snapshots.
A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots.
The dump command is designed to read the full address space of a
region or of a snapshot unlike the read command which allows
reading only a specific section in a region/snapshot indicated by
an address and a length, current support is for reading and dumping
for a previously taken snapshot ID.
New commands added:
devlink region show [ DEV/REGION ]
devlink region delete DEV/REGION snapshot SNAPSHOT_ID
devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
address ADDRESS length length
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>