Introduce PPPoE specific fields in tc-flower:
- session id (16 bits)
- ppp protocol (16 bits)
Those fields can be provided only when protocol was set to
ETH_P_PPP_SES. ppp_proto works similar to vlan_ethtype, i.e.
ppp_proto overwrites eth_type. Thanks to that, fields from
encapsulated protocols (such as src_ip) can be specified.
e.g.
# tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
flower \
pppoe_sid 1234 \
ppp_proto ip \
dst_ip 127.0.0.1 \
src_ip 127.0.0.2 \
action drop
Vlan and cvlan is also supported, in this case cvlan_ethtype
or vlan_ethtype has to be set to ETH_P_PPP_SES.
e.g.
# tc filter add dev ens6f0 ingress prio 1 protocol 802.1Q \
flower \
vlan_id 2 \
vlan_ethtype ppp_ses \
pppoe_sid 1234 \
ppp_proto ip \
dst_ip 127.0.0.1 \
src_ip 127.0.0.2 \
action drop
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In several places (code reuse?), the variable handle
is a parameter to the function, but then
is defined inside basic block for classid.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc u32 ematch offset parsing might fail even if nexthdr offset is
aligned to 4. The issue can be reproduced with the following script:
tc qdisc del dev dummy0 root
tc qdisc add dev dummy0 root handle 1: htb r2q 1 default 1
tc class add dev dummy0 parent 1:1 classid 1:108 htb quantum 1000000 \
rate 1.00mbit ceil 10.00mbit burst 6k
while true; do
if ! tc filter add dev dummy0 protocol all parent 1: prio 1 basic match \
"meta(vlan mask 0xfff eq 1)" and "u32(u32 0x20011002 0xffffffff \
at nexthdr+8)" flowid 1:108; then
exit 0
fi
done
which we expect to produce an endless loop.
With the current code, instead, this ends with:
u32: invalid offset alignment, must be aligned to 4.
... meta(vlan mask 0xfff eq 1) and >>u32(u32 0x20011002 0xffffffff at nexthdr+8)<< ...
... u32(u32 0x20011002 0xffffffff at >>nexthdr+8<<)...
Usage: u32(ALIGN VALUE MASK at [ nexthdr+ ] OFFSET)
where: ALIGN := { u8 | u16 | u32 }
Example: u32(u16 0x1122 0xffff at nexthdr+4)
Illegal "ematch"
This is caused by memcpy copying into buf an unterminated string.
Fix it using strncpy instead of memcpy.
Fixes: commit 311b41454d ("Add new extended match files.")
Reported-by: Alfred Yang <alf.redyoung@gmail.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Having more than one vlan allows matching on the vlan tag parameters.
This patch changes vlan key validation to take number of vlan tags into
account.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Our customers in the fiber telecom world have network configurations
where they would like to control their traffic according to the number
of tags appearing in the packet.
For example, TR247 GPON conformance test suite specification mostly
talks about untagged, single, double tagged packets and gives lax
guidelines on the vlan protocol vs. number of vlan tags.
This is different from the common IT networks where 802.1Q and 802.1ad
protocols are usually describe single and double tagged packet. GPON
configurations that we work with have arbitrary mix the above protocols
and number of vlan tags in the packet.
This patch adds num_of_vlans flower key and associated print and parse
routines. The following command becomes possible:
tc filter add dev eth1 ingress flower num_of_vlans 1 action drop
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for parsing TCA_FLOWER_KEY_ENC_OPTS_GTP.
Options are as follows: PDU_TYPE:QFI where each
option is represented as 8-bit hexadecimal value.
e.g.
# ip link add gtp_dev type gtp role sgsn
# tc qdisc add dev gtp_dev ingress
# tc filter add dev gtp_dev protocol ip parent ffff: \
flower \
enc_key_id 11 \
gtp_opts 1:8/ff:ff \
action mirred egress redirect dev eth0
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
We need to separate action print for filter and action dump since
in action dump, we need to print hardware status and flags. But in
filter dump, we do not need to print action hardware status and
hardware related flags.
In filter dump, actions hardware status should be same with filter.
so we will not print action hardware status in this case.
Action print for action dump:
action order 0: police 0xff000100 rate 0bit burst 0b mtu 64Kb pkts_rate 50000 pkts_burst 10000 action drop/pipe overhead 0b linklayer unspec
ref 4 bind 3 installed 666 sec used 0 sec firstused 106 sec
Action statistics:
Sent 7634140154 bytes 5109889 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 84 bytes 3 pkt
Sent hardware 7634140070 bytes 5109886 pkt
backlog 0b 0p requeues 0
in_hw in_hw_count 1
used_hw_stats delayed
Action print for filter dump:
action order 1: police 0xff000100 rate 0bit burst 0b mtu 64Kb pkts_rate 50000 pkts_burst 10000 action drop/pipe overhead 0b linklayer unspec
ref 4 bind 3 installed 680 sec used 0 sec firstused 119 sec
Action statistics:
Sent 8627975846 bytes 5775107 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 84 bytes 3 pkt
Sent hardware 8627975762 bytes 5775104 pkt
backlog 0b 0p requeues 0
used_hw_stats delayed
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Commit b2450e46b7 ("flower: fix clang warnings") caused enc_key_id
and u32 to be printed without indentation. Fix this by printing two
spaces before calling print_uint_name_value.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
For action police there is an conform-exceed action control
which can be for example "jump 2 / pipe".
The current parsing loop is doing one more iteration than necessary
and results in ok var being 3.
Example filter:
tc filter add dev enp8s0f0_0 ingress protocol ip prio 2 flower \
verbose action police rate 100mbit burst 12m \
conform-exceed jump 1 / pipe mirred egress redirect dev enp8s0f0_1 action drop
Before this change the command will fail.
Trying to add another "pipe" before mirred as a workaround for the stopping the loop
in ok var 3 resulting in result2 not being saved and wrong filter.
... conform-exceed jump 1 / pipe pipe mirred ...
Example dump of the action part:
... action order 1: police 0x1 rate 100Mbit burst 12Mb mtu 2Kb action jump 1 overhead 0b ...
Fix the behavior by removing redundant case 2 handling, either argc is over or breaking.
Example dump of the action part with the fix:
... action order 1: police 0x1 rate 100Mbit burst 12Mb mtu 2Kb action jump 1/pipe overhead 0b ...
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently the key struct of u32 filter does not support json. This
commit adds json support for showing key.
Signed-off-by: Wen Liang <liangwen12year@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Currently u32 filter output does not support json. This commit uses
proper json functions to add support for it.
`sprint_u32_handle` adds an extra space after the raw check, remove the
extra space.
Signed-off-by: Wen Liang <liangwen12year@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Introduce print_indent_name_value to do the indented style
used in flower.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
This fixes the indentation of types with newline flag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add skip_hw and skip_sw flags for user to control whether
offload action to hardware.
Also we add hw_count to show how many hardwares accept to offload
the action.
Change man page to describe the usage of skip_sw and skip_hw flag.
An example to add and query action as below.
$ tc actions add action police rate 1mbit burst 100k index 100 skip_sw
$ tc -s -d actions list action police
total acts 1
action order 0: police 0x64 rate 1Mbit burst 100Kb mtu 2Kb action reclassify overhead 0b linklayer ethernet
ref 1 bind 0 installed 2 sec used 2 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
skip_sw in_hw in_hw_count 1
used_hw_stats delayed
Signed-off-by: baowen zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Clang complains about passing a non-format string but the code here.
The old code was doing extra work the JSON case. JSON ignores one line mode.
This also fixes output format in oneline mode.
Fixes: 04b215015b ("tc_util: introduce a function to print JSON/non-JSON masked numbers")
Cc: elibr@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Clang complains about passing non-format string to print_string.
Handle this by splitting json and non-json parts.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Netem is using empty format string to not print values.
Clang complains about this hack so don't do it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
This catches future errors and silences warning from Clang.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Error messages should go to stderr even if using JSON.
Fixes: 2704bd6255 ("tc: jsonify actions core")
Cc: jiri@mellanox.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
A diffserv3 option (enum value 0) was never sent to the kernel, so it
was not possible to use "tc qdisc change" to select it.
This also meant that were also relying on the kernel's default being
diffserv3 when adding. If the default were to change, we wouldn't have
been able to request diffserv3 explicitly.
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Commit dfcb63ce1de6 ("fq_codel: generalise ce_threshold marking for subset
of traffic") added support in fq_codel for setting a value and mask that
will be applied to the diffserv/ECN byte to turn on the ce_threshold
feature for a subset of traffic.
This adds support to iproute for setting these values. The parameter is
called ce_threshold_selector and takes a value followed by a
slash-separated mask. Some examples:
# apply ce_threshold to ECT(1) traffic
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3
# apply ce_threshold to ECN-capable traffic marked as diffserv AF22
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Buffer is 64bytes, but label printing can take 66bytes printing
in hex, and will overflow when setting the string delimiter ('\0').
Fix that by increasing the print buffer size.
Example of overflowing ct_label:
ct_label 11111111111111111111111111111111/11111111111111111111111111111111
Fixes: 2fffb1c030 ("tc: flower: Add matching on conntrack info")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix the wild bracket in the if clause leading to the error in the condition.
Fixes: d61167dd88 ("m_vlan: add pop_eth and push_eth actions")
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Provided port range in tc rule are parsed incorrectly.
Even though range is passed as min-max. It throws an error.
$ tc filter add dev eth0 ingress handle 100 priority 10000 protocol ipv4 flower ip_proto tcp dst_port 10368-61000 action pass
max value should be greater than min value
Illegal "dst_port"
Fixes: 8930840e67 ("tc: flower: Classify packets based port ranges")
Signed-off-by: Puneet Sharma <pusharma@akamai.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Recently we added SKBMOD_F_ECN option support to the kernel; support it in
the tc-skbmod(8) front end, and update its man page accordingly.
The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [1]:
0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Encountered", CE
This new option, "ecn", marks ECT(0) and ECT(1) IPv{4,6} packets as CE,
which is useful for ECN-based rate limiting. For example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod \
ecn
The updated tc-skbmod SYNOPSIS looks like the following:
tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...
Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command. Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IP packets.
Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option
support", as well as iproute2 patch "tc/skbmod: Remove misinformation
about the swap action".
[1] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.
While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.
Fixes: 267480f553 ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently man 8 tc-skbmod says that "...the swap action will occur after
any smac/dmac substitutions are executed, if they are present."
This is false. In fact, trying to "set" and "swap" in a single skbmod
command causes the "set" part to be completely ignored. As an example:
$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod \
set dmac AA:AA:AA:AA:AA:AA smac BB:BB:BB:BB:BB:BB \
swap mac
The above command simply does a "swap", without setting DMAC or SMAC to
AA's or BB's. The root cause of this is in the kernel, see
net/sched/act_skbmod.c:tcf_skbmod_init():
parm = nla_data(tb[TCA_SKBMOD_PARMS]);
index = parm->index;
if (parm->flags & SKBMOD_F_SWAPMAC)
lflags = SKBMOD_F_SWAPMAC;
^^^^^^^^^^^^^^^^^^^^^^^^^^
Doing a "=" instead of "|=" clears all other "set" flags when doing a
"swap". Discourage using "set" and "swap" in the same command by
documenting it as undefined behavior, and update the "SYNOPSIS" section
as well as tc -help text accordingly.
If one really needs to e.g. "set" DMAC to all AA's then "swap" DMAC and
SMAC, one should do two separate commands and "pipe" them together.
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
With the json support fix the normal output was
changed. set it back to what it was.
Print overhead with print_size().
Print newline before ref.
Fixes: 0d5cf51e0d ("police: Add support for json output")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Implement a decrement operation for ttl and hoplimit.
Since this is just syntactic sugar, it goes that:
tc filter add ... action pedit ex munge ip ttl dec ...
tc filter add ... action pedit ex munge ip6 hoplimit dec ...
is just a more readable version of this:
tc filter add ... action pedit ex munge ip ttl add 0xff ...
tc filter add ... action pedit ex munge ip6 hoplimit add 0xff ...
This feature was suggested by some pseudo tc examples in Mellanox's
documentation[1], but wasn't present in neither their mlnx-iproute2
nor iproute2.
Tested with skip_sw on Mellanox ConnectX-6 Dx.
[1] https://docs.mellanox.com/pages/viewpage.action?pageId=47033989
v3:
- Use dedicated flags argument in parse_cmd() (David Ahern)
- Minor rewording of the man page
v2:
- Fix whitespace issue (Stephen Hemminger)
- Add to usage info in explain()
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch just prepares the flags argument, so it's
available to the next patch.
Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add support for matching on ct_state flag related.
The related state indicates a packet is associated with an existing
connection.
Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state -est-rel+trk \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state +rel+trk \
action mirred egress redirect dev ens1f0_1
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"
The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.
This commit tries to improve this simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
specified and nbands is 0, terminating execution.
Thus nbands cannot be < 1 at this point and this code cannot be executed.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
envp_run is dinamically allocated with a malloc, and not freed in the
out: return path. This commit fix it.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In cake_parse_opt(), *argv is checked not to be null when parsing for
overhead and mpu parameters. However this is useless, since *argv
matches right before for "overhead" or "mpu".
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.
e.g.
# $TC actions add action police pkts_rate 1000 pkts_burst 200 index 1
# $TC actions ls action police
total acts 1
action order 0: police 0x1 rate 0bit burst 0b mtu 4096Mb pkts_rate 1000 pkts_burst 200
ref 1 bind 0
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, leading to negative values to be printed as
high unsigned values instead. In addition, we passed a negative value to
sprint_time() even though that expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.
Fixes: 714444c0cb ("Add support for CAKE qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer lenght in its
implementation, however m_gate uses shorter buffers when calling it.
Fix this using SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.
Fixes: 07d5ee70b5 ("iproute2-next:tc:action: add a gate control action")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>