Commit Graph

767 Commits

Author SHA1 Message Date
Phil Sutter
d892aaf740 tc: m_action: Improve conversion to C99 style initializers
This improves my initial change in the following points:

- Flatten embedded struct's initializers.
- No need to initialize variables to zero as the key feature of C99
  initializers is to do this implicitly.
- By relocating the declaration of struct rtattr *tail, it can be
  initialized at the same time.

Fixes: a0a73b298a ("tc: m_action: Use C99 style initializers for struct req")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
2016-07-20 12:05:24 -07:00
Daniel Borkmann
e77fa41d4c bpf: also check elf for official e_machine value
Use the official BPF ELF e_machine value that was assigned recently [1]
and will be propagated to glibc, libelf et al. LLVM will switch to it
in 3.9 release, therefore we need to prepare tc to check for EM_ELF as
well, older version still have the EM_NONE.

  [1] 36b9c09330

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-07-20 11:54:53 -07:00
Stephen Hemminger
d5b62e6439 Merge branch 'master' into net-next 2016-07-06 21:29:32 -07:00
Amir Vadai
cfcabf18d8 tc: flower: Add skip_{hw|sw} support
On devices that support TC flower offloads, these flags enable a filter to be
added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive
flags. By default without any flags, the filter is added to both HW and SW,
but no error checks are done in case of failure to add to HW.
With skip-sw, failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with skip_sw and the other
with skip_hw flag.

   # add ingress qdisc
   tc qdisc add dev enp0s9 ingress

   # enable hw tc offload.
   ethtool -K enp0s9 hw-tc-offload on

   # add a flower filter with skip-sw flag.
   tc filter add dev enp0s9 protocol ip parent ffff: flower \
	   ip_proto 1 indev enp0s9 skip_sw \
	   action drop

   # add a flower filter with skip-hw flag.
   tc filter add dev enp0s9 protocol ip parent ffff: flower \
	   ip_proto 3 indev enp0s9 skip_hw \
	   action drop

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
2016-07-06 21:24:48 -07:00
Jamal Hadi Salim
1d1e0fd29b actions: skbedit add support for mod-ing skb pkt_type
I'll make a formal submission sans the header when the kernel patches
makes it in. This version is for someone who wants to play around with
the net-next kernel patches i sent

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-07-06 21:15:44 -07:00
Phil Sutter
5f6a467f59 tc: m_action: Drop unused variable nladdr in tc_action_gd()
This has been there since the introduction of tc/m_action.c back in 2004
and was apparently never in use.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-16 09:41:55 -07:00
Phil Sutter
a0a73b298a tc: m_action: Use C99 style initializers for struct req
Instead of initializing fields after (or sometimes even before) zeroing
the whole struct via memset(), initialize the whole thing at declaration
time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-16 09:41:55 -07:00
Alexander Aring
9b32f89693 tc: let m_ipt work with new iptables API headers
Since commit 5cd1adb ("Update to current iptables headers") the build
with m_ipt.o and the following config will fail:

TC_CONFIG_XT:=n
TC_CONFIG_XT_OLD:=n
TC_CONFIG_XT_OLD_H:=n

This patch renames "iptables_target" to "xtables_target" and some other
things which gets renamed and I noticed while reading iptables git log.
Functions which are not used in m_ipt.c and not exported by the header
are removed, if they still used in m_ipt.c I added a static to the function.

Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
2016-06-14 18:03:30 -07:00
Stephen Hemminger
4b83a08c28 m_xt: whitespace cleanup
Make it 99% checkpatch clean.
2016-06-14 14:40:53 -07:00
Phil Sutter
2ef4008585 tc: m_xt: Introduce get_xtables_target_opts()
This pulls common code from parse_ipt() and print_ipt() functions
together.

While here, also fix for incorrect use of the global 'optarg' variable
in print_ipt().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
f6ddd9c5da tc: m_xt: Simplify argc adjusting in parse_ipt()
And while at it, also improve the error message in case too few
parameters have been given.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
28432f370e tc: m_xt: Get rid of iargc variable in parse_ipt()
After dropping the unused decrement of argc in the function's tail, it
can fully take over what iargc has been used for.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
ab8f52fc4a tc: m_xt: Get rid of rargc in parse_ipt()
No need to copy the passed parameter, it's changed only once right
before function return.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
b0ba018576 tc: m_xt: Drop unused variable fw in parse_ipt()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
b45f9141c2 tc: m_xt: Get rid of one indentation level in parse_ipt()
Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
f1a7c7d830 tc: m_xt: Fix indenting
By exiting early if xtables_find_target() fails, one indenting level can
be dropped. Some of the wrongly indented code then happens to sit at the
right spot by accident which is why this patch is smaller than expected.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
8eee75a835 tc: m_xt: Fix segfault when adding multiple actions at once
Without this, the following call to tc would segfault:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 \
| 	action xt -j MARK --set-mark 0x1 \
| 	action xt -j MARK --set-mark 0x1

The reason is basically the same as for 6e2e5ec28b ("fix print_ipt:
segfault if more then one filter with action -j MARK.") but in
parse_ipt() instead of print_ipt().

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Phil Sutter
445745221a tc: m_xt: Prevent segfault with standard targets
Iptables standard targets like DROP or REJECT don't implement the print
callback in libxtables. Hence the following command would segfault:

| tc filter add dev d0 parent ffff: u32 match u32 0 0 action xt -j DROP

With this patch standard targets still can't be used (and are not really
useful anyway), but at least it doesn't crash anymore.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-06-14 14:35:56 -07:00
Stephen Hemminger
8b625177ba pedit: fix whitespace etc
Minor changes from checkpatch
2016-06-14 14:32:27 -07:00
Jamal Hadi Salim
d8694a30a4 action pedit: stylistic changes
More modern layout.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-06-14 14:29:20 -07:00
Stephen Hemminger
622812052a tc: f_u32 cleanup indentation and long lines
Several long lines and too long messages here.
2016-06-08 16:45:26 -07:00
Samudrala, Sridhar
5e5b3008d1 tc: f_u32: Add support for skip_hw and skip_sw flags
On devices that support TC U32 offloads, these flags enable a filter to be
added only to HW or only to SW. skip_sw and skip_hw are mutually exclusive
flags. By default without any flags, the filter is added to both HW and SW,
but no error checks are done in case of failure to add to HW.
With skip-sw, failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with skip_sw and the other
with skip_hw flag.

   # add ingress qdisc
   tc qdisc add dev p4p1 ingress

   # enable hw tc offload.
   ethtool -K p4p1 hw-tc-offload on

   # add u32 filter with skip-sw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:1 u32 ht 800: flowid 800:1 \
      skip-sw \
      match ip src 192.168.1.0/24 \
      action drop

   # add u32 filter with skip-hw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:2 u32 ht 800: flowid 800:2 \
      skip-hw \
      match ip src 192.168.2.0/24 \
      action drop

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
2016-06-08 16:39:30 -07:00
Sabrina Dubroca
9f7401fa49 utils: add get_be{16, 32, 64}, use them where possible
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-06-08 09:30:37 -07:00
Eric Dumazet
4de4b5ca14 fq_codel: add per queue memory limit
This patch adds support for TCA_FQ_CODEL_MEMORY_LIMIT attribute.

..
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
requeues 21705223)
 rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
  maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
  ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
new_flows_len 1 old_flows_len 177

Signed-off-by: Eric Dumazet <edumazet@google.com>
2016-06-08 08:42:00 -07:00
Jamal Hadi Salim
ead954cbd4 tc action policer: enable timestamp display
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 13:03:13 -07:00
Jamal Hadi Salim
82e6efe2e3 tc filter u32: Coding style fixes
"handle" was being used several times for different things.
Fix the 80 character limit abuse and other little issues while at it.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:33:48 -07:00
Stephen Hemminger
e6263c8583 tc: action result is u32
In kernel action result is u32 not int in netlink messages.
2016-05-31 12:22:45 -07:00
Jamal Hadi Salim
45c6837911 tc action policer: Avoid nonsensical input
The user must at least specify a choice of the token bucket or
ewma policing or late binding index. TB policing requires at minimal
a rate and burst.

In addition fix formatting issues (80 chars etc).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:16:45 -07:00
David Ahern
57bdf8b764 Make builds default to quiet mode
Similar to the Linux kernel and perf add infrastructure to reduce the
amount of output tossed to a user during a build. Full build output
can be obtained with 'make V=1'

Builds go from:

make[1]: Leaving directory `/home/dsa/iproute2.git/lib'
make[1]: Entering directory `/home/dsa/iproute2.git/ip'
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ip.o ip.c
gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE    -c -o ipaddress.o ipaddress.c

to:

...
    AR       libutil.a

ip
    CC       ip.o
    CC       ipaddress.o
...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
2016-05-31 12:13:07 -07:00
Jamal Hadi Salim
e70b9f16ea tc simple action: bug fix
Failed compile
m_simple.c: In function ‘parse_simple’:
m_simple.c:154:6: warning: too many arguments for format [-Wformat-extra-args]
      *argv);
      ^
m_simple.c:103:14: warning: unused variable ‘maybe_bind’ [-Wunused-variable]

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-31 12:11:52 -07:00
Jamal Hadi Salim
a78a2dba27 tc fix ife late binding
following late binding didn't work

sudo tc actions add action ife encode \
type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-23 16:15:31 -07:00
Daniel Borkmann
1a0320727c f_bpf: fix filling of handle when no further arg is provided
We need to fill handle when provided by the user, even if no further
argument is provided. Thus, move the test for arg to the correct location,
so that it works correctly:

  # tc filter show dev foo egress
  filter protocol all pref 1 bpf
  filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action
  filter protocol all pref 1 bpf handle 0x2 bpf.o:[classifier] direct-action
  # tc filter del dev foo egress prio 1 handle 2 bpf
  # tc filter show dev foo egress
  filter protocol all pref 1 bpf
  filter protocol all pref 1 bpf handle 0x1 bpf.o:[classifier] direct-action

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-05-23 16:14:18 -07:00
Daniel Borkmann
a2de651e64 ingress, clsact: don't add TCA_OPTIONS to nl msg
In ingress and clsact qdisc TCA_OPTIONS are ignored, since it's
parameterless. In tc, we add an empty addattr_l(... TCA_OPTIONS,
NULL, 0) to the netlink message nevertheless. This has the
side effect that when someone tries a 'tc qdisc replace' and
already an existing such qdisc is present, tc fails with
EINVAL here.

Reason is that in the kernel, this invokes qdisc_change() when
such requested qdisc is already present. When TCA_OPTIONS are
passed to modify parameters, it looks whether qdisc implements
.change() callback, and if not present (like in both cases here)
it returns with error. Rather than adding an empty stub to the
kernel that ignores TCA_OPTIONS again, just don't add TCA_OPTIONS
to the netlink message in the first place.

Before:

  # tc qdisc replace dev foo clsact    # first try
  # tc qdisc replace dev foo clsact    # second one
  RTNETLINK answers: Invalid argument

After:

  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact
  # tc qdisc replace dev foo clsact

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-05-16 11:20:50 -07:00
Jamal Hadi Salim
fdf1bdd0f1 tc simple action update and breakage
Brings it closer to more serious actions (adding branching
and allowing for late binding)

Unfortunately this breaks old syntax of the simple action.
But because simple is a pedagogical example unlikely to be used
in production environments (i.e its role is to serve as an example
on how to write actions), then this is ok.

New syntax for simple has new keyword "sdata". Example usage is:

sudo tc actions add action simple sdata "foobar" index 1
or
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\
match ip dst 17.0.0.1/32 flowid 1:10 action simple sdata "foobar"

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:15:12 -07:00
Jamal Hadi Salim
43726b750a tc: don't ignore ok as an action branch
This is what used to happen before:

tc filter add dev tap1 parent ffff: protocol 0xfefe prio 10 \
     u32 match u32 0 0 flowid 1:16 \
     action ife decode allow mark ok

tc -s filter ls dev tap1 parent ffff:
filter protocol [65278] pref 10 u32
filter protocol [65278] pref 10 u32 fh 800: ht divisor 1
filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800
bkt 0 flowid 1:16
  match 00000000/00000000 at 0
        action order 1: ife decode action pipe
         index 2 ref 1 bind 1 installed 4 sec used 4 sec
         type: 0x0
         Metadata: allow mark
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 1 ref 1 bind 1 installed 4 sec used 4 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Note the extra action added at the end..

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:58 -07:00
Jamal Hadi Salim
d3e511223f tc: introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
    Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
	action order 1: ife decode action reclassify type 0x0
	 allow mark allow prio
	 index 11 ref 1 bind 1 installed 45 sec used 45 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:    ethertype 0xdead
xxx:    add skb->mark to whitelist of metadatum to send
xxx:    rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 118 success 0)
  match c0a87a00/ffffff00 at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )
	action order 1:  skbedit mark 17
	 index 11 ref 1 bind 1 installed 3 sec used 3 sec
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe type 0xDEAD
	 allow mark dst 02:15:15:15:15:15
	 index 12 ref 1 bind 1 installed 3 sec used 3 sec
	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-05-16 11:13:26 -07:00
Gustavo Zacarias
5c5a0f3df9 iproute2: tc_bpf.c: fix building with musl libc
We need limits.h for PATH_MAX, fixes:

tc_bpf.c: In function ‘bpf_map_selfcheck_pinned’:
tc_bpf.c:222:12: error: ‘PATH_MAX’ undeclared (first use in this
function)
  char file[PATH_MAX], buff[4096];

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 22:09:57 +00:00
Daniel Borkmann
4dd3f50af4 tc, bpf: add support for map pre/allocation
Follow-up to kernel commit 6c9059817432 ("bpf: pre-allocate hash map
elements"). Add flags support, so that we can pass in BPF_F_NO_PREALLOC
flag for disallowing preallocation. Update examples accordingly and also
remove the BPF_* map helper macros from them as they were not very useful.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:54:47 +00:00
Daniel Borkmann
afc1a2000b tc, bpf: further improve error reporting
Make it easier to spot issues when loading the object file fails. This
includes reporting in what pinned object specs differ, better indication
when we've reached instruction limits. Don't retry to load a non relo
program once we failed with bpf(2), and report out of bounds tail call key.

Also, add truncation of huge log outputs by default. Sometimes errors are
quite easy to spot by only looking at the tail of the verifier log, but
logs can get huge in size e.g. up to few MB (due to verifier checking all
possible program paths). Thus, by default limit output to the last 4096
bytes and indicate that it's truncated. For the full log, the verbose option
can be used.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-04-11 21:53:58 +00:00
Jiri Pirko
4952b45946 include: add linked list implementation from kernel
Rename hlist.h to list.h while adding it to be aligned with kernel

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
2016-03-27 10:56:11 -07:00
Stephen Hemminger
e9e9365b56 scrub out whitespace issues
Run script that removes trailing whitespace everywhere.
2016-03-27 10:50:14 -07:00
Phil Sutter
7faf1588a7 lib/utils: introduce rt_addr_n2a_rta()
This simple macro eases calling rt_addr_n2a() with data from an rt_attr
pointer.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:35 -07:00
Phil Sutter
2e96d2ccd0 utils: make rt_addr_n2a() non-reentrant by default
There is only a single user who needs it to be reentrant (not really,
but it's safer like this), add rt_addr_n2a_r() for it to use.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
Phil Sutter
a418e45164 make format_host non-reentrant by default
There are only three users which require it to be reentrant, the rest is
fine without. Instead, provide a reentrant format_host_r() for users
which need it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-27 10:37:34 -07:00
Phil Sutter
51011dac36 tc/m_vlan.c: mention CONTROL option in help text
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:48 -07:00
Phil Sutter
1672f42195 tc: connmark, pedit: Rename BRANCH to CONTROL
As Jamal suggested, BRANCH is the wrong name, as these keywords go
beyond simple branch control - e.g. loops are possible, too. Therefore
rename the non-terminal to CONTROL instead which should be more
appropriate.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:42 -07:00
Phil Sutter
a33786b582 tc: pedit: Fix raw op
The retain value was wrong for u16 and u8 types.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:36 -07:00
Phil Sutter
77bed404d0 tc: pedit: Fix for big-endian systems
This was tricky to get right:
- The 'stride' value used for 8 and 16 bit values must behave inverse to
  the value's intra word offset to work correctly with big-endian data
  act_pedit is editing.
- The 'm' array's values are in host byte order, so they have to be
  converted as well (and the ordering was just inverse, for some
  reason).
- The only sane way of getting this right is to manipulate value/mask in
  host byte order and convert the output.
- TIPV4 (i.e. 'munge ip src/dst') had it's own pitfall: the address
  parser converts to network byte order automatically. This patch fixes
  this by converting it back before calling pack_key32, which is a hack
  but at least does not require to implement a completely separate code
  flow.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:33 -07:00
Phil Sutter
952f89deba tc/p_ip.c: Minor coding style cleanup
Break overlong function definitions and remove one extraneous
whitespace.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-27 10:34:22 -07:00
Stephen Hemminger
32a121cba2 tc: code cleanup
Use checkpatch to fix whitespace and other style issues.
2016-03-21 11:48:36 -07:00
Luca Lemmo
4733b18a5e tc: q_{codel,fq_codel}: add missing space in help text
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:13 -07:00
Luca Lemmo
725f2a872d tc: f_u32: trivial coding style cleanups
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:12 -07:00
Luca Lemmo
dd0c8d193f tc: f_u32: add missing spaces around operators
Signed-off-by: Luca Lemmo <luca@linux.com>
2016-03-21 11:42:12 -07:00
Phil Sutter
338b003bcc tc: pedit: Fix retain value for ihl adjustments
Since the IP Header Length field is just half a byte, adjust retain to
only match these bits so the Version field is not overwritten by
accident.

The whole concept is actually broken due to dependency on endianness
which pedit ignores.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
Phil Sutter
f440e9d8c2 tc: pedit: Fix parse_cmd()
This was horribly broken:
* pack_key8() and pack_key16() ...
  * missed to invert retain value when applying it to the mask,
  * did not sanitize val by ANDing it with retain,
  * and ignored the mask which is necessary for 'invert' command.
* pack_key16() did not convert mask to network byte order.
* Changing the retain value for 'invert' or 'retain' operation seems
  just plain wrong.
* While here, also got rid of unnecessary offset sanitization in
  pack_key32().
* Simplify code a bit by always assigning the local mask variable to
  tkey->mask before calling any of the pack_key*() variants.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
Phil Sutter
ec0ceeec49 tc: pedit: Fix layered op parsing
After lookup of the layered op submodule, pedit would pass argv and argc
including the layered op identifier at first position which confused the
submodule parser. Fix this by calling NEXT_ARG() before calling the
parse_peopt() callback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-06 12:53:11 -08:00
Phil Sutter
c024acc641 tc: pedit: document branch control in help output
This seems to have been a hidden feature, though it's very useful and
necessary at least when combining multiple pedit actions.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-03-04 15:27:52 -08:00
Dmitrii Shcherbakov
467f9fce60 htb: rename b4 buffer to b3 to make its name more consistent
b3 buffer has been deleted previously so b2 is followed by b4
which is not consistent.

Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-02-17 17:50:14 -08:00
Dmitrii Shcherbakov
1aea7fea26 htb: remove printing of a deprecated overhead value
Remove printing according to the previously used encoding of mpu and
overhead values within the tc_ratespec's mpu field. This encoding is
no longer being used as a separate 'overhead' field in the ratespec
structure has been introduced.

Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
2016-02-17 17:49:47 -08:00
Daniel Borkmann
5230a2ede0 tc, bpf: use bind/type macros from gelf
Don't reimplement them and rather use the macros from the gelf header,
that is, GELF_ST_BIND()/GELF_ST_TYPE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-07 11:27:38 -08:00
Daniel Borkmann
a576c6b977 tc, bpf: give some more hints wrt false relos
Provide some more hints to the user/developer when relos have been found
that don't point to ld64 imm instruction. Ran couple of times into relos
generated by clang [1], where the compiler tried to uninline inlined
functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems
the case, give a hint that the user should do a work-around to use
always_inline annotation.

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-07 11:27:38 -08:00
Daniel Borkmann
f31645d138 tc, bpf: improve verifier logging
With a bit larger, branchy eBPF programs f.e. already ~BPF_MAXINSNS/7 in
size, it happens rather quickly that bpf(2) rejects also valid programs
when only the verifier log buffer size we have in tc is too small.

Change that, so by default we don't do any logging, and only in error
case we retry with logging enabled. If we should fail providing a
reasonable dump of the verifier analysis, retry few times with a larger
log buffer so that we can at least give the user a chance to debug the
program.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
2016-02-07 11:27:38 -08:00
Nicolas Dichtel
67584e3ab2 tc: fix compilation with old gcc (< 4.6) (bis)
Commit 8f80d450c3 ("tc: fix compilation with old gcc (< 4.6)") was reverted
to ease the merge of the net-next branch.

Here is the new version.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-02-05 11:46:18 +11:00
Daniel Borkmann
2486337aac tc, bpf: make sure relo is in relation with map section
Add a test that symbol from relocation entry is actually related
to map section and bail out with an error message if it's not the
case; in relation to [1].

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-02-02 16:04:11 +11:00
Stephen Hemminger
62392ecbbb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 2016-02-02 15:57:23 +11:00
Daniel Borkmann
8187b01273 tc, bpf: more header checks on loading elf
eBPF llvm backend can support different BPF formats, make sure the object
we're trying to load matches with regards to endiannes and while at it, also
check for other attributes related to BPF ELFs.

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 3.8.0svn
    Optimized build.
    Built Jan  9 2016 (02:08:10).
    Default target: x86_64-unknown-linux-gnu
    Host CPU: ivybridge

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
      [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-01-18 11:41:27 -08:00
Daniel Borkmann
cce3d4664c tc, bpf: check section names and type everywhere
When extracting sections, we better check for name and type. Noticed
that some llvm versions emit .strtab and .shstrtab (e.g. saw it on pre
3.7), while more recent ones only seem to emit .strtab. Thus, make sure
we get the right sections.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-01-18 11:41:27 -08:00
Daniel Borkmann
8f9afdd531 tc, clsact: add clsact frontend
Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add
clsact qdisc"). Quoting example usage from that commit description:

  Example, adding qdisc:

  # tc qdisc add dev foo clsact
  # tc qdisc show dev foo
  qdisc mq 0: root
  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc clsact ffff: parent ffff:fff1

  Adding filters (deleting, etc works analogous by specifying ingress/egress):

  # tc filter add dev foo ingress bpf da obj bar.o sec ingress
  # tc filter add dev foo egress  bpf da obj bar.o sec egress
  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
  # tc filter show dev foo egress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

The ingress parent alias can also be used with ingress qdisc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-18 11:41:27 -08:00
Daniel Borkmann
0d45c4b420 tc, ingress: clean up ingress handling a bit
Clean it up a bit, we can also get rid of some ugly ifdefs as in our case
TC_H_INGRESS is always defined.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-18 11:41:27 -08:00
Stephen Hemminger
2505780c20 Merge branch 'net-next' 2016-01-18 09:37:45 -08:00
Stephen Hemminger
bc223ab861 Revert "tc: fix compilation with old gcc (< 4.6)"
This reverts commit 8f80d450c3.
2016-01-18 09:37:38 -08:00
Jamal Hadi Salim
488b41d020 tc: flower no need to specify the ethertype
since all tc classifiers are required to specify ethertype as part of grammar
By not allowing eth_type to be specified we remove contradiction for
example when a user specifies:
tc filter add ... priority xxx protocol ip flower eth_type ipv6
This patch removes that contradiction

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2016-01-11 08:24:01 -08:00
Julien Floret
8f80d450c3 tc: fix compilation with old gcc (< 4.6)
gcc < 4.6 does not handle C11 syntax for the static initialization of
anonymous struct/union, hence the following error:

tc_bpf.c:260: error: unknown field map_type specified in initializer

Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2016-01-11 08:23:36 -08:00
Phil Sutter
de7db5d857 tc: m_connmark: Fix help text
When specifying a conntrack zone, the 'zone' keyword has to be used
before the actual zone index.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2016-01-07 10:35:08 -08:00
Stephen Hemminger
e49b51d663 monitor: fix file handle leak
In some cases passing file to monitor left file open.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-12-30 17:26:38 -08:00
Daniel Borkmann
fd7f9c7fd1 bpf: minor fix in api and bpf_dump_error() usage
Fix a whitespace in bpf_dump_error() usage, and also a missing closing
bracket in ntohl() macro for eBPF programs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-12-17 17:22:25 -08:00
Daniel Borkmann
91d88eeb10 {f,m}_bpf: allow updates on program arrays
Since we have all infrastructure in place now, allow atomic live updates
on program arrays. This can be very useful e.g. in case programs that are
being tail-called need to be replaced, f.e. when classifier functionality
needs to be changed, new protocols added/removed during runtime, etc.

Thus, provide a way for in-place code updates, minimal example: Given is
an object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id of 0, and
two tail called programs under section '0/0' (prog array key 0) and '0/1'
(prog array key 1), the section encoding for the loader is <id/key>.
Adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (also full path to tc's subfolder
of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also
be injected into the program array like:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is being atomically replaced and the
old one's refcount dropped.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann
f6793eec46 {f, m}_bpf: allow for user-defined object pinnings
The recently introduced object pinning can be further extended in order
to allow sharing maps beyond tc namespace. F.e. maps that are being pinned
from tracing side, can be accessed through this facility as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann
9e607f2e72 {f, m}_bpf: check map attributes when fetching as pinned
Make use of the new show_fdinfo() facility and verify that when a
pinned map is being fetched that its basic attributes are the same
as the map we declared from the ELF file. I.e. when placed into the
globalns, collisions could occur. In such a case warn the user and
bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann
910b543dcc {f,m}_bpf: make tail calls working
Now that we have the possibility of sharing maps, it's time we get the
ELF loader fully working with regards to tail calls. Since program array
maps are pinned, we can keep them finally alive. I've noticed two bugs
that are being fixed in bpf_fill_prog_arrays() with this patch. Example
code comes as follow-up.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2015-11-29 11:55:16 -08:00
Daniel Borkmann
32e93fb7f6 {f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.

Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.

For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both, classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-23 16:10:44 -08:00
Stephen Hemminger
037660b351 qfq: fix parse_opt dead code
Fix Coverity warning from dead code.
2015-10-27 15:46:20 +09:00
Stephen Hemminger
86c392f958 Merge branch 'master' into net-next 2015-10-23 15:46:08 -07:00
Stephen Hemminger
753ef5bbd6 tc: remove extra whitespace
No blank lines at EOF, or trailing whitespace.
2015-10-23 15:43:28 -07:00
Phil Sutter
40eb737ebb tc: u32 filter coding style cleanup
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
it's effect was identical to them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Phil Sutter
0a83e1eaf7 tc: improve filter help texts a bit
This fixes a few syntax errors and changes route filter help text to use
classid instead of flowid to be consistent with other filters' help
texts.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-10-23 15:37:26 -07:00
Daniel Borkmann
343dc90854 m_bpf: don't require default opcode on ebpf actions
After the patch, the most minimal command to load an eBPF action
for late binding with auto index selection through tc is:

  tc actions add action bpf obj prog.o

We already set TC_ACT_PIPE in tc as default opcode, so if nothing
further has been specified, just use it. Also, allow "ok" next to
"pass" for matching cmdline on TC_ACT_OK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-10-12 09:44:52 -07:00
Daniel Borkmann
faa8a46300 f_bpf: allow for optional classid and add flags
When having optional classid, most minimal command can be sth
like:

  tc filter add dev foo parent X: bpf obj prog.o

Therefore, adapt the code so that a next argument will not be
enforced as the case currently.

Also, minor cleanup on the classid, where we should rather
have used addattr32(), and add flags for exec configuration,
for example (using short notation):

  tc filter add dev foo parent X: bpf da obj prog.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-10-12 09:41:05 -07:00
Stephen Hemminger
8fe9839857 fq: fix whitespace 2015-09-25 12:40:00 -07:00
Eric Dumazet
8d5bd8c302 tc: fq: allow setting and retrieving orphan_mask
linux-3.19 fq packet scheduler got a new attribute, controlling
number of 'flows' holding packets not attached to a socket
(forwarding usage)

kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7
("pkt_sched: fq: better control of DDOS traffic")

This patch adds corresponding code to tc command.

tc qd replace dev eth0 root fq orphan_mask 511

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-09-25 12:37:09 -07:00
Eric Dumazet
32a6fbe563 tc : add timestamps to tc monitor
Support -timestamp and -tshort options for tc monitor like ip monitor.

# tc -tshort monitor
[2015-09-23T16:39:11.260555] qdisc fq 8003: dev eth0 root refcnt 2 limit
10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
refill_delay 40.0ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-09-25 12:35:46 -07:00
Phil Sutter
565af7b816 tc: fq: allow setting and retrieving flow refill delay
Code to parse and export this tuneable via netlink is already present in
sched_fq.c of the kernel, so not making it accessible for users would be
a waste of resources.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 16:02:13 -07:00
Phil Sutter
5c32fa1d69 comment: Fix remaining listings of wrong FSF address
This patch follows the changes of commit 4d98ab0 ("Fix FSF address in
file headers"), fixing file headers added after it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
2015-09-23 15:58:54 -07:00
Stephen Hemminger
9a6422c243 Merge branch 'master' into net-next 2015-08-13 19:42:41 -07:00
Stephen Hemminger
bcb4a7aa5b tc: fix return after invarg 2015-08-13 14:20:40 -07:00
Daniel Borkmann
baed90842a m_bpf: add frontend support for late binding
Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly
support late binding of bpf action to a classifier").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-08-10 11:19:11 -07:00
Nicolas Dichtel
611f70b287 tc: fix bpf compilation with old glibc
Error was:
f_bpf.o: In function `bpf_parse_opt':
f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv'
m_bpf.o: In function `parse_bpf':
m_bpf.c:(.text+0x587): undefined reference to `secure_getenv'
collect2: error: ld returned 1 exit status

There is no special reason to use the secure version of getenv, thus let's
simply use getenv().

CC: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 88eea53954 ("tc: {f,m}_bpf: allow to retrieve uds path from env")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
2015-07-27 14:35:42 -07:00
Stephen Hemminger
69be46c562 Merge branch 'master' into net-next 2015-06-26 00:04:04 -04:00
Daniel Borkmann
88eea53954 tc: {f,m}_bpf: allow to retrieve uds path from env
Allow to retrieve uds path from the environment, facilitates
also dealing with export a bit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-25 15:13:16 -04:00
Daniel Borkmann
473d7840c3 tc: {f,m}_bpf: add tail call support for parser
Kernel commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
bpf programs") added support for tail calls, this patch here adds tc
front end parts for the object parser to prepopulate a given eBPF prog
array before the root prog is pushed down for classifier creation. The
prepopulation works with any number of prog arrays in any dependencies,
e.g. prog or normal maps could also be used from progs that are
tail-called themself, etc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-25 15:13:16 -04:00
Maciej Żenczykowski
0bbca0422f iproute2: tc/m_pedit.c - remove dead code
The initializers are simply not needed.

These if-blocks are outright dead code, because '0 > unsigned' is always
false, so only else clause triggers and regardless of which clause triggers
it only updates 'ind' which is later unconditionally written to before
being used anyway.

Otherwise we get errors from clang:

  m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
    if (0 > tkey->off) {
        ~ ^ ~~~~~~~~~
  m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
    if (0 > tkey->off) {
        ~ ^ ~~~~~~~~~
  2 errors generated.

Change-Id: I3c9e9092915088fc56f992e5df736851541a4458
2015-06-25 08:52:06 -04:00
Stephen Hemminger
f975059a51 Merge branch 'master' into net-next 2015-06-25 08:01:51 -04:00
Daniel Borkmann
ad1fe0d8e9 tc: util: fix print_rate for ludicrous speeds
The for loop should only probe up to G[i]bit rates, so that we
end up with T[i]bit as the last max units[] slot for snprintf(3),
and not possibly an invalid pointer in case rate is multiple of
kilo.

Fixes: 8cecdc2837 ("tc: more user friendly rates")
Reported-by: Jose R. Guzman Mosqueda <jose.r.guzman.mosqueda@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-06-24 23:34:20 -04:00
Stephen Hemminger
03371c7d98 Merge branch 'master' into net-next
Conflicts:
	include/linux/tcp.h
	lib/libnetlink.c
2015-05-28 09:18:01 -07:00
Stephen Hemminger
c079e121a7 libnetlink: add size argument to rtnl_talk
There have been several instances where response from kernel
has overrun the stack buffer from the caller. Avoid future problems
by passing a size argument.

Also drop the unused peer and group arguments to rtnl_talk.
2015-05-27 13:00:21 -07:00
David Ward
aacee2695a tc: gred: Add support for TCA_GRED_LIMIT attribute
Allow the qdisc limit to be set, which is particularly useful when
the default VQ is not configured with RED parameters.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 15:30:39 -07:00
Nicolas Dichtel
0628cddd9d libnetlink: introduce rtnl_listen_filter_t
There is no functional change with this commit. It only prepares the next one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-05-21 15:28:56 -07:00
Eric Dumazet
df1c7d9138 codel: add ce_threshold support to codel & fc_codel
codel & fq_codel packet schedulers are now able to have a threshold
for CE marking packets, regardless of the drop/nodrop decision taken by
CoDel.

This is particularly useful for dctcp and variants, that do not use
traditional ECN.

Note that fq_codel users would have to specify noecn if ce_threshold is
used, otherwise results would be not very interesting, as ecn is default
on for fq_codel.

$ tc -s qdisc show dev eth1
qdisc codel 8002: root refcnt 45 limit 1000p target 5.0ms ce_threshold
1.0ms interval 100.0ms
 Sent 4908469888317 bytes 3351813967 pkt (dropped 0, overlimits 0
requeues 21624365)
 rate 37671Mbit 3231836pps backlog 4904740b 250p requeues 21624365
  count 0 lastcount 0 ldelay 1.1ms drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 410861803

Signed-off-by: Eric Dumazet <edumazet@google.com>
2015-05-21 15:25:05 -07:00
Jiri Pirko
30eb304ecd tc: add support for Flower classifier
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-05-21 15:22:49 -07:00
David Ward
357c45ad3a tc: gred: Adopt the term VQ in the command syntax and output
In the GRED kernel source code, both of the terms "drop parameters"
(DP) and "virtual queue" (VQ) are used to refer to the same thing.
Each "DP" is better understood as a "set of drop parameters", since
it has values for limit, min, max, avpkt, etc. This terminology can
result in confusion when creating a GRED qdisc having multiple DPs.
Netlink attributes and struct members with the DP name seem to have
been left intact for compatibility, while the term VQ was otherwise
adopted in the code, which is more intuitive.

Use the VQ term in the tc command syntax and output (but maintain
compatibility with the old syntax).

Rewrite the usage text to be concise and similar to other qdiscs.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
eb6d7d6af1 tc: gred: Handle unsigned values properly in option parsing/printing
DPs, def_DP, and DP are unsigned values that are sent and received
in TCA_GRED_* netlink attributes; handle them properly when they
are parsed or printed. Use MAX_DPs as the initial value for def_DP
and DP, and fix the operator used for bounds checking them.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
1693a4d392 tc: gred: Improve parameter/statistics output
Make the output more consistent with the RED qdisc, and only show
details/statistics if the appropriate flag is set when calling tc.

Show the parameters used with "gred setup". Add missing statistics
"pdrop" and "other". Fix format specifiers for unsigned values.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
a77905ef6a tc: gred: Print usage text if no arguments appear after "gred"
This is more helpful to the user, since the command takes two forms,
and the message that would otherwise appear about missing parameters
assumes one of those forms.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
d73e0408e2 tc: gred: Fix whitespace issues in code
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
7bf17a2264 tc: red: Mark "bandwidth" parameter as optional in usage text
Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
d93c909a4c tc: red, gred: Notify when using the default value for "bandwidth"
The "bandwidth" parameter is optional, but ensure the user is aware
of its default value, to proactively avoid configuration problems.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
6c99695da2 tc: red, gred: Fix format specifier in burst size warning
burst is an unsigned value.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
David Ward
9d9a67c756 tc: red, gred: Rename overloaded variable wlog
It is used when parsing three different parameters, only one of
which is Wlog. Change the name to make the code less confusing.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
2015-05-21 14:16:03 -07:00
Daniel Borkmann
ec6f5abcea tc: minor cleanup on ingress
Fix whitespacing and remove the unnecessary condition.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-05-11 09:18:10 -07:00
WANG Cong
285e7768e8 tc: fill in handle before checking argc
When deleting a specific basic filter with handle,
tc command always ignores the 'handle' option, so
tcm_handle is always 0 and kernel deletes all filters
in the selected group. This is wrong, we should respect
'handle' in cmdline.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2015-05-11 09:13:20 -07:00
Daniel Borkmann
d937a74b6d tc: {m, f}_ebpf: add option for dumping verifier log
Currently, only on error we get a log dump, but I found it useful when
working with eBPF to have an option to also dump the log on success.
Also spotted a typo in a header comment, which is fixed here as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-05-04 08:43:08 -07:00
Daniel Borkmann
4bd624467b tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e4 ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6        <- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5        <- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7        <- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-27 16:39:23 -07:00
Nicolas Dichtel
afa5158f02 tc: fix compilation warning on 32bits arch
The warning was:
m_simple.c: In function ‘parse_simple’:
m_simple.c:142:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ [-Wformat]

Useful to be able to compile with -Werror.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2015-04-27 11:41:46 -07:00
Vadim Kochan
46679bbbe8 tc util: Fix possible buffer overflow when print class id
Use correct handle buffer length.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-20 10:06:02 -07:00
Felix Fietkau
b8d5c9a71b tc: add support for connmark action
Add ability to add the netfilter connmark support.

Typical usage:
...lets tag outgoing icmp with mark 0x10..
iptables -tmangle -A PREROUTING -p icmp -j CONNMARK --set-mark 0x10
..add on ingress of $ETH an extractor for connmark...
tc filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff \
flowid 1:1 \
action connmark continue
...if the connmark was 0x11, we police to a ridic rate of 10Kbps
tc filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 fw flowid 1:1 \
action police rate 10kbit burst 10k

Other ways to use the connmark is to supply the zone, index and
branching choice. Refer to help.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-13 10:49:45 -07:00
Daniel Borkmann
6256f8c9e4 tc, bpf: finalize eBPF support for cls and act front-end
This work finalizes both eBPF front-ends for the classifier and action
part in tc, it allows for custom ELF section selection, a simplified tc
command frontend (while keeping compat), reusing of common maps between
classifier and actions residing in the same object file, and exporting
of all map fds to an eBPF agent for handing off further control in user
space.

It also adds an extensive example of how eBPF can be used, and a minimal
self-contained example agent that dumps map data. The example is well
documented and hopefully provides a good starting point into programming
cls_bpf and act_bpf.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-04-10 13:31:19 -07:00
Stephen Hemminger
bd733e4088 Merge branch 'master' into net-next
Conflicts:
	man/man8/ip-route.8.in
2015-04-07 08:56:14 -07:00
Vadim Kochan
8b90a9907e tc class: Ignore if default class name file does not exist
If '-nm' specified that do not fail if there is no
default class names file in /etc/iproute2.

Changed default class name file cls_names -> tc_cls.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-04-07 08:31:56 -07:00
Daniel Borkmann
11c39b5e98 tc: add eBPF support to f_bpf
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").

A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations, currently, I've used the file name for that, so that the user
can easily identify his filter when dumping configurations back.

Example usage:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf run object-file cls.o classid x:y

  tc filter show dev em1 [...]
  filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o

I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-03-24 15:45:23 -07:00
Daniel Borkmann
51cf36756c tc: m_bpf: fix next arg selection after tc opcode
Next argument after the tc opcode/verdict is optional, using NEXT_ARG()
requires to have another argument after that one otherwise tc will bail
out. Therefore, we need to advance to the next argument manually as done
elsewhere.

Fixes: 86ab59a666 ("tc: add support for BPF based actions")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
2015-03-24 15:39:53 -07:00
Vadim Kochan
4612d04d6b tc class: Show class names from file
It is possible to use class names from file /etc/iproute2/cls_names
which tc will use when showing class info:

    # tc/tc -nm class show dev lo
	class htb 1:10 parent 1:1 leaf 10: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb web#1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb voip#1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
	class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
	class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

or to specify via file path:

    # tc/tc -nm -cf /tmp/cls_names class show dev lo

Class names file contains simple "maj:min  name" structure:

1:20    web
1:40    voip

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2015-03-15 12:27:40 -07:00
Daniel Borkmann
32caee9fc7 m_bpf: remove unrelevant help lines
Left-overs when copying this over from cls_bpf. ;) Lets remove them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
2015-02-27 19:00:51 -08:00
Jiri Pirko
86ab59a666 tc: add support for BPF based actions
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jiri Pirko
1d129d191a tc: push bpf common code into separate file
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2015-02-05 10:38:13 -08:00
Jamal Hadi Salim
564663b4ca actions: Get vlan action to work in pipeline
When specified in a graph such as:
action vlan ... action foobar
the vlan action chewed more than it can swallow

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2015-01-13 17:22:44 -08:00
Vadim Kochan
67e1d73be1 tc: Allow to easy change network namespace
Added new '-netns' option to simplify executing following cmd:

    ip netns exec NETNS tc OPTIONS COMMAND OBJECT

    to

    tc -n[etns] NETNS OPTIONS COMMAND OBJECT

e.g.:

    tc -net vnet0 qdisc

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
2014-12-27 10:22:34 -08:00
Vadim Kochan
d954b34a1f tc class: Show classes as ASCII graph
Added new '-g[raph]' option which shows classes in the graph view.

Meanwhile only generic stats info output is supported.

e.g.:

$ tc/tc -g class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

$ tc/tc -g -s class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |          rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    |     rate 0bit 0pps backlog 0b 0p requeues 0
|    |    |
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |               rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|               rate 0bit 0pps backlog 0b 0p requeues 0
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |    rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
                Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
                rate 0bit 0pps backlog 0b 0p requeues 0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
2014-12-27 10:16:51 -08:00
Stephen Hemminger
5c2c10b17e Merge branch 'net-next' 2014-12-24 12:23:00 -08:00
Stephen Hemminger
3d0b7439df whitespace cleanup
Remove all trailing whitespace and space before tabs.
2014-12-20 15:47:17 -08:00
Stephen Hemminger
c9b8aef6ae Merge branch 'master' into net-next 2014-12-09 16:33:59 -08:00
Stephen Hemminger
b2e116d6c3 tc: minor spelling fixes 2014-12-03 19:28:34 -08:00
Jiri Pirko
8b1c0216d8 tc: add support for vlan tc action
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Cong Wang <cwang@twopensource.com>
2014-12-03 09:29:21 -08:00
Stephen Hemminger
edd3979272 emp: fix warning on deprecated bison directive
emp_ematch.y:12.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated]
 %name-prefix="ematch_"
 ^^^^^^^^^^^^^
2014-10-09 08:31:10 -07:00
Jamal Hadi Salim
863ecb04b4 discourage use of direct policer interface
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim
287bf3a990 route classifier support for multiple actions
route can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:57 -07:00
Jamal Hadi Salim
08139c2ffb tcindex classifier support for multiple actions
tcindex can now use the action syntax

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-10-09 08:26:56 -07:00
Andy Furniss
a07c6d6135 add missing underscore to man page and example nf_mark ematch
The man page and the "fail" example are missing an underscore in the
nf_mark ematch.

eg.

tc filter add dev eth0 parent ffff:  basic match 'meta(nfmark gt 24)'
classid 2:4

meta: unknown meta id

... >>meta(nfmark gt 24)<< ...
... meta(>>nfmark<< gt 24)...
Usage: meta(OBJECT { eq | lt | gt } OBJECT)
where: OBJECT  := { META_ID | VALUE }
        META_ID := id [ shift SHIFT ] [ mask MASK ]

Example: meta(nfmark gt 24)
          meta(indev shift 1 eq "ppp")
          meta(tcindex mask 0xf0 eq 0xf0)

For a list of meta identifiers, use meta(list).
Illegal "ematch"

meta(list) does correctly show nf_mark and the above test works with
nf_mark.

Signed-off-by: Andy Furniss adf.lists@gmail.com
2014-10-09 08:24:00 -07:00
Jamal Hadi Salim
10f5a375ea rsvp classifier support for multiple actions
Example setup:

sudo tc qdisc del dev eth0 root handle 1:0 prio
sudo tc qdisc add dev eth0 root handle 1:0 prio

sudo tc filter add dev eth0 pref 10 proto ip parent 1:0 \
rsvp session 10.0.0.1 ipproto icmp \
classid 1:1  \
action police rate 1kbit burst 90k pipe \
action ok

tc -s filter show dev eth0 parent 1:0

filter protocol ip pref 10 rsvp
filter protocol ip pref 10 rsvp fh 0x0001100a flowid 1:1 session
10.0.0.1 ipproto icmp
        action order 1:  police 0x5 rate 1Kbit burst 23440b mtu 2Kb
action pipe overhead 0b
ref 1 bind 1
        Action statistics:
        Sent 98000 bytes 1000 pkt (dropped 0, overlimits 761 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: gact action pass
         random type none pass val 0
         index 2 ref 1 bind 1 installed 60 sec used 3 sec
        Action statistics:
        Sent 74578 bytes 761 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: John Fastabend <john.r.fastabend@intel.com>
2014-09-29 08:47:33 -07:00
Jamal Hadi Salim
954de6c72b actions: BugFix action stats to display with -s
Was broken by commit 288abf513f
Lets not be too clever and have a separate call to print flushed
actions info.

Broken looks like:
root@moja-1:~# tc actions add  action drop index 4
root@moja-1:~# tc -s actions ls action gact

    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec

The fixed version looks like:
    action order 0: gact action drop
     random type none pass val 0
     index 4 ref 1 bind 0 installed 9 sec used 4 sec
         Sent 108948 bytes 1297 pkts (dropped 1297, overlimits 0)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-09-29 08:47:19 -07:00
Jay Vosburgh
3757185b29 tc/netem: loss gemodel options fixes
First, the default value for 1-k is documented as being 0, but is
currently being set to 1. (100%).  This causes all packets to be dropped
in the good state if 1-k is not explicitly specified.  Fix this by setting
the default to 0.

	Second, the 1-h option is parsed correctly, however, the kernel is
expecting "h", not 1-h.  Fix this by inverting the "1-h" percentage before
sending to and after receiving from the kernel.  This does change the
behavior, but makes it consistent with the netem documentation and the
literature on the Gilbert-Elliot model, which refer to "1-h" and "1-k,"
not "h" or "k" directly.

	Last, fix a minor formatting issue for the options reporting.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-08-04 10:15:10 -07:00
Yang Yingliang
aeb199d5ce fq: allow options of fair queue set to ~0U
Some options of fair queue cannot be (~0U). It leads to maxrate
cannot be reset to unlimited because it cannot be (~0U). Allow
the options being ~0U.

Tested by the following command:
 # tc qdisc add dev eth4 root handle 1: fq limit 2000 flow_limit 200 maxrate 100mbit quantum 2000 initial_quantum 1600
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 2000p flow_limit 200p buckets 1024 quantum 2000 initial_quantum 1600 maxrate 100Mbit
 Sent 1492 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  1 flows (0 inactive, 0 throttled)
  0 gc, 0 highprio, 0 throttled

 # tc qdisc change dev eth4 root handle 1: fq limit 4294967295 flow_limit 4294967295 maxrate 34359738360 quantum 4294967295 initial_quantum 4294967295
 # tc -s -d qdisc show
qdisc fq 1: dev eth4 root refcnt 2 limit 4294967295p flow_limit 4294967295p buckets 1024 quantum 4294967295 initial_quantum 4294967295
 Sent 38372 bytes 216 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  2 flows (1 inactive, 0 throttled)
  0 gc, 2 highprio, 7 throttled

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-06-09 12:42:36 -07:00
Sergey V. Lobanov
3ff10e82c1 Fixed 'tc qdisc show' for tbf when latency<0
When limit<burst latency becomes <0, for example:
 # tc qdisc add dev eth0 root handle 1: tbf limit 100K burst 256K rate 256kbit
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb lat 4290.0s

If latency<0 there is no reason to show it. Limit will be printed instead of
latency when latency<0:
 # tc qdisc show
 qdisc tbf 1: dev eth0 root refcnt 2 rate 256Kbit burst 256Kb limit 100Kb

Signed-off-by: Sergey V. Lobanov <sergey@lobanov.in>
2014-05-28 17:08:16 -07:00
Jamal Hadi Salim
288abf513f actions: correctly report the number of actions flushed
This also fixes a long standing bug of not sanely reporting the
action chain ordering

Sample scenario test

on window 1(event window):
run "tc monitor" and observe events

on window 2:
sudo tc actions add action drop index 10
sudo tc actions add action ok index 12
sudo tc actions ls action gact
sudo tc actions flush action gact

See the event window reporting two entries
(doing another listing should show empty generic actions)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:31 -07:00
Jamal Hadi Salim
9282d08d93 actions: keyword flowid or classid terminates action pipeline
scenario testcase:

TC="sudo ./tc/tc"
DEV="dev eth0"
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 action police rate 6Mbit burst 6Mbit drop flowid :1
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 action police rate 1Gbit burst 1Gbit pass flowid :1
$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip u32 match ip src 10.0.0.0/24 flowid 1:1 action police rate 6Mbit burst 6Mbit drop
$TC filter add $DEV parent ffff: protocol ip u32 match ip dst 10.0.0.0/24 flowid 1:2 action police rate 1Gbit burst 1Gbit pass

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
flowid 1:10 \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0

$TC -s filter ls $DEV parent ffff: protocol ip
$TC qdisc del $DEV ingress
$TC qdisc add $DEV ingress
$TC filter add $DEV parent ffff: protocol ip pref 10 \
u32 match ip protocol 1 0xff \
action skbedit mark 11 \
action police rate 10kbit burst 10k pipe index 1 \
action skbedit mark 12 \
action police rate 20kbit burst 20k pipe index 2 \
action mirred egress mirror dev dummy0 \
flowid 1:10

$TC -s filter ls $DEV parent ffff: protocol ip

Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:28 -07:00
Jamal Hadi Salim
cacba03b10 Remove unnecessary debug statement
Reported-by: Seann Herdejurgen <seann@herdejurgen.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2014-05-28 16:54:26 -07:00
Natanael Copa
dd9cc0ee81 iproute2: various header include fixes for compiling with musl libc
We need limits.h for LONG_MIN and LONG_MAX, sys/param.h for MIN and
sys/select for struct timeval.

This fixes the following compile errors with musl libc:

f_bpf.c: In function 'bpf_parse_opt':
f_bpf.c:181:12: error: 'LONG_MIN' undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
...

tc_util.o: In function `print_tcstats2_attr':
tc_util.c:(.text+0x13fe): undefined reference to `MIN'
tc_util.c:(.text+0x1465): undefined reference to `MIN'
tc_util.c:(.text+0x14ce): undefined reference to `MIN'
tc_util.c:(.text+0x154c): undefined reference to `MIN'
tc_util.c:(.text+0x160a): undefined reference to `MIN'
tc_util.o:tc_util.c:(.text+0x174e): more undefined references to `MIN' follow
...

tc_stab.o: In function `print_size_table':
tc_stab.c:(.text+0x40f): undefined reference to `MIN'
...

fdb.c:247:30: error: 'ULONG_MAX' undeclared (first use in this function)
        (vni >> 24) || vni == ULONG_MAX)
                              ^

lnstat.h:28:17: error: field 'last_read' has incomplete type
  struct timeval last_read;  /* last time of read */
                 ^

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
2014-05-28 16:51:39 -07:00
Andreas Greve
6e2e5ec28b fix print_ipt: segfault if more then one filter with action -j MARK.
BUG: tc filter show ... produce a segmentation fault if more than one
filter rule with action -j MARK exists.

Reason: In print_ipt(...) xtables will be initialzed with a
pointer to the static struct tcipt_globals at xtables_init_all().
Later on the fields .opts and .options_offset of tcipt_globals are
modified. The call of xtables_free_opts(1) at the end of print(...)
does not restore the original values of tcipt_globals for the
modified fields. It only frees some allocated memory and sets
.opts to NULL. This leads to a segmentation fault when print_ipt()
is called for the next filter rule with action -j MARK.

Fix: Cloneing tcipt_globals on the stack as tmp_tcipt_globals and
use it instead of tcipt_globals, so tcipt_globals will be not
modified.

Signed-off-by: Andreas Greve <andreas.greve@a-greve.de>
2014-05-13 13:10:31 -07:00
Terry Lam
ac74bd2a71 support for Heavy Hitter Filter (HHF) qdisc
$tc qdisc add dev eth0 hhf help
Usage: ... hhf [ limit PACKETS ] [ quantum BYTES]
               [ hh_limit NUMBER ]
               [ reset_timeout TIME ]
               [ admit_bytes BYTES ]
               [ evict_timeout TIME ]
               [ non_hh_weight NUMBER ]

$tc -s -d qdisc show dev eth0
qdisc hhf 8005: root refcnt 32 limit 1000p quantum 1514 hh_limit 2048
reset_timeout 40.0ms admit_bytes 131072 evict_timeout 1.0s non_hh_weight 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
    drop_overlimit 0 hh_overlimit 0 tot_hh 0 cur_hh 0

HHF qdisc parameters:
- limit: max number of packets in qdisc (default 1000)
- quantum: max deficit per RR round (default 1 MTU)
- hh_limit: max number of HHs to keep states (default 2048)
- reset_timeout: time to reset HHF counters (default 40ms)
- admit_bytes: counter thresh to classify as HH (default 128KB)
- evict_timeout: threshold to evict idle HHs (default 1s)
- non_hh_weight:  DRR weight for mice (default 2)

Signed-off-by: Terry Lam <vtlam@google.com>
2014-05-09 12:10:47 -07:00
Jay Vosburgh
8f9672af7a tc/netem: fix loss state display and p14 parsing
The display of the entire netem loss state is shown as if it
were gemodel state, as the loss state information is assigned to the
wrong pointer.  Correct this by assigning the loss state to the correct
pointer.

	Additionally, attempting to set netem loss state will result in
random values in the p14 state probability because the option value
passed to the kernel by tc netem is not parsed or initialized.  Fix this
by supplying a default value of 0 for p14 and parsing the p14 value if
one is supplied.

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
2014-05-09 12:06:58 -07:00
Hiroaki SHIMODA
4d4da09e00 htb: Move direct_qlen code part to htb_parse_opt().
The direct_qlen command option is used with qdisc operation.
It happened to be implemented in htb_parse_class_opt() which is called
with class operation.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
2014-03-21 14:20:06 -07:00
WANG Cong
1c9af05071 pedit: do not print debugging information by default
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2014-02-10 14:43:52 -08:00
Yang Yingliang
dad2f72bef netem: add 64bit rates support
netem support 64bit rates start from linux-3.13.
Add 64bit rates support in tc tools.

tc qdisc show dev eth0
qdisc netem 1: dev eth4 root refcnt 2 limit 1000 rate 35Gbit

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-01-20 12:32:15 -08:00
Yang Yingliang
a01de0a336 tbf: support sending burst/mtu to kernel directly
To avoid loss when transforming burst to buffer in userspace, send
burst/mtu to kernel directly.

Kernel commit 2e04ad424b("sch_tbf: add TBF_BURST/TBF_PBURST attribute")
make it can handle burst/mtu.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
2014-01-20 12:32:14 -08:00
Vijay Subramanian
80dd880dd0 PIE: Proportional Integral controller Enhanced
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
2014-01-09 22:50:47 -08:00
Stephen Hemminger
ef056b2190 Merge branch 'master' into net-next-for-3.13 2014-01-09 22:44:17 -08:00
Jamal Hadi Salim
f24a7e7205 dont skip action order
attached.

cheers,
jamal
commit 58d78f9f6447df324cdeb99262442c5e3f1f924b
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 10:34:18 2013 -0500

    dont skip displaying of action chains or lists by TCA_ACT_MAX_PRIO

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim
b159a7f1ae allow batch gets of actions
Attached.

cheers,
jamal
commit c5f30cabef14c951596210b96bc9b423b0d39592
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Sun Dec 22 10:24:17 2013 -0500

    Allow batching of action gets
    Example:
    ----
    tc actions get \
    action gact index 100 \
    action gact index 4
    ----

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim
352f6f97be simple print newline
attached.

cheers,
jamal
commit d7869e6167c3553e93e254940b0647032b40fed8
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sun Dec 22 07:46:28 2013 -0500

    print new line at the end for aesthetics

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim
4bfb21ca20 policer - retire old syntax
attached.

cheers,
jamal
commit b82057d9ec851a8aba8a295b959190ef5098f330
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Sat Dec 21 17:00:11 2013 -0500

    After a decade of trying to deprecate the old policer syntax,
    I believe it is time to kill it. The kernel build option for old
    policer is gone for at least 5 years now (although backward
    compatibility is still there). Being backward compatible meant
    hijacking the keyword "action" and was obstructing policies like:

    tc filter add dev eth0 parent ffff: protocol ip pref 10 \
    u32 match ip protocol 1 0xff flowid 1:10 \
    action skbedit mark 1 \
    action police rate 10kbit burst 10k pipe \
    action skbedit mark 2 \
    action police rate 20kbit burst 20k pipe \
    action action mirred egress mirror dev dummy0

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim
02b1d345b7 skbedit print missing metadata
skbedit should print the index and other generic metadata info

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Jamal Hadi Salim
64b7db4db7 skbedit to default to pipe
Allow skbedit to be used as is in an action chain by default
without need to specify pipe

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-12-28 10:57:34 -08:00
Stephen Hemminger
4d98ab00de Fix FSF address in file headers 2013-12-06 15:05:07 -08:00
Eric Dumazet
8cecdc2837 tc: more user friendly rates
Display more user friendly rates.

10Mbit is more readable than 10000Kbit

Before :
class htb 1:2 root prio 0 rate 10000Kbit ceil 10000Kbit ...

After:
class htb 1:2 root prio 0 rate 10Mbit ceil 10Mbit ...

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-12-02 23:48:11 -08:00
Yang Yingliang
ddc6243e9a tbf: add 64bit rates support
tbf support 64bit rates start from linux-3.13.
Add 64bit rates support in tc tools.

tc qdisc show dev eth0
qdisc tbf 1: root refcnt 2 rate 40000Mbit burst 230000b peakrate 50000Mbit minburst 87500b lat 50.0ms

This is a followup to ("htb: support 64bit rates").

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
2013-12-02 23:46:56 -08:00
Eric Dumazet
8334bb325d htb: support 64bit rates
Starting from linux-3.13, we can break the 32bit limitation of
rates on HTB qdisc/classes.

Prior limit was 34.359.738.360 bits per second.

lpq83:~# tc -s qdisc show dev lo ; tc -s class show dev lo
qdisc htb 1: root refcnt 2 r2q 2000 default 1 direct_packets_stat 0 direct_qlen 6000
 Sent 6591936144493 bytes 149549182 pkt (dropped 0, overlimits 213757419 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
class htb 1:1 root prio 0 rate 50000Mbit ceil 50000Mbit burst 200000b cburst 0b
 Sent 6591942184547 bytes 149549310 pkt (dropped 0, overlimits 0 requeues 0)
 rate 39464Mbit 114938pps backlog 0b 15p requeues 0
 lended: 149549310 borrowed: 0 giants: 0
 tokens: 336 ctokens: -164

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-11-22 17:36:18 -08:00
Daniel Borkmann
d05df6861f tc: add cls_bpf frontend
This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
2013-10-30 16:45:05 -07:00
Nigel Kukard
9bea14ff6b Fix tc stats when using -batch mode
There are two global variables in tc/tc_class.c:
__u32 filter_qdisc;
__u32 filter_classid;

These are not re-initialized for each line received in -batch mode:
class show dev eth0 parent 1: classid 1:1
class show dev eth0 parent 1: classid 1:1
Error: duplicate "classid": "1:1" is the second value.

This patch fixes the issue by initializing the two globals when we
enter print_class().

Signed-off-by: Nigel Kukard <nkukard@lbsd.net>
2013-10-30 16:37:07 -07:00
Stephen Hemminger
734c0ca2ca htb: remove old unused duplicate qdisc name
Alexey had htb2 as name for version in ancient code.
2013-10-27 12:28:38 -07:00
Stephen Hemminger
0a502b21e3 Fix handling of qdis without options
Some qdisc like htb want the parse_qopt to be called even if no options
present. Fixes regression caused by:

e9e78b0db0 is the first bad commit
commit e9e78b0db0
Author: Stephen Hemminger <stephen@networkplumber.org>
Date:   Mon Aug 26 08:41:19 2013 -0700

    tc: allow qdisc without options
2013-10-27 12:26:47 -07:00
Jamal Hadi Salim
e26520e5c1 action: typo nat fix
If you taketh you giveth.
I Went the LinuxWay and copied this for m_simple.c and noticed
this one typo (I wonder where it came from?;->).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:31:40 -07:00
Jamal Hadi Salim
087f46ee4e tc: introduce simple action
Simple action is already in the kernel for years now as an
example. This complements it with user space control.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-09-30 21:29:34 -07:00
Stephen Hemminger
af60cf40c9 Merge branch 'net-next-3.11' 2013-09-23 13:16:48 -07:00
Eric Dumazet
b43f331828 htb: add support for direct_qlen attribute
TCA_HTB_DIRECT_QLEN attribute is supported since linux-3.10

HTB classes use an internal pfifo queue, which limit was not reported
by tc, and value inherited from device tx_queue_len at setup time.

With this patch, tc displays the value and can change it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:48:13 -07:00
Eric Dumazet
8f7574edd8 tc: support TCA_STATS_RATE_EST64
Since linux-3.11, rate estimator can provide TCA_STATS_RATE_EST64
when rate (bytes per second) is above 2^32 (~34 Mbits)

Change tc to use this attribute for high rates.

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:46:33 -07:00
Eric Dumazet
bc113e46a3 pkt_sched: fq: Fair Queue packet scheduler
Support for FQ packet scheduler

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows (511 inactive, 0 throttled)
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit

limit	: max number of packets on whole Qdisc (default 10000)

flow_limit : max number of packets per flow (default 100)

quantum : the max deficit per RR round (default is 2 MTU)

initial_quantum : initial credit for new flows (default is 10 MTU)

maxrate : max per flow rate (default : unlimited)

buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

[no]pacing : disable/enable pacing (default is enable)

Usage :

tc qdisc add dev $ETH root fq

tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
  tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-09-20 09:43:40 -07:00
Jesper Dangaard Brouer
3e92ff522a linklayer interface between kernel and tc/userspace
This iproute2 tc patch is connected to the kernel
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The rate table calculated by tc, have gotten replaced in the kernel
and is no-longer used for lookups.

This happened in kernel release v3.8 caused by kernel
 - commit 56b765b79 ("htb: improved accuracy at high rates").
This change unfortunately caused breakage of tc overhead and
linklayer parameters.

 Kernel overhead handling got fixed in kernel v3.10 by
 - commit 01cb71d2d47 (net_sched: restore "overhead xxx" handling)

 Kernel linklayer handling got fixed in kernel v3.11 by
 - commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)

The linklayer fix introduced a struct change, that allow the linklayer
attribute to be transferred between tc and kernel. This patch make use
of this linklayer attribute.

The linklayer setting is transfer to the kernel.  And linklayer
setting received from the kernel is printed with a prefixed
"linklayer" when listing current configuration.  The default
TC_LINKLAYER_ETHERNET is only printed in detailed output mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
2013-09-03 08:21:24 -07:00
Stephen Hemminger
e9e78b0db0 tc: allow qdisc without options
Pfifo_fast needs no options. So don't force it to have parsing code.
2013-08-26 08:41:19 -07:00
Stephen Hemminger
b8a45897b9 More minor spelling fixes 2013-08-04 15:10:05 -07:00
Stephen Hemminger
a3aa47a559 Make tc and ip batch mode consistent
Change the code for tc and ip so that batch mode is handled
the same.
2013-07-16 10:04:05 -07:00
Eric Dumazet
a303853e84 get_rate: detect 32bit overflows
On Mon, 2013-06-03 at 16:36 +0100, Ben Hutchings wrote:

> Oops, I read this as being strtol() currently, not strtod().  Currently
> '1.5gbit' will work, but this change will break that.  So I think you
> need to keep bps as a double.

Arg

> Then here I think the check should be *rate != floor(bps), i.e. accept
> rounding down of a non-integer number of bytes but any other change is
> assumed to be overflow.

Thanks Ben, here is v4 then ;)

[PATCH v4] get_rate: detect 32bit overflows

Current rate limit is 34.359.738.360 bit per second, and
unfortunately 40Gbps links are above it.

overflows in get_rate() are currently not detected, and some
users are confused. Let's detect this and complain.

Note that some qdisc are ready to get extended range, but this will
need additional attributes and new iproute2

With help from Ben Hutchings

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
2013-06-07 09:24:56 -07:00
Stephen Hemminger
22fa92e367 htb: fix indentation
iproute2 uses kernel style indenting
2013-06-07 08:54:45 -07:00
Eric Dumazet
44f1ff0afc htb: report overhead attribute
"tc class show dev ..." omits the overhead attribute for HTB.

After patch I have :

tc class add dev $DEV parent 1: classid 1:1 est 1sec 4sec htb \
    rate 12Mbit mtu 1500 quantum 1514 overhead 20

tc class show dev $DEV
class htb 1:1 root prio 0 rate 12000Kbit overhead 20 ceil 12000Kbit
burst 1500b cburst 1500b

Signed-off-by: Eric Dumazet <edumazet@google.com>
2013-06-07 08:53:53 -07:00
Alexander Duyck
cfa292defa iproute2: act_ipt fix xtables breakage on older versions.
In trying to build on a RHEL6.3 I ran into several build issues that are
addressed in this patch.

The first is that xtables_merge_options only has 3 parameters.  It appears
this is how this code was originally.  As such for the case where the version
is less than 6 I am assuming it would be correct to maintain the original
setup that only had 3 parameters being passed instead of 4.

I also ran into an issue with the define for __ALIGN_KERNEL not being present.
I believe this may be due to the fact that __ALIGN_KERNEL was moved into a
separate header from ALIGN after the UAPI changes.  In order to just cover all
of the bases I have moved the main definition for the macros into
__ALIGN_KERNEL_MASK and __ALIGN_KERNEL and if ALIGN is also needed then it is
just a direct redefine to __ALIGN_KERNEL.

Cc: Hasan Chowdhury <shemonc@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-05-01 08:01:47 -07:00
Stephen Hemminger
e7b24b67db Fix build when shared libraries are disabled
On some platforms, shared libraries are not used. The stub code
need some updating to not generate errors.
2013-03-13 08:29:59 -07:00
Kees van Reeuwijk
3bed7bb7e7 iproute2: clearer error messages for fifo and tbf qdiscs
Clearer error messages for fifo and tbf qdiscs:
- Say who is complaining
- Don't just say a parameter is bad, show the offending parameter
- Be clearer about duplicate parameters vs illegal pairs of parameters
- Try to give multiple error messages rather than let the user discover the errors one by one
- When there are parameter aliases, try to use the variant that was used, or at least mention them all

Note that in the old version an empty parameter list to tbf would just cause an explain() message
without a specific error message. By simply removing the relevant error check, the code now
handles this error more gracefully by printing an error message for all mandatory parameters.
It still prints the explain() message.

Signed-off-by: Kees van Reeuwijk <reeuwijk@few.vu.nl>
2013-02-21 08:34:34 -08:00
Stephen Hemminger
d1f28cf181 ip: make local functions static 2013-02-12 11:38:35 -08:00
Benjamin Poirier
5ab3a4de5e Use pkg-config to obtain xtables.h path
On openSUSE 12.2 (at least) xtables.h is not installed in the system-wide
include dir but in /usr/include/iptables-1.4.16.3/. This results in the
following build failure:
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Other includers of xtables.h already call out to pkg-config
2013-02-11 09:19:54 -08:00
Johannes Naab
e72ca3fbb0 iproute2: tc netem rate: allow negative packet/cell overhead
by fixing the parsing of command-line tokens

Signed-off-by: Johannes Naab <jn@stusta.de>
2013-02-04 09:06:50 -08:00
Jamal Hadi Salim
852d51222d iproute2: act_ipt fix xtables breakage
Fixes breakage with xtables API starting with version 1.4.10

Signed-off-by: Hasan Chowdhury <shemonc@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
2013-01-16 08:14:48 -08:00
Strake
5bd9dd49ae include needed files
Needed to build iproute2 with musl
2012-12-23 11:49:06 -08:00
Mike Frysinger
e4fc4ada33 allow pkg-config to be customized
Rather than hard coding `pkg-config`, use ${PKG_CONFIG} so people can
override it to their specific version (like when cross-compiling).

This is the same way the upstream pkg-config code works.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2012-11-11 16:21:34 -08:00
Matt Burgess
92905c6e0d iproute2-3.6.0 assumes presence of iptables
Hi,

When compiling iproute2-3.6.0 on a host that doesn't have iptables available, I get the following error:

gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE
-DCONFIG_GACT -DCONFIG_GACT_PROB -DYY_NO_INPUT   -c -o em_ipset.o
em_ipset.c
em_ipset.c:26:21: fatal error: xtables.h: No such file or directory

Fixed by the following patch, which guards the building of em_ipset.o on
the presence of suitable headers.

Thanks,

Matt.
2012-10-03 08:51:29 -07:00
Rostislav Lisovy
7b5f30e14f Ematch used to classify CAN frames according to their identifiers
This ematch enables effective filtering of CAN frames (AF_CAN) based
on CAN identifiers with masking of compared bits. Implementation
utilizes bitmap based classification for standard frame format (SFF)
which is optimized for minimal overhead.

Signed-off-by: Rostislav Lisovy <lisovy@gmail.com>
2012-08-20 13:11:55 -07:00
Dan Kenigsberg
f1675d615b utils: invarg: msg precedes the faulty arg
fix all call which reversed the arg order.

Signed-off-by: Dan Kenigsberg <danken@redhat.com>
2012-08-17 13:35:36 -07:00
Florian Westphal
8194411a42 tc: add ipset ematch
example usage:
tc filter add dev $dev parent $id: basic match not ipset'(foobar src)' ..

also updates iproute2/ematch_map, else tc complains:
Error: Unable to find ematch "ipset" in /etc/iproute2/ematch_map
Please assign a unique ID to the ematch kind the suggested entry is:
        8       ipset

when trying to use this ematch.

(text ematch (5) only exists in kernel, a vlan ematch (6) exists neither in
 kernel nor userspace, but kernel headers define TCF_EM_VLAN == 6).
2012-08-13 08:33:50 -07:00
Li Wei
6cef544b96 tc: man: change man page and comment to confirm to code's behavior.
Since the get_rate() code incorrectly interpreted bare number, the
behavior is not the same as man page and comment described.

We need to change the man page and comment for compatible with the
existing usage by scripts.
2012-07-12 09:05:28 -07:00
Li Wei
424adc19bf tc: filter: validate filter priority in userspace.
Because we use the high 16 bits of tcm_info to pass prio value to
kernel, thus it's range would be [0, 0xffff], without validation
in tc when user pass a lager(>65535) priority, the actual priority
set in kernel would confuse the user.

So, add a validation to ensure prio in the range.
2012-07-10 15:39:30 -07:00
Hiroaki SHIMODA
690b11f4a6 tc: u32: Fix firstfrag filter.
On current firstfrag filter, all non fragmented packets are matched.
firstfrag should check MF bit.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
Hiroaki SHIMODA
1d62f99fe2 tc: u32: Fix icmp_code off.
The off of icmp_code is not 20 but 21. Also offmask should be 0 unless
nexthdr+ is specified.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
2012-07-10 15:39:02 -07:00
Li Wei
3c4f545633 tc: prio: Perform more strict check on priomap.
Since band number counts from zero thus band must be little than
opt.bands.
2012-06-18 12:25:08 -07:00
Vijay Subramanian
50a3ec3c46 tc-codel: Update usage text
codel can take 'noecn' as an option. This also makes it consistent with the
manpage.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-24 15:02:05 -07:00
Eric Dumazet
c3524efc14 fq_codel: Fair Queue Codel AQM
Fair Queue Codel packet scheduler

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a Stochastic model (as we use a hash, several flows might
                              be hashed on same slot)
- Each flow has a CoDel managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority on old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]
                      [ quantum BYTES ]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Changli Gao <xiaosuo@gmail.com>
2012-05-22 14:17:49 -07:00
Eric Dumazet
185d88f99b tc_codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

http://queue.acm.org/detail.cfm?id=2209336

This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.

As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :

target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)

Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.

Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0

CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.

One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
2012-05-22 14:13:52 -07:00
Vijay Subramanian
1070205dc0 tc-netem: Add support for ECN packet marking
This patch provides support for marking packets with ECN instead of
dropping them with netem. This makes it possible to make use of the
netem ECN marking feature that was added recently to the kernel.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-05-22 14:10:21 -07:00
Christoph J. Thompson
5c434a9e5a iproute2 - Fix up and simplify variables pointing to install directories
Define where is the are located the iproute2 config files.
Get rid of trailing slashes for paths in several file.

Signed-off-by: Christoph J. Thompson <cjsthompson@gmail.com>
2012-04-12 09:49:10 -07:00
Stephen Hemminger
ff24746cca Convert to use rta_getattr_ functions
User new functions (inspired by libmnl) to do type safe access
of routeing attributes
2012-04-10 08:47:55 -07:00
Anton Danilov
90d98edf39 csum action, fix typo 2012-03-15 14:24:59 -07:00
Andreas Henriksson
f526af995e iproute: fix tc -iec display of Mibit rates
As reported by Thomas Mühlgrabner <muehltom@cable.vol.at>
in http://bugs.debian.org/662979 :

 When showing htb class configuration with "tc -iec class show",
 the output for Mibit is actually the value for bit.
 Example: configure a class with a ceil of 1000Mibit.
 Output states 1048576000 Mibit.

The cause is missing parenteses in the display code of tc....

(Please also note that a lower value of 100Mibit will be displayed
as 102400 Kibit, which I think is kind of ugly.)

Reported-by: Thomas Mühlgrabner <muehltom@cable.vol.at>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
2012-03-10 09:13:58 -08:00
Yegor Yefremov
8ced4fcd50 iproute2: cleanup dependencies
LIBNETLINK will be defined in the main Makefile, so
both ../lib/libnetlink.a ../lib/libutil.a will be
automatically appended during linking. Otherwise
../lib/libnetlink.a ../lib/libutil.a will appear
twice during linking.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
2012-02-27 08:27:54 -08:00
Petr Sabata
e2a4536a43 iproute2: tc - mqprio formatted print fix
Just a minor correction of mqprio printf()'s.

Reported-by: Petr Písař <ppisar@redhat.com>
Signed-off-by: Petr Šabata <contyk@redhat.com>
2012-02-22 15:23:12 -08:00
Stephen Hemminger
d798a0483e red: add missing include math.h
red now uses pow() function.
2012-02-06 09:45:50 -08:00
Vijay Subramanian
14a1c164d1 netem: Fail cleanly if user input is wrong
(Resending patch since it looks like my earlier mail did not make it to
netdev).

netem reordering requires that the delay parameter be given. Currently, if no
delay is given, tc prints the error message but still installs the qdisc. Fix
this by printing the usage and failing cleanly.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-01-20 11:21:58 -08:00
Eric Dumazet
1b6f0bb5be gred: support TCA_GRED_MAX_P attribute
TCA_GRED_MAX_P permits to express high resolution probabilities.

New output (on 3.3+ kernel) :

disc gred 9442: root refcnt 17
 DP:0 (prio 1) Average Queue 0b Measured Queue 0b
	 Packet drops: 0 (forced 0 early 0)
	 Packet totals: 20 (bytes 2584)
 limit 31460b min 3000b max 9000b ewma 5 probability 0.05 Scell_log 15

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:24 -08:00
Eric Dumazet
650252d8c3 choke: support TCA_CHOKE_MAX_P
TCA_CHOKE_MAX_P permits to express high resolution RED probability.

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec choke \
	limit 90 ecn min 10 max 30 probability 0.05 bandwidth 10Mbit

Before patch :

tc -s -d qdisc show dev eth3
qdisc ... limit 90p min 10p max 30p ecn ewma 3 Plog 19 Scell_log 13

After :

qdisc ... limit 90p min 10p max 30p ecn ewma 3 probability 0.05
Scell_log 13

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:23 -08:00
Eric Dumazet
6987ecf083 sfq: add optional RED on top of SFQ
Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.

1) We dont handle the idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) if headdrop is selected, we try to ecn mark first packet instead of
currently enqueued packet. This gives faster feedback for tcp flows
compared to traditional RED [ marking the last packet in queue ]

Example of use :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
	limit 3000 headdrop flows 512 divisor 16384 \
	redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 ewma 6 min 8000b max 60000b probability 0.2 ecn
 prob_mark 0 prob_mark_head 4876 prob_drop 6131
 forced_mark 0 forced_mark_head 0 forced_drop 0
 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
requeues 0)
 rate 99483Kbit 8219pps backlog 689392b 456p requeues 0

In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
flows, we can see number of packets CE marked is smaller than number of
drops (for non ECN flows)

If same test is run, without RED, we can check backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
 rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:22 -08:00
Eric Dumazet
54a2fce832 red: fix adaptive spelling
Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2012-01-20 08:12:21 -08:00
Eric Dumazet
e7e4abea3e red: Add adaptative algo Logged in as shemminger
Enable Adaptative RED algo, using :

tc qdisc  ... red limit BYTES ... adaptative ...

Support of high precision probability/max_p setting and reporting, with
support of old kernels.

With a new kernel, "Plog ..." is replaced in tc output by "probability
value" :

qdisc red 10: dev eth3 parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma
5 probability 0.09 Scell_log 15
2012-01-19 14:45:20 -08:00
Hagen Paul Pfeifer
6b8dc4deea tc: netem rate shaping and cell extension
This patch add rate shaping as well as cell support. The link-rate can be
specified via rate options. Three optional arguments control the cell
knobs: packet-overhead, cell-size, cell-overhead. To ratelimit eth0 root
queue to 5kbit/s, with a 20 byte packet overhead, 100 byte cell size and
a 5 byte per cell overhead:

	tc qdisc add dev eth0 root netem rate 5kbit 20 100 5

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
2012-01-19 14:28:27 -08:00
Jan Engelhardt
8e91a80d97 iproute2: fix calling up the xt action
Upsteam: has not been sent yet

Requesting the xt action never succeeded because it registered
using the wrong name.
2012-01-03 15:07:38 -08:00
Jan Engelhardt
d7aa57d450 iproute2: proper detection of libxtables position and flags
Upstream: not sent yet

Any tests involving iptables _MUST_ utilize pkg-config to find the
proper locations of the installation.
2012-01-03 15:05:25 -08:00
Stephen Hemminger
155ad8023b ematch: fix warning about unused input()
Use existing compile flag to indicate that input() is not used
by tc ematch, fixes compiler warning.
2012-01-03 13:55:59 -08:00
Stephen Hemminger
5761f04fb8 ematch: fix warning about yyerror and const
yyerror() should take const char * on current bison.
2012-01-03 13:55:00 -08:00
Stephen Hemminger
cd70f3f522 libnetlink: remove unused junk callback
Both rtnl_talk and rtnl_dump had a callback for handling portions
of netlink message that do not match the correct pid or seq.
But this callback was never used by any part of iproute2 so remove
it.
2011-12-28 10:37:12 -08:00
Eric Dumazet
d060de7f8d netem: fix a typo in explain()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-24 11:21:33 -08:00
Stephen Hemminger
3c7950af59 netem: add support for 4 state and GE loss model
Incorporate support for new loss models.
2011-12-22 17:08:11 -08:00
Eric Dumazet
841fc7bc98 red: harddrop support and cleanups
Add harddrop support (kernel support added a long time ago), and various
cleanups.

min BYTES, max BYTES are now optional and follow Sally Floyd's
recommendations.

By the way, our default 2% probability is a bit low, Sally recommends 10%.
Not a big deal if upcoming adaptative algo is deployed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-08 16:43:18 -08:00
Eric Dumazet
ab15aeacf5 red: make burst optional
Documentation advises to set burst to (min+min+max)/(3*avpkt)

Let tc do this automatically if user doesnt provide burst himself.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-01 09:23:49 -08:00
Eric Dumazet
0cf67ead7b red: give a hint about burst value
Check for burst values that are too small.

Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-12-01 09:23:43 -08:00
Thomas Jarosch
fcbd0165fc tc: Use correct variable type for get_distribution() result
get_distribution() returns an int.

cppcheck reported:
[tc/q_netem.c:243]: (style) Checking if unsigned variable 'dist_size' is less than zero.

The mismatch actually rendered the error checking
after get_distribution() ineffective.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-11-23 14:46:24 -08:00
Thomas Jarosch
a3da01c519 tc: Remove unused variable 'res'.
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-11-23 14:46:21 -08:00
Stephen Hemminger
93ba481acb cleanup ematch yacc files
make clean needs to remove all the yacc output files for ematch.
2011-11-02 16:39:36 -07:00
Michal Soltys
41f6004139 HFSC (7) & (8) documentation + assorted changes
This patch adds detailed documentation for HFSC scheduler. It roughly
follows HFSC paper, but tries to not rely too much on math side of things.
Post-paper/Linux specific subjects (timer resolution, ul service curve, etc.)
are also discussed.

I've read it many times over, but it's a lengthy chunk of text - so try
to be understanding in case I made some mistakes.

tc-hfsc(7): explains algorithm in detail (very long)
tc-hfsc(8): explains command line options briefly
tc(8): adds references to new man pages
Makefile: adds man7 directory to install target
q_hfsc.c: minimal help text changes, consistency with tc-hfsc(8)

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-11-02 16:33:50 -07:00
Mike Frysinger
aa48b5931a tc: fix parallel build file with lex/yacc
Building iproute2 in parallel might hit the race failure:
	emp_ematch.l:2:30: fatal error: emp_ematch.yacc.h:
		No such file or directory
	make[1]: *** [emp_ematch.lex.o] Error 1

This is because we currently allow the yacc/lex files to generate and
compile in parallel.  So add a simple dependency to make sure yacc has
finished before we attempt to compile the lex output.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-10-18 15:02:21 -07:00
Thomas Jarosch
1a6543c56b Fix memory leak of lname variable in get_target_name()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:17:10 -07:00
Thomas Jarosch
9f1ba57016 Fix wrong sanity check in choke_parse_opt()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:17:03 -07:00
Thomas Jarosch
6d5ee98a7c Fix wrong comparison in cmp_print_eopt()
Detected by cppcheck.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
2011-10-07 11:16:15 -07:00
Dan McGee
4f3626f920 xt: only unset fields if m is non NULL 2011-08-31 12:18:49 -07:00
Florian Westphal
05fb9184f2 tc: filter: fix default 'protocol all' on little-endian platforms
when specifiying filters without 'protocol' keyword, tc will
default to 'protocol all'.

Unfortunately, this missed a byte-ordering conversion.
2011-08-31 10:55:13 -07:00
Stephen Hemminger
c441bd4c1b Add QFQ scheduler
Basic configuration support for QFQ.
Still need to add manual page.
2011-07-13 13:46:34 -07:00
Stephen Hemminger
be181323c1 Remove redundant limits.h
redo.
2011-07-13 09:49:17 -07:00