// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright 2013 Google Inc.
 * Author: Willem de Bruijn (willemb@google.com)
 *
 * A basic test of packet socket fanout behavior.
 *
 * Control:
 * - create fanout fails as expected with illegal flag combinations
 * - join fanout fails as expected with diverging types or flags
 *
 * Datapath:
 * Open a pair of packet sockets and a pair of INET sockets, send a known
 * number of packets across the two INET sockets and count the number of
 * packets enqueued onto the two packet sockets.
 *
 * The test currently runs for
 * - PACKET_FANOUT_HASH
 * - PACKET_FANOUT_HASH with PACKET_FANOUT_FLAG_ROLLOVER
 * - PACKET_FANOUT_LB
 * - PACKET_FANOUT_CPU
 * - PACKET_FANOUT_ROLLOVER
 * - PACKET_FANOUT_CBPF
 * - PACKET_FANOUT_EBPF
 *
 * Todo:
 * - functionality: PACKET_FANOUT_FLAG_DEFRAG
 */

#define _GNU_SOURCE		/* for sched_setaffinity */

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/unistd.h>	/* for __NR_bpf */
#include <linux/filter.h>
#include <linux/bpf.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <poll.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include "psock_lib.h"

#define RING_NUM_FRAMES 20
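
/* When nonzero, join fanout groups via struct fanout_args, passing this
 * value as the requested maximum number of group members.
 */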
static uint32_t cfg_max_num_members;

/* Open a socket in a given fanout mode.
 * @return -1 if mode is bad, a valid socket otherwise */
static int sock_fanout_open(uint16_t typeflags, uint16_t group_id)
{
	struct sockaddr_ll addr = {0};
	struct fanout_args args;
	int fd, val, err;

	fd = socket(PF_PACKET, SOCK_RAW, 0);
	if (fd < 0) {
		perror("socket packet");
		exit(1);
	}
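
	/* Install the test filter while the socket is not yet bound to a
	 * protocol, so that unrelated background traffic is never queued.
	 */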
	pair_udp_setfilter(fd);

	addr.sll_family = AF_PACKET;
	addr.sll_protocol = htons(ETH_P_IP);
	addr.sll_ifindex = if_nametoindex("lo");
	if (addr.sll_ifindex == 0) {
		perror("if_nametoindex");
		exit(1);
	}
	if (bind(fd, (void *) &addr, sizeof(addr))) {
		perror("bind packet");
		exit(1);
	}
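
	/* Join the fanout group: via struct fanout_args when a maximum member
	 * count is configured, via the legacy integer optval otherwise.
	 */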
	if (cfg_max_num_members) {
		args.id = group_id;
		args.type_flags = typeflags;
		args.max_num_members = cfg_max_num_members;
		err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &args,
				 sizeof(args));
	} else {
		val = (((int) typeflags) << 16) | group_id;
		err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val,
				 sizeof(val));
	}
	if (err) {
		if (close(fd)) {
			perror("close packet");
			exit(1);
		}
		return -1;
	}

	return fd;
}
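
/* Attach a classic BPF program, as used with PACKET_FANOUT_CBPF: the
 * program's return value (here, the packet byte at offset 80) selects
 * the fanout group member.
 */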
static void sock_fanout_set_cbpf(int fd)
{
	struct sock_filter bpf_filter[] = {
		BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 80),	  /* ldb [80] */
		BPF_STMT(BPF_RET+BPF_A, 0),		  /* ret A */
	};
	struct sock_fprog bpf_prog;

	bpf_prog.filter = bpf_filter;
	bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);

	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT_DATA, &bpf_prog,
		       sizeof(bpf_prog))) {
		perror("fanout data cbpf");
		exit(1);
	}
}
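
/* Read back the fanout group id and type/flags of an existing member. */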
static void sock_fanout_getopts(int fd, uint16_t *typeflags, uint16_t *group_id)
{
	int sockopt;
	socklen_t sockopt_len = sizeof(sockopt);

	if (getsockopt(fd, SOL_PACKET, PACKET_FANOUT,
		       &sockopt, &sockopt_len)) {
		perror("failed to getsockopt");
		exit(1);
	}
	*typeflags = sockopt >> 16;
	*group_id = sockopt & 0xffff;
}
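
/* Load and attach an extended BPF program, as used with PACKET_FANOUT_EBPF:
 * its return value, derived from the packet length and the payload byte at
 * offset 0x50, selects the fanout group member.
 */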
static void sock_fanout_set_ebpf(int fd)
{
	static char log_buf[65536];

	const int len_off = __builtin_offsetof(struct __sk_buff, len);
	struct bpf_insn prog[] = {
		{ BPF_ALU64 | BPF_MOV | BPF_X,   6, 1, 0, 0 },
		{ BPF_LDX   | BPF_W   | BPF_MEM, 0, 6, len_off, 0 },
		{ BPF_JMP   | BPF_JGE | BPF_K,   0, 0, 1, DATA_LEN },
		{ BPF_JMP   | BPF_JA  | BPF_K,   0, 0, 4, 0 },
		{ BPF_LD    | BPF_B   | BPF_ABS, 0, 0, 0, 0x50 },
		{ BPF_JMP   | BPF_JEQ | BPF_K,   0, 0, 2, DATA_CHAR },
		{ BPF_JMP   | BPF_JEQ | BPF_K,   0, 0, 1, DATA_CHAR_1 },
		{ BPF_ALU   | BPF_MOV | BPF_K,   0, 0, 0, 0 },
		{ BPF_JMP   | BPF_EXIT,          0, 0, 0, 0 }
	};
	union bpf_attr attr;
	int pfd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (unsigned long) prog;
	attr.insn_cnt = sizeof(prog) / sizeof(prog[0]);
	attr.license = (unsigned long) "GPL";
	attr.log_buf = (unsigned long) log_buf;
	attr.log_size = sizeof(log_buf);
	attr.log_level = 1;

	pfd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	if (pfd < 0) {
		perror("bpf");
		fprintf(stderr, "bpf verifier:\n%s\n", log_buf);
		exit(1);
	}

	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT_DATA, &pfd, sizeof(pfd))) {
		perror("fanout data ebpf");
		exit(1);
	}

	if (close(pfd)) {
		perror("close ebpf");
		exit(1);
	}
}
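
/* Attach a TPACKET_V2 receive ring to the socket and mmap it. */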
static char *sock_fanout_open_ring(int fd)
{
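	/* One frame per page: block and frame counts both equal
	 * RING_NUM_FRAMES, so sock_fanout_read_ring() can step through the
	 * ring in page-sized increments.
	 */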
	struct tpacket_req req = {
		.tp_block_size = getpagesize(),
		.tp_frame_size = getpagesize(),
		.tp_block_nr   = RING_NUM_FRAMES,
		.tp_frame_nr   = RING_NUM_FRAMES,
	};
	char *ring;
	int val = TPACKET_V2;

	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,
		       sizeof(val))) {
		perror("packetsock ring setsockopt version");
		exit(1);
	}
	if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req,
		       sizeof(req))) {
		perror("packetsock ring setsockopt");
		exit(1);
	}

	ring = mmap(0, req.tp_block_size * req.tp_block_nr,
		    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ring == MAP_FAILED) {
		perror("packetsock ring mmap");
		exit(1);
	}

	return ring;
}
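
/* Count the ring frames that the kernel has released to user space. */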
static int sock_fanout_read_ring(int fd, void *ring)
{
	struct tpacket2_hdr *header = ring;
	int count = 0;

	while (count < RING_NUM_FRAMES && header->tp_status & TP_STATUS_USER) {
		count++;
		header = ring + (count * getpagesize());
	}

	return count;
}
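
/* Compare the number of frames received on the two rings against the
 * expected counts, accepting a match in either order.
 */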
static int sock_fanout_read(int fds[], char *rings[], const int expect[])
{
	int ret[2];

	ret[0] = sock_fanout_read_ring(fds[0], rings[0]);
	ret[1] = sock_fanout_read_ring(fds[1], rings[1]);

	fprintf(stderr, "info: count=%d,%d, expect=%d,%d\n",
		ret[0], ret[1], expect[0], expect[1]);

	if ((!(ret[0] == expect[0] && ret[1] == expect[1])) &&
	    (!(ret[0] == expect[1] && ret[1] == expect[0]))) {
		fprintf(stderr, "warning: incorrect queue lengths\n");
		return 1;
|
packet: packet fanout rollover during socket overload
Changes:
v3->v2: rebase (no other changes)
passes selftest
v2->v1: read f->num_members only once
fix bug: test rollover mode + flag
Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.
The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.
Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.
To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.
Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).
Also, passes psock_fanout.c unit test added to selftests.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 18:18:11 +08:00
|
|
|
}
|
2013-03-20 04:42:44 +08:00
|
|
|
|
|
|
|
return 0;
|
packet: packet fanout rollover during socket overload
Changes:
v3->v2: rebase (no other changes)
passes selftest
v2->v1: read f->num_members only once
fix bug: test rollover mode + flag
Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.
The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.
Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.
To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.
Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).
Also, passes psock_fanout.c unit test added to selftests.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 18:18:11 +08:00
|
|
|
}
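
/*
 * The control and datapath tests below all go through the sock_fanout_open()
 * helper defined earlier in this file.  As a rough sketch (not the helper
 * itself): joining a fanout group boils down to a single setsockopt() call on
 * a packet socket, with the group id in the low 16 bits and the mode plus
 * flags in the high 16 bits:
 *
 *	int val = (typeflags << 16) | group_id;
 *
 *	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val)))
 *		perror("setsockopt packet_fanout");
 */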
/* Test illegal mode + flag combination */
static void test_control_single(void)
{
	fprintf(stderr, "test: control single socket\n");

	if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
			     PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
		fprintf(stderr, "ERROR: opened socket with dual rollover\n");
		exit(1);
	}
}
/* Test illegal group with different modes or flags */
static void test_control_group(void)
{
	int fds[2];

	fprintf(stderr, "test: control multiple sockets\n");

	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
	if (fds[0] == -1) {
		fprintf(stderr, "ERROR: failed to open HASH socket\n");
		exit(1);
	}
	if (sock_fanout_open(PACKET_FANOUT_HASH |
			     PACKET_FANOUT_FLAG_DEFRAG, 0) != -1) {
		fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
		exit(1);
	}
	if (sock_fanout_open(PACKET_FANOUT_HASH |
			     PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
		fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
		exit(1);
	}
	if (sock_fanout_open(PACKET_FANOUT_CPU, 0) != -1) {
		fprintf(stderr, "ERROR: joined group with wrong mode\n");
		exit(1);
	}
	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
	if (fds[1] == -1) {
		fprintf(stderr, "ERROR: failed to join group\n");
		exit(1);
	}
	if (close(fds[1]) || close(fds[0])) {
		fprintf(stderr, "ERROR: closing sockets\n");
		exit(1);
	}
}
/* Test illegal max_num_members values */
static void test_control_group_max_num_members(void)
{
	int fds[3];

	fprintf(stderr, "test: control multiple sockets, max_num_members\n");

	/* expected failure on greater than PACKET_FANOUT_MAX */
	cfg_max_num_members = (1 << 16) + 1;
	if (sock_fanout_open(PACKET_FANOUT_HASH, 0) != -1) {
		fprintf(stderr, "ERROR: max_num_members > PACKET_FANOUT_MAX\n");
		exit(1);
	}

	cfg_max_num_members = 256;
	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
	if (fds[0] == -1) {
		fprintf(stderr, "ERROR: failed open\n");
		exit(1);
	}

	/* expected failure on joining group with different max_num_members */
	cfg_max_num_members = 257;
	if (sock_fanout_open(PACKET_FANOUT_HASH, 0) != -1) {
		fprintf(stderr, "ERROR: set different max_num_members\n");
		exit(1);
	}

	/* success on joining group with same max_num_members */
	cfg_max_num_members = 256;
	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
	if (fds[1] == -1) {
		fprintf(stderr, "ERROR: failed to join group\n");
		exit(1);
	}

	/* success on joining group with max_num_members unspecified */
	cfg_max_num_members = 0;
	fds[2] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
	if (fds[2] == -1) {
		fprintf(stderr, "ERROR: failed to join group\n");
		exit(1);
	}

	if (close(fds[2]) || close(fds[1]) || close(fds[0])) {
		fprintf(stderr, "ERROR: closing sockets\n");
		exit(1);
	}
}
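
/*
 * cfg_max_num_members above is consumed by sock_fanout_open().  A sketch of
 * how a group size limit might reach the kernel (assuming the extended
 * PACKET_FANOUT argument, struct fanout_args from linux/if_packet.h; the
 * helper in this file presumably does something equivalent):
 *
 *	struct fanout_args args = {
 *		.id		 = group_id,
 *		.type_flags	 = typeflags,
 *		.max_num_members = max,
 *	};
 *
 *	setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &args, sizeof(args));
 */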
/* Test creating unique fanout group ids */
static void test_unique_fanout_group_ids(void)
{
	int fds[3];
	uint16_t typeflags, first_group_id, second_group_id;

	fprintf(stderr, "test: unique ids\n");

	fds[0] = sock_fanout_open(PACKET_FANOUT_HASH |
				  PACKET_FANOUT_FLAG_UNIQUEID, 0);
	if (fds[0] == -1) {
		fprintf(stderr, "ERROR: failed to create a unique id group.\n");
		exit(1);
	}

	sock_fanout_getopts(fds[0], &typeflags, &first_group_id);
	if (typeflags != PACKET_FANOUT_HASH) {
		fprintf(stderr, "ERROR: unexpected typeflags %x\n", typeflags);
		exit(1);
	}

	if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id) != -1) {
		fprintf(stderr, "ERROR: joined group with wrong type.\n");
		exit(1);
	}

	fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, first_group_id);
	if (fds[1] == -1) {
		fprintf(stderr,
			"ERROR: failed to join previously created group.\n");
		exit(1);
	}

	fds[2] = sock_fanout_open(PACKET_FANOUT_HASH |
				  PACKET_FANOUT_FLAG_UNIQUEID, 0);
	if (fds[2] == -1) {
		fprintf(stderr,
			"ERROR: failed to create a second unique id group.\n");
		exit(1);
	}

	sock_fanout_getopts(fds[2], &typeflags, &second_group_id);
	if (sock_fanout_open(PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_UNIQUEID,
			     second_group_id) != -1) {
		fprintf(stderr,
			"ERROR: specified a group id when requesting unique id\n");
		exit(1);
	}

	if (close(fds[0]) || close(fds[1]) || close(fds[2])) {
		fprintf(stderr, "ERROR: closing sockets\n");
		exit(1);
	}
}
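
/*
 * Summary of the UNIQUEID semantics exercised above: with
 * PACKET_FANOUT_FLAG_UNIQUEID and group id 0, the kernel allocates an unused
 * id, which sock_fanout_getopts() reads back (presumably via
 * getsockopt(SOL_PACKET, PACKET_FANOUT)).  That id can then be joined as a
 * normal group, but combining the flag with an explicit non-zero id must
 * fail.
 */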
static int test_datapath(uint16_t typeflags, int port_off,
			 const int expect1[], const int expect2[])
{
	const int expect0[] = { 0, 0 };
	char *rings[2];
	uint8_t type = typeflags & 0xFF;
	int fds[2], fds_udp[2][2], ret;

	fprintf(stderr, "\ntest: datapath 0x%hx ports %hu,%hu\n",
		typeflags, (uint16_t)PORT_BASE,
		(uint16_t)(PORT_BASE + port_off));

	fds[0] = sock_fanout_open(typeflags, 0);
	fds[1] = sock_fanout_open(typeflags, 0);
	if (fds[0] == -1 || fds[1] == -1) {
		fprintf(stderr, "ERROR: failed open\n");
		exit(1);
	}
	if (type == PACKET_FANOUT_CBPF)
		sock_fanout_set_cbpf(fds[0]);
	else if (type == PACKET_FANOUT_EBPF)
		sock_fanout_set_ebpf(fds[0]);

	rings[0] = sock_fanout_open_ring(fds[0]);
	rings[1] = sock_fanout_open_ring(fds[1]);
	pair_udp_open(fds_udp[0], PORT_BASE);
	pair_udp_open(fds_udp[1], PORT_BASE + port_off);
	sock_fanout_read(fds, rings, expect0);

	/* Send data, but not enough to overflow a queue */
	pair_udp_send(fds_udp[0], 15);
	pair_udp_send_char(fds_udp[1], 5, DATA_CHAR_1);
	ret = sock_fanout_read(fds, rings, expect1);

	/* Send more data, overflow the queue */
	pair_udp_send_char(fds_udp[0], 15, DATA_CHAR_1);
	/* TODO: ensure consistent order between expect1 and expect2 */
	ret |= sock_fanout_read(fds, rings, expect2);

	if (munmap(rings[1], RING_NUM_FRAMES * getpagesize()) ||
	    munmap(rings[0], RING_NUM_FRAMES * getpagesize())) {
		fprintf(stderr, "close rings\n");
		exit(1);
	}
	if (close(fds_udp[1][1]) || close(fds_udp[1][0]) ||
	    close(fds_udp[0][1]) || close(fds_udp[0][0]) ||
	    close(fds[1]) || close(fds[0])) {
		fprintf(stderr, "close datapath\n");
		exit(1);
	}

	return ret;
}
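
/*
 * test_datapath() reads the rings three times: once before any traffic (both
 * counts must be zero), once after a small burst of 15 + 5 packets split
 * across two UDP flows, and once more after sending enough extra packets to
 * overflow one ring.  The return value ORs the results of the last two
 * reads, so non-zero means at least one distribution did not match the
 * caller's expectations.
 */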
static int set_cpuaffinity(int cpuid)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpuid, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask)) {
		if (errno != EINVAL) {
			fprintf(stderr, "setaffinity %d\n", cpuid);
			exit(1);
		}
		return 1;
	}

	return 0;
}
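
/*
 * set_cpuaffinity() exits on real sched_setaffinity() failures, but returns 1
 * when the requested CPU does not exist (EINVAL), so single-CPU machines
 * simply skip the second PACKET_FANOUT_CPU run in main() below.
 */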
int main(int argc, char **argv)
{
	const int expect_hash[2][2]	= { { 15, 5 },  { 20, 5 } };
	const int expect_hash_rb[2][2]	= { { 15, 5 },  { 20, 15 } };
	const int expect_lb[2][2]	= { { 10, 10 }, { 18, 17 } };
	const int expect_rb[2][2]	= { { 15, 5 },  { 20, 15 } };
	const int expect_cpu0[2][2]	= { { 20, 0 },  { 20, 0 } };
	const int expect_cpu1[2][2]	= { { 0, 20 },  { 0, 20 } };
	const int expect_bpf[2][2]	= { { 15, 5 },  { 15, 20 } };
	const int expect_uniqueid[2][2] = { { 20, 20 }, { 20, 20 } };
	int port_off = 2, tries = 20, ret;
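
	/*
	 * Each expect_* table holds the per-socket packet counts that
	 * test_datapath() should observe: row 0 after the initial 15 + 5
	 * packet burst, row 1 after the queue-overflow burst.  For example,
	 * plain FANOUT_HASH keeps the two flows separated ({ 20, 5 } after
	 * overflow), while adding FLAG_ROLLOVER lets the overflow spill onto
	 * the other socket ({ 20, 15 }).
	 */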

	test_control_single();
	test_control_group();
	test_control_group_max_num_members();
	test_unique_fanout_group_ids();

	/* PACKET_FANOUT_MAX */
	cfg_max_num_members = 1 << 16;
	/* find a set of ports that do not collide onto the same socket */
	ret = test_datapath(PACKET_FANOUT_HASH, port_off,
			    expect_hash[0], expect_hash[1]);
	while (ret) {
		fprintf(stderr, "info: trying alternate ports (%d)\n", tries);
		ret = test_datapath(PACKET_FANOUT_HASH, ++port_off,
				    expect_hash[0], expect_hash[1]);
		if (!--tries) {
			fprintf(stderr, "too many collisions\n");
			return 1;
		}
	}
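
	/*
	 * FANOUT_HASH includes the kernel's hidden hashrnd seed, so userspace
	 * cannot predict which socket a flow lands on.  The loop above keeps
	 * probing alternate source ports (up to 20 tries) until the two UDP
	 * flows hash to different sockets; the remaining policies are then
	 * tested with that non-colliding port pair.
	 */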

	ret |= test_datapath(PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_ROLLOVER,
			     port_off, expect_hash_rb[0], expect_hash_rb[1]);
	ret |= test_datapath(PACKET_FANOUT_LB,
			     port_off, expect_lb[0], expect_lb[1]);
	ret |= test_datapath(PACKET_FANOUT_ROLLOVER,
			     port_off, expect_rb[0], expect_rb[1]);

	ret |= test_datapath(PACKET_FANOUT_CBPF,
			     port_off, expect_bpf[0], expect_bpf[1]);
	ret |= test_datapath(PACKET_FANOUT_EBPF,
			     port_off, expect_bpf[0], expect_bpf[1]);

	set_cpuaffinity(0);
	ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
			     expect_cpu0[0], expect_cpu0[1]);
	if (!set_cpuaffinity(1))
		/* TODO: test that choice alternates with previous */
		ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
				     expect_cpu1[0], expect_cpu1[1]);

	ret |= test_datapath(PACKET_FANOUT_FLAG_UNIQUEID, port_off,
			     expect_uniqueid[0], expect_uniqueid[1]);

	if (ret)
		return 1;

	printf("OK. All tests passed\n");
	return 0;
}