2006-01-03 02:04:38 +08:00
/*
 * net/tipc/bearer.c: TIPC bearer code
2007-02-09 22:25:21 +08:00
 *
tipc: add neighbor monitoring framework
TIPC based clusters are by default set up with full-mesh link
connectivity between all nodes. Those links are expected to provide
a short failure detection time, by default set to 1500 ms. Because
of this, the background load for neighbor monitoring in an N-node
cluster increases with a factor N on each node, while the overall
monitoring traffic through the network infrastructure increases at
a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
scale well beyond ~100 nodes unless we significantly increase failure
discovery tolerance.
This commit introduces a framework and an algorithm that drastically
reduces this background load, while basically maintaining the original
failure detection times across the whole cluster. Using this algorithm,
background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
now have to actively monitor 38 neighbors in a 400-node cluster, instead
of as before 399.
This "Overlapping Ring Supervision Algorithm" is completely distributed
and employs no centralized or coordinated state. It goes as follows:
- Each node makes up a linearly ascending, circular list of all its N
known neighbors, based on their TIPC node identity. This algorithm
must be the same on all nodes.
- The node then selects the next M = sqrt(N) - 1 nodes downstream from
itself in the list, and chooses to actively monitor those. This is
called its "local monitoring domain".
- It creates a domain record describing the monitoring domain, and
piggy-backs this in the data area of all neighbor monitoring messages
(LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
the cluster eventually (default within 400 ms) will learn about
its monitoring domain.
- Whenever a node discovers a change in its local domain, e.g., a node
has been added or has gone down, it creates and sends out a new
version of its node record to inform all neighbors about the change.
- A node receiving a domain record from anybody outside its local domain
matches this against its own list (which may not look the same), and
chooses to not actively monitor those members of the received domain
record that are also present in its own list. Instead, it relies on
indications from the direct monitoring nodes if an indirectly
monitored node has gone up or down. If a node is indicated lost, the
receiving node temporarily activates its own direct monitoring towards
that node in order to confirm, or not, that it is actually gone.
- Since each node is actively monitoring sqrt(N) downstream neighbors,
each node is also actively monitored by the same number of upstream
neighbors. This means that all non-direct monitoring nodes normally
will receive sqrt(N) indications that a node is gone.
- A major drawback with ring monitoring is how it handles failures that
cause massive network partitionings. If both a lost node and all its
direct monitoring neighbors are inside the lost partition, the nodes in
the remaining partition will never receive indications about the loss.
To overcome this, each node also chooses to actively monitor some
nodes outside its local domain. Those nodes are called remote domain
"heads", and are selected in such a way that no node in the cluster
will be more than two direct monitoring hops away. Because of this,
each node, apart from monitoring the members of its local domain, will
also typically monitor sqrt(N) remote head nodes.
- As an optimization, local list status, domain status and domain
records are marked with a generation number. This saves senders from
unnecessarily conveying unaltered domain records, and receivers from
performing unneeded re-adaptations of their node monitoring list, such
as re-assigning domain heads.
- As a measure of caution we have added the possibility to disable the
new algorithm through configuration. We do this by keeping a threshold
value for the cluster size; a cluster that grows beyond this value
will switch from full-mesh to ring monitoring, and vice versa when
it shrinks below the value. This means that if the threshold is set to
a value larger than any anticipated cluster size (default size is 32)
the new algorithm is effectively disabled. A patch set for altering the
threshold value and for listing the table contents will follow shortly.
- This change is fully backwards compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
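The sizing rule and the "next M nodes downstream" selection described in the commit message above can be tried out with a small userspace sketch. This is not the kernel's monitoring code: `isqrt`, `ring_local_domain_size`, and `ring_select_local_domain` are hypothetical names, N is assumed to count the local node as well, and rounding sqrt(N) down to an integer is an assumption the commit message does not spell out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Integer square root, rounded down (assumed rounding; not from the commit). */
static unsigned int isqrt(unsigned int n)
{
	unsigned int r = 0;

	/* Division form avoids overflow of (r + 1) * (r + 1). */
	while (r + 1 <= n / (r + 1))
		r++;
	return r;
}

/* M = sqrt(N) - 1: number of downstream peers in the local domain. */
static unsigned int ring_local_domain_size(unsigned int n_nodes)
{
	unsigned int root = isqrt(n_nodes);

	return root ? root - 1 : 0;
}

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the n known node ids (self included) in ascending order, treat the
 * array as a ring, and copy the M ids that follow 'self' into 'out'.
 * 'ids' is sorted in place; 'out' must have room for M entries. Returns M.
 */
static unsigned int ring_select_local_domain(uint32_t *ids, unsigned int n,
					     uint32_t self, uint32_t *out)
{
	unsigned int m = ring_local_domain_size(n);
	unsigned int pos = 0, i;

	if (m == 0)
		return 0;
	qsort(ids, n, sizeof(*ids), cmp_u32);
	while (pos < n && ids[pos] != self)
		pos++;
	assert(pos < n);	/* 'self' must be present in the list */
	for (i = 0; i < m; i++)
		out[i] = ids[(pos + 1 + i) % n];
	return m;
}
```

With these assumptions a 400-node cluster yields a local domain of 19 peers. The commit message's figure of 38 monitored neighbors is consistent with roughly as many remote domain heads on top of that.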
2016-06-14 08:46:22 +08:00
 * Copyright (c) 1996-2006, 2013-2016, Ericsson AB
2013-12-11 12:45:43 +08:00
 * Copyright (c) 2004-2006, 2010-2013, Wind River Systems
2006-01-03 02:04:38 +08:00
 * All rights reserved.
 *
2006-01-11 20:30:43 +08:00
 * Redistribution and use in source and binary forms, with or without
2006-01-03 02:04:38 +08:00
 * modification, are permitted provided that the following conditions are met:
 *
2006-01-11 20:30:43 +08:00
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
2006-01-03 02:04:38 +08:00
 *
2006-01-11 20:30:43 +08:00
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
2006-01-03 02:04:38 +08:00
 * POSSIBILITY OF SUCH DAMAGE.
 */

2015-01-09 15:27:06 +08:00
#include <net/sock.h>
2006-01-03 02:04:38 +08:00
#include "core.h"
#include "bearer.h"
2014-11-20 17:29:07 +08:00
#include "link.h"
2006-01-03 02:04:38 +08:00
#include "discover.h"
2016-06-14 08:46:22 +08:00
#include "monitor.h"
2015-01-09 15:27:07 +08:00
#include "bcast.h"
2016-03-05 00:04:42 +08:00
#include "netlink.h"
2016-08-26 16:52:53 +08:00
#include "udp_media.h"
2018-12-19 10:18:00 +08:00
#include "trace.h"
2019-11-08 13:05:11 +08:00
#include "crypto.h"
2006-01-03 02:04:38 +08:00

2013-04-17 14:18:28 +08:00
#define MAX_ADDR_STR 60
2006-01-03 02:04:38 +08:00

2013-12-11 12:45:40 +08:00
static struct tipc_media * const media_info_array[] = {
2013-12-11 12:45:39 +08:00
	&eth_media_info,
#ifdef CONFIG_TIPC_MEDIA_IB
	&ib_media_info,
2015-03-05 17:23:49 +08:00
#endif
#ifdef CONFIG_TIPC_MEDIA_UDP
	&udp_media_info,
2013-12-11 12:45:39 +08:00
#endif
	NULL
};
2006-01-03 02:04:38 +08:00

2016-08-16 23:53:50 +08:00
static struct tipc_bearer *bearer_get(struct net *net, int bearer_id)
{
	struct tipc_net *tn = tipc_net(net);

2019-07-02 00:54:55 +08:00
	return rcu_dereference(tn->bearer_list[bearer_id]);
2016-08-16 23:53:50 +08:00
}

2015-11-20 03:30:47 +08:00
static void bearer_disable(struct net *net, struct tipc_bearer *b);
2017-08-28 23:57:02 +08:00
static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
			   struct packet_type *pt, struct net_device *orig_dev);
2011-04-22 02:58:26 +08:00

2006-01-03 02:04:38 +08:00
/**
2011-10-18 23:34:29 +08:00
 * tipc_media_find - locates specified media object by name
2020-11-30 02:32:42 +08:00
 * @name: name to locate
2006-01-03 02:04:38 +08:00
 */
2011-12-30 09:19:42 +08:00
struct tipc_media *tipc_media_find(const char *name)
2006-01-03 02:04:38 +08:00
{
	u32 i;

2013-12-11 12:45:40 +08:00
	for (i = 0; media_info_array[i] != NULL; i++) {
		if (!strcmp(media_info_array[i]->name, name))
2013-12-11 12:45:39 +08:00
			break;
2006-01-03 02:04:38 +08:00
	}
2013-12-11 12:45:40 +08:00
	return media_info_array[i];
2006-01-03 02:04:38 +08:00
}

2011-10-07 04:40:55 +08:00
/**
 * media_find_id - locates specified media object by type identifier
2020-11-30 02:32:42 +08:00
 * @type: type identifier to locate
2011-10-07 04:40:55 +08:00
 */
2011-12-30 09:19:42 +08:00
static struct tipc_media *media_find_id(u8 type)
2011-10-07 04:40:55 +08:00
{
	u32 i;

2013-12-11 12:45:40 +08:00
	for (i = 0; media_info_array[i] != NULL; i++) {
		if (media_info_array[i]->type_id == type)
2013-12-11 12:45:39 +08:00
			break;
2011-10-07 04:40:55 +08:00
	}
2013-12-11 12:45:40 +08:00
	return media_info_array[i];
2006-01-03 02:04:38 +08:00
}

/**
2006-01-18 07:38:21 +08:00
 * tipc_media_addr_printf - record media address in print buffer
2020-11-30 02:32:42 +08:00
 * @buf: output buffer
 * @len: output buffer size remaining
 * @a: input media address
2006-01-03 02:04:38 +08:00
 */
tipc: enable tracepoints in tipc
For debugging and tracing purposes, this commit enables tracepoints in
TIPC along with some general trace_events, as shown below. It also
defines some 'tipc_*_dump()' functions that dump TIPC object data
whenever needed, that is, for general debug purposes and not just for
the trace_events.
The following trace_events are now available:
- trace_tipc_skb_dump(): traces and dumps TIPC msg & skb data,
e.g. message type, user, droppable, skb truesize, cloned skb, etc.
- trace_tipc_list_dump(): traces and dumps any TIPC buffers or
queues, e.g. TIPC link transmq, socket receive queue, etc.
- trace_tipc_sk_dump(): traces and dumps TIPC socket data, e.g.
sk state, sk type, connection type, rmem_alloc, socket queues, etc.
- trace_tipc_link_dump(): traces and dumps TIPC link data, e.g.
link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
- trace_tipc_node_dump(): traces and dumps TIPC node data, e.g.
node state, active links, capabilities, link entries, etc.
How to use:
Put the trace functions at any places where we want to dump TIPC data
or events.
Note:
a) The dump functions generate raw data only, in order to offload the
trace event's processing. A tool or script may be needed to parse the
data, but this should be simple.
b) The trace_tipc_*_dump() events should be reserved for failure cases
(e.g. the retransmission failure case) or for conditions we do not
expect to happen often. We can then consider enabling these events by
default, since they have almost no effect under normal conditions, but
once the rare condition or failure occurs we get the full dumped data
for post-analysis.
For other trace purposes, we can reuse these trace classes as templates
for different events.
c) A trace_event is only effective when we enable it. To enable the
TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
directory in the 'debugfs' file system. Normally, they are located at:
/sys/kernel/debug/tracing/events/tipc/
For example:
To enable the tipc_link_dump event:
echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
To enable all the TIPC trace_events:
echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
To collect the trace data:
cat trace
or
cat trace_pipe > /trace.out &
To disable all the TIPC trace_events:
echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
To clear the trace buffer:
echo > trace
d) As with other trace_events, features such as 'filter' and 'trigger'
are also usable for the TIPC trace_events.
For more details, have a look at:
Documentation/trace/ftrace.txt
MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
Acked-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 10:17:56 +08:00
int tipc_media_addr_printf(char *buf, int len, struct tipc_media_addr *a)
2006-01-03 02:04:38 +08:00
{
2011-10-07 23:31:49 +08:00
	char addr_str[MAX_ADDR_STR];
2015-11-20 03:30:47 +08:00
	struct tipc_media *m;
2012-06-29 12:50:23 +08:00
	int ret;
2006-01-03 02:04:38 +08:00

2015-11-20 03:30:47 +08:00
	m = media_find_id(a->media_id);
2006-01-03 02:04:38 +08:00

2015-11-20 03:30:47 +08:00
	if (m && !m->addr2str(a, addr_str, sizeof(addr_str)))
		ret = scnprintf(buf, len, "%s(%s)", m->name, addr_str);
2011-10-07 23:31:49 +08:00
	else {
		u32 i;
2006-01-03 02:04:38 +08:00

2015-02-09 16:50:19 +08:00
		ret = scnprintf(buf, len, "UNKNOWN(%u)", a->media_id);
2011-10-08 03:19:11 +08:00
		for (i = 0; i < sizeof(a->value); i++)
2018-12-19 10:17:56 +08:00
			ret += scnprintf(buf + ret, len - ret,
					    "-%x", a->value[i]);
2006-01-03 02:04:38 +08:00
	}
2018-12-19 10:17:56 +08:00
	return ret;
2006-01-03 02:04:38 +08:00
}
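The fallback branch of tipc_media_addr_printf() appends to the buffer by adding up scnprintf() return values. A minimal userspace sketch of why that accumulation stays inside the buffer is given here. `scnprintf_sketch` is a hypothetical stand-in that imitates the kernel helper's return convention (characters actually stored, excluding the trailing NUL) on top of `vsnprintf`, and `format_unknown_addr` is an illustrative name, not a TIPC function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for scnprintf(): returns the number of characters
 * actually stored in buf (excluding the trailing NUL), never more than
 * size - 1, and 0 when size is 0.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/* Mirrors the "UNKNOWN(...)" branch for a raw address of value_len bytes. */
static int format_unknown_addr(char *buf, size_t len, unsigned int media_id,
			       const unsigned char *value, size_t value_len)
{
	size_t i;
	int ret;

	assert(value != NULL || value_len == 0);
	ret = scnprintf_sketch(buf, len, "UNKNOWN(%u)", media_id);
	/* ret never exceeds len - 1, so buf + ret and len - ret stay valid. */
	for (i = 0; i < value_len; i++)
		ret += scnprintf_sketch(buf + ret, len - ret, "-%x", value[i]);
	return ret;
}
```

Plain `snprintf` returns the length the output would have had without truncation, so summing its return values could move the write position past the end of the buffer. That difference is the reason the sketch clamps the return value.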

/**
 * bearer_name_validate - validate & (optionally) deconstruct bearer name
2012-07-10 18:55:09 +08:00
 * @name: ptr to bearer name string
 * @name_parts: ptr to area for bearer name components (or NULL if not needed)
2007-02-09 22:25:21 +08:00
 *
2020-11-30 02:32:48 +08:00
 * Return: 1 if bearer name is valid, otherwise 0.
2006-01-03 02:04:38 +08:00
 */
2007-02-09 22:25:21 +08:00
static int bearer_name_validate(const char *name,
2011-12-30 10:39:49 +08:00
				struct tipc_bearer_names *name_parts)
2006-01-03 02:04:38 +08:00
{
	char name_copy[TIPC_MAX_BEARER_NAME];
	char *media_name;
	char *if_name;
	u32 media_len;
	u32 if_len;

	/* copy bearer name & ensure length is OK */
2020-11-12 17:34:42 +08:00
	if (strscpy(name_copy, name, TIPC_MAX_BEARER_NAME) < 0)
2006-01-03 02:04:38 +08:00
		return 0;

	/* ensure all component parts of bearer name are present */
	media_name = name_copy;
2011-01-01 02:59:33 +08:00
	if_name = strchr(media_name, ':');
	if (if_name == NULL)
2006-01-03 02:04:38 +08:00
		return 0;
	*(if_name++) = 0;
	media_len = if_name - media_name;
	if_len = strlen(if_name) + 1;

	/* validate component parts of bearer name */
2007-02-09 22:25:21 +08:00
	if ((media_len <= 1) || (media_len > TIPC_MAX_MEDIA_NAME) ||
2012-08-16 20:09:08 +08:00
	    (if_len <= 1) || (if_len > TIPC_MAX_IF_NAME))
2006-01-03 02:04:38 +08:00
		return 0;

	/* return bearer name components, if necessary */
	if (name_parts) {
		strcpy(name_parts->media_name, media_name);
		strcpy(name_parts->if_name, if_name);
	}
	return 1;
}
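To see which names bearer_name_validate() accepts, the same "media:interface" split can be reproduced in userspace. This is a sketch under stated assumptions: `bearer_name_split` and `struct bearer_names` are hypothetical names, the three size limits are assumed to match the TIPC uapi constants, and a `strlen` check stands in for the kernel-only `strscpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the TIPC uapi limits; each size includes the NUL. */
#define MAX_BEARER_NAME	32
#define MAX_MEDIA_NAME	16
#define MAX_IF_NAME	16

struct bearer_names {
	char media_name[MAX_MEDIA_NAME];
	char if_name[MAX_IF_NAME];
};

/* Returns 1 for a valid "media:interface" name (filling *parts if set), else 0. */
static int bearer_name_split(const char *name, struct bearer_names *parts)
{
	char copy[MAX_BEARER_NAME];
	char *if_name;
	size_t media_len, if_len;

	assert(name != NULL);
	/* Reject names that do not fit, like the strscpy() < 0 check. */
	if (strlen(name) >= sizeof(copy))
		return 0;
	strcpy(copy, name);

	/* Split at the first ':' and terminate the media part there. */
	if_name = strchr(copy, ':');
	if (!if_name)
		return 0;
	*if_name++ = '\0';
	media_len = (size_t)(if_name - copy);	/* counts the NUL */
	if_len = strlen(if_name) + 1;		/* counts the NUL */

	/* A length of 1 means an empty component. */
	if (media_len <= 1 || media_len > MAX_MEDIA_NAME ||
	    if_len <= 1 || if_len > MAX_IF_NAME)
		return 0;

	if (parts) {
		strcpy(parts->media_name, copy);
		strcpy(parts->if_name, if_name);
	}
	return 1;
}
```

Under these assumptions "eth:eth0" splits into "eth" and "eth0", while "eth0", ":eth0", and "eth:" are rejected because a component is missing or empty.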

/**
2011-10-18 23:34:29 +08:00
 * tipc_bearer_find - locates bearer object with matching bearer name
2020-11-30 02:32:42 +08:00
 * @net: the applicable net namespace
 * @name: bearer name to locate
2006-01-03 02:04:38 +08:00
 */
2015-01-09 15:27:06 +08:00
struct tipc_bearer *tipc_bearer_find(struct net *net, const char *name)
2006-01-03 02:04:38 +08:00
{
2023-06-05 22:40:44 +08:00
	struct tipc_net *tn = tipc_net(net);
2015-11-20 03:30:47 +08:00
	struct tipc_bearer *b;
2006-01-03 02:04:38 +08:00
	u32 i;

2014-03-27 12:54:33 +08:00
	for (i = 0; i < MAX_BEARERS; i++) {
2015-11-20 03:30:47 +08:00
		b = rtnl_dereference(tn->bearer_list[i]);
		if (b && (!strcmp(b->name, name)))
			return b;
2006-01-03 02:04:38 +08:00
	}
2006-03-21 14:36:47 +08:00
	return NULL;
2006-01-03 02:04:38 +08:00
}

2016-07-26 14:47:21 +08:00
/* tipc_bearer_get_name - get the bearer name from its id.
 * @net: network namespace
 * @name: a pointer to the buffer where the name will be stored.
 * @bearer_id: the id to get the name from.
 */
int tipc_bearer_get_name(struct net *net, char *name, u32 bearer_id)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_bearer *b;

	if (bearer_id >= MAX_BEARERS)
		return -EINVAL;

	b = rtnl_dereference(tn->bearer_list[bearer_id]);
	if (!b)
		return -EINVAL;

	strcpy(name, b->name);
	return 0;
}

2015-01-09 15:27:06 +08:00
void tipc_bearer_add_dest(struct net *net, u32 bearer_id, u32 dest)
2006-01-03 02:04:38 +08:00
{
2015-11-20 03:30:47 +08:00
	struct tipc_bearer *b;
tipc: decouple the relationship between bearer and link
Currently on both paths of message transmission and reception, the
read lock of tipc_net_lock must be held before bearer is accessed,
while the write lock of tipc_net_lock has to be taken before bearer
is configured. Although it can ensure that bearer is always valid on
the two data paths, link and bearer is closely bound together.
So as the part of effort of removing tipc_net_lock, the locking
policy of bearer protection will be adjusted as below: on the two
data paths, RCU is used, and on the configuration path of bearer,
RTNL lock is applied.
Now RCU just covers the path of message reception. To make it possible
to protect the path of message transmission with RCU, link should not
use its stored bearer pointer to access bearer, but it should use the
bearer identity of its attached bearer as index to get bearer instance
from bearer_list array, which can help us decouple the relationship
between bearer and link. As a result, bearer on the path of message
transmission can be safely protected by RCU when we access bearer_list
array within RCU lock protection.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-21 10:55:46 +08:00

	rcu_read_lock();
2023-06-05 22:40:44 +08:00
	b = bearer_get(net, bearer_id);
2015-11-20 03:30:47 +08:00
	if (b)
2018-03-23 03:42:46 +08:00
		tipc_disc_add_dest(b->disc);
2014-04-21 10:55:46 +08:00
	rcu_read_unlock();
2006-01-03 02:04:38 +08:00
}
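tipc_bearer_add_dest() never keeps a bearer pointer around: it looks the bearer up by id at the moment of use and tolerates an empty slot. The userspace sketch here illustrates only that "index, load, NULL-check" shape, using C11 atomics. It is an analogy, not RCU: it has no read-side critical section and no deferred reclamation, so it says nothing about when a removed object may be freed. All names (`slot_table`, `slot_publish`, `slot_add_dest`, `struct bearer_sketch`) are hypothetical, and the bounds check is an addition that bearer_get() itself does not perform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SLOTS 3	/* illustrative table size, not the kernel's MAX_BEARERS */

struct bearer_sketch {
	atomic_int dest_count;
};

/* Slot table indexed by bearer id; NULL means no bearer is enabled there. */
static _Atomic(struct bearer_sketch *) slot_table[MAX_SLOTS];

/* Configuration side: publish a bearer in a slot, or clear it with NULL. */
static void slot_publish(unsigned int id, struct bearer_sketch *b)
{
	assert(id < MAX_SLOTS);
	atomic_store_explicit(&slot_table[id], b, memory_order_release);
}

/* Data-path side: resolve the id at time of use and skip missing bearers. */
static int slot_add_dest(unsigned int id)
{
	struct bearer_sketch *b;

	if (id >= MAX_SLOTS)
		return -1;
	b = atomic_load_explicit(&slot_table[id], memory_order_acquire);
	if (!b)
		return -1;	/* slot empty: nothing to do, like the if (b) test */
	atomic_fetch_add(&b->dest_count, 1);
	return 0;
}
```

Because callers hold only an id, clearing a slot is enough to make later lookups fail cleanly, which is the decoupling of link and bearer that the commit message describes.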

2015-01-09 15:27:06 +08:00
void tipc_bearer_remove_dest(struct net *net, u32 bearer_id, u32 dest)
2006-01-03 02:04:38 +08:00
{
2015-11-20 03:30:47 +08:00
	struct tipc_bearer *b;
2014-04-21 10:55:46 +08:00
|
|
|
|
|
|
|
rcu_read_lock();
|
2023-06-05 22:40:44 +08:00
|
|
|
b = bearer_get(net, bearer_id);
|
2015-11-20 03:30:47 +08:00
|
|
|
if (b)
|
2018-03-23 03:42:46 +08:00
|
|
|
tipc_disc_remove_dest(b->disc);
|
2014-04-21 10:55:46 +08:00
|
|
|
rcu_read_unlock();
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* tipc_enable_bearer - enable bearer with the given name
|
2020-11-30 02:32:42 +08:00
|
|
|
* @net: the applicable net namespace
|
|
|
|
* @name: bearer name to enable
|
|
|
|
* @disc_domain: bearer domain
|
|
|
|
* @prio: bearer priority
|
|
|
|
* @attr: nlattr array
|
2021-03-26 17:14:14 +08:00
|
|
|
* @extack: netlink extended ack
|
2007-02-09 22:25:21 +08:00
|
|
|
*/
|
2015-02-09 16:50:05 +08:00
|
|
|
static int tipc_enable_bearer(struct net *net, const char *name,
|
2018-03-23 03:42:45 +08:00
|
|
|
u32 disc_domain, u32 prio,
|
2021-03-25 09:56:41 +08:00
|
|
|
struct nlattr *attr[],
|
|
|
|
struct netlink_ext_ack *extack)
|
2006-01-03 02:04:38 +08:00
|
|
|
{
|
2018-03-23 03:42:45 +08:00
|
|
|
struct tipc_net *tn = tipc_net(net);
|
|
|
|
struct tipc_bearer_names b_names;
|
|
|
|
int with_this_prio = 1;
|
2015-11-20 03:30:47 +08:00
|
|
|
struct tipc_bearer *b;
|
|
|
|
struct tipc_media *m;
|
tipc: eliminate buffer leak in bearer layer
When enabling a bearer we create a 'neighbor discoverer' instance by
calling the function tipc_disc_create() before the bearer is actually
registered in the list of enabled bearers. Because of this, the very
first discovery broadcast message, created by the mentioned function,
is lost, since it cannot find any valid bearer to use. Furthermore,
the used send function, tipc_bearer_xmit_skb() does not free the given
buffer when it cannot find a bearer, resulting in the leak of exactly
one send buffer each time a bearer is enabled.
This commit fixes this problem by introducing two changes:
1) Instead of attempting to send the discovery message directly, we let
tipc_disc_create() return the discovery buffer to the calling
function, tipc_enable_bearer(), so that the latter can send it
when the enabling sequence is finished.
2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
functions at the bearer layer, we now free the indicated buffer or
buffer chain when a valid bearer cannot be found.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 22:09:13 +08:00
|
|
|
struct sk_buff *skb;
|
2018-03-23 03:42:45 +08:00
|
|
|
int bearer_id = 0;
|
2006-01-03 02:04:38 +08:00
|
|
|
int res = -EINVAL;
|
2018-03-23 03:42:45 +08:00
|
|
|
char *errstr = "";
|
2021-04-01 10:30:48 +08:00
|
|
|
u32 i;
|
2006-01-03 02:04:38 +08:00
|
|
|
|
2011-12-30 10:39:49 +08:00
|
|
|
if (!bearer_name_validate(name, &b_names)) {
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Illegal name");
|
2022-06-02 14:30:53 +08:00
|
|
|
return res;
|
2006-06-26 14:52:17 +08:00
|
|
|
}
|
2018-03-23 03:42:45 +08:00
|
|
|
|
|
|
|
if (prio > TIPC_MAX_LINK_PRI && prio != TIPC_MEDIA_LINK_PRI) {
|
|
|
|
errstr = "illegal priority";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Illegal priority");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2006-06-26 14:52:17 +08:00
|
|
|
}
|
2006-01-03 02:04:38 +08:00
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
m = tipc_media_find(b_names.media_name);
|
|
|
|
if (!m) {
|
2018-03-23 03:42:45 +08:00
|
|
|
errstr = "media not registered";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Media not registered");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
2006-01-14 05:22:22 +08:00
|
|
|
|
2018-03-23 03:42:45 +08:00
|
|
|
if (prio == TIPC_MEDIA_LINK_PRI)
|
|
|
|
prio = m->priority;
|
2006-01-03 02:04:38 +08:00
|
|
|
|
2018-03-23 03:42:45 +08:00
|
|
|
/* Check new bearer vs existing ones and find free bearer id if any */
|
2021-04-01 10:30:48 +08:00
|
|
|
bearer_id = MAX_BEARERS;
|
|
|
|
i = MAX_BEARERS;
|
|
|
|
while (i-- != 0) {
|
|
|
|
b = rtnl_dereference(tn->bearer_list[i]);
|
|
|
|
if (!b) {
|
|
|
|
bearer_id = i;
|
|
|
|
continue;
|
|
|
|
}
|
2015-11-20 03:30:47 +08:00
|
|
|
if (!strcmp(name, b->name)) {
|
2018-03-23 03:42:45 +08:00
|
|
|
errstr = "already enabled";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Already enabled");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
2021-04-01 10:30:48 +08:00
|
|
|
|
|
|
|
if (b->priority == prio &&
|
|
|
|
(++with_this_prio > 2)) {
|
|
|
|
pr_warn("Bearer <%s>: already 2 bearers with priority %u\n",
|
|
|
|
name, prio);
|
|
|
|
|
|
|
|
if (prio == TIPC_MIN_LINK_PRI) {
|
|
|
|
errstr = "cannot adjust to lower";
|
|
|
|
NL_SET_ERR_MSG(extack, "Cannot adjust to lower");
|
|
|
|
goto rejected;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_warn("Bearer <%s>: trying with adjusted priority\n",
|
|
|
|
name);
|
|
|
|
prio--;
|
|
|
|
bearer_id = MAX_BEARERS;
|
|
|
|
i = MAX_BEARERS;
|
|
|
|
with_this_prio = 1;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
}
|
2018-03-23 03:42:45 +08:00
|
|
|
|
2006-01-03 02:04:38 +08:00
|
|
|
if (bearer_id >= MAX_BEARERS) {
|
2018-03-23 03:42:45 +08:00
|
|
|
errstr = "max 3 bearers permitted";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Max 3 bearers permitted");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
b = kzalloc(sizeof(*b), GFP_ATOMIC);
|
|
|
|
if (!b)
|
tipc: purge tipc_net_lock lock
Now tipc routing hierarchy comprises the structures 'node', 'link' and
'bearer'. The whole hierarchy is protected by a big read/write lock,
tipc_net_lock, to ensure that nothing is added or removed while code
is accessing any of these structures. Obviously the locking policy
makes node, link and bearer components closely bound together so that
their relationship becomes unnecessarily complex. In the worst case,
such locking policy not only has a negative influence on performance,
but also it's prone to lead to deadlock occasionally.
In order to decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:
- Bearer level
RTNL lock is used on update side, and RCU is used on read side.
Meanwhile, all bearer instances including broadcast bearer are
saved into bearer_list array.
- Node and link level
All node instances are saved into two lists, tipc_node_list and
node_htable. The two lists are protected by node_list_lock on write side,
and they are guarded with RCU lock on read side. All members in node
structure including link instances are protected by node spin lock.
- The relationship between bearer and node
When link accesses bearer, it first needs to find the bearer with
its bearer identity from the bearer_list array. When bearer accesses
node, it can iterate the node_htable hash list with the node
address to find the corresponding node.
In the new locking policy, every component has its private locking
solution and the relationship between bearer and node is very simple,
that is, they can find each other with node address or bearer identity
from node_htable hash list or bearer_list array.
Now that all of the above changes have been made, tipc_net_lock can be
removed safely.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-21 10:55:48 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
strcpy(b->name, name);
|
|
|
|
b->media = m;
|
|
|
|
res = m->enable_media(net, b, attr);
|
2006-01-03 02:04:38 +08:00
|
|
|
if (res) {
|
2017-12-22 15:35:16 +08:00
|
|
|
kfree(b);
|
2018-03-23 03:42:45 +08:00
|
|
|
errstr = "failed to enable media";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Failed to enable media");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
b->identity = bearer_id;
|
|
|
|
b->tolerance = m->tolerance;
|
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering the 'slow
start', 'congestion avoidance', and 'fast recovery' modes.
- We introduce hard lower and upper window limits per link, still
different and configurable per bearer type.
- We introduce a 'slow start threshold' variable, initially set to
the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow
start mode, and then let it grow rapidly (+1 per received ACK) until
it reaches the slow start threshold and enters congestion avoidance
mode.
- In congestion avoidance mode we increment the congestion window for
each window-size number of acked packets, up to a possible maximum
equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery
mode, by setting both the slow start threshold and the
congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved
since the previous timeout, it drops the link back to slow start
and forces a probe containing the last sent sequence number to be
sent to the peer, so that this can discover the stale situation.
This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.
This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
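The window rules listed in the commit message above can be sketched as a small user-space state machine. This is a hedged illustration, not the code in the link layer: the struct and function names are hypothetical, clamping the halved window to the minimum is an assumption, and so is leaving the slow start threshold untouched on a stale timeout, since the message does not specify either.

```c
#include <assert.h>

struct cwnd_sketch {
	unsigned int min_win;	/* hard lower limit */
	unsigned int max_win;	/* hard upper limit */
	unsigned int ssthresh;	/* slow start threshold */
	unsigned int window;	/* current congestion window */
	unsigned int acked;	/* acks counted since last increment */
};

/* Start in slow start: window at minimum, threshold at maximum. */
static void cwnd_init(struct cwnd_sketch *c, unsigned int min_win,
		      unsigned int max_win)
{
	assert(min_win > 0 && min_win <= max_win);
	c->min_win = min_win;
	c->max_win = max_win;
	c->ssthresh = max_win;
	c->window = min_win;
	c->acked = 0;
}

/* One received ACK. */
static void cwnd_on_ack(struct cwnd_sketch *c)
{
	if (c->window >= c->max_win)
		return;
	if (c->window < c->ssthresh) {
		c->window++;		/* slow start: +1 per ACK */
		return;
	}
	if (++c->acked >= c->window) {	/* congestion avoidance */
		c->acked = 0;
		c->window++;
	}
}

/* One non-duplicate NACK: halve both threshold and window. */
static void cwnd_on_nack(struct cwnd_sketch *c)
{
	unsigned int half = c->window / 2;

	if (half < c->min_win)		/* assumption: never below minimum */
		half = c->min_win;
	c->ssthresh = half;
	c->window = half;
	c->acked = 0;
}

/* Transmit queue did not move since the previous timeout. */
static void cwnd_on_stale_timeout(struct cwnd_sketch *c)
{
	c->window = c->min_win;		/* back to slow start */
	c->acked = 0;
}
```

With limits of 4 and 8, four ACKs take the window from 4 to 8. A NACK then drops both window and threshold to 4, after which the window grows by one only after a full window of ACKs.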
2019-12-10 07:52:46 +08:00
|
|
|
b->min_win = m->min_win;
|
|
|
|
b->max_win = m->max_win;
|
2015-11-20 03:30:47 +08:00
|
|
|
b->domain = disc_domain;
|
|
|
|
b->net_plane = bearer_id + 'A';
|
2018-03-23 03:42:45 +08:00
|
|
|
b->priority = prio;
|
2019-11-08 13:05:08 +08:00
|
|
|
refcount_set(&b->refcnt, 1);
|
2011-04-22 02:58:26 +08:00
|
|
|
|
2016-04-07 22:09:13 +08:00
|
|
|
res = tipc_disc_create(net, b, &b->bcast_addr, &skb);
|
2011-04-22 02:58:26 +08:00
|
|
|
if (res) {
|
2015-11-20 03:30:47 +08:00
|
|
|
bearer_disable(net, b);
|
2018-03-23 03:42:45 +08:00
|
|
|
errstr = "failed to create discoverer";
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Failed to create discoverer");
|
2018-03-23 03:42:45 +08:00
|
|
|
goto rejected;
|
2011-04-22 02:58:26 +08:00
|
|
|
}
|
2014-03-27 12:54:33 +08:00
|
|
|
|
2022-03-04 11:25:18 +08:00
|
|
|
/* Create monitoring data before accepting activate messages */
|
2017-12-22 15:35:16 +08:00
|
|
|
if (tipc_mon_create(net, bearer_id)) {
|
|
|
|
bearer_disable(net, b);
|
2022-03-04 11:25:18 +08:00
|
|
|
kfree_skb(skb);
|
2016-06-14 08:46:22 +08:00
|
|
|
return -ENOMEM;
|
2017-12-22 15:35:16 +08:00
|
|
|
}
|
2016-06-14 08:46:22 +08:00
|
|
|
|
2022-03-04 11:25:18 +08:00
|
|
|
test_and_set_bit_lock(0, &b->up);
|
|
|
|
rcu_assign_pointer(tn->bearer_list[bearer_id], b);
|
|
|
|
if (skb)
|
|
|
|
tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr);
|
|
|
|
|
tipc: remove restrictions on node address values
Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format, which is sub-divided
into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
However, the 'zone' and 'cluster' levels have in reality never been
fully implemented, and never will be. The result of this has been
that the first 20 bits of the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.
In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to node identity; it is from now on only a 32-bit integer with no
assumed internal structure.
Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.
The rules for accepting discovery requests/responses from neighboring
nodes now become:
- If the user is using legacy address format on both peers, reception
of discovery messages is subject to the legacy lookup domain check
in addition to the cluster id check.
- Otherwise, the discovery request/response is always accepted, provided
both peers have the same network id.
This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
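The legacy address layout described in the commit message above (8-bit zone, 12-bit cluster, 12-bit node) can be sketched as plain bit arithmetic. The helper names are hypothetical, and the sketch assumes the zone occupies the most significant bits, which matches the message's remark that the first 20 bits were wasted.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy view: <zone:8><cluster:12><node:12>, zone in the top bits. */
static uint32_t legacy_zone(uint32_t addr)
{
	return addr >> 24;
}

static uint32_t legacy_cluster(uint32_t addr)
{
	return (addr >> 12) & 0xfffu;
}

static uint32_t legacy_node(uint32_t addr)
{
	return addr & 0xfffu;
}

/* Compose a legacy address; each field must fit its bit width. */
static uint32_t legacy_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
	assert(zone <= 0xffu && cluster <= 0xfffu && node <= 0xfffu);
	return (zone << 24) | (cluster << 12) | node;
}
```

Under the legacy view only the low 12 bits distinguish nodes inside a cluster. Treating the address as an unstructured 32-bit integer makes the whole value available as node identity.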
2018-03-23 03:42:47 +08:00
|
|
|
pr_info("Enabled bearer <%s>, priority %u\n", name, prio);
|
|
|
|
|
2018-03-23 03:42:45 +08:00
|
|
|
return res;
|
|
|
|
rejected:
|
2018-03-23 03:42:47 +08:00
|
|
|
pr_warn("Enabling of bearer <%s> rejected, %s\n", name, errstr);
|
2006-01-03 02:04:38 +08:00
|
|
|
return res;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
tipc: remove interface state mirroring in bearer
struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.
This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.
Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.
         CPU0                             CPU1
         ----                             ----
Time 0:  bearer_disable()                 link_timeout()
Time 1:  spin_lock_bh(&b_ptr->lock)       tipc_link_push_queue()
Time 2:  tipc_link_delete()               tipc_bearer_blocked(b_ptr)
Time 3:  k_cancel_timer(&req->timer)      spin_lock_bh(&b_ptr->lock)
Time 4:  del_timer_sync(&req->timer)
I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.
We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.
An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated to the bearer immediately after an
interface goes down.
A user might notice one slight change in link behaviour after this
change. When an interface goes down, (e.g. through a NETDEV_DOWN
event) all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 23:08:00 +08:00
|
|
|
* tipc_reset_bearer - Reset all links established over this bearer
|
2020-11-30 02:32:42 +08:00
|
|
|
* @net: the applicable net namespace
|
|
|
|
* @b: the target bearer
|
2006-01-03 02:04:38 +08:00
|
|
|
*/
|
2015-11-20 03:30:47 +08:00
|
|
|
static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
|
2006-01-03 02:04:38 +08:00
|
|
|
{
|
2015-11-20 03:30:47 +08:00
|
|
|
pr_info("Resetting bearer <%s>\n", b->name);
|
|
|
|
tipc_node_delete_links(net, b->identity);
|
|
|
|
tipc_disc_reset(net, b);
|
2008-07-15 13:44:01 +08:00
|
|
|
return 0;
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
|
2019-11-08 13:05:08 +08:00
|
|
|
bool tipc_bearer_hold(struct tipc_bearer *b)
|
|
|
|
{
|
|
|
|
return (b && refcount_inc_not_zero(&b->refcnt));
|
|
|
|
}
|
|
|
|
|
|
|
|
void tipc_bearer_put(struct tipc_bearer *b)
|
|
|
|
{
|
|
|
|
if (b && refcount_dec_and_test(&b->refcnt))
|
|
|
|
kfree_rcu(b, rcu);
|
|
|
|
}
|
|
|
|
|
2006-01-03 02:04:38 +08:00
|
|
|
/**
|
2020-11-30 02:32:42 +08:00
|
|
|
* bearer_disable - disable this bearer
|
|
|
|
* @net: the applicable net namespace
|
|
|
|
* @b: the bearer to disable
|
2007-02-09 22:25:21 +08:00
|
|
|
*
|
tipc: purge tipc_net_lock lock
The TIPC routing hierarchy currently comprises the structures 'node',
'link' and 'bearer'. The whole hierarchy is protected by a big read/write
lock, tipc_net_lock, to ensure that nothing is added or removed while code
is accessing any of these structures. Obviously the locking policy
makes node, link and bearer components closely bound together so that
their relationship becomes unnecessarily complex. In the worst case,
such a locking policy not only has a negative influence on performance,
but is also prone to occasional deadlocks.
In order to decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:
- Bearer level
RTNL lock is used on update side, and RCU is used on read side.
Meanwhile, all bearer instances including broadcast bearer are
saved into bearer_list array.
- Node and link level
All node instances are saved into two lists, tipc_node_list and
node_htable. The two lists are protected by node_list_lock on write side,
and they are guarded with RCU lock on read side. All members in node
structure including link instances are protected by node spin lock.
- The relationship between bearer and node
When link accesses bearer, it first needs to find the bearer with
its bearer identity from the bearer_list array. When bearer accesses
node, it can iterate the node_htable hash list with the node
address to find the corresponding node.
In the new locking policy, every component has its private locking
solution and the relationship between bearer and node is very simple,
that is, they can find each other with node address or bearer identity
from node_htable hash list or bearer_list array.
With all of the above changes in place, tipc_net_lock can now be
removed safely.
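The "find each other by identity" relationship described above can be illustrated with a minimal user-space sketch. The names toy_net, toy_bearer_get() and toy_bearer_set() and the table size are assumptions made for the example. The real code relies on RTNL for updates and RCU for readers, which this sketch only mentions in comments and does not implement.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BEARERS 3	/* assumed table size for the sketch */

struct toy_bearer {
	unsigned int identity;	/* index of this bearer in the table */
	const char *name;
};

struct toy_net {
	struct toy_bearer *bearer_list[TOY_MAX_BEARERS];
};

/* Reader side: map an identity to its slot, or NULL if empty/out of range.
 * In the kernel this lookup would run inside an RCU read-side section.
 */
static struct toy_bearer *toy_bearer_get(struct toy_net *tn, unsigned int id)
{
	if (id >= TOY_MAX_BEARERS)
		return NULL;
	return tn->bearer_list[id];
}

/* Update side: publish or clear a slot. The kernel version would hold RTNL. */
static void toy_bearer_set(struct toy_net *tn, unsigned int id,
			   struct toy_bearer *b)
{
	assert(id < TOY_MAX_BEARERS);
	tn->bearer_list[id] = b;
}
```

Because a link stores only the bearer identity and not a pointer, a cleared slot simply makes the lookup return NULL, and the caller can handle that case locally.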
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-21 10:55:48 +08:00
|
|
|
* Note: This routine assumes caller holds RTNL lock.
|
2006-01-03 02:04:38 +08:00
|
|
|
*/
|
2015-11-20 03:30:47 +08:00
|
|
|
static void bearer_disable(struct net *net, struct tipc_bearer *b)
|
2006-01-03 02:04:38 +08:00
|
|
|
{
|
2016-04-07 22:09:14 +08:00
|
|
|
struct tipc_net *tn = tipc_net(net);
|
|
|
|
int bearer_id = b->identity;
|
2014-03-27 12:54:33 +08:00
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
pr_info("Disabling bearer <%s>\n", b->name);
|
2016-08-16 23:53:50 +08:00
|
|
|
clear_bit_unlock(0, &b->up);
|
2016-04-07 22:09:14 +08:00
|
|
|
tipc_node_delete_links(net, bearer_id);
|
2016-08-16 23:53:50 +08:00
|
|
|
b->media->disable_media(b);
|
2015-11-20 03:30:47 +08:00
|
|
|
RCU_INIT_POINTER(b->media_ptr, NULL);
|
2018-03-23 03:42:46 +08:00
|
|
|
if (b->disc)
|
|
|
|
tipc_disc_delete(b->disc);
|
2016-04-07 22:09:14 +08:00
|
|
|
RCU_INIT_POINTER(tn->bearer_list[bearer_id], NULL);
|
2019-11-08 13:05:08 +08:00
|
|
|
tipc_bearer_put(b);
|
tipc: add neighbor monitoring framework
2016-06-14 08:46:22 +08:00
|
|
|
tipc_mon_delete(net, bearer_id);
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
|
2015-03-05 17:23:49 +08:00
|
|
|
int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
|
|
|
|
struct nlattr *attr[])
|
2013-12-11 12:45:43 +08:00
|
|
|
{
|
2018-03-23 03:42:52 +08:00
|
|
|
char *dev_name = strchr((const char *)b->name, ':') + 1;
|
|
|
|
int hwaddr_len = b->media->hwaddr_len;
|
|
|
|
u8 node_id[NODE_ID_LEN] = {0,};
|
2013-12-11 12:45:43 +08:00
|
|
|
struct net_device *dev;
|
|
|
|
|
|
|
|
/* Find device with specified name */
|
2018-03-23 03:42:52 +08:00
|
|
|
dev = dev_get_by_name(net, dev_name);
|
2013-12-11 12:45:43 +08:00
|
|
|
if (!dev)
|
|
|
|
return -ENODEV;
|
2023-05-29 22:52:13 +08:00
|
|
|
if (tipc_mtu_bad(dev)) {
|
2016-12-02 16:33:41 +08:00
|
|
|
dev_put(dev);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2019-08-07 10:52:29 +08:00
|
|
|
if (dev == net->loopback_dev) {
|
|
|
|
dev_put(dev);
|
|
|
|
pr_info("Enabling <%s> not permitted\n", b->name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2013-12-11 12:45:43 +08:00
|
|
|
|
2018-03-23 03:42:52 +08:00
|
|
|
/* Autoconfigure own node identity if needed */
|
|
|
|
if (!tipc_own_id(net) && hwaddr_len <= NODE_ID_LEN) {
|
|
|
|
memcpy(node_id, dev->dev_addr, hwaddr_len);
|
|
|
|
tipc_net_init(net, node_id, 0);
|
|
|
|
}
|
|
|
|
if (!tipc_own_id(net)) {
|
2018-07-25 18:00:49 +08:00
|
|
|
dev_put(dev);
|
2018-03-23 03:42:52 +08:00
|
|
|
pr_warn("Failed to obtain node identity\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
tipc: improve and extend media address conversion functions
TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:
1) A "raw" format as obtained from the device. This format is known
only by the media specific adapter code in eth_media.c and
ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
which can be referenced and passed around by the generic media-
unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
discovery messages.
Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.
We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try, as far
as possible, to make the commenting, variable names and usage of these
functions uniform, with the purpose of making them more comprehensible.
We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.
Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.
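As a rough illustration of the format 1 to format 2 conversion described above, the following user-space sketch installs a per-media raw2addr hook and uses it for Ethernet. All names prefixed with toy_ or TOY_ are hypothetical, the signature is simplified compared to the kernel's, and the 32-byte value field is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN 32	/* assumed size of the generic value field */
#define TOY_ETH_ALEN 6	/* length of an Ethernet MAC address */
#define TOY_MEDIA_ETH 1	/* hypothetical media type id */

/* Format 2: generic address handled by media-unaware code. */
struct toy_media_addr {
	uint8_t value[TOY_ADDR_LEN];
	uint8_t media_id;
};

/* Per-media conversion hook, installed by the media adapter. */
struct toy_media {
	uint8_t type_id;
	int (*raw2addr)(struct toy_media_addr *addr, const uint8_t *raw);
};

/* Ethernet adapter: format 1 (raw MAC) -> format 2 (generic). */
static int toy_eth_raw2addr(struct toy_media_addr *addr, const uint8_t *raw)
{
	assert(addr && raw);
	memset(addr, 0, sizeof(*addr));
	memcpy(addr->value, raw, TOY_ETH_ALEN);
	addr->media_id = TOY_MEDIA_ETH;
	return 0;
}

static const struct toy_media toy_eth_media = {
	.type_id = TOY_MEDIA_ETH,
	.raw2addr = toy_eth_raw2addr,
};
```

Generic code only ever calls through the function pointer, so it never needs to know how long a raw address is for a given medium.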
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 17:39:13 +08:00
|
|
|
/* Associate TIPC bearer with L2 bearer */
|
2014-04-21 10:55:47 +08:00
|
|
|
rcu_assign_pointer(b->media_ptr, dev);
|
2017-08-28 23:57:02 +08:00
|
|
|
b->pt.dev = dev;
|
|
|
|
b->pt.type = htons(ETH_P_TIPC);
|
|
|
|
b->pt.func = tipc_l2_rcv_msg;
|
|
|
|
dev_add_pack(&b->pt);
|
tipc: improve and extend media address conversion functions
2014-05-14 17:39:13 +08:00
|
|
|
memset(&b->bcast_addr, 0, sizeof(b->bcast_addr));
|
2018-03-23 03:42:52 +08:00
|
|
|
memcpy(b->bcast_addr.value, dev->broadcast, hwaddr_len);
|
2013-12-11 12:45:43 +08:00
|
|
|
b->bcast_addr.media_id = b->media->type_id;
|
2017-01-19 02:50:50 +08:00
|
|
|
b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT;
|
2013-12-11 12:45:43 +08:00
|
|
|
b->mtu = dev->mtu;
|
2021-10-12 23:58:39 +08:00
|
|
|
b->media->raw2addr(b, &b->addr, (const char *)dev->dev_addr);
|
2013-12-11 12:45:43 +08:00
|
|
|
rcu_assign_pointer(dev->tipc_ptr, b);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
tipc: improve and extend media address conversion functions
2014-05-14 17:39:13 +08:00
|
|
|
/* tipc_disable_l2_media - detach TIPC bearer from an L2 interface
|
2020-11-30 02:32:42 +08:00
|
|
|
* @b: the target bearer
|
2013-12-11 12:45:43 +08:00
|
|
|
*
|
2015-10-16 02:52:45 +08:00
|
|
|
* Mark L2 bearer as inactive so that incoming buffers are thrown away
|
2013-12-11 12:45:43 +08:00
|
|
|
*/
|
|
|
|
void tipc_disable_l2_media(struct tipc_bearer *b)
|
|
|
|
{
|
2014-04-21 10:55:47 +08:00
|
|
|
struct net_device *dev;
|
|
|
|
|
|
|
|
dev = (struct net_device *)rtnl_dereference(b->media_ptr);
|
2017-08-28 23:57:02 +08:00
|
|
|
dev_remove_pack(&b->pt);
|
2013-12-11 12:45:43 +08:00
|
|
|
RCU_INIT_POINTER(dev->tipc_ptr, NULL);
|
2014-04-21 10:55:49 +08:00
|
|
|
synchronize_net();
|
2013-12-11 12:45:43 +08:00
|
|
|
dev_put(dev);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
tipc: improve and extend media address conversion functions
2014-05-14 17:39:13 +08:00
|
|
|
* tipc_l2_send_msg - send a TIPC packet out over an L2 interface
|
2020-11-30 02:32:42 +08:00
|
|
|
* @net: the associated network namespace
|
2016-04-07 22:09:14 +08:00
|
|
|
* @skb: the packet to be sent
|
2015-11-20 03:30:47 +08:00
|
|
|
* @b: the bearer through which the packet is to be sent
|
2013-12-11 12:45:43 +08:00
|
|
|
* @dest: peer destination address
|
|
|
|
*/
|
2015-10-22 20:51:45 +08:00
|
|
|
int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
|
2015-01-09 15:27:07 +08:00
|
|
|
struct tipc_bearer *b, struct tipc_media_addr *dest)
|
2013-12-11 12:45:43 +08:00
|
|
|
{
|
2014-04-21 10:55:47 +08:00
|
|
|
struct net_device *dev;
|
2013-12-11 12:45:43 +08:00
|
|
|
int delta;
|
2014-04-21 10:55:47 +08:00
|
|
|
|
2019-07-02 00:54:55 +08:00
|
|
|
dev = (struct net_device *)rcu_dereference(b->media_ptr);
|
2014-04-21 10:55:47 +08:00
|
|
|
if (!dev)
|
|
|
|
return 0;
|
2013-12-11 12:45:43 +08:00
|
|
|
|
2016-08-16 23:53:50 +08:00
|
|
|
delta = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
|
|
|
|
if ((delta > 0) && pskb_expand_head(skb, delta, 0, GFP_ATOMIC)) {
|
|
|
|
kfree_skb(skb);
|
|
|
|
return 0;
|
|
|
|
}
|
2015-10-22 20:51:45 +08:00
|
|
|
skb_reset_network_header(skb);
|
|
|
|
skb->dev = dev;
|
|
|
|
skb->protocol = htons(ETH_P_TIPC);
|
|
|
|
dev_hard_header(skb, dev, ETH_P_TIPC, dest->value,
|
|
|
|
dev->dev_addr, skb->len);
|
|
|
|
dev_queue_xmit(skb);
|
2013-12-11 12:45:43 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-01-19 02:50:50 +08:00
|
|
|
bool tipc_bearer_bcast_support(struct net *net, u32 bearer_id)
|
|
|
|
{
|
|
|
|
bool supp = false;
|
|
|
|
struct tipc_bearer *b;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
b = bearer_get(net, bearer_id);
|
|
|
|
if (b)
|
|
|
|
supp = (b->bcast_addr.broadcast == TIPC_BROADCAST_SUPPORT);
|
|
|
|
rcu_read_unlock();
|
|
|
|
return supp;
|
|
|
|
}
|
|
|
|
|
2015-10-22 20:51:43 +08:00
|
|
|
int tipc_bearer_mtu(struct net *net, u32 bearer_id)
|
|
|
|
{
|
|
|
|
int mtu = 0;
|
|
|
|
struct tipc_bearer *b;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2023-06-05 22:40:44 +08:00
|
|
|
b = bearer_get(net, bearer_id);
|
2015-10-22 20:51:43 +08:00
|
|
|
if (b)
|
|
|
|
mtu = b->mtu;
|
|
|
|
rcu_read_unlock();
|
|
|
|
return mtu;
|
|
|
|
}
|
|
|
|
|
2023-05-15 03:52:27 +08:00
|
|
|
int tipc_bearer_min_mtu(struct net *net, u32 bearer_id)
|
|
|
|
{
|
|
|
|
int mtu = TIPC_MIN_BEARER_MTU;
|
|
|
|
struct tipc_bearer *b;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
b = bearer_get(net, bearer_id);
|
|
|
|
if (b)
|
|
|
|
mtu += b->encap_hlen;
|
|
|
|
rcu_read_unlock();
|
|
|
|
return mtu;
|
|
|
|
}
|
|
|
|
|
2015-10-22 20:51:44 +08:00
|
|
|
/* tipc_bearer_xmit_skb - sends buffer to destination over bearer
|
|
|
|
*/
|
|
|
|
void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
struct tipc_media_addr *dest)
|
|
|
|
{
|
2016-08-16 23:53:50 +08:00
|
|
|
struct tipc_msg *hdr = buf_msg(skb);
|
2015-10-22 20:51:44 +08:00
|
|
|
struct tipc_bearer *b;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2016-08-16 23:53:50 +08:00
|
|
|
b = bearer_get(net, bearer_id);
|
2019-11-08 13:05:11 +08:00
|
|
|
if (likely(b && (test_bit(0, &b->up) || msg_is_reset(hdr)))) {
|
|
|
|
#ifdef CONFIG_TIPC_CRYPTO
|
|
|
|
tipc_crypto_xmit(net, &skb, b, dest, NULL);
|
|
|
|
if (skb)
|
|
|
|
#endif
|
|
|
|
b->media->send_msg(net, skb, b, dest);
|
|
|
|
} else {
|
tipc: eliminate buffer leak in bearer layer
When enabling a bearer we create a 'neighbor discoverer' instance by
calling the function tipc_disc_create() before the bearer is actually
registered in the list of enabled bearers. Because of this, the very
first discovery broadcast message, created by the mentioned function,
is lost, since it cannot find any valid bearer to use. Furthermore,
the used send function, tipc_bearer_xmit_skb() does not free the given
buffer when it cannot find a bearer, resulting in the leak of exactly
one send buffer each time a bearer is enabled.
This commit fixes this problem by introducing two changes:
1) Instead of attempting to send the discovery message directly, we let
tipc_disc_create() return the discovery buffer to the calling
function, tipc_enable_bearer(), so that the latter can send it
when the enabling sequence is finished.
2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
functions at the bearer layer, we now free the indicated buffer or
buffer chain when a valid bearer cannot be found.
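The ownership rule in point 2 can be sketched in user-space C as follows. This is a simplified model and not the kernel code: toy_buf, toy_xmit() and the toy_live_bufs counter are hypothetical, and malloc()/free() stand in for socket buffer allocation. The point is that the transmit function consumes the buffer on every path, including the one where no bearer is found.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_buf { int len; };

struct toy_bearer {
	int sent;	/* number of buffers handed to the media */
};

static int toy_live_bufs;	/* allocation counter used to spot leaks */

static struct toy_buf *toy_buf_alloc(int len)
{
	struct toy_buf *buf = malloc(sizeof(*buf));

	if (buf) {
		buf->len = len;
		toy_live_bufs++;
	}
	return buf;
}

static void toy_buf_free(struct toy_buf *buf)
{
	if (buf) {
		free(buf);
		toy_live_bufs--;
	}
}

/* Takes ownership of buf on every path: send it, or free it if no bearer. */
static void toy_xmit(struct toy_bearer *b, struct toy_buf *buf)
{
	assert(buf);
	if (!b) {
		toy_buf_free(buf);	/* the path that used to leak */
		return;
	}
	b->sent++;
	toy_buf_free(buf);	/* stands in for the media consuming the buffer */
}
```

With this convention a caller never has to check whether the bearer existed before deciding who frees the buffer.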
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 22:09:13 +08:00
|
|
|
kfree_skb(skb);
|
2019-11-08 13:05:11 +08:00
|
|
|
}
|
2015-10-22 20:51:44 +08:00
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
2015-07-17 04:54:24 +08:00
|
|
|
/* tipc_bearer_xmit() - send buffer to destination over bearer
|
|
|
|
*/
|
|
|
|
void tipc_bearer_xmit(struct net *net, u32 bearer_id,
|
|
|
|
struct sk_buff_head *xmitq,
|
2019-11-08 13:05:11 +08:00
|
|
|
struct tipc_media_addr *dst,
|
|
|
|
struct tipc_node *__dnode)
|
2015-07-17 04:54:24 +08:00
|
|
|
{
|
|
|
|
struct tipc_bearer *b;
|
|
|
|
struct sk_buff *skb, *tmp;
|
|
|
|
|
|
|
|
if (skb_queue_empty(xmitq))
|
|
|
|
return;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2016-08-16 23:53:50 +08:00
|
|
|
b = bearer_get(net, bearer_id);
|
tipc: eliminate buffer leak in bearer layer
2016-04-07 22:09:13 +08:00
|
|
|
if (unlikely(!b))
|
|
|
|
__skb_queue_purge(xmitq);
|
|
|
|
skb_queue_walk_safe(xmitq, skb, tmp) {
|
|
|
|
__skb_dequeue(xmitq);
|
2019-11-08 13:05:11 +08:00
|
|
|
if (likely(test_bit(0, &b->up) || msg_is_reset(buf_msg(skb)))) {
|
|
|
|
#ifdef CONFIG_TIPC_CRYPTO
|
|
|
|
tipc_crypto_xmit(net, &skb, b, dst, __dnode);
|
|
|
|
if (skb)
|
|
|
|
#endif
|
|
|
|
b->media->send_msg(net, skb, b, dst);
|
|
|
|
} else {
|
2016-08-24 07:01:02 +08:00
|
|
|
kfree_skb(skb);
|
2019-11-08 13:05:11 +08:00
|
|
|
}
|
tipc: simplify bearer level broadcast
Until now, we have been keeping track of the exact set of broadcast
destinations though the help structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map only can support cluster local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.
A closer analysis reveals that it is unnecessary to maintain this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.
This commit introduces these changes.
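The idea of replacing a full node map with one counter per bearer can be sketched as below. This is a user-space illustration under assumptions: toy_bcast and its helper functions are hypothetical names, and the table size is chosen arbitrarily. It shows only the bookkeeping, not the broadcast protocol's synchronization or retransmission.

```c
#include <assert.h>

#define TOY_MAX_BEARERS 3	/* assumed number of bearers */

/* One reachable-destination counter per bearer replaces a full node map. */
struct toy_bcast {
	int dests[TOY_MAX_BEARERS];
};

/* A link came up over this bearer: one more reachable node. */
static void toy_bcast_link_up(struct toy_bcast *bc, int bearer_id)
{
	assert(bearer_id >= 0 && bearer_id < TOY_MAX_BEARERS);
	bc->dests[bearer_id]++;
}

/* A link went down over this bearer: one fewer reachable node. */
static void toy_bcast_link_down(struct toy_bcast *bc, int bearer_id)
{
	assert(bearer_id >= 0 && bearer_id < TOY_MAX_BEARERS);
	assert(bc->dests[bearer_id] > 0);
	bc->dests[bearer_id]--;
}

/* Nonzero while at least one node is reachable via this bearer. */
static int toy_bcast_has_dests(const struct toy_bcast *bc, int bearer_id)
{
	assert(bearer_id >= 0 && bearer_id < TOY_MAX_BEARERS);
	return bc->dests[bearer_id] > 0;
}
```

The storage cost is constant per bearer regardless of cluster size, which is the property the commit message relies on.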
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 20:51:42 +08:00
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* tipc_bearer_bc_xmit() - broadcast buffers to all destinations
|
|
|
|
*/
|
|
|
|
void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
|
|
|
|
struct sk_buff_head *xmitq)
|
|
|
|
{
|
|
|
|
struct tipc_net *tn = tipc_net(net);
|
2019-11-08 13:05:11 +08:00
|
|
|
struct tipc_media_addr *dst;
|
tipc: simplify bearer level broadcast
2015-10-22 20:51:42 +08:00
|
|
|
int net_id = tn->net_id;
|
|
|
|
struct tipc_bearer *b;
|
|
|
|
struct sk_buff *skb, *tmp;
|
|
|
|
struct tipc_msg *hdr;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2016-08-16 23:53:50 +08:00
|
|
|
b = bearer_get(net, bearer_id);
|
|
|
|
if (unlikely(!b || !test_bit(0, &b->up)))
|
tipc: eliminate buffer leak in bearer layer
2016-04-07 22:09:13 +08:00
|
|
|
__skb_queue_purge(xmitq);
|
|
|
|
skb_queue_walk_safe(xmitq, skb, tmp) {
|
|
|
|
hdr = buf_msg(skb);
|
|
|
|
msg_set_non_seq(hdr, 1);
|
|
|
|
msg_set_mc_netid(hdr, net_id);
|
|
|
|
__skb_dequeue(xmitq);
|
2019-11-08 13:05:11 +08:00
|
|
|
dst = &b->bcast_addr;
|
|
|
|
#ifdef CONFIG_TIPC_CRYPTO
|
|
|
|
tipc_crypto_xmit(net, &skb, b, dst, NULL);
|
|
|
|
if (skb)
|
|
|
|
#endif
|
|
|
|
b->media->send_msg(net, skb, b, dst);
|
2015-07-17 04:54:24 +08:00
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
tipc: relocate common functions from media to bearer
Currently, registering a TIPC stack handler in the network device layer
is done twice, once for Ethernet (eth_media) and once for InfiniBand
(ib_media). But, as this registration is not media specific, we can
avoid some code duplication by moving the registering function to
the generic bearer layer, to the file bearer.c, and call it only once.
The same is true for the network device event notifier.
As a side effect, the two workqueues we are using for setting up/
cleaning up media can now be eliminated. Furthermore, the array for
storing the specific media type structs, media_array[], can be entirely
deleted.
Note that the eth_started and ib_started flags were removed during the
code relocation. There is now only one call to bearer_setup and
bearer_cleanup, and these can logically not race against each other.
Despite its size, this cleanup work incurs no functional changes in TIPC.
In particular, it should be noted that the sequence ordering of received
packets is unaffected by this change, since packet reception never was
subject to any work queue handling in the first place.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 12:45:42 +08:00
|
|
|
/**
|
|
|
|
* tipc_l2_rcv_msg - handle incoming TIPC message from an interface
|
2020-07-13 07:15:14 +08:00
|
|
|
* @skb: the received message
|
tipc: relocate common functions from media to bearer
2013-12-11 12:45:42 +08:00
|
|
|
* @dev: the net device that the packet was received on
|
|
|
|
* @pt: the packet_type structure which was used to register this handler
|
|
|
|
* @orig_dev: the original receive net device in case the device is a bond
|
|
|
|
*
|
|
|
|
* Accept only packets explicitly sent to this node, or broadcast or
|
|
|
|
* multicast packets; ignores traffic sent to other nodes (which can
|
|
|
|
* happen if the interface is running in promiscuous mode).
|
|
|
|
*/
|
tipc: eliminate buffer leak in bearer layer
When enabling a bearer we create a 'neighbor discoverer' instance by
calling the function tipc_disc_create() before the bearer is actually
registered in the list of enabled bearers. Because of this, the very
first discovery broadcast message, created by the mentioned function,
is lost, since it cannot find any valid bearer to use. Furthermore,
the used send function, tipc_bearer_xmit_skb() does not free the given
buffer when it cannot find a bearer, resulting in the leak of exactly
one send buffer each time a bearer is enabled.
This commit fixes this problem by introducing two changes:
1) Instead of attempting to send the discovery message directly, we let
tipc_disc_create() return the discovery buffer to the calling
function, tipc_enable_bearer(), so that the latter can send it
when the enabling sequence is finished.
2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
functions at the bearer layer, we now free the indicated buffer or
buffer chain when a valid bearer cannot be found.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 22:09:13 +08:00
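The two changes described in the commit message above can be illustrated in user space. This is a minimal sketch, not the kernel code: every `demo_*` name is hypothetical, `malloc()`/`free()` stand in for socket buffer allocation, and the only point is the ownership rule, namely that the transmit function releases the buffer on every path and that the first discovery buffer is sent only after the bearer has been registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the TIPC definitions. */
struct demo_buf { size_t len; };
struct demo_bearer { int sent; };

static int demo_freed; /* counts buffers released on the "no bearer" path */

static void demo_free_buf(struct demo_buf *buf)
{
	free(buf);
	demo_freed++;
}

/* Change 2: the transmit function owns buf on every path. */
static int demo_xmit(struct demo_bearer *b, struct demo_buf *buf)
{
	assert(buf != NULL);
	if (!b) {
		/* No valid bearer: free the buffer instead of leaking it. */
		demo_free_buf(buf);
		return -1;
	}
	b->sent++;
	free(buf); /* models the lower layer consuming the buffer */
	return 0;
}

/* Change 1: create the discovery buffer and hand it back to the caller. */
static struct demo_buf *demo_disc_create(void)
{
	struct demo_buf *buf = malloc(sizeof(*buf));

	if (buf)
		buf->len = 0;
	return buf;
}

/* The caller registers the bearer first and sends the buffer afterwards. */
static int demo_enable_bearer(struct demo_bearer **slot, struct demo_bearer *b)
{
	struct demo_buf *buf = demo_disc_create();

	if (!buf)
		return -1;
	*slot = b;                    /* bearer becomes visible ... */
	return demo_xmit(*slot, buf); /* ... before the first send */
}
```

With this ordering the first discovery message always finds a registered bearer, and a send attempted without one frees its buffer rather than leaking it.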
|
|
|
static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
|
2013-12-11 12:45:42 +08:00
|
|
|
struct packet_type *pt, struct net_device *orig_dev)
|
|
|
|
{
|
2015-11-20 03:30:47 +08:00
|
|
|
struct tipc_bearer *b;
|
2013-12-11 12:45:42 +08:00
|
|
|
|
|
|
|
rcu_read_lock();
|
2019-07-02 00:54:55 +08:00
|
|
|
b = rcu_dereference(dev->tipc_ptr) ?:
|
|
|
|
rcu_dereference(orig_dev->tipc_ptr);
|
2016-08-16 23:53:50 +08:00
|
|
|
if (likely(b && test_bit(0, &b->up) &&
|
2017-08-14 23:55:56 +08:00
|
|
|
(skb->pkt_type <= PACKET_MULTICAST))) {
|
2018-07-30 11:42:53 +08:00
|
|
|
skb_mark_not_on_list(skb);
|
2019-11-08 13:05:11 +08:00
|
|
|
TIPC_SKB_CB(skb)->flags = 0;
|
2017-08-28 23:57:02 +08:00
|
|
|
tipc_rcv(dev_net(b->pt.dev), skb, b);
|
2016-04-07 22:09:13 +08:00
|
|
|
rcu_read_unlock();
|
|
|
|
return NET_RX_SUCCESS;
|
2013-12-11 12:45:42 +08:00
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
2016-04-07 22:09:13 +08:00
|
|
|
kfree_skb(skb);
|
2013-12-11 12:45:42 +08:00
|
|
|
return NET_RX_DROP;
|
|
|
|
}
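The receive check in tipc_l2_rcv_msg() compares skb->pkt_type against PACKET_MULTICAST with `<=`, which works because the packet-type constants are numerically ordered. The sketch below restates that test in user space. The `DEMO_*` names and `demo_l2_accept()` are hypothetical; the numeric values are assumed to mirror the PACKET_* constants in `<linux/if_packet.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror PACKET_HOST .. PACKET_OTHERHOST in <linux/if_packet.h>. */
enum demo_pkt_type {
	DEMO_PACKET_HOST = 0,      /* addressed to this host */
	DEMO_PACKET_BROADCAST = 1, /* link-layer broadcast */
	DEMO_PACKET_MULTICAST = 2, /* link-layer multicast */
	DEMO_PACKET_OTHERHOST = 3, /* addressed to another host */
};

/* Accept only if a bearer exists, is up, and the packet is not for another host. */
static bool demo_l2_accept(bool bearer_found, bool bearer_up, int pkt_type)
{
	assert(pkt_type >= 0);
	return bearer_found && bearer_up && pkt_type <= DEMO_PACKET_MULTICAST;
}
```

A single comparison therefore admits host, broadcast and multicast frames and rejects frames seen only because the interface is in promiscuous mode.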
|
|
|
|
|
|
|
|
/**
|
|
|
|
* tipc_l2_device_event - handle device events from network device
|
|
|
|
* @nb: the context of the notification
|
|
|
|
* @evt: the type of event
|
|
|
|
* @ptr: the net device that the event was on
|
|
|
|
*
|
|
|
|
* This function is called by the Ethernet driver in case of link
|
|
|
|
* change event.
|
|
|
|
*/
|
|
|
|
static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
|
|
|
|
void *ptr)
|
|
|
|
{
|
|
|
|
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
|
2015-01-09 15:27:04 +08:00
|
|
|
struct net *net = dev_net(dev);
|
2015-11-20 03:30:47 +08:00
|
|
|
struct tipc_bearer *b;
|
2013-12-11 12:45:42 +08:00
|
|
|
|
2015-11-20 03:30:47 +08:00
|
|
|
b = rtnl_dereference(dev->tipc_ptr);
|
|
|
|
if (!b)
|
2013-12-11 12:45:42 +08:00
|
|
|
return NOTIFY_DONE;
|
|
|
|
|
2018-12-19 10:18:00 +08:00
|
|
|
trace_tipc_l2_device_event(dev, b, evt);
|
2013-12-11 12:45:42 +08:00
|
|
|
switch (evt) {
|
|
|
|
case NETDEV_CHANGE:
|
2018-09-26 03:56:57 +08:00
|
|
|
if (netif_carrier_ok(dev) && netif_oper_up(dev)) {
|
|
|
|
test_and_set_bit_lock(0, &b->up);
|
2013-12-11 12:45:42 +08:00
|
|
|
break;
|
2018-09-26 03:56:57 +08:00
|
|
|
}
|
2020-08-24 06:36:59 +08:00
|
|
|
fallthrough;
|
2015-10-16 02:52:45 +08:00
|
|
|
case NETDEV_GOING_DOWN:
|
2016-08-16 23:53:50 +08:00
|
|
|
clear_bit_unlock(0, &b->up);
|
2016-04-07 22:09:14 +08:00
|
|
|
tipc_reset_bearer(net, b);
|
|
|
|
break;
|
2018-09-26 03:56:57 +08:00
|
|
|
case NETDEV_UP:
|
|
|
|
test_and_set_bit_lock(0, &b->up);
|
|
|
|
break;
|
2013-12-11 12:45:42 +08:00
|
|
|
case NETDEV_CHANGEMTU:
|
2023-05-29 22:52:13 +08:00
|
|
|
if (tipc_mtu_bad(dev)) {
|
2016-12-02 16:33:41 +08:00
|
|
|
bearer_disable(net, b);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
b->mtu = dev->mtu;
|
2015-11-20 03:30:47 +08:00
|
|
|
tipc_reset_bearer(net, b);
|
2014-03-28 17:32:08 +08:00
|
|
|
break;
|
2013-12-11 12:45:42 +08:00
|
|
|
case NETDEV_CHANGEADDR:
|
2015-11-20 03:30:47 +08:00
|
|
|
b->media->raw2addr(b, &b->addr,
|
2021-10-12 23:58:39 +08:00
|
|
|
(const char *)dev->dev_addr);
|
2015-11-20 03:30:47 +08:00
|
|
|
tipc_reset_bearer(net, b);
|
2013-12-11 12:45:42 +08:00
|
|
|
break;
|
|
|
|
case NETDEV_UNREGISTER:
|
|
|
|
case NETDEV_CHANGENAME:
|
2017-09-06 17:08:06 +08:00
|
|
|
bearer_disable(net, b);
|
2013-12-11 12:45:42 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
return NOTIFY_OK;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct notifier_block notifier = {
|
|
|
|
.notifier_call = tipc_l2_device_event,
|
|
|
|
.priority = 0,
|
|
|
|
};
|
|
|
|
|
|
|
|
int tipc_bearer_setup(void)
|
|
|
|
{
|
2017-08-28 23:57:02 +08:00
|
|
|
return register_netdevice_notifier(&notifier);
|
2013-12-11 12:45:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void tipc_bearer_cleanup(void)
|
|
|
|
{
|
|
|
|
unregister_netdevice_notifier(&notifier);
|
|
|
|
}
|
2006-01-03 02:04:38 +08:00
|
|
|
|
2015-01-09 15:27:05 +08:00
|
|
|
void tipc_bearer_stop(struct net *net)
|
2006-01-03 02:04:38 +08:00
|
|
|
{
|
2023-06-05 22:40:44 +08:00
|
|
|
struct tipc_net *tn = tipc_net(net);
|
2015-11-20 03:30:47 +08:00
|
|
|
struct tipc_bearer *b;
|
2006-01-03 02:04:38 +08:00
|
|
|
u32 i;
|
|
|
|
|
|
|
|
for (i = 0; i < MAX_BEARERS; i++) {
|
2015-11-20 03:30:47 +08:00
|
|
|
b = rtnl_dereference(tn->bearer_list[i]);
|
|
|
|
if (b) {
|
|
|
|
bearer_disable(net, b);
|
2015-01-09 15:27:06 +08:00
|
|
|
tn->bearer_list[i] = NULL;
|
2014-03-27 12:54:33 +08:00
|
|
|
}
|
2006-01-03 02:04:38 +08:00
|
|
|
}
|
|
|
|
}
|
2014-11-20 17:29:07 +08:00
|
|
|
|
2019-08-07 10:52:29 +08:00
|
|
|
void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts)
|
|
|
|
{
|
|
|
|
struct net_device *dev = net->loopback_dev;
|
|
|
|
struct sk_buff *skb, *_skb;
|
|
|
|
int exp;
|
|
|
|
|
|
|
|
skb_queue_walk(pkts, _skb) {
|
|
|
|
skb = pskb_copy(_skb, GFP_ATOMIC);
|
|
|
|
if (!skb)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
|
|
|
|
if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) {
|
|
|
|
kfree_skb(skb);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
skb_reset_network_header(skb);
|
|
|
|
dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr,
|
|
|
|
dev->dev_addr, skb->len);
|
|
|
|
skb->dev = dev;
|
|
|
|
skb->pkt_type = PACKET_HOST;
|
|
|
|
skb->ip_summed = CHECKSUM_UNNECESSARY;
|
|
|
|
skb->protocol = eth_type_trans(skb, dev);
|
2022-03-07 05:57:47 +08:00
|
|
|
netif_rx(skb);
|
2019-08-07 10:52:29 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev,
|
|
|
|
struct packet_type *pt, struct net_device *od)
|
|
|
|
{
|
|
|
|
consume_skb(skb);
|
|
|
|
return NET_RX_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
int tipc_attach_loopback(struct net *net)
|
|
|
|
{
|
|
|
|
struct net_device *dev = net->loopback_dev;
|
|
|
|
struct tipc_net *tn = tipc_net(net);
|
|
|
|
|
|
|
|
if (!dev)
|
|
|
|
return -ENODEV;
|
|
|
|
|
2022-06-08 12:39:55 +08:00
|
|
|
netdev_hold(dev, &tn->loopback_pt.dev_tracker, GFP_KERNEL);
|
2019-08-07 10:52:29 +08:00
|
|
|
tn->loopback_pt.dev = dev;
|
|
|
|
tn->loopback_pt.type = htons(ETH_P_TIPC);
|
|
|
|
tn->loopback_pt.func = tipc_loopback_rcv_pkt;
|
|
|
|
dev_add_pack(&tn->loopback_pt);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void tipc_detach_loopback(struct net *net)
|
|
|
|
{
|
|
|
|
struct tipc_net *tn = tipc_net(net);
|
|
|
|
|
|
|
|
dev_remove_pack(&tn->loopback_pt);
|
2022-06-08 12:39:55 +08:00
|
|
|
netdev_put(net->loopback_dev, &tn->loopback_pt.dev_tracker);
|
2019-08-07 10:52:29 +08:00
|
|
|
}
|
|
|
|
|
2014-11-20 17:29:08 +08:00
|
|
|
/* Caller should hold rtnl_lock to protect the bearer */
|
2014-11-24 18:10:29 +08:00
|
|
|
static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
|
2015-04-29 00:33:50 +08:00
|
|
|
struct tipc_bearer *bearer, int nlflags)
|
2014-11-20 17:29:08 +08:00
|
|
|
{
|
|
|
|
void *hdr;
|
|
|
|
struct nlattr *attrs;
|
|
|
|
struct nlattr *prop;
|
|
|
|
|
2015-02-09 16:50:03 +08:00
|
|
|
hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family,
|
2015-04-29 00:33:50 +08:00
|
|
|
nlflags, TIPC_NL_BEARER_GET);
|
2014-11-20 17:29:08 +08:00
|
|
|
if (!hdr)
|
|
|
|
return -EMSGSIZE;
|
|
|
|
|
2019-04-26 17:13:06 +08:00
|
|
|
attrs = nla_nest_start_noflag(msg->skb, TIPC_NLA_BEARER);
|
2014-11-20 17:29:08 +08:00
|
|
|
if (!attrs)
|
|
|
|
goto msg_full;
|
|
|
|
|
|
|
|
if (nla_put_string(msg->skb, TIPC_NLA_BEARER_NAME, bearer->name))
|
|
|
|
goto attr_msg_full;
|
|
|
|
|
2019-04-26 17:13:06 +08:00
|
|
|
prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_BEARER_PROP);
|
2014-11-20 17:29:08 +08:00
|
|
|
if (!prop)
|
|
|
|
goto prop_msg_full;
|
|
|
|
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_PRIO, bearer->priority))
|
|
|
|
goto prop_msg_full;
|
|
|
|
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
|
|
|
|
goto prop_msg_full;
|
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering the 'slow
start', 'congestion avoidance', and 'fast recovery' modes.
- We introduce hard lower and upper window limits per link, still
different and configurable per bearer type.
- We introduce a 'slow start threshold' variable, initially set to
the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow
start mode, and then let it grow rapidly (+1 per received ACK) until
it reaches the slow start threshold and enters congestion avoidance
mode.
- In congestion avoidance mode we increment the congestion window for
each window-size number of acked packets, up to a possible maximum
equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery
mode, by setting both the slow start threshold and the
congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved
since the previous timeout, it drops the link back to slow start
and forces a probe containing the last sent sequence number to be
sent to the peer, so that this can discover the stale situation.
This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.
This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
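The window behaviour described in this commit message can be sketched in
user-space C roughly as follows. This is an illustration under stated
assumptions, not the kernel implementation: struct cc_link and the cc_*
functions are hypothetical names, the clamp to min_win on a NACK is an
assumption, the threshold is left untouched on a stale timeout as a
simplification, and the probe sent to the peer is left out.

```c
#include <stdint.h>

/* Hypothetical per-link state; names are illustrative, not the kernel's. */
struct cc_link {
	uint16_t min_win;	/* hard lower window limit */
	uint16_t max_win;	/* hard upper window limit */
	uint16_t ssthresh;	/* slow start threshold */
	uint16_t cwnd;		/* current congestion window */
	uint16_t acked;		/* ACKs counted since the last increment */
};

static void cc_init(struct cc_link *l, uint16_t min_win, uint16_t max_win)
{
	l->min_win = min_win;
	l->max_win = max_win;
	l->ssthresh = max_win;	/* threshold starts at the maximum window */
	l->cwnd = min_win;	/* the link starts in slow start */
	l->acked = 0;
}

/* Called once per acknowledged packet. */
static void cc_on_ack(struct cc_link *l)
{
	if (l->cwnd >= l->max_win)
		return;		/* never exceed the configured maximum */
	if (l->cwnd < l->ssthresh) {
		l->cwnd++;	/* slow start: +1 per received ACK */
		return;
	}
	/* Congestion avoidance: +1 per window's worth of ACKs. */
	if (++l->acked >= l->cwnd) {
		l->acked = 0;
		l->cwnd++;
	}
}

/* Called for each non-duplicate NACK: halve threshold and window. */
static void cc_on_nack(struct cc_link *l)
{
	uint16_t half = l->cwnd / 2;

	if (half < l->min_win)	/* assumption: respect the lower limit */
		half = l->min_win;
	l->ssthresh = half;
	l->cwnd = half;
	l->acked = 0;
}

/* Called when the timeout handler sees no transmit queue progress. */
static void cc_on_stale_timeout(struct cc_link *l)
{
	l->cwnd = l->min_win;	/* window back to the minimum */
	l->acked = 0;
}
```

With cc_init(&l, 2, 8), six ACKs grow the window from 2 to 8, the first
NACK drops both window and threshold to 4, and four further ACKs are
then needed to reach 5.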
2019-12-10 07:52:46 +08:00
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->max_win))
2014-11-20 17:29:08 +08:00
		goto prop_msg_full;
2018-04-19 17:06:20 +08:00
	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
			goto prop_msg_full;

2014-11-20 17:29:08 +08:00
	nla_nest_end(msg->skb, prop);

2016-08-26 16:52:55 +08:00
#ifdef CONFIG_TIPC_MEDIA_UDP
	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP) {
		if (tipc_udp_nl_add_bearer_data(msg, bearer))
			goto attr_msg_full;
	}
#endif

2014-11-20 17:29:08 +08:00
	nla_nest_end(msg->skb, attrs);
	genlmsg_end(msg->skb, hdr);

	return 0;

prop_msg_full:
	nla_nest_cancel(msg->skb, prop);
attr_msg_full:
	nla_nest_cancel(msg->skb, attrs);
msg_full:
	genlmsg_cancel(msg->skb, hdr);

	return -EMSGSIZE;
}

int tipc_nl_bearer_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int err;
	int i = cb->args[0];
	struct tipc_bearer *bearer;
	struct tipc_nl_msg msg;
2015-01-09 15:27:06 +08:00
	struct net *net = sock_net(skb->sk);
2023-06-05 22:40:44 +08:00
	struct tipc_net *tn = tipc_net(net);

2014-11-20 17:29:08 +08:00
	if (i == MAX_BEARERS)
		return 0;

	msg.skb = skb;
	msg.portid = NETLINK_CB(cb->skb).portid;
	msg.seq = cb->nlh->nlmsg_seq;

	rtnl_lock();
	for (i = 0; i < MAX_BEARERS; i++) {
2015-01-09 15:27:06 +08:00
		bearer = rtnl_dereference(tn->bearer_list[i]);
2014-11-20 17:29:08 +08:00
		if (!bearer)
			continue;

2015-04-29 00:33:50 +08:00
		err = __tipc_nl_add_bearer(&msg, bearer, NLM_F_MULTI);
2014-11-20 17:29:08 +08:00
		if (err)
			break;
	}
	rtnl_unlock();

	cb->args[0] = i;
	return skb->len;
}

int tipc_nl_bearer_get(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct sk_buff *rep;
	struct tipc_bearer *bearer;
	struct tipc_nl_msg msg;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
2015-01-09 15:27:06 +08:00
	struct net *net = genl_info_net(info);

2014-11-20 17:29:08 +08:00
	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 20:07:28 +08:00
	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
2014-11-20 17:29:08 +08:00
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	rep = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
	if (!rep)
		return -ENOMEM;

	msg.skb = rep;
	msg.portid = info->snd_portid;
	msg.seq = info->snd_seq;

	rtnl_lock();
2015-01-09 15:27:06 +08:00
	bearer = tipc_bearer_find(net, name);
2014-11-20 17:29:08 +08:00
	if (!bearer) {
		err = -EINVAL;
2021-03-25 09:56:41 +08:00
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
2014-11-20 17:29:08 +08:00
		goto err_out;
	}

2015-04-29 00:33:50 +08:00
	err = __tipc_nl_add_bearer(&msg, bearer, 0);
2014-11-20 17:29:08 +08:00
	if (err)
		goto err_out;
	rtnl_unlock();

	return genlmsg_reply(rep, info);
err_out:
	rtnl_unlock();
	nlmsg_free(rep);

	return err;
}

2018-02-14 13:37:59 +08:00
int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info)
2014-11-20 17:29:07 +08:00
{
	int err;
	char *name;
	struct tipc_bearer *bearer;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
2015-02-09 16:50:05 +08:00
	struct net *net = sock_net(skb->sk);

2014-11-20 17:29:07 +08:00
	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

2019-04-26 20:07:28 +08:00
	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
2014-11-20 17:29:07 +08:00
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;

	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

2015-01-09 15:27:06 +08:00
	bearer = tipc_bearer_find(net, name);
2021-03-25 09:56:41 +08:00
	if (!bearer) {
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
2014-11-20 17:29:07 +08:00
		return -EINVAL;
2021-03-25 09:56:41 +08:00
	}
2014-11-20 17:29:07 +08:00

2015-05-14 22:46:11 +08:00
	bearer_disable(net, bearer);

2014-11-20 17:29:07 +08:00
	return 0;
}

2018-02-14 13:37:59 +08:00
int tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_bearer_disable(skb, info);
	rtnl_unlock();

	return err;
}

2018-02-14 13:38:00 +08:00
int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info)
2014-11-20 17:29:07 +08:00
{
	int err;
	char *bearer;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
2015-02-09 16:50:05 +08:00
	struct net *net = sock_net(skb->sk);
tipc: remove restrictions on node address values
Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format: it is subdivided
into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
However, the 'zone' and 'cluster' levels have in reality never been
fully implemented, and never will be. The result of this has been
that the first 20 bits of the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.
In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to node identity: it is from now on only a 32-bit integer with no
assumed internal structure.
Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.
The rules for accepting discovery requests/responses from neighboring
nodes now become:
- If the user is using legacy address format on both peers, reception
of discovery messages is subject to the legacy lookup domain check
in addition to the cluster id check.
- Otherwise, the discovery request/response is always accepted, provided
both peers have the same network id.
This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
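The legacy 8/12/12 address split that this commit message retires can be
illustrated with a few lines of user-space C. This is a sketch only: the
legacy_* helper names are hypothetical, and the bit positions are taken
from the field widths stated above (zone in the top 8 bits, then cluster,
then node).

```c
#include <stdint.h>

/* Hypothetical helpers for the legacy <zone.cluster.node> layout. */
static uint32_t legacy_zone(uint32_t addr)
{
	return addr >> 24;		/* top 8 bits */
}

static uint32_t legacy_cluster(uint32_t addr)
{
	return (addr >> 12) & 0xfff;	/* middle 12 bits */
}

static uint32_t legacy_node(uint32_t addr)
{
	return addr & 0xfff;		/* low 12 bits */
}

/* Pack the three fields into one 32-bit legacy address. */
static uint32_t legacy_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
	return ((zone & 0xff) << 24) | ((cluster & 0xfff) << 12) |
	       (node & 0xfff);
}
```

For example, legacy_addr(1, 1, 10) gives 0x0100100a. Once the address is
treated as a plain 32-bit integer, the same value is simply one node
identity with no internal structure.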
2018-03-23 03:42:47 +08:00
	u32 domain = 0;
2014-11-20 17:29:07 +08:00
	u32 prio;

	prio = TIPC_MEDIA_LINK_PRI;

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

2019-04-26 20:07:28 +08:00
	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
2014-11-20 17:29:07 +08:00
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;

	bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	if (attrs[TIPC_NLA_BEARER_DOMAIN])
		domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]);

	if (attrs[TIPC_NLA_BEARER_PROP]) {
		struct nlattr *props[TIPC_NLA_PROP_MAX + 1];

		err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_BEARER_PROP],
					      props);
		if (err)
			return err;

		if (props[TIPC_NLA_PROP_PRIO])
			prio = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
	}

2021-03-25 09:56:41 +08:00
	return tipc_enable_bearer(net, bearer, domain, prio, attrs,
				  info->extack);
2018-02-14 13:38:00 +08:00
}

int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info)
{
	int err;

2014-11-20 17:29:07 +08:00
	rtnl_lock();
2018-02-14 13:38:00 +08:00
	err = __tipc_nl_bearer_enable(skb, info);
2014-11-20 17:29:07 +08:00
	rtnl_unlock();

2018-02-14 13:38:00 +08:00
	return err;
2014-11-20 17:29:07 +08:00
}
2014-11-20 17:29:09 +08:00

2016-08-26 16:52:53 +08:00
int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct tipc_bearer *b;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = sock_net(skb->sk);

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

2019-04-26 20:07:28 +08:00
	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
2016-08-26 16:52:53 +08:00
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	rtnl_lock();
	b = tipc_bearer_find(net, name);
	if (!b) {
2021-03-25 09:56:41 +08:00
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
2024-02-13 21:40:58 +08:00
		err = -EINVAL;
		goto out;
2016-08-26 16:52:53 +08:00
	}

#ifdef CONFIG_TIPC_MEDIA_UDP
	if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) {
2024-01-31 23:23:09 +08:00
		if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
			NL_SET_ERR_MSG(info->extack, "UDP option is unsupported");
2024-02-13 21:40:58 +08:00
			err = -EINVAL;
			goto out;
2024-01-31 23:23:09 +08:00
		}

2016-08-26 16:52:53 +08:00
		err = tipc_udp_nl_bearer_add(b,
					     attrs[TIPC_NLA_BEARER_UDP_OPTS]);
	}
#endif
2024-02-13 21:40:58 +08:00
out:
2016-08-26 16:52:53 +08:00
	rtnl_unlock();

2024-02-13 21:40:58 +08:00
	return err;
2016-08-26 16:52:53 +08:00
}

2018-02-14 13:38:01 +08:00
int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
2014-11-20 17:29:09 +08:00
{
	struct tipc_bearer *b;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
2015-05-06 19:58:54 +08:00
	struct net *net = sock_net(skb->sk);
2018-02-14 20:34:39 +08:00
	char *name;
	int err;

2014-11-20 17:29:09 +08:00
	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

2019-04-26 20:07:28 +08:00
	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
2014-11-20 17:29:09 +08:00
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

2015-01-09 15:27:06 +08:00
	b = tipc_bearer_find(net, name);
2021-03-25 09:56:41 +08:00
	if (!b) {
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
2014-11-20 17:29:09 +08:00
		return -EINVAL;
2021-03-25 09:56:41 +08:00
	}

2014-11-20 17:29:09 +08:00
	if (attrs[TIPC_NLA_BEARER_PROP]) {
		struct nlattr *props[TIPC_NLA_PROP_MAX + 1];

		err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_BEARER_PROP],
					      props);
2018-02-14 13:38:01 +08:00
		if (err)
2014-11-20 17:29:09 +08:00
			return err;

2018-02-14 20:34:39 +08:00
		if (props[TIPC_NLA_PROP_TOL]) {
2014-11-20 17:29:09 +08:00
			b->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
2018-04-19 17:06:20 +08:00
			tipc_node_apply_property(net, b, TIPC_NLA_PROP_TOL);
2018-02-14 20:34:39 +08:00
		}
2014-11-20 17:29:09 +08:00
		if (props[TIPC_NLA_PROP_PRIO])
			b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
		if (props[TIPC_NLA_PROP_WIN])
2019-12-10 07:52:46 +08:00
			b->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
2018-04-19 17:06:20 +08:00
		if (props[TIPC_NLA_PROP_MTU]) {
2021-03-25 09:56:41 +08:00
			if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
				NL_SET_ERR_MSG(info->extack,
					       "MTU property is unsupported");
2018-04-19 17:06:20 +08:00
				return -EINVAL;
2021-03-25 09:56:41 +08:00
			}
2018-04-19 17:06:20 +08:00
#ifdef CONFIG_TIPC_MEDIA_UDP
2023-05-15 03:52:29 +08:00
			if (nla_get_u32(props[TIPC_NLA_PROP_MTU]) <
			    b->encap_hlen + TIPC_MIN_BEARER_MTU) {
2021-03-25 09:56:41 +08:00
				NL_SET_ERR_MSG(info->extack,
					       "MTU value is out-of-range");
2018-04-19 17:06:20 +08:00
				return -EINVAL;
2021-03-25 09:56:41 +08:00
			}
2018-04-19 17:06:20 +08:00
			b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
			tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU);
#endif
		}
2014-11-20 17:29:09 +08:00
	}

	return 0;
}
2014-11-20 17:29:15 +08:00

2018-02-14 13:38:01 +08:00
int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_bearer_set(skb, info);
	rtnl_unlock();

	return err;
}

2014-11-24 18:10:29 +08:00
static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
2015-04-29 00:33:50 +08:00
			       struct tipc_media *media, int nlflags)
2014-11-20 17:29:15 +08:00
{
	void *hdr;
	struct nlattr *attrs;
	struct nlattr *prop;

2015-02-09 16:50:03 +08:00
	hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family,
2015-04-29 00:33:50 +08:00
			  nlflags, TIPC_NL_MEDIA_GET);
2014-11-20 17:29:15 +08:00
	if (!hdr)
		return -EMSGSIZE;

2019-04-26 17:13:06 +08:00
	attrs = nla_nest_start_noflag(msg->skb, TIPC_NLA_MEDIA);
2014-11-20 17:29:15 +08:00
	if (!attrs)
		goto msg_full;

	if (nla_put_string(msg->skb, TIPC_NLA_MEDIA_NAME, media->name))
		goto attr_msg_full;

2019-04-26 17:13:06 +08:00
	prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_MEDIA_PROP);
2014-11-20 17:29:15 +08:00
	if (!prop)
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_PRIO, media->priority))
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, media->tolerance))
		goto prop_msg_full;
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering the 'slow
start', 'congestion avoidance', and 'fast recovery' modes.
- We introduce hard lower and upper window limits per link, still
different and configurable per bearer type.
- We introduce a 'slow start threshold' variable, initially set to
the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow
start mode, and then let it grow rapidly (+1 per received ACK) until
it reaches the slow start threshold and enters congestion avoidance
mode.
- In congestion avoidance mode we increment the congestion window for
each window-size number of acked packets, up to a possible maximum
equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery
mode, by setting both the slow start threshold and the
congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved
since the previous timeout, it drops the link back to slow start
and forces a probe containing the last sent sequence number to be
sent to the peer, so that this can discover the stale situation.
This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.
This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
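The window rules listed in the commit message above can be sketched in plain userspace C. This is only an illustration of the described behaviour, not the code in net/tipc/link.c: the struct cwin type and the cwin_* helpers are hypothetical names, clamping the halved window to the lower limit is an assumption, and so is leaving the threshold untouched on a stale timeout.

```c
#include <stdint.h>

/* Hypothetical per-link window state; field names are illustrative only. */
struct cwin {
	uint16_t min_win;   /* hard lower window limit */
	uint16_t max_win;   /* hard upper window limit */
	uint16_t ssthresh;  /* slow start threshold */
	uint16_t cwnd;      /* current congestion window */
	uint16_t acked;     /* ACKs counted since the last increment */
};

/* Start at the minimum window with the threshold at the maximum. */
static void cwin_init(struct cwin *w, uint16_t min_win, uint16_t max_win)
{
	w->min_win = min_win;
	w->max_win = max_win;
	w->ssthresh = max_win;
	w->cwnd = min_win;
	w->acked = 0;
}

/* One received ACK. */
static void cwin_on_ack(struct cwin *w)
{
	if (w->cwnd < w->ssthresh) {
		/* Slow start: +1 per ACK. ssthresh never exceeds max_win. */
		w->cwnd++;
		return;
	}
	if (w->cwnd >= w->max_win)
		return;
	/* Congestion avoidance: +1 per window-size number of ACKs. */
	if (++w->acked >= w->cwnd) {
		w->acked = 0;
		w->cwnd++;
	}
}

/* One non-duplicate NACK: halve both threshold and window.
 * Clamping to min_win is an assumption of this sketch.
 */
static void cwin_on_nack(struct cwin *w)
{
	uint16_t half = w->cwnd / 2;

	if (half < w->min_win)
		half = w->min_win;
	w->ssthresh = half;
	w->cwnd = half;
	w->acked = 0;
}

/* Timeout with no progress on the transmit queue: back to the minimum
 * window. The probe itself is not modelled here.
 */
static void cwin_on_stale_timeout(struct cwin *w)
{
	w->cwnd = w->min_win;
	w->acked = 0;
}
```

With limits of 50 and 100, a link in this sketch needs 50 ACKs to reach the maximum window, and one NACK at that point brings both the window and the threshold back to 50.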
2019-12-10 07:52:46 +08:00
|
|
|
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->max_win))
|
2014-11-20 17:29:15 +08:00
|
|
|
goto prop_msg_full;
|
2018-04-19 17:06:19 +08:00
|
|
|
if (media->type_id == TIPC_MEDIA_TYPE_UDP)
|
|
|
|
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
|
|
|
|
goto prop_msg_full;
|
2014-11-20 17:29:15 +08:00
|
|
|
|
|
|
|
nla_nest_end(msg->skb, prop);
|
|
|
|
nla_nest_end(msg->skb, attrs);
|
|
|
|
genlmsg_end(msg->skb, hdr);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
prop_msg_full:
|
|
|
|
nla_nest_cancel(msg->skb, prop);
|
|
|
|
attr_msg_full:
|
|
|
|
nla_nest_cancel(msg->skb, attrs);
|
|
|
|
msg_full:
|
|
|
|
genlmsg_cancel(msg->skb, hdr);
|
|
|
|
|
|
|
|
return -EMSGSIZE;
|
|
|
|
}
|
|
|
|
|
|
|
|
int tipc_nl_media_dump(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
int i = cb->args[0];
|
|
|
|
struct tipc_nl_msg msg;
|
|
|
|
|
|
|
|
if (i == MAX_MEDIA)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
msg.skb = skb;
|
|
|
|
msg.portid = NETLINK_CB(cb->skb).portid;
|
|
|
|
msg.seq = cb->nlh->nlmsg_seq;
|
|
|
|
|
|
|
|
rtnl_lock();
|
|
|
|
for (; media_info_array[i] != NULL; i++) {
|
2015-04-29 00:33:50 +08:00
|
|
|
err = __tipc_nl_add_media(&msg, media_info_array[i],
|
|
|
|
NLM_F_MULTI);
|
2014-11-20 17:29:15 +08:00
|
|
|
if (err)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
rtnl_unlock();
|
|
|
|
|
|
|
|
cb->args[0] = i;
|
|
|
|
return skb->len;
|
|
|
|
}
|
|
|
|
|
|
|
|
int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
char *name;
|
|
|
|
struct tipc_nl_msg msg;
|
|
|
|
struct tipc_media *media;
|
|
|
|
struct sk_buff *rep;
|
2023-06-14 20:06:04 +08:00
|
|
|
struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];
|
2014-11-20 17:29:15 +08:00
|
|
|
|
|
|
|
if (!info->attrs[TIPC_NLA_MEDIA])
|
|
|
|
return -EINVAL;
|
|
|
|
|
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
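The split into independent strictness options described in the commit message above can be pictured as a bitmask checked by a single validation routine. The following is a userspace sketch under stated assumptions, not the kernel's netlink implementation: the SK_* flag names, struct sk_attr, struct sk_policy and sk_validate() are all hypothetical, and an expected length of 0 stands in for an NLA_UNSPEC policy entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names mirroring the four options in the commit message. */
enum {
	SK_VALIDATE_TRAILING     = 1 << 0, /* no trailing data after attributes */
	SK_VALIDATE_MAXTYPE      = 1 << 1, /* reject attrs > max known type */
	SK_VALIDATE_UNSPEC       = 1 << 2, /* reject attrs with unspecified policy */
	SK_VALIDATE_STRICT_ATTRS = 1 << 3, /* attribute size must match exactly */
};

#define SK_VALIDATE_LIBERAL 0
#define SK_VALIDATE_DEPRECATED_STRICT \
	(SK_VALIDATE_TRAILING | SK_VALIDATE_MAXTYPE)
#define SK_VALIDATE_STRICT \
	(SK_VALIDATE_TRAILING | SK_VALIDATE_MAXTYPE | \
	 SK_VALIDATE_UNSPEC | SK_VALIDATE_STRICT_ATTRS)

/* Toy attribute: a type and a payload length. */
struct sk_attr {
	uint16_t type;
	uint16_t len;
};

/* Toy policy entry: expected_len == 0 means "unspecified". */
struct sk_policy {
	uint16_t expected_len;
};

/* Returns 0 if the attributes pass under the given flags, -1 otherwise.
 * policy must have maxtype + 1 entries.
 */
static int sk_validate(const struct sk_attr *attrs, size_t n,
		       size_t trailing_bytes,
		       const struct sk_policy *policy, uint16_t maxtype,
		       unsigned int flags)
{
	size_t i;

	if ((flags & SK_VALIDATE_TRAILING) && trailing_bytes)
		return -1;

	for (i = 0; i < n; i++) {
		const struct sk_attr *a = &attrs[i];
		const struct sk_policy *p;

		if (a->type > maxtype) {
			if (flags & SK_VALIDATE_MAXTYPE)
				return -1;
			continue; /* liberal mode ignores unknown types */
		}
		p = &policy[a->type];
		if (p->expected_len == 0) {
			if (flags & SK_VALIDATE_UNSPEC)
				return -1;
			continue;
		}
		/* Too-short attributes are rejected in every mode. */
		if (a->len < p->expected_len)
			return -1;
		if ((flags & SK_VALIDATE_STRICT_ATTRS) &&
		    a->len != p->expected_len)
			return -1;
	}
	return 0;
}
```

In this sketch the old liberal behaviour corresponds to passing no flags, the old strict behaviour to TRAILING plus MAXTYPE, and the default for new callers to all four flags together.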
2019-04-26 20:07:28 +08:00
|
|
|
err = nla_parse_nested_deprecated(attrs, TIPC_NLA_MEDIA_MAX,
|
|
|
|
info->attrs[TIPC_NLA_MEDIA],
|
|
|
|
tipc_nl_media_policy, info->extack);
|
2014-11-20 17:29:15 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (!attrs[TIPC_NLA_MEDIA_NAME])
|
|
|
|
return -EINVAL;
|
|
|
|
name = nla_data(attrs[TIPC_NLA_MEDIA_NAME]);
|
|
|
|
|
|
|
|
rep = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
|
|
|
|
if (!rep)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
msg.skb = rep;
|
|
|
|
msg.portid = info->snd_portid;
|
|
|
|
msg.seq = info->snd_seq;
|
|
|
|
|
|
|
|
rtnl_lock();
|
|
|
|
media = tipc_media_find(name);
|
|
|
|
if (!media) {
|
2021-03-25 09:56:41 +08:00
|
|
|
NL_SET_ERR_MSG(info->extack, "Media not found");
|
2014-11-20 17:29:15 +08:00
|
|
|
err = -EINVAL;
|
|
|
|
goto err_out;
|
|
|
|
}
|
|
|
|
|
2015-04-29 00:33:50 +08:00
|
|
|
err = __tipc_nl_add_media(&msg, media, 0);
|
2014-11-20 17:29:15 +08:00
|
|
|
if (err)
|
|
|
|
goto err_out;
|
|
|
|
rtnl_unlock();
|
|
|
|
|
|
|
|
return genlmsg_reply(rep, info);
|
|
|
|
err_out:
|
|
|
|
rtnl_unlock();
|
|
|
|
nlmsg_free(rep);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
2014-11-20 17:29:16 +08:00
|
|
|
|
2018-02-14 13:38:02 +08:00
|
|
|
int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
|
2014-11-20 17:29:16 +08:00
|
|
|
{
|
|
|
|
int err;
|
|
|
|
char *name;
|
|
|
|
struct tipc_media *m;
|
2023-06-14 20:06:04 +08:00
|
|
|
struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];
|
2014-11-20 17:29:16 +08:00
|
|
|
|
|
|
|
if (!info->attrs[TIPC_NLA_MEDIA])
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-04-26 20:07:28 +08:00
|
|
|
err = nla_parse_nested_deprecated(attrs, TIPC_NLA_MEDIA_MAX,
|
|
|
|
info->attrs[TIPC_NLA_MEDIA],
|
|
|
|
tipc_nl_media_policy, info->extack);
if (err)
return err;
|
2014-11-20 17:29:16 +08:00
|
|
|
|
|
|
|
if (!attrs[TIPC_NLA_MEDIA_NAME])
|
|
|
|
return -EINVAL;
|
|
|
|
name = nla_data(attrs[TIPC_NLA_MEDIA_NAME]);
|
|
|
|
|
|
|
|
m = tipc_media_find(name);
|
2021-03-25 09:56:41 +08:00
|
|
|
if (!m) {
|
|
|
|
NL_SET_ERR_MSG(info->extack, "Media not found");
|
2014-11-20 17:29:16 +08:00
|
|
|
return -EINVAL;
|
2021-03-25 09:56:41 +08:00
|
|
|
}
|
2014-11-20 17:29:16 +08:00
|
|
|
if (attrs[TIPC_NLA_MEDIA_PROP]) {
|
|
|
|
struct nlattr *props[TIPC_NLA_PROP_MAX + 1];
|
|
|
|
|
|
|
|
err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_MEDIA_PROP],
|
|
|
|
props);
|
2018-02-14 13:38:02 +08:00
|
|
|
if (err)
|
2014-11-20 17:29:16 +08:00
|
|
|
return err;
|
|
|
|
|
|
|
|
if (props[TIPC_NLA_PROP_TOL])
|
|
|
|
m->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
|
|
|
|
if (props[TIPC_NLA_PROP_PRIO])
|
|
|
|
m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
|
|
|
|
if (props[TIPC_NLA_PROP_WIN])
|
2019-12-10 07:52:46 +08:00
|
|
|
m->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
|
2018-04-19 17:06:19 +08:00
|
|
|
if (props[TIPC_NLA_PROP_MTU]) {
|
2021-03-25 09:56:41 +08:00
|
|
|
if (m->type_id != TIPC_MEDIA_TYPE_UDP) {
|
|
|
|
NL_SET_ERR_MSG(info->extack,
|
|
|
|
"MTU property is unsupported");
|
2018-04-19 17:06:19 +08:00
|
|
|
return -EINVAL;
|
2021-03-25 09:56:41 +08:00
|
|
|
}
|
2018-04-19 17:06:19 +08:00
|
|
|
#ifdef CONFIG_TIPC_MEDIA_UDP
|
|
|
|
if (tipc_udp_mtu_bad(nla_get_u32
|
2021-03-25 09:56:41 +08:00
|
|
|
(props[TIPC_NLA_PROP_MTU]))) {
|
|
|
|
NL_SET_ERR_MSG(info->extack,
|
|
|
|
"MTU value is out-of-range");
|
2018-04-19 17:06:19 +08:00
|
|
|
return -EINVAL;
|
2021-03-25 09:56:41 +08:00
|
|
|
}
|
2018-04-19 17:06:19 +08:00
|
|
|
m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
|
|
|
|
#endif
|
|
|
|
}
|
2014-11-20 17:29:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2018-02-14 13:38:02 +08:00
|
|
|
|
|
|
|
int tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
rtnl_lock();
|
|
|
|
err = __tipc_nl_media_set(skb, info);
|
|
|
|
rtnl_unlock();
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|