In short, sctp is likely to choose an incorrect src address if the socket is
bound to secondary addresses. This patch fixes it by adding a check that
verifies that such a src address belongs to the interface that routing
identified as the output one.
This is enough to avoid rp_filter drops on the remote peer.
Details:
Currently, sctp will do a routing attempt without specifying the src
address and compare the returned value (preferred source) with the
addresses that the socket is bound to. When using secondary addresses,
this will not match.
It will then retry the route lookup, specifying each of the addresses the
socket is bound to in turn, and checking whether that address is valid as
src for that dst. The problem is that this check alone is weak:
# ip r l
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147
# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
valid_lft 2160sec preferred_lft 2160sec
inet 192.168.122.148/24 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe15:186a/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
valid_lft 2162sec preferred_lft 2162sec
inet 192.168.100.148/24 scope global secondary eth1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:feb3:9146/64 scope link
valid_lft forever preferred_lft forever
4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
inet6 fe80::5054:ff:fe05:47ee/64 scope link
valid_lft forever preferred_lft forever
# ip r g 192.168.100.193 from 192.168.122.148
192.168.100.193 from 192.168.122.148 dev eth1
cache
Even if you specify an interface:
# ip r g 192.168.100.193 from 192.168.122.148 oif eth1
192.168.100.193 from 192.168.122.148 dev eth1
cache
Although this would be valid, peers using rp_filter will drop such
packets as their src doesn't match the routes for that interface.
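A minimal sketch of such a check, using an invented helper name (the real
change lives in the SCTP address-family code): __ip_dev_find() returns the
device that owns a local IPv4 address, so a candidate src is accepted only
when it is configured on the device the route points out of.

  #include <linux/inetdevice.h>

  /* Sketch only: accept a bound candidate saddr only if it lives on the
   * routed output device, so a peer running rp_filter will not drop it. */
  static bool saddr_on_output_dev(struct net *net, __be32 saddr,
                                  struct net_device *odev)
  {
          /* __ip_dev_find() returns the device owning saddr, or NULL */
          return __ip_dev_find(net, saddr, false) == odev;
  }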
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paves the way for the next patch. Functionality stays untouched.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mac80211-for-davem-2015-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Some fixes for the current cycle:
1. Arik introduced an rtnl-locked regulatory API to be able
to differentiate between places that do/don't hold the RTNL;
this fixes missing locking in some of the code paths
2. Two small mesh bugfixes from Bob, one to avoid processing
a certain malformed over-the-air frame and one to avoid
sending a garbage field over the air.
3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya.
4. A fix for a powersave vs. aggregation teardown race, from Michal.
5. Thomas reduced the loglevel of CRDA messages to avoid spamming
the kernel log with mostly irrelevant information.
6. Tom fixed a dangling debugfs directory pointer that could cause
crashes if subsequent addition of the same interface to debugfs
failed for some reason.
7. A fix from myself for a list corruption issue in mac80211 during
combined interface shutdown/removal - shut down interfaces first
and only then remove them to avoid that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We store a c45 PHY's ID information in c45_ids, so c45_ids should be used
to check the matching between PHY driver and PHY device for c45 PHYs.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__vxlan_find_mac invokes ether_addr_equal on the eth_addr field,
which triggers unaligned access messages, so rearrange vxlan_fdb
to avoid this in the most non-intrusive way.
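For illustration only (this is not the exact layout the patch chose):
ether_addr_equal() reads the six bytes as three 16-bit words, so the address
field must start on a 2-byte boundary, which a small reordering guarantees.

  struct fdb_entry_sketch {
          struct hlist_node hlist;              /* pointer-aligned */
          unsigned long     updated;
          unsigned long     used;
          u8                eth_addr[ETH_ALEN]; /* starts 2-byte aligned */
          u16               state;              /* no misaligning u8 before it */
  };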
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel generates a lot of warnings when the dst entry reference counter
overflows and becomes negative. That bug was seen several times on
machines with outdated 3.10.y kernels. Most likely it's already fixed
upstream. Anyway, that flood completely kills the machine and makes
further debugging impossible.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on system speed, the large lookup/insert/delete loops of the testsuite can
take a considerable amount of time to complete, causing watchdog warnings to appear.
Allow other tasks to be scheduled throughout the loops.
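A sketch of the idea, loosely following the selftest's insertion loop
(variable and parameter names are from the test, details illustrative):

  /* inside the selftest's long insertion loop */
  for (i = 0; i < entries; i++) {
          struct test_obj *obj = &array[i];

          err = rhashtable_insert_fast(&ht, &obj->node, test_rht_params);
          if (err)
                  break;

          cond_resched();         /* yield so the watchdogs stay quiet */
  }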
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) If sk_filter() is applied, the skb was leaked (not freed).
2) Testing SOCK_DEAD twice is racy:
the packet could be freed while already queued.
3) Remove the obsolete comment about caching skb->len.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teranetics TN2020 is compliant with IEEE 802.3an 10 Gigabit.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
bpf: introduce bpf_skb_vlan_push/pop() helpers
Let TC+eBPF programs call skb_vlan_push/pop via helpers.
v1->v2:
- reworded commit log to better explain correctness of re-caching
and fixed a comparison of mixed endianness (suggested by Eric)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve the accuracy of timing in test_bpf and add two stress tests:
- {skb->data[0], get_smp_processor_id} repeated 2k times
- {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68
The 1st test is useful to test the performance of the JIT implementation of
BPF_LD_ABS together with BPF_CALL instructions.
The 2nd test stresses the skb_vlan_push/pop logic together with skb->data
access via the BPF_LD_ABS insn, which checks that re-caching of skb->data is
done correctly.
In order to call bpf_skb_vlan_push() from test_bpf.ko we have to add
three EXPORT_SYMBOL_GPL()s.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.
The eBPF JIT is supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
(experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
in the program (re-caching is tbd).
These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.
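A sketch of what a tc-attached program could then do, assuming a
samples/bpf-style bpf_helpers.h that provides SEC() and stubs for the two new
helpers; section name, byte-order macro and return code are illustrative:

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <asm/byteorder.h>
  #include "bpf_helpers.h"        /* assumed to declare the helper stubs */

  SEC("classifier")
  int vlan_demo(struct __sk_buff *skb)
  {
          /* strip any outer tag, then push VLAN 100; the protocol
           * argument is expected in network byte order */
          bpf_skb_vlan_pop(skb);
          bpf_skb_vlan_push(skb, __constant_htons(ETH_P_8021Q), 100);
          return 0;
  }

  char _license[] SEC("license") = "GPL";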
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-07-17
This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
freescale, siena and dp83640.
Jacob provides several patches to clarify the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info(). It is okay to support
the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
filters.
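To illustrate the "upscaling" idea in general terms (the mapping below is a
made-up example, not taken from any of these drivers), a SIOCSHWTSTAMP handler
may widen a requested filter to the closest one the hardware supports and
report that choice back to user space:

  #include <linux/net_tstamp.h>

  struct hwtstamp_config config;  /* copied in from user space */

  switch (config.rx_filter) {
  case HWTSTAMP_FILTER_NONE:
          /* timestamping disabled */
          break;
  case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
  case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
          /* hardware can only timestamp all PTP v2 events: upscale, and
           * the copy-out of config tells user space what it really got */
          config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
          break;
  default:
          return -ERANGE;
  }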
Alex Duyck provides an igb patch to pull the time stamp from the fragment
before it gets added to the skb, to avoid a possible issue in which the
fragment can be less than IGB_RX_HDR_LEN due to the time stamp
being pulled after the copybreak check. He also provides an ixgbevf patch to
fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
the advantage that the fragment does not have to be modified after it is
added to the skb.
Fan provides patches for ixgbe/ixgbevf to set the receive hash type
based on receive descriptor RSS type.
Todd provides a fix for igb where a check for link on any media other
than copper was not detecting the link, since it was looking at the incorrect
PHY page (the page in use gets switched before the function that checks
link gets executed).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: bcmgenet: PHY initialization rework
This patch series reworks how we perform PHY initialization and resets in the
GENET driver. Although this contains mostly fixes, some of the changes are a
bit too intrusive to be backported to 'net' at the moment.
Some of the motivations behind these changes were to reduce the time spent
performing MDIO transactions, since it is better to perform them when we have
interrupts enabled. This reduces the bring-up time of GENET from ~600 msecs down
to ~8 msecs, and about the same time for suspend/resume.
Since I do not currently have a system which is not DT-aware, can you (Petri,
Jaedon) give this a try and confirm things keep working as expected?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have reworked the way we perform the PHY initialization, we
no longer need to differentiate between init time vs. non-init time
calls, just use a dev_info_once() print to print the PHY type.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently doing a full PHY initialization and even starting the
PHY state machine during bcmgenet_mii_init(), which is executed in the
driver's probe function. This is convenient for determining whether we can
attach to a proper PHY device, but comes at the expense of spending up to
10ms per MDIO transaction (to reach the waitqueue timeout), which slows
things down.
This also creates a situation where we end up attaching twice to the
PHY, which is not quite correct either.
Fix this by moving bcmgenet_mii_probe() into bcmgenet_open() and update
its error path accordingly.
Also avoid printing the message "attached PHY at address 1 [...]" every time
we bring the interface up or down by removing this print, since it duplicates
what the PHY driver already does for us.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our internal GPHY might be powered off before we attempt scanning the
MDIO bus and bind a driver to it. The way we are currently determining
whether a PHY is internal or not is done *after* we have successfully
matched its driver. If the PHY is powered down, it will not respond to
the MDIO bus, so we will not be able to bind a driver to it.
Our Device Tree for GENET interfaces specifies a "phy-mode" value:
"internal", which tells whether this interface uses an internal PHY or not.
If of_get_phy_mode() fails to parse the 'phy-mode' property, do an
additional manual lookup, and if we find "internal" set the
corresponding internal variable accordingly.
Replace all uses of phy_is_internal() with a check against
priv->internal_phy to avoid having to rely on whether or not
priv->phydev is set correctly.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently disabling the GPHY interface during bcmgenet_close(),
and attempting to power it back on during bcmgenet_open(). This works
fine the first time, because we called bcmgenet_mii_config(), which
took care of enabling the interface; however, bcmgenet_power_up() really
needs to power on the GPHY for correctness.
This will be particularly important as we want to move
bcmgenet_mii_probe() down to bcmgenet_open() to avoid seeing the "PHY
already attached" message.
Fixes: a642c4f790 ("net: bcmgenet: power up and down integrated GPHY when unused")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcmgenet_open()'s error path calls free_irq() with a dev_id argument
different from the one we used to call request_irq() with; this will
make us trip over the warning in kernel/irq/manage.c:__free_irq().
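A sketch of the required pairing (handler name and fields are placeholders,
not the driver's exact code):

  /* request_irq() and free_irq() must use the same dev_id cookie,
   * otherwise __free_irq() cannot find the action and warns. */
  ret = request_irq(priv->irq0, bcmgenet_isr0, 0, dev->name, priv);
  if (ret)
          return ret;

  /* ... later, in the error/teardown path ... */
  free_irq(priv->irq0, priv);     /* same cookie given to request_irq() */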
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently issuing multiple PHY resets during a suspend/resume,
first during bcmgenet_power_up() which does a hardware reset, then a
software reset by calling bcmgenet_mii_reset(). This is both unnecessary
and can take as long as 10ms per MDIO transaction while we re-apply
workarounds, because we do not yet have MDIO interrupts enabled.
phy_resume() takes care of re-applying our workarounds in case we need any,
and bcmgenet_power_up() does a PHY hardware reset; all of this is more
than enough to guarantee that the PHY operates correctly.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood says:
====================
stmmac clean up for 4.3 part1
This patch set continues the conversion of the dwmac glue layers
to more proper platform drivers. The first part of the patch set
cleans up stmmac_platform a bit. Refactors code from the common
probe function and exports two functions that will be used in
the dwmac-* drivers.
The second part converts two simple dwmac-* drivers to have their
own probe function and use the exported functions. This brings
us closer to the point where stmmac_platform is only a library of
common functions for the dwmac-* drivers to use.
The next planned steps are:
* add probe functions to the rest of the dwmac-* drivers
* move probe function in stmmac_platform to dwmac-generic
* remove struct stmmac_of_data and let those drivers
that actually need match data handle it themselves
* clean up include/linux/stmmac.h
Note that this patch set has only been tested on lpc18xx so
testing on other platforms is greatly appreciated.
Previous parts can be found here:
http://www.spinics.net/lists/netdev/msg328997.html
http://www.spinics.net/lists/netdev/msg329932.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Both of these fields are unused and have been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By using a few functions from stmmac_platform we can now create
a proper probe function in this driver. By doing so we can drop
the OF match data and simplify the overall driver.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By using a few functions from stmmac_platform we can now create
a proper probe function in this driver. By doing so we can drop
the OF match data and simplify the overall driver.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export stmmac_probe_config_dt() and stmmac_get_platform_resources()
so they can be used in the dwmac-* drivers themselves. This will
allow us to build more flexible and standalone drivers which just
use stmmac_platform as a library for setup functions.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since stmmac_probe_config_dt() allocates the platform data structure
it is cleaner if it just returned this structure directly. This
function will later be used in the probe function in dwmac-* drivers.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor all code that deals with platform resources into its
own get function. This function will later be used in the probe
function in dwmac-* drivers.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor code to clearly separate probing non-dt versus dt. In the
non-dt case platform data must be supplied to probe successfully.
For dt the platform data structure is created and match data is
copied into it. Note that support for supplying platform data in
dt via AUXDATA is dropped, as no users in mainline do this.
This change will allow dt dwmac-* drivers to call the config_dt()
function from probe to create the needed platform data struct and
retrieve common dt properties.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By using of_device_get_match_data(), the code that retrieves
match data can be simplified quite a bit.
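As a rough illustration (everything apart from of_device_get_match_data()
itself is an assumption, not the patch):

  #include <linux/of_device.h>
  #include <linux/platform_device.h>

  static int dwmac_probe_sketch(struct platform_device *pdev)
  {
          const struct stmmac_of_data *data;

          /* one call replaces of_match_device() plus the NULL checks */
          data = of_device_get_match_data(&pdev->dev);
          if (!data)
                  return -EINVAL;

          /* ... use the match data ... */
          return 0;
  }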
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sierra Wireless MC7305/MC7355 with USB ID 1199:9041 also provide a
second QMI/network interface like the MC73xx with USB ID 1199:68c0 on
USB interface #10 when used in the appropriate USB configuration.
Add the corresponding QMI_FIXED_INTF entry to the qmi_wwan driver.
Please note that the second QMI/network interface is not working for
early MC73xx firmware versions like 01.08.x as the device does not
respond to QMI messages on the second /dev/cdc-wdm port.
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TCCR.TSRQn bit may get cleared after TCCR is read, so that the TCCR write
would get skipped. We don't need to check this bit before setting it.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: separate link and link aggregation layer
This is the first batch of a longer series that has two main objectives:
o Finer lock granularity during message sending and reception,
especially regarding usage of the node spinlock.
o Better separation between the link layer implementation and the link
aggregation layer, represented by node.c::struct tipc_node.
Hopefully these changes also make this part of the code somewhat easier
to comprehend and maintain.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We convert packet/message reception according to the same principle
we have been using for message sending and timeout handling:
We move the function tipc_rcv() to node.c, hence handling the initial
packet reception at the link aggregation level. The function grabs
the node lock, selects the receiving link, and accesses it via a new
call tipc_link_rcv(). This function appends buffers to the input
queue for delivery upwards, but it may also append outgoing packets
to the xmit queue, just as we do during regular message sending. The
latter will happen when buffers are forwarded from the link backlog,
or when retransmission is requested.
Upon return of this function, and after having released the node lock,
tipc_rcv() delivers/transmits the contents of those queues, but it may
also perform actions such as link activation or reset, as indicated by
the return flags from the link.
This reduces the number of cpu cycles spent inside the node spinlock,
and reduces contention on that lock.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic for determining when a node is permitted to establish
and maintain contact with its peer node becomes non-trivial in the
presence of multiple parallel links that may come and go independently.
A known failure scenario is that one endpoint registers both its links
to the peer as lost, cleans up its binding table, and prepares for a table
update once contact is re-established, while the other endpoint may
see its links reset and re-established one by one, hence seeing
no need to re-synchronize the binding table. To avoid this, a node
must not allow re-establishing contact until it has confirmation that
even the peer has lost both links.
Currently, the mechanism for handling this consists of setting and
resetting two state flags from different locations in the code. This
solution is hard to understand and maintain. A closer analysis even
reveals that it is not completely safe.
In this commit we do instead introduce an FSM that keeps track of
the conditions for when the node can establish and maintain links.
It has six states and four events, and is strictly based on explicit
knowledge about the node's own and the peer node's contact states.
Only events leading to state change are shown as edges in the figure
below.
+--------------+
| SELF_UP/ |
+---------------->| PEER_COMING |-----------------+
SELF_ | +--------------+ |PEER_
ESTBL_ | | |ESTBL_
CONTACT| SELF_LOST_CONTACT | |CONTACT
| v |
| +--------------+ |
| PEER_ | SELF_DOWN/ | SELF_ |
| LOST_ +--| PEER_LEAVING |<--+ LOST_ v
+-------------+ CONTACT | +--------------+ | CONTACT +-----------+
| SELF_DOWN/ |<----------+ +----------| SELF_UP/ |
| PEER_DOWN |<----------+ +----------| PEER_UP |
+-------------+ SELF_ | +--------------+ | PEER_ +-----------+
| LOST_ +--| SELF_LEAVING/|<--+ LOST_ A
| CONTACT | PEER_DOWN | CONTACT |
| +--------------+ |
| A |
PEER_ | PEER_LOST_CONTACT | |SELF_
ESTBL_ | | |ESTBL_
CONTACT| +--------------+ |CONTACT
+---------------->| PEER_UP/ |-----------------+
| SELF_COMING |
+--------------+
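Read off the figure, the transitions can be sketched as a table (identifier
names here are illustrative; the patch may name states and events differently):

  enum node_state {
          SELF_DOWN_PEER_DOWN,
          SELF_UP_PEER_COMING,
          PEER_UP_SELF_COMING,
          SELF_UP_PEER_UP,
          SELF_DOWN_PEER_LEAVING,
          SELF_LEAVING_PEER_DOWN,
  };

  enum node_event {
          SELF_ESTABL_CONTACT,
          SELF_LOST_CONTACT,
          PEER_ESTABL_CONTACT,
          PEER_LOST_CONTACT,
  };

  /* Only the edges shown in the figure change state; anything else keeps it. */
  static enum node_state next_state(enum node_state s, enum node_event e)
  {
          switch (s) {
          case SELF_DOWN_PEER_DOWN:
                  if (e == SELF_ESTABL_CONTACT) return SELF_UP_PEER_COMING;
                  if (e == PEER_ESTABL_CONTACT) return PEER_UP_SELF_COMING;
                  break;
          case SELF_UP_PEER_COMING:
                  if (e == PEER_ESTABL_CONTACT) return SELF_UP_PEER_UP;
                  if (e == SELF_LOST_CONTACT)   return SELF_DOWN_PEER_LEAVING;
                  break;
          case PEER_UP_SELF_COMING:
                  if (e == SELF_ESTABL_CONTACT) return SELF_UP_PEER_UP;
                  if (e == PEER_LOST_CONTACT)   return SELF_LEAVING_PEER_DOWN;
                  break;
          case SELF_UP_PEER_UP:
                  if (e == SELF_LOST_CONTACT)   return SELF_DOWN_PEER_LEAVING;
                  if (e == PEER_LOST_CONTACT)   return SELF_LEAVING_PEER_DOWN;
                  break;
          case SELF_DOWN_PEER_LEAVING:
                  if (e == PEER_LOST_CONTACT)   return SELF_DOWN_PEER_DOWN;
                  break;
          case SELF_LEAVING_PEER_DOWN:
                  if (e == SELF_LOST_CONTACT)   return SELF_DOWN_PEER_DOWN;
                  break;
          }
          return s;
  }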
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In our effort to move control of the links to the link aggregation
layer, we move the periodic link supervision timer to struct tipc_node.
The new timer is shared between all links belonging to the node, thus
saving resources, while still kicking the FSM on both its pertaining
links at each expiration.
The current link timer and corresponding functions are removed.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We create a second, simpler, link timer function, tipc_link_timeout().
The new function makes use of the new FSM function introduced in the
previous commit, and just like it, takes a buffer queue as parameter.
It returns an event bit field and potentially a link protocol packet
to the caller.
The existing timer function, link_timeout(), is still needed for a
while, so we redesign it to become a wrapper around the new function.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link FSM implementation is currently unnecessarily complex.
It sometimes checks for conditional state outside the FSM data
before deciding the next state, and often performs actions directly
inside the FSM logic.
In this commit, we create a second, simpler FSM implementation,
that as far as possible acts only on states and events that it is
strictly defined for, and postpone any actions until it is finished
with its decisions. It also returns an event flag field and a
buffer queue which may potentially contain a protocol message to
be sent by the caller.
Unfortunately, we cannot yet make the FSM "clean", in the sense
that its decisions are only based on FSM state and event, and that
state changes happen only here. That will have to wait until the
activate/reset logic has been cleaned up in a future commit.
We also rename the link states as follows:
WORKING_WORKING -> TIPC_LINK_WORKING
WORKING_UNKNOWN -> TIPC_LINK_PROBING
RESET_UNKNOWN -> TIPC_LINK_RESETTING
RESET_RESET -> TIPC_LINK_ESTABLISHING
The existing FSM function, link_state_event(), is still needed for
a while, so we redesign it to make use of the new function.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation for later changes, we introduce a new function
tipc_link_build_proto_msg(). Instead of actually sending the created
protocol message, it only creates it and adds it to the head of a
skb queue provided by the caller.
Since we still need the existing function tipc_link_protocol_xmit()
for a while, we redesign it to make use of the new function.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The status flag LINK_STOPPED is not needed any more, since the
mechanism for delayed deletion of links has been removed.
Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
because we can just as well start the link timer directly from
inside tipc_link_create().
We eliminate these flags in this commit.
Instead of the above flags, we now introduce three new link modes,
TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
blocks timer-driven protocol messages from being sent out, as well as any change
to the link FSM. Since the modes are mutually exclusive, we convert
them to state values, and rename the 'flags' field in struct tipc_link
to 'exec_mode'.
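Sketched as an enum (value names come from the text above; the exact
definition and placement in the patch are an assumption):

  enum {
          TIPC_LINK_OPEN,         /* accept any packet */
          TIPC_LINK_BLOCKED,      /* accept nothing; no timer-driven protocol
                                   * messages, no link FSM changes */
          TIPC_LINK_TUNNEL,       /* accept tunnelled packets only */
  };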
Finally, we move the #defines for link FSM states and events from link.h
into enums inside the file link.c, which is the real usage scope of
these definitions.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, message sending is performed through a deep call chain,
where the node spinlock is grabbed and held during a significant
part of the transmission time. This is clearly detrimental to
overall throughput performance; it would be better if we could send
the message after the spinlock has been released.
In this commit, we do instead let the call return back up the stack after
the buffer chain has been added to the transmission queue, whereafter
clones of the buffers are transmitted to the device layer outside the
spinlock scope.
As a further step in our effort to separate the roles of the node
and link entities we also move the function tipc_link_xmit() to
node.c, and rename it to tipc_node_xmit().
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the function tipc_link_xmit() is given a buffer list for
transmission, it currently consumes the list both when transmission
is successful and when it fails, except for the special case when
it encounters link congestion.
This behavior is inconsistent, and needs to be corrected if we want
to avoid problems in later commits in this series.
In this commit, we change this to let the function consume the list
only when transmission is successful, and leave the list with the
sender in all other cases. We also modify the socket code so that
it adapts to this change, i.e., purges the list when a non-congestion
error code is returned.
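A hypothetical call-site sketch of the new contract (the argument list and
queue name are illustrative, not the real prototype):

  rc = tipc_link_xmit(link, &pkts);     /* hypothetical call site */
  if (rc && rc != -ELINKCONG)
          __skb_queue_purge(&pkts);     /* list stayed with the sender */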
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct tipc_node currently holds two arrays of link pointers; one,
indexed by bearer identity, which contains all links irrespective of
current state, and one two-slot array for the currently active link
or links. The latter array contains direct pointers into the elements
of the former. This has the effect that we cannot know the bearer id of
a link when accessing it via the "active_links[]" array without actually
dereferencing the pointer, something we want to avoid in some cases.
In this commit, we do instead store the bearer identity in the
"active_links" array, and use this as an index to find the right element
in the overall link entry array. This change should be seen as a
preparation for the later commits in this series.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At present, the link input queue and the name distributor receive
queues are fields aggregated in struct tipc_link. This is a hazard,
because a link might be deleted while a receiving socket still keeps
a reference to one of the queues.
This commit fixes this bug. However, rather than adding yet another
reference counter to the critical data path, we move the two queues
to safe ground inside struct tipc_node, which is already protected, and
let the link code only handle references to the queues. This is also
in line with planned later changes in this area.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a step towards turning links into node internal entities, we move the
creation of links from the neighbor discovery logic to the node's link
control logic.
We also create an additional entry for the link's media address in the
newly introduced struct tipc_link_entry, since this is where it is
needed in the upcoming commits. The current copy in struct tipc_link
is kept for now, but will be removed later.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.
We now group those into a new struct 'tipc_link_entry', and introduce
a single array consisting of such entries. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.
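An illustrative sketch of the grouping (the exact field set may differ; a
later commit in this series also adds the link's media address to the entry):

  struct tipc_link_entry {
          struct tipc_link *link;       /* link instance, if created */
          u32               mtu;        /* usable MTU for this link */
  };

  /* inside struct tipc_node: one entry per bearer replaces the two
   * parallel arrays of link pointers and MTUs */
  struct tipc_link_entry links[MAX_BEARERS];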
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The keystone qmss raises an interrupt when a packet arrives at the
receive queue. The only control available to keep the interrupt from
happening is to keep the free descriptor queue (FDQ) empty on the receive
side. So the filling of descriptors into the FDQ has to happen after the
request_irq() call is made as part of knav_queue_enable_notify(). So
move the function netcp_rxpool_refill() after this call.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but with this policy the
same MAC address can still end up on two slaves through the following steps:
1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
bond0 has the same MAC address as eth0; it is MAC1.
2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
eth1 is backup, eth1 has MAC2.
3) ifconfig eth0 down
eth1 becomes the active slave; the bond will swap the MACs of eth0 and eth1,
so eth1 has MAC1, and eth0 has MAC2.
4) ifconfig eth1 down
there is no active slave, and eth1 still has MAC1 while eth0 has MAC2.
5) ifconfig eth0 up
eth0 becomes the active slave again, and the bond sets eth0 to MAC1.
Something is wrong here: if you then set eth1 up, eth0 and eth1 will have the
same MAC address, which breaks this policy for ACTIVE_BACKUP mode.
This patch fixes the problem by finding the old active slave and
swapping MAC addresses with it before changing the active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'linux-can-fixes-for-4.2-20150716' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2015-07-16
this is a pull request of 2 patches by Stefan Agner. He fixes the resume
operation in the mcp251x driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman says:
====================
switchdev: avoid duplicate packet forwarding
v3:
- Per Nicolas Dichtel review: remove errant empty union.
v2:
- Per davem review: in sk_buff, union fwd_mark with secmark to save space
since features appear to be mutually exclusive.
- Per Simon Horman review:
- fix grammar in switchdev.txt wrt fwd_mark
- remove some unrelated changes that snuck in
v1:
This patchset was previously submitted as RFC. No changes from the last
version (v2) sent under RFC. Including RFC version history here for reference.
RFC v2:
- s/fwd_mark/offload_fwd_mark
- use consume_skb rather than kfree_skb when dropping pkt on egress.
- Use Jiri's suggestion to use ifindex of one of the ports in a group
as the mark for all the ports in the group. This can be done with
no additional storage (no hashtable from v1). To pull it off, we
need some simple recursive routines to walk the netdev tree ensuring
all leaves in the tree (ports) in the same group (e.g. bridge)
belonging to the same switch device will have the same offload fwd mark.
Maybe someone sees a better design for the recursive routines? They're
not too bad, and should cover the stacked driver cases.
RFC v1:
With switchdev support for offloading L2/L3 forwarding data path to a
switch device, we have a general problem where both the device and the
kernel may forward the packet, resulting in duplicate packets on the wire.
Anytime a packet is forwarded by the device and a copy is sent to the CPU,
there is potential for duplicate forwarding, as the kernel may also do a
forwarding lookup and send the packet on the wire.
The specific problem this patch series is interested in solving is avoiding
duplicate packets on bridged ports. There was a previous RFC from Roopa
(http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
problem, but it didn't solve the problem of mixed ports in the bridge from
different devices; there was no way to exclude some ports from forwarding
and include others. This RFC solves that problem by tagging the ingressing
packet with a unique mark, then comparing the packet mark with the
egress port mark and skipping forwarding when there is a match. For the mixed
ports bridge case, only those ports with matching marks are skipped.
The switchdev port driver must do two things:
1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port). This is done when the port netdev
is registered or if the port's group membership changes (joins/leaves
a bridge, for example).
2) On packet ingress from a port, mark the skb with the ingress port's
fwd_mark. If the device supports it, it's useful to only mark skbs
which were already forwarded by the device. If the device does not
support such an indication, all skbs can be marked, even if they're
local dst.
Two new 32-bit fields are added to struct sk_buff and struct net_device to
hold the fwd_mark. I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
tried using skb->mark for this purpose, but ebtables can overwrite the
skb->mark before the bridge gets it, so that will not work.
In general, this fwd_mark can be used for any case where a packet is
forwarded by the device and a copy is sent to the CPU, to avoid the kernel
re-forwarding the packet. sFlow is another use-case that comes to mind,
but I haven't explored the details.
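Roughly, the egress-time comparison described above looks like this (a sketch
only; the helper name is invented, the offload_fwd_mark fields are the ones
this series adds):

  static bool skb_already_forwarded(const struct sk_buff *skb,
                                    const struct net_device *out_dev)
  {
  #ifdef CONFIG_NET_SWITCHDEV
          /* the device already forwarded this packet within the group */
          return skb->offload_fwd_mark &&
                 skb->offload_fwd_mark == out_dev->offload_fwd_mark;
  #else
          return false;
  #endif
  }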
====================
Signed-off-by: David S. Miller <davem@davemloft.net>