Commit Graph

63899 Commits

Author SHA1 Message Date
Jakub Kicinski
8e57158683 This feature/cleanup patchset is an updated version of the pull request
of Feb 2nd (batadv-next-pullrequest-20210202) and includes the
 following patches:
 
  - Bump version strings, by Simon Wunderlich (added commit log)
 
  - Drop publication years from copyright info, by Sven Eckelmann
    (replaced the previous patch which updated copyright years, as per
     our discussion)
 
  - Avoid sizeof on flexible structure, by Sven Eckelmann (unchanged)
 
  - Fix names for kernel-doc blocks, by Sven Eckelmann (unchanged)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmAhbUkWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoR7yEACMhdCzMoVPJQwOYWe5p6QwgaBz
 85QRT56x9gpFhV5dXCyg7DK3Qt2JRfTfBtMOeZQatFxcACYaGunZSS1L6gDVATpT
 5vB+5UwQK8AW7cjkwVS6vJWW9Wmll9IwNe0/1DGvSBjEWwmU/wlRzAPa2uAT2uw5
 AxrErEuXo5M3U4mDfJckVD4XA+pKkd9ylLEQ7llcZA4rOaTsr5sNAy5mbxO5EXD9
 yP1vq9BoXtsf0FyZbQrMnYre6teAkxVxrvkTn6v44vsFKsi69JaxDiKQ4T7vakZR
 1rIQq/8XbkH0dQXEu4C2FtWTzrg9P4KNHBPiT06b+KxlROpfYivcWhIqlofmW2FJ
 5bWlumyNg3WoUmaM9kLGTFHagAp8M968W8zsI5fLi0meX0pEzFe/E1iBfkQaYyHh
 R8Xpt7z1ORYUavFhVXqMw8x92WOLWmdFZjSGaW6sNyCxMFIU7qR16gYcXmucrJyU
 RY6o159D9AKVOdX/GdX50mvyHjn/lC3KUEGQLUVxXMdJHpj7avn7aEiCWHUvgAxQ
 jIHLOy0CRsUlCFPmzSqwGs3dAJEZeFbvqMwZjFJ/UXlKwBgPVMy76wIUk57+FWKz
 3DTcg+6RIiW+bWazn/Hdbn9JXUNZnp5C6oH62GFPw7G6ywfe/yPex4qubF7feyog
 T9H6ho+KW3SmRsHY5Q==
 =fX7f
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20210208' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset is an updated version of the pull request
of Feb 2nd (batadv-next-pullrequest-20210202) and includes the
following patches:

 - Bump version strings, by Simon Wunderlich (added commit log)

 - Drop publication years from copyright info, by Sven Eckelmann
   (replaced the previous patch which updated copyright years, as per
    our discussion)

 - Avoid sizeof on flexible structure, by Sven Eckelmann (unchanged)

 - Fix names for kernel-doc blocks, by Sven Eckelmann (unchanged)

* tag 'batadv-next-pullrequest-20210208' of git://git.open-mesh.org/linux-merge:
  batman-adv: Fix names for kernel-doc blocks
  batman-adv: Avoid sizeof on flexible structure
  batman-adv: Drop publication years from copyright info
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20210208165938.13262-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-08 11:32:40 -08:00
Jakub Kicinski
c273a20c30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Remove indirection and use nf_ct_get() instead from nfnetlink_log
   and nfnetlink_queue, from Florian Westphal.

2) Add weighted random twos choice least-connection scheduling for IPVS,
   from Darby Payne.

3) Add a __hash placeholder in the flow tuple structure to identify
   the field to be included in the rhashtable key hash calculation.

4) Add a new nft_parse_register_load() and nft_parse_register_store()
   to consolidate register load and store in the core.

5) Statify nft_parse_register() since it has no more module clients.

6) Remove redundant assignment in nft_cmp, from Colin Ian King.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: remove redundant assignment of variable err
  netfilter: nftables: statify nft_parse_register()
  netfilter: nftables: add nft_parse_register_store() and use it
  netfilter: nftables: add nft_parse_register_load() and use it
  netfilter: flowtable: add hash offset field to tuple
  ipvs: add weighted random twos choice algorithm
  netfilter: ctnetlink: remove get_ct indirection
====================

Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 15:34:23 -08:00
Xie He
21c85974aa net/packet: Improve the comment about LL header visibility criteria
The "dev_has_header" function, recently added in
commit d549699048 ("net/packet: fix packet receive on L3 devices
without visible hard header"),
is more accurate as criteria for determining whether a device exposes
the LL header to upper layers, because in addition to dev->header_ops,
it also checks for dev->header_ops->create.

When transmitting an skb on a device, dev_hard_header can be called to
generate an LL header. dev_hard_header will only generate a header if
dev->header_ops->create is present.

Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210205224124.21345-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:59:28 -08:00
Vladimir Oltean
a324d3d48f net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfaces
Given the following topology, and focusing only on Box A:

         Box A
         +----------------------------------+
         | Board 1         br0              |
         |             +---------+          |
         |            /           \         |
         |            |           |         |
         |            |         bond0       |
         |            |        +-----+      |
         |192.168.1.1 |       /       \     |
         |  eno0     swp0    swp1    swp2   |
         +---|--------|-------|-------|-----+
             |        |       |       |
             +--------+       |       |
               Cable          |       |
                         Cable|       |Cable
               Cable          |       |
             +--------+       |       |
             |        |       |       |
         +---|--------|-------|-------|-----+
         |  eno0     swp0    swp1    swp2   |
         |192.168.1.2 |       \       /     |
         |            |        +-----+      |
         |            |         bond0       |
         |            |           |         |
         |            \           /         |
         |             +---------+          |
         | Board 2         br0              |
         +----------------------------------+
         Box B

The assisted_learning_on_cpu_port logic will see that swp0 is bridged
with a "foreign interface" (bond0) and will therefore install all
addresses learnt by the software bridge towards bond0 (including the
address of eno0 on Box B) as static addresses towards the CPU port.

But that's not what we want - bond0 is not really a "foreign interface"
but one we can offload including L2 forwarding from/towards it. So we
need to refine our logic for assisted learning such that, whenever we
see an address learnt on a non-DSA interface, we search through the tree
for any port that offloads that non-DSA interface.

Some confusion might arise as to why we search through the whole tree
instead of just the local switch returned by dsa_slave_dev_lower_find.
Or a different angle of the same confusion: why does
dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
instead of the whole list of bridged DSA ports?

To answer the second question, it should be enough to install the static
FDB entry on the CPU port of a single switch in the tree, because
dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
switches in the tree get notified of that address, and add the entry
themselves using dsa_towards_port().

This should help understand the answer to the first question: the port
returned by dsa_slave_dev_lower_find may not be on the same switch as
the ports that offload the LAG. Nonetheless, if the driver implements
.crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
still isn't any reason for trapping addresses learnt on the remote LAG
towards the CPU, and we should prevent that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:51:51 -08:00
Vladimir Oltean
46acf7bdbc Revert "net: ipv4: handle DSA enabled master network devices"
This reverts commit 728c02089a.

Since 2015 DSA has gained more integration with the network stack, we
can now have the same functionality without explicitly open-coding for
it:
- It now opens the DSA master netdevice automatically whenever a user
  netdevice is opened.
- The master and switch interfaces are coupled in an upper/lower
  hierarchy using the netdev adjacency lists.

In the nfsroot example below, the interface chosen by autoconfig was
swp3, and every interface except that and the DSA master, eth1, was
brought down afterwards:

[    8.714215] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    8.978041] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.246134] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.486203] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.512827] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
[    9.521047] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control off
[    9.530382] device eth1 entered promiscuous mode
[    9.535452] DSA: tree 0 setup
[    9.539777] printk: console [netcon0] enabled
[    9.544504] netconsole: network logging started
[    9.555047] fsl_enetc 0000:00:00.2 eth1: configuring for fixed/internal link mode
[    9.562790] fsl_enetc 0000:00:00.2 eth1: Link is Up - 1Gbps/Full - flow control off
[    9.564661] 8021q: adding VLAN 0 to HW filter on device bond0
[    9.637681] fsl_enetc 0000:00:00.0 eth0: PHY [0000:00:00.0:02] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL)
[    9.655679] fsl_enetc 0000:00:00.0 eth0: configuring for inband/sgmii link mode
[    9.666611] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
[    9.676216] 8021q: adding VLAN 0 to HW filter on device swp0
[    9.682086] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
[    9.690700] 8021q: adding VLAN 0 to HW filter on device swp1
[    9.696538] mscc_felix 0000:00:00.5 swp2: configuring for inband/qsgmii link mode
[    9.705131] 8021q: adding VLAN 0 to HW filter on device swp2
[    9.710964] mscc_felix 0000:00:00.5 swp3: configuring for inband/qsgmii link mode
[    9.719548] 8021q: adding VLAN 0 to HW filter on device swp3
[    9.747811] Sending DHCP requests ..
[   12.742899] mscc_felix 0000:00:00.5 swp1: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.743828] mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
[   12.747062] IPv6: ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
[   12.755216] fsl_enetc 0000:00:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.766603] IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready
[   12.783188] mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.785354] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   12.799535] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
[   13.803141] mscc_felix 0000:00:00.5 swp3: Link is Up - 1Gbps/Full - flow control rx/tx
[   13.811646] IPv6: ADDRCONF(NETDEV_CHANGE): swp3: link becomes ready
[   15.452018] ., OK
[   15.470336] IP-Config: Got DHCP answer from 10.0.0.1, my address is 10.0.0.39
[   15.477887] IP-Config: Complete:
[   15.481330]      device=swp3, hwaddr=00:04:9f:05:de:0a, ipaddr=10.0.0.39, mask=255.255.255.0, gw=10.0.0.1
[   15.491846]      host=10.0.0.39, domain=(none), nis-domain=(none)
[   15.498429]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
[   15.498481]      nameserver0=8.8.8.8
[   15.627542] fsl_enetc 0000:00:00.0 eth0: Link is Down
[   15.690903] mscc_felix 0000:00:00.5 swp0: Link is Down
[   15.745216] mscc_felix 0000:00:00.5 swp1: Link is Down
[   15.800498] mscc_felix 0000:00:00.5 swp2: Link is Down

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Vladimir Oltean
ea92000d54 Revert "net: Have netpoll bring-up DSA management interface"
This reverts commit 1532b97784.

The above commit is good and it works, however it was meant as a bugfix
for stable kernels and now we have more self-contained ways in DSA to
handle the situation where the DSA master must be brought up.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Vladimir Oltean
c0a8a9c274 net: dsa: automatically bring user ports down when master goes down
This is not fixing any actual bug that I know of, but having a DSA
interface that is up even when its lower (master) interface is down is
one of those things that just do not sound right.

Yes, DSA checks if the master is up before actually bringing the
user interface up, but nobody prevents bringing the master interface
down immediately afterwards... Then the user ports would attempt
dev_queue_xmit on an interface that is down, and wonder what's wrong.

This patch prevents that from happening. NETDEV_GOING_DOWN is the
notification emitted _before_ the master actually goes down, and we are
protected by the rtnl_mutex, so all is well.

For those of you reading this because you were doing switch testing
such as latency measurements for autonomously forwarded traffic, and you
needed a controlled environment with no extra packets sent by the
network stack, this patch breaks that, because now the user ports go
down too, which may shut down the PHY etc. But please don't do it like
that, just do instead:

tc qdisc add dev eno2 clsact
tc filter add dev eno2 egress flower action drop

Tested with two cascaded DSA switches:
$ ip link set eno2 down
sja1105 spi2.0 sw0p2: Link is Down
mscc_felix 0000:00:00.5 swp0: Link is Down
fsl_enetc 0000:00:00.2 eno2: Link is Down

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Vladimir Oltean
9d5ef190e5 net: dsa: automatically bring up DSA master when opening user port
DSA wants the master interface to be open before the user port is due to
historical reasons. The promiscuity of interfaces that are down used to
have issues, as referenced Lennert Buytenhek in commit df02c6ff2e
("dsa: fix master interface allmulti/promisc handling").

The bugfix mentioned there, commit b6c40d68ff ("net: only invoke
dev->change_rx_flags when device is UP"), was basically a "don't do
that" approach to working around the promiscuity while down issue.

Further work done by Vlad Yasevich in commit d2615bf450 ("net: core:
Always propagate flag changes to interfaces") has resolved the
underlying issue, and it is strictly up to the DSA and 8021q drivers
now, it is no longer mandated by the networking core that the master
interface must be up when changing its promiscuity.

From DSA's point of view, deciding to error out in dsa_slave_open
because the master isn't up is
(a) a bad user experience and
(b) knocking at an open door.
Even if there still was an issue with promiscuity while down, DSA could
still just open the master and avoid it.

Doing it this way has the additional benefit that user space can now
remove DSA-specific workarounds, like systemd-networkd with BindCarrier:
https://github.com/systemd/systemd/issues/7478

And we can finally remove one of the 2 bullets in the "Common pitfalls
using DSA setups" chapter.

Tested with two cascaded DSA switches:

$ ip link set sw0p2 up
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 1Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp0: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
8021q: adding VLAN 0 to HW filter on device swp0
sja1105 spi2.0 sw0p2: configuring for phy/rgmii-id link mode
IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:42:57 -08:00
Florian Westphal
3abc05d9ef mptcp: pm: add lockdep assertions
Add a few assertions to make sure functions are called with the needed
locks held.
Two functions gain might_sleep annotations because they contain
conditional calls to functions that sleep.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 14:35:47 -08:00
Kevin Hao
3f6e687dff net: Introduce {netdev,napi}_alloc_frag_align()
In the current implementation of {netdev,napi}_alloc_frag(), it doesn't
have any align guarantee for the returned buffer address, But for some
hardwares they do require the DMA buffer to be aligned correctly,
so we would have to use some workarounds like below if the buffers
allocated by the {netdev,napi}_alloc_frag() are used by these hardwares
for DMA.
    buf = napi_alloc_frag(really_needed_size + align);
    buf = PTR_ALIGN(buf, align);

These codes seems ugly and would waste a lot of memories if the buffers
are used in a network driver for the TX/RX. We have added the align
support for the page_frag functions, so add the corresponding
{netdev,napi}_frag functions.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 11:57:28 -08:00
Zheng Yongjun
a64566a22b net: sched: Return the correct errno code
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Link: https://lore.kernel.org/r/20210204073950.18372-1-zhengyongjun3@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 11:15:28 -08:00
Zheng Yongjun
247b557ee5 dccp: Return the correct errno code
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Link: https://lore.kernel.org/r/20210204072820.17723-1-zhengyongjun3@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 11:15:28 -08:00
Xu Wang
1697291dae net: bridge: mcast: Use ERR_CAST instead of ERR_PTR(PTR_ERR())
Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).

net/bridge/br_multicast.c:1246:9-16: WARNING: ERR_CAST can be used with mp
Generated by: scripts/coccinelle/api/err_cast.cocci

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Link: https://lore.kernel.org/r/20210204070549.83636-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 10:51:01 -08:00
Sven Eckelmann
25d81f9307 batman-adv: Fix names for kernel-doc blocks
kernel-doc can only correctly identify the documented function or struct
when the name in the first kernel-doc line references it. But some of the
kernel-doc blocks referenced a different function/struct then it actually
documented.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-02-06 09:22:45 +01:00
Sven Eckelmann
576fb6713b batman-adv: Avoid sizeof on flexible structure
The batadv_dhcp_packet is used to read in parts of the DHCP packet and
extract relevant information for the distributed arp table. But the
structure contained the flexible member "options" which is no where used in
the code.

A sizeof on this kind of type would return the size of everything except
the flexible member. But sparse will detect this kind of sizeof and warn
with

  warning: using sizeof on a flexible structure

This can be avoided by dropping the unused flexible member.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-02-06 09:22:44 +01:00
Sven Eckelmann
cfa55c6d47 batman-adv: Drop publication years from copyright info
The batman-adv source code was using the year of publication (to net-next)
as "last" year for the copyright statement. The whole source code mentioned
in the MAINTAINERS "BATMAN ADVANCED" section was handled as a single entity
regarding the publishing year.

This avoided having outdated (in sense of year information - not copyright
holder) publishing information inside several files. But since the simple
"update copyright year" commit (without other changes) in the file was not
well received in the upstream kernel, the option to not have a copyright
year (for initial and last publication) in the files are chosen instead.
More detailed information about the years can still be retrieved from the
SCM system.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-02-06 09:22:10 +01:00
Colin Ian King
626899a02e netfilter: nftables: remove redundant assignment of variable err
The variable err is being assigned a value that is never read,
the same error number is being returned at the error return
path via label err1.  Clean up the code by removing the assignment.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-06 02:43:07 +01:00
Simon Wunderlich
03fd39ed5a batman-adv: Start new development cycle
This version will contain all the (major or even only minor) changes for
Linux 5.12.

The version number isn't a semantic version number with major and minor
information. It is just encoding the year of the expected publishing as
Linux -rc1 and the number of published versions this year (starting at 0).

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2021-02-05 08:55:00 +01:00
Leon Romanovsky
edf597da02 netfilter: move handlers to net/ip_vs.h
Fix the following compilation warnings:
net/netfilter/ipvs/ip_vs_proto_tcp.c:147:1: warning: no previous prototype for 'tcp_snat_handler' [-Wmissing-prototypes]
  147 | tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
      | ^~~~~~~~~~~~~~~~
net/netfilter/ipvs/ip_vs_proto_udp.c:136:1: warning: no previous prototype for 'udp_snat_handler' [-Wmissing-prototypes]
  136 | udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
      | ^~~~~~~~~~~~~~~~

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:37:57 -08:00
Leon Romanovsky
04f00ab227 net/core: move gro function declarations to separate header
Fir the following compilation warnings:
 1031 | INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)

net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes]
  182 | INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
      |                                         ^~~~~~~~~~~~~~~~
net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes]
  320 | INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
      |                             ^~~~~~~~~~~~~~~~~
net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes]
  182 | INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
      |                                         ^~~~~~~~~~~~~~~~
net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes]
  320 | INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:37:57 -08:00
Leon Romanovsky
f9a4719cc1 ipv6: move udp declarations to net/udp.h
Fix the following compilation warning:

net/ipv6/udp.c:1031:30: warning: no previous prototype for 'udp_v6_early_demux' [-Wmissing-prototypes]
 1031 | INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
      |                              ^~~~~~~~~~~~~~~~~~
net/ipv6/udp.c:1072:29: warning: no previous prototype for 'udpv6_rcv' [-Wmissing-prototypes]
 1072 | INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
      |                             ^~~~~~~~~

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:37:57 -08:00
Xin Long
5d30c626b6 rxrpc: call udp_tunnel_encap_enable in rxrpc_open_socket
When doing encap_enable/increasing encap_needed_key, up->encap_enabled
is not set in rxrpc_open_socket(), and it will cause encap_needed_key
not being decreased in udpv6_destroy_sock().

This patch is to improve it by just calling udp_tunnel_encap_enable()
where it increases both UDP and UDPv6 encap_needed_key and sets
up->encap_enabled.

v4->v5:
  - add the missing '#include <net/udp_tunnel.h>', as David Howells
    noticed.

Acked-and-tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:37:15 -08:00
Xin Long
a4a600dd30 udp: call udp_encap_enable for v6 sockets when enabling encap
When enabling encap for a ipv6 socket without udp_encap_needed_key
increased, UDP GRO won't work for v4 mapped v6 address packets as
sk will be NULL in udp4_gro_receive().

This patch is to enable it by increasing udp_encap_needed_key for
v6 sockets in udp_tunnel_encap_enable(), and correspondingly
decrease udp_encap_needed_key in udpv6_destroy_sock().

v1->v2:
  - add udp_encap_disable() and export it.
v2->v3:
  - add the change for rxrpc and bareudp into one patch, as Alex
    suggested.
v3->v4:
  - move rxrpc part to another patch.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:37:14 -08:00
Alexander Lobakin
05656132a8 net: page_pool: simplify page recycling condition tests
pool_page_reusable() is a leftover from pre-NUMA-aware times. For now,
this function is just a redundant wrapper over page_is_pfmemalloc(),
so inline it into its sole call site.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:20:14 -08:00
Brian Vazquez
9c97921a51 net: fix building errors on powerpc when CONFIG_RETPOLINE is not set
This commit fixes the errores reported when building for powerpc:

 ERROR: modpost: "ip6_dst_check" [vmlinux] is a static EXPORT_SYMBOL
 ERROR: modpost: "ipv4_dst_check" [vmlinux] is a static EXPORT_SYMBOL
 ERROR: modpost: "ipv4_mtu" [vmlinux] is a static EXPORT_SYMBOL
 ERROR: modpost: "ip6_mtu" [vmlinux] is a static EXPORT_SYMBOL

Fixes: f67fbeaebd ("net: use indirect call helpers for dst_mtu")
Fixes: bbd807dfbf ("net: indirect call helpers for ipv4/ipv6 dst_check functions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Link: https://lore.kernel.org/r/20210204181839.558951-2-brianvv@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:06:27 -08:00
Marcelo Ricardo Leitner
7e3ce05e7f netlink: add tracepoint at NL_SET_ERR_MSG
Often userspace won't request the extack information, or they don't log it
because of log level or so, and even when they do, sometimes it's not
enough to know exactly what caused the error.

Netlink extack is the standard way of reporting erros with descriptive
error messages. With a trace point on it, we then can know exactly where
the error happened, regardless of userspace app. Also, we can even see if
the err msg was overwritten.

The wrapper do_trace_netlink_extack() is because trace points shouldn't be
called from .h files, as trace points are not that small, and the function
call to do_trace_netlink_extack() on the macros is not protected by
tracepoint_enabled() because the macros are called from modules, and this
would require exporting some trace structs. As this is error path, it's
better to export just the wrapper instead.

v2: removed leftover tracepoint declaration

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/4546b63e67b2989789d146498b13cc09e1fdc543.1612403190.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:05:59 -08:00
Danielle Ratson
7dc33f0914 ethtool: Expose the number of lanes in use
Currently, ethtool does not expose how many lanes are used when the
link is up.

After adding a possibility to advertise or force a specific number of
lanes, the lanes in use value can be either the maximum width of the port
or below.

Extend ethtool to expose the number of lanes currently in use for
drivers that support it.

For example:

$ ethtool -s swp1 speed 100000 lanes 4
$ ethtool -s swp2 speed 100000 lanes 4
$ ip link set swp1 up
$ ip link set swp2 up
$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE         Backplane ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
                                1000baseKX/Full
                                10000baseKR/Full
                                10000baseR_FEC
                                40000baseKR4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                40000baseLR4/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
                                50000baseCR2/Full
                                50000baseKR2/Full
                                100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
                                50000baseSR2/Full
                                10000baseCR/Full
                                10000baseSR/Full
                                10000baseLR/Full
                                10000baseER/Full
                                50000baseKR/Full
                                50000baseSR/Full
                                50000baseCR/Full
                                50000baseLR_ER_FR/Full
                                50000baseDR/Full
                                100000baseKR2/Full
                                100000baseSR2/Full
                                100000baseCR2/Full
                                100000baseLR2_ER2_FR2/Full
                                100000baseDR2/Full
                                200000baseKR4/Full
                                200000baseSR4/Full
                                200000baseLR4_ER4_FR4/Full
                                200000baseDR4/Full
                                200000baseCR4/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseT/Full
                                10000baseT/Full
                                1000baseKX/Full
                                1000baseKX/Full
                                10000baseKR/Full
                                10000baseR_FEC
                                40000baseKR4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                40000baseLR4/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
                                50000baseCR2/Full
                                50000baseKR2/Full
                                100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
                                50000baseSR2/Full
                                10000baseCR/Full
                                10000baseSR/Full
                                10000baseLR/Full
                                10000baseER/Full
                                200000baseKR4/Full
                                200000baseSR4/Full
                                200000baseLR4_ER4_FR4/Full
                                200000baseDR4/Full
                                200000baseCR4/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Advertised link modes:  100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 100000Mb/s
	Lanes: 4
	Duplex: Full
	Auto-negotiation: on
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Link detected: yes

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:37:29 -08:00
Danielle Ratson
c8907043c6 ethtool: Get link mode in use instead of speed and duplex parameters
Currently, when user space queries the link's parameters, as speed and
duplex, each parameter is passed from the driver to ethtool.

Instead, get the link mode bit in use, and derive each of the parameters
from it in ethtool.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:37:29 -08:00
Danielle Ratson
012ce4dd31 ethtool: Extend link modes settings uAPI with lanes
Currently, when auto negotiation is on, the user can advertise all the
linkmodes which correspond to a specific speed, but does not have a
similar selector for the number of lanes. This is significant when a
specific speed can be achieved using different number of lanes.  For
example, 2x50 or 4x25.

Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct
ethtool_link_settings' with lanes field in order to implement a new
lanes-selector that will enable the user to advertise a specific number
of lanes as well.

When auto negotiation is off, lanes parameter can be forced only if the
driver supports it. Add a capability bit in 'struct ethtool_ops' that
allows ethtool know if the driver can handle the lanes parameter when
auto negotiation is off, so if it does not, an error message will be
returned when trying to set lanes.

Example:

$ ethtool -s swp1 lanes 4
$ ethtool swp1
  Settings for swp1:
	Supported ports: [ FIBRE ]
        Supported link modes:   1000baseKX/Full
                                10000baseKR/Full
                                40000baseCR4/Full
				40000baseSR4/Full
				40000baseLR4/Full
                                25000baseCR/Full
                                25000baseSR/Full
				50000baseCR2/Full
                                100000baseSR4/Full
				100000baseCR4/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  40000baseCR4/Full
				40000baseSR4/Full
				40000baseLR4/Full
                                100000baseSR4/Full
				100000baseCR4/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Auto-negotiation: on
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Link detected: no

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:37:28 -08:00
Danielle Ratson
189e7a8d94 ethtool: Validate master slave configuration before rtnl_lock()
Create a new function for input validations to be called before
rtnl_lock() and move the master slave validation to that function.

This would be a cleanup for next patch that would add another validation
to the new function.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:37:28 -08:00
Vladimir Oltean
99b8202b17 net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored
The bridge emits VLAN filtering events and quite a few others via
switchdev with orig_dev = br->dev. After the blamed commit, these events
started getting ignored.

The point of the patch was to not offload switchdev objects for ports
that didn't go through dsa_port_bridge_join, because the configuration
is unsupported:
- ports that offload a bonding/team interface go through
  dsa_port_bridge_join when that bonding/team interface is later bridged
  with another switch port or LAG
- ports that don't offload LAG don't get notified of the bridge that is
  on top of that LAG.

Sadly, a check is missing, which is that the orig_dev is equal to the
bridge device. This check is compatible with the original intention,
because ports that don't offload bridging because they use a software
LAG don't have dp->bridge_dev set.

On a semi-related note, we should not offload switchdev objects or
populate dp->bridge_dev if the driver doesn't implement .port_bridge_join
either. However there is no regression associated with that, so it can
be done separately.

Fixes: 5696c8aedf ("net: dsa: Don't offload port attributes on standalone ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 18:25:49 -08:00
Brian Vazquez
bbd807dfbf net: indirect call helpers for ipv4/ipv6 dst_check functions
This patch avoids the indirect call for the common case:
ip6_dst_check and ipv4_dst_check

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 14:51:40 -08:00
Brian Vazquez
f67fbeaebd net: use indirect call helpers for dst_mtu
This patch avoids the indirect call for the common case:
ip6_mtu and ipv4_mtu

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 14:51:39 -08:00
Brian Vazquez
6585d7dc49 net: use indirect call helpers for dst_output
This patch avoids the indirect call for the common case:
ip6_output and ip_output

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 14:51:39 -08:00
Brian Vazquez
e43b219064 net: use indirect call helpers for dst_input
This patch avoids the indirect call for the common case:
ip_local_deliver and ip6_input

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03 14:51:39 -08:00
Eric Dumazet
fca23f37f3 inet: do not export inet_gro_{receive|complete}
inet_gro_receive() and inet_gro_complete() are part
of GRO engine which can not be modular.

Similarly, inet_gso_segment() does not need to be exported,
being part of GSO stack.

In other words, net/ipv6/ip6_offload.o is part of vmlinux,
regardless of CONFIG_IPV6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210202154145.1568451-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:41:10 -08:00
Jakub Kicinski
0256317a61 This time, only RTNL locking reduction fallout.
- cfg80211_dev_rename() requires RTNL
  - cfg80211_change_iface() and cfg80211_set_encryption()
    require wiphy mutex (was missing in wireless extensions)
  - cfg80211_destroy_ifaces() requires wiphy mutex
  - netdev registration can fail due to notifiers, and then
    notifiers are "unrolled", need to handle this properly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmAZZAoACgkQB8qZga/f
 l8QW4A/+NSpqm/MMuw9IUwRsZ3wJUYI2SDj7xhbfpFEdNAmm7RjJLxF8NqCzFtqo
 2XM9DzGkbvQKlPi4DmahnRVycFqlqTGkDDs5WOvg9NdL8/zygDLsTMWJsvyI6XEp
 4Y8qLuwJpoaxDhmtEpjBNbQiZrXDdYptMRsvWpaLiLN8nlzWim+Sm+qiMeIWpxz2
 axutgbyfO4pREU3wxRbxe2V0RNLxRqJ7g10siAkchP+NoK2SjM1tQKzyuN7ruImZ
 cVriA+j1u43rWseedoKZzofCvgd74nZAi87u8dpk673s7V71//8uTHhpIOBYUbfp
 6mn1V2QhjiLZ3UZfdQFFQ+WjoowSEPMQ6gPe0EdPlWTYVNRWcQXzjIlznooZnxrE
 KVWYDYxkQKMgqrTFdUjcjOza9m4DG0aAJqqSQZ3r9KsRftseLM680vZY6AoteOOM
 WeaEt3p1Qaza18CA3BH1wVHmbNnwIfCiHtmsAefkgTD3cVH28IIUyGk4RsFPGWi0
 APNNQOyiPPQCVnDAMZjhrKPMX2NTyWw5UziFhPPc2jo00XjPW0+mPRjiSO2W55kz
 ixui/foUN5um3kEc9wJgh62eLYjOAtBomKCNZEZQjJpS9VtsvyoaY/ZjK69lP3wQ
 XERj8D+fVpm8Hidbv7tb2gElJvra0X4ZCky2KixnICjCrwDBBzw=
 =bgnR
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
This time, only RTNL locking reduction fallout.
 - cfg80211_dev_rename() requires RTNL
 - cfg80211_change_iface() and cfg80211_set_encryption()
   require wiphy mutex (was missing in wireless extensions)
 - cfg80211_destroy_ifaces() requires wiphy mutex
 - netdev registration can fail due to notifiers, and then
   notifiers are "unrolled", need to handle this properly

* tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
  cfg80211: fix netdev registration deadlock
  cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held
  wext: call cfg80211_set_encryption() with wiphy lock held
  wext: call cfg80211_change_iface() with wiphy lock held
  nl80211: call cfg80211_dev_rename() under RTNL
====================

Link: https://lore.kernel.org/r/20210202144106.38207-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:40:42 -08:00
Geliang Tang
2fbdd9eaf1 mptcp: add the mibs for ADD_ADDR with port
This patch adds the mibs for ADD_ADDR with port:

MPTCP_MIB_PORTADD for received ADD_ADDR suboption with a port number.

MPTCP_MIB_PORTSYNRX, MPTCP_MIB_PORTSYNACKRX, MPTCP_MIB_PORTACKRX, for
received MP_JOIN's SYN or SYN/ACK or ACK with a port number which is
different from the msk's port number.

MPTCP_MIB_MISMATCHPORTSYNRX and MPTCP_MIB_MISMATCHPORTACKRX, for
received SYN or ACK MP_JOIN with a mismatched port-number.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:20 -08:00
Geliang Tang
a77e9179c7 mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlink
This patch adds MPTCP_PM_ADDR_ATTR_PORT filling and parsing in PM
netlink.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
60b57bf76c mptcp: enable use_port when invoke addresses_equal
When dealing with the addresses list local_addr_list or anno_list, we
should enable the function addresses_equal's parameter use_port. And
enable it in address_zero too.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
5bc56388c7 mptcp: add port number check for MP_JOIN
This patch adds two new helpers, subflow_use_different_sport and
subflow_use_different_dport, to check whether the subflow's source or
destination port number is different from the msk's port number. When
receiving the MP_JOIN's SYN/SYNACK/ACK, we do these port number checks
and print out the different port numbers.

And furthermore, when receiving the MP_JOIN's SYN/ACK, we also use a new
helper mptcp_pm_sport_in_anno_list to check whether this port number is
announced. If it isn't, we need to abort this connection.

This patch also populates the local address's port field in
local_address.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
ec20e14396 mptcp: add a new helper subflow_req_create_thmac
This patch adds a new helper named subflow_req_create_thmac, which is
extracted from subflow_token_join_request. It initializes subflow_req's
local_nonce and thmac fields, those are the more expensive to populate.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
b5e2e42fe5 mptcp: drop unused skb in subflow_token_join_request
This patch drops the unused parameter skb in subflow_token_join_request.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
1729cf186d mptcp: create the listening socket for new port
This patch creates a listening socket when an address with a port-number
is added by PM netlink. Then binds the new port to the socket, and
listens for new connections.

When the address is removed or the addresses are flushed by PM netlink,
release the listening socket.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:19 -08:00
Geliang Tang
b5a7acd3bd mptcp: send ack for every add_addr
This patch changes the sending ACK conditions for the ADD_ADDR, send an
ACK packet for any ADD_ADDR, not just when ipv6 addresses or port
numbers are included.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/139
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:18 -08:00
Geliang Tang
875b76718f mptcp: create subflow or signal addr for newly added address
Currently, when a new MPTCP endpoint is added, the existing MPTCP
sockets are not affected.

This patch implements a new function mptcp_nl_add_subflow_or_signal_addr,
invoked when an address is added from PM netlink. This function traverses
the MPTCP sockets list and invokes mptcp_pm_create_subflow_or_signal_addr
to try to create a subflow or signal an address for the newly added
address, if local constraint allows that.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/19
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:18 -08:00
Geliang Tang
a914e58668 mptcp: drop *_max fields in mptcp_pm_data
This patch drops the per-msk values add_addr_signal_max,
add_addr_accept_max, local_addr_max and subflows_max fields in struct
mptcp_pm_data, uses the pernet *_max values instead. And adds four new
helpers to get the pernet *_max values separately.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:18 -08:00
Geliang Tang
72603d207d mptcp: use WRITE_ONCE for the pernet *_max
This patch uses WRITE_ONCE() for all the pernet add_addr_signal_max,
add_addr_accept_max, local_addr_max and subflows_max fields in struct
pm_nl_pernet to avoid concurrency issues.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 18:37:18 -08:00
Amit Cohen
907eea4868 net: ipv6: Emit notification when fib hardware flags are changed
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead
to a routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notification per-route might confuse existing
  routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Move fib6_info_hw_flags_set() to C file because it is no longer a short
function.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00
Amit Cohen
680aea08e7 net: ipv4: Emit notification when fib hardware flags are changed
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notification per-route might confuse existing
  routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00