Commit Graph

57203 Commits

Author SHA1 Message Date
Alexey Dobriyan
9d2f112383 net: delete "register" keyword
Delete long obsoleted "register" keyword.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:03:42 -07:00
David S. Miller
b3a598eb0d This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Replace usage of strlcpy with strscpy, by Sven Eckelmann
 
  - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
    (2 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHekWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoW8TEACox7vhtW4MS8QCzPySCU3f8V6m
 f+2PPlUbM4CqXcOPGw/jQng6PAtcb4gNPssm52GaxdeB9jFqkI/ELdSn5mCh+EcG
 QRrhf5DVruYyqBU2gNhovEe7SlJl8IJno6kFdAggaMngXnvlBzIr7n4FMIUKNFYn
 6kFbA8pugBXXvhiRcuzs+l5iUxdKUxTsUNPyppyqnqb8lrb0/30/681dfq87PmcV
 zehEf8Ry23W7CVQv6YougVJvK0GUwysULsvm8Wc8FsOke7CeeQIPLEF2Pcrl/CFM
 mfynXVXngE41MPBC59eUcWBGlRYwkuwm4Q+YQ8OUjr5+X5YP06jR5Dh8u6KVAMuy
 QWGSwyrlXSsCE6BTxoijdJqsLzHDXCmYY0GQI2tEMCyDnL95CU3tuTk4vckusuf+
 NlhHv7m+Bo0w9ztDUBifzNyURW9VgUCoOZfW9rdYRWjjN8Oe6wnMWCFttGnkg0qu
 zrCJn5mGvz3Vp434K0uGY1wOZincSdM6grBSgmZv1UNMaBdlfNroJsUqvz6IE5Fe
 iI5kqRXoUG+ftfwacgyFEK08HpcZHvJkNDiHHRlOPPZm75yE7mwreVUrGRaibywQ
 pzTEwM+H2MX9xF+osjiPVc197fxpnkX9fI2LhDOEiblaJbdiIxd+cgWRMFS1YPvd
 ANaFfAh58+gDWcRqhg==
 =Cbzf
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Replace usage of strlcpy with strscpy, by Sven Eckelmann

 - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
   (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 11:28:40 -07:00
David S. Miller
13dfb3fa49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Just minor overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 18:44:57 -07:00
Yifeng Sun
aa733660db openvswitch: Print error when ovs_execute_actions() fails
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:38:12 -07:00
Vladimir Oltean
93fa8587b2 net: dsa: sja1105: Fix memory leak on meta state machine error path
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.

The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).

However, when dropping the initial link-local frame, its backing memory
was being leaked.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean
f163fed276 net: dsa: sja1105: Fix memory leak on meta state machine normal path
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.

Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.

The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed.  That is the job of kfree_skb(), which
also calls skb_unref().

But it turns out that freeing the cached stampable_skb is in fact not
necessary.  It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp.  So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
John Hurley
48e584ac58 net: sched: add ingress mirred action to hardware IR
TC mirred actions (redirect and mirred) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.

Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules and
can decide whether or not to offload them.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:24:21 -07:00
John Hurley
fb1b775a24 net: sched: add skbedit of ptype action to hardware IR
TC rules can impliment skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.

Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:24:21 -07:00
Dave Taht
77ddaff218 fq_codel: Kill useless per-flow dropped statistic
It is almost impossible to get anything other than a 0 out of
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.

It also conflates ecn marks with drops.

It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:18:20 -07:00
Dave Taht
ae697f3bf7 Increase fq_codel count in the bulk dropper
In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifircates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.

The scan through the 1024 queues happens more often than it needs to.

This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:18:20 -07:00
Vivien Didelot
48e2331197 net: dsa: dump CPU port regs through master
Merge the CPU port registers dump into the master interface registers
dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
structures of the CPU port into the dump.

drvinfo->regdump_len will contain the full data length, while regs->len
will contain only the master interface registers dump length.

This allows for example to dump the CPU port registers on a ZII Dev
C board like this:

    # ethtool -d eth1
    0x004:                                              0x00000000
    0x008:                                              0x0a8000aa
    0x010:                                              0x01000000
    0x014:                                              0x00000000
    0x024:                                              0xf0000102
    0x040:                                              0x6d82c800
    0x044:                                              0x00000020
    0x064:                                              0x40000000
    0x084: RCR (Receive Control Register)               0x47c00104
        MAX_FL (Maximum frame length)                   1984
        FCE (Flow control enable)                       0
        BC_REJ (Broadcast frame reject)                 0
        PROM (Promiscuous mode)                         0
        DRT (Disable receive on transmit)               0
        LOOP (Internal loopback)                        0
    0x0c4: TCR (Transmit Control Register)              0x00000004
        RFC_PAUSE (Receive frame control pause)         0
        TFC_PAUSE (Transmit frame control pause)        0
        FDEN (Full duplex enable)                       1
        HBC (Heartbeat control)                         0
        GTS (Graceful transmit stop)                    0
    0x0e4:                                              0x76735d6d
    0x0e8:                                              0x7e9e8808
    0x0ec:                                              0x00010000
    .
    .
    .
    88E6352  Switch Port Registers
    ------------------------------
    00: Port Status                            0x4d04
          Pause Enabled                        0
          My Pause                             1
          802.3 PHY Detected                   0
          Link Status                          Up
          Duplex                               Full
          Speed                                100 or 200 Mbps
          EEE Enabled                          0
          Transmitter Paused                   0
          Flow Control                         0
          Config Mode                          0x4
    01: Physical Control                       0x003d
          RGMII Receive Timing Control         Default
          RGMII Transmit Timing Control        Default
          200 BASE Mode                        100
          Flow Control's Forced value          0
          Force Flow Control                   0
          Link's Forced value                  Up
          Force Link                           1
          Duplex's Forced value                Full
          Force Duplex                         1
          Force Speed                          100 or 200 Mbps
    .
    .
    .

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:08:32 -07:00
Roman Mashak
b35475c549 net sched: update vlan action for batched events operations
Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:05:39 -07:00
Ido Schimmel
b19d955055 drop_monitor: Use pre_doit / post_doit hooks
Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Ido Schimmel
965100966e drop_monitor: Add extack support
Add various extack messages to make drop_monitor more user friendly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Ido Schimmel
ff3818ca39 drop_monitor: Avoid multiple blank lines
Remove multiple blank lines which are visually annoying and useless.

This suppresses the "Please don't use multiple blank lines" checkpatch
messages.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Ido Schimmel
01921d53f8 drop_monitor: Document scope of spinlock
While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and
'send_timer' fields can be accessed concurrently by the CPU sending the
netlink notification to user space from the workqueue and the CPU
tracing kfree_skb(). This spinlock is meant to protect against that.

Document its scope and suppress the checkpatch message "spinlock_t
definition without comment".

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Ido Schimmel
dbf896b70d drop_monitor: Rename and document scope of mutex
The 'trace_state_mutex' does not only protect the global 'trace_state'
variable, but also the global 'hw_stats_list'.

Subsequent patches are going add more operations from user space to
drop_monitor and these all need to be mutually exclusive.

Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and
document its scope.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Ido Schimmel
2230a7ef51 drop_monitor: Use correct error code
The error code 'ENOTSUPP' is reserved for use with NFS. Use 'EOPNOTSUPP'
instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 12:37:56 -07:00
Marek Vasut
267df70fe8 net: dsa: ksz: Drop NET_DSA_TAG_KSZ9477
This Kconfig option is unused, drop it.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 11:59:17 -07:00
Nikolay Aleksandrov
091adf9ba6 net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
    the br_vlan_bridge_event stub when bridge vlans are disabled

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:32:53 -07:00
Ursula Braun
cd2063604e net/smc: avoid fallback in case of non-blocking connect
FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.

Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:24:37 -07:00
Ursula Braun
f9cedf1a9b net/smc: do not schedule tx_work in SMC_CLOSED state
The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).

Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cd ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:23:05 -07:00
David Ahern
43a4b60d04 ipv6: have a single rcu unlock point in __ip6_rt_update_pmtu
Simplify the unlock path in __ip6_rt_update_pmtu by using a
single point where rcu_read_unlock is called.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:17:54 -07:00
David Ahern
cff6a327d7 ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt
The nexthop path in rt6_update_exception_stamp_rt needs to call
rcu_read_unlock if it fails to find a fib6_nh match rather than
just returning.

Fixes: e659ba31d8 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:16:58 -07:00
Jakub Kicinski
5d92e631b8 net/tls: partially revert fix transition through disconnect with close
Looks like we were slightly overzealous with the shutdown()
cleanup. Even though the sock->sk_state can reach CLOSED again,
socket->state will not got back to SS_UNCONNECTED once
connections is ESTABLISHED. Meaning we will see EISCONN if
we try to reconnect, and EINVAL if we try to listen.

Only listen sockets can be shutdown() and reused, but since
ESTABLISHED sockets can never be re-connected() or used for
listen() we don't need to try to clean up the ULP state early.

Fixes: 32857cf57f ("net/tls: fix transition through disconnect with close")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 13:15:30 -07:00
Jesper Dangaard Brouer
065af35547 net: fix bpf_xdp_adjust_head regression for generic-XDP
When generic-XDP was moved to a later processing step by commit
458bf2f224 ("net: core: support XDP generic on stacked devices.")
a regression was introduced when using bpf_xdp_adjust_head.

The issue is that after this commit the skb->network_header is now
changed prior to calling generic XDP and not after. Thus, if the header
is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
also need to be updated again.  Fix by calling skb_reset_network_header().

Fixes: 458bf2f224 ("net: core: support XDP generic on stacked devices.")
Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 11:17:40 -07:00
Dmytro Linkin
7be8ef2cdb net: sched: use temporary variable for actions indexes
Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.

Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05 10:59:14 -07:00
Linus Lüssing
9cb9a17813 batman-adv: BATMAN_V: aggregate OGMv2 packets
Instead of transmitting individual OGMv2 packets from the aggregation
queue merge those OGMv2 packets into a single one and transmit this
aggregate instead.

This reduces overhead as it saves an ethernet header and a transmission
per aggregated OGMv2 packet.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-08-04 22:22:00 +02:00
Linus Lüssing
f89255a02f batman-adv: BATMAN_V: introduce per hard-iface OGMv2 queues
In preparation for the OGMv2 packet aggregation, hold OGMv2 packets for
up to BATADV_MAX_AGGREGATION_MS milliseconds (100ms) on per
hard-interface queues, before transmitting.

This allows us to later squash multiple OGMs into a single frame
and transmission for reduced overhead.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-08-04 22:22:00 +02:00
Dexuan Cui
685703b497 hv_sock: Fix hang when a connection is closed
There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  call sock_put() to decrease the refcnt to 0, and free the sk;
  next, the "release_sock(sk)" may hang due to use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.

The issue can be resolved if an extra reference is taken when the
connection is established.

Fixes: a9eeb998c2 ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-02 17:26:27 -07:00
Jon Maloy
7c5b420559 tipc: reduce risk of wakeup queue starvation
In commit 365ad353c2 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 18:19:28 -04:00
Taras Kondratiuk
4da5f0018e tipc: compat: allow tipc commands without arguments
Commit 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
broke older tipc tools that use compat interface (e.g. tipc-config from
tipcutils package):

% tipc-config -p
operation not supported

The commit started to reject TIPC netlink compat messages that do not
have attributes. It is too restrictive because some of such messages are
valid (they don't need any arguments):

% grep 'tx none' include/uapi/linux/tipc_config.h
#define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
#define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
#define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
#define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
#define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
#define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */

This patch relaxes the original fix and rejects messages without
arguments only if such arguments are expected by a command (reg_type is
non zero).

Fixes: 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
Cc: stable@vger.kernel.org
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 18:14:00 -04:00
Nikolay Aleksandrov
3247b27204 net: bridge: mcast: add delete due to fast-leave mdb flag
In user-space there's no way to distinguish why an mdb entry was deleted
and that is a problem for daemons which would like to keep the mdb in
sync with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age-out the remote entry for performance
and convergence reasons except when fast-leave is enabled. In that case we
want explicit immediate remote delete, thus add mdb flag which is set only
when the entry is being deleted due to fast-leave.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:13:40 -04:00
Nikolay Aleksandrov
5c725b6b65 net: bridge: mcast: don't delete permanent entries when fast leave is enabled
When permanent entries were introduced by the commit below, they were
exempt from timing out and thus igmp leave wouldn't affect them unless
fast leave was enabled on the port which was added before permanent
entries existed. It shouldn't matter if fast leave is enabled or not
if the user added a permanent entry it shouldn't be deleted on igmp
leave.

Before:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

< join and leave 229.1.1.1 on eth4 >

$ bridge mdb show
$

After:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

< join and leave 229.1.1.1 on eth4 >

$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent

Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:03:01 -04:00
David S. Miller
ac5fe22636 We have a reasonably large number of changes:
* lots more HE (802.11ax) support, particularly things
    relevant for the the AP side, but also mesh support
  * debugfs cleanups from Greg
  * some more work on extended key ID
  * start using genl parallel_ops, as preparation for
    weaning ourselves off RTNL and getting parallelism
  * various other changes all over
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BtsAACgkQB8qZga/f
 l8QTdQ//bD3NLitiIW4rBmC/w9str2TRpUbRN8Hf9xTnRh1smLGQop8jI8xpYdAE
 hXFlWn4glCkTkZrqtT8IDMcERb2YPHI94L7lMR1L6dXH7e1tiBZ7Qum5+rojLUCQ
 /XXZJnVa9AbdzBDtkhzCjkN8dCldMnkId0m6vkGdFre/b70FLDg2GlolqIchbDCz
 GxeKaw/uTLwu9nxEJhspDmiXDQtQd1ZsGNZRrovQ2nUuUfvX9sU54OmaoDHhqrUM
 iSwFZJAhrCV7nidOLs2X1rVs8hRymbrYnGu6NdUlfTQMqaOVgWhqniM1RMnJnnVi
 Ec/1OQg4KIj2rjJIXW/3QXAzAO6TDwV8xNk9al3m+yAkkSXR7tXP01plNUoMU4UW
 YxcaPD341H919GIqa71lieeEMyi7XIKKFfaZvGFLiDzidKKi0JkDuC+1w5UOJrbj
 PM/dzvKqvwOMEa5XahjrUQLsiGMgxBkDo36+F4rfaelzDKPuJBOokDS23vg/lYf5
 2v74WbeJVc45cAhSwp5/0RTsG0xhmFImfrE66VNB1vcJUVLvHTuATfXsBbRahOz/
 SglGBAGqigxJwIeYfgec3lPNfdV+0LwYnPAEjBIjCO8qlGvst4jLroHg7ayApr2E
 JfUbaWtpEb9cgAz7X6tEjv/BAnsRWPzpxb8Nm6JoZttTtA2VakE=
 =eUnh
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a reasonably large number of changes:
 * lots more HE (802.11ax) support, particularly things
   relevant for the the AP side, but also mesh support
 * debugfs cleanups from Greg
 * some more work on extended key ID
 * start using genl parallel_ops, as preparation for
   weaning ourselves off RTNL and getting parallelism
 * various other changes all over
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:59:41 -07:00
David S. Miller
f86a677e57 Just a few fixes:
* revert NETIF_F_LLTX usage as it caused problems
  * avoid warning on WMM parameters from AP that are too short
  * fix possible null-ptr dereference in hwsim
  * fix interface combinations with 4-addr and crypto control
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BfGcACgkQB8qZga/f
 l8TE/A/9G98DIi7UpuseyrW1y+3Bab5G+N4JhNDYMF0fC+f+6VZPsQRQi0aH2GVE
 fBDnFQogiqIY8WwgQ5RoyUjOux43INpNp3uPjOCzcdjj9+D+lBX/27yhNVBVYK5H
 Uxq08W2Mv+GWcswa5NMGUF9FlARbIMSu1BsDML031LzSckNvSLwMuANSx/1/d/gr
 rUa1waaGeC2yT/baefNaHDiqYBj/6Xm6U0Adl1PphGJZ587UjlylCwozNNIrUIwW
 PeysRXIUakePEqNs0Kl6U6lUJoH3g1KB+fLl5ea94L2aBHHRRaBAG1yROidg1/JG
 fmz8jostoasG+wuycNSdETVgY3TT37Hv0R7rRB0Wj5QvrdXqgyrAZSR97dMWR9wR
 fZLZCXtYnxOVZwKUlDusSjDFm9CKdhM+KqINHXBa6RRWpQjFYn4AvbQlJEZrbyuJ
 lBOvXAvmF1fsX5x9hmDLRM7Irtb9Awb7eHbT+T5c+BtWKvnR2GsujMy2f6TY+swN
 WPmER6MtQ80VPk9xpLVxU02vLs0EIdoPxE7I348ozDmW6sln0kRI1Pp7uOqDO1RC
 sX6uA8bSTp9u2tKsgevkHqiz3YiJOqmy475tcOg9F/CefsULZzVfBE0bsU4v0jnJ
 LTXJS2GtmPhG4AJFn+qiqvvOzJGhAJAEsvmPvUgH0AmB4m0+Cqg=
 =rE1e
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a few fixes:
 * revert NETIF_F_LLTX usage as it caused problems
 * avoid warning on WMM parameters from AP that are too short
 * fix possible null-ptr dereference in hwsim
 * fix interface combinations with 4-addr and crypto control
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:51:34 -07:00
David S. Miller
fa9586aff9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) memleak in ebtables from the error path for the 32/64 compat layer,
   from Florian Westphal.

2) Fix inverted meta ifname/ifidx matching when no interface is set
   on either from the input/output path, from Phil Sutter.

3) Remove goto label in nft_meta_bridge, also from Phil.

4) Missing include guard in xt_connlabel, from Masahiro Yamada.

5) Two patch to fix ipset destination MAC matching coming from
   Stephano Brivio, via Jozsef Kadlecsik.

6) Fix set rename and listing concurrency problem, from Shijie Luo.
   Patch also coming via Jozsef Kadlecsik.

7) ebtables 32/64 compat missing base chain policy in rule count,
   from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:49:09 -07:00
Shay Bar
f39b07fdfb mac80211: HE STA disassoc due to QOS NULL not sent
In case of HE AP-STA link, ieee80211_send_nullfunc() will not
send the QOS NULL packet to check if AP is still associated.

In this case, probe_send_count will be non-zero and
ieee80211_sta_work() will later disassociate the AP, even
though no packet was ever sent.

Fix this by decrementing probe_send_count and not calling
ieee80211_send_nullfunc() in case of HE link, so that we
still wait for some time for the AP beacon to reappear and
don't disconnect right away.

Signed-off-by: Shay Bar <shay.bar@celeno.com>
Link: https://lore.kernel.org/r/20190703131848.22879-1-shay.bar@celeno.com
[clarify commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 13:26:41 +02:00
John Crispin
1ced169cc1 mac80211: allow setting spatial reuse parameters from bss_conf
Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or
when a station connects to an AP. This allows the driver to configure the
HW accordingly.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-3-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
6d4dd4ef1a nl80211: add strict start type
Add a strict start type so all new attributes starting from
NL80211_ATTR_HE_OBSS_PD are validated strictly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
John Crispin
796e90f42b cfg80211: add support for parsing OBBS_PD attributes
Add the data structure, policy and parsing code allowing userland to send
the OBSS PD information into the kernel.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Karthikeyan Periyasamy
52dba8d7d5 mac80211: reject zero MAC address in add station
This came up in fuzz testing, and really we don't consider
all-zeroes to be a valid MAC address in most places, so
also reject it here to avoid confusion later on.

Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org>
Link: https://lore.kernel.org/r/1563959770-21570-1-git-send-email-periyasa@codeaurora.org
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
50508d941c cfg80211: use parallel_ops for genl
Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.

Reviewed-By: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190729143109.18683-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:46 +02:00
Colin Ian King
f12cac539f mac80211: add missing null return check from call to ieee80211_get_sband
The return from ieee80211_get_sband can potentially be a null pointer, so
it seems prudent to add a null check to avoid a null pointer dereference
on sband.

Addresses-Coverity: ("Dereference null return")
Fixes: 2ab4587675 ("mac80211: add support for the ADDBA extension element")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20190730143205.14261-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 10:51:17 +02:00
Tristram Ha
016e43a26b net: dsa: ksz: Add KSZ8795 tag code
Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler
and the tag is only 1 byte, instead of 2 as is the case with KSZ9477.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
Stefano Garzarella
0038ff357f vsock/virtio: change the maximum packet size allowed
Since now we are able to split packets, we can avoid limiting
their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
6dbd3e66e7 vhost/vsock: split packets to send using multiple buffers
If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
9632e9f61b vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.

Move also buf_alloc under the same lock.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
b89d882dc9 vsock/virtio: reduce credit update messages
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
473c7391ce vsock/virtio: limit the memory used per-socket
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.

This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00