Commit Graph

900641 Commits

Author SHA1 Message Date
Roman Kiryanov
98bda63e20 net: disable BRIDGE_NETFILTER by default
The description says 'If unsure, say N.' but
the module is built as M by default (once
the dependencies are satisfied).

When the module is selected (Y or M), it enables
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS
which alter kernel internal structures.

We (Android Studio Emulator) currently do not
use this module and think this it is more consistent
to have it disabled by default as opposite to
disabling it explicitly to prevent enabling
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS.

Signed-off-by: Roman Kiryanov <rkir@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 15:02:02 -08:00
Alexandre Belloni
ac2fcfa9fd net: macb: Properly handle phylink on at91rm9200
at91ether_init was handling the phy mode and speed but since the switch to
phylink, the NCFGR register got overwritten by macb_mac_config(). The issue
is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared
but never restored as they conflict with the PAE, GBE and PCSSEL bits.

Add new capability to differentiate between EMAC and the other versions of
the IP and use it to set and avoid clearing the relevant bits.

Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC
doesn't use any rings/bufffers/queues.

Fixes: 7897b071ac ("net: macb: convert to phylink")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 15:00:31 -08:00
David S. Miller
0d5b8d7055 Merge branch 's390-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2020-02-20

please apply the following patch series for qeth to netdev's net tree.

This corrects three minor issues:
1) return a more fitting errno when VNICC cmds are not supported,
2) remove a bogus WARN in the NAPI code, and
3) be _very_ pedantic about the RX copybreak.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:30:47 -08:00
Julian Wiedmann
54a61fbc02 s390/qeth: fix off-by-one in RX copybreak check
The RX copybreak is intended as the _max_ value where the frame's data
should be copied. So for frame_len == copybreak, don't build an SG skb.

Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:30:47 -08:00
Julian Wiedmann
420579dba1 s390/qeth: don't warn for napi with 0 budget
Calling napi->poll() with 0 budget is a legitimate use by netpoll.

Fixes: a1c3ed4c9c ("qeth: NAPI support for l2 and l3 discipline")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:30:47 -08:00
Alexandra Winter
6f3846f095 s390/qeth: vnicc Fix EOPNOTSUPP precedence
When getting or setting VNICC parameters, the error code EOPNOTSUPP
should have precedence over EBUSY.

EBUSY is used because vnicc feature and bridgeport feature are mutually
exclusive, which is a temporary condition.
Whereas EOPNOTSUPP indicates that the HW does not support all or parts of
the vnicc feature.
This issue causes the vnicc sysfs params to show 'blocked by bridgeport'
for HW that does not support VNICC at all.

Fixes: caa1f0b10d ("s390/qeth: add VNICC enable/disable support")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:30:47 -08:00
Kees Cook
16a556eeb7 openvswitch: Distribute switch variables for initialization
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

net/openvswitch/flow_netlink.c: In function ‘validate_set’:
net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable]
 2711 |  const struct ovs_key_ipv4 *ipv4_key;
      |                             ^~~~~~~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:00:19 -08:00
Kees Cook
46d30cb104 net: ip6_gre: Distribute switch variables for initialization
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

net/ipv6/ip6_gre.c: In function ‘ip6gre_err’:
net/ipv6/ip6_gre.c:440:32: warning: statement will never be executed [-Wswitch-unreachable]
  440 |   struct ipv6_tlv_tnl_enc_lim *tel;
      |                                ^~~

net/ipv6/ip6_tunnel.c: In function ‘ip6_tnl_err’:
net/ipv6/ip6_tunnel.c:520:32: warning: statement will never be executed [-Wswitch-unreachable]
  520 |   struct ipv6_tlv_tnl_enc_lim *tel;
      |                                ^~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:00:19 -08:00
Kees Cook
161d179261 net: core: Distribute switch variables for initialization
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

net/core/skbuff.c: In function ‘skb_checksum_setup_ip’:
net/core/skbuff.c:4809:7: warning: statement will never be executed [-Wswitch-unreachable]
 4809 |   int err;
      |       ^~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:00:19 -08:00
David S. Miller
41f57cfde1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-02-19

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).

The main changes are:

1) batched bpf hashtab fixes from Brian and Yonghong.

2) various selftests and libbpf fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 16:42:35 -08:00
David S. Miller
fca07a937a Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2020-02-19

This series contains fixes to the ice driver.

Brett fixes an issue where if a user sets an odd [tx|rx]-usecs value
through ethtool, the request is denied because the hardware is set to
have an ITR with 2us granularity.  Also fix an issue where the VF has
not been completely removed/reset after being unbound from the host
driver, so resolve this by waiting for the VF remove/reset process to
happen before checking if the VF is disabled.

Michal fixes an issue, where when the user changes flow control via
ethtool, the OS is told the link is going down when that may not be the
case.  Before the fix, the only way to get out of this state was to take
the interface down and up again.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 16:38:16 -08:00
Willem de Bruijn
303d0403b8 udp: rehash on disconnect
As of the below commit, udp sockets bound to a specific address can
coexist with one bound to the any addr for the same port.

The commit also phased out the use of socket hashing based only on
port (hslot), in favor of always hashing on {addr, port} (hslot2).

The change broke the following behavior with disconnect (AF_UNSPEC):

    server binds to 0.0.0.0:1337
    server connects to 127.0.0.1:80
    server disconnects
    client connects to 127.0.0.1:1337
    client sends "hello"
    server reads "hello"	// times out, packet did not find sk

On connect the server acquires a specific source addr suitable for
routing to its destination. On disconnect it reverts to the any addr.

The connect call triggers a rehash to a different hslot2. On
disconnect, add the same to return to the original hslot2.

Skip this step if the socket is going to be unhashed completely.

Fixes: 4cdeeee925 ("net: udp: prefer listeners bound to an address")
Reported-by: Pavel Roskin <plroskin@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 16:34:11 -08:00
Rohit Maheshwari
06f5201c63 net/tls: Fix to avoid gettig invalid tls record
Current code doesn't check if tcp sequence number is starting from (/after)
1st record's start sequnce number. It only checks if seq number is before
1st record's end sequnce number. This problem will always be a possibility
in re-transmit case. If a record which belongs to a requested seq number is
already deleted, tls_get_record will start looking into list and as per the
check it will look if seq number is before the end seq of 1st record, which
will always be true and will return 1st record always, it should in fact
return NULL.
As part of the fix, start looking each record only if the sequence number
lies in the list else return NULL.
There is one more check added, driver look for the start marker record to
handle tcp packets which are before the tls offload start sequence number,
hence return 1st record if the record is tls start marker and seq number is
before the 1st record's starting sequence number.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 16:32:06 -08:00
Yonghong Song
b9aff38de2 bpf: Fix a potential deadlock with bpf_map_do_batch
Commit 057996380a ("bpf: Add batch ops to all htab bpf map")
added lookup_and_delete batch operation for hash table.
The current implementation has bpf_lru_push_free() inside
the bucket lock, which may cause a deadlock.

syzbot reports:
   -> #2 (&htab->buckets[i].lock#2){....}:
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       htab_lru_map_delete_node+0xce/0x2f0 kernel/bpf/hashtab.c:593
       __bpf_lru_list_shrink_inactive kernel/bpf/bpf_lru_list.c:220 [inline]
       __bpf_lru_list_shrink+0xf9/0x470 kernel/bpf/bpf_lru_list.c:266
       bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:340 [inline]
       bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
       bpf_lru_pop_free+0x87c/0x1670 kernel/bpf/bpf_lru_list.c:499
       prealloc_lru_pop+0x2c/0xa0 kernel/bpf/hashtab.c:132
       __htab_lru_percpu_map_update_elem+0x67e/0xa90 kernel/bpf/hashtab.c:1069
       bpf_percpu_hash_update+0x16e/0x210 kernel/bpf/hashtab.c:1585
       bpf_map_update_value.isra.0+0x2d7/0x8e0 kernel/bpf/syscall.c:181
       generic_map_update_batch+0x41f/0x610 kernel/bpf/syscall.c:1319
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x9b7/0x41e0 kernel/bpf/syscall.c:3460
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

   -> #0 (&loc_l->lock){....}:
       check_prev_add kernel/locking/lockdep.c:2475 [inline]
       check_prevs_add kernel/locking/lockdep.c:2580 [inline]
       validate_chain kernel/locking/lockdep.c:2970 [inline]
       __lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3954
       lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:516 [inline]
       bpf_lru_push_free+0x250/0x5b0 kernel/bpf/bpf_lru_list.c:555
       __htab_map_lookup_and_delete_batch+0x8d4/0x1540 kernel/bpf/hashtab.c:1374
       htab_lru_map_lookup_and_delete_batch+0x34/0x40 kernel/bpf/hashtab.c:1491
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x1f7d/0x41e0 kernel/bpf/syscall.c:3456
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Possible unsafe locking scenario:

          CPU0                    CPU2
          ----                    ----
     lock(&htab->buckets[i].lock#2);
                                  lock(&l->lock);
                                  lock(&htab->buckets[i].lock#2);
     lock(&loc_l->lock);

    *** DEADLOCK ***

To fix the issue, for htab_lru_map_lookup_and_delete_batch() in CPU0,
let us do bpf_lru_push_free() out of the htab bucket lock. This can
avoid the above deadlock scenario.

Fixes: 057996380a ("bpf: Add batch ops to all htab bpf map")
Reported-by: syzbot+a38ff3d9356388f2fb83@syzkaller.appspotmail.com
Reported-by: syzbot+122b5421d14e68f29cd1@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200219234757.3544014-1-yhs@fb.com
2020-02-19 16:01:25 -08:00
Brian Vazquez
492e0d0d6f bpf: Do not grab the bucket spinlock by default on htab batch ops
Grabbing the spinlock for every bucket even if it's empty, was causing
significant perfomance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking first the
bucket_cnt, if the bucket has some entries then we go and grab the
spinlock and proceed with the batching.

Tested with a htab of size 50K and different value of populated entries.

Before:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1       2759655        2752033
  BM_DumpHashMap/10      2933722        2930825
  BM_DumpHashMap/200     3171680        3170265
  BM_DumpHashMap/500     3639607        3635511
  BM_DumpHashMap/1000    4369008        4364981
  BM_DumpHashMap/5k     11171919       11134028
  BM_DumpHashMap/20k    69150080       69033496
  BM_DumpHashMap/39k   190501036      190226162

After:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1        202707         200109
  BM_DumpHashMap/10       213441         210569
  BM_DumpHashMap/200      478641         472350
  BM_DumpHashMap/500      980061         967102
  BM_DumpHashMap/1000    1863835        1839575
  BM_DumpHashMap/5k      8961836        8902540
  BM_DumpHashMap/20k    69761497       69322756
  BM_DumpHashMap/39k   187437830      186551111

Fixes: 057996380a ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
2020-02-19 15:59:30 -08:00
Brett Creeley
c54d209c78 ice: Wait for VF to be reset/ready before configuration
The configuration/command below is failing when the VF in the xml
file is already bound to the host iavf driver.

pci_0000_af_0_0.xml:

<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0xaf' slot='0x0' function='0x0'/>
</source>
<mac address='00🇩🇪ad:00:11:01'/>
</interface>

> virsh attach-device domain_name pci_0000_af_0_0.xml
error: Failed to attach device from pci_0000_af_0_0.xml
error: Cannot set interface MAC/vlanid to 00🇩🇪ad:00:11:01/0 for
	ifname ens1f1 vf 0: Device or resource busy

This is failing because the VF has not been completely removed/reset
after being unbound (via the virsh command above) from the host iavf
driver and ice_set_vf_mac() checks if the VF is disabled before waiting
for the reset to finish.

Fix this by waiting for the VF remove/reset process to happen before
checking if the VF is disabled. Also, since many functions for VF
administration on the PF were more or less calling the same 3 functions
(ice_wait_on_vf_reset(), ice_is_vf_disabled(), and ice_check_vf_init())
move these into the helper function ice_check_vf_ready_for_cfg(). Then
call this function in any flow that attempts to configure/query a VF
from the PF.

Lastly, increase the maximum wait time in ice_wait_on_vf_reset() to
800ms, and modify/add the #define(s) that determine the wait time.
This was done for robustness because in rare/stress cases VF removal can
take a max of ~800ms and previously the wait was a max of ~300ms.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19 11:50:41 -08:00
Michal Swiatkowski
8a55c08d3b ice: Don't tell the OS that link is going down
Remove code that tell the OS that link is going down when user
change flow control via ethtool. When link is up it isn't certain
that link goes down after 0x0605 aq command. If link doesn't go
down, OS thinks that link is down, but physical link is up. To
reset this state user have to take interface down and up.

If link goes down after 0x0605 command, FW send information
about that and after that driver tells the OS that the link goes
down. So this code in ethtool is unnecessary.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19 11:50:41 -08:00
Brett Creeley
840f8ad0aa ice: Don't reject odd values of usecs set by user
Currently if a user sets an odd [tx|rx]-usecs value through ethtool,
the request is denied because the hardware is set to have an ITR
granularity of 2us. This caused poor customer experience. Fix this by
aligning to a register allowed value, which results in rounding down.
Also, print a once per ring container type message to be clear about
our intentions.

Also, change the ITR_TO_REG define to be the bitwise and of the ITR
setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more
obvious.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-19 11:50:41 -08:00
Madhuparna Bhowmik
33c4acbe2f bridge: br_stp: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 11:13:43 -08:00
Dmitry Osipenko
c3331d2fe3 nfc: pn544: Fix occasional HW initialization failure
The PN544 driver checks the "enable" polarity during of driver's probe and
it's doing that by turning ON and OFF NFC with different polarities until
enabling succeeds. It takes some time for the hardware to power-down, and
thus, to deassert the IRQ that is raised by turning ON the hardware.
Since the delay after last power-down of the polarity-checking process is
missed in the code, the interrupt may trigger immediately after installing
the IRQ handler (right after the checking is done), which results in IRQ
handler trying to touch the disabled HW and ends with marking NFC as
'DEAD' during of the driver's probe:

  pn544_hci_i2c 1-002a: NFC: nfc_en polarity : active high
  pn544_hci_i2c 1-002a: NFC: invalid len byte
  shdlc: llc_shdlc_recv_frame: NULL Frame -> link is dead

This patch fixes the occasional NFC initialization failure on Nexus 7
device.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 11:09:27 -08:00
Amol Grover
a7a9456e8d net: hsr: Pass lockdep expression to RCU lists
node_db is traversed using list_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of hsr->list_lock.

Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 10:55:42 -08:00
David S. Miller
7822dee5d1 mlx5-fixes-2020-02-18
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5MpQAACgkQSD+KveBX
 +j7kcwgAtUPPpulo9AqMjk+2dJ67lT+GKl4JlrQR2kFs9o0j5vO8apixckqBGXyV
 wHTHLR79T710T1LD2mdIPFwFE5ecXKQL3mVrYMcyPCdkntZnNK7K3LpRWOo0t3yf
 JBeQoN+VuCXFAsyuQ39W+cr0+i4tO8DQ9kBj7usLu/8apuyo3KzA/3uLFE69zZeb
 vC7Cc+TcSWeiDvgCZGFRYtoRBvV2+AcyDNjY3Kaz6Yl+hgu53pGJfJWgV+lCxxr8
 vavJJgQQeAjFX52QttiXQBsvkfX+cqNfzVc/W9tI5rq5C0qmH4yGndrTOwsvCFen
 Q+5pbP6KYNX6X7urHfuJ25q5/mUkeA==
 =xZyi
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2020-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-02-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.3
 ('net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa')

For -stable v5.4
 ('net/mlx5: DR, Fix matching on vport gvmi')
 ('net/mlx5e: Fix crash in recovery flow without devlink reporter')

For -stable v5.5
 ('net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY')
 ('net/mlx5e: Don't clear the whole vf config when switching modes')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-19 10:31:26 -08:00
Erez Shitrit
13a7e459a4 net/mlx5: DR, Handle reformat capability over sw-steering tables
On flow table creation, send the relevant flags according to what the FW
currently supports.
When FW doesn't support reformat option over SW-steering managed table,
the driver shouldn't pass this.

Fixes: 988fd6b32d ("net/mlx5: DR, Pass table flags at creation to lower layer")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:20 -08:00
Paul Blakey
76781623f0 net/mlx5: Fix lowest FDB pool size
The pool sizes represent the pool sizes in the fw. when we request
a pool size from fw, it will return the next possible group.
We track how many pools the fw has left and start requesting groups
from the big to the small.
When we start request 4k group, which doesn't exists in fw, fw
wants to allocate the next possible size, 64k, but will fail since
its exhausted. The correct smallest pool size in fw is 128 and not 4k.

Fixes: 39ac237ce0 ("net/mlx5: E-Switch, Refactor chains and priorities")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:20 -08:00
Dmytro Linkin
383de10815 net/mlx5e: Don't clear the whole vf config when switching modes
There is no need to reset all vf config (except link state) between
legacy and switchdev modes changes.
Also, set link state to AUTO, when legacy enabled.

Fixes: 3b83b6c2e0 ("net/mlx5e: Clear VF config when switching modes")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:19 -08:00
Hamdan Igbaria
52d214976d net/mlx5: DR, Fix matching on vport gvmi
Set vport gvmi in the tag, only when source gvmi is set in the bit mask.

Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:19 -08:00
Aya Levin
1ad6c43c6a net/mlx5e: Fix crash in recovery flow without devlink reporter
When health reporters are not supported, recovery function is invoked
directly, not via devlink health reporters.

In this direct flow, the recover function input parameter was passed
incorrectly and is causing a kernel oops. This patch is fixing the input
parameter.

Following call trace is observed on rx error health reporting.

Internal error: Oops: 96000007 [#1] PREEMPT SMP
Process kworker/u16:4 (pid: 4584, stack limit = 0x00000000c9e45703)
Call trace:
mlx5e_rx_reporter_err_rq_cqe_recover+0x30/0x164 [mlx5_core]
mlx5e_health_report+0x60/0x6c [mlx5_core]
mlx5e_reporter_rq_cqe_err+0x6c/0x90 [mlx5_core]
mlx5e_rq_err_cqe_work+0x20/0x2c [mlx5_core]
process_one_work+0x168/0x3d0
worker_thread+0x58/0x3d0
kthread+0x108/0x134

Fixes: c50de4af1d ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:19 -08:00
Aya Levin
5ee090ed0d net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY
Initialize RQ doorbell counters to zero prior to moving an RQ from RST
to RDY state. Per HW spec, when RQ is back to RDY state, the descriptor
ID on the completion is reset. The doorbell record must comply.

Fixes: 8276ea1353 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:19 -08:00
Huy Nguyen
3d9c5e023a net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa
rtnl_bridge_getlink is protected by rcu lock, so mlx5_eswitch_get_vepa
cannot take mutex lock. Two possible issues can happen:
1. User at the same time change vepa mode via RTM_SETLINK command.
2. User at the same time change the switchdev mode via devlink netlink
interface.

Case 1 cannot happen because rtnl executes one message in order.
Case 2 can happen but we do not expect user to change the switchdev mode
when changing vepa. Even if a user does it, so he will read a value
which is no longer valid.

Fixes: 8da202b249 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18 19:01:18 -08:00
David S. Miller
7c8c1697c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This batch contains Netfilter fixes for net:

1) Restrict hashlimit size to 1048576, from Cong Wang.

2) Check for offload flags from nf_flow_table_offload_setup(),
   this fixes a crash in case the hardware offload is disabled.
   From Florian Westphal.

3) Three preparation patches to extend the conntrack clash resolution,
   from Florian.

4) Extend clash resolution to deal with DNS packets from the same flow
   racing to set up the NAT configuration.

5) Small documentation fix in pipapo, from Stefano Brivio.

6) Remove misleading unlikely() from pipapo_refill(), also from Stefano.

7) Reduce hashlimit mutex scope, from Cong Wang. This patch is actually
   triggering another problem, still under discussion, another patch to
   fix this will follow up.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 15:44:13 -08:00
Stefano Brivio
9a7712048f netfilter: nft_set_pipapo: Don't abuse unlikely() in pipapo_refill()
I originally used unlikely() in the if (match_only) clause, which
we hit on the mapping table for the last field in a set, to ensure
we avoid branching to the rest of for loop body, which is executed
more frequently.

However, Pablo reports, this is confusing as it gives the impression
that this is not a common case, and it's actually not the intended
usage of unlikely().

I couldn't observe any statistical difference in matching rates on
x864_64 and aarch64 without it, so just drop it.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-18 22:07:09 +01:00
Stefano Brivio
bd97ad51a7 netfilter: nft_set_pipapo: Fix mapping table example in comments
In both insertion and lookup examples, the two element pointers
of rule mapping tables were swapped. Fix that.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-18 22:06:54 +01:00
Madhuparna Bhowmik
a2cfb96cc3 flow_table.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
53742e69e8 datapath.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
fed48423f1 vport.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
7790614616 meter.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:26 -08:00
Madhuparna Bhowmik
9facfdb546 netlabel_domainhash.c: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:44:23 -08:00
Madhuparna Bhowmik
8c70c3d728 net: netlabel: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:43:06 -08:00
Paolo Abeni
379349e9bc Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"
This reverts commit ba27b4cdaa

Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaa
("net: dev: introduce support for sch BYPASS for lockless qdisc").
I can't find any working solution other than a plain revert.

This will introduce some minor performance regressions for
pfifo_fast qdisc. I plan to address them in net-next with more
indirect call wrapper boilerplate for qdiscs.

Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Fixes: ba27b4cdaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:37:45 -08:00
Jonathan Neuschäfer
29f20dd625 net: phy: broadcom: Fix a typo ("firsly")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:31:25 -08:00
Florian Westphal
d99bfed58d mptcp: fix bogus socket flag values
Dan Carpenter reports static checker warnings due to bogus BIT() usage:

net/mptcp/subflow.c:571 subflow_write_space() warn: test_bit() takes a bit number
net/mptcp/subflow.c:694 subflow_state_change() warn: test_bit() takes a bit number
net/mptcp/protocol.c:261 ssk_check_wmem() warn: test_bit() takes a bit number
[..]

This is harmless (we use bits 1 & 2 instead of 0 and 1), but would
break eventually when adding BIT(5) (or 6, depends on size of 'long').

Just use 0 and 1, the values are only passed to test/set/clear_bit
functions.

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:05:53 -08:00
Michal Kalderon
af6565adb0 qede: Fix race between rdma destroy workqueue and link change event
If an event is added while the rdma workqueue is being destroyed
it could lead to several races, list corruption, null pointer
dereference during queue_work or init_queue.
This fixes the race between the two flows which can occur during
shutdown.

A kref object and a completion object are added to the rdma_dev
structure, these are initialized before the workqueue is created.
The refcnt is used to indicate work is being added to the
workqueue and ensures the cleanup flow won't start while we're in
the middle of adding the event.
Once the work is added, the refcnt is decreased and the cleanup flow
is safe to run.

Fixes: cee9fbd8e2 ("qede: Add qedr framework")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:05:53 -08:00
Toke Høiland-Jørgensen
113e6b7e15 libbpf: Sanitise internal map names so they are not rejected by the kernel
The kernel only accepts map names with alphanumeric characters, underscores
and periods in their name. However, the auto-generated internal map names
used by libbpf takes their prefix from the user-supplied BPF object name,
which has no such restriction. This can lead to "Invalid argument" errors
when trying to load a BPF program using global variables.

Fix this by sanitising the map names, replacing any non-allowed characters
with underscores.

Fixes: d859900c4c ("bpf, libbpf: support global data/bss/rodata sections")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200217171701.215215-1-toke@redhat.com
2020-02-18 15:34:20 +01:00
Toke Høiland-Jørgensen
f25975f42f bpf, uapi: Remove text about bpf_redirect_map() giving higher performance
The performance of bpf_redirect() is now roughly the same as that of
bpf_redirect_map(). However, David Ahern pointed out that the header file
has not been updated to reflect this, and still says that a significant
performance increase is possible when using bpf_redirect_map(). Remove this
text from the bpf_redirect_map() description, and reword the description in
bpf_redirect() slightly. Also fix the 'Return' section of the
bpf_redirect_map() documentation.

Fixes: 1d233886dd ("xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200218130334.29889-1-toke@redhat.com
2020-02-18 15:31:31 +01:00
Hangbin Liu
9b64208f74 selftests: forwarding: vxlan_bridge_1d: use more proper tos value
0x11 and 0x12 set the ECN bits based on RFC2474, it would be better to avoid
that. 0x14 and 0x18 would be better and works as well.

Reported-by: Petr Machata <petrm@mellanox.com>
Fixes: 4e867c9a50 ("selftests: forwarding: vxlan_bridge_1d: fix tos value")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17 22:41:10 -08:00
Alexandre Belloni
82969e6ef0 net: cnic: fix spelling mistake "reserverd" -> "reserved"
The reserved member should be named reserved3.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17 21:59:16 -08:00
Xin Long
245709ec8b sctp: move the format error check out of __sctp_sf_do_9_1_abort
When T2 timer is to be stopped, the asoc should also be deleted,
otherwise, there will be no chance to call sctp_association_free
and the asoc could last in memory forever.

However, in sctp_sf_shutdown_sent_abort(), after adding the cmd
SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the
format error from __sctp_sf_do_9_1_abort() and miss adding
SCTP_CMD_ASSOC_FAILED where the asoc will be deleted.

This patch is to fix it by moving the format error check out of
__sctp_sf_do_9_1_abort(), and do it before adding the cmd
SCTP_CMD_TIMER_STOP for T2 timer.

Thanks Hangbin for reporting this issue by the fuzz testing.

v1->v2:
  - improve the comment in the code as Marcelo's suggestion.

Fixes: 96ca468b86 ("sctp: check invalid value of length parameter in error cause")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17 21:56:03 -08:00
Jason Baron
8a9093c798 net: sched: correct flower port blocking
tc flower rules that are based on src or dst port blocking are sometimes
ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
ports from the skb for tc flower to match against. However, the port
dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in
key_control->flags. All callers of __skb_flow_dissect(), zero-out the
key_control field except for fl_classify() as used by the flower
classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
__skb_flow_dissect(), since key_control is allocated on the stack
and may not be initialized.

Since key_basic and key_control are present for all flow keys, let's
make sure they are initialized.

Fixes: 62230715fd ("flow_dissector: do not dissect l4 ports for fragments")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17 21:33:28 -08:00
Horatiu Vultur
a81541041c net: mscc: fix in frame extraction
Each extracted frame on Ocelot has an IFH. The frame and IFH are extracted
by reading chuncks of 4 bytes from a register.

In case the IFH and frames were read corretly it would try to read the next
frame. In case there are no more frames in the queue, it checks if there
were any previous errors and in that case clear the queue. But this check
will always succeed also when there are no errors. Because when extracting
the IFH the error is checked against 4(number of bytes read) and then the
error is set only if the extraction of the frame failed. So in a happy case
where there are no errors the err variable is still 4. So it could be
a case where after the check that there are no more frames in the queue, a
frame will arrive in the queue but because the error is not reseted, it
would try to flush the queue. So the frame will be lost.

The fix consist in resetting the error after reading the IFH.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17 14:02:29 -08:00
Johannes Krude
e20d3a055a bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. By erroneously doing a bitwise
AND instead of a logical AND on user- and kernel-space buffer-size can lead
to no data being copied to user-space especially when user-space size is a
power of two and bigger then the kernel-space buffer.

Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
2020-02-17 16:53:49 +01:00