Commit Graph

754540 Commits

Author SHA1 Message Date
Eric Dumazet
9c21d2fc41 tcp: add tcp_comp_sack_nr sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

This limits number of SACK that can be compressed.
Using 0 disables SACK compression.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
6d82aa2420 tcp: add tcp_comp_sack_delay_ns sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

Its default value is 1,000,000, or 1 ms to meet TSO autosizing period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
200d95f457 tcp: add TCPAckCompressed SNMP counter
This counter tracks number of ACK packets that the host has not sent,
thanks to ACK compression.

Sample output :

$ nstat -n;sleep 1;nstat|egrep "IpInReceives|IpOutRequests|TcpInSegs|TcpOutSegs|TcpExtTCPAckCompressed"
IpInReceives                    123250             0.0
IpOutRequests                   3684               0.0
TcpInSegs                       123251             0.0
TcpOutSegs                      3684               0.0
TcpExtTCPAckCompressed          119252             0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
5d9f4262b7 tcp: add SACK compression
When TCP receives an out-of-order packet, it immediately sends
a SACK packet, generating network load but also forcing the
receiver to send 1-MSS pathological packets, increasing its
RTX queue length/depth, and thus processing time.

Wifi networks suffer from this aggressive behavior, but generally
speaking, all these SACK packets add fuel to the fire when networks
are under congestion.

This patch adds a high resolution timer and tp->compressed_ack counter.

Instead of sending a SACK, we program this timer with a small delay,
based on RTT and capped to 1 ms :

	delay = min ( 5 % of RTT, 1 ms)

If subsequent SACKs need to be sent while the timer has not yet
expired, we simply increment tp->compressed_ack.

When timer expires, a SACK is sent with the latest information.
Whenever an ACK is sent (if data is sent, or if in-order
data is received) timer is canceled.

Note that tcp_sack_new_ofo_skb() is able to force a SACK to be sent
if the sack blocks need to be shuffled, even if the timer has not
expired.

A new SNMP counter is added in the following patch.

Two other patches add sysctls to allow changing the 1,000,000 and 44
values that this commit hard-coded.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
a3893637e1 tcp: do not force quickack when receiving out-of-order packets
As explained in commit 9f9843a751 ("tcp: properly handle stretch
acks in slow start"), TCP stacks have to consider how many packets
are acknowledged in one single ACK, because of GRO, but also
because of ACK compression or losses.

We plan to add SACK compression in the following patch, we
must therefore not call tcp_enter_quickack_mode()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
cf0dd20372 tcp: use __sock_put() instead of sock_put() in tcp_clear_xmit_timers()
Socket can not disappear under us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Alexandre Belloni
64a2658b58 net: mscc: Add SPDX identifier
ocelot_qsys.h is missing the SPDX identfier, fix that.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:30:25 -04:00
David S. Miller
10151339e8 Merge branch 'stmmac-Clean-up-and-tune-up'
Jose Abreu says:

====================
net: stmmac: Clean-up and tune-up

This targets to uniformize the handling of the different GMAC versions in
stmmac_main.c file and also tune-up the HW.

Currently there are some if/else conditions in the main source file which
calls different callbacks depending on the ID of GMAC.

With the introducion of a generic HW interface handling which automatically
selects the GMAC callbacks to be used, it is now unpleasant to see if
conditions in the main code because this should be completely agnostic of the
GMAC version.

This series removes most of these conditions. There are some if conditions
that remain untouched but the callbacks handling are now uniformized.

Tested in GMAC5, hope I didn't break any previous versions.

Please check [1] for performance analisys of patches 3-12.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:16 -04:00
Jose Abreu
61fac60a6a net: stmmac: Remove if condition by taking advantage of hwif return code
We can remove the if condition and check if return code is different
than -EINVAL, meaning callback is present.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:16 -04:00
Jose Abreu
d2df9ea0ad net: stmmac: Let descriptor code get skbuff address
Stop using if conditions depending on the GMAC version for getting the
descriptor skbuff address and use instead a helper implemented in the
descriptor files.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:16 -04:00
Jose Abreu
357951cdf0 net: stmmac: Uniformize set_rx_owner()
Currently an if condition is used to select the correct callback to set
rx_onwer in descriptor. Lets keep this simple and always use the same
callback.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:16 -04:00
Jose Abreu
f1565c6021 net: stmmac: Remove uneeded check for GMAC version in stmmac_xmit
We either have .enable_dma_transmission or .set_tx_tail_ptr in the HW
table callbacks, we can never have both so there is no need to check for
GMAC version.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
24aaed0cc0 net: stmmac: Uniformize the use of dma_init_* callbacks
Instead of relying on the GMAC version for choosing if we need to use
dma_init or dma_init_{rx/tx}_chan callback, lets uniformize this and
always use the dma_init_{rx/tx}_chan callbacks.

While at it, fix the use of dma_init_chan callback, which shall be
called for as many channels as the max of rx/tx channels.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
758d5c73e2 net: stmmac: Move PTP and MMC base address calculation to hwif.c
PTP and MMC modules base address can depend on the GMAC version. As this
is HW specific lets move this base address calculation to hwif.c. Also,
add an entry in the HW table so that we can specify the module offset.
This can later be extended to more modules, if deemed necessary.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
63a550fc15 net: stmmac: Remove uneeded checks for GMAC version
With the introducion of callbacks check in hwif.h we only call the
callback if HW supports it so there is no longer need to check for GMAC
version.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
ab0204e35c net: stmmac: Uniformize the use of dma_{rx/tx}_mode callbacks
Instead of relying on the GMAC version for choosing if we need to use
dma_{rx/tx}_mode or just dma_mode callback lets uniformize this and
always use the dma_{rx/tx}_mode callbacks.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
44c67f8559 net: stmmac: Let descriptor code clear the descriptor
Stop using if conditions depending on the GMAC version for clearing the
descriptor and use instead a helper implemented in the descriptor files.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:15 -04:00
Jose Abreu
6844171d5b net: stmmac: Let descriptor code set skbuff address
Stop using if conditions depending on the GMAC version for setting the
the descriptor skbuff address and use instead a helper implemented in
the descriptor files.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:14 -04:00
Jose Abreu
4ae0169fd1 net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit
This is cutting down performance. Once the timer is armed it should run
after the time expires for the first packet sent and not the last one.

After this change, running iperf, the performance gain is +/- 24%.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:14 -04:00
Jose Abreu
67e1c4068d net: stmmac: Enable OSP for GMAC4
This enables OSP (Operate on Second Packet) for GMAC4. The feature
allows DMA to fetch second descriptor while its still processing the
first one.

Running iperf, the performance gain is +/- 38%.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:00:14 -04:00
David S. Miller
538e2de104 Merge branch 'net-Allow-more-drivers-with-COMPILE_TEST'
Florian Fainelli says:

====================
net: Allow more drivers with COMPILE_TEST

This patch series includes more drivers to be build tested with COMPILE_TEST
enabled. This helps cover some of the issues I just ran into with missing
a driver *sigh*.

Chanves in v3:

- drop the TI Keystone NETCP driver from the COMPILE_TEST additions

Changes in v2:

- allow FEC to build outside of CONFIG_ARM/ARM64 by defining a layout of
  registers, this is not meant to run, so this is not a real issue if we
  are not matching the correct register layout
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:11:07 -04:00
Florian Fainelli
3c0596f8be net: phy: Allow MDIO_MOXART and MDIO_SUN4I with COMPILE_TEST
Those drivers build just fine with COMPILE_TEST, so make that possible.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:11:06 -04:00
Florian Fainelli
78cc6e7ef9 net: ethernet: freescale: Allow FEC with COMPILE_TEST
The Freescale FEC driver builds fine with COMPILE_TEST, so make that
possible.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:11:06 -04:00
Florian Fainelli
2652113ff0 net: ethernet: ti: Allow most drivers with COMPILE_TEST
Most of the TI drivers build just fine with COMPILE_TEST, cpmac (AR7) is
the exception because it uses a header file from
arch/mips/include/asm/mach-ar7/ar7.h and keystone netcp which requires
help from drivers/soc/ti/ for queue management helpers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:11:06 -04:00
David Ahern
33fa382324 vlan: Add extack messages for link create
Add informative messages for error paths related to adding a
VLAN to a device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:08:55 -04:00
Manish Chopra
8a8633978b qede: Add build_skb() support.
This patch makes use of build_skb() throughout in driver's receieve
data path [HW gro flow and non HW gro flow]. With this, driver can
build skb directly from the page segments which are already mapped
to the hardware instead of allocating new SKB via netdev_alloc_skb()
and memcpy the data which is quite costly.

This really improves performance (keeping same or slight gain in rx
throughput) in terms of CPU utilization which is significantly reduced
[almost half] in non HW gro flow where for every incoming MTU sized
packet driver had to allocate skb, memcpy headers etc. Additionally
in that flow, it also gets rid of bunch of additional overheads
[eth_get_headlen() etc.] to split headers and data in the skb.

Tested with:
system: 2 sockets, 4 cores per socket, hyperthreading, 2x4x2=16 cores
iperf [server]: iperf -s
iperf [client]: iperf -c <server_ip> -t 500 -i 10 -P 32

HW GRO off – w/o build_skb(), throughput: 36.8 Gbits/sec

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.59    0.00   32.93    0.00    0.00   43.07    0.00    0.00   23.42

HW GRO off - with build_skb(), throughput: 36.9 Gbits/sec

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.70    0.00   31.70    0.00    0.00   25.68    0.00    0.00   41.92

HW GRO on - w/o build_skb(), throughput: 36.9 Gbits/sec

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.86    0.00   24.14    0.00    0.00    6.59    0.00    0.00   68.41

HW GRO on - with build_skb(), throughput: 37.5 Gbits/sec

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.87    0.00   23.75    0.00    0.00    6.19    0.00    0.00   69.19

Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:06:53 -04:00
David S. Miller
56a9a9e737 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-05-17

This series contains updates to ixgbe, ixgbevf and ice drivers.

Cathy Zhou resolves sparse warnings by using the force attribute.

Mauro S M Rodrigues fixes a bug where IRQs were not freed if a PCI error
recovery system opts to remove the device which causes
ixgbe_io_error_detected() to return PCI_ERS_RESULT_DISCONNECT before
calling ixgbe_close_suspend() which results in IRQs not freed and
crashing when the remove handler calls pci_disable_device().  Resolved
this by calling ixgbe_close_suspend() before evaluating the PCI channel
state.

Pavel Tatashin releases the rtnl_lock during the call to
ixgbe_close_suspend() to allow scaling if device_shutdown() is
multi-threaded.

Emil modifies ixgbe to not validate the MAC address during a reset,
unless the MAC was set on the host so that the VF will get a new MAC
address every time it reloads.  Also updates ixgbevf to set
hw->mac.perm_addr in order to retain the custom MAC on a reset.

Anirudh updates the ice NVM read/erase/update AQ commands to align with
the latest specification.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:02:57 -04:00
Roman Mashak
7c5995b33d tc-testing: fixed copy-pasting error in ife tests
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:31:43 -04:00
Dan Carpenter
990a9d4975 net/ncsi: prevent a couple array underflows
We recently refactored this code and introduced a static checker
warning.  Smatch complains that if cmd->index is zero then we would
underflow the arrays.  That's obviously true.

The question is whether we prevent cmd->index from being zero at a
different level.  I've looked at the code and I don't immediately see
a check for that.

Fixes: 062b3e1b6d ("net/ncsi: Refactor MAC, VLAN filters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:27:39 -04:00
Eric Dumazet
be7f3e5999 net/smc: init conn.tx_work & conn.send_lock sooner
syzkaller found that following program crashes the host :

{
  int fd = socket(AF_SMC, SOCK_STREAM, 0);
  int val = 1;

  listen(fd, 0);
  shutdown(fd, SHUT_RDWR);
  setsockopt(fd, 6, TCP_NODELAY, &val, 4);
}

Simply initialize conn.tx_work & conn.send_lock at socket creation,
rather than deeper in the stack.

ODEBUG: assert_init not available (active state 0) object type: timer_list hint:           (null)
WARNING: CPU: 1 PID: 13988 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 13988 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #46
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
RSP: 0018:ffff880197a37880 EFLAGS: 00010086
RAX: 0000000000000061 RBX: 0000000000000005 RCX: ffffc90001ed0000
RDX: 0000000000004aaf RSI: ffffffff8160f6f1 RDI: 0000000000000001
RBP: ffff880197a378c0 R08: ffff8801aa7a0080 R09: ffffed003b5e3eb2
R10: ffffed003b5e3eb2 R11: ffff8801daf1f597 R12: 0000000000000001
R13: ffffffff88d96980 R14: ffffffff87fa19a0 R15: ffffffff81666ec0
 debug_object_assert_init+0x309/0x500 lib/debugobjects.c:692
 debug_timer_assert_init kernel/time/timer.c:724 [inline]
 debug_assert_init kernel/time/timer.c:776 [inline]
 del_timer+0x74/0x140 kernel/time/timer.c:1198
 try_to_grab_pending+0x439/0x9a0 kernel/workqueue.c:1223
 mod_delayed_work_on+0x91/0x250 kernel/workqueue.c:1592
 mod_delayed_work include/linux/workqueue.h:541 [inline]
 smc_setsockopt+0x387/0x6d0 net/smc/af_smc.c:1367
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 01d2f7e2cd ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:25:35 -04:00
Jiri Pirko
3b734ff604 nfp: flower: fix error path during representor creation
Don't store repr pointer to reprs array until the representor is
successfully created. This avoids message about "representor
destruction" even when it was never created. Also it cleans-up the flow.
Also, check return value after port alloc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:23:29 -04:00
David S. Miller
6d9f868fc7 Merge branch 'mvpp2-small-improvements'
Antoine Tenart says:

====================
net: mvpp2: small improvements

Those 3 patches are small improvements to the Marvell PPv2 driver. The
series does not conflict with the one sent about phylink and
1000/2500baseX support, so the two series can live in parallel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:18:55 -04:00
Yan Markman
934e0f8330 net: mvpp2: print rx error with rate-limit
Prevent flood of RX error prints during heavy traffic with weak signal
in link by checking net_ratelimit() before using netdev_err().

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: small rework, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:18:55 -04:00
Yan Markman
5b0ab2f41d net: mvpp2: set mac address does not require the stop/start sequence
Remove special stop/start handling from the set_mac_address callback.
All this special care is not needed, and can be removed. It also
simplifies the up/down status in the driver and helps avoiding possible
link status mismatch issues.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:18:54 -04:00
Yan Markman
914365f1c9 net: mvpp2: avoid checking for free aggregated descriptors twice
Avoid repeating the check for free aggregated descriptors when it
already failed at the beginning of the function.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:18:54 -04:00
David S. Miller
808e2fc3b0 Merge branch 'mvpp2-phylink-conversion'
Antoine Tenart says:

====================
net: mvpp2: phylink conversion

This series convert the Marvell PPv2 driver to phylink (models the MAC
to PHY link).

One important point is the PPv2 driver supports two probe modes: device
tree and ACPI. This series only brings phylink support for the device
tree mode, as the ACPI one will need further work. Still, the driver
should be working as before when using ACPI. This split should be
temporary, and was discussed with Marcin (in Cc.) who added ACPI support
to the driver.

Also as the SFP cages on both DB boards can be considered as non-wired.
We thus chose not to describe those SFP cages and we use fixed-link.

The rest of the series uses phylink to add support for 1000BaseX and
2500BaseX modes in the PPv2 driver. To do this, two patches are needed
in the common PHY framework (patches 3 and 4). The last 4 patches modify
the device tree to use the new PPv2 functionalities.

The series has been tested for the device tree mode on the 7040-db,
8040-db and 8040-mcbin boards, to ensure all the interface where working
as expected.

@Dave: patches 7 to 10 should go through the mvebu tree (Gregory in
Cc.) to avoid any conflict with the other mvebu dt patches taken during
this cycle.

The series is based on today's net-next.

Since v2:
  - Removed the SFP description from the DB boards, as their SFP cages
    are wired properly. We now use fixed-link.
  - Because of this rework, split the series in two, so that the SFP
    part is reviewed separately.
  - Small fixes in the phylink patch.
  - Rebased on the latest net-next branch.

Since v1:
  - Chose a different approach to the SFP changes, as the previous ones
    weren't valid and reworked both BD boards device trees.
  - Misc fixes.
  - Added Kishon's acked-by on one patch.
  - Rebaed on latest net-next branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:40 -04:00
Antoine Tenart
a6fe31de86 net: mvpp2: 2500baseX support
This patch adds the 2500Base-X PHY mode support in the Marvell PPv2
driver. 2500Base-X is quite close to 1000Base-X and SGMII modes and uses
nearly the same code path.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:40 -04:00
Antoine Tenart
d97c9f4ab0 net: mvpp2: 1000baseX support
This patch adds the 1000Base-X PHY mode support in the Marvell PPv2
driver. 1000Base-X is quite close the SGMII and uses nearly the same
code path.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:40 -04:00
Antoine Tenart
9ad8bd819b phy: cp110-comphy: 2.5G SGMII mode
This patch allow the CP110 comphy to configure some lanes in the
2.5G SGMII mode. This mode is quite close to SGMII and uses nearly the
same code path.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:40 -04:00
Antoine Tenart
5490b8725d phy: add 2.5G SGMII mode to the phy_mode enum
This patch adds one more generic PHY mode to the phy_mode enum, to allow
configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
callback.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:39 -04:00
Antoine Tenart
4bb0432628 net: mvpp2: phylink support
Convert the PPv2 driver to implement phylink helpers, and use phylink in
DT mode. The other mode supported is ACPI, which will need further work
in order to be entirely compatible with phylink.

The MAC and GoP configuration functions were completely moved to fit
into the phylink helpers. When a PHY is always present between the MAC
and the physical port, phylink only is used, but when this is not the
case (the MAC directly is connected to the physical port) the link IRQ
is used to detect changes in the link state and call phylink_mac_change.

The ACPI mode do not uses phylink as of now, and the changes shouldn't
impact its use.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:39 -04:00
Antoine Tenart
dcd3e73ae7 net: mvpp2: align the ethtool ops definition
Cosmetic patch to align the ethtool functions to ops definitions. This
patch does not change in any way the driver's behaviour.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:11:39 -04:00
David S. Miller
a564b659bb wireless-drivers-next patches for 4.18
The first pull request for 4.18. As usual new features and bug fixes
 but nothing really special.
 
 I also merged wireless-drivers due to an iwlwifi patch dependency.
 
 Major changes:
 
 iwlwifi
 
 * implement Traffic Condition Monitor and use it for scan, BT coex and
   to detect when the AP doesn't support UAPSD properly
 
 * some more work for the 22000 family of devices;
 
 * introduce AMSDU rate control offload
 
 qtnfmac
 
 * DFS offload support
 
 rsi
 
 * roaming enhancements
 
 * increase max supported aggregation subframes
 
 * don't advertise 5 GHz support if the device doesn't support it
 
 brcmfmac
 
 * add support for BCM4366E chipset
 
 * add support for bcm43364 wireless chipset
 
 ath10k
 
 * enable temperature reads for QCA6174 and QCA9377
 
 * add firmware memory dump support for QCA9984
 
 * continue adding WCN3990 support via SNOC bus
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJa/TreAAoJEG4XJFUm622boV0IAI/tTu3obIdhdlnZJsjat/wH
 tmQX2rZl0g7kbthVU+WqPA1KgvK/HEX1SUIP0leARl6FDqxrBzE1G4P1fOY3JIaZ
 +T3UG9LgFM3hoXtJ1VRdvi8rTBVU67TTOrQCVD7AapGWfQwn6AXfy4ARUEqBjkrA
 SxDemdAwIks3miMU3EnsRlzLaI56R7l1mk0Xr30tM5Coq721AcWE6FBz6lqmFnTC
 3vdDzpMRIiTt5zLICJZYgAB3akiaJEqHnIAv+y0sbXG1gHDhKcfEH674SM6FCB2N
 3TP7EpzzxH/FYB0i+zOFg6wnAqUngLLnwkG/ciniVi75feb+gbaKqWHT8FNfx04=
 =rF/V
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.18

The first pull request for 4.18. As usual new features and bug fixes
but nothing really special.

I also merged wireless-drivers due to an iwlwifi patch dependency.

Major changes:

iwlwifi

* implement Traffic Condition Monitor and use it for scan, BT coex and
  to detect when the AP doesn't support UAPSD properly

* some more work for the 22000 family of devices;

* introduce AMSDU rate control offload

qtnfmac

* DFS offload support

rsi

* roaming enhancements

* increase max supported aggregation subframes

* don't advertise 5 GHz support if the device doesn't support it

brcmfmac

* add support for BCM4366E chipset

* add support for bcm43364 wireless chipset

ath10k

* enable temperature reads for QCA6174 and QCA9377

* add firmware memory dump support for QCA9984

* continue adding WCN3990 support via SNOC bus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:00:40 -04:00
YueHaibing
93c65d13d8 vmxnet3: Replace msleep(1) with usleep_range()
As documented in Documentation/timers/timers-howto.txt,
replace msleep(1) with usleep_range().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:55:38 -04:00
Tonghao Zhang
7e878b605f bonding: introduce link change helper
Introduce an new common helper to avoid redundancy.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:51:13 -04:00
David S. Miller
10e361e100 Merge branch 'tcp-default-RACK-loss-recovery'
Yuchung Cheng says:

====================
tcp: default RACK loss recovery

This patch set implements the features correspond to the
draft-ietf-tcpm-rack-03 version of the RACK draft.
https://datatracker.ietf.org/meeting/101/materials/slides-101-tcpm-update-on-tcp-rack-00

1. SACK: implement equivalent DUPACK threshold heuristic in RACK to
   replace existing RFC6675 recovery (tcp_mark_head_lost).

2. Non-SACK: simplify RFC6582 NewReno implementation

3. RTO: apply RACK's time-based approach to avoid spuriouly
   marking very recently sent packets lost.

4. with (1)(2)(3), make RACK the exclusive fast recovery mechanism to
   mark losses based on time on S/ACK. Tail loss probe and F-RTO remain
   enabled by default as complementary mechanisms to send probes in
   CA_Open and CA_Loss states. The probes would solicit S/ACKs to trigger
   RACK time-based loss detection.

All Google web and internal servers have been running RACK-only mode
(4) for a while now. a/b experiments indicate RACK/TLP on average
reduces recovery latency by 10% compared to RFC6675. RFC6675
is default-off now but can be enabled by disabling RACK (sysctl
net.ipv4.tcp_recovery=0) for unseen issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:30 -04:00
Yuchung Cheng
56f8c5d78f tcp: don't mark recently sent packets lost on RTO
An RTO event indicates the head has not been acked for a long time
after its last (re)transmission. But the other packets are not
necessarily lost if they have been only sent recently (for example
due to application limit). This patch would prohibit marking packets
sent within an RTT to be lost on RTO event, using similar logic in
TCP RACK detection.

Normally the head (SND.UNA) would be marked lost since RTO should
fire strictly after the head was sent. An exception is when the
most recent RACK RTT measurement is larger than the (previous)
RTO. To address this exception the head is always marked lost.

Congestion control interaction: since we may not mark every packet
lost, the congestion window may be more than 1 (inflight plus 1).
But only one packet will be retransmitted after RTO, since
tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The
connection still performs slow start from one packet (with Cubic
congestion control).

This commit was tested in an A/B test with Google web servers,
and showed a reduction of 2% in (spurious) retransmits post
timeout (SlowStartRetrans), and correspondingly reduced DSACKs
(DSACKIgnoredOld) by 7%.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00
Yuchung Cheng
b8fef65a8a tcp: new helper tcp_rack_skb_timeout
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack
to prepare the final RTO change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00
Yuchung Cheng
c77d62ffae tcp: separate loss marking and state update on RTO
Previously when TCP times out, it first updates cwnd and ssthresh,
marks packets lost, and then updates congestion state again. This
was fine because everything not yet delivered is marked lost,
so the inflight is always 0 and cwnd can be safely set to 1 to
retransmit one packet on timeout.

But the inflight may not always be 0 on timeout if TCP changes to
mark packets lost based on packet sent time. Therefore we must
first mark the packet lost, then set the cwnd based on the
(updated) inflight.

This is not a pure refactor. Congestion control may potentially
break if it uses (not yet updated) inflight to compute ssthresh.
Fortunately all existing congestion control modules does not do that.
Also it changes the inflight when CA_LOSS_EVENT is called, and only
westwood processes such an event but does not use inflight.

This change has two other minor side benefits:
1) consistent with Fast Recovery s.t. the inflight is updated
   first before tcp_enter_recovery flips state to CA_Recovery.

2) avoid intertwining loss marking with state update, making the
   code more readable.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00
Yuchung Cheng
2ad55f5660 tcp: new helper tcp_timeout_mark_lost
Refactor using a new helper, tcp_timeout_mark_loss(), that marks packets
lost upon RTO.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:41:29 -04:00