2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-26 22:24:09 +08:00
Commit Graph

591227 Commits

Author SHA1 Message Date
David S. Miller
6c76f3d292 Merge branch 'gre-lwt-fixes'
Jiri Benc says:

====================
gre: fix lwtunnel support

This patchset fixes a few bugs in ipgre metadata mode implementation.

As an example, in this setup:

ip a a 192.168.1.1/24 dev eth0
ip l a gre1 type gre external
ip l s gre1 up
ip a a 192.168.99.1/24 dev gre1
ip r a 192.168.99.2/32 encap ip dst 192.168.1.2 ttl 10 dev gre1
ping 192.168.99.2

the traffic does not go through before this patchset and does as expected
with it applied.

v3: Back to v1 in order not to break existing users. Dropped patch 3, will
    be fixed in iproute2 instead.
v2: Rejecting invalid configuration, added patch 3, dropped patch for
    ETH_P_TEB (will target net-next).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:02:45 -04:00
Jiri Benc
2090714e1d gre: build header correctly for collect metadata tunnels
In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
case, the interface is ARPHRD_IPGRE and the protocol to be used for
encapsulation is skb->protocol.

Fixes: 2e15ea390e ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:02:45 -04:00
Jiri Benc
a64b04d86d gre: do not assign header_ops in collect metadata mode
In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel
is incorrectly assumed to be mGRE in NBMA mode (see commit 6a5f44d7a0).
This is not the case, we're controlling the encapsulation addresses by
lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata
mode does not make sense.

Although it would be more user firendly to reject requests that specify
both the collect metadata flag and a remote/local IP address, this would
break current users of gretap or introduce ugly code and differences in
handling ipgre and gretap configuration. Keep the current behavior of
remote/local IP address being ignored in such case.

v3: Back to v1, added explanation paragraph.
v2: Reject configuration specifying both remote/local address and collect
    metadata flag.

Fixes: 2e15ea390e ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 17:02:44 -04:00
David S. Miller
12395d0647 Just a single fix, for a per-CPU memory leak in a
(root user triggerable) error case.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXIH65AAoJEGt7eEactAAdZmwP/R2UAHltBlYhCEMqcM+8VhPD
 VDB3LFTYhOVUtVfFwqAzEoxPDjnGyGgZcjO5RxyCZLokm71KbbHAp3h3GnCVQCHd
 dnRej6RD+Kl6n0EoTPCLy7ZAjSjpGBWOTy6MEgrAQnTtL+Q7nUch+z5DXIafTg/w
 MOYke/WfD1jHbq2eGHu6HkbY3IUwoSKaEoA8qN20ieJRU7jsaG29RiAvBot2IVTI
 g3hTL4FPzwSL5XM0qkoxDLPYA5Mo36Cb5sZ9AjkQCaqP/EemOoFxILGWUyi+17nd
 zdF3zZB9lj+CdR+0IbjTjz8b457u1g/JW4dLl+iRqv7clynm3gmz7LivVhBcHogx
 usg0hW9tDeZ5wzHj8v+e+C+RqyxtgHxVvYtt8Jh6bTqS8aMO8hor7qPFOcpJPmyz
 ZbXThJnsvfaYoWAcvIXUa3Q2kwz2myVLDhlQBgwSi5TzgTDqb2GlYnGhvKOnB5cz
 6JL3mZt2vi0Yvb7Lk9YzeEYs5cZq4DFVfx3nHgaVwDZ5GPoUTgIjgSnVLFC/J80a
 r09Wmigtjy5qftw6w4pSUcf/Fj4L0BD+GhAZ+Hs5xhjUnlAxTlHxDW0bJ/kMmFzy
 9B91YSswBc+3IGdjnsN+bNZ6T0XuvapOLRkC1V8fJDrtyy1Tel/LL9Giy/V13XLr
 8vgETcgFJyQ3jnkzyRdr
 =EWBn
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a single fix, for a per-CPU memory leak in a
(root user triggerable) error case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:55:26 -04:00
Dan Carpenter
b43586576e tipc: remove an unnecessary NULL check
This is never called with a NULL "buf" and anyway, we dereference 's' on
the lines before so it would Oops before we reach the check.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:54:12 -04:00
Timur Tabi
a05d7dfc51 net: phy: at803x: only the AT8030 needs a hardware reset on link change
Commit 13a56b44 ("at803x: Add support for hardware reset") added a
work-around for a hardware bug on the AT8030.  However, the work-around
was being called for all 803x PHYs, even those that don't need it.
Function at803x_link_change_notify() checks to make sure that it only
resets the PHY on the 8030, but it makes more sense to not call that
function at all if it isn't needed.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:48:21 -04:00
Arnd Bergmann
6b87663fbe net/mlx5e: avoid stack overflow in mlx5e_open_channels
struct mlx5e_channel_param is a large structure that is allocated
on the stack of mlx5e_open_channels, and with a recent change
it has grown beyond the warning size for the maximum stack
that a single function should use:

mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels':
mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

The function is already using dynamic allocation and is not in
a fast path, so the easiest workaround is to use another kzalloc
for allocating the channel parameters.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: d3c9bc2743 ("net/mlx5e: Added ICO SQs")
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:46:59 -04:00
David S. Miller
956a7ffe00 In this patchset you can find the following fixes:
1) check skb size to avoid reading beyond its border when delivering
    payloads, by Sven Eckelmann
 2) initialize last_seen time in neigh_node object to prevent cleanup
    routine from accidentally purge it, by Marek Lindner
 3) release "recently added" slave interfaces upon virtual/batman
    interface shutdown, by Sven Eckelmann
 4) properly decrease router object reference counter upon routing table
    update, by Sven Eckelmann
 5) release queue slots when purging OGM packets of deactivating slave
    interface, by Linus Lüssing
 
 Patch 2 and 3 have no "Fixes:" tag because the offending commits date
 back to when batman-adv was not yet officially in the net tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXHt+MAAoJEJ4aZjxxc6bKMEgP/1DZWgQpHs5IM8yW7IQx8CQO
 iMkpwfnJcRSOnADC/Z2GtIcz1Df2r+NZcqf5xMMF2CL0xlks024qTHoqeV7Poyel
 DmzzETbQFWgdFD22RI70h25T4Yb400PP0saL2TbcVec6CiM57YN3cPbhjZvqzN32
 bCIa38kwAGXvNqRzcy5WjDF/rllAoJZ0s055z+kY8WuVOmvOEor+FDmWFr0D8ioP
 /utVP9ACA3YHZ39DMDFDsyBp6nMZOgHjpJVfmcubFULHmKvYQ0zMpgX19IVoMsJ6
 HEtz9fKN4KPgAFbbPcU0GLg4srsNFmEbTB7Bqhqods+ZYN60M4Z0kexqYz1XuItH
 atISvCIe14xHdT6gW32N707yK30DxUKIEpEg5wMXhE+1m041NfrfrcvaEXSLco6d
 txsQzd1R4T5ry3V1YXv4znSVPmHvd84ykKrklQZgPA09QIPCCDb7Olp8Lj6mMsmc
 OuEYLOfAoOD/KZcRUzY6kWpMRfOJLLXUgwcfSEES8MCaaBGD91YZyrSuHvixmo5V
 24JTp0D/X/rkkQjI3a2Pf0dhvdGHAk1g6mElddo86a0UpRbm3qshquAPf+U8QcU0
 Kt4rpN9dtOA8yTpnvxG2r04T32yzQQQNIqRGEnugokaJUECFF0mugxENTqcw2vux
 uNxMhl36A21czA9s/Iu7
 =o059
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
In this patchset you can find the following fixes:

1) check skb size to avoid reading beyond its border when delivering
   payloads, by Sven Eckelmann
2) initialize last_seen time in neigh_node object to prevent cleanup
   routine from accidentally purge it, by Marek Lindner
3) release "recently added" slave interfaces upon virtual/batman
   interface shutdown, by Sven Eckelmann
4) properly decrease router object reference counter upon routing table
   update, by Sven Eckelmann
5) release queue slots when purging OGM packets of deactivating slave
   interface, by Linus Lüssing

Patch 2 and 3 have no "Fixes:" tag because the offending commits date
back to when batman-adv was not yet officially in the net tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:42:40 -04:00
Jason Wang
3df97ba830 tuntap: calculate rps hash only when needed
There's no need to calculate rps hash if it was not enabled. So this
patch export rps_needed and check it before trying to get rps
hash. Tests (using pktgen to inject packets to guest) shows this can
improve pps about 13% (when rps is disabled).

Before:
~1150000 pps
After:
~1300000 pps

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
----
Changes from V1:
- Fix build when CONFIG_RPS is not set
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:38:54 -04:00
Christophe Jaillet
eb63efb4f2 ps3_gelic: fix memcpy parameter
The size allocated for target->hwinfo and the number of bytes copied in it
should be consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:37:56 -04:00
Woojung Huh
14437e3fa2 lan78xx: workaround of forced 100 Full/Half duplex mode error
At forced 100 Full & Half duplex mode, chip may fail to set mode correctly
when cable is switched between long(~50+m) and short one.
As workaround, set to 10 before setting to 100 at forced 100 F/H mode.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:36:12 -04:00
Woojung Huh
74d79a2e30 lan78xx: fix statistics counter error
Fix rx_bytes, tx_bytes and tx_frames error in netdev.stats.
- rx_bytes counted bytes excluding size of struct ethhdr.
- tx_packets didn't count multiple packets in a single urb
- tx_bytes included 8 bytes of extra commands.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:36:11 -04:00
Colin Ian King
1d9619d533 net: dsa: mv88e6xxx: fix uninitialized error return
The error return err is not initialized and there is a possibility
that err is not assigned causing mv88e6xxx_port_bridge_join to
return a garbage error return status. Fix this by initializing err
to 0.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:29:20 -04:00
David S. Miller
f345c9a572 Merge branch 'tcp-eor'
Martin KaFai Lau says:

====================
tcp: Make use of MSG_EOR in tcp_sendmsg

v4:
~ Do not set eor bit in do_tcp_sendpages() since there is
  no way to pass MSG_EOR from the userland now.
~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg().
~ Move TCP_SKB_CB(skb)->eor test to a new helper
  tcp_skb_can_collapse_to() (suggested by Soheil).
~ Add some packetdrill tests.

v3:
~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic.
~ Move the eor bit test back to the loop in tcp_sendmsg and
  tcp_sendpage because there could be >1 threads doing
  sendmsg.
~ Thanks to Eric Dumazet's suggestions on v2.
~ The TCP timestamp bug fixes are separated into other threads.

v2:
~ Rework based on the recent work
  "add TX timestamping via cmsg" by
  Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
~ This version takes the MSG_EOR bit as a signal of
  end-of-response-message and leave the selective
  timestamping job to the cmsg
~ Changes based on the v1 feedback (like avoid
  unlikely check in a loop and adding tcp_sendpage
  support)
~ The first 3 patches are bug fixes.  The fixes in this
  series depend on the newly introduced txstamp_ack in
  net-next.  I will make relevant patches against net after
  getting some feedback.
~ The test results are based on the recently posted net fix:
  "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get a more accurate
TCP ack timestamping on application protocol with
multiple outgoing response messages (e.g. HTTP2).

One of our use case is at the webserver.  The webserver tracks
the HTTP2 response latency by measuring when the webserver sends
the first byte to the socket till the TCP ACK of the last byte
is received.  In the cases where we don't have client side
measurement, measuring from the server side is the only option.
In the cases we have the client side measurement, the server side
data can also be used to justify/cross-check-with the client
side data.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:14:20 -04:00
Martin KaFai Lau
a166140e81 tcp: Handle eor bit when fragmenting a skb
When fragmenting a skb, the next_skb should carry
the eor from prev_skb.  The eor of prev_skb should
also be reset.

Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 sendto(4, ..., 15330, MSG_EOR, ..., ...) = 15330
0.200 sendto(4, ..., 730, 0, ..., ...) = 730

0.200 > .  1:7301(7300) ack 1
0.200 > . 7301:14601(7300) ack 1

0.300 < . 1:1(0) ack 14601 win 257
0.300 > P. 14601:15331(730) ack 1
0.300 > P. 15331:16061(730) ack 1

0.400 < . 1:1(0) ack 16061 win 257
0.400 close(4) = 0
0.400 > F. 16061:16061(0) ack 1
0.400 < F. 1:1(0) ack 16062 win 257
0.400 > . 16062:16062(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:14:19 -04:00
Martin KaFai Lau
a643b5d41c tcp: Handle eor bit when coalescing skb
This patch:
1. Prevent next_skb from coalescing to the prev_skb if
   TCP_SKB_CB(prev_skb)->eor is set
2. Update the TCP_SKB_CB(prev_skb)->eor if coalescing is
   allowed

Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 write(4, ..., 11680) = 11680

0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1

0.300 < . 1:1(0) ack 1 win 257 <sack 1461:13141,nop,nop>
0.300 > P. 1:731(730) ack 1
0.300 > P. 731:1461(730) ack 1
0.400 < . 1:1(0) ack 13141 win 257

0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:14:19 -04:00
Martin KaFai Lau
c134ecb878 tcp: Make use of MSG_EOR in tcp_sendmsg
This patch adds an eor bit to the TCP_SKB_CB.  When MSG_EOR
is passed to tcp_sendmsg, the eor bit will be set at the skb
containing the last byte of the userland's msg.  The eor bit
will prevent data from appending to that skb in the future.

The change in do_tcp_sendpages is to honor the eor set
during the previous tcp_sendmsg(MSG_EOR) call.

This patch handles the tcp_sendmsg case.  The followup patches
will handle other skb coalescing and fragment cases.

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get a more accurate
TCP ack timestamping on application protocol with
multiple outgoing response messages (e.g. HTTP2).

Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 14600) = 14600
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730

0.200 > .  1:7301(7300) ack 1
0.200 > P. 7301:14601(7300) ack 1

0.300 < . 1:1(0) ack 14601 win 257
0.300 > P. 14601:15331(730) ack 1
0.300 > P. 15331:16061(730) ack 1

0.400 < . 1:1(0) ack 16061 win 257
0.400 close(4) = 0
0.400 > F. 16061:16061(0) ack 1
0.400 < F. 1:1(0) ack 16062 win 257
0.400 > . 16062:16062(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:14:18 -04:00
David S. Miller
2a9e8438a2 Merge branch 'tcp-redundant-checks'
Soheil Hassas Yeganeh says:

====================
tcp: simplify ack tx timestamps

v2:
- Fully remove SKBTX_ACK_TSTAMP, as suggested by Willem de Bruijn.

This patch series aims at removing redundant checks and fields
for ack timestamps for TCP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:06:11 -04:00
Soheil Hassas Yeganeh
0a2cf20c3f tcp: remove SKBTX_ACK_TSTAMP since it is redundant
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
the timestamp of the TCP acknowledgement should be reported on
error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
SKBTX_ACK_TSTAMP flag redundant.

Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.

Note that this frees one bit in shinfo->tx_flags.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:06:10 -04:00
Soheil Hassas Yeganeh
863c1fd981 tcp: remove an unnecessary check in tcp_tx_timestamp
Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp.

tcp_tx_timestamp() receives the tsflags as a parameter. As a
result the "sk->sk_tsflags || tsflags" is redundant, since
tsflags already includes sk->sk_tsflags plus overrides from
control messages.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:06:10 -04:00
Marcelo Ricardo Leitner
7b7483409f net: fix net_gso_ok for new GSO types.
Fix casting in net_gso_ok. Otherwise the shift on
gso_type << NETIF_F_GSO_SHIFT may hit the 32th bit and make it look like
a INT_MIN, which is then promoted from signed to uint64 which is
0xffffffff80000000, resulting in wrong behavior when it is and'ed with
the feature itself, as in:

This test app:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
	uint64_t feature1;
	uint64_t feature2;
	int gso_type = 1 << 15;

	feature1 = gso_type << 16;
	feature2 = (uint64_t)gso_type << 16;
	printf("%lx %lx\n", feature1, feature2);

	return 0;
}

Gives:
ffffffff80000000 80000000

So that this:
   return (features & feature) == feature;
Actually works on more bits than expected and invalid ones.

Fix is to promote it earlier.

Issue noted while rebasing SCTP GSO patch but posting separetely as
someone else may experience this meanwhile.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 15:53:17 -04:00
Neil Armstrong
62522ef3c3 net: ethernet: davinci_emac: Fix devioctl while in fixed link
When configured in fixed link, the DaVinci emac driver sets the
priv->phydev to NULL and further ioctl calls to the phy_mii_ioctl()
causes the kernel to crash.

Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes: 1bb6aa56bb ("net: davinci_emac: Add support for fixed-link PHY")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 15:51:25 -04:00
David S. Miller
eee66af857 wireless-drivers fixes for 4.6
ath9k
 
 * fix a couple release old throughput regression on ar9281
 
 iwlwifi
 
 * add new device IDs for 8265
 * fix a NULL pointer dereference when paging firmware asserts
 * remove a WARNING on gscan capabilities
 * fix MODULE_FIRMWARE for 8260
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJXHjlZAAoJEG4XJFUm622brcgIAJ127uU9S/X3oM2EK76TdbV+
 yZnp8Te57bK+ypXRP7YMaL7gevR3q9TEzQkY6ZQ0iMCZenQouxB/Wg9xBZjJ/ny0
 dlVtmwM2sSvYdieTSftKGWLg3FH0FaJ1VXCJirLwrbs4Hbi2FqcGKMrLxflpqgW2
 m1FkTZzCbjokaHZgoWJmlDM/UbPtTz8BJgLICRd7hJ4jJ2Mi2k32LCSxsXn0GbUG
 naS6bDUGBcKjly9PKomY9y+rj897lm/m3rGKFJPkX/T7Klb1UMBLCj8NQIyF+/AX
 vu1fLxMwlFyXnmRJAFPyj9T2Vok3aDjRuA9xck3MufxOFjOhEGJiPjZV3wNNFj8=
 =aDY4
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2016-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.6

ath9k

* fix a couple release old throughput regression on ar9281

iwlwifi

* add new device IDs for 8265
* fix a NULL pointer dereference when paging firmware asserts
* remove a WARNING on gscan capabilities
* fix MODULE_FIRMWARE for 8260
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:23:27 -04:00
Bert Kenward
e00f80173b MAINTAINERS: net: update sfc maintainers
Add myself and Edward Cree as maintainers.
Remove Shradha Shah, who is on extended leave.

Cc: David S. Miller <davem@davemloft.net>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:21:15 -04:00
Jon Cooper
dcb4123cbe sfc: disable RSS when unsupported
When certain firmware variants are selected (via the sfboot utility) the
SFC7000 and SFC8000 series NICs don't support RSS. The driver still
tries (and fails) to insert filters with the RSS flag, and the NIC fails
to pass traffic.

When the firmware reports RSS_LIMITED suppress allocating a default RSS
context. The absence of an RSS context is picked up in filter insertion
and RSS flags are discarded.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:21:15 -04:00
Stanislaw Gruszka
4be2b49e28 myri10ge: fix sleeping with bh disabled
napi_disable() can not be called with bh disabled, move locking just
around myri10ge_ss_lock_napi() .

Patches fixes following bug:

[  114.278378] BUG: sleeping function called from invalid context at net/core/dev.c:4383
<snip>
[  114.313712] Call Trace:
[  114.314943]  [<ffffffff817010ce>] dump_stack+0x19/0x1b
[  114.317673]  [<ffffffff810ce7f3>] __might_sleep+0x173/0x230
[  114.320566]  [<ffffffff815b3117>] napi_disable+0x27/0x90
[  114.323254]  [<ffffffffa01e437f>] myri10ge_close+0xbf/0x3f0 [myri10ge]

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Hyong-Youb Kim <hykim@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:21:14 -04:00
Eric Engestrom
edb9a1b894 Documentation: networking: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:21:13 -04:00
Sinclair Yeh
7851496a32 drm/vmwgfx: Fix order of operation
mode->hdisplay * (var->bits_per_pixel + 7) gets evaluated before
the division, potentially making the pitch larger than it should
be.

Since the original intention is to do a div-round-up, just use
the macro instead.

Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2016-04-28 11:07:30 -07:00
Charmaine Lee
e02e588431 drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
Instead of calling vmw_cmd_ok, call vmw_cmd_dx_cid_check to
validate the context id for query commands.

Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-04-28 11:07:23 -07:00
Charmaine Lee
1883598d42 drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Fixes piglit tests nv_conditional_render-* crashes.

Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-04-28 11:07:15 -07:00
Jason Gunthorpe
e6bd18f57a IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:03:16 -04:00
Dean Luick
7723d8c244 IB/hfi1: Use kernel default llseek for ui device
The ui device llseek had a mistake with SEEK_END and did
not fully follow seek semantics.  Correct all this by
using a kernel supplied function for fixed size devices.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mitko Haralanov
94158442eb IB/hfi1: Don't attempt to free resources if initialization failed
Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    PGD 852a43067 PUD 85d4a6067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 2831 Comm: osu_bw Tainted: G          IO 3.12.18-wfr+ #1
    task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000
    RIP: 0010:[<ffffffffa09658fe>]  [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    RSP: 0018:ffff8808588ffde0  EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000
    RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0
    RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000
    R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8
    FS:  00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0
    Stack:
     ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
     ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
     ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
    Call Trace:
     [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
     [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
     [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
     [<ffffffff8116c189>] __fput+0xe9/0x270
     [<ffffffff8116c35e>] ____fput+0xe/0x10
     [<ffffffff81065707>] task_work_run+0xa7/0xe0
     [<ffffffff81002969>] do_notify_resume+0x59/0x80
     [<ffffffff814ffc1a>] int_signal+0x12/0x17

This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.

In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.

Fixes: 3abb33ac65 ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mike Marciniszyn
b9b06cb6fe IB/hfi1: Fix missing lock/unlock in verbs drain callback
The iowait_sdma_drained() callback lacked locking to
protect the qp s_flags field.

This causes the s_flags to be out of sync
on multiple CPUs, potentially corrupting the s_flags.

Fixes: a545f5308b ("staging/rdma/hfi: fix CQ completion order issue")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Jubin John
e6d2e0176e IB/rdmavt: Fix send scheduling
call_send is used to determine whether to send immediately or schedule
a send for later. The current logic in rdmavt is inverted and has a
negative impact on the latency of the hfi1 and qib drivers. Fix this
regression by correctly calling send immediately when call_send is set.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mitko Haralanov
849e3e9398 IB/hfi1: Prevent unpinning of wrong pages
The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.

In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.

There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.

This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Mitko Haralanov
de82bdff62 IB/hfi1: Fix deadlock caused by locking with wrong scope
The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:

    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
    CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G           O 3.12.18-wfr+ #1
     0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
     ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
     ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
    Call Trace:
     <NMI>  [<ffffffff814f1caa>] dump_stack+0x45/0x56
     [<ffffffff814ecd56>] panic+0xc2/0x1cb
     [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
     [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
     [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
     [<ffffffff8110a714>] perf_event_overflow+0x14/0x20
     [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
     [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
     [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
     [<ffffffff814f8d39>] do_nmi+0x169/0x310
     [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
     [<ffffffff81272600>] ? unmap_single+0x30/0x30
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     <<EOE>>  <IRQ>  [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
     [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
     [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
     [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
     [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
     [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
     [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
     [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
     [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
     [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
     [<ffffffff81098017>] handle_irq_event+0x37/0x60
     [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
     [<ffffffff810044af>] handle_irq+0xbf/0x150
     [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80
     [<ffffffff8150168d>] do_IRQ+0x4d/0xc0
     [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
     <EOI>  [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0
     [<ffffffff810763a6>] __cond_resched+0x26/0x30
     [<ffffffff814f5eda>] _cond_resched+0x3a/0x50
     [<ffffffff814f4f82>] down_write+0x12/0x30
     [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
     [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
     [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
     [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
     [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
     [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
     [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
     [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230
     [<ffffffff811a9da1>] ? fsnotify+0x241/0x320
     [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff8116b795>] vfs_writev+0x35/0x60
     [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0
     [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
     [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b

As evident from the backtrace above, the process was being put to sleep
while holding the lock.

Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Mitko Haralanov
f19bd643db IB/hfi1: Prevent NULL pointer deferences in caching code
There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Eric Dumazet
ba7863f4d3 net: snmp: fix 64bit stats on 32bit arches
I accidentally replaced BH disabling by preemption disabling
in SNMP_ADD_STATS64() and SNMP_UPD_PO_STATS64() on 32bit builds.

For 64bit stats on 32bit arch, we really need to disable BH,
since the "struct u64_stats_sync syncp" might be manipulated
both from process and BH contexts.

Fixes: 6aef70a851 ("net: snmp: kill various STATS_USER() helpers")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 11:49:45 -04:00
Sagi Grimberg
e7d2c25d94 MAINTAINERS: Update iser/isert maintainer contact info
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 11:32:50 -04:00
Sagi Grimberg
986ef95ecd IB/mlx5: Expose correct max_sge_rd limit
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)

So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.

Cc: linux-stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 10:49:17 -04:00
Chen-Yu Tsai
2963070a0f mmc: sunxi: Disable eMMC HS-DDR (MMC_CAP_1_8V_DDR) for Allwinner A80
eMMC HS-DDR no longer works on the A80, despite it working when support
for this developed.

Disable it for now.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-04-28 11:43:54 +02:00
Kan Liang
cf3beb7c90 perf/x86/intel: Fix incorrect lbr_sel_mask value
This patch fixes a bug which was introduced by:

 b16a5b52eb ("perf/x86: Add option to disable reading branch flags/cycles")

In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
set in lbr_select.

This patch corrects the LBR_SEL_MASK by including all valid bits in
LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
LBR_SELECT. It does not operate in suppress mode, so it needs to be
specially handled in intel_pmu_setup_hw_lbr_filter.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:43 +02:00
Alexander Shishkin
1c5ac21a0e perf/x86/intel/pt: Don't die on VMXON
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:

  $ perf record -e intel_pt// kvm -nographic

on such a machine will kill it.

To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:42 +02:00
Peter Zijlstra
79c9ce57eb perf/core: Fix perf_event_open() vs. execve() race
Jann reported that the ptrace_may_access() check in
find_lively_task_by_vpid() is racy against exec().

Specifically:

  perf_event_open()		execve()

  ptrace_may_access()
				commit_creds()
  ...				if (get_dumpable() != SUID_DUMP_USER)
				  perf_event_exit_task();
  perf_install_in_context()

would result in installing a counter across the creds boundary.

Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
This should be fine as perf_event_exit_task() is already called with
cred_guard_mutex held, so all perf locks already nest inside it.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:41 +02:00
Adam Borowski
0a25556f84 perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.

Found via KASAN:

  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
  index 9 is out of range for type 'u64 [9]'
  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
  load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:20:25 +02:00
Ilya Dryomov
d3767f0fae rbd: report unsupported features to syslog
... instead of just returning an error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:43 +02:00
Ilya Dryomov
811c668877 rbd: fix rbd map vs notify races
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:22 +02:00
Keith Busch
1bdb897039 x86/apic: Handle zero vector gracefully in clear_vector_irq()
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().

The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().

We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.

Remove the BUG_ON and return if vector is zero,

[ tglx: Massaged changelog ]

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-28 09:53:06 +02:00
David S. Miller
8be2748a40 Merge branch 'socket-space-optimizations'
Eric Dumazet says:

====================
net: avoid some atomic ops when FASYNC is not used

We can avoid some atomic operations on sockets not using FASYNC
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 23:08:41 -04:00