Commit Graph

750295 Commits

Author SHA1 Message Date
Guillaume Nault
6b9f34239b l2tp: fix races in tunnel creation
l2tp_tunnel_create() inserts the new tunnel into the namespace's tunnel
list and sets the socket's ->sk_user_data field, before returning it to
the caller. Therefore, there are two ways the tunnel can be accessed
and freed, before the caller even had the opportunity to take a
reference. In practice, syzbot could crash the module by closing the
socket right after a new tunnel was returned to pppol2tp_create().

This patch moves tunnel registration out of l2tp_tunnel_create(), so
that the caller can safely hold a reference before publishing the
tunnel. This second step is done with the new l2tp_tunnel_register()
function, which is now responsible for associating the tunnel to its
socket and for inserting it into the namespace's list.

While moving the code to l2tp_tunnel_register(), a few modifications
have been done. First, the socket validation tests are done in a helper
function, for clarity. Also, modifying the socket is now done after
having inserted the tunnel to the namespace's tunnels list. This will
allow insertion to fail, without having to revert theses modifications
in the error path (a followup patch will check for duplicate tunnels
before insertion). Either the socket is a kernel socket which we
control, or it is a user-space socket for which we have a reference on
the file descriptor. In any case, the socket isn't going to be closed
from under us.

Reported-by: syzbot+fbeeb5c3b538e8545644@syzkaller.appspotmail.com
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 17:41:27 -04:00
Sabrina Dubroca
83c1f36f98 tun: send netlink notification when the device is modified
I added dumping of link information about tun devices over netlink in
commit 1ec010e705 ("tun: export flags, uid, gid, queue information
over netlink"), but didn't add the missing netlink notifications when
the device's exported properties change.

This patch adds notifications when owner/group or flags are modified,
when queues are attached/detached, and when a tun fd is closed.

Reported-by: Thomas Haller <thaller@redhat.com>
Fixes: 1ec010e705 ("tun: export flags, uid, gid, queue information over netlink")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:48:05 -04:00
Sabrina Dubroca
9fffc5c6dd tun: set the flags before registering the netdevice
Otherwise, register_netdevice advertises the creation of the device with
the default flags, instead of what the user requested.

Reported-by: Thomas Haller <thaller@redhat.com>
Fixes: 1ec010e705 ("tun: export flags, uid, gid, queue information over netlink")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:48:05 -04:00
Phil Elwell
47b998653f lan78xx: Don't reset the interface on open
Commit 92571a1aae ("lan78xx: Connect phy early") moves the PHY
initialisation into lan78xx_probe, but lan78xx_open subsequently calls
lan78xx_reset. As well as forcing a second round of link negotiation,
this reset frequently prevents the phy interrupt from being generated
(even though the link is up), rendering the interface unusable.

Fix this issue by removing the lan78xx_reset call from lan78xx_open.

Fixes: 92571a1aae ("lan78xx: Connect phy early")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:44:45 -04:00
David S. Miller
9cf74f593a Merge branch 'bnxt_en-Fixes-for-net'
Michael Chan says:

====================
bnxt_en: Fixes for net.

This bug fix series include NULL pointer fixes in ethtool -x code path
and in the error clean up path when freeing IRQs, a ring accounting bug
that missed rings used by the RDMA driver, and 3 bug fixes related to TC
Flower and VF-reps.

v2: Fixed commit message of patch 4.  Changed the pound sign to $ sign
in front of the ip command.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:42:00 -04:00
Michael Chan
cb98526bf9 bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().
When open fails during ethtool -L ring change, for example, the driver
may crash at bnxt_free_irq() because bp->bnapi is NULL.

If we fail to allocate all the new rings, bnxt_open_nic() will free
all the memory including bp->bnapi.  Subsequent call to bnxt_close_nic()
will try to dereference bp->bnapi in bnxt_free_irq().

Fix it by checking for !bp->bnapi in bnxt_free_irq().

Fixes: e5811b8c09 ("bnxt_en: Add IRQ remapping logic.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:42:00 -04:00
Michael Chan
11c3ec7bb9 bnxt_en: Need to include RDMA rings in bnxt_check_rings().
With recent changes to reserve both L2 and RDMA rings, we need to include
the RDMA rings in bnxt_check_rings().  Otherwise we will under-estimate
the rings we need during ethtool -L and may lead to failure.

Fixes: fbcfc8e467 ("bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:42:00 -04:00
Sriharsha Basavapatna
9d96465b11 bnxt_en: Support max-mtu with VF-reps
While a VF is configured with a bigger mtu (> 1500), any packets that
are punted to the VF-rep (slow-path) get dropped by OVS kernel-datapath
with the following message: "dropped over-mtu packet". Fix this by
returning the max-mtu value for a VF-rep derived from its corresponding VF.
VF-rep's mtu can be changed using 'ip' command as shown in this example:

	$ ip link set bnxt0_pf0vf0 mtu 9000

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:42:00 -04:00
Sriharsha Basavapatna
479ca3bf91 bnxt_en: Ignore src port field in decap filter nodes
The driver currently uses src port field (along with other fields) in the
decap tunnel key, while looking up and adding tunnel nodes. This leads to
redundant cfa_decap_filter_alloc() requests to the FW and flow-miss in the
flow engine. Fix this by ignoring the src port field in decap tunnel nodes.

Fixes: f484f6782e ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:41:59 -04:00
Andy Gospodarek
e85a9be93c bnxt_en: do not allow wildcard matches for L2 flows
Before this patch the following commands would succeed as far as the
user was concerned:

$ tc qdisc add dev p1p1 ingress
$ tc filter add dev p1p1 parent ffff: protocol all \
	flower skip_sw action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
	flower skip_sw src_mac 00:02:00:00:00:01/44 action drop

The current flow offload infrastructure used does not support wildcard
matching for ethernet headers, so do not allow the second or third
commands to succeed.  If a user wants to drop traffic on that interface
the protocol and MAC addresses need to be specified explicitly:

$ tc qdisc add dev p1p1 ingress
$ tc filter add dev p1p1 parent ffff: protocol arp \
	flower skip_sw action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
	flower skip_sw action drop
...
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
	flower skip_sw src_mac 00:02:00:00:00:01 action drop
$ tc filter add dev p1p1 parent ffff: protocol ipv4 \
	flower skip_sw src_mac 00:02:00:00:00:02 action drop
...

There are also checks for VLAN parameters in this patch as other callers
may wildcard those parameters even if tc does not.  Using different
flow infrastructure could allow this to work in the future for L2 flows,
but for now it does not.

Fixes: 2ae7408fed ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:41:59 -04:00
Michael Chan
7991cb9cfb bnxt_en: Fix ethtool -x crash when device is down.
Fix ethtool .get_rxfh() crash by checking for valid indirection table
address before copying the data.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 14:41:59 -04:00
David S. Miller
949989b36c Merge branch 'vhost-fix-vhost_vq_access_ok-log-check'
Stefan Hajnoczi says:

====================
vhost: fix vhost_vq_access_ok() log check

v3:
 * Rebased onto net/master and resolved conflict [DaveM]

v2:
 * Rewrote the conditional to make the vq access check clearer [Linus]
 * Added Patch 2 to make the return type consistent and harder to misuse [Linus]

The first patch fixes the vhost virtqueue access check which was recently
broken.  The second patch replaces the int return type with bool to prevent
future bugs.
====================

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Stefan Hajnoczi
ddd3d4081f vhost: return bool from *_access_ok() functions
Currently vhost *_access_ok() functions return int.  This is error-prone
because there are two popular conventions:

1. 0 means failure, 1 means success
2. -errno means failure, 0 means success

Although vhost mostly uses #1, it does not do so consistently.
umem_access_ok() uses #2.

This patch changes the return type from int to bool so that false means
failure and true means success.  This eliminates a potential source of
errors.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Stefan Hajnoczi
d14d2b7809 vhost: fix vhost_vq_access_ok() log check
Commit d65026c6c6 ("vhost: validate log
when IOTLB is enabled") introduced a regression.  The logic was
originally:

  if (vq->iotlb)
      return 1;
  return A && B;

After the patch the short-circuit logic for A was inverted:

  if (A || vq->iotlb)
      return A;
  return B;

This patch fixes the regression by rewriting the checks in the obvious
way, no longer returning A when vq->iotlb is non-NULL (which is hard to
understand).

Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Eric Auger
7ced6c98c7 vhost: Fix vhost_copy_to_user()
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f889491380 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:52:34 -04:00
David S. Miller
98239c9096 Merge branch 'Aquantia-atlantic-critical-fixes-04-2018'
Igor Russkikh says:

====================
Aquantia atlantic critical fixes 04/2018

Two regressions on latest 4.16 driver reported by users

Some of old FW (1.5.44) had a link management logic which prevents
driver to make clean reset. Driver of 4.16 has a full hardware reset
implemented and that broke the link and traffic on such a cards.

Second is oops on shutdown callback in case interface is already
closed or was never opened.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Igor Russkikh
9a11aff25f net: aquantia: oops when shutdown on already stopped device
In case netdev is closed at the moment of pci shutdown, aq_nic_stop
gets called second time. napi_disable in that case hangs indefinitely.
In other case, if device was never opened at all, we get oops because
of null pointer access.

We should invoke aq_nic_stop conditionally, only if device is running
at the moment of shutdown.

Reported-by: David Arcari <darcari@redhat.com>
Fixes: 90869ddfef ("net: aquantia: Implement pci shutdown callback")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Igor Russkikh
cce96d1883 net: aquantia: Regression on reset with 1.x firmware
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
is active. With this mode when motherboard gets powered (but no poweron
happens yet), NIC automatically enables powersave link and watches
for WOL packet.
This normally allows to powerup the PC after AC power failures.

Not all motherboards or bios settings gives power to PCI slots,
so this mode is not enabled on all the hardware.

4.16 linux driver introduced full hardware reset sequence
This is required since before that we had no NIC hardware
reset implemented and there were side effects of "not clean start".

But this full reset is incompatible with "dirty wake" WOL feature
it keeps the PHY link in a special mode forever. As a consequence,
driver sees no link and no traffic.

To fix this we forcibly change FW state to idle state before doing
the full reset. This makes FW to restore link state.

Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Bassem Boubaker
53765341ee cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
The Cinterion AHS8 is a 3G device with one embedded WWAN interface
using cdc_ether as a driver.

The modem is controlled via AT commands through the exposed TTYs.

AT+CGDCONT write command can be used to activate or deactivate a WWAN
connection for a PDP context defined with the same command. UE
supports one WWAN adapter.

Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:36:38 -04:00
Tejaswi Tanikella
3f01ddb962 slip: Check if rstate is initialized before uncompressing
On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.

89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c

./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
 ip_fast_csum at arch/arm64/include/asm/checksum.h:40
 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615

Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.

Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:33:46 -04:00
Phil Elwell
fed56079e7 lan78xx: Avoid spurious kevent 4 "error"
lan78xx_defer_event generates an error message whenever the work item
is already scheduled. lan78xx_open defers three events -
EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
of the likelihood (or certainty) of an error message, the DEV_OPEN
event is added to the set of pending events directly, relying on
the subsequent deferral of the EVENT_LINK_RESET call to schedule the
work.  Take the same precaution with EVENT_STAT_UPDATE to avoid a
totally unnecessary error message.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:32:43 -04:00
Phil Elwell
4bfc33807a lan78xx: Correctly indicate invalid OTP
lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
content, but the value gets overwritten before it is returned and the
read goes ahead anyway. Make the read conditional as it should be
and preserve the error code.

Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:30:42 -04:00
Ka-Cheong Poon
a43cced9a3 rds: MP-RDS may use an invalid c_path
rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
send a message.  Suppose the RDS connection is not yet up.  In
rds_send_mprds_hash(), it does

	if (conn->c_npaths == 0)
		wait_event_interruptible(conn->c_hs_waitq,
					 (conn->c_npaths != 0));

If it is interrupted before the connection is set up,
rds_send_mprds_hash() will return a non-zero hash value.  Hence
rds_sendmsg() will use a non-zero c_path to send the message.  But if
the RDS connection ends up to be non-MP capable, the message will be
lost as only the zero c_path can be used.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:24:01 -04:00
Sabrina Dubroca
1cc5954f44 ip_gre: clear feature flags when incompatible o_flags are set
Commit dd9d598c66 ("ip_gre: add the support for i/o_flags update via
netlink") added the ability to change o_flags, but missed that the
GSO/LLTX features are disabled by default, and only enabled some gre
features are unused. Thus we also need to disable the GSO/LLTX features
on the device when the TUNNEL_SEQ or TUNNEL_CSUM flags are set.

These two examples should result in the same features being set:

    ip link add gre_none type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 0

    ip link set gre_none type gre seq
    ip link add gre_seq type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 1 seq

Fixes: dd9d598c66 ("ip_gre: add the support for i/o_flags update via netlink")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-10 11:03:32 -04:00
Linus Torvalds
c18bb396d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) The sockmap code has to free socket memory on close if there is
    corked data, from John Fastabend.

 2) Tunnel names coming from userspace need to be length validated. From
    Eric Dumazet.

 3) arp_filter() has to take VRFs properly into account, from Miguel
    Fadon Perlines.

 4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.

 5) Missing idr_remove() in u32_delete_key(), from Cong Wang.

 6) More syzbot stuff. Several use of uninitialized value fixes all
    over, from Eric Dumazet.

 7) Do not leak kernel memory to userspace in sctp, also from Eric
    Dumazet.

 8) Discard frames from unused ports in DSA, from Andrew Lunn.

 9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
    Falcon.

10) Do not access dp83640 PHY registers prematurely after reset, from
    Esben Haabendal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vhost-net: set packet weight of tx polling to 2 * vq size
  net: thunderx: rework mac addresses list to u64 array
  inetpeer: fix uninit-value in inet_getpeer
  dp83640: Ensure against premature access to PHY registers after reset
  devlink: convert occ_get op to separate registration
  ARM: dts: ls1021a: Specify TBIPA register address
  net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
  ibmvnic: Do not reset CRQ for Mobility driver resets
  ibmvnic: Fix failover case for non-redundant configuration
  ibmvnic: Fix reset scheduler error handling
  ibmvnic: Zero used TX descriptor counter on reset
  ibmvnic: Fix DMA mapping mistakes
  tipc: use the right skb in tipc_sk_fill_sock_diag()
  sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
  net: dsa: Discard frames from unused ports
  sctp: do not leak kernel memory to user space
  soreuseport: initialise timewait reuseport field
  ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
  dccp: initialize ireq->ir_mark
  net: fix uninit-value in __hw_addr_add_ex()
  ...
2018-04-09 17:04:10 -07:00
Linus Torvalds
fd3b36d275 Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:

 - make lookup_one_len() safe with parent locked only shared(incoming
   afs series wants that)

 - fix of getname_kernel() regression from 2015 (-stable fodder, that
   one).

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  getname_kernel() needs to make sure that ->name != ->iname in long case
  make lookup_one_len() safe to use with directory locked shared
  new helper: __lookup_slow()
  merge common parts of lookup_one_len{,_unlocked} into common helper
2018-04-09 12:48:05 -07:00
Linus Torvalds
8ea4a5d84e orangefs: fixes and cleanups
+ Documentation cleanups
  + removal of unused code
  + cause some structs to be static
  + implement Orangefs vm_operations fault callout
  + eliminate two single-use functions and put their cleaned up code in line.
  + replace a vmalloc/memset instance with vzalloc
  + fix a race condition bug in wait code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJayk8pAAoJEM9EDqnrzg2+92IP/2AqMOUodqF+AfKrAebQIAEk
 DvQefRC2eQXN2Ak8KYSO7oMdqfY7xeVSaKllWsSHdnktI5L9l6AOBa7uYQYF4OQt
 sqT5Wj5sbnDS54gOfFt95sUHEQXs5wegTXBE15t4uxaLSpv22mPjzqlTzyZcK7IX
 nhSIxEi+utXOW1WodT2xLgS08ac693KXciNNN14NKi5DJ+h6a8Z2kZBq/6ssC1oV
 nW2tIfRVWNVk6cfJ+Wq12JksEZOPemzjv2zojv3PZwCcwObOyxvf16nxnLjdkJvw
 obYWmR8ZcF3TItBbkxOTfanr+UAhZ55YVYDpitJmV9fpPM9eoPAR0gNWBgq45kaS
 s27pvMYf5uvGRJHgEO6hBpmi1dExDDMLfH6Y4IN+zyHKjNfgvfQKwIkzrjlmG6q6
 AgO3mbbMJfid2x7lkShuP1uYqUfv03fo2xW3zxWjjxo3tEbADSgzJQ5kjPixDvcB
 0ru2SlRVqgYIFqBN8VgWkXKbfcpSEcDITmFEkBKg/SkA08ry4iLk744UqIPa5n76
 5DYB0u99nHPMf2BBVsz6L/mUfVuHBYUBx1iII95PqJhVlj2Yi0sIj3AHUzQ4neKS
 YWGMk4VQ+B5zCObSohCRv9l3ECUqugpiJzCuZjZZS5LUkj9RO5EJ1LDhOU/31xDL
 OUTgZl7f5NykxwIHh3bD
 =zJ1Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes and cleanups:

   - Documentation cleanups

   - removal of unused code

   - make some structs static

   - implement Orangefs vm_operations fault callout

   - eliminate two single-use functions and put their cleaned up code in
     line.

   - replace a vmalloc/memset instance with vzalloc

   - fix a race condition bug in wait code"

* tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  Orangefs: documentation updates
  orangefs: document package install and xfstests procedure
  orangefs: remove unused code
  orangefs: make several *_operations structs static
  orangefs: implement vm_ops->fault
  orangefs: open code short single-use functions
  orangefs: replace vmalloc and memset with vzalloc
  orangefs: bug fix for a race condition when getting a slot
2018-04-09 12:45:47 -07:00
Linus Torvalds
190f2ace0e - Fix another compression Kconfig combination missed in testing (Tobias Regnery)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJayk5UAAoJEIly9N/cbcAmUBoQAK7Z7HEQJed0/Z1/JxW/Fpxz
 iXHD3IC4D7SVn1dhwUe7lkM7n1X+VaZSalXOXTorqt5mXgGkrXa1vaf2itLSafcz
 3/MSN7DvT2yjooBugI6sMwEEg898vdu1FIN/3PugS09y5Uh0tJbEY9s4NXmvaYC7
 Ze0e5CfCrg25Ic9oa4xI2qiVI1OUzoVNEA4S/KeE+WpClrLbxnavMZYtyFqbFfdX
 REARDCxpPH3hk0xKEN1zR1RvoSMBgc1aYdZVHq9Qp+kI73aXcf4UwAGy/a1Jkm8V
 uEkPX3jcKNHeVPmxk5Z/XgyMieDeh85RG+wHCmEeeZinnx0iOVmv0HhM/1T5NJ/A
 shpIOYmC+F+mjZUD/q/WBMH7+VTskYTG6h3avn7skAHtmVGMipVhxuRHbZh/dyMf
 zuKEfo/DloD/fP5wtuUqNjduznMVfnDOazxzogX4omvUeLRQvVsVyfqA3vv2Dv6c
 q+Gx1hBMKTMewlCyDv6EIQZ7l6ejdBnB0ZIxvhY+blp4aJuDA/j7mXbX+/NhwuBg
 QHDj4d2vA/3rgCsknVsVQjaMdX2JagoHI2XM5NNbPO31ki/m/mult/HcusHOYc8U
 5WjxoqqNRUAVjwIGKBMWUNKTIU1XnPOZpqYbsNjM/1INnisX5rhvxOgAvqQIRll2
 MXZYvxGEkmYv4pnnSPlB
 =aZpk
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Fix another compression Kconfig combination missed in testing (Tobias
  Regnery)"

* tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: fix crypto dependencies without compression
2018-04-09 12:43:18 -07:00
Stephen Smalley
fd40ffc72e selinux: fix missing dput() before selinuxfs unmount
Commit 0619f0f5e3 ("selinux: wrap selinuxfs state") triggers a BUG
when SELinux is runtime-disabled (i.e. systemd or equivalent disables
SELinux before initial policy load via /sys/fs/selinux/disable based on
/etc/selinux/config SELINUX=disabled).

This does not manifest if SELinux is disabled via kernel command line
argument or if SELinux is enabled (permissive or enforcing).

Before:
  SELinux:  Disabled at runtime.
  BUG: Dentry 000000006d77e5c7{i=17,n=null}  still in use (1) [unmount of selinuxfs selinuxfs]

After:
  SELinux:  Disabled at runtime.

Fixes: 0619f0f5e3 ("selinux: wrap selinuxfs state")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-09 11:52:56 -07:00
Linus Torvalds
d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00
Linus Torvalds
e9092d0d97 Fix subtle macro variable shadowing in min_not_zero()
Commit 3c8ba0d61d ("kernel.h: Retain constant expression output for
max()/min()") rewrote our min/max macros to be very clever, but in the
meantime resurrected a variable name shadow issue that we had had
previously fixed in commit 589a9785ee ("min/max: remove sparse
warnings when they're nested").

That commit talks about the sparse warnings that this shadowing causes,
which we ignored as just a minor annoyance.  But it turns out that the
sparse warning is the least of our problems.  We actually have a real
bug due to the shadowing through the interaction with "min_not_zero()",
which ends up doing

   min(__x, __y)

internally, and then the new declaration of "__x" and "__y" as new
variables in __cmp_once() results in a complete mess of an expression,
and "min_not_zero()" doesn't work at all.

For some odd reason, this only ever caused (reported) problems on s390,
even though it is a generic issue and most of the (obviously successful)
testing of the problematic commit had happened on other architectures.

Quoting Sebastian Ott:
 "What happened is that the bio build by the partition detection code
  was attempted to be split by the block layer because the block queue
  had a max_sector setting of 0. blk_queue_max_hw_sectors uses
  min_not_zero."

So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
macros do not have these kinds of clashes.

[ That said, __UNIQUE_ID() itself has several issues that make it less
  than wonderful.

  In particular, the "uniqueness" has a fallback on the line number,
  which means that it's not actually unique in more complex cases if you
  don't build with gcc or clang (which have working unique counters that
  aren't tied to line numbers).

  That historical broken fallback also means that we have that pointless
  "prefix" argument that doesn't actually make much sense _except_ for
  the known-broken case. Oh well. ]

Fixes: 3c8ba0d61d ("kernel.h: Retain constant expression output for max()/min()")
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-09 10:34:07 -07:00
Linus Torvalds
7886e8aa7f Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM SA1100 updates from Russell King:
 "We have support for arbitary MMIO registers providing platform GPIOs,
  which allows us to abstract some of the SA11x0 CF support.

  This set of updates makes that change"

* 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: sa1100/simpad: switch simpad CF to use gpiod APIs
  ARM: sa1100/shannon: convert to generic CF sockets
  ARM: sa1100/nanoengine: convert to generic CF sockets
  ARM: sa1100/h3xxx: switch h3xxx PCMCIA to use gpiod APIs
  ARM: sa1100/cerf: convert to generic CF sockets
  ARM: sa1100/assabet: convert to generic CF sockets
  ARM: sa1100: provide infrastructure to support generic CF sockets
  pcmcia: sa1100: provide generic CF support
2018-04-09 09:26:36 -07:00
Linus Torvalds
4a1e00524c Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "A number of core ARM changes:

   - Refactoring linker script by Nicolas Pitre

   - Enable source fortification

   - Add support for Cortex R8"

* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: decompressor: fix warning introduced in fortify patch
  ARM: 8751/1: Add support for Cortex-R8 processor
  ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE
  ARM: simplify and fix linker script for TCM
  ARM: linker script: factor out TCM bits
  ARM: linker script: factor out vectors and stubs
  ARM: linker script: factor out unwinding table sections
  ARM: linker script: factor out stuff for the .text section
  ARM: linker script: factor out stuff for the DISCARD section
  ARM: linker script: factor out some common definitions between XIP and non-XIP
2018-04-09 09:19:30 -07:00
Linus Torvalds
2025fef0ca Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer:
 "Only a single fix to set the DMA masks in the ColdFire FEC platform
  data structure.

  This stops the warning from dma-mapping.h at boot time"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: set dma and coherent masks for platform FEC ethernets
2018-04-09 09:15:46 -07:00
Linus Torvalds
5148408a51 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
 "A few small changes for alpha"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering
  alpha: Implement CPU vulnerabilities sysfs functions.
  alpha: rtc: stop validating rtc_time in .read_time
  alpha: rtc: remove unused set_mmss ops
2018-04-09 09:11:32 -07:00
Linus Torvalds
becdce1c66 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - Improvements for the spectre defense:
    * The spectre related code is consolidated to a single file
      nospec-branch.c
    * Automatic enable/disable for the spectre v2 defenses (expoline vs.
      nobp)
    * Syslog messages for specve v2 are added
    * Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
      functions for spectre v1 and v2

 - Add helper macros for assembler alternatives and use them to shorten
   the code in entry.S.

 - Add support for persistent configuration data via the SCLP Store Data
   interface. The H/W interface requires a page table that uses 4K pages
   only, the code to setup such an address space is added as well.

 - Enable virtio GPU emulation in QEMU. To do this the depends
   statements for a few common Kconfig options are modified.

 - Add support for format-3 channel path descriptors and add a binary
   sysfs interface to export the associated utility strings.

 - Add a sysfs attribute to control the IFCC handling in case of
   constant channel errors.

 - The vfio-ccw changes from Cornelia.

 - Bug fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
  s390/kvm: improve stack frame constants in entry.S
  s390/lpp: use assembler alternatives for the LPP instruction
  s390/entry.S: use assembler alternatives
  s390: add assembler macros for CPU alternatives
  s390: add sysfs attributes for spectre
  s390: report spectre mitigation via syslog
  s390: add automatic detection of the spectre defense
  s390: move nobp parameter functions to nospec-branch.c
  s390/cio: add util_string sysfs attribute
  s390/chsc: query utility strings via fmt3 channel path descriptor
  s390/cio: rename struct channel_path_desc
  s390/cio: fix unbind of io_subchannel_driver
  s390/qdio: split up CCQ handling for EQBS / SQBS
  s390/qdio: don't retry EQBS after CCQ 96
  s390/qdio: restrict buffer merging to eligible devices
  s390/qdio: don't merge ERROR output buffers
  s390/qdio: simplify math in get_*_buffer_frontier()
  s390/decompressor: trim uncompressed image head during the build
  s390/crypto: Fix kernel crash on aes_s390 module remove.
  s390/defkeymap: fix global init to zero
  ...
2018-04-09 09:04:10 -07:00
haibinzhang(张海斌)
a2ac99905f vhost-net: set packet weight of tx polling to 2 * vq size
handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.

Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:

vq size=256
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           3.319   18.489    57.303
64               1.643    2.021     2.552
128              1.825    2.600     3.224
256              1.997    2.710     4.295
512              1.860    3.171     4.631
1024             2.002    4.173     9.056
2048             2.257    5.650     9.688
4096             2.093    8.508    15.943

vq size=512
Packet-Weight   Ping-Latencies(millisecond)
                   min      avg       max
Origin           6.537   29.177    66.245
64               2.798    3.614     4.403
128              2.861    3.820     4.775
256              3.008    4.018     4.807
512              3.254    4.523     5.824
1024             3.079    5.335     7.747
2048             3.944    8.201    12.762
4096             4.158   11.057    19.985

Seems pretty consistent, a small dip at 2 VQ sizes.
Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.

To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.

vq size=256 TCP_RR                vq size=512 TCP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -7%/        -2%      1/       1/   0%/        -2%
   1/       4/  +1%/         0%      1/       4/  +1%/         0%
   1/       8/  +1%/        -2%      1/       8/   0%/        +1%
  64/       1/  -6%/         0%     64/       1/  +7%/        +3%
  64/       4/   0%/        +2%     64/       4/  -1%/        +1%
  64/       8/   0%/         0%     64/       8/  -1%/        -2%
 256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
 256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
 256/       8/  +2%/         0%    256/       8/  +1%/        -1%

vq size=256 UDP_RR                vq size=512 UDP_RR
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
   1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
   1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
   1/       8/  -1%/        -1%      1/       8/  -1%/         0%
  64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
  64/       4/  -5%/        -1%     64/       4/  +2%/         0%
  64/       8/   0%/        -1%     64/       8/  -2%/        +1%
 256/       1/  +7%/        +1%    256/       1/  -7%/         0%
 256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
 256/       8/  +2%/        +2%    256/       8/  +1%/        +1%

vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
  64/       1/   0%/        -3%     64/       1/   0%/         0%
  64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
  64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
 256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
 256/       4/  -1%/        -1%    256/       4/  -3%/         0%
 256/       8/  +7%/        +5%    256/       8/  -3%/         0%
 512/       1/  +1%/         0%    512/       1/  -1%/        -1%
 512/       4/  +1%/        -1%    512/       4/   0%/         0%
 512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
4096/       4/  +2%/         0%   4096/       4/   0%/         0%
4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-09 11:01:37 -04:00
Vadim Lomovtsev
9b5c4dfb2a net: thunderx: rework mac addresses list to u64 array
It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.

This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.

Addresses-Coverity-ID: 1467429 ("Dereference null return value")
Fixes: 37c3347eb2 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-09 10:59:38 -04:00
Eric Dumazet
b6a37e5e25 inetpeer: fix uninit-value in inet_getpeer
syzbot/KMSAN reported that p->dtime was read while it was
not yet initialized in :

	delta = (__u32)jiffies - p->dtime;
	if (delta < ttl || !refcount_dec_if_one(&p->refcnt))
		gc_stack[i] = NULL;

This is a false positive, because the inetpeer wont be erased
from rb-tree if the refcount_dec_if_one(&p->refcnt) does not
succeed. And this wont happen before first inet_putpeer() call
for this inetpeer has been done, and ->dtime field is written
exactly before the refcount_dec_and_test(&p->refcnt).

The KMSAN report was :

BUG: KMSAN: uninit-value in inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
BUG: KMSAN: uninit-value in inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
CPU: 0 PID: 9494 Comm: syz-executor5 Not tainted 4.16.0+ #82
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
 inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
 inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
 icmpv4_xrlim_allow net/ipv4/icmp.c:330 [inline]
 icmp_send+0x2b44/0x3050 net/ipv4/icmp.c:725
 ip_options_compile+0x237c/0x29f0 net/ipv4/ip_options.c:472
 ip_rcv_options net/ipv4/ip_input.c:284 [inline]
 ip_rcv_finish+0xda8/0x16d0 net/ipv4/ip_input.c:365
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
 __netif_receive_skb net/core/dev.c:4627 [inline]
 netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
 netif_receive_skb+0x230/0x240 net/core/dev.c:4725
 tun_rx_batched drivers/net/tun.c:1555 [inline]
 tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
 do_iter_write+0x30d/0xd40 fs/read_write.c:932
 vfs_writev fs/read_write.c:977 [inline]
 do_writev+0x3c9/0x830 fs/read_write.c:1012
 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
 SyS_writev+0x56/0x80 fs/read_write.c:1082
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x455111
RSP: 002b:00007fae0365cba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000002e RCX: 0000000000455111
RDX: 0000000000000001 RSI: 00007fae0365cbf0 RDI: 00000000000000fc
RBP: 0000000020000040 R08: 00000000000000fc R09: 0000000000000000
R10: 000000000000002e R11: 0000000000000293 R12: 00000000ffffffff
R13: 0000000000000658 R14: 00000000006fc8e0 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
 inet_getpeer+0xed8/0x1e70 net/ipv4/inetpeer.c:210
 inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
 ip4_frag_init+0x4d1/0x740 net/ipv4/ip_fragment.c:153
 inet_frag_alloc net/ipv4/inet_fragment.c:369 [inline]
 inet_frag_create net/ipv4/inet_fragment.c:385 [inline]
 inet_frag_find+0x7da/0x1610 net/ipv4/inet_fragment.c:418
 ip_find net/ipv4/ip_fragment.c:275 [inline]
 ip_defrag+0x448/0x67a0 net/ipv4/ip_fragment.c:676
 ip_check_defrag+0x775/0xda0 net/ipv4/ip_fragment.c:724
 packet_rcv_fanout+0x2a8/0x8d0 net/packet/af_packet.c:1447
 deliver_skb net/core/dev.c:1897 [inline]
 deliver_ptype_list_skb net/core/dev.c:1912 [inline]
 __netif_receive_skb_core+0x314a/0x4a80 net/core/dev.c:4545
 __netif_receive_skb net/core/dev.c:4627 [inline]
 netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
 netif_receive_skb+0x230/0x240 net/core/dev.c:4725
 tun_rx_batched drivers/net/tun.c:1555 [inline]
 tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
 do_iter_write+0x30d/0xd40 fs/read_write.c:932
 vfs_writev fs/read_write.c:977 [inline]
 do_writev+0x3c9/0x830 fs/read_write.c:1012
 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
 SyS_writev+0x56/0x80 fs/read_write.c:1082
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-09 10:57:35 -04:00
Russell King
9178caf964 Merge branches 'devel-stable' and 'misc' into for-linus 2018-04-09 10:08:51 +01:00
Esben Haabendal
76327a35ca dp83640: Ensure against premature access to PHY registers after reset
The datasheet specifies a 3uS pause after performing a software
reset. The default implementation of genphy_soft_reset() does not
provide this, so implement soft_reset with the needed pause.

Signed-off-by: Esben Haabendal <eha@deif.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 19:58:52 -04:00
David S. Miller
1f1cba787f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-04-09

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Two sockmap fixes: i) fix a potential warning when a socket with
   pending cork data is closed by freeing the memory right when the
   socket is closed instead of seeing still outstanding memory at
   garbage collector time, ii) fix a NULL pointer deref in case of
   duplicates release calls, so make sure to only reset the sk_prot
   pointer when it's in a valid state to do so, both from John.

2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
   by moving the function under CONFIG_CGROUP_BPF ifdef since only
   used there, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 19:51:41 -04:00
David S. Miller
4c7c12e0c9 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2018-04-08

Here's one important Bluetooth fix for the 4.17-rc series that's needed
to pass several Bluetooth qualification test cases.

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 17:19:15 -04:00
Jiri Pirko
fc56be47da devlink: convert occ_get op to separate registration
This resolves race during initialization where the resources with
ops are registered before driver and the structures used by occ_get
op is initialized. So keep occ_get callbacks registered only when
all structs are initialized.

The example flows, as it is in mlxsw:
1) driver load/asic probe:
   mlxsw_core
      -> mlxsw_sp_resources_register
        -> mlxsw_sp_kvdl_resources_register
          -> devlink_resource_register IDX
   mlxsw_spectrum
      -> mlxsw_sp_kvdl_init
        -> mlxsw_sp_kvdl_parts_init
          -> mlxsw_sp_kvdl_part_init
            -> devlink_resource_size_get IDX (to get the current setup
                                              size from devlink)
        -> devlink_resource_occ_get_register IDX (register current
                                                  occupancy getter)
2) reload triggered by devlink command:
  -> mlxsw_devlink_core_bus_device_reload
    -> mlxsw_sp_fini
      -> mlxsw_sp_kvdl_fini
	-> devlink_resource_occ_get_unregister IDX
    (struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
     which is using mlxsw_sp would cause use-after free)
    -> mlxsw_sp_init
      -> mlxsw_sp_kvdl_init
        -> mlxsw_sp_kvdl_parts_init
          -> mlxsw_sp_kvdl_part_init
            -> devlink_resource_size_get IDX (to get the current setup
                                              size from devlink)
        -> devlink_resource_occ_get_register IDX (register current
                                                  occupancy getter)

Fixes: d9f9b9a4d0 ("devlink: Add support for resource abstraction")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:45:57 -04:00
Esben Haabendal
5571196135 ARM: dts: ls1021a: Specify TBIPA register address
The current (mildly evil) fsl_pq_mdio code uses an undocumented shadow of
the TBIPA register on LS1021A, which happens to be read-only.
Changing TBI PHY address therefore does not work on LS1021A.

The real (and documented) address of the TBIPA registere lies in the eTSEC
block and not in MDIO/MII, which is read/write, so using that fixes
the problem.

Signed-off-by: Esben Haabendal <eha@deif.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:44:49 -04:00
Esben Haabendal
21481189e8 net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
This introduces a simpler and generic method for for finding (and mapping)
the TBIPA register.

Instead of relying of complicated logic for finding the TBIPA register
address based on the MDIO or MII register block base
address, which even in some cases relies on undocumented shadow registers,
a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
register.

Backwards compatibility is kept, as the existing logic is applied when
only a single "reg" mapping is specified.

Signed-off-by: Esben Haabendal <eha@deif.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:44:49 -04:00
David S. Miller
4e31a6845f Merge branch 'ibmvnic-Fix-driver-reset-and-DMA-bugs'
Thomas Falcon says:

====================
ibmvnic: Fix driver reset and DMA bugs

This patch series introduces some fixes to the driver reset
routines and a patch that fixes mistakes caught by the kernel
DMA debugger.

The reset fixes include a fix to reset TX queue counters properly
after a reset as well as updates to driver reset error-handling code.
It also provides updates to the reset handling routine for redundant
backing VF failover and partition migration cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:39:47 -04:00
Nathan Fontenot
30f796258c ibmvnic: Do not reset CRQ for Mobility driver resets
When resetting the ibmvnic driver after a partition migration occurs
there is no requirement to do a reset of the main CRQ. The current
driver code does the required re-enable of the main CRQ, then does
a reset of the main CRQ later.

What we should be doing for a driver reset after a migration is to
re-enable the main CRQ, release all the sub-CRQs, and then allocate
new sub-CRQs after capability negotiation.

This patch updates the handling of mobility resets to do the proper
work and not reset the main CRQ. To do this the initialization/reset
of the main CRQ had to be moved out of the ibmvnic_init routine
and in to the ibmvnic_probe and do_reset routines.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:39:47 -04:00
Thomas Falcon
5a18e1e0c1 ibmvnic: Fix failover case for non-redundant configuration
There is a failover case for a non-redundant pseries VNIC
configuration that was not being handled properly. The current
implementation assumes that the driver will always have a redandant
device to communicate with following a failover notification. There
are cases, however, when a non-redundant configuration can receive
a failover request. If that happens, the driver should wait until
it receives a signal that the device is ready for operation.

The driver is agnostic of its backing hardware configuration,
so this fix necessarily affects all device failover management.
The driver needs to wait until it receives a signal that the device
is ready for resetting. A flag is introduced to track this intermediary
state where the driver is waiting for an active device.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:39:47 -04:00
Thomas Falcon
af894d2398 ibmvnic: Fix reset scheduler error handling
In some cases, if the driver is waiting for a reset following
a device parameter change, failure to schedule a reset can result
in a hang since a completion signal is never sent.

If the device configuration is being altered by a tool such
as ethtool or ifconfig, it could cause the console to hang
if the reset request does not get scheduled. Add some additional
error handling code to exit the wait_for_completion if there is
one in progress.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-08 12:39:47 -04:00