Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit beb0babfb7 ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:
Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.
Fixes: beb0babfb7 ("korina: disable napi on close and restart")
Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.
What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.
This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.
This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.
tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.
Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
Fixes: 12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user prio field is wrong (and overflows) in the XDP forward
flow.
This is a result of a bad value for num_tx_rings_p_up, which should
account all XDP TX rings, as they operate for the same user prio.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 6f00089c73 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages most often is zero, and the message will be dropped, but
it is still a wrong and confusing behavior.
We fix this in this commit.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an IPvlan setup when master is set in loopback mode e.g.
ethtool -K eth0 set loopback on
where eth0 is master device for IPvlan setup.
The failure is caused by the faulty logic that determines if the
packet is from TX-path vs. RX-path by just looking at the mac-
addresses on the packet while processing multicast packets.
In the loopback-mode where this crash was happening, the packets
that are sent out are reflected by the NIC and are processed on
the RX path, but mac-address check tricks into thinking this
packet is from TX path and falsely uses dev_forward_skb() to pass
packets to the slave (virtual) devices.
This patch records the path while queueing packets and eliminates
logic of looking at mac-addresses for the same decision.
------------[ cut here ]------------
kernel BUG at include/linux/skbuff.h:1737!
Call Trace:
[<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0
[<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan]
[<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan]
[<ffffffff91cdfea7>] ? process_one_work+0x147/0x660
[<ffffffff91cdff09>] process_one_work+0x1a9/0x660
[<ffffffff91cdfea7>] ? process_one_work+0x147/0x660
[<ffffffff91ce086d>] worker_thread+0x11d/0x360
[<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350
[<ffffffff91ce960b>] kthread+0xdb/0xe0
[<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0
[<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0
[<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0
Fixes: ba35f8588f ("ipvlan: Defer multicast / broadcast processing to a work-queue")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) netif_rx() / dev_forward_skb() should not be called from process
context.
2) ipvlan_count_rx() should be called with preemption disabled.
3) We should check if ipvlan->dev is up before feeding packets
to netif_rx()
4) We need to prevent device from disappearing if some packets
are in the multicast backlog.
5) One kfree_skb() should be a consume_skb() eventually
Fixes: ba35f8588f ("ipvlan: Defer multicast / broadcast processing to
a work-queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) We have to be careful to not try and place a checksum after the end
of a rawv6 packet, fix from Dave Jones with help from Hannes
Frederic Sowa.
2) Missing memory barriers in tcp_tasklet_func() lead to crashes, from
Eric Dumazet.
3) Several bug fixes for the new XDP support in virtio_net, from Jason
Wang.
4) Increase headroom in RX skbs in be2net driver to accomodate
encapsulations such as geneve. From Kalesh A P.
5) Fix SKB frag unmapping on TX in mvpp2, from Thomas Petazzoni.
6) Pre-pulling UDP headers created a regression in RECVORIGDSTADDR
socket option support, from Willem de Bruijn.
7) UID based routing added a potential OOPS in ip_do_redirect() when we
see an SKB without a socket attached. We just need it for the
network namespace which we can get from skb->dev instead. Fix from
Lorenzo Colitti.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
sctp: fix recovering from 0 win with small data chunks
sctp: do not loose window information if in rwnd_over
virtio-net: XDP support for small buffers
virtio-net: remove big packet XDP codes
virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
virtio-net: make rx buf size estimation works for XDP
virtio-net: unbreak csumed packets for XDP_PASS
virtio-net: correctly handle XDP_PASS for linearized packets
virtio-net: fix page miscount during XDP linearizing
virtio-net: correctly xmit linearized page on XDP_TX
virtio-net: remove the warning before XDP linearizing
mlxsw: spectrum_router: Correctly remove nexthop groups
mlxsw: spectrum_router: Don't reflect dead neighs
neigh: Send netevent after marking neigh as dead
ipv6: handle -EFAULT from skb_copy_bits
inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
net/sched: cls_flower: Mandate mask when matching on flags
net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
stmmac: CSR clock configuration fix
net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
...
It's possible that we receive a packet that is larger than current
window. If it's the first packet in this way, it will cause it to
increase rwnd_over. Then, if we receive another data chunk (specially as
SCTP allows you to have one data chunk in flight even during 0 window),
rwnd_over will be overwritten instead of added to.
In the long run, this could cause the window to grow bigger than its
initial size, as rwnd_over would be charged only for the last received
data chunk while the code will try open the window for all packets that
were received and had its value in rwnd_over overwritten. This, then,
can lead to the worsening of payload/buffer ratio and cause rwnd_press
to kick in more often.
The fix is to sum it too, same as is done for rwnd_press, so that if we
receive 3 chunks after closing the window, we still have to release that
same amount before re-opening it.
Log snippet from sctp_test exhibiting the issue:
[ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[ 146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[ 146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[ 146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull final vfs updates from Al Viro:
"Assorted cleanups and fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sg_write()/bsg_write() is not fit to be called under KERNEL_DS
ufs: fix function declaration for ufs_truncate_blocks
fs: exec: apply CLOEXEC before changing dumpable task flags
seq_file: reset iterator to first record for zero offset
vfs: fix isize/pos/len checks for reflink & dedupe
[iov_iter] fix iterate_all_kinds() on empty iterators
move aio compat to fs/aio.c
reorganize do_make_slave()
clone_private_mount() doesn't need to touch namespace_sem
remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
Jason Wang says:
====================
several fixups for virtio-net XDP
Merry Xmas and a Happy New year to all:
This series tries to fixes several issues for virtio-net XDP which
could be categorized into several parts:
- fix several issues during XDP linearizing
- allow csumed packet to work for XDP_PASS
- make EWMA rxbuf size estimation works for XDP
- forbid XDP when GUEST_UFO is support
- remove big packet XDP support
- add XDP support or small buffer
Please see individual patches for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f600b69050 ("virtio_net: Add XDP support") leaves the case of
small receive buffer untouched. This will confuse the user who want to
set XDP but use small buffers. Other than forbid XDP in small buffer
mode, let's make it work. XDP then can only work at skb->data since
virtio-net create skbs during refill, this is sub optimal which could
be optimized in the future.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we in fact don't allow XDP for big packets, remove its codes.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO
packet that exceeds a single page which could not be handled
correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
supported. While at it, forbid XDP for ECN (which comes only from GRO)
too to prevent user from misconfiguration.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't update ewma rx buf size in the case of XDP. This will lead
underestimation of rx buf size which causes host to produce more than
one buffers. This will greatly increase the possibility of XDP page
linearization.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We drop csumed packet when do XDP for packets. This breaks
XDP_PASS when GUEST_CSUM is supported. Fix this by allowing csum flag
to be set. With this patch, simple TCP works for XDP_PASS.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When XDP_PASS were determined for linearized packets, we try to get
new buffers in the virtqueue and build skbs from them. This is wrong,
we should create skbs based on existed buffers instead. Fixing them by
creating skb based on xdp_page.
With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't put page during linearizing, the would cause leaking when
xmit through XDP_TX or the packet exceeds PAGE_SIZE. Fix them by
put page accordingly. Also decrease the number of buffers during
linearizing to make sure caller can free buffers correctly when packet
exceeds PAGE_SIZE. With this patch, we won't get OOM after linearize
huge number of packets.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After we linearize page, we should xmit this page instead of the page
of first buffer which may lead unexpected result. With this patch, we
can see correct packet during XDP_TX.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we use EWMA to estimate the size of rx buffer. When rx buffer
size is underestimated, it's usual to have a packet with more than one
buffers. Consider this is not a bug, remove the warning and correct
the comment before XDP linearizing.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJYW7vZAAoJEGu/nxmHO1GNeGUIAJil3Q4ZaeOaaj5uNs4h64kc
0BAfGSwzGNgreX5PWm+jQVeh6xbAqXnYtsWIDSibpxnXOhAZcXHbpzKLTwlMl4rh
qpXAAWhHcBsOKiNcg++RRmouubYtpgMoOKCgo/DzGp51mSV7/8K2mugzDRohPUsR
jUDqUa9qvt65uqI5xCuK1n3aLtCQ9m3RUzDfQbH4fK/yBXpNIE83xegU1SBJKZHj
uGPJpjHhc1vaba6Y8vDDBHuJR9IJxfeSnoJE0xMmGlIub40exw7P4Dek1Tc/3G+R
qiqT9aGAbegkFDerps5sqOLbU4Lm4Js8Ov78l3IN1FSVdYWsptzRibjIbUidPdc=
=zypk
-----END PGP SIGNATURE-----
Merge tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs
Pull befs updates from Luis de Bethencourt:
"A series of small fixes and adding NFS export support"
* tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs:
befs: add NFS export support
befs: remove trailing whitespaces
befs: remove signatures from comments
befs: fix style issues in header files
befs: fix style issues in linuxvfs.c
befs: fix typos in linuxvfs.c
befs: fix style issues in io.c
befs: fix style issues in inode.c
befs: fix style issues in debug.c
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYXIQlAAoJEAx081l5xIa+WnEP/RDeTucoQrcKj66EU7Pq5kO6
blF/mcn7RfcHnxDIWPQCASk+pyyeTpCq063H/SMvcqfsXA8M8/89INZK/zYlYCqX
5dMjCuHoQ465V4LClY0VhWpeM2EPgvVWWFHvCE0tUQQKLvJRQjZHQl0zjqDzrfAR
6lVNFE7nIbHPy6dlHw7Xe9kQmsW1GpQhsr4o31MpZIQdOwBHu2k4CzioSOwPLaDg
u/RejB3dMdxfn7RKqyupSF8Z8VQ1yQE+LcTTlsc4CbfQf1YX8pijhFQShI6OBsz0
3pIS6Veucd9QH2HQEA3eFptHocJ7XK6w8aIQ93hgJ2WR9JWIXrVwLrpNQss+HcOA
9o5DriPwyYW1B5gZ86N652EWm+d5qIIJEAd4pcfAiQZDBzoVdMyUXZYcAV1yfsHI
eWiJZgnxytlem7tFmFACmmQTCMOZZU1BlESVf5cSX1JQAVo7TCEhkIi6xbXDY8E9
qU3NLETXZvaMIXJnDtI27wbXvzcK35cbskIwVPgd6ax9rV8IP33WmXXbs5CRtVwC
EWAyu/n2s3LOUPwZ9Sack3Whtl2c55yywUe4SARROuVhGtpzz7xpCbasg0ecBTwl
jEw1HhoonQAf/y5YxDVLX1SAMbiO7pSw9cmCKN6zCC2Iius5b4uBTpzbrQbUwUnc
Vk2/cX2oZBMXSk4j54sZ
=Taep
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Some fixes came in while I was out, mostly intel and amdgpu ones, with
one ast fix"
Daniel Vetter says:
"This should also shut up the WARN_ON(!intel_dp->lane_count) noise"
* tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux: (35 commits)
drm/amdgpu: update tile table for oland/hainan
drm/amdgpu: update tile table for verde
drm/amdgpu: update rev id for verde
drm/amdgpu: update golden setting for verde
drm/amdgpu: update rev id for oland
drm/amdgpu: update golden setting for oland
drm/amdgpu: update rev id for hainan
drm/amdgpu: update golden setting for hainan
drm/amdgpu: update rev id for pitcairn
drm/amdgpu: update golden setting for pitcairn
drm/amdgpu: update golden setting/tiling table of tahiti
drm/i915: skip the first 4k of stolen memory on everything >= gen8
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
drm/i915: Fix use after free in logical_render_ring_init
drm/i915: disable PSR by default on HSW/BDW
drm/i915: Fix setting of boost freq tunable
drm/i915: tune down the fast link training vs boot fail
drm/i915: Reorder phys backing storage release
drm/i915/gen9: Fix PCODE polling during SAGV disabling
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
...
- Series of qedr fixes
- Series of rxe fixes
- One isolated i40iw fix
- One isolated cma fix
- One isolated cxgb4 fix
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYXAGvAAoJELgmozMOVy/dDukQAMMNarWp0U8KfNYRU5tyCBwd
aIQC1gFT6GUCFys40Z6L84m1D3NpGR+vzVv3grVBeuge73b79zAOHXvVDwJCA+Jl
QQLG3vZ13C3158sLDiK8zL+4Ob5OfOQ5nQ2spvDfJWpye9SD+pWFcrpqvK02ANRN
kFHILk1gROBTNi46yBR5hjWOkw7Bua6XLsPxh6xoaDZ43NL0r0xgm43FTnj/19x3
0zpZYYKP+3C6U7678rqaog9zfXHvadghW5/WBJ/VgfKqEmH89ESx4J2MvbB8DxFD
1tWAOpr5TNY5jnh8mtUsceDjCzQivc/RWqAu05BspEwcavjSLFyRYr1epR0/4oAd
PqLSmfORmhpJ8+5Kmn+chtXo3TT4SYGHIzSUbgbEV/ClwX/7UW+w8mfQZ3buUBq/
cQp/oRnJcsrQIEDFO3AH7P+6Sxy6t3zbSl5oKBUOI1u4RFmC7YBPqo9fQu2Z2mGk
3+AWQaPr7qgEcFzXBgLzvd4LhTYKsvmiNwrcXi9KjjwQjNEVg15qqF2YtmxEUgi9
kh3IOcGan3iSblhV/WLrxcOjlPQrPpBOVnTPhUskFtlsrD+032OxeOBpVoU3nCUt
MjTYWoNTYdw4wHz0w373o0uR4+4nl4a5OmO4Fh6Drmg5hm4Bl9BWy0Kziu93Z1Ay
Z2utZVWLWhBzn8yJujUz
=NW9g
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"First round of -rc fixes for 4.10 kernel:
- a series of qedr fixes
- a series of rxe fixes
- one i40iw fix
- one cma fix
- one cxgb4 fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/rxe: Don't check for null ptr in send()
IB/rxe: Drop future atomic/read packets rather than retrying
IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends
qedr: Always notify the verb consumer of flushed CQEs
qedr: clear the vendor error field in the work completion
qedr: post_send/recv according to QP state
qedr: ignore inline flag in read verbs
qedr: modify QP state to error when destroying it
qedr: return correct value on modify qp
qedr: return error if destroy CQ failed
qedr: configure the number of CQEs on CQ creation
i40iw: Set 128B as the only supported RQ WQE size
IB/cma: Fix a race condition in iboe_addr_get_sgid()
IB/rxe: Fix a memory leak in rxe_qp_cleanup()
iw_cxgb4: set correct FetchBurstMax for QPs
This is mostly stuff which missed the initial pull. There's a new
driver: qedi, some ufs, ibmvscsis and ncr5380 updates plus some
assorted driver fixes and also a fix for the bug where if a device
goes into a blocked state between configuration and sysfs device add
(which can be a long time under async probing) it would become
permanently blocked.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYXDugAAoJEAVr7HOZEZN4MO4P/2FM+U9uxjSMnHlnnEf/zVNE
pqYS8vxdnoPwbuzd++uFTneLa8xMGXPmX4X0t6tZQsi9w+ZDAUYPWEjH2TPCa112
V6y9bDKTnkXyLUo78dIRIJXb0lVSi2bCYi0R9x6HA2wyU4/NRhYtCSbYTONEzHkk
Y3aADV6kLQf8NNsqExFmtY1XNCc+xjSistcJwbmIkFAB8qdVFo2xo0qa6Sb1SEsX
WzER/6GdZOVblEy6llmlfSG3h8XPfbxVvqdHvTLlkCUjQsOucjzhxZZ1TGcqWhWB
XQGuXlfOTJN1O6ZEAFfZdiz7g/1NGjwczUP4x99S2NiPFD76C34ncRBFiYODvdYs
izsctB4l4cZmBNdYQSYGYxY5emJrmv1JCvtaqLxl2sQ+XIxGRv/t3yAWisdQi3My
UlGVZw8lyxPNY/+E7vuC9QvFlDW2FBTop8VPH/wvZFB3cuDssHr6Uogx9oxuX/BY
LfYu7GsGCcMigyot6MKiUMlT5Od4cHDlTaQE5kfgCSCP0s8Ssi7CQv2/CnpmWln+
HU/zDC+CRzZjkcCNy7nsgUhCEWKMWcPPTQ/HMpXDMRUIMfweHDUJ/cM5tRyqb0kl
n8TROG9Ptma2A2EcDYtkAqX+XqzE0RUTwDwcQhSQR/XJZEHmsVlXGyCcY3sx3UmZ
5q9bgMoRLWzsiTuZSqBR
=Zofc
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull late SCSI updates from James Bottomley:
"This is mostly stuff which missed the initial pull.
There's a new driver: qedi, and some ufs, ibmvscsis and ncr5380
updates plus some assorted driver fixes and also a fix for the bug
where if a device goes into a blocked state between configuration and
sysfs device add (which can be a long time under async probing) it
would become permanently blocked"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
scsi: avoid a permanent stop of the scsi device's request queue
scsi: mpt3sas: Recognize and act on iopriority info
scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
scsi: qla2xxx: Add Block Multi Queue functionality.
scsi: qla2xxx: Add multiple queue pair functionality.
scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
scsi: hpsa: remove memory allocate failure message
scsi: Update 3ware driver email addresses
scsi: zfcp: fix rport unblock race with LUN recovery
scsi: zfcp: do not trace pure benign residual HBA responses at default level
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
scsi: libcxgbi: return error if interface is not up
scsi: cxgb4i: libcxgbi: add missing module_put()
scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature
scsi: cxgb4i: libcxgbi: add active open cmd for T6 adapters
scsi: cxgb4i: use cxgb4_tp_smt_idx() to get smt_idx
scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.
scsi: aacraid: remove wildcard for series 9 controllers
scsi: ibmvscsi: add write memory barrier to CRQ processing
...
Jiri Pirko says:
====================
mlxsw: Router fixes
Ido says:
First two patches ensure we remove from the device's table neighbours
that are considered to be dead by the neighbour core.
The last patch removes nexthop groups from the device when they are no
longer valid.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of the nexthop initialization process we determine whether
the nexthop should be offloaded or not based on the NUD state of the
neighbour representing it. After all the nexthops were initialized we
refresh the nexthop group and potentially offload it to the device, in
case some of the nexthops were resolved.
Make the destruction of a nexthop group symmetric with its creation by
marking all nexthops as invalid and then refresh the nexthop group to
make sure it was removed from the device's tables.
Fixes: b2157149b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a neighbour is considered to be dead, we should remove it from the
device's table regardless of its NUD state.
Without this patch, after setting a port to be administratively down we
get the following errors when we periodically try to update the kernel
about neighbours activity:
[ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
matching neighbour for IP=192.168.100.2
Fixes: a6bf9e933d ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.
This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.
Fix that by sending a netevent after marking the neighbour as dead.
Fixes: a6bf9e933d ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
net/sched fixes for cls_flower and act_tunnel_key
This small series contain a fix to the matching flags support
in flower and to the tunnel key action MD prep for IPv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When matching on flags, we should require the user to provide the
mask and avoid using an all-ones mask. Not doing so causes matching
on flags provided w.o mask to hit on the value being unset for all
flags, which may not what the user wanted to happen.
Fixes: faa3ffce78 ('net/sched: cls_flower: Add support for matching on flags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP dst port was provided to the helper function which sets the
IPv6 IP tunnel meta-data under a wrong param order, fix that.
Fixes: 75bfbca01e ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing stmmac with my QoS reference design I checked a problem in the
CSR clock configuration that was impossibilitating the phy discovery, since
every read operation returned 0x0000ffff. This patch fixes the issue.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those. Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
sparse says:
fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static?
Note that the forward declaration in the file is already marked static.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):
[vfs]
-> proc_pid_link_inode_operations.get_link
-> proc_pid_get_link
-> proc_fd_access_allowed
-> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.
This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).
Cc: dev@opencontainers.org
Cc: <stable@vger.kernel.org> # v3.2+
Reported-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If kernfs file is empty on a first read, successive read operations
using the same file descriptor will return no data, even when data is
available. Default kernfs 'seq_next' implementation advances iterator
position even when next object is not there. Kernfs 'seq_start' for
following requests will not return iterator as position is already on
the second object.
This defect doesn't allow to monitor badblocks sysfs files from MD raid.
They are initially empty but if data appears at some stage, userspace is
not able to read it.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Strengthen the checking of pos/len vs. i_size, clarify the return values
for the clone prep function, and remove pointless code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
and followups, except that in this case we really want to do nothing when
asked for zero-length operation - unlike zero-length iterate_and_advance(),
zero-length iterate_all_kinds() has no side effects, and callers are simpler
that way.
That got exposed when copy_from_iter_full() had been used by tipc, which
builds an msghdr with zero payload and (now) feeds it to a primitive
based on iterate_all_kinds() instead of iterate_and_advance().
Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails. Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
First set of i915 fixes for code in next.
* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: skip the first 4k of stolen memory on everything >= gen8
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
drm/i915: Fix use after free in logical_render_ring_init
drm/i915: disable PSR by default on HSW/BDW
drm/i915: Fix setting of boost freq tunable
drm/i915: tune down the fast link training vs boot fail
drm/i915: Reorder phys backing storage release
drm/i915/gen9: Fix PCODE polling during SAGV disabling
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
drm/i915: drop the struct_mutex when wedged or trying to reset
Here's the one lonely bugfix I talked about on irc.
* tag 'drm-misc-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-misc:
drivers/gpu/drm/ast: Fix infinite loop if read fails
- fix display regression on DCE6/8
- Powergating fixes for GFX8
- amdgpu SI fixes (golden settings, proper rev id setup, etc.)
* 'drm-next-4.10' of git://people.freedesktop.org/~agd5f/linux: (21 commits)
drm/amdgpu: update tile table for oland/hainan
drm/amdgpu: update tile table for verde
drm/amdgpu: update rev id for verde
drm/amdgpu: update golden setting for verde
drm/amdgpu: update rev id for oland
drm/amdgpu: update golden setting for oland
drm/amdgpu: update rev id for hainan
drm/amdgpu: update golden setting for hainan
drm/amdgpu: update rev id for pitcairn
drm/amdgpu: update golden setting for pitcairn
drm/amdgpu: update golden setting/tiling table of tahiti
drm/amdgpu: fix cursor setting of dce6/dce8
drm/amdgpu: refine set clock gating for tonga/polaris
drm/amdgpu: initialize cg flags for tonga/polaris10/polaris11.
drm/amdgpu: add new gfx cg flags.
drm/amdgpu: fix pg can't be disabled by PG mask.
drm/amdgpu: always initialize gfx pg for gfx_v8.0.
drm/amdgpu: enable AMD_PG_SUPPORT_CP in Carrizo/Stoney.
drm/amdgpu: fix init save/restore list in gfx_v8.0
drm/amdgpu: fix enable_cp_power_gating in gfx_v8.0.
...
Pull block layer fixes from Jens Axboe:
"Just a set of small fixes that have either been queued up after the
original pull for this merge window, or just missed the original pull
request.
- a few bcache fixes/changes from Eric and Kent
- add WRITE_SAME to the command filter whitelist frm Mauricio
- kill an unused struct member from Ritesh
- partition IO alignment fix from Stefan
- nvme sysfs printf fix from Stephen"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: check partition alignment
nvme : Use correct scnprintf in cmb show
block: allow WRITE_SAME commands with the SG_IO ioctl
block: Remove unused member (busy) from struct blk_queue_tag
bcache: partition support: add 16 minors per bcacheN device
bcache: Make gc wakeup sane, remove set_task_state()