Generating and retrieving socket cookies are a useful feature that is
exposed to BPF for various program types through bpf_get_socket_cookie()
helper.
The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.
Socket cookie consumers must assume the value as opqaue in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1NmIAACgkQ+7dXa6fL
C2vroRAAioYEcgShBOH6uqjuGiWZjRsGX6Ppw8QKg7LTvvAes5oT74t3jL0a8as9
Zn3Y5bj7jueUH1vakFGFdycWNO8wgxD6+63Ki0M070Yy15lV9uH1VyAI8BJhMxHe
M3v6Yjy5OzILNqqs5+9i3om8ocBLkwfiGlocHXxQv15MA6lMgDFl23d1J3L4KK0o
sItorzvg90wIWNWR2jctbjqyAdS17LeXdUL41Oc3SXJi9wLHony3LAH3al4Rb9k7
FgWHyD9colDEezl4cWAehjZvm060EfWmuQsDT7+iDhNftokZTeDlRdnizDzAISYQ
i3wleKXegxFpXJ6FfaJeL8BDMUjTVG/MWUWSCb4gKYOr1XC6ioVhvo2DH7mgcb+2
E4dP76u0S7hoYI0Vn3X8091Vc4rXv6tCgw71HYA5wWKnZIfxFo+conOFoNujRh3V
3DZXj/hr/3UzIA8tsqWadfGfE9O08bObnhqfqFCzSTy3UhlWGkKCKEIyJljhh2wb
ztvDpASl2L62NPtd0irHG2yHxbVFH0lWcF1buL/wbndLo3HaQ7bTxQ/s8RDOpKTH
40Rv9Leih4QuT8WLrEMhRkgxE1heP9dGC34ZD2r1eZX6Lb9A5w21GjYd3/hl0dyG
llb75sqyBfAbfYveMij58hLuk4GW7lcaoQ0rnDwvS6yJ1LTHxLI=
=u6EZ
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
Here's a couple of fixes for rxrpc:
(1) Fix refcounting of the local endpoint.
(2) Don't calculate or report packet skew information. This has been
obsolete since AFS 3.1 and so is a waste of resources.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.
This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!
Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.
Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.
The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.
Fixes: 4f95dd78a7 ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
As spin_unlock_irq will enable interrupts.
Function tsi108_stat_carry is called from interrupt handler tsi108_irq.
Interrupts are enabled in interrupt handler.
Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq
in IRQ context to avoid this.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should also enable team's vlan tx offload in hw_enc_features,
pass the vlan packets to the slave devices with vlan tci, let the
slave handle vlan tunneling offload implementation.
Fixes: 3268e5cb49 ("team: Advertise tunneling offload features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.
Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.
Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.
Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.
Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).
See commit 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.
v2:
- remove superfluous decrypted mark copy (Willem);
- remove the stale doc entry (Boris);
- rely entirely on EOR marking to prevent coalescing (Boris);
- use an internal sendpages flag instead of marking the socket
(Boris).
v3 (Willem):
- reorganize the can_skb_orphan_partial() condition;
- fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak says:
====================
Fix batched event generation for skbedit action
When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.
For example, consider the following script adding 32 entries with all
supported skbedit parameters and cookie (in order to maximize netlink
messages size):
% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"
$TC actions flush action skbedit
for i in `seq 1 $1`;
do
cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
ptype host inheritdsfield \
index $i cookie aabbccddeeff112233445566778800a1 "
args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%
patch 1 adds callback in tc_action_ops of skbedit action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().
patch 2 updates the TDC test suite with relevant skbedit test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TDC tests with cases varifying ability of TC to install or delete
batches of skbedit actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
variable tx_vid set but not used [-Wunused-but-set-variable]
drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
variable rx_vid set but not used [-Wunused-but-set-variable]
They are not used since commit 6d7c7d948a ("net: dsa:
sja1105: Fix broken learning with vlan_filtering disabled")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As commit 30d8177e8a ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.
Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:
if (skb->encapsulation)
features &= dev->hw_enc_features;
clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e8a like this:
vlan_dev_hard_start_xmit
-->dev_queue_xmit
-->validate_xmit_skb
-->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
-->validate_xmit_vlan
-->__vlan_hwaccel_push_inside //skb->tci is cleared
...
--> bond_start_xmit
--> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
--> __skb_flow_dissect // nhoff point to IP header
--> case htons(ETH_P_8021Q)
// skb_vlan_tag_present is false, so
vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
//vlan point to ip header wrongly
Fixes: b2a103e6d0 ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPX is no longer supported, but the example in the documentation
might useful. Replace it with IPv6.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both IPX and TR have not been supported for a while now.
Remove them from the /proc/sys/net documentation.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this point nr_frags has been incremented but the frag does not yet
have a page assigned so freeing the skb results in a crash. Reset
nr_frags before freeing the skb to prevent this.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit d4289fcc9b ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).
By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e334 ("Revert
"ipv4: use skb coalescing in defragmentation"")).
Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.
When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.
Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.
This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.
[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1
Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann
- Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHO0WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobHvD/9RQLU/2zrbq22QIhLwEt2DeGg3
Z//qHgK/LhDnLJRHA0F14NlA/4vPtzbfIX8m0Ai/mrPrk0naDKEeM6TSBw8IHQEz
bsibnHkwB+cdgqGtVKFEHqL5seaD5Nnlu89xIvgTyeImq2Zd7heOphN8lLmJPdem
3TugLIkbEqO4P+KU8fdjuA+N4GcRzaArsendH+3z4LrlXJnh4YWvook4/gSjQwWe
ydF+e2ktZvNEj8Xl5vsDpykBEFll+k4BThW8toIMx4FCLiw1Hqedp/agwIbs5Tjc
ccdYa/wuKYUAwz5rsrqgx3FpH6z0w6h0b3AB++cLaahmas9qx/K7lT68PsJcPz0E
qhZmbw0snRO/N+TZQPVd1B9i0H3efAaQzzUu/xfL2RRA91ztB5O2+3HZF3PBzw/J
u1Ewt56vv4LV0QmeQLzczKMBS47DPoARk8mLOx0dpGl6vvkQYWBFbn+gk/+uPalk
K15Yozd3h/nSEX3SWk50X3+AU/p8l7rAYcrOrpWsRmQGl06XBbRDr9mRijMoHnTE
LM+AKiZLvOS1Q/UT/C0YwL2i665aG3NqzqMXq+0Mtciu6Lsefu8+JOZFxcK4GXGU
XJ+Kkygf/3/5dhu3Nxm4GchQQSLVJRn/i0nCKJeWS/Qxdvcalk8lPQPrX2a2l6GY
kW2KxW/KjwayzLpwSQ==
=g2Gm
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann
- Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Yeah I should have sent a pull request last week, so there is a lot
more here than usual:
1) Fix memory leak in ebtables compat code, from Wenwen Wang.
2) Several kTLS bug fixes from Jakub Kicinski (circular close on
disconnect etc.)
3) Force slave speed check on link state recovery in bonding 802.3ad
mode, from Thomas Falcon.
4) Clear RX descriptor bits before assigning buffers to them in
stmmac, from Jose Abreu.
5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
loops, from Nishka Dasgupta.
6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.
7) Need to hold sock across skb->destructor invocation, from Cong
Wang.
8) IP header length needs to be validated in ipip tunnel xmit, from
Haishuang Yan.
9) Use after free in ip6 tunnel driver, also from Haishuang Yan.
10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
Heiner Kallweit.
11) Upon bridge device init failure, we need to delete the local fdb.
From Nikolay Aleksandrov.
12) Handle erros from of_get_mac_address() properly in stmmac, from
Martin Blumenstingl.
13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
Kadlecsik.
14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
some devices, so revert. From Johannes Berg.
15) Fix deadlock in rxrpc, from David Howells.
16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
Haibing.
17) Fix mvpp2 crash on module removal, from Matteo Croce.
18) Fix race in genphy_update_link, from Heiner Kallweit.
19) bpf_xdp_adjust_head() stopped working with generic XDP when we
fixes generic XDP to support stacked devices properly, fix from
Jesper Dangaard Brouer.
20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
David Ahern.
21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
net: dsa: sja1105: Fix memory leak on meta state machine error path
net: dsa: sja1105: Fix memory leak on meta state machine normal path
net: dsa: sja1105: Really fix panic on unregistering PTP clock
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
net: sched: sample: allow accessing psample_group with rtnl
net: sched: police: allow accessing police->params with rtnl
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
tc-testing: updated vlan action tests with batch create/delete
net sched: update vlan action for batched events operations
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: stmmac: xgmac: Fix XGMAC selftests
be2net: disable bh with spin_lock in be_process_mcc
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
...
Vladimir Oltean says:
====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP
This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).
Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.
Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.
Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.
The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).
However, when dropping the initial link-local frame, its backing memory
was being leaked.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.
Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed. That is the job of kfree_skb(), which
also calls skb_unref().
But it turns out that freeing the cached stampable_skb is in fact not
necessary. It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp. So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.
Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.
It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.
Fixes: 29dd908d35 ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.
Fixes: d763778224 ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.
So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.
The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).
This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.
Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:
- It is no longer necessary to manipulate the VID of the 'bridge fdb
{add,del}' command when vlan_filtering is off. This is because now,
SVL is enabled for that case, so the actual VID does not matter*.
- It is still desirable to hide dsa_8021q VID's in the FDB dump
callback. But right now the dump callback should no longer hide
duplicates (one per each front panel port's pvid, plus one for the
VLAN that the CPU port is going to tag a TX frame with), because there
shouldn't be any (the switch will match a single FDB entry no matter
its VID anyway).
* Not really... It's no longer necessary to transform a 'bridge fdb add'
into 5 fdb add operations, but the user might still add a fdb entry with
any vid, and all of them would appear as duplicates in 'bridge fdb
show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
can prune the duplicates at insertion time.
** The VID of 0 is better than 1 because it is always guaranteed to be
in the ports' hardware filter. DSA also avoids putting the VID inside
the netlink response message towards the bridge driver when we return
this particular VID, which makes it suitable for FDB entries learnt
with vlan_filtering off.
Fixes: 227d07a07e ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there
is no put, thus causing a memory leak. Hence add an of_node_put() before
the return.
Additionally, the local variable ports in the function
qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(),
which gets a node but does not put it. If the function returns without
putting ports, it may cause a memory leak. Hence put ports before the
mid-loop return statement, and also outside the loop after its last usage
in this function.
Issues found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov says:
====================
action fixes for flow_offload infra compatibility
Fix rcu warnings due to usage of action helpers that expect rcu read lock
protection from rtnl-protected context of flow_offload infra.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao says:
====================
net: hisilicon: Fix a few problems with hip04_eth
During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."
ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.
tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak says:
====================
Fix batched event generation for vlan action
When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.
For example, consider the following script adding 32 entries with all
supported vlan parameters (in order to maximize netlink messages size):
% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"
$TC actions flush action vlan
for i in `seq 1 $1`;
do
cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
index $i cookie aabbccddeeff112233445566778800a1 "
args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%
patch 1 adds callback in tc_action_ops of vlan action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().
patch 2 updates the TDC test suite with relevant vlan test cases.
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TDC tests with cases varifying ability of TC to install or delete
batches of vlan actions.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Various switch fall through annotations to fixup warnings & errors
resulting from -Wimplicit-fallthrough.
- A fix for systems (at least jazz) using an i8253 PIT as clocksource
when it's not suitably configured.
- Set struct cacheinfo's cpu_map_populated field to true, indicating
that we filled in cache info detected from cop0 registers & avoiding
complaints about that info being (intentionally) missing in
devicetree.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXUnSPBUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN2u3gD/TaMPczS5027R0FMXskiroUHaMG4S
JL0EYIVmfny4vwYBAIvLr5l1jEXEqegjYXFabuI5PybQlFmTZMhjauh6gKYJ
=e6fl
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes for 5.3:
- Various switch fall through annotations to fixup warnings & errors
resulting from -Wimplicit-fallthrough.
- A fix for systems (at least jazz) using an i8253 PIT as clocksource
when it's not suitably configured.
- Set struct cacheinfo's cpu_map_populated field to true, indicating
that we filled in cache info detected from cop0 registers &
avoiding complaints about that info being (intentionally) missing
in devicetree"
* tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: BCM63XX: Mark expected switch fall-through
MIPS: OProfile: Mark expected switch fall-throughs
MIPS: Annotate fall-through in Cavium Octeon code
MIPS: Annotate fall-through in kvm/emulate.c
mips: fix cacheinfo
MIPS: kernel: only use i8253 clocksource with periodic clockevent
Jose Abreu says:
====================
net: stmmac: Fixes for -net
Couple of fixes for -net. More info in commit log.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not try to return a fragment entry from TC list. Otherwise we may not
clean properly allocated entries.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When queues >= 4 we use different registers but we were not subtracting
the offset of 4. Fix this.
Found out by Coverity.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixup the XGMAC selftests by correctly finishing the implementation of
set_filter callback.
Result:
$ ethtool -t enp4s0
The test result is PASS
The test extra info:
1. MAC Loopback 0
2. PHY Loopback -95
3. MMC Counters -95
4. EEE -95
5. Hash Filter MC 0
6. Perfect Filter UC 0
7. MC Filter 0
8. UC Filter 0
9. Flow Control 0
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second set of fixes for 5.3. Lots of iwlwifi fixes have accumulated
which consists most of patches in this pull request. Only most notable
iwlwifi fixes are listed below.
mwifiex
* fix a regression related to WPA1 networks since v5.3-rc1
iwlwifi
* fix use-after-free issues
* fix DMA mapping API usage errors
* fix frame drop occurring due to reorder buffer handling in
RSS in certain conditions
* fix rate scale locking issues
* disable TX A-MSDU on older NICs as it causes problems and was
never supposed to be supported
* new PCI IDs
* GEO_TX_POWER_LIMIT API issue that many people were hitting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJdSWJ0AAoJEG4XJFUm622bwb4H/1cbH/VoPkNPfRb6783G3Fuc
SWcI655nMDHY3cElhLPGQrR4OuNYbFpBDT7O73XMIVqnkHlgq3L/VZD2qaacSd4H
tY4HEkkHobil9gdsEt8hpBstp9xNi36klDiq9gEOLT88eatdukCZTsCnBwaTFSEQ
84QxsVsHWuB+khVDay6jprA+vwL6YopOlPwfY9+q0LHWCHq1C98ZqPZoxjb8dzN7
bbW4hWIhMHRtN9ixPwvoRhscev2ZgRPHAncoGFE8bmRM79srZpG0mDeAhQPLGrJz
YSoGtylHpQxg3vzz2y80fMpH6jsUHFeA56xuKPQ5gnUm4bPeFbhx7h29DuABsAw=
=GDUP
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-for-davem-2019-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 5.3
Second set of fixes for 5.3. Lots of iwlwifi fixes have accumulated
which consists most of patches in this pull request. Only most notable
iwlwifi fixes are listed below.
mwifiex
* fix a regression related to WPA1 networks since v5.3-rc1
iwlwifi
* fix use-after-free issues
* fix DMA mapping API usage errors
* fix frame drop occurring due to reorder buffer handling in
RSS in certain conditions
* fix rate scale locking issues
* disable TX A-MSDU on older NICs as it causes problems and was
never supposed to be supported
* new PCI IDs
* GEO_TX_POWER_LIMIT API issue that many people were hitting
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull HID fixes from Jiri Kosina:
- functional regression fix for some of the Logitech unifying devices,
from Hans de Goede
- race condition fix in hid-sony for bug severely affecting
Valve/Android deployments, from Roderick Colenbrander
- several fixes for issues found by syzbot/kasan, from Oliver Neukum
and Hillf Danton
- functional regression fix for Wacom Cintiq device, from Aaron
Armstrong Skomra
- a few other assorted device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: sony: Fix race condition between rumble and device remove.
HID: hiddev: do cleanup in failure of opening a device
HID: hiddev: avoid opening a disconnected device
HID: input: fix a4tech horizontal wheel custom usage
HID: Add quirk for HP X1200 PIXART OEM mouse
HID: holtek: test for sanity of intfdata
HID: wacom: fix bit shift for Cintiq Companion 2
HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
HID: logitech-dj: Really fix return value of logi_dj_recv_query_hidpp_devices
HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
HID: logitech-dj: add the Powerplay receiver
HID: logitech-hidpp: add USB PID for a few more supported mice
HID: logitech-dj: rename "gaming" receiver to "lightspeed"
be_process_mcc() is invoked in 3 different places and
always with BHs disabled except the be_poll function
but since it's invoked from softirq with BHs
disabled it won't hurt.
v1->v2: added explanation to the patch
v2->v3: add a missing call from be_cmds.c
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A call to 'kfree_skb()' is missing in the error handling path of
'init_one()'.
This is already present in 'remove_one()' but is missing here.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sun4i-emac uses the "phy" property to find the PHY it's supposed to
use. This property was deprecated in favor of "phy-handle" in commit
8c5b094476 ("dt-bindings: net: sun4i-emac: Convert the binding to a
schemas").
Add support for this new property name, and fall back to the old one in
case the device tree hasn't been updated.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull pti updates from Thomas Gleixner:
"The performance deterioration departement is not proud at all to
present yet another set of speculation fences to mitigate the next
chapter in the 'what could possibly go wrong' story.
The new vulnerability belongs to the Spectre class and affects GS
based data accesses and has therefore been dubbed 'Grand Schemozzle'
for secret communication purposes. It's officially listed as
CVE-2019-1125.
Conditional branches in the entry paths which contain a SWAPGS
instruction (interrupts and exceptions) can be mis-speculated which
results in speculative accesses with a wrong GS base.
This can happen on entry from user mode through a mis-speculated
branch which takes the entry from kernel mode path and therefore does
not execute the SWAPGS instruction. The following speculative accesses
are done with user GS base.
On entry from kernel mode the mis-speculated branch executes the
SWAPGS instruction in the entry from user mode path which has the same
effect that the following GS based accesses are done with user GS
base.
If there is a disclosure gadget available in these code paths the
mis-speculated data access can be leaked through the usual side
channels.
The entry from user mode issue affects all CPUs which have speculative
execution. The entry from kernel mode issue affects only Intel CPUs
which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
has semantics which prevent that.
SMAP migitates both problems but only when the CPU is not affected by
the Meltdown vulnerability.
The mitigation is to issue LFENCE instructions in the entry from
kernel mode path for all affected CPUs and on the affected Intel CPUs
also in the entry from user mode path unless PTI is enabled because
the CR3 write is serializing.
The fences are as usual enabled conditionally and can be completely
disabled on the kernel command line. The Spectre V1 documentation is
updated accordingly.
A big "Thank You!" goes to Josh for doing the heavy lifting for this
round of hardware misfeature 'repair'. Of course also "Thank You!" to
everybody else who contributed in one way or the other"
* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: Add swapgs description to the Spectre v1 documentation
x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
x86/entry/64: Use JMP instead of JMPQ
x86/speculation: Enable Spectre v1 swapgs mitigations
x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
Valve reported a kernel crash on Ubuntu 18.04 when disconnecting a DS4
gamepad while rumble is enabled. This issue is reproducible with a
frequency of 1 in 3 times in the game Borderlands 2 when using an
automatic weapon, which triggers many rumble operations.
We found the issue to be a race condition between sony_remove and the
final device destruction by the HID / input system. The problem was
that sony_remove didn't clean some of its work_item state in
"struct sony_sc". After sony_remove work, the corresponding evdev
node was around for sufficient time for applications to still queue
rumble work after "sony_remove".
On pre-4.19 kernels the race condition caused a kernel crash due to a
NULL-pointer dereference as "sc->output_report_dmabuf" got freed during
sony_remove. On newer kernels this crash doesn't happen due the buffer
now being allocated using devm_kzalloc. However we can still queue work,
while the driver is an undefined state.
This patch fixes the described problem, by guarding the work_item
"state_worker" with an initialized variable, which we are setting back
to 0 on cleanup.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
CC: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Undo what we did for opening before releasing the memory slice.
Reported-by: syzbot <syzbot+62a1e04fd3ec2abf099e@syzkaller.appspotmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>