Daniel Borkmann says:
====================
pull-request: bpf-next 2020-10-15
The main changes are:
1) Fix register equivalence tracking in verifier, from Alexei Starovoitov.
2) Fix sockmap error path to not call bpf_prog_put() with NULL, from Alex Dewar.
3) Fix sockmap to add locking annotations to iterator, from Lorenz Bauer.
4) Fix tcp_hdr_options test to use loopback address, from Martin KaFai Lau.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.
In both cases code was added on both sides in the same place
so just keep both.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 1d273fcc2c.
Alexei points out there's nothing implying headers will be built
and therefore exist under usr/include, so this fix doesn't make
much sense.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is
called unconditionally on skb_verdict, even though it may be NULL. Fix
and tidy up error path.
Fixes: 743df8b774 ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly")
Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL)
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com
The tcp_hdr_options test adds a "::eB9F" addr to the lo dev.
However, this non loopback address will have a race on ipv6 dad
which may lead to EADDRNOTAVAIL error from time to time.
Even nodad is used in the iproute2 command, there is still a race in
when the route will be added. This will then lead to ENETUNREACH from
time to time.
To avoid the above, this patch uses the default loopback address "::1"
to do the test.
Fixes: ad2f8eb009 ("bpf: selftests: Tcp header options")
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201012234940.1707941-1-kafai@fb.com
The sparse checker currently outputs the following warnings:
include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit
include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit
Add the necessary __acquires and __release annotations to make the
iterator locking schema palatable to sparse. Also add __must_hold
for good measure.
The kernel codebase uses both __acquires(rcu) and __acquires(RCU).
I couldn't find any guidance which one is preferred, so I used
what is easier to type out.
Fixes: 0365351524 ("net: Allow iterating sockmap and sockhash")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com
nftables payload statements are used to mangle SCTP headers, but they can
only replace the Internet Checksum. As a consequence, nftables rules that
mangle sport/dport/vtag in SCTP headers potentially generate packets that
are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g the
rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'.
Fix this extending uAPI definitions and L4 checksum update function, in a
way that userspace programs (e.g. nft) can instruct the kernel to compute
CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES
is 'y' or 'm' in the kernel build configuration.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl+IRI8ACgkQ+7dXa6fL
C2szew//cxlOShaXVprAcS/i5djASCbcL1qoEbz9O0FIXSdNdioeT9+CxSzvAecz
weHsDtPoNYdLMzcM53KiB1OjBl8CngbV54zoe5jrOqIEyKa7ONzyi6WC3umwQT4I
tTyr/yKs1XdQXfpICxadyOyVtNUF8cy4LcdAZUpbsNK1dGsRlxLMIzlqX4xMvh3+
xatBIc/bqJ3SRLq2I/ceA7d+eGsIa7cMhNgnCZYTKpkJN0FAD/e7rp+DmULg5ZY0
+6Y6G8LNd1yLq9hmFRBKQacx8zp3pV4srmrlDJOE754OMhYld9zw3hXPBBBDczIf
qjEGxRixkbsNQsPvnPDCsQpzVhtXWTh14I4B/vD6I7uDNEOPLHE4/ootrjcGLakE
vxd1HGrIpNNZGAVC6aj+nJTVALvZ1lc/u1xK7VNlIl5o5339FN8cOpLLTeE3NhZJ
lZvcqFrsD59W+eFFdmpOMr3dzUK9o0ZtLEYuRyrHAWPThjpKmXbZZQd6pVGASv2b
r5xJlgrFxVUAV4XDzbcmsNnuQ7zlfOTIThE+3s1evuckqfvrAg8r/PilXLPT3yCN
btO8ZiwLZaaiKtKCDo7Iy1Fw7df32PxlBLub05WOC5HZMsTiemTxd20ENwIR8Zrk
3hJE1IAyCjzry3XsFO9y+UrPlKcXihmmj2UKC8ibRArYNHtDMDc=
=PsBi
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-next-20201015' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc fixes
Here are a couple of fixes that need to be applied on top of rxrpc patches
in net-next:
(1) Fix a bug in the connection bundle changes in the net-next tree.
(2) Fix the loss of final ACK on socket shutdown.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 4fc427e051 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
- increase pos for all seq_ops->start()
- increase pos for all seq_ops->next()
For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().
For example, I have 7 ipv6 route entries,
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
0+1 records in
0+1 records out
1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
root@arch-fb-vm1:~/net-next
In the above, I specify buffer size 4096, so all records can be returned
to user space with a single trip to the kernel.
If I use buffer size 128, since each record size is 149, internally
kernel seq_read() will read 149 into its internal buffer and return the data
to user space in two read() syscalls. Then user read() syscall will trigger
next seq_ops->start(). Since the current implementation increased pos even
for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
record is #1.
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
4+1 records in
4+1 records out
600 bytes copied, 0.00127758 s, 470 kB/s
To fix the problem, create a fake pos pointer so seq_ops->start()
won't actually increase seq_file pos. With this fix, the
above `dd` command with `bs=128` will show correct result.
Fixes: 4fc427e051 ("ipv6_route_seq_next should increase position index")
Cc: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul says:
====================
net/smc: fixes 2020-10-14
The first patch fixes a possible use-after-free of delayed llc events.
Patch 2 corrects the number of DMB buffer sizes. And patch 3 ensures
a correctly formatted return code when smc_ism_register_dmb() fails to
create a new DMB.
====================
Link: https://lore.kernel.org/r/20201014174329.35791-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
smc_ism_register_dmb() returns error codes set by the ISM driver which
are not guaranteed to be negative or in the errno range. Such values
would not be handled by ERR_PTR() and finally the return code will be
used as a memory address.
Fix that by using a valid negative errno value with ERR_PTR().
Fixes: 72b7f6c487 ("net/smc: unique reason code for exceeded max dmb count")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the
correct value is 6 which means 1MB. With 7 the registration of an ISM
buffer would always fail because of the invalid size requested.
Fix that and set the value to 6.
Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a delayed event is enqueued then the event worker will send this
event the next time it is running and no other flow is currently
active. The event handler is called for the delayed event, and the
pointer to the event keeps set in lgr->delayed_event. This pointer is
cleared later in the processing by smc_llc_flow_start().
This can lead to a use-after-free condition when the processing does not
reach smc_llc_flow_start(), but frees the event because of an error
situation. Then the delayed_event pointer is still set but the event is
freed.
Fix this by always clearing the delayed event pointer when the event is
provided to the event handler for processing, and remove the code to
clear it in smc_llc_flow_start().
Fixes: 555da9af82 ("net/smc: add event-based llc_flow framework")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IF CONFIG_BPFILTER_UMH is set, building fails:
In file included from /usr/include/sys/socket.h:33:0,
from net/bpfilter/main.c:6:
/usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory
#include <asm/socket.h>
^~~~~~~~~~~~~~
compilation terminated.
scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed
make[2]: *** [net/bpfilter/main.o] Error 1
Add missing include path to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch changes the module name to "ch_ipsec" and prepends
"ch_ipsec" string instead of "chcr" in all debug messages and
function names.
V1->V2:
-Removed inline keyword from functions.
-Removed CH_IPSEC prefix from pr_debug.
-Used proper indentation for the continuation line of the function
arguments.
V2->V3:
Fix the checkpatch.pl warnings.
Fixes: 1b77be4639 ("crypto/chcr: Moving chelsio's inline ipsec functionality to /drivers/net")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 64-bit JEQ/JNE handling in reg_set_min_max() was clearing reg->id in either
true or false branch. In the case 'if (reg->id)' check was done on the other
branch the counter part register would have reg->id == 0 when called into
find_equal_scalars(). In such case the helper would incorrectly identify other
registers with id == 0 as equivalent and propagate the state incorrectly.
Fix it by preserving ID across reg_set_min_max().
In other words any kind of comparison operator on the scalar register
should preserve its ID to recognize:
r1 = r2
if (r1 == 20) {
#1 here both r1 and r2 == 20
} else if (r2 < 20) {
#2 here both r1 and r2 < 20
}
The patch is addressing #1 case. The #2 was working correctly already.
Fixes: 75748837b7 ("bpf: Propagate scalar ranges through register assignments.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201014175608.1416-1-alexei.starovoitov@gmail.com
Fix the loss of transmission of a call's final ack when a socket gets shut
down. This means that the server will retransmit the last data packet or
send a ping ack and then get an ICMP indicating the port got closed. The
server will then view this as a failure.
Fixes: 3136ef49a1 ("rxrpc: Delay terminal ACK transmission on a client call")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix rxrpc_unbundle_conn() to not drop the bundle usage count when cleaning
up an exclusive connection.
Based on the suggested fix from Hillf Danton.
Fixes: 245500d853 ("rxrpc: Rewrite the client connection manager")
Reported-by: syzbot+d57aaf84dd8a550e6d91@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
This definition is used by the iptables legacy UAPI, restore it.
Fixes: d3519cb89f ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wilder says:
====================
ibmveth gso fix
The ibmveth driver is a virtual Ethernet driver used on IBM pSeries systems.
Gso packets can be sent between LPARS (virtual hosts) without segmentation,
by flagging gso packets using one of two methods depending on the firmware
version. Some gso packet were not correctly identified by the receiver.
This patch-set corrects this issue.
V2:
- Added fix tags.
- Byteswap the constant at compilation time.
- Updated the commit message to clarify what frame validation is performed
by the hypervisor.
====================
Link: https://lore.kernel.org/r/20201013232014.26044-1-dwilder@us.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ingress large send packets are identified by either:
The IBMVETH_RXQ_LRG_PKT flag in the receive buffer
or with a -1 placed in the ip header checksum.
The method used depends on firmware version. Frame
geometry and sufficient header validation is performed by the
hypervisor eliminating the need for further header checks here.
Fixes: 7b5967389f ("ibmveth: set correct gso_size and gso_type")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper()
as ibmveth_rx_csum_helper() may alter ip and tcp checksum values.
Fixes: 66aa0678ef ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Cristobal Forno <cris.forno@ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple
fields even if they had not been explicitly enabled. If any fields in
the 4-tuple are not enabled, then the hardware overwrites the
disabled fields with zeros, instead of ignoring them.
So, add a parser that can translate the enabled 4-tuple PEDIT fields
to one of the NAT mode combinations supported by the hardware and
hence avoid overwriting disabled fields to 0. Any rule with
unsupported NAT mode combination is rejected.
Signed-off-by: Herat Ramani <herat@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mathieu Desnoyers says:
====================
l3mdev icmp error route lookup fixes
Here is a series of fixes for ipv4 and ipv6 which ensure the route
lookup is performed on the right routing table in VRF configurations
when sending TTL expired icmp errors (useful for traceroute).
It includes tests for both ipv4 and ipv6.
These fixes address specifically address the code paths involved in
sending TTL expired icmp errors. As detailed in the individual commit
messages, those fixes do not address similar icmp errors related to
network namespaces and unreachable / fragmentation needed messages,
which appear to use different code paths.
====================
Link: https://lore.kernel.org/r/20201012145016.2023-1-mathieu.desnoyers@efficios.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The objective of the tests is to check that ICMP errors generated while
crossing between VRFs are properly routed back to the source host.
The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the
output of the command to check that a ttl expired error is received.
The second ttl test runs traceroute from h1 to h2 and parses the output to
check for a hop on r1.
The mtu test sends a ping with a payload of 1450 from h1 to h2, through
r1 which has an interface with a mtu of 1400 and parses the output of the
command to check that a fragmentation needed error is received.
[ The IPv6 MTU test still fails with the symmetric routing setup. It
appears to be caused by source address selection picking ::1. Fixing
this is beyond the scope of this series. ]
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As per RFC4443, the destination address field for ICMPv6 error messages
is copied from the source address field of the invoking packet.
In configurations with Virtual Routing and Forwarding tables, looking up
which routing table to use for sending ICMPv6 error messages is
currently done by using the destination net_device.
If the source and destination interfaces are within separate VRFs, or
one in the global routing table and the other in a VRF, looking up the
source address of the invoking packet in the destination interface's
routing table will fail if the destination interface's routing table
contains no route to the invoking packet's source address.
One observable effect of this issue is that traceroute6 does not work in
the following cases:
- Route leaking between global routing table and VRF
- Route leaking between VRFs
Use the source device routing table when sending ICMPv6 error
messages.
[ In the context of ipv4, it has been pointed out that a similar issue
may exist with ICMP errors triggered when forwarding between network
namespaces. It would be worthwhile to investigate whether ipv6 has
similar issues, but is outside of the scope of this investigation. ]
[ Testing shows that similar issues exist with ipv6 unreachable /
fragmentation needed messages. However, investigation of this
additional failure mode is beyond this investigation's scope. ]
Link: https://tools.ietf.org/html/rfc4443
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As per RFC792, ICMP errors should be sent to the source host.
However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.
commit 9d1a6c4ea4 ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.
Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.
One observable effect of this issue is that traceroute does not work in
the following cases:
- Route leaking between global routing table and VRF
- Route leaking between VRFs
Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table. Else, use the main routing table (index 0).
[ It has been pointed out that a similar issue may exist with ICMP
errors triggered when forwarding between network namespaces. It would
be worthwhile to investigate, but is outside of the scope of this
investigation. ]
[ It has also been pointed out that a similar issue exists with
unreachable / fragmentation needed messages, which can be triggered by
changing the MTU of eth1 in r1 to 1400 and running:
ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
Some investigation points to raw_icmp_error() and raw_err() as being
involved in this last scenario. The focus of this patch is TTL expired
ICMP messages, which go through icmp_route_lookup.
Investigation of failure modes related to raw_icmp_error() is beyond
this investigation's scope. ]
Fixes: 9d1a6c4ea4 ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Extend nf_queue selftest to cover re-queueing, non-gso mode and
delayed queueing, from Florian Westphal.
2) Clear skb->tstamp in IPVS forwarding path, from Julian Anastasov.
3) Provide netlink extended error reporting for EEXIST case.
4) Missing VLAN offload tag and proto in log target.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2020-10-12
This series contains updates to i40e and e1000 drivers.
Jaroslaw adds support for changing FEC on i40e if the firmware supports it.
Jesse fixes a kbuild-bot warning regarding ternary operator on e1000.
v2: Return -EOPNOTSUPP instead of -EINVAL when FEC settings are not
supported by firmware. Remove, unneeded, done label and return errors
directly in i40e_set_fec_param() for patch 1. Dropped, previous patch 2,
to send to net.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The e1000_clear_vfta function was triggering a warning in kbuild-bot
testing. It's actually a bug but has no functional impact.
drivers/net/ethernet/intel/e1000/e1000_hw.c:4415:58: warning: Same expression in both branches of ternary operator. [duplicateExpressionTernary]
Fix this warning by removing the offending code and simplifying
the routine to do exactly what it did before, no functional
change.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with API version 1.10 firmware for X722 devices has ability
to change FEC settings in PHY. Code added in this patch allows
changing FEC settings if the capability flag indicates the device
supports this feature.
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it
conditionally. When it is set, it assumes the outer IP header is
already created before ipgre_xmit().
This is not true when we send packets through a raw packet socket,
where L2 headers are supposed to be constructed by user. Packet
socket calls dev_validate_header() to validate the header. But
GRE tunnel does not set dev->hard_header_len, so that check can
be simply bypassed, therefore uninit memory could be passed down
to ipgre_xmit(). Similar for dev->needed_headroom.
dev->hard_header_len is supposed to be the length of the header
created by dev->header_ops->create(), so it should be used whenever
header_ops is set, and dev->needed_headroom should be used when it
is not set.
Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com
Cc: William Tu <u9012063@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit says:
====================
net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
In several places the same code is used to populate rtnl_link_stats64
fields with data from pcpu_sw_netstats. Therefore factor out this code
to a new function dev_fetch_sw_netstats().
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the code by using new function dev_fetch_sw_netstats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the code by using new function dev_fetch_sw_netstats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/166259f2-084c-45d7-e610-2de2a0bdae06@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In several places the same code is used to populate rtnl_link_stats64
fields with data from pcpu_sw_netstats. Therefore factor out this code
to a new function dev_fetch_sw_netstats().
v2:
- constify argument netstats
- don't ignore netstats being NULL or an ERRPTR
- switch to EXPORT_SYMBOL_GPL
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6d16a338-52f5-df69-0020-6bc771a7d498@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow user configuring RXCSUM separately with ethtool -K,
reusing the existing virtnet_set_guest_offloads helper
that configures RXCSUM for XDP. This is conditional on
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
If Rx checksum is disabled, LRO should also be disabled.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201012015820.62042-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 109f6e39fa ("af_unix: Allow SO_PEERCRED
to work across namespaces.") introduced the old_pid variable
in unix_listen, but it's never used.
Remove the declaration and the call to put_pid.
Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
Link: https://lore.kernel.org/r/20201011153527.18628-1-orcohen@paloaltonetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
hardware time stamps (configured via SO_TIMESTAMPING_NEW).
User space (ptp4l) first configures hardware time stamping via
SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
switch hardware time stamps back to "32 bit mode".
This problem happens on 32 bit platforms were the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).
Fixes: 887feae36a ("socket: Add SO_TIMESTAMP[NS]_NEW")
Fixes: 783da70e83 ("net: add sock_enable_timestamps")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>