Clang does not recognize that calls to error() terminate execution
and complains about uninitialized variable use that happens after calls
to error(). This noop patchset fixes this.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtio-net devices negotiate LRO support with the host.
Display the initially negotiated state with ethtool -k.
Also allow configuring it with ethtool -K, reusing the existing
virtnet_set_guest_offloads helper that configures LRO for XDP.
This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
Virtio-net negotiates TSO4 and TSO6 separately, but ethtool does not
distinguish between the two. Display LRO as on only if any offload
is active.
RTNL is held while calling virtnet_set_features, same as on the path
from virtnet_xdp_set.
Changes v1 -> v2
- allow ethtool config (-K) only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
- show LRO as enabled if any LRO variant is enabled
- do not allow configuration while XDP is active
- differentiate current features from the capable set, to restore
on XDP down only those features that were active on XDP up
- move test out of VIRTIO_NET_F_CSUM/TSO branch, which is tx only
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for destination MAC in ipset, from Stefano Brivio.
2) Disallow all-zeroes MAC address in ipset, also from Stefano.
3) Add IPSET_CMD_GET_BYNAME and IPSET_CMD_GET_BYINDEX commands,
introduce protocol version number 7, from Jozsef Kadlecsik.
A follow up patch to fix ip_set_byindex() is also included
in this batch.
4) Honor CTA_MARK_MASK from ctnetlink, from Andreas Jaggi.
5) Statify nf_flow_table_iterate(), from Taehee Yoo.
6) Use nf_flow_table_iterate() to simplify garbage collection in
nf_flow_table logic, also from Taehee Yoo.
7) Don't use _bh variants of call_rcu(), rcu_barrier() and
synchronize_rcu_bh() in Netfilter, from Paul E. McKenney.
8) Remove NFC_* cache definition from the old caching
infrastructure.
9) Remove layer 4 port rover in NAT helpers, use random port
instead, from Florian Westphal.
10) Use strscpy() in ipset, from Qian Cai.
11) Remove NF_NAT_RANGE_PROTO_RANDOM_FULLY branch now that
random port is allocated by default, from Xiaozhou Liu.
12) Ignore NF_NAT_RANGE_PROTO_RANDOM too, from Florian Westphal.
13) Limit port allocation selection routine in NAT to avoid
softlockup splats when most ports are in use, from Florian.
14) Remove unused parameters in nf_ct_l4proto_unregister_sysctl()
from Yafang Shao.
15) Direct call to nf_nat_l4proto_unique_tuple() instead of
indirection, from Florian Westphal.
16) Several patches to remove all layer 4 NAT indirections,
remove nf_nat_l4proto struct, from Florian Westphal.
17) Fix RTP/RTCP source port translation when SNAT is in place,
from Alin Nastac.
18) Selective rule dump per chain, from Phil Sutter.
19) Revisit CLUSTERIP target, this includes a deadlock fix from
netns path, sleep in atomic, remove bogus WARN_ON_ONCE()
and disallow mismatching IP address and MAC address.
Patchset from Taehee Yoo.
20) Update UDP timeout to stream after 2 seconds, from Florian.
21) Shrink UDP established timeout to 120 seconds like TCP timewait.
22) Sysctl knobs to set GRE timeouts, from Yafang Shao.
23) Move seq_print_acct() to conntrack core file, from Florian.
24) Add enum for conntrack sysctl knobs, also from Florian.
25) Place nf_conntrack_acct, nf_conntrack_helper, nf_conntrack_events
and nf_conntrack_timestamp knobs in the core, from Florian Westphal.
As a side effect, shrink netns_ct structure by removing obsolete
sysctl anchors, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-21
The following pull-request contains BPF updates for your *net-next* tree.
There is a merge conflict in test_verifier.c. Result looks as follows:
[...]
},
{
"calls: cross frame pruning",
.insns = {
[...]
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
{
"jset: functional",
.insns = {
[...]
{
"jset: unknown const compare not taken",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32),
BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "!read_ok",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
[...]
{
"jset: range",
.insns = {
[...]
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.result_unpriv = ACCEPT,
.result = ACCEPT,
},
The main changes are:
1) Various BTF related improvements in order to get line info
working. Meaning, verifier will now annotate the corresponding
BPF C code to the error log, from Martin and Yonghong.
2) Implement support for raw BPF tracepoints in modules, from Matt.
3) Add several improvements to verifier state logic, namely speeding
up stacksafe check, optimizations for stack state equivalence
test and safety checks for liveness analysis, from Alexei.
4) Teach verifier to make use of BPF_JSET instruction, add several
test cases to kselftests and remove nfp specific JSET optimization
now that verifier has awareness, from Jakub.
5) Improve BPF verifier's slot_type marking logic in order to
allow more stack slot sharing, from Jiong.
6) Add sk_msg->size member for context access and add set of fixes
and improvements to make sock_map with kTLS usable with openssl
based applications, from John.
7) Several cleanups and documentation updates in bpftool as well as
auto-mount of tracefs for "bpftool prog tracelog" command,
from Quentin.
8) Include sub-program tags from now on in bpf_prog_info in order to
have a reliable way for user space to get all tags of the program
e.g. needed for kallsyms correlation, from Song.
9) Add BTF annotations for cgroup_local_storage BPF maps and
implement bpf fs pretty print support, from Roman.
10) Fix bpftool in order to allow for cross-compilation, from Ivan.
11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
to be compatible with libbfd and allow for Debian packaging,
from Jakub.
12) Remove an obsolete prog->aux sanitation in dump and get rid of
version check for prog load, from Daniel.
13) Fix a memory leak in libbpf's line info handling, from Prashant.
14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
does not get unaligned, from Jesper.
15) Fix test_progs kselftest to work with older compilers which are less
smart in optimizing (and thus throwing build error), from Stanislav.
16) Cleanup and simplify AF_XDP socket teardown, from Björn.
17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
to netns_id argument, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
expand txtimestamp selftest
Convert the existing txtimestamp test to run as part of kselftest
and return a pass/fail.
Also expand the variations of timestamping tested, including packet
sockets, ipv6 raw and dgram and passing options using cmsg.
These are enough changes to split across a few patches, even if all
changes are only this one test.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Run the transmit timestamp tests as part of kselftests.
Add a txtimestamp.sh test script that runs most variants:
ipv4/ipv6, tcp/udp/raw/raw_ipproto/pf_packet, data/nodata,
setsockopt/cmsg. The script runs tests with netem delays.
Refine txtimestamp.c to validate results. Take expected
netem delays as input and compare against real timestamps.
To run without dependencies, add a listener socket to be
able to connect in the case of TCP.
Add the timestamping directory to the kselftests Makefile.
Build all the binaries. Only run verified txtimestamp.sh.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expand the transmit timestamp regression test with support for
missing protocols: ipv6 datagram and raw and pf_packet.
Also refine resolve_hostname to independently request AF_INET or
AF_INET6 addresses. Else, ipv4 addresses may be returned as AF_INET6.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3dd17e63f5 ("sock: accept SO_TIMESTAMPING flags in socket
cmsg") added support for passing tx timestamping options per-call
in sendmsg.
Expand the txtimestamp test with support for this feature.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract "Protocol" field decompression code from transport protocols to
PPP generic layer, where it actually belongs. As a consequence, this
patch fixes incorrect place of PFC decompression in L2TP driver (when
it's not PPPOX_BOUND) and also enables this decompression for other
protocols, like PPPoE.
Protocol field decompression also happens in PPP Multilink Protocol
code and in PPP compression protocols implementations (bsd, deflate,
mppe). It looks like there is no easy way to get rid of that, so it was
decided to leave it as is, but provide those cases with appropriate
comments instead.
Changes in v2:
- Fix the order of checking skb data room and proto decompression
- Remove "inline" keyword from ppp_decompress_proto()
- Don't split line before function name
- Prefix ppp_decompress_proto() function with "__"
- Add ppp_decompress_proto() function with skb data room checks
- Add description for introduced functions
- Fix comments (as per review on mailing list)
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last set of patches for 4.21. mt76 is still in very active development
and having some refactoring as well as new features. But also other
drivers got few new features and fixes.
Major changes:
ath10k
* add amsdu support for QCA6174 monitor mode
* report tx rate using the new ieee80211_tx_rate_update() API
* wcn3990 support is not experimental anymore
iwlwifi
* support for FW version 43 for 9000 and 22000 series
brcmfmac
* add support for CYW43012 SDIO chipset
* add the raw 4354 PCIe device ID for unprogrammed Cypress boards
mwifiex
* add NL80211_STA_INFO_RX_BITRATE support
mt76
* use the same firmware for mt76x2e and mt76x2u
* mt76x0e survey support
* more unification between mt76x2 and mt76x0
* mt76x0e AP mode support
* mt76x0e DFS support
* rework and fix tx status handling for mt76x0 and mt76x2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJcG9TEAAoJEG4XJFUm622byW8H/1vMVJhXwgIZbHeoUKNa47Yp
Z7Jv5vW8IGXu+lp7DyoedDCbq4+lskNSlDV1DmysNChLgDnApU/3oCd/jH8EiGPV
JAFUHb85HuVLTTpPpNHtnYz3IzL7r098TNVxOU0VD+xILM0Mf0aCeXztgmFWpGaY
/rfHkId8oKUezIjdu6Dc96mqITrT6WRNtnOMfjr6dZPjClRTS44Hyz3Ga3rXABBL
/n8BCkl0GpKGrL3mBy2CCR5mVY8zfxMB4Aj2zx7bccZ8i2i2QjrGlXCHyB6ImNrR
lv4L1fUVXZWVdeOe8EbpftY7zEsPrX+XNm6h1kckdB7UyuBROpQLsVb+yxlLh9g=
=mhAw
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2018-12-20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.21
Last set of patches for 4.21. mt76 is still in very active development
and having some refactoring as well as new features. But also other
drivers got few new features and fixes.
Major changes:
ath10k
* add amsdu support for QCA6174 monitor mode
* report tx rate using the new ieee80211_tx_rate_update() API
* wcn3990 support is not experimental anymore
iwlwifi
* support for FW version 43 for 9000 and 22000 series
brcmfmac
* add support for CYW43012 SDIO chipset
* add the raw 4354 PCIe device ID for unprogrammed Cypress boards
mwifiex
* add NL80211_STA_INFO_RX_BITRATE support
mt76
* use the same firmware for mt76x2e and mt76x2u
* mt76x0e survey support
* more unification between mt76x2 and mt76x0
* mt76x0e AP mode support
* mt76x0e DFS support
* rework and fix tx status handling for mt76x0 and mt76x2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't need extern prefix before function prototypes.
Checkpatch has complained about this for a couple of years.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
net: ipv4: Prevent user triggerable warning
Patch #1 prevents a user triaggerable warning in the flow dissector by
setting 'skb->dev' in skbs used for IPv4 output route get requests.
Patch #2 adds a test case that triggers the warning without the first
patch.
I have audited all the RTM_GETROUTE handlers and could not find any
other callpath where an skb is passed to the flow dissector with both
'skb->dev' and 'skb->sk' cleared.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Without previous patch a warning would be generated upon multipath route
get when FIB multipath hash policy is to use a 5-tuple for multipath
hash calculation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When user requests to resolve an output route, the kernel synthesizes
an skb where the relevant parameters (e.g., source address) are set. The
skb is then passed to ip_route_output_key_hash_rcu() which might call
into the flow dissector in case a multipath route was hit and a nexthop
needs to be selected based on the multipath hash.
Since both 'skb->dev' and 'skb->sk' are not set, a warning is triggered
in the flow dissector [1]. The warning is there to prevent codepaths
from silently falling back to the standard flow dissector instead of the
BPF one.
Therefore, instead of removing the warning, set 'skb->dev' to the
loopback device, as its not used for anything but resolving the correct
namespace.
[1]
WARNING: CPU: 1 PID: 24819 at net/core/flow_dissector.c:764 __skb_flow_dissect+0x314/0x16b0
...
RSP: 0018:ffffa0df41fdf650 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8bcded232000 RCX: 0000000000000000
RDX: ffffa0df41fdf7e0 RSI: ffffffff98e415a0 RDI: ffff8bcded232000
RBP: ffffa0df41fdf760 R08: 0000000000000000 R09: 0000000000000000
R10: ffffa0df41fdf7e8 R11: ffff8bcdf27a3000 R12: ffffffff98e415a0
R13: ffffa0df41fdf7e0 R14: ffffffff98dd2980 R15: ffffa0df41fdf7e0
FS: 00007f46f6897680(0000) GS:ffff8bcdf7a80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055933e95f9a0 CR3: 000000021e636000 CR4: 00000000001006e0
Call Trace:
fib_multipath_hash+0x28c/0x2d0
? fib_multipath_hash+0x28c/0x2d0
fib_select_path+0x241/0x32f
? __fib_lookup+0x6a/0xb0
ip_route_output_key_hash_rcu+0x650/0xa30
? __alloc_skb+0x9b/0x1d0
inet_rtm_getroute+0x3f7/0xb80
? __alloc_pages_nodemask+0x11c/0x2c0
rtnetlink_rcv_msg+0x1d9/0x2f0
? rtnl_calcit.isra.24+0x120/0x120
netlink_rcv_skb+0x54/0x130
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x20a/0x2c0
netlink_sendmsg+0x2d1/0x3d0
sock_sendmsg+0x39/0x50
___sys_sendmsg+0x2a0/0x2f0
? filemap_map_pages+0x16b/0x360
? __handle_mm_fault+0x108e/0x13d0
__sys_sendmsg+0x63/0xa0
? __sys_sendmsg+0x63/0xa0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x5a/0x120
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: d0e13a1488 ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing indirect access in the Ocelot chip, a command is setup,
issued and then we need to poll until the result is ready. The polling
timeout is specified in milliseconds in the datasheet and not in
register access attempts.
It is not a bug on the currently supported platform, but we observed
that the code does not work properly on other platforms that we want to
support as the timing requirements there are different.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the stray semicolon means that the final term in the addition
is being missed. Fix this by removing it. Cleans up clang warning:
net/core/neighbour.c:2821:9: warning: expression result unused [-Wunused-value]
Fixes: 82cbb5c631 ("neighbour: register rtnl doit handler")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-By: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath will be taking over as maintainer from now.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.
However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fadump fails to register when there are holes in reserved memory area.
This can happen if user has hot-removed a memory that falls in the
fadump reserved memory area. Throw a meaningful error message to the
user in such case.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.
In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.
In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".
We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump.
Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it. With this fadump will still be able
to capture all of the kernel memory and most of the user space memory
except the user pages that were present in CMA region.
Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
total used free shared buff/cache available
Mem: 7557 193 6822 12 541 6725
Swap: 4095 0 4095
With this patch:
[root@zzxx-yy10 ~]# free -m
total used free shared buff/cache available
Mem: 8133 194 7464 12 475 7338
Swap: 4095 0 4095
Changes made here are completely transparent to how fadump has
traditionally worked.
Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.
TODO:
- Handle case where CMA reservation spans nodes.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
opal_power_control_init() depends on opal message notifier to be
initialized, which is done in opal_init()->opal_message_init(). But both
these initialization are called through machine initcalls and it all
depends on in which order they being called. So far these are called in
correct order (may be we got lucky) and never saw any issue. But it is
clearer to control initialization order explicitly by moving
opal_power_control_init() into opal_init().
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected functions.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Omit an extra message for a memory allocation failure in these
functions.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A single character (line break) should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some data were printed into a sequence by four separate function calls.
Print the same data by two single function calls instead.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_EARLY_DEBUG_CPM requires IMMR area TLB to be pinned
otherwise it doesn't survive MMU_init, and the boot fails.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_PCI_MSI was made mandatory by commit a311e738b6
("powerpc/powernv: Make PCI non-optional") so the #ifdef
checks around CONFIG_PCI_MSI here can be removed entirely.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix a spelling mistake in a register description.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 14c63f17b1 ("perf: Drop sample rate when sampling is too
slow") introduced a way to throttle PMU interrupts if we're spending
too much time just processing those. Wire up powerpc PMI handler to
use this infrastructure.
We have throttling of the *rate* of interrupts, but this adds
throttling based on the *time taken* to process the interrupts.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It was reported that IPsec would crash when it encounters an IPv6
reassembled packet because skb->sk is non-zero and not a valid
pointer.
This is because skb->sk is now a union with ip_defrag_offset.
This patch fixes this by resetting skb->sk when exiting from
the reassembly code.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 219badfaad ("ipv6: frags: get rid of ip6frag_skb_cb/...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC table in Ocelot supports auto aging (normal) and static entries.
MAC entries that is manually configured should be static and not subject
to aging.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Allan Nielsen <allan.nielsen@microchip.com>
Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port partitioning is done by enabling UNICAST_VLAN_BOUNDARY and changing
the default port membership of 0x7f to other values such that there is
no communication between ports. In KSZ9477 the member for port 1 is
0x41; port 2, 0x42; port 3, 0x44; port 4, 0x48; port 5, 0x50; and port 7,
0x60. Port 6 is the host port.
Setting a zero value can be used to stop port from receiving.
However, when UNICAST_VLAN_BOUNDARY is disabled and the unicast addresses
are already learned in the dynamic MAC table, setting zero still allows
devices connected to those ports to communicate. This does not apply to
multicast and broadcast addresses though. To prevent these leaks and
make the function of port membership consistent UNICAST_VLAN_BOUNDARY
should never be disabled.
Note that UNICAST_VLAN_BOUNDARY is enabled by default in KSZ9477.
Fixes: b987e98e50 ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resolving the conflict wrt. the vxlan_fdb_update call
in vxlan_changelink() I made the last argument false instead
of true.
Fix this.
Signed-off-by: David S. Miller <davem@davemloft.net>
remove the obsolete sysctl anchors and move auto_assign_helper_warned
to avoid/cover a hole. Reduces size by 40 bytes on 64 bit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This series adds some misc updates and the support for tunnels over VLAN
tc offloads.
From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock
From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128
From Eli Britstein and Or Gerlitz, patches #4-10
4) Adds the capability to support tunnels over VLAN device.
Patch 4 avoids crash for TC flow with egress upper devices
Patch 5 refactors tunnel routing devs into a helper function
Patch 6 avoids crash for TC encap flows with vlan on underlay
Patches 7-8 refactor encap tunnel header preparing code.
Patch 9 adds support for building VLAN tagged ETH header.
Patch 10 adds support for tunnel routing to VLAN device.
From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcG5O8AAoJEEg/ir3gV/o+v0AH/ja1bJ6PkAlw2bCRMIuI6HK/
1vh9n8D74tIOAlvUi6QiLOgJ7CLdAgiAVFppDWzJSBUsY2XNAtRdYvLDDt9hGafO
QGysBWNVcX2aUVp+pLDCCVEYBWyyIzW416CWHx2IUgdAg9S6cvJK6P/81wd+l4Zp
8YlPstEUANP/JKZxHHjSelMcnY+Bj0JrDzuyyyaQdwmcHo5I7Ht0tNex14yFfFbl
gd7YvfPr1PtovPX2w9hMt3y3ml4mommB+jtxc0+59D+A7680hBOpAnH5pONms9rv
OKKKV8KqpxL1m32/TGbd7XSG+llLJwsSM6RtD4oUdiPA4iCiVDVdreP5vac52C4=
=0QQ5
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-12-19
This series adds some misc updates and the support for tunnels over VLAN
tc offloads.
From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock
From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128
From Eli Britstein and Or Gerlitz, patches #4-10
4) Adds the capability to support tunnels over VLAN device.
Patch 4 avoids crash for TC flow with egress upper devices
Patch 5 refactors tunnel routing devs into a helper function
Patch 6 avoids crash for TC encap flows with vlan on underlay
Patches 7-8 refactor encap tunnel header preparing code.
Patch 9 adds support for building VLAN tagged ETH header.
Patch 10 adds support for tunnel routing to VLAN device.
From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
after moving sysctl handling into single place, the init functions
can't fail anymore and some of the fini functions are empty.
Remove them and change return type to void.
This also simplifies error unwinding in conntrack module init path.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similar to previous change, this time for eache and timestamp.
Unlike helper and acct, these can be disabled at build time, so they
need ifdef guards.
Next patch will remove a few (now obsolete) functions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Needless copy&paste, just handle all in one. Next patch will handle
acct and timestamp, which have similar functions.
Intentionally leaves cruft behind, will be cleaned up in a followup
patch.
The obsolete sysctl pointers in netns_ct struct are left in place and
removed in a single change, as changes to netns trigger rebuild of
almost all files.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Its a bit hard to see what table[3] really lines up with, so add
human-readable mnemonics and use them for initialisation.
This makes it easier to see e.g. which sysctls are not exported to
unprivileged userns.
objdiff shows no changes.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds two sysctl knobs for GRE:
net.netfilter.nf_conntrack_gre_timeout = 30
net.netfilter.nf_conntrack_gre_timeout_stream = 180
Update the Documentation as well.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ido Schimmel says:
====================
mlxsw: Two usability improvements
This patchset contains two small improvements in the mlxsw driver. The
first one, in patches #1-#2, relieves the user from the need to
configure a VLAN interface and only later the corresponding VXLAN
tunnel. The issue is explained in detail in the first patch.
The second improvement is described below and allows the user to make
use of VID 1 by having the driver use the reserved 4095 VID for untagged
traffic.
VLAN entries on a given port can be associated with either a bridge or a
router. For example, if swp1.10 is assigned an IP address and swp1.20 is
enslaved to a VLAN-unaware bridge, then both {Port 1, VID 10} and {Port
1, VID 20} would be associated with a filtering identifier (FID) of the
correct type.
In case swp1 itself is assigned an IP address or enslaved to a
VLAN-unaware bridge, then a FID would be associated with {Port 1, VID
1}. Using VID 1 for this purpose means that VLAN devices with VID 1
cannot be created over mlxsw ports, as this VID is (ab)used as the
default VLAN.
Instead of using VID 1 for this purpose, we can use VID 4095 which is
reserved for internal use and cannot be configured by either the 8021q
or the bridge driver.
Patches #3-#7 perform small and non-functional changes that finally
allow us to switch to VID 4095 as the default VID in patch #8.
Patch #9 removes the limitation about creation of VLAN devices with VID
1 over mlxsw ports.
Patches #10-#11 add test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patches made it possible to setup VLAN devices with VID 1 over
mlxsw ports. Verify this functionality actually works by conducting a
simple router test over VID 1.
Adding this test as a generic test since it can be run using veth pairs
and it can also be useful for other physical devices where VID 1 was
considered reserved (knowingly or not).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patches made it possible to create VLAN devices with VID 1 over
mlxsw ports. Adjust the test to verify such an operation succeeds.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VID 1 is not reserved anymore, so remove the check that prevented the
creation of VLAN devices with this VID over mlxsw ports.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to abuse VID 1 anymore and we can instead use VID 4095
as the default VLAN, which will be configured on the port throughout its
lifetime.
The OVS join / leave functions are changed to enable VIDs 1-4094
(inclusive) instead of 2-4095. This because VID 4095 is now the default
VLAN instead of 1.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN entries on a port can be associated with either a bridge VLAN or a
router port. Before the VLAN entry is destroyed these associations need
to be cleaned up.
Currently, this is always invoked from the function which destroys the
VLAN entry, but next patch is going to skip the destruction of the
default entry when a port in unlinked from a LAG.
The above does not mean that the associations should not be cleaned up,
so add a helper that will be invoked from both call sites.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches will need to access the default port VLAN. Since this
VLAN will exist throughout the lifetime of the port, simply store it in
the port's struct.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>