Commit Graph

665864 Commits

Author SHA1 Message Date
Vivien Didelot
567aa59a8b net: dsa: mv88e6xxx: simplify VTU entry getter
Make the code which fetches or initializes a new VTU entry more concise.
This allows us the get rid of the old underscore prefix naming.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:13 -04:00
Vivien Didelot
bf7d71c045 net: dsa: mv88e6xxx: make VTU helpers static
Now that we have chip operations for VTU accesses, mark all helpers from
global1_vtu.c as static. Only the various implementations of the
GetNext, LoadPurge and Flush operations need to be exposed.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:12 -04:00
Vivien Didelot
0ad5daf6ba net: dsa: mv88e6xxx: add VTU Load/Purge operation
Add a new vtu_loadpurge operation to the chip info structure to differ
the various implementations of the VTU accesses.

Now that the STU handling is abstracted behind VTU operations, kill the
obsolete MV88E6XXX_FLAG_STU flag.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:12 -04:00
Vivien Didelot
f1394b78a6 net: dsa: mv88e6xxx: add VTU GetNext operation
Add a new vtu_getnext operation to the chip info structure to differ the
various implementations of the VTU accesses.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:12 -04:00
Vivien Didelot
021e64ff76 net: dsa: mv88e6xxx: load STU entry with VTU entry
Now that the code writes both VTU and STU data when loading a VTU entry,
load the corresponding STU entry at the same time.

This allows us to get rid of the STU management in the
_mv88e6xxx_vtu_new helper and thus remove the separate implementations
of STU Load/Purge and STU GetNext, as well as the unused family checks.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:11 -04:00
Vivien Didelot
ef6fcea37f net: dsa: mv88e6xxx: get STU entry on VTU GetNext
Now that the code reads both VTU and STU data on VTU GetNext operation,
fetch the STU entry data of a VTU entry at the same time.

The STU data bits are masked with the VTU data bits and they are now all
read at the same time a VTU GetNext operation is issued.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:11 -04:00
Vivien Didelot
66a8e1f933 net: dsa: mv88e6xxx: move STU GetNext operation
Extract the generic portion of code to issue an STU GetNext operation,
which will be used in other implementations.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:11 -04:00
Vivien Didelot
c499a64f34 net: dsa: mv88e6xxx: move VTU Data accessors
The code to access the VTU Data registers currently only supports the
88E6185 family and alike: 2-bit membership adjacent to 2-bit port state.

Even though the 88E6352 family introduced an indirect table to program
the VLAN Spanning Tree states, the usage of the VTU Data registers
remains the same regardless the VTU or STU operation.

Now that the mv88e6xxx_vtu_entry structure contains both port membership
and states data, factorize the code to access them in global1_vtu.c.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:11 -04:00
Vivien Didelot
f169e5ee5f net: dsa: mv88e6xxx: move generic VTU GetNext
Even though every switch model has a different way to access the VTU
Data bits, the base implementation of the VTU GetNext operation remains
the same: wait, write the first VID to iterate from, start the
operation, and read the next VID.

Move this generic implementation into global1_vtu.c and abstract the
handling of the start VID (similarly to the ATU GetNext implementation),
before introducing a new chip operation for specific chips.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:10 -04:00
Vivien Didelot
3afb4bde6f net: dsa: mv88e6xxx: move VTU VID accessors
Add helpers to access the VTU VID register in the global1_vtu.c file.

At the same time, move mv88e6xxx_g1_vtu_vid_write at the beginning of
_mv88e6xxx_vtu_loadpurge, which adds no functional changes but makes
future patches simpler.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:10 -04:00
Vivien Didelot
d2ca1ea18d net: dsa: mv88e6xxx: move VTU SID accessors
Add helpers to access the VTU SID register in the global1_vtu.c file.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:10 -04:00
Vivien Didelot
8ee51f6b4f net: dsa: mv88e6xxx: move VTU FID accessors
Add helpers to access the VTU FID register in the global1_vtu.c file.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:10 -04:00
Vivien Didelot
b486d7c95c net: dsa: mv88e6xxx: move VTU flush
Move the VTU flush operation to global1_vtu.c and call it from a
mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:09 -04:00
Vivien Didelot
332aa5ccc8 net: dsa: mv88e6xxx: move VTU Operation accessors
Move the helper functions to access the Global 1 VTU Operation register
to a new global1_vtu.c file, and get rid of the old underscore prefix
naming convention. This file will be extended will all VTU/STU related
code.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:09 -04:00
Vivien Didelot
bd00e053ae net: dsa: mv88e6xxx: split VTU entry data member
VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.

Some chips such as 88E6185 use different Data registers offsets for
ports state and membership, and program them in a single operation.

Other chips such as 88E6352 use the same register layout but program
them in distinct operations (an indirect table is used for 802.1s.)

Newer chips such as 88E6390 use the same offsets for both state and
membership in distinct operations, thus require multiple data accesses.

To correctly abstract this, split the "data" structure member of
mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
VTU support for newer chips.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:09 -04:00
Vivien Didelot
3cf3c8469f net: dsa: mv88e6xxx: add max VID to info
Some chips don't have a VLAN Table Unit, most of them do have a 4K
table, some others as the 88E6390 family has a 13th bit for the VID.

Add a new max_vid member to the info structure, used to check the
presence of a VTU as well as the value used to iterate from in VTU
GetNext operations.

This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 15:03:09 -04:00
Ilan Tayari
152afb9b45 xfrm: Indicate xfrm_state offload errors
Current code silently ignores driver errors when configuring
IPSec offload xfrm_state, and falls back to host-based crypto.

Fail the xfrm_state creation if the driver has an error, because
the NIC offloading was explicitly requested by the user program.

This will communicate back to the user that there was an error.

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 14:59:39 -04:00
Ilan Tayari
67d349ed60 net/esp4: Fix invalid esph pointer crash
Both esp_output and esp_xmit take a pointer to the ESP header
and place it in esp_info struct prior to calling esp_output_head.

Inside esp_output_head, the call to esp_output_udp_encap
makes sure to update the pointer if it gets invalid.
However, if esp_output_head itself calls skb_cow_data, the
pointer is not updated and stays invalid, causing a crash
after esp_output_head returns.

Update the pointer if it becomes invalid in esp_output_head

Fixes: fca11ebde3 ("esp4: Reorganize esp_output")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 14:58:50 -04:00
Craig Gallek
89a23c8b52 ip6_tunnel: Fix missing tunnel encapsulation limit option
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4).  An MTU adjustment is done to account for
these options as well.  However, the options are never present in the
generated packets.

The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used.  This caused the options to no longer be included in v6
encapsulated packets.

The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.

Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 14:52:45 -04:00
David S. Miller
f9ed236c2a Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2017-04-30

Here's one last batch of Bluetooth patches in the bluetooth-next tree
targeting the 4.12 kernel.

 - Remove custom ECDH implementation and use new KPP API instead
 - Add protocol checks to hci_ldisc
 - Add module license to HCI UART Nokia H4+ driver
 - Minor fix for 32bit user space - 64 bit kernel combination

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 14:34:46 -04:00
Liam Beguin
d5066c467e switchdev: documentation: fix whitespace issues
Figure 1 is full of whitespaces; fix it

Signed-off-by: Liam Beguin <lbeguin@tycoint.com>
Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 13:52:43 -04:00
Ido Schimmel
b1e455260c mlxsw: spectrum_router: Simplify VRF enslavement
When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).

>From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or its
uppers) are enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP was sent for this netdev in the
inetaddr notification chain.

This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.

Fixes: 7179eb5acd ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:47:58 -04:00
David S. Miller
cedf90c0cc mlx5-updates-2017-04-30
Or says:
 ================
 mlx5 neigh update
 
 This series (whose code name is 'neigh update') from Hadar, enhances the
 mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
 neighbours used in offloaded flows which involved encapsulation.
 
 In order to keep track on the validity state of such neighbours, we register
 a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
 becomes valid, offload the related flows to HW (the other way around when
 neigh becomes invalid) and similarly when a neigh mac addresses changes.
 
 Since this traffic is offloaded from the host OS, the neighbour for the IP
 tunnel destination can mistakenly become STALE and deleted by the kernel
 since its 'used' value wasn't changed. To address that, we proactively
 update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
 time stamps generated by the existing driver code for HW flow counters.
 We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.
 
 Prior to the core of the series, there's a patch from Saeed that introduces an
 extendable vport representor implementation scheme. It provides a separation
 between the eswitch to the netdev related aspects of the representors.
 
 We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice
 through the long design and review cycles while we struggled to understand and
 (hopefully correctly) implement the locking around the different driver flows(..) .
 
 - Or.
 =================
 
 Misc Updates:
 
 From Tariq:
 Some small performance and trivial code optimization for mlx5 netdev driver
 - Optimize poll ICOSQ completion queue
 - Use prefetchw when a write is to follow
 - Use u8 as ownership type in mlx5e_get_cqe()
 
 From Eran:
 - Disable LRO by default on specific setups
 
 From Eli:
 - Small cleanup for E-Switch to avoid redundant allocation
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZBeGlAAoJEEg/ir3gV/o+AuMH/2OfS+diXrTMU90tQGfucurg
 cPuv2qSGzSzcyFOjMUKL/hgj+y8HcUNZVSqts/LPhUj5CbR8On2jo/3XZb405SMX
 1eAffpD5JxDGm4xQCADwkDMRVOuWNbXlTceRMbD+8vDtboTDEZwbaXX20En6MIEW
 qcQPX0wv7/u8VpxGlhch2RRvl7+zBhhGCLNpwzkE5K+RggIPguE+F2hLpm+SH9PD
 AOHSuwGWnE40In8dZIPqsEnUsmsC9LmUQ17ip8S8dEtGDHMvwDelsKGZcRGMuwoz
 wSxpVTWn5JaU0EwN0t+rYtERFH18SqboG3OFwqb27qtRns6ALXIx3Dj1bb9sVBo=
 =3qOg
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2017-04-30

Or says:
================
mlx5 neigh update

This series (whose code name is 'neigh update') from Hadar, enhances the
mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
neighbours used in offloaded flows which involved encapsulation.

In order to keep track on the validity state of such neighbours, we register
a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
becomes valid, offload the related flows to HW (the other way around when
neigh becomes invalid) and similarly when a neigh mac addresses changes.

Since this traffic is offloaded from the host OS, the neighbour for the IP
tunnel destination can mistakenly become STALE and deleted by the kernel
since its 'used' value wasn't changed. To address that, we proactively
update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
time stamps generated by the existing driver code for HW flow counters.
We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.

Prior to the core of the series, there's a patch from Saeed that introduces an
extendable vport representor implementation scheme. It provides a separation
between the eswitch to the netdev related aspects of the representors.

We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice
through the long design and review cycles while we struggled to understand and
(hopefully correctly) implement the locking around the different driver flows(..) .

- Or.
=================

Misc Updates:

From Tariq:
Some small performance and trivial code optimization for mlx5 netdev driver
- Optimize poll ICOSQ completion queue
- Use prefetchw when a write is to follow
- Use u8 as ownership type in mlx5e_get_cqe()

From Eran:
- Disable LRO by default on specific setups

From Eli:
- Small cleanup for E-Switch to avoid redundant allocation

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:47:10 -04:00
Mintz, Yuval
07ff2ed03b qed: Prevent warning without CONFIG_RFS_ACCEL
After removing the PTP related initialization from slowpath start,
the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
Otherwise, it leads to a warning due to it being unused.

Fixes: d179bd1699 ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:51 -04:00
David S. Miller
a6e8ab8e72 Merge branch 'qed-RoCE-fixes'
Yuval Mintz says:

====================
qed: RoCE related pseudo-fixes

This series contains multiple small corrections to the RoCE logic
in qed plus some debug information and inter-module parameter
meant to prevent issues further along.

 - #1, #6 Share information with protocol driver
   [either new or filling missing bits in existing API].
 - #2, #3 correct error flows in qed.
 - #4 add debug related information.
 - #5 fixes a minor issue in the HW configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:16 -04:00
Ram Amrani
20b1bd96e9 qed: output the DPM status and WID count
Output to the RDMA driver whether DPM mode is enabled or disabled in
the HW and if so what is the number of WIDs it supports

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:15 -04:00
Ram Amrani
107392b75f qed: align DPI configuration to HW requirements
When calculating doorbell BAR partitioning round up the number of
CPUs to the nearest power of 2 so the size of the DPI (per user
section) configured in the hardware will be stored properly and
not truncated.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:15 -04:00
Ram Amrani
e015d58b44 qed: verify RoCE resource bitmaps are released
Add mechanism to verify RoCE resources are released prior to freeing the
bitmaps. If this is not the case, print what resources were not released.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:14 -04:00
Ram Amrani
105361943d qed: add error handling flow to TID deregistratin posting failure
If the posting of the ramrod for the purpose of TID deregistration
fails, abort the deregistration operation without using the FW's
return code.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:14 -04:00
Ram Amrani
ba0154e964 qed: remove unused SQ error state
The internal RoCE SQE QP state isn't being used. Instead we mark the
QP as in regular error state.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:13 -04:00
Ram Amrani
793ea8a9c7 qed: configure the RoCE max message size
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:42:13 -04:00
Yonghong Song
332270fdc8 bpf: enhance verifier to understand stack pointer arithmetic
llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.

Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:40:23 -04:00
Karim Eshapa
2faf265753 benet: Use time_before_eq for time comparison
Use time_before_eq for time comparison more safe and dealing
with timer wrapping to be future-proof.

Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:12:46 -04:00
Benjamin LaHaise
1a7fca63cd flower: check unused bits in MPLS fields
Since several of the the netlink attributes used to configure the flower
classifier's MPLS TC, BOS and Label fields have additional bits which are
unused, check those bits to ensure that they are actually 0 as suggested
by Jamal.

Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
Cc: David Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:12:21 -04:00
David S. Miller
a01aa920b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:

1) Check for ct->status bit instead of using nfct_nat() from IPVS and
   Netfilter codebase, patch from Florian Westphal.

2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

4) Introduce nft_is_base_chain() helper function.

5) Enforce expectation limit from userspace conntrack helper,
   from Gao Feng.

6) Add nf_ct_remove_expect() helper function, from Gao Feng.

7) NAT mangle helper function return boolean, from Gao Feng.

8) ctnetlink_alloc_expect() should only work for conntrack with
   helpers, from Gao Feng.

9) Add nfnl_msg_type() helper function to nfnetlink to build the
   netlink message type.

10) Get rid of unnecessary cast on void, from simran singhal.

11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

12) Use list_prev_entry() from nf_tables, from simran signhal.

13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks are already
    guaranteed to run under RCU read side.

17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.

20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.

23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

24) Add event mask support to nft_ct, also from Florian.

25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.

27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

29) Shrink size of nf_conntrack_ecache structure, from Florian.

30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

31) Register SYNPROXY hooks on demand, from Florian Westphal.

32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime time calculation of this. From Florian.

36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.

42) Inconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:47:53 -04:00
David S. Miller
edd7f4efa8 Merge branch 'bpf-samples-skb_mode-bug-fixes'
Jesper Dangaard Brouer says:

====================
samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching

Two small bugfixes for:
 commit 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:42:38 -04:00
Jesper Dangaard Brouer
f76254a845 samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel
The xdp_tx_iptunnel program can be terminated in two ways, after
N-seconds or via Ctrl-C SIGINT.  The SIGINT code path does not
handle detatching the correct XDP program, in-case the program
was attached with XDP_FLAGS_SKB_MODE.

Fix this by storing the XDP flags as a global variable, which is
available for the SIGINT handler function.

Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:42:37 -04:00
Jesper Dangaard Brouer
6387d0111c samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
samples/bpf/ should use the correct type.

Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:42:37 -04:00
David S. Miller
d74a32acd5 Merge branch 'xdp-netlink-ext-ack'
Jakub Kicinski says:

====================
xdp: use netlink extended ACK reporting

This series is an attempt to make XDP more user friendly by
enabling exploiting the recently added netlink extended ACK
reporting to carry messages to user space.

David Ahern's iproute2 ext ack patches for ip link are sufficient
to show the errors like this:

Error: nfp: MTU too large w/ XDP enabled

Where the message is coming directly from the driver.  There could
still be a bit of a leap for a complete novice from the message
above to the right settings, but it's a big improvement over the
standard "Invalid argument" message.

v1/non-rfc:
 - add a separate macro in patch 1;
 - add KBUILD_MODNAME as part of the message (Daniel);
 - don't print the error to logs in patch 1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:35:48 -04:00
Jakub Kicinski
9861ce039c virtio_net: make use of extended ack message reporting
Try to carry error messages to the user via the netlink extended
ack message attribute.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:35:48 -04:00
Jakub Kicinski
d957c0f711 nfp: make use of extended ack message reporting
Try to carry error messages to the user via the netlink extended
ack message attribute.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:35:47 -04:00
Jakub Kicinski
ddf9f97076 xdp: propagate extended ack to XDP setup
Drivers usually have a number of restrictions for running XDP
- most common being buffer sizes, LRO and number of rings.
Even though some drivers try to be helpful and print error
messages experience shows that users don't often consult
kernel logs on netlink errors.  Try to use the new extended
ack mechanism to carry the message back to user space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:35:47 -04:00
Jakub Kicinski
45d9b378e8 netlink: add NULL-friendly helper for setting extended ACK message
As we propagate extended ack reporting throughout various paths in
the kernel it may be that the same function is called with the
extended ack parameter passed as NULL.  One place where that happens
is in drivers which have a centralized reconfiguration function
called both from ndos and from ethtool_ops.  Add a new helper for
setting the error message in such conditions.

Existing helper is left as is to encourage propagating the ext act
fully wherever possible.  It also makes it clear in the code which
messages may be lost due to ext ack being NULL.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:35:47 -04:00
Liping Zhang
8eeef23504 netfilter: nf_ct_ext: invoke destroy even when ext is not attached
For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
then remove it from the nat_bysource_table via nat_extend->destroy.

But now, the nat extension is attached on demand, so if the nat extension
is not attached, we will not be notified when the ct is destroyed, i.e.
we may fail to remove ct from the nat_bysource_table.

So just keep it simple, even if the extension is not attached, we will
still invoke the related ext->destroy. And this will also preserve the
flexibility for the future extension.

Fixes: 9a08ecfe74 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:48:49 +02:00
Pablo Neira Ayuso
d1908ca8dc Merge tag 'ipvs3-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:

====================
Third Round of IPVS Updates for v4.12

please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.

* Remove unused function
* Correct comparison of unsigned value
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:46:50 +02:00
Florian Westphal
0e72f55f35 netfilter: snmp: avoid stack size warning
net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
of 1160 bytes is larger than 1024 bytes

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:43:58 +02:00
Florian Westphal
039b40ee58 netfilter: nf_queue: only call synchronize_net twice if nf_queue is active
nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).

This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.

For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:19:12 +02:00
Florian Westphal
c83fa19603 netfilter: nf_log: don't call synchronize_rcu in nf_log_unset
nf_log_unregister() (which is what gets called in the logger backends
module exit paths) does a (required, module is removed) synchronize_rcu().

But nf_log_unset() is only called from pernet exit handlers. It doesn't
free any memory so there appears to be no need to call synchronize_rcu.

v2: Liping Zhang points out that nf_log_unregister() needs to be called
after pernet unregister, else rmmod would become unsafe.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:19:07 +02:00
Florian Westphal
933bd83ed6 netfilter: batch synchronize_net calls during hook unregister
synchronize_net is expensive and slows down netns cleanup a lot.

We have two APIs to unregister a hook:
nf_unregister_net_hook (which calls synchronize_net())
and
nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)

Make nf_unregister_net_hook a wapper around new helper
__nf_unregister_net_hook, which unlinks the hook but does not free it.

Then, we can call that helper in nf_unregister_net_hooks and then
call synchronize_net() only once.

Andrey Konovalov reports this change improves syzkaller fuzzing speed at
least twice.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01 11:18:54 +02:00
Abhishek Shah
733336262b net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay
This patch allows users to enable/disable internal TX and/or RX
clock delay for BCM5481x series PHYs so as to satisfy RGMII timing
specifications.

On a particular platform, whether TX and/or RX clock delay is required
depends on how PHY connected to the MAC IP. This requirement can be
specified through "phy-mode" property in the platform device tree.

Signed-off-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30 22:56:19 -04:00