This config option makes only couple of lines optional.
Two small helpers and an int in couple of cls structs.
Remove the config option and always compile this in.
This saves the user from unexpected surprises when he adds
a filter with ingress device match which is silently ignored
in case the config option is not set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A port may trigger operations on its dedicated CPU port, so using
cpu_dp as const will raise warnings. Make cpu_dp non const.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we want to set a EDT time for the skb we want to send
via ip_send_unicast_reply(), we have to pass a new parameter
and initialize ipc.sockc.transmit_time with it.
This fixes the EDT time for ACK/RST packets sent on behalf of
a TIME_WAIT socket.
Fixes: a842fe1425 ("tcp: add optional per socket transmit delay")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* HE (802.11ax) work continues
* WPA3 offloads
* work on extended key ID handling continues
* fixes to honour AP supported rates with auth/assoc frames
* nl80211 netlink policy improvements to fix some issues
with strict validation on new commands with old attrs
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl0Dq/sACgkQB8qZga/f
l8RqFg/+MBcuqvW2xTy5o5Lbw7Drx5ROgFT2ZRAO6PTeboQ43NOBiXt2dEhDbp+w
mHChImF85px3SFMBSvuf97zlScNV6+VJraDDjoZFixt/gIZ/XsdURo5i4IGmUbfj
+LY1oPm7suC5Cold+yPicHTukFpeU7cSwceslFsecqiN5unlzIxf6gY9H7OL7WGT
s0Wis0x3y2m9mMi4cvQfHkFzplcTc5SBgPLyLQtHUNx1eySEZ+AymlNVmbGrRWr9
vaCU5W9+Wz0N6lEB/UI5y6fZzj5mhkcimGck1Os7dFeC7KWjntjT9iKIkFHWehxi
QfLcK6pGjLpPpMTQtOEfl34ZGnOyO8N9GmOLaaUaBeaZItabYJwfgbdr7NxiJvta
1cyqXek+D2G7WOa0aIrWhmwswKGBa3nIBqS/ZP/SEWLEzU1Cn0NiAD5Ba016TC4C
D+1BBXIdpQDoZCgfd6KkGs2Ynf/8N3OwHW+EwjpAu3IARTQzb6tMWSvkAuAgJt1F
dBD7NqdFhWXFfxqf9NpB8bkmpyNKM4Km6eO2HKpCg/5suKqYJ1Xj9EeQin1B+QsE
Jntj69hQ6Kj2gKBPy+RnCBFbxMNuFhpc1kmUOGj9U9aAcOntV0woVOyFGsbRmFo3
MI8aVU/gjQDCcHHD5xtJGHa11uIefXq1r2H7Um3sxKYeBsqFjP4=
=j+Um
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2019-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Many changes all over:
* HE (802.11ax) work continues
* WPA3 offloads
* work on extended key ID handling continues
* fixes to honour AP supported rates with auth/assoc frames
* nl80211 netlink policy improvements to fix some issues
with strict validation on new commands with old attrs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cfg80211_remain_on_channel_expired is used to notify userspace when
the remain on channel duration expired by sending an event. There is
no such equivalent to CMD_FRAME, where if offchannel and a duration
is provided, the card will go offchannel for that duration. Currently
there is no way for userspace to tell when that duration expired
apart from setting an independent timeout. This timeout is quite
erroneous as the kernel may not immediately send out the frame
because of scheduling or work queue delays. In testing, it was found
this timeout had to be quite large to accomidate any potential delays.
A better solution is to have the kernel send an event when this
duration has expired. There is already NL80211_CMD_FRAME_WAIT_CANCEL
which can be used to cancel a NL80211_CMD_FRAME offchannel. Using this
command matches perfectly to how NL80211_CMD_CANCEL_REMAIN_ON_CHANNEL
works, where its both used to cancel and notify if the duration has
expired.
Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no rate control algorithm that *doesn't* want to call
it internally, and calling it internally will let us modify
its behaviour in the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a function that iterates over the BSS entries associated with a
given wiphy and calls a callback for each iterated BSS. This can be
used by drivers in various ways, e.g., to evaluate some property for
all the BSSs in the medium.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow the userland daemon to en/disable TWT support for an AP.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
[simplify parsing code]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Turn TWT for STA interfaces when they associate and/or receive a
beacon where the twt_responder bit has changed.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Require that each vendor command give a policy of its sub-attributes
in NL80211_ATTR_VENDOR_DATA, and then (stricly) check the contents,
including the NLA_F_NESTED flag that we couldn't check on the outer
layer because there we don't know yet.
It is possible to use VENDOR_CMD_RAW_DATA for raw data, but then no
nested data can be given (NLA_F_NESTED flag must be clear) and the
data is just passed as is to the command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This function is similar to ieee80211_get_he_sta_cap() but allows passing
the iftype. Also make ieee80211_get_he_sta_cap() use the new helper
rather than duplicating the code.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Let drivers advertise support for station-mode SAE authentication
offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD flag.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IEEE 802.11 - 2016 forbids mixing MPDUs with different keyIDs in one
A-MPDU. Drivers supporting A-MPDUs and Extended Key ID must actively
enforce that requirement due to the available two unicast keyIDs.
Allow driver to signal mac80211 that they will not check the keyID in
MPDUs when aggregating them and that they expect mac80211 to stop Tx
aggregation when rekeying a connection using Extended Key ID.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Adding delays to TCP flows is crucial for studying behavior
of TCP stacks, including congestion control modules.
Linux offers netem module, but it has unpractical constraints :
- Need root access to change qdisc
- Hard to setup on egress if combined with non trivial qdisc like FQ
- Single delay for all flows.
EDT (Earliest Departure Time) adoption in TCP stack allows us
to enable a per socket delay at a very small cost.
Networking tools can now establish thousands of flows, each of them
with a different delay, simulating real world conditions.
This requires FQ packet scheduler or a EDT-enabled NIC.
This patchs adds TCP_TX_DELAY socket option, to set a delay in
usec units.
unsigned int tx_delay = 10000; /* 10 msec */
setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));
Note that FQ packet scheduler limits might need some tweaking :
man tc-fq
PARAMETERS
limit
Hard limit on the real queue size. When this limit is
reached, new packets are dropped. If the value is lowered,
packets are dropped so that the new limit is met. Default
is 10000 packets.
flow_limit
Hard limit on the maximum number of packets queued per
flow. Default value is 100.
Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc,
so packets would be dropped if any of the previous limit is hit.
Use of a jump label makes this support runtime-free, for hosts
never using the option.
Also note that TSQ (TCP Small Queues) limits are slightly changed
with this patch : we need to account that skbs artificially delayed
wont stop us providind more skbs to feed the pipe (netem uses
skb_orphan_partial() for this purpose, but FQ can not use this trick)
Because of that, using big delays might very well trigger
old bugs in TSO auto defer logic and/or sndbuf limited detection.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload drivers keep track of TCP seq numbers to make sure
the packets are fed into the HW in order.
When packets get dropped on the way through the stack, the driver
will get out of sync and have to use fallback encryption, but unless
TCP seq number is resynced it will never match the packets correctly
(or even worse - use incorrect record sequence number after TCP seq
wraps).
Existing drivers (mlx5) feed the entire record on every out-of-order
event, allowing FW/HW to always be in sync.
This patch adds an alternative, more akin to the RX resync. When
driver sees a frame which is past its expected sequence number the
stream must have gotten out of order (if the sequence number is
smaller than expected its likely a retransmission which doesn't
require resync). Driver will ask the stack to perform TX sync
before it submits the next full record, and fall back to software
crypto until stack has performed the sync.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently only RX direction is ever resynced, however, TX may
also get out of sequence if packets get dropped on the way to
the driver. Rename the resync callback and add a direction
parameter.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload device may lose sync with the TCP stream if packets
arrive out of order. Drivers can currently request a resync at
a specific TCP sequence number. When a record is found starting
at that sequence number kernel will inform the device of the
corresponding record number.
This requires the device to constantly scan the stream for a
known pattern (constant bytes of the header) after sync is lost.
This patch adds an alternative approach which is entirely under
the control of the kernel. Kernel tracks records it had to fully
decrypt, even though TLS socket is in TLS_HW mode. If multiple
records did not have any decrypted parts - it's a pretty strong
indication that the device is out of sync.
We choose the min number of fully encrypted records to be 2,
which should hopefully be more than will get retransmitted at
a time.
After kernel decides the device is out of sync it schedules a
resync request. If the TCP socket is empty the resync gets
performed immediately. If socket is not empty we leave the
record parser to resync when next record comes.
Before resync in message parser we peek at the TCP socket and
don't attempt the sync if the socket already has some of the
next record queued.
On resync failure (encrypted data continues to flow in) we
retry with exponential backoff, up to once every 128 records
(with a 16k record thats at most once every 2M of data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
handle_device_resync() doesn't describe the function very well.
The function checks if resync should be issued upon parsing of
a new record.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload code casts record number to a u64. The buffer
should be aligned to 8 bytes, but its actually a __be64, and
the rest of the TLS code treats it as big int. Make the
offload callbacks take a byte array, drivers can make the
choice to do the ugly cast if they want to.
Prepare for copying the record number onto the stack by
defining a constant for max size of the byte array.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib6_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update ip6_route_del to check metric and protocol before nexthop
specs. If fc_nh_id is set, then it must match the id in the route
entry. Since IPv6 allows delete of a cached entry (an exception),
add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
a fib entry if it is using a nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update fib_nh_match to check ids on a route delete.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 has traditionally had a single fib6_nh per fib6_info. With
nexthops we can have multiple fib6_nh associated with a fib6_info.
Add a nexthop helper to invoke a callback for each fib6_nh in a
'struct nexthop'. If the callback returns non-0, the loop is
stopped and the return value passed to the caller.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case autoflowlabel is in action, skb_get_hash_flowi6()
derives a non zero skb->hash to the flowlabel.
If skb->hash is zero, a flow dissection is performed.
Since all TCP skbs sent from ESTABLISH state inherit their
skb->hash from sk->sk_txhash, we better keep a copy
of sk->sk_txhash into the TIME_WAIT socket.
After this patch, ACK or RST packets sent on behalf of
a TIME_WAIT socket have the flowlabel that was previously
used by the flow.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on review, `lock' is only acquired in hwbm_pool_add() which is
invoked via ->probe(), ->resume() and ->ndo_change_mtu(). Based on this
the lock can become a mutex and there is no need to disable interrupts
during the procedure.
Now that the lock is a mutex, hwbm_pool_add() no longer invokes
hwbm_pool_refill() in an atomic context so we can pass GFP_KERNEL to
hwbm_pool_refill() and remove the `gfp' argument from hwbm_pool_add().
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nhg->nh_entries[] array is allocated in nexthop_grp_alloc() and it
has nhg->num_nh elements so this check should be >= instead of >.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Where possible, we generally want both the bond master and the relevant slave
information in message output. Standardize the format using new slave_*
printk macros.
Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is helpful for e.g. draining per-driver (not per-port) tagger
queues.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.
2) Read SFP eeprom in max 16 byte increments to avoid problems with
some SFP modules, from Russell King.
3) Fix UDP socket lookup wrt. VRF, from Tim Beale.
4) Handle route invalidation properly in s390 qeth driver, from Julian
Wiedmann.
5) Memory leak on unload in RDS, from Zhu Yanjun.
6) sctp_process_init leak, from Neil HOrman.
7) Fix fib_rules rule insertion semantic change that broke Android,
from Hangbin Liu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
pktgen: do not sleep with the thread lock held.
net: mvpp2: Use strscpy to handle stat strings
net: rds: fix memory leak in rds_ib_flush_mr_pool
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
net: aquantia: fix wol configuration not applied sometimes
ethtool: fix potential userspace buffer overflow
Fix memory leak in sctp_process_init
net: rds: fix memory leak when unload rds_rdma
ipv6: fix the check before getting the cookie in rt6_get_cookie
ipv4: not do cache for local delivery if bc_forwarding is enabled
s390/qeth: handle error when updating TX queue count
s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
s390/qeth: check dst entry before use
s390/qeth: handle limited IPv4 broadcast in L3 TX path
net: fix indirect calls helpers for ptype list hooks.
net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
udp: only choose unbound UDP socket for multicast when not in a VRF
net/tls: replace the sleeping lock around RX resync with a bit lock
...
While offloading TLS connections, drivers need to handle the case where
out of order packets need to be transmitted.
Other drivers obtain the entire TLS record for the specific skb to
provide as context to hardware for encryption. However, other designs
may also want to keep the hardware state intact and perform the
out of order encryption entirely on the host.
To achieve this, export the already existing software encryption
fallback path so drivers could access this.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently drivers have to ensure the alignment of their tls state
structure, which leads to unnecessary layers of getters and
encapsulated structures in each driver.
Simplify all this by marking the driver state as aligned (driver_state
members are currently aligned, so no hole is added, besides ALIGN in
TLS_OFFLOAD_CONTEXT_SIZE_RX/TX would reserve this extra space, anyway.)
With that we can add a common accessor to the core.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 bytes of driver state has been enough so far, but for drivers
which have to store 8 byte handle it's no longer practical to
store the state directly in the context.
Drivers generally don't need much extra state on RX side, while
TX side has to be tracking TCP sequence numbers. Split the
lengths of max driver state size on RX and TX.
The struct tls_offload_context_tx currently stands at 616 bytes and
struct tls_offload_context_rx stands at 368 bytes. Upcoming work
will consume extra 8 bytes in both for kernel-driven resync.
This means that we can bump TX side to 16 bytes and still fit
into the same number of cache lines but on RX side we would be 8
bytes over.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The monolithic hash_lock could cause huge contention when
inserting/deletiing vxlan_fdbs into the fdb_head.
Use FDB_HASH_SIZE hash_locks to protect insertions/deletions
of vxlan_fdbs into the fdb_head hash table.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Litao jiao <jiaolitao@raisecom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
referencing a nexthop object can not have 'sibling' entries (the old way
of doing multipath routes), the nh_list is a union with fib6_siblings.
Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
using a nexthop instance. Update __remove_nexthop_fib to walk f6_list
and delete fib entries using the nexthop.
Add a few nexthop helpers for use when a nexthop is added to fib6_info:
- nexthop_fib6_nh - return first fib6_nh in a nexthop object
- fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
if the fib6_info references a nexthop object
- nexthop_path_fib6_result - similar to ipv4, select a path within a
multipath nexthop object. If the nexthop is a blackhole, set
fib6_result type to RTN_BLACKHOLE, and set the REJECT flag
Update the fib6_info references to check for nh and take a different path
as needed:
- rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
be coalesced with other fib entries into a multipath route
- rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
a nexthop
- addrconf (host routes), RA's and info entries (anything configured via
ndisc) does not use nexthop objects
- fib6_info_destroy_rcu - put reference to nexthop object
- fib6_purge_rt - drop fib6_info from f6i_list
- fib6_select_path - update to use the new nexthop_path_fib6_result when
fib entry uses a nexthop object
- rt6_device_match - update to catch use of nexthop object as a blackhole
and set fib6_type and flags.
- ip6_route_info_create - don't add space for fib6_nh if fib entry is
going to reference a nexthop object, take a reference to nexthop object,
disallow use of source routing
- rt6_nlmsg_size - add space for RTA_NH_ID
- add rt6_fill_node_nexthop to add nexthop data on a dump
As with ipv4, most of the changes push existing code into the else branch
of whether the fib entry uses a nexthop object.
Update the nexthop code to walk f6i_list on a nexthop deleted to remove
fib entries referencing it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
fib_info side of the nexthop <-> fib_info relationship.
Add fi_list list_head to 'struct nexthop' to track fib_info entries
using a nexthop instance. Add __remove_nexthop_fib and add it to
__remove_nexthop to walk the new list_head and mark those fib entries
as dead when the nexthop is deleted.
Add a few nexthop helpers for use when a nexthop is added to fib_info:
- nexthop_cmp to determine if 2 nexthops are the same
- nexthop_path_fib_result to select a path for a multipath
'struct nexthop'
- nexthop_fib_nhc to select a specific fib_nh_common within a
multipath 'struct nexthop'
Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
case.
Update the fib_info functions to check for fi->nh and take a different
path as needed:
- free_fib_info_rcu - put the nexthop object reference
- fib_release_info - remove the fib_info from the nexthop's fi_list
- nh_comp - use nexthop_cmp when either fib_info references a nexthop
object
- fib_info_hashfn - use the nexthop id for the hashing vs the oif of
each fib_nh in a fib_info
- fib_nlmsg_size - add space for the RTA_NH_ID attribute
- fib_create_info - verify nexthop reference can be taken, verify
nexthop spec is valid for fib entry, and add fib_info to fi_list for
a nexthop
- fib_select_multipath - use the new nexthop_path_fib_result to select a
path when nexthop objects are used
- fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
it the same as a fib entry using 'blackhole'
The bulk of the changes are in fib_semantics.c and most of that is
moving the existing change_nexthops into an else branch.
Update the nexthop code to walk fi_list on a nexthop deleted to remove
fib entries referencing it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
to use a fib6_nh based nexthop. In the end, only code not using a
nexthop object in a fib_info should directly access fib_nh in a fib_info
without checking the famiy and going through fib_nh_common. Those
functions will be marked when it is not directly evident.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
fib_dev macro which is an alias for the first nexthop. Replacements:
fi->fib_dev --> fib_info_nh(fi, 0)->fib_nh_dev
fi->fib_nh --> fib_info_nh(fi, 0)
fi->fib_nh[i] --> fib_info_nh(fi, i)
fi->fib_nhs --> fib_info_num_path(fi)
where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path
returns fi->fib_nhs.
Move the existing fib_info_nhc to nexthop.h and define the new ones
there. A later patch adds a check if a fib_info uses a nexthop object,
and defining the helpers in nexthop.h avoid circular header
dependencies.
After this all remaining open coded references to fi->fib_nhs and
fi->fib_nh are in:
- fib_create_info and helpers used to lookup an existing fib_info
entry, and
- the netdev event functions fib_sync_down_dev and fib_sync_up.
The latter two will not be reused for nexthops, and the fib_create_info
will be updated to handle a nexthop in a fib_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass prot->version as the last parameter
of tls_advance_record_sn(), yet tls_advance_record_sn()
itself needs a pointer to prot. Pass prot from callers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct tls_context is slightly badly laid out. If we reorder things
right we can save 16 bytes (320 -> 304) but also make all fast path
data fit into two cache lines (one read only and one read/write,
down from four cache lines).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a function to be called from drivers during flash. It sends
notification to userspace about flash update progress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem. Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing. Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.
Note that there is a leftover "flags" field in struct
tls_context already.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_stats_update() uses max_t, so ensure we have that defined.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag is not used by any caller, remove it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset container Netfilter/IPVS update for net-next:
1) Add UDP tunnel support for ICMP errors in IPVS.
Julian Anastasov says:
This patchset is a followup to the commit that adds UDP/GUE tunnel:
"ipvs: allow tunneling with gue encapsulation".
What we do is to put tunnel real servers in hash table (patch 1),
add function to lookup tunnels (patch 2) and use it to strip the
embedded tunnel headers from ICMP errors (patch 3).
2) Extend xt_owner to match for supplementary groups, from
Lukasz Pawelczyk.
3) Remove unused oif field in flow_offload_tuple object, from
Taehee Yoo.
4) Release basechain counters from workqueue to skip synchronize_rcu()
call. From Florian Westphal.
5) Replace skb_make_writable() by skb_ensure_writable(). Patchset
from Florian Westphal.
6) Checksum support for gue encapsulation in IPVS, from Jacky Hu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The phylink conflict was between a bug fix by Russell King
to make sure we have a consistent PHY interface mode, and
a change in net-next to pull some code in phylink_resolve()
into the helper functions phylink_mac_link_{up,down}()
On the dp83867 side it's mostly overlapping changes, with
the 'net' side removing a condition that was supposed to
trigger for RGMII but because of how it was coded never
actually could trigger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add checksum support for gue encapsulation with the tun_flags parameter,
which could be one of the values below:
IP_VS_TUNNEL_ENCAP_FLAG_NOCSUM
IP_VS_TUNNEL_ENCAP_FLAG_CSUM
IP_VS_TUNNEL_ENCAP_FLAG_REMCSUM
Signed-off-by: Jacky Hu <hengqing.hu@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The oifidx in the struct flow_offload_tuple is not used anymore.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add ip_vs_find_tunnel() to match tunnel headers
by family, address and optional port. Use it to
properly find the tunnel real server used in
received ICMP errors.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>