This patch adds the following structure:
struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};
That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.
I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.
That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.
This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.
We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.
Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.
Signed-off-by: David S. Miller <davem@davemloft.net>
Change l2tp_xmit_skb() to return NET_XMIT_DROP in case skb is dropped.
Use kfree_skb() instead dev_kfree_skb() for drop_monitor pleasure.
Support tx_dropped counter for l2tp_eth
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
an extra IP header on incoming packets so they can be routed to the backend.
In the v4 tunnel driver, when these packets fall on the default tunl0 device,
the behavior is to decapsulate them and drop them back on the stack. So our
setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
address.
In IPv6 we do the same thing, but the v6 tunnel driver didn't have this same
behavior - if you didn't have an explicit tunnel setup, it would drop the
packet.
This patch brings that v4 feature to the v6 driver.
The same IPv6 address checks are performed as with any normal tunnel,
but as the fallback tunnel endpoint addresses are unspecified, the checks
must be performed on a per-packet basis, rather than at tunnel
configuration time.
[Patch description modified by phil@ipom.com]
Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
Tested-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checking for in_dev being NULL is pointless.
In fact, all of our callers have in_dev precomputed already,
so just pass it in and remove the NULL checking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon feedback from Julian Anastasov.
1) Use route flags to determine multicast/broadcast, not the
packet flags.
2) Leave saddr unspecified in flow key.
3) Adjust how we invoke inet_select_addr(). Pass ip_hdr(skb)->saddr as
second arg, and if it was zeronet use link scope.
4) Use loopback as input interface in flow key.
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NLMSG_GOODSIZE results in multiple pages being used as
nlmsg_new() will automatically add the size of the netlink
header to the payload thus exceeding the page limit.
NLMSG_DEFAULT_SIZE takes this into account.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in tcp_v6_conn_request() was implicitly assuming that
tcp_v6_send_synack() would take care of dst_release(), much as
tcp_v4_send_synack() already does. This resulted in
tcp_v6_conn_request() leaking a dst if sysctl_tw_recycle is enabled.
This commit restructures tcp_v6_send_synack() so that it accepts a dst
pointer and takes care of releasing the dst that is passed in, to plug
the leak and avoid future surprises by bringing the IPv6 behavior in
line with the IPv4 side.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the recent change (earlier in this patch series) to set
flowi6_oif to treq->iif in inet6_csk_route_req(), the dst lookup in
these two functions is now identical, so tcp_v6_send_synack() can now
just call inet6_csk_route_req(), to reduce code duplication and keep
things closer to the IPv4 side, which is structured this way.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit changes inet_csk_route_req() so that it uses a pointer to
a struct flowi6, rather than allocating its own on the stack. This
brings its behavior in line with its IPv4 cousin,
inet_csk_route_req(), and allows a follow-on patch to fix a dst leak.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix inet6_csk_route_req() to use as the flowi6_oif the treq->iif,
which is correctly fixed up in tcp_v6_conn_request() to handle the
case of link-local addresses. This brings it in line with the
tcp_v6_send_synack() code, which is already correctly using the
treq->iif in this way.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/caif/caif_hsi.c
drivers/net/usb/qmi_wwan.c
The qmi_wwan merge was trivial.
The caif_hsi.c, on the other hand, was not. It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.
I did my best with that one and will ask Sjur to check it out.
Signed-off-by: David S. Miller <davem@davemloft.net>
John Linville says:
====================
Amitkumar Karwar gives us two mwifiex fixes: one fixes some skb
manipulations when handling some event messages; and another that
does some similar fixing on an error path.
Avinash Patil gives us a fix for for a memory leak in mwifiex.
Dan Rosenberg offers an NFC NCI fix to enforce some message length
limits to prevent buffer overflows.
Eliad Peller provides a mac80211 fix to prevent some frames from
being built with an invalid BSSID.
Eric Dumazet sends an NFC fix to prevent a BUG caused by a NULL
pointer dereference.
Felix Fietkau has an ath9k fix for a regression causing
LEAP-authenticated connection failures.
Johannes Berg provides an iwlwifi fix that eliminates some log SPAM
after an authentication/association timeout. He also provides a
mac80211 fix to prevent incorrectly addressing certain action frames
(and in so doing, to comply with the 802.11 specs).
Larry Finger provides a few USB IDs for the rtl8192cu driver --
should be harmless.
Panayiotis Karabassis provices a one-liner to fix kernel bug 42903
(a system freeze).
Randy Dunlap provides a one-line Kconfig change to prevent build
failures with some configurations.
Stone Piao provides an mwifiex sequence numbering fix and a fix
to prevent mwifiex from attempting to include eapol frames in an
aggregation frame.
Finally, Tom Hughes provides an ath9k fix for a NULL pointer
dereference.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make logging level consistent with other deprecation messages in net
subsystem.
Signed-off-by: Vinson Lee <vlee@twitter.com>
Cc: David Mackey <tdmackey@twitter.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking update from David Miller:
1) Pairing and deadlock fixes in bluetooth from Johan Hedberg.
2) Add device IDs for AR3011 and AR3012 bluetooth chips. From
Giancarlo Formicuccia and Marek Vasut.
3) Fix wireless regulatory deadlock, from Eliad Peller.
4) Fix full TX ring panic in bnx2x driver, from Eric Dumazet.
5) Revert the two commits that added skb_orphan_try(), it causes
erratic bonding behavior with UDP clients and the gains it used to
give are mostly no longer happening due to how BQL works. From Eric
Dumazet.
6) It took two tries, but Thomas Graf fixed a problem wherein we
registered ipv6 routing procfs files before their backend data were
initialized properly.
7) Fix max GSO size setting in be2net, from Sarveshwar Bandi.
8) PHY device id mask is wrong for KSZ9021 and KS8001 chips, fix from
Jason Wang.
9) Fix use of stale SKB data pointer after skb_linearize() call in
batman-adv, from Antonio Quartulli.
10) Fix memory leak in IXGBE due to missing __GFP_COMP, from Alexander
Duyck.
11) Fix probing of Gobi devices in qmi_wwan usbnet driver, from Bjørn
Mork.
12) Fix suspend/resume and open failure handling in usbnet from Ming
Lei.
13) Attempt to fix device r8169 hangs for certain chips, from Francois
Romieu.
14) Fix advancement of RX dirty pointer in some situations in sh_eth
driver, from Yoshihiro Shimoda.
15) Attempt to fix restart of IPV6 routing table dumps when there is an
intervening table update. From Eric Dumazet.
16) Respect security_inet_conn_request() return value in ipv6 TCP. From
Neal Cardwell.
17) Add another iPAD device ID to ipheth driver, from Davide Gerhard.
18) Fix access to freed SKB in l2tp_eth_dev_xmit(), and fix l2tp lockdep
splats, from Eric Dumazet.
19) Make sure all bridge devices, regardless of whether they were
created via netlink or ioctls, have their rtnetlink ops hooked up.
From Thomas Graf and Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
9p: fix min_t() casting in p9pdu_vwritef()
can: flexcan: use be32_to_cpup to handle the value of dt entry
xen/netfront: teardown the device before unregistering it.
bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2)
vhost: use USER_DS in vhost_worker thread
ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
net: l2tp_eth: use LLTX to avoid LOCKDEP splats
mac802154: add missed braces
net: l2tp_eth: fix l2tp_eth_dev_xmit race
net/mlx4_en: Release QP range in free_resources
net/mlx4: Use single completion vector after NOP failure
net/mlx4_en: Set correct port parameters during device initialization
ipheth: add support for iPad
caif-hsi: Add missing return in error path
caif-hsi: Bugfix - Piggyback'ed embedded CAIF frame lost
caif: Clear shutdown mask to zero at reconnect.
tcp: heed result of security_inet_conn_request() in tcp_v6_conn_request()
ipv6: fib: fix fib dump restart
batman-adv: fix race condition in TT full-table replacement
batman-adv: only drop packets of known wifi clients
...
- another batch of patches meant to clean batman-adv namespace
- deletion of an obsolete intermediate buffer used in the visualization code to
print the output
- TT code cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAk/r/yQACgkQpGgxIkP9cwefRwCgmLPvqF1y720Bs33vIxIExcY1
tZAAn24dNaVX7niTIF35FS/FuSsswW/J
=6YOn
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- another batch of patches meant to clean batman-adv namespace
- deletion of an obsolete intermediate buffer used in the visualization code to
print the output
- TT code cleanups
The specific destination is the host we direct unicast replies to.
Usually this is the original packet source address, but if we are
responding to a multicast or broadcast packet we have to use something
different.
Specifically we must use the source address we would use if we were to
send a packet to the unicast source of the original packet.
The routing cache precomputes this value, but we want to remove that
precomputation because it creates a hard dependency on the expensive
rpfilter source address validation which we'd like to make cheaper.
There are only three places where this matters:
1) ICMP replies.
2) pktinfo CMSG
3) IP options
Now there will be no real users of rt->rt_spec_dst and we can simply
remove it altogether.
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename it to ip_send_unicast_reply() and add explicit 'saddr'
argument.
This removed one of the few users of rt->rt_spec_dst.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of adding a new bool argument each time it is needed, it is better (and
simpler) to pass an 8bit flag argument which contains all the needed flags
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
During an OGM-interval (time between two different OGM sendings) the same client
could roam away and then roam back to us. In this case the node would add two
events to the events list (that is going to be sent appended to the next OGM). A
DEL one and an ADD one. Obviously they will only increase the overhead (either in
the air and on the receiver side) and eventually trigger wrong states/events
without producing any real effect.
For this reason we can safely delete any ADD event with its related DEL one.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
The vis output doesn't need to be buffered in an character buffer before it can
be send to the userspace program that reads from the vis debug file.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
All non-static symbols of batman-adv were prefixed with batadv_ to avoid
collisions with other symbols of the kernel. Other symbols of batman-adv
should use the same prefix to keep the naming scheme consistent.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
All non-static symbols of batman-adv were prefixed with batadv_ to avoid
collisions with other symbols of the kernel. Other symbols of batman-adv
should use the same prefix to keep the naming scheme consistent.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
All non-static symbols of batman-adv were prefixed with batadv_ to avoid
collisions with other symbols of the kernel. Other symbols of batman-adv
should use the same prefix to keep the naming scheme consistent.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
the nla_*() interfaces actually return.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c074da2810.
This change has several unwanted side effects:
1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
thus never create a real cached route.
2) All TCP traffic will use DST_NOCACHE and never use the routing
cache at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
1. removed code replication for tov calculation for 1G, 10G and
made is common for speed > 1G (1G, 10G, 40G, 100G).
2. defines values for #4 different 40G Phys (KR4, LF4, SR4, CR4)
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dropwatch wrongly diagnose all received UDP packets as drops.
This patch removes trace_kfree_skb() done in skb_free_datagram_locked().
Locations calling skb_free_datagram_locked() should do it on their own.
As a result, drops are accounted on the right function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes all RTA_GET*() and RTA_PUT*() variations, as well as the
the unused rtattr_strcmp(). Get rid of rtm_get_table() by moving
it to its only user decnet.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, no need to trim on nlmsg_put() failure, nothing has been added
yet. We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fix a needless skb tailroom check for a 4 bytes area
after after each rtnexthop block.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>