This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.
RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to. Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table. For AF_PACKET the
destination mac_address is specified in the netlink configuration.
I think raw destination mac address support with the family AF_PACKET
will prove useful. There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor. Further
seem to be corner cases where it can be useful. At this point
I don't care much either way.
RTA_NEWDST specifies the destination address to forward the packet
with. MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.
Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.
rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.
I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reading and writing addresses in network byte order in netlink is
traditional and I see no reason to change that. MPLS is interesting
as effectively it has variabely length addresses (the MPLS label
stack). To represent these variable length addresses in netlink
I use a valid MPLS label stack (complete with stop bit).
This achieves two things: a well defined existing format is used,
and the data can be interpreted without looking at it's length.
Not needed to look at the length to decode the variable length
network representation allows existing userspace functions
such as inet_ntop to be used without needed to change their
prototype.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mpls_route_add and mpls_route_del implement the basic logic for adding
and removing Next Hop Label Forwarding Entries from the MPLS input
label map. The addition and subtraction is done in a way that is
consistent with how the existing routing table in Linux are
maintained. Thus all of the work to deal with NLM_F_APPEND,
NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.
Cases that are not clearly defined such as changing the interpretation
of the mpls reserved labels is not allowed.
Because it seems like the right thing to do adding an MPLS route without
specifying an input label and allowing the kernel to pick a free label
table entry is supported. The implementation is currently less than optimal
but that can be changed.
As I don't have anything else to test with only ethernet and the loopback
device are the only two device types currently supported for forwarding
MPLS over.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sysctl gives two benefits. By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets. This prevents unpleasant surprises for users.
The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.
This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a new Kconfig option MPLS_ROUTING.
The core of this change is the code to look at an mpls packet received
from another machine. Look that packet up in a routing table and
forward the packet on.
Support of MPLS over ATM is not considered or attempted here. This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.
What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[]. What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route. Though calling it the
label fordwarding information base (lfib) might also be valid.
Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path. In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.
Quite a few optional features are not implemented here. Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error). The traffic
class field is always set to 0. The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.
Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value). I was lazy and implemented a next hop mac
address instead. The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.
Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces. There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte. Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.
For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early. Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped. Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This refactoring is needed to allow more than just mpls gso
support to be built into the mpls moddule.
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.
The kind of next hop we have is indicated by the neighbour table
pointer. A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.
The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref. We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.
So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function. I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.
To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq. So that the static size of the key would be
available.
I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the STP timer fires, it can call br_ifinfo_notify(),
which in turn ends up in the new br_get_link_af_size().
This function is annotated to be using RTNL locking, which
clearly isn't the case here, and thus lockdep warns:
===============================
[ INFO: suspicious RCU usage. ]
3.19.0+ #569 Not tainted
-------------------------------
net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage!
Fix this by doing RCU locking here.
Fixes: b7853d73e3 ("bridge: add vlan info to bridge setlink and dellink notification messages")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/rocker/rocker.c
The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd fixes from Bruce Fields:
"Three miscellaneous bugfixes, most importantly the clp->cl_revoked
bug, which we've seen several reports of people hitting"
* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
sunrpc: integer underflow in rsc_parse()
nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
svcrpc: fix memory leak in gssp_accept_sec_context_upcall
Pull networking fixes from David Miller:
1) If an IPVS tunnel is created with a mixed-family destination
address, it cannot be removed. Fix from Alexey Andriyanov.
2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
Neira Ayuso.
3) Generic statistics infrastructure can reference variables sitting on
a released function stack, therefore use dynamic allocation always.
Fix from Ignacy Gawędzki.
4) skb_copy_bits() return value test is inverted in ip_check_defrag().
5) Fix network namespace exit in openvswitch, we have to release all of
the per-net vports. From Pravin B Shelar.
6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.
7) Fix rhashtable grow/shrink behavior, only expand during inserts and
shrink during deletes. From Daniel Borkmann.
8) Netdevice names with semicolons should never be allowed, because
they serve as a separator. From Matthew Thode.
9) Use {,__}set_current_state() where appropriate, from Fabian
Frederick.
10) Revert byte queue limits support in r8169 driver, it's causing
regressions we can't figure out.
11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
measure packets in flight, properly use tcp_packets_in_flight()
instead. From Neal Cardwell.
12) Fix accidental removal of support for bluetooth in CSR based Intel
wireless cards. From Marcel Holtmann.
13) We accidently added a behavioral change between native and compat
tasks, wrt testing the MSG_CMSG_COMPAT bit. Just ignore it if the
user happened to set it in a native binary as that was always the
behavior we had. From Catalin Marinas.
14) Check genlmsg_unicast() return valud in hwsim netlink tx frame
handling, from Bob Copeland.
15) Fix stale ->radar_required setting in mac80211 that can prevent
starting new scans, from Eliad Peller.
16) Fix memory leak in nl80211 monitor, from Johannes Berg.
17) Fix race in TX index handling in xen-netback, from David Vrabel.
18) Don't enable interrupts in amx-xgbe driver until all software et al.
state is ready for the interrupt handler to run. From Thomas
Lendacky.
19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
W Biederman.
20) The amount of header space needed in macvtap was not calculated
properly, fix it otherwise we splat past the beginning of the
packet. From Eric Dumazet.
21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.
22) Don't raw initialize or mod timers, use setup_timer() and
mod_timer() instead. From Vaishali Thakkar.
23) Fix software maintained statistics in bcmgenet and systemport
drivers, from Florian Fainelli.
24) DMA descriptor updates in sh_eth need proper memory barriers, from
Ben Hutchings.
25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
Kubecek.
26) Openvswitch's non-masked set actions aren't constructed properly
into netlink messages, fix from Joe Stringer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
openvswitch: Fix serialization of non-masked set actions.
gianfar: Reduce logging noise seen due to phy polling if link is down
ibmveth: Add function to enable live MAC address changes
net: bridge: add compile-time assert for cb struct size
udp: only allow UFO for packets from SOCK_DGRAM sockets
sh_eth: Really fix padding of short frames on TX
Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
sh_eth: Ensure proper ordering of descriptor active bit write/read
net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
net/mlx4_core: Fix wrong mask and error flow for the update-qp command
net: systemport: fix software maintained statistics
net: bcmgenet: fix software maintained statistics
rxrpc: don't multiply with HZ twice
rxrpc: terminate retrans loop when sending of skb fails
net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
net: pasemi: Use setup_timer and mod_timer
net: stmmac: Use setup_timer and mod_timer
net: 8390: axnet_cs: Use setup_timer and mod_timer
net: 8390: pcnet_cs: Use setup_timer and mod_timer
...
Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the ax25 stack calls dev_queue_xmit it always calls
ax25_type_trans which sets skb->protocol to ETH_P_AX25.
Which means that by looking at the protocol type it is possible to
detect IP packets that have not been munged by the ax25 stack in
ndo_start_xmit and call a function to munge them.
Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and
value to be appropriate for an ndo_start_xmit function.
Update all of the ax25 devices to test the protocol type for ETH_P_IP
and return ax25_ip_xmit as the first thing they do. This preserves
the existing semantics of IP packet processing, but the timing will be
a little different as the IP packets now pass through the qdisc layer
before reaching the ax25 ip packet processing.
Remove the now unnecessary ax25 neighbour table operations.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
back to regular set actions, the inner attribute length was not changed,
ie, double the length being serialized. This patch fixes the bug.
Fixes: 83d2b9b ("net: openvswitch: Support masked set actions.")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WAIT_FOR_DEMON code is directly undefined at the beginning
of signaling.c since initial git version and thus never compiled.
This also removes buggy current->state direct access.
Suggested-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
make build fail if structure no longer fits into ->cb storage.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having a dst helps a little bit for teql but is fundamentally
unnecessary and there are code paths where a dst is not available that
it would be nice to use the neighbour cache.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add protocol to neigh_tbl so that dst->ops->protocol is not needed
- Acquire the device from neigh->dev
This results in a neigh_hh_init that will cache the samve values
regardless of the packets flowing through it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no more callers so kill this function.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that there are no more users kill dev_rebuild_header and all of it's
implementations.
This is long overdue.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Have ax25_neigh_output perform ordinary arp resolution before calling
ax25_neigh_xmit.
Call dev_hard_header in ax25_neigh_output with a destination address so
it will not fail, and the destination mac address will not need to be
set in ax25_neigh_xmit.
Remove arp_find from ax25_neigh_xmit (the ordinary arp resolution added
to ax25_neigh_output removes the need for calling arp_find).
Document how close ax25_neigh_output is to neigh_resolve_output.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Rename ax25_rebuild_header to ax25_neigh_xmit and call it from
ax25_neigh_output directly. The rename is to make it clear
that this is not a rebuild_header operation.
- Remove ax25_rebuild_header from ax25_header_ops.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only caller is now is ax25_neigh_construct so move
neigh_compat_output into ax25_ip.c make it static and rename it
ax25_neigh_output.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The special case has been pushed out into ax25_neigh_construct so there
is no need to keep this code in arp.c
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AX25 already has it's own private arp cache operations to isolate
it's abuse of dev_rebuild_header to transmit packets. Add a function
ax25_neigh_construct that will allow all of the ax25 devices to
force using these operations, so that the generic arp code does
not need to.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user is in ax25_ip.c so stop exporting these functions.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patterned after the similar code in net/rom this turns out
to be a trivial obviously correct transmformation.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not setting the destination address is a bug that I suspect causes no
problems today, as only the arp code seems to call dev_hard_header and
the description I have of rose is that it is expected to be used with a
static neigbour table.
I have derived the offset and the length of the rose destination address
from rose_rebuild_header where arp_find calls neigh_ha_snapshot to set
the destination address.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely (impossible?) event that we attempt to transmit
an ax25 packet over a non-ax25 device free the skb so we don't
leak it.
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:
1) Add 64-bits stats counters to IPVS, from Julian Anastasov.
2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seem to require this, from Anton Blanchard.
3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.
4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.
5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.
Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Both sk_attach_filter() and sk_attach_bpf() are setting up sk_filter,
charging skmem and attaching it to the socket after we got the eBPF
prog up and ready. Lets refactor that into a common helper.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-03-02
Here's the first bluetooth-next pull request targeting the 4.1 kernel:
- ieee802154/6lowpan cleanups
- SCO routing to host interface support for the btmrvl driver
- AMP code cleanups
- Fixes to AMP HCI init sequence
- Refactoring of the HCI callback mechanism
- Added shutdown routine for Intel controllers in the btusb driver
- New config option to enable/disable Bluetooth debugfs information
- Fix for early data reception on L2CAP fixed channels
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.
Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the iocb argument is used to idenfiy whether or not socket
lock is hold before tipc_sendmsg()/tipc_send_stream() is called. But
this usage prevents iocb argument from being dropped through sendmsg()
at socket common layer. Therefore, in the commit we introduce two new
functions called __tipc_sendmsg() and __tipc_send_stream(). When they
are invoked, it assumes that their callers have taken socket lock,
thereby avoiding the weird usage of iocb argument.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to arptables extensions from nft_compat.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 977750076d ("af_packet: add interframe drop cmsg (v6)")
unionized skb->mark and skb->dropcount in order to allow recording
of the socket drop count while maintaining struct sk_buff size.
skb->dropcount was introduced since there was no available room
in skb->cb[] in packet sockets. However, its introduction led to
the inability to export skb->mark, or any other aliased field to
userspace if so desired.
Moving the dropcount metric to skb->cb[] eliminates this problem
at the expense of 4 bytes less in skb->cb[] for protocol families
using it.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of an effort to move skb->dropcount to skb->cb[], use
a common function in order to set dropcount in struct sk_buff.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of an effort to move skb->dropcount to skb->cb[], 4 bytes
of additional room are needed in skb->cb[] in packet sockets.
Store the skb original length in the first two fields of sockaddr_ll
(sll_family and sll_protocol) as they can be derived from the skb when
needed.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3b885787ea ("net: Generalize socket rx gap / receive queue overflow cmsg")
allowed receiving packet dropcount information as a socket level option.
RXRPC sockets recvmsg function was changed to support this by calling
sock_recv_ts_and_drops() instead of sock_recv_timestamp().
However, protocol families wishing to receive dropcount should call
sock_queue_rcv_skb() or set the dropcount specifically (as done
in packet_rcv()). This was not done for rxrpc and thus this feature
never worked on these sockets.
Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as
part of an effort to move skb->dropcount into skb->cb[]
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert boolean fields incoming and req_start to bit fields and move
force_active in order save space in bt_skb_cb in an effort to use
a portion of skb->cb[] for storing skb->dropcount.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>