2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-18 02:04:05 +08:00
Commit Graph

29524 Commits

Author SHA1 Message Date
Neal Cardwell
02cf4ebd82 tcp: initialize passive-side sk_pacing_rate after 3WHS
For passive TCP connections, upon receiving the ACK that completes the
3WHS, make sure we set our pacing rate after we get our first RTT
sample.

On passive TCP connections, when we receive the ACK completing the
3WHS we do not take an RTT sample in tcp_ack(), but rather in
tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.

Originally the initial sk_pacing_rate value was 0, so passive-side
connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
made in the first RTT. With a default initial cwnd of 10 packets, this
happened to be correct for RTTs 5ms or bigger, so it was hard to
see problems in WAN or emulated WAN testing.

Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the
initial sk_pacing_rate is 0xffffffff. So after that change, passive
TCP connections were keeping this value (and using large numbers of
segments per skbuff) until receiving an ACK for data.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:56:23 -04:00
Hannes Frederic Sowa
c2f17e827b ipv6: probe routes asynchronous in rt6_probe
Routes need to be probed asynchronous otherwise the call stack gets
exhausted when the kernel attemps to deliver another skb inline, like
e.g. xt_TEE does, and we probe at the same time.

We update neigh->updated still at once, otherwise we would send to
many probes.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:56:22 -04:00
Julian Anastasov
56e42441ed netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
Now when rt6_nexthop() can return nexthop address we can use it
for proper nexthop comparison of directly connected destinations.
For more information refer to commit bbb5823cf7
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:01 -04:00
Julian Anastasov
550bab42f8 ipv6: fill rt6i_gateway with nexthop address
Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21 18:37:01 -04:00
Jiri Pirko
e93b7d748b ip_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf539 "[IPv4/IPv6]: UFO Scatter-gather approach"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Jiri Pirko
c547dbf55d ip6_output: do skb ufo init for peeked non ufo skb as well
Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf539 "[IPv4/IPv6]: UFO Scatter-gather approach"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Jiri Pirko
e36d3ff911 udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames
if up->pending != 0 dontfrag is left with default value -1. That
causes that application that do:
sendto len>mtu flag MSG_MORE
sendto len>mtu flag 0
will receive EMSGSIZE errno as the result of the second sendto.

This patch fixes it by respecting IPV6_DONTFRAG socket option.

introduced by:
commit 4b340ae20d "IPv6: Complete IPV6_DONTFRAG support"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:20:52 -04:00
Daniel Borkmann
90c6bd34f8 net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race
In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default").

We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.

On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.

As reverting 16e5726 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.

Before, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
        msg_flags=0}, 0) = 5

After, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
        msg_flags=0}, 0) = 5

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 18:50:15 -04:00
Toshiaki Makita
dfb5fa32c6 bridge: Fix updating FDB entries when the PVID is applied
We currently set the value that variable vid is pointing, which will be
used in FDB later, to 0 at br_allowed_ingress() when we receive untagged
or priority-tagged frames, even though the PVID is valid.
This leads to FDB updates in such a wrong way that they are learned with
VID 0.
Update the value to that of PVID if the PVID is applied.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 16:02:53 -04:00
Toshiaki Makita
d1c6c708c4 bridge: Fix the way the PVID is referenced
We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is
set or not at br_get_pvid(), while we don't care about the bit in
adding/deleting the PVID, which makes it impossible to forward any
incomming untagged frame with vlan_filtering enabled.

Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate
that the PVID is not set, which is slightly more efficient than using
the VLAN_TAG_PRESENT.

Fix the problem by getting rid of using the VLAN_TAG_PRESENT.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 16:02:53 -04:00
Toshiaki Makita
b90356ce17 bridge: Apply the PVID to priority-tagged frames
IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames
use the PVID for the port as its VID.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Apply the PVID to not only untagged frames but also priority-tagged frames.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 16:02:53 -04:00
Toshiaki Makita
8adff41c3d bridge: Don't use VID 0 and 4095 in vlan filtering
IEEE 802.1Q says that:
- VID 0 shall not be configured as a PVID, or configured in any Filtering
Database entry.
- VID 4095 shall not be configured as a PVID, or transmitted in a tag
header. This VID value may be used to indicate a wildcard match for the VID
in management operations or Filtering Database entries.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Don't accept adding these VIDs in the vlan_filtering implementation.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18 16:02:52 -04:00
Vlad Yasevich
4b6c7879d8 bridge: Correctly clamp MAX forward_delay when enabling STP
Commit be4f154d5e
	bridge: Clamp forward_delay when enabling STP
had a typo when attempting to clamp maximum forward delay.

It is possible to set bridge_forward_delay to be higher then
permitted maximum when STP is off.  When turning STP on, the
higher then allowed delay has to be clamed down to max value.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:12:15 -04:00
Eric Dumazet
8f26fb1c1e tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()
sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO
packets in the first place. (As a performance hint)

Once we have GSO packets in write queue, we can not decide they are no
longer GSO only because flow now uses a route which doesn't handle
TSO/GSO.

Core networking stack handles the case very well for us, all we need
is keeping track of packet counts in MSS terms, regardless of
segmentation done later (in GSO or hardware)

Right now, if  tcp_fragment() splits a GSO packet in two parts,
@left and @right, and route changed through a non GSO device,
both @left and @right have pcount set to 1, which is wrong,
and leads to incorrect packet_count tracking.

This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU
discovery")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:08:08 -04:00
Eric Dumazet
c52e2421f7 tcp: must unclone packets before mangling them
TCP stack should make sure it owns skbs before mangling them.

We had various crashes using bnx2x, and it turned out gso_size
was cleared right before bnx2x driver was populating TC descriptor
of the _previous_ packet send. TCP stack can sometime retransmit
packets that are still in Qdisc.

Of course we could make bnx2x driver more robust (using
ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack.

We have identified two points where skb_unclone() was needed.

This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.

Kudos to Neal for finding the root cause of this bug. Its visible
using small MSS.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:08:08 -04:00
David S. Miller
df709a18ca Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Please pull this batch of fixes intended for the 3.12 stream!

For the mac80211 bits, Johannes says:

"Jouni fixes a remain-on-channel vs. scan bug, and Felix fixes client TX
probing on VLANs."

And also:

"This time I have two fixes from Emmanuel for RF-kill issues, and fixed
two issues reported by Evan Huus and Thomas Lindroth respectively."

On top of those...

Avinash Patil adds a couple of mwifiex fixes to properly inform cfg80211
about some different types of disconnects, avoiding WARNINGs.

Mark Cave-Ayland corrects a pointer arithmetic problem in rtlwifi,
avoiding incorrect automatic gain calculations.

Solomon Peachy sends a cw1200 fix for locking around calls to
cw1200_irq_handler, addressing "lost interrupt" problems.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 16:05:50 -04:00
Yuchung Cheng
031afe4990 tcp: fix incorrect ca_state in tail loss probe
On receiving an ACK that covers the loss probe sequence, TLP
immediately sets the congestion state to Open, even though some packets
are not recovered and retransmisssion are on the way.  The later ACks
may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g.,
https://bugzilla.redhat.com/show_bug.cgi?id=989251

The fix is to follow the similar procedure in recovery by calling
tcp_try_keep_open(). The sender switches to Open state if no packets
are retransmissted. Otherwise it goes to Disorder and let subsequent
ACKs move the state to Recovery or Open.

Reported-By: Michael Sterrett <michael@sterretts.net>
Tested-By: Dormando <dormando@rydia.net>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:41:02 -04:00
Vlad Yasevich
d2dbbba77e sctp: Perform software checksum if packet has to be fragmented.
IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum.
This causes problems if SCTP packets has to be fragmented and
ipsummed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discover,
or when INIT or other control chunks are larger then MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.

CC: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:24:44 -04:00
Fan Du
27127a8256 sctp: Use software crc32 checksum when xfrm transform will happen.
igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:24:44 -04:00
John W. Linville
39c253ed78 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-10-15 13:05:21 -04:00
John W. Linville
4db89e149c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-10-14 13:22:59 -04:00
Johannes Berg
d86aa4f8ca mac80211: fix crash if bitrate calculation goes wrong
If a frame's timestamp is calculated, and the bitrate
calculation goes wrong and returns zero, the system
will attempt to divide by zero and crash. Catch this
case and print the rate information that the driver
reported when this happens.

Cc: stable@vger.kernel.org
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-14 09:47:08 +02:00
Johannes Berg
f5563318ff wireless: radiotap: fix parsing buffer overrun
When parsing an invalid radiotap header, the parser can overrun
the buffer that is passed in because it doesn't correctly check
 1) the minimum radiotap header size
 2) the space for extended bitmaps

The first issue doesn't affect any in-kernel user as they all
check the minimum size before calling the radiotap function.
The second issue could potentially affect the kernel if an skb
is passed in that consists only of the radiotap header with a
lot of extended bitmaps that extend past the SKB. In that case
a read-only buffer overrun by at most 4 bytes is possible.

Fix this by adding the appropriate checks to the parser.

Cc: stable@vger.kernel.org
Reported-by: Evan Huus <eapache@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-14 09:47:00 +02:00
Oussama Ghorbel
bf58175954 ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is found
The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
outgoing interface, so it would be better to initialize it even when no
route is found, otherwise its value will be zero.
While I'm not sure if this could happen in real life, but doing that
will avoid to call the skb_push function with a zero in ip6gre_header
function.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 17:50:59 -04:00
stephen hemminger
ff704050f2 netem: free skb's in tree on reset
Netem can leak memory because packets get stored in red-black
tree and it is not cleared on reset.

Reported by: Сергеев Сергей <adron@yapic.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 17:30:38 -04:00
stephen hemminger
638a52b801 netem: update backlog after drop
When packet is dropped from rb-tree netem the backlog statistic should
also be updated.

Reported-by: Сергеев Сергей <adron@yapic.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 17:29:29 -04:00
Eric Dumazet
455cc32bf1 l2tp: must disable bh before calling l2tp_xmit_skb()
François Cachereul made a very nice bug report and suspected
the bh_lock_sock() / bh_unlok_sock() pair used in l2tp_xmit_skb() from
process context was not good.

This problem was added by commit 6af88da14e
("l2tp: Fix locking in l2tp_core.c").

l2tp_eth_dev_xmit() runs from BH context, so we must disable BH
from other l2tp_xmit_skb() users.

[  452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662]
[  452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox
ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod
virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[  452.064012] CPU 1
[  452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643]
[  452.080015] CPU 2
[  452.080015]
[  452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
[  452.080015] RIP: 0010:[<ffffffff81059f6c>]  [<ffffffff81059f6c>] do_raw_spin_lock+0x17/0x1f
[  452.080015] RSP: 0018:ffff88007125fc18  EFLAGS: 00000293
[  452.080015] RAX: 000000000000aba9 RBX: ffffffff811d0703 RCX: 0000000000000000
[  452.080015] RDX: 00000000000000ab RSI: ffff8800711f6896 RDI: ffff8800745c8110
[  452.080015] RBP: ffff88007125fc18 R08: 0000000000000020 R09: 0000000000000000
[  452.080015] R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000286
[  452.080015] R13: 0000000000000020 R14: 0000000000000240 R15: 0000000000000000
[  452.080015] FS:  00007fdc0cc24700(0000) GS:ffff8800b6f00000(0000) knlGS:0000000000000000
[  452.080015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  452.080015] CR2: 00007fdb054899b8 CR3: 0000000074404000 CR4: 00000000000006a0
[  452.080015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  452.080015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  452.080015] Process accel-pppd (pid: 6643, threadinfo ffff88007125e000, task ffff8800b27e6dd0)
[  452.080015] Stack:
[  452.080015]  ffff88007125fc28 ffffffff81256559 ffff88007125fc98 ffffffffa01b2bd1
[  452.080015]  ffff88007125fc58 000000000000000c 00000000029490d0 0000009c71dbe25e
[  452.080015]  000000000000005c 000000080000000e 0000000000000000 ffff880071170600
[  452.080015] Call Trace:
[  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3
[  452.080015] Call Trace:
[  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.064012]
[  452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
[  452.064012] RIP: 0010:[<ffffffff81059f6e>]  [<ffffffff81059f6e>] do_raw_spin_lock+0x19/0x1f
[  452.064012] RSP: 0018:ffff8800b6e83ba0  EFLAGS: 00000297
[  452.064012] RAX: 000000000000aaa9 RBX: ffff8800b6e83b40 RCX: 0000000000000002
[  452.064012] RDX: 00000000000000aa RSI: 000000000000000a RDI: ffff8800745c8110
[  452.064012] RBP: ffff8800b6e83ba0 R08: 000000000000c802 R09: 000000000000001c
[  452.064012] R10: ffff880071096c4e R11: 0000000000000006 R12: ffff8800b6e83b18
[  452.064012] R13: ffffffff8125d51e R14: ffff8800b6e83ba0 R15: ffff880072a589c0
[  452.064012] FS:  00007fdc0b81e700(0000) GS:ffff8800b6e80000(0000) knlGS:0000000000000000
[  452.064012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  452.064012] CR2: 0000000000625208 CR3: 0000000074404000 CR4: 00000000000006a0
[  452.064012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  452.064012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  452.064012] Process accel-pppd (pid: 6662, threadinfo ffff88007129a000, task ffff8800744f7410)
[  452.064012] Stack:
[  452.064012]  ffff8800b6e83bb0 ffffffff81256559 ffff8800b6e83bc0 ffffffff8121c64a
[  452.064012]  ffff8800b6e83bf0 ffffffff8121ec7a ffff880072a589c0 ffff880071096c62
[  452.064012]  0000000000000011 ffffffff81430024 ffff8800b6e83c80 ffffffff8121f276
[  452.064012] Call Trace:
[  452.064012]  <IRQ>
[  452.064012]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
[  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
[  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
[  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
[  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
[  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
[  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
[  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
[  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
[  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
[  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
[  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
[  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
[  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
[  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
[  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
[  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
[  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
[  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
[  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
[  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
[  452.064012]  <EOI>
[  452.064012]  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.064012] Code: 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 8a 07 <eb> f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 55 48
[  452.064012] Call Trace:
[  452.064012]  <IRQ>  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
[  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
[  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
[  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
[  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
[  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
[  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
[  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
[  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
[  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
[  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
[  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
[  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
[  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
[  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
[  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
[  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
[  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
[  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
[  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
[  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
[  452.064012]  <EOI>  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b

Reported-by: François Cachereul <f.cachereul@alphalink.fr>
Tested-by: François Cachereul <f.cachereul@alphalink.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 16:51:37 -04:00
Christophe Gouault
7263a5187f vti: get rid of nf mark rule in prerouting
This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
  packets destined to each vti interface, via netfilter rules in
  the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
  commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
  is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to lookup
  for the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
  netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
  packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
  as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
  o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
  used nor changed by the vti interface. Only the vti interface o_key
  is used.

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

   ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

   ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
      proto esp mode tunnel
   ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
      proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

   ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
                 enc "cbc(aes)" "azertyuiopqsdfgh"

   ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
                 enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

   ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 14:52:17 -04:00
Vlad Yasevich
f144febd93 bridge: update mdb expiration timer upon reports.
commit 9f00b2e7cf
	bridge: only expire the mdb entry when query is received
changed the mdb expiration timer to be armed only when QUERY is
received.  Howerver, this causes issues in an environment where
the multicast server socket comes and goes very fast while a client
is trying to send traffic to it.

The root cause is a race where a sequence of LEAVE followed by REPORT
messages can race against QUERY messages generated in response to LEAVE.
The QUERY ends up starting the expiration timer, and that timer can
potentially expire after the new REPORT message has been received signaling
the new join operation.  This leads to a significant drop in multicast
traffic and possible complete stall.

The solution is to have REPORT messages update the expiration timer
on entries that already exist.

CC: Cong Wang <xiyou.wangcong@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 00:56:47 -04:00
John W. Linville
c380a1fda5 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-10-10 13:19:58 -04:00
David S. Miller
f606385068 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) We used the wrong netlink attribute to verify the
   lenght of the replay window on async events. Fix this by
   using the right netlink attribute.

2) Policy lookups can not match the output interface on forwarding.
   Add the needed informations to the flow informations.

3) We update the pmtu when we receive a ICMPV6_DEST_UNREACH message
   on IPsec with ipv6. This is wrong and leads to strange fragmented
   packets, only ICMPV6_PKT_TOOBIG messages should update the pmtu.
   Fix this by removing the ICMPV6_DEST_UNREACH check from the IPsec
   protocol error handlers.

4) The legacy IPsec anti replay mechanism supports anti replay
   windows up to 32 packets. If a user requests for a bigger
   anti replay window, we use 32 packets but pretend that we use
   the requested window size. Fix from Fan Du.

5) If asynchronous events are enabled and replay_maxdiff is set to
   zero, we generate an async event for every received packet instead
   of checking whether a timeout occurred. Fix from Thomas Egerer.

6) Policies need a refcount when the state resolution timer is armed.
   Otherwise the timer can fire after the policy is deleted.

7) We might dreference a NULL pointer if the hold_queue is empty,
   add a check to avoid this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 13:41:45 -04:00
Fabio Estevam
cb03db9d0e net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected
net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected.

Building a defconfig with both of these symbols unselected (Using the ARM
at91sam9rl_defconfig, for example) leads to the following build warning:

$ make at91sam9rl_defconfig
#
# configuration written to .config
#

$ make net/core/secure_seq.o
scripts/kconfig/conf --silentoldconfig Kconfig
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
  CALL    scripts/checksyscalls.sh
  CC      net/core/secure_seq.o
net/core/secure_seq.c:17:13: warning: 'net_secret_init' defined but not used [-Wunused-function]

Fix this warning by protecting the definition of net_secret() with these
symbols.

Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 12:59:57 -04:00
Emmanuel Grumbach
f38dd58ccc cfg80211: don't add p2p device while in RFKILL
Since P2P device doesn't have a netdev associated to it,
we cannot prevent the user to start it when in RFKILL.
So refuse to even add it when in RFKILL.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-09 18:40:16 +02:00
Emmanuel Grumbach
a754055a12 mac80211: correctly close cancelled scans
__ieee80211_scan_completed is called from a worker. This
means that the following flow is possible.

 * driver calls ieee80211_scan_completed
 * mac80211 cancels the scan (that is already complete)
 * __ieee80211_scan_completed runs

When scan_work will finally run, it will see that the scan
hasn't been aborted and might even trigger another scan on
another band. This leads to a situation where cfg80211's
scan is not done and no further scan can be issued.

Fix this by setting a new flag when a HW scan is being
cancelled so that no other scan will be triggered.

Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-09 18:40:07 +02:00
Eric Dumazet
7eec4174ff pkt_sched: fq: fix non TCP flows pacing
Steinar reported FQ pacing was not working for UDP flows.

It looks like the initial sk->sk_pacing_rate value of 0 was
a wrong choice. We should init it to ~0U (unlimited)

Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. The default rate is really unlimited, and we
need to avoid a zero divide.

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 21:54:01 -04:00
Marc Kleine-Budde
c33a39c575 net: vlan: fix nlmsg size calculation in vlan_get_size()
This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 16:32:41 -04:00
Eric Dumazet
ede869cd0f pkt_sched: fq: fix typo for initial_quantum
TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 16:32:41 -04:00
Oussama Ghorbel
0e719e3a53 ipv6: Fix the upper MTU limit in GRE tunnel
Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
headers. This length is also counted in dev->hard_header_len.
Perhaps, it's more clean to modify the hlen to count only the GRE header
without ipv6 header as the variable name suggest, but the simple way to fix
this without regression risk is simply modify the calculation of the limit
in ip6gre_tunnel_change_mtu function.
Verified in kernel version v3.11.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 16:32:40 -04:00
David S. Miller
8d8a51e26a l2tp: Fix build warning with ipv6 disabled.
net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]

Create a helper "l2tp_tunnel()" to facilitate this, and as a side
effect get rid of a bunch of unnecessary void pointer casts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 15:44:26 -04:00
Alan Ott
ab2d95df9c 6lowpan: Sync default hardware address of lowpan links to their wpan
When a lowpan link to a wpan device is created, set the hardware address
of the lowpan link to that of the wpan device.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 15:28:37 -04:00
Alan Ott
7adac1ec81 6lowpan: Only make 6lowpan links to IEEE802154 devices
Refuse to create 6lowpan links if the actual hardware interface is
of any type other than ARPHRD_IEEE802154.

Signed-off-by: Alan Ott <alan@signal11.us>
Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 15:28:37 -04:00
Steffen Klassert
2bb53e2557 xfrm: check for a vaild skb in xfrm_policy_queue_process
We might dreference a NULL pointer if the hold_queue is empty,
so add a check to avoid this.

Bug was introduced with git commit a0073fe18 ("xfrm: Add a state
resolution packet queue")

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-08 10:49:51 +02:00
Steffen Klassert
e7d8f6cb2f xfrm: Add refcount handling to queued policies
We need to ensure that policies can't go away as long as the hold timer
is armed, so take a refcont when we arm the timer and drop one if we
delete it.

Bug was introduced with git commit a0073fe18 ("xfrm: Add a state
resolution packet queue")

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-08 10:49:45 +02:00
Eric W. Biederman
88ba09df23 net: Update the sysctl permissions handler to test effective uid/gid
On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote:
> This was brought up in a Red Hat bug (which may be marked private, I'm sorry):
>
> Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID
>
> "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID."
>
> The flaw existed upstream as well last I checked.
>
> This commit in kernel v3.8 caused the regression:
>
> commit cff109768b
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date:   Fri Nov 16 03:03:01 2012 +0000
>
>     net: Update the per network namespace sysctls to be available to the network namespace owner
>
>     - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
>       the netowrk namespace to change sysctls.
>     - Allow anyone the uid of the user namespace root the same
>       permissions over the network namespace sysctls as the global root.
>     - Allow anyone with gid of the user namespace root group the same
>       permissions over the network namespace sysctl as the global root group.
>
>     Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> because it changed /sys/net's special permission handler to test current_uid, not
> current_euid; same for current_gid/current_egid.
>
> So in this case, root cannot drop privs via set[ug]id, and retains all privs
> in this codepath.

Modify the code to use current_euid(), and in_egroup_p, as in done
in fs/proc/proc_sysctl.c:test_perm()

Cc: stable@vger.kernel.org
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:57:56 -04:00
Jiri Benc
0a7e226090 ipv4: fix ineffective source address selection
When sending out multicast messages, the source address in inet->mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5db8 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:26:46 -04:00
Alexei Starovoitov
d45ed4a4e3 net: fix unsafe set_memory_rw from softirq
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:16:45 -04:00
Oussama Ghorbel
582442d6d5 ipv6: Allow the MTU of ipip6 tunnel to be set below 1280
The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch allows to check the minimum MTU for ipv6 tunnel according to these rules:
-In case the tunnel is configured with ipip6 mode the minimum MTU is 68.
-In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 12:32:26 -04:00
Michael S. Tsirkin
3573540caf netif_set_xps_queue: make cpu mask const
virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 12:29:26 -04:00
Eric Dumazet
5e8a402f83 tcp: do not forget FIN in tcp_shifted_skb()
Yuchung found following problem :

 There are bugs in the SACK processing code, merging part in
 tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
 skbs FIN flag. When a receiver first SACK the FIN sequence, and later
 throw away ofo queue (e.g., sack-reneging), the sender will stop
 retransmitting the FIN flag, and hangs forever.

Following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(10000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.

Bug was added in 2.6.29 by commit 832d11c5cd
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-04 14:16:36 -04:00
David S. Miller
7df9b48588 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Here is another batch of fixes intended for the 3.12 stream...

For the mac80211 bits, Johannes says:

"This time I have two fixes for IBSS (including one for wext, hah), a fix
for extended rates IEs, an active monitor checking fix and a sysfs
registration race fix."

On top of those...

Amitkumar Karwar brings an mwifiex fix for an interrupt loss issue
w/ SDIO devices.  The problem was due to a command timeout issue
introduced by an earlier patch.

Felix Fietkau a stall in the ath9k driver.  This patch fixes the
regression introduced in the commit "ath9k: use software queues for
un-aggregated data packets".

Stanislaw Gruszka reverts an rt2x00 patch that was found to cause
connection problems with some devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03 16:28:34 -04:00