Commit Graph

6066 Commits

Author SHA1 Message Date
David Ahern
f7225172f2 net/ipv6: prevent use after free in ip6_route_mpath_notify
syzbot reported a use-after-free:

BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Allocated by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:104
 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Freed by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:140
 dst_release_immediate+0x71/0x9e net/core/dst.c:205
 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

The problem is that rt_last can point to a deleted route if the insert
fails.

One reproducer is to insert a route and then add a multipath route that
has a duplicate nexthop.e.g,:
    $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
    $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2

Fix by not setting rt_last until the it is verified the insert succeeded.

Fixes: 3b1137fe74 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 09:56:04 -04:00
Michal Kubecek
fa1be7e01e ipv6: omit traffic class when calculating flow hash
Some of the code paths calculating flow hash for IPv6 use flowlabel member
of struct flowi6 which, despite its name, encodes both flow label and
traffic class. If traffic class changes within a TCP connection (as e.g.
ssh does), ECMP route can switch between path. It's also inconsistent with
other code paths where ip6_flowlabel() (returning only flow label) is used
to feed the key.

Use only flow label everywhere, including one place where hash key is set
using ip6_flowinfo().

Fixes: 51ebd31815 ("ipv6: add support of equal cost multipath (ECMP)")
Fixes: f70ea018da ("net: Add functions to get skb->hash based on flow structures")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:21:18 -04:00
David S. Miller
a925ab48da Revert "ipv6: omit traffic class when calculating flow hash"
This reverts commit 87ae68c8b4.

Applied the wrong version of this fix, correct version
coming up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:20:38 -04:00
Michal Kubecek
87ae68c8b4 ipv6: omit traffic class when calculating flow hash
Some of the code paths calculating flow hash for IPv6 use flowlabel member
of struct flowi6 which, despite its name, encodes both flow label and
traffic class. If traffic class changes within a TCP connection (as e.g.
ssh does), ECMP route can switch between path. It's also incosistent with
other code paths where ip6_flowlabel() (returning only flow label) is used
to feed the key.

Use only flow label everywhere, including one place where hash key is set
using ip6_flowinfo().

Fixes: 51ebd31815 ("ipv6: add support of equal cost multipath (ECMP)")
Fixes: f70ea018da ("net: Add functions to get skb->hash based on flow structures")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:18:35 -04:00
Stephen Suryaputra
2f17becfbe vrf: check the original netdevice for generating redirect
Use the right device to determine if redirect should be sent especially
when using vrf. Same as well as when sending the redirect.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:16:45 -04:00
David S. Miller
9c54aeb03a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 09:31:58 -04:00
Máté Eckl
45ca4e0cf2 netfilter: Libify xt_TPROXY
The extracted functions will likely be usefull to implement tproxy
support in nf_tables.

Extrancted functions:
	- nf_tproxy_sk_is_transparent
	- nf_tproxy_laddr4
	- nf_tproxy_handle_time_wait4
	- nf_tproxy_get_sock_v4
	- nf_tproxy_laddr6
	- nf_tproxy_handle_time_wait6
	- nf_tproxy_get_sock_v6

(nf_)tproxy_handle_time_wait6 also needed some refactor as its current
implementation was xtables-specific.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03 00:02:05 +02:00
David S. Miller
1ffdd8e164 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, the most relevant things in this batch are:

1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
   Same thing with the redirection support.

2) Abort transaction if early initialization of the commit phase fails.
   Also from Florian.

3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
   Florian.

4) Abort nf_tables batch if fatal signal is pending, from Florian.

5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
   From Florian Westphal.

6) Support to match transparent sockets from nf_tables, from Máté Eckl.

7) Audit support for nf_tables, from Phil Sutter.

8) Validate chain dependencies from commit phase, fall back to fine grain
   validation only in case of errors.

9) Attach dst to skbuff from netfilter flowtable packet path, from
   Jason A. Donenfeld.

10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
    Patch from Kees Cook.

11) Add extension to allow to forward packets through neighbour layer.

12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.

13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02 09:04:21 -04:00
Nicolas Dichtel
f7ff1fde94 ip6_tunnel: remove magic mtu value 0xFFF8
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:56:30 -04:00
David S. Miller
ccfde6e27d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-05-31

1) Avoid possible overflow of the offset variable
   in  _decode_session6(), this fixes an infinite
   lookp there. From Eric Dumazet.

2) We may use an error pointer in the error path of
   xfrm_bundle_create(). Fix this by returning this
   pointer directly to the caller.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01 13:25:41 -04:00
Yafang Shao
3d97d88e80 tcp: minor optimization around tcp_hdr() usage in receive path
This is additional to the
commit ea1627c20c ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.

Remove the third parameter of tcp_rcv_established() and put it into
the function body.

Furthermore, the local variables are listed as a reverse christmas tree :)

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 13:20:47 -04:00
David Ahern
8308f3ff17 net/ipv6: Add support for specifying metric of connected routes
Add support for IFA_RT_PRIORITY to ipv6 addresses.

If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:45 -04:00
David Ahern
d169a1f8ba net/ipv6: Pass ifa6_config struct to inet6_addr_modify
Update inet6_addr_modify to take ifa6_config argument versus a parameter
list. This is an argument move only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:45 -04:00
David Ahern
19b1518c29 net/ipv6: Pass ifa6_config struct to inet6_addr_add
Move the creation of struct ifa6_config up to callers of inet6_addr_add.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:44 -04:00
David Ahern
e6464b8c63 net/ipv6: Convert ipv6_add_addr to struct ifa6_config
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.

Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:

    addr      -->  cfg->pfx
    peer_addr -->  cfg->peer_pfx
    pfxlen    -->  cfg->plen
    flags     -->  cfg->ifa_flags

scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:44 -04:00
YueHaibing
c1c9a3c966 net: remove unnecessary genlmsg_cancel() calls
the message be freed immediately, no need to trim it
back to the previous size.

Inspired by commit 7a9b3ec1e1 ("nl80211: remove unnecessary genlmsg_cancel() calls")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 09:53:38 -04:00
Mathieu Xhonneux
bbb40a0b75 ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
out-of-bounds access when adding the SRH to the packet. This no longer
happen when expanding the skb not only by the size of the SRH (+
outer IPv6 header), but also by skb->mac_len.

[   53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
[   53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674

[   53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
[   53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   53.796673] Call Trace:
[   53.796679]  <IRQ>
[   53.796689]  dump_stack+0x71/0xab
[   53.796700]  print_address_description+0x6a/0x270
[   53.796707]  kasan_report+0x258/0x380
[   53.796715]  ? seg6_do_srh_encap+0x284/0x620
[   53.796722]  memmove+0x34/0x50
[   53.796730]  seg6_do_srh_encap+0x284/0x620
[   53.796741]  ? seg6_do_srh+0x29b/0x360
[   53.796747]  seg6_do_srh+0x29b/0x360
[   53.796756]  seg6_input+0x2e/0x2e0
[   53.796765]  lwtunnel_input+0x93/0xd0
[   53.796774]  ipv6_rcv+0x690/0x920
[   53.796783]  ? ip6_input+0x170/0x170
[   53.796791]  ? eth_gro_receive+0x2d0/0x2d0
[   53.796800]  ? ip6_input+0x170/0x170
[   53.796809]  __netif_receive_skb_core+0xcc0/0x13f0
[   53.796820]  ? netdev_info+0x110/0x110
[   53.796827]  ? napi_complete_done+0xb6/0x170
[   53.796834]  ? e1000_clean+0x6da/0xf70
[   53.796845]  ? process_backlog+0x129/0x2a0
[   53.796853]  process_backlog+0x129/0x2a0
[   53.796862]  net_rx_action+0x211/0x5c0
[   53.796870]  ? napi_complete_done+0x170/0x170
[   53.796887]  ? run_rebalance_domains+0x11f/0x150
[   53.796891]  __do_softirq+0x10e/0x39e
[   53.796894]  do_softirq_own_stack+0x2a/0x40
[   53.796895]  </IRQ>
[   53.796898]  do_softirq.part.16+0x54/0x60
[   53.796900]  __local_bh_enable_ip+0x5b/0x60
[   53.796903]  ip6_finish_output2+0x416/0x9f0
[   53.796906]  ? ip6_dst_lookup_flow+0x110/0x110
[   53.796909]  ? ip6_sk_dst_lookup_flow+0x390/0x390
[   53.796911]  ? __rcu_read_unlock+0x66/0x80
[   53.796913]  ? ip6_mtu+0x44/0xf0
[   53.796916]  ? ip6_output+0xfc/0x220
[   53.796918]  ip6_output+0xfc/0x220
[   53.796921]  ? ip6_finish_output+0x2b0/0x2b0
[   53.796923]  ? memcpy+0x34/0x50
[   53.796926]  ip6_send_skb+0x43/0xc0
[   53.796929]  rawv6_sendmsg+0x1216/0x1530
[   53.796932]  ? __orc_find+0x6b/0xc0
[   53.796934]  ? rawv6_rcv_skb+0x160/0x160
[   53.796937]  ? __rcu_read_unlock+0x66/0x80
[   53.796939]  ? __rcu_read_unlock+0x66/0x80
[   53.796942]  ? is_bpf_text_address+0x1e/0x30
[   53.796944]  ? kernel_text_address+0xec/0x100
[   53.796946]  ? __kernel_text_address+0xe/0x30
[   53.796948]  ? unwind_get_return_address+0x2f/0x50
[   53.796950]  ? __save_stack_trace+0x92/0x100
[   53.796954]  ? save_stack+0x89/0xb0
[   53.796956]  ? kasan_kmalloc+0xa0/0xd0
[   53.796958]  ? kmem_cache_alloc+0xd2/0x1f0
[   53.796961]  ? prepare_creds+0x23/0x160
[   53.796963]  ? __x64_sys_capset+0x252/0x3e0
[   53.796966]  ? do_syscall_64+0x69/0x160
[   53.796968]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.796971]  ? __alloc_pages_nodemask+0x170/0x380
[   53.796973]  ? __alloc_pages_slowpath+0x12c0/0x12c0
[   53.796977]  ? tty_vhangup+0x20/0x20
[   53.796979]  ? policy_nodemask+0x1a/0x90
[   53.796982]  ? __mod_node_page_state+0x8d/0xa0
[   53.796986]  ? __check_object_size+0xe7/0x240
[   53.796989]  ? __sys_sendto+0x229/0x290
[   53.796991]  ? rawv6_rcv_skb+0x160/0x160
[   53.796993]  __sys_sendto+0x229/0x290
[   53.796996]  ? __ia32_sys_getpeername+0x50/0x50
[   53.796999]  ? commit_creds+0x2de/0x520
[   53.797002]  ? security_capset+0x57/0x70
[   53.797004]  ? __x64_sys_capset+0x29f/0x3e0
[   53.797007]  ? __x64_sys_rt_sigsuspend+0xe0/0xe0
[   53.797011]  ? __do_page_fault+0x664/0x770
[   53.797014]  __x64_sys_sendto+0x74/0x90
[   53.797017]  do_syscall_64+0x69/0x160
[   53.797019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.797022] RIP: 0033:0x7f43b7a6714a
[   53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[   53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
[   53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
[   53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
[   53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661

[   53.797171] Allocated by task 642:
[   53.797460]  kasan_kmalloc+0xa0/0xd0
[   53.797463]  kmem_cache_alloc+0xd2/0x1f0
[   53.797465]  getname_flags+0x40/0x210
[   53.797467]  user_path_at_empty+0x1d/0x40
[   53.797469]  do_faccessat+0x12a/0x320
[   53.797471]  do_syscall_64+0x69/0x160
[   53.797473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   53.797607] Freed by task 642:
[   53.797869]  __kasan_slab_free+0x130/0x180
[   53.797871]  kmem_cache_free+0xa8/0x230
[   53.797872]  filename_lookup+0x15b/0x230
[   53.797874]  do_faccessat+0x12a/0x320
[   53.797876]  do_syscall_64+0x69/0x160
[   53.797878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   53.798014] The buggy address belongs to the object at ffff88011975e600
                which belongs to the cache names_cache of size 4096
[   53.799043] The buggy address is located 1786 bytes inside of
                4096-byte region [ffff88011975e600, ffff88011975f600)
[   53.800013] The buggy address belongs to the page:
[   53.800414] page:ffffea000465d600 count:1 mapcount:0
mapping:0000000000000000 index:0x0 compound_mapcount: 0
[   53.801259] flags: 0x17fff0000008100(slab|head)
[   53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000
0000000100070007
[   53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40
0000000000000000
[   53.803787] page dumped because: kasan: bad access detected

[   53.804384] Memory state around the buggy address:
[   53.804788]  ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[   53.805384]  ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[   53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[   53.806577]                                                                 ^
[   53.807165]  ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[   53.807762]  ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[   53.808356] ==================================================================
[   53.808949] Disabling lock debugging due to kernel taint

Fixes: 6c8702c60b ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28 23:09:49 -04:00
Florian Westphal
0168e8b361 netfilter: nat: merge ipv4/ipv6 masquerade code into main nat module
Instead of using extra modules for these, turn the config options into
an implicit dependency that adds masq feature to the protocol specific nf_nat module.

before:
   text    data     bss     dec     hex filename
   2001     860       4    2865     b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko
   5579     780       2    6361    18d9 net/ipv4/netfilter/nf_nat_ipv4.ko
   2860     836       8    3704     e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko
   6648     780       2    7430    1d06 net/ipv6/netfilter/nf_nat_ipv6.ko

after:
   text    data     bss     dec     hex filename
   7245     872       8    8125    1fbd net/ipv4/netfilter/nf_nat_ipv4.ko
   9165     848      12   10025    2729 net/ipv6/netfilter/nf_nat_ipv6.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-29 00:25:36 +02:00
David Ahern
30d444d300 net/ipv6: Udate fib6_table_lookup tracepoint
Commit bb0ad1987e ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 23:01:15 -04:00
David S. Miller
90fed9c946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-05-24

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

4) Jiong Wang adds support for indirect and arithmetic shifts to NFP

5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
   to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.

7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 22:20:51 -04:00
Mathieu Xhonneux
004d4b274e ipv6: sr: Add seg6local action End.BPF
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.

Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.

Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.

This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.

The BPF program may return 3 types of return codes:
    - BPF_OK: the End.BPF action will look up the next destination through
             seg6_lookup_nexthop.
    - BPF_REDIRECT: if an action has been executed through the
          bpf_lwt_seg6_action helper, the BPF program should return this
          value, as the skb's destination is already set and the default
          lookup should not be performed.
    - BPF_DROP : the packet will be dropped.

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24 11:57:36 +02:00
Mathieu Xhonneux
fe94cc290f bpf: Add IPv6 Segment Routing helpers
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
    - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
    - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
                               (to add/delete TLVs)
    - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
                           (specifically End.X, End.T, End.B6 and
                            End.B6.Encap)

The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).

The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.

Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.

Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.

Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.

These helpers require CONFIG_IPV6=y (and not =m).

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24 11:57:35 +02:00
Mathieu Xhonneux
1c1e761ef1 ipv6: sr: export function lookup_nexthop
The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper allowing to apply some of these
actions on the packet being processed, the helper should be able to call
this function, hence the need to make it public.

Moreover, if one argument is incorrect or if the next hop can not be found,
an error should be returned by the BPF helper so the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24 11:57:35 +02:00
David S. Miller
fb83eb93c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.

2) Add support for map lookups to numgen, random and hash expressions,
   from Laura Garcia.

3) Allow to register nat hooks for iptables and nftables at the same
   time. Patchset from Florian Westpha.

4) Timeout support for rbtree sets.

5) ip6_rpfilter works needs interface for link-local addresses, from
   Vincent Bernat.

6) Add nf_ct_hook and nf_nat_hook structures and use them.

7) Do not drop packets on packets raceing to insert conntrack entries
   into hashes, this is particularly a problem in nfqueue setups.

8) Address fallout from xt_osf separation to nf_osf, patches
   from Florian Westphal and Fernando Mancera.

9) Remove reference to struct nft_af_info, which doesn't exist anymore.
   From Taehee Yoo.

This batch comes with is a conflict between 25fd386e0b ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd398
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0b - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd398, as described by:

diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
  EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
  #endif /* CONFIG_NF_CONNTRACK */

- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
 -#ifdef CONFIG_NF_NAT_NEEDED
 -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
 -EXPORT_SYMBOL(nf_nat_decode_session_hook);
 -#endif
 -
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
  {
  	int h;

I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 16:37:11 -04:00
Roopa Prabhu
eacb9384a3 ipv6: support sport, dport and ip_proto in RTM_GETROUTE
This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 15:14:12 -04:00
Willem de Bruijn
ff06342cbc udp: exclude gso from xfrm paths
UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 14:48:44 -04:00
Vincent Bernat
cede24d1b2 netfilter: ip6t_rpfilter: provide input interface for route lookup
In commit 47b7e7f828, this bit was removed at the same time the
RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when
link-local addresses are used, which is a very common case: when
packets are routed, neighbor solicitations are done using link-local
addresses. For example, the following neighbor solicitation is not
matched by "-m rpfilter":

    IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor
    solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32

Commit 47b7e7f828 doesn't quite explain why we shouldn't use
RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check
later in the function would make it redundant. However, the remaining
of the routing code is using RT6_LOOKUP_F_IFACE when there is no
source address (which matches rpfilter's case with a non-unicast
destination, like with neighbor solicitation).

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Fixes: 47b7e7f828 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 09:25:54 +02:00
Florian Westphal
9971a514ed netfilter: nf_nat: add nat type hooks to nat core
Currently the packet rewrite and instantiation of nat NULL bindings
happens from the protocol specific nat backend.

Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.

Invocation looks like this (simplified):
NF_HOOK()
   |
   `---iptable_nat
	 |
	 `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
	               |
          new packet? pass skb though iptables nat chain
                       |
		       `---> iptable_nat: ipt_do_table

In nft case, this looks the same (nft_chain_nat_ipv4 instead of
iptable_nat).

This is a problem for two reasons:
1. Can't use iptables nat and nf_tables nat at the same time,
   as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
   NULL binding if do_table() did not find a matching nat rule so we
   can detect post-nat tuple collisions).
2. If you use e.g. nft_masq, snat, redir, etc. uses must also register
   an empty base chain so that the nat core gets called fro NF_HOOK()
   to do the reverse translation, which is neither obvious nor user
   friendly.

After this change, the base hook gets registered not from iptable_nat or
nftables nat hooks, but from the l3 nat core.

iptables/nft nat base hooks get registered with the nat core instead:

NF_HOOK()
   |
   `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
		|
         new packet? pass skb through iptables/nftables nat chains
                |
		+-> iptables_nat: ipt_do_table
	        +-> nft nat chain x
	        `-> nft nat chain y

The nat core deals with null bindings and reverse translation.
When no mapping exists, it calls the registered nat lookup hooks until
one creates a new mapping.
If both iptables and nftables nat hooks exist, the first matching
one is used (i.e., higher priority wins).

Also, nft users do not need to create empty nat hooks anymore,
nat core always registers the base hooks that take care of reverse/reply
translation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 09:14:06 +02:00
Florian Westphal
4e25ceb80b netfilter: nf_tables: allow chain type to override hook register
Will be used in followup patch when nat types no longer
use nf_register_net_hook() but will instead register with the nat core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 09:14:05 +02:00
Florian Westphal
ba7d284a98 netfilter: xtables: allow table definitions not backed by hook_ops
The ip(6)tables nat table is currently receiving skbs from the netfilter
core, after a followup patch skbs will be coming from the netfilter nat
core instead, so the table is no longer backed by normal hook_ops.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 09:14:05 +02:00
Florian Westphal
1f55236bd8 netfilter: nf_nat: move common nat code to nat core
Copy-pasted, both l3 helpers almost use same code here.
Split out the common part into an 'inet' helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 09:14:05 +02:00
David Ahern
f34436a430 net/ipv6: Simplify route replace and appending into multipath route
Bring consistency to ipv6 route replace and append semantics.

Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
1. can not replace a route with a reject route. Existing code appends
   a new route instead of replacing the existing one.

2. can not have a multipath route where a leg uses a dev only nexthop

Existing use cases affected by this change:
1. adding a route with existing prefix and metric using NLM_F_CREATE
   without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
   'prepend'). Existing code auto-determines that the new nexthop can
   be appended to an existing route to create a multipath route. This
   change breaks that by requiring the APPEND flag for the new route
   to be added to an existing one. Instead the prepend just adds another
   route entry.

2. route replace. Existing code replaces first matching multipath route
   if new route is multipath capable and fallback to first matching
   non-ECMP route (reject or dev only route) in case one isn't available.
   New behavior replaces first matching route. (Thanks to Ido for spotting
   this one)

Note: Newer iproute2 is needed to display multipath routes with a dev-only
      nexthop. This is due to a bug in iproute2 and parsing nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22 14:44:18 -04:00
David Ahern
901731b882 net/ipv6: Add helper to return path MTU based on fib result
Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.

Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core
bpf code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22 10:51:09 +02:00
David S. Miller
6f6e434aa2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21 16:01:54 -04:00
William Tu
b80d0b93b9 net: ip6_gre: fix tunnel metadata device sharing.
Currently ip6gre and ip6erspan share single metadata mode device,
using 'collect_md_tun'.  Thus, when doing:
  ip link add dev ip6gre11 type ip6gretap external
  ip link add dev ip6erspan12 type ip6erspan external
  RTNETLINK answers: File exists
simply fails due to the 2nd tries to create the same collect_md_tun.

The patch fixes it by adding a separate collect md tunnel device
for the ip6erspan, 'collect_md_tun_erspan'.  As a result, a couple
of places need to refactor/split up in order to distinguish ip6gre
and ip6erspan.

First, move the collect_md check at ip6gre_tunnel_{unlink,link} and
create separate function {ip6gre,ip6ersapn}_tunnel_{link_md,unlink_md}.
Then before link/unlink, make sure the link_md/unlink_md is called.
Finally, a separate ndo_uninit is created for ip6erspan.  Tested it
using the samples/bpf/test_tunnel_bpf.sh.

Fixes: ef7baf5e08 ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 23:32:12 -04:00
Willem de Bruijn
113f99c335 net: test tailroom before appending to linear skb
Device features may change during transmission. In particular with
corking, a device may toggle scatter-gather in between allocating
and writing to an skb.

Do not unconditionally assume that !NETIF_F_SG at write time implies
that the same held at alloc time and thus the skb has sufficient
tailroom.

This issue predates git history.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 17:05:01 -04:00
Petr Machata
2d665034f2 net: ip6_gre: Fix ip6erspan hlen calculation
Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to
what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which
overwrites these settings with GRE-specific ones.

Similarly for changelink callbacks, which are handled by
ip6gre_changelink() calls ip6gre_tnl_change() calls
ip6gre_tnl_link_config() as well.

The difference ends up being 12 vs. 20 bytes, and this is generally not
a problem, because a 12-byte request likely ends up allocating more and
the extra 8 bytes are thus available. However correct it is not.

So replace the newlink and changelink callbacks with an ERSPAN-specific
ones, reusing the newly-introduced _common() functions.

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
c8632fc30b net: ip6_gre: Split up ip6gre_changelink()
Extract from ip6gre_changelink() a reusable function
ip6gre_changelink_common(). This will allow introduction of
ERSPAN-specific _changelink() function with not a lot of code
duplication.

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
7fa38a7c85 net: ip6_gre: Split up ip6gre_newlink()
Extract from ip6gre_newlink() a reusable function
ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be
made customizable for ERSPAN, thus reorder it with calls to
ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the
caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink()
function without a lot of duplicity.

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
a6465350ef net: ip6_gre: Split up ip6gre_tnl_change()
Split a reusable function ip6gre_tnl_copy_tnl_parm() from
ip6gre_tnl_change(). This will allow ERSPAN-specific code to
reuse the common parts while customizing the behavior for ERSPAN.

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
a483373ead net: ip6_gre: Split up ip6gre_tnl_link_config()
The function ip6gre_tnl_link_config() is used for setting up
configuration of both ip6gretap and ip6erspan tunnels. Split the
function into the common part and the route-lookup part. The latter then
takes the calculated header length as an argument. This split will allow
the patches down the line to sneak in a custom header length computation
for the ERSPAN tunnel.

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
5691484df9 net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts
out zero. Thus the call to skb_cow_head() fails to actually make sure
there's enough headroom to push the ERSPAN headers to. That can lead to
the panic cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.

[  190.703567] kernel BUG at net/core/skbuff.c:104!
[  190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[  190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[  190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  190.747006] Workqueue: ipv6_addrconf addrconf_dad_work
[  190.752222] RIP: 0010:skb_panic+0xc3/0x100
[  190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282
[  190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000
[  190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54
[  190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19
[  190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe
[  190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8
[  190.797621] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[  190.805786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0
[  190.818790] Call Trace:
[  190.821264]  <IRQ>
[  190.823314]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.828940]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.834562]  skb_push+0x78/0x90
[  190.837749]  ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.843219]  ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre]
[  190.848577]  ? debug_check_no_locks_freed+0x210/0x210
[  190.853679]  ? debug_check_no_locks_freed+0x210/0x210
[  190.858783]  ? print_irqtrace_events+0x120/0x120
[  190.863451]  ? sched_clock_cpu+0x18/0x210
[  190.867496]  ? cyc2ns_read_end+0x10/0x10
[  190.871474]  ? skb_network_protocol+0x76/0x200
[  190.875977]  dev_hard_start_xmit+0x137/0x770
[  190.880317]  ? do_raw_spin_trylock+0x6d/0xa0
[  190.884624]  sch_direct_xmit+0x2ef/0x5d0
[  190.888589]  ? pfifo_fast_dequeue+0x3fa/0x670
[  190.892994]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
[  190.898455]  ? __lock_is_held+0xa0/0x160
[  190.902422]  __qdisc_run+0x39e/0xfc0
[  190.906041]  ? _raw_spin_unlock+0x29/0x40
[  190.910090]  ? pfifo_fast_enqueue+0x24b/0x3e0
[  190.914501]  ? sch_direct_xmit+0x5d0/0x5d0
[  190.918658]  ? pfifo_fast_dequeue+0x670/0x670
[  190.923047]  ? __dev_queue_xmit+0x172/0x1770
[  190.927365]  ? preempt_count_sub+0xf/0xd0
[  190.931421]  __dev_queue_xmit+0x410/0x1770
[  190.935553]  ? ___slab_alloc+0x605/0x930
[  190.939524]  ? print_irqtrace_events+0x120/0x120
[  190.944186]  ? memcpy+0x34/0x50
[  190.947364]  ? netdev_pick_tx+0x1c0/0x1c0
[  190.951428]  ? __skb_clone+0x2fd/0x3d0
[  190.955218]  ? __copy_skb_header+0x270/0x270
[  190.959537]  ? rcu_read_lock_sched_held+0x93/0xa0
[  190.964282]  ? kmem_cache_alloc+0x344/0x4d0
[  190.968520]  ? cyc2ns_read_end+0x10/0x10
[  190.972495]  ? skb_clone+0x123/0x230
[  190.976112]  ? skb_split+0x820/0x820
[  190.979747]  ? tcf_mirred+0x554/0x930 [act_mirred]
[  190.984582]  tcf_mirred+0x554/0x930 [act_mirred]
[  190.989252]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[  190.996109]  ? __lock_acquire+0x706/0x26e0
[  191.000239]  ? sched_clock_cpu+0x18/0x210
[  191.004294]  tcf_action_exec+0xcf/0x2a0
[  191.008179]  tcf_classify+0xfa/0x340
[  191.011794]  __netif_receive_skb_core+0x8e1/0x1c60
[  191.016630]  ? debug_check_no_locks_freed+0x210/0x210
[  191.021732]  ? nf_ingress+0x500/0x500
[  191.025458]  ? process_backlog+0x347/0x4b0
[  191.029619]  ? print_irqtrace_events+0x120/0x120
[  191.034302]  ? lock_acquire+0xd8/0x320
[  191.038089]  ? process_backlog+0x1b6/0x4b0
[  191.042246]  ? process_backlog+0xc2/0x4b0
[  191.046303]  process_backlog+0xc2/0x4b0
[  191.050189]  net_rx_action+0x5cc/0x980
[  191.053991]  ? napi_complete_done+0x2c0/0x2c0
[  191.058386]  ? mark_lock+0x13d/0xb40
[  191.062001]  ? clockevents_program_event+0x6b/0x1d0
[  191.066922]  ? print_irqtrace_events+0x120/0x120
[  191.071593]  ? __lock_is_held+0xa0/0x160
[  191.075566]  __do_softirq+0x1d4/0x9d2
[  191.079282]  ? ip6_finish_output2+0x524/0x1460
[  191.083771]  do_softirq_own_stack+0x2a/0x40
[  191.087994]  </IRQ>
[  191.090130]  do_softirq.part.13+0x38/0x40
[  191.094178]  __local_bh_enable_ip+0x135/0x190
[  191.098591]  ip6_finish_output2+0x54d/0x1460
[  191.102916]  ? ip6_forward_finish+0x2f0/0x2f0
[  191.107314]  ? ip6_mtu+0x3c/0x2c0
[  191.110674]  ? ip6_finish_output+0x2f8/0x650
[  191.114992]  ? ip6_output+0x12a/0x500
[  191.118696]  ip6_output+0x12a/0x500
[  191.122223]  ? ip6_route_dev_notify+0x5b0/0x5b0
[  191.126807]  ? ip6_finish_output+0x650/0x650
[  191.131120]  ? ip6_fragment+0x1a60/0x1a60
[  191.135182]  ? icmp6_dst_alloc+0x26e/0x470
[  191.139317]  mld_sendpack+0x672/0x830
[  191.143021]  ? igmp6_mcf_seq_next+0x2f0/0x2f0
[  191.147429]  ? __local_bh_enable_ip+0x77/0x190
[  191.151913]  ipv6_mc_dad_complete+0x47/0x90
[  191.156144]  addrconf_dad_completed+0x561/0x720
[  191.160731]  ? addrconf_rs_timer+0x3a0/0x3a0
[  191.165036]  ? mark_held_locks+0xc9/0x140
[  191.169095]  ? __local_bh_enable_ip+0x77/0x190
[  191.173570]  ? addrconf_dad_work+0x50d/0xa20
[  191.177886]  ? addrconf_dad_work+0x529/0xa20
[  191.182194]  addrconf_dad_work+0x529/0xa20
[  191.186342]  ? addrconf_dad_completed+0x720/0x720
[  191.191088]  ? __lock_is_held+0xa0/0x160
[  191.195059]  ? process_one_work+0x45d/0xe20
[  191.199302]  ? process_one_work+0x51e/0xe20
[  191.203531]  ? rcu_read_lock_sched_held+0x93/0xa0
[  191.208279]  process_one_work+0x51e/0xe20
[  191.212340]  ? pwq_dec_nr_in_flight+0x200/0x200
[  191.216912]  ? get_lock_stats+0x4b/0xf0
[  191.220788]  ? preempt_count_sub+0xf/0xd0
[  191.224844]  ? worker_thread+0x219/0x860
[  191.228823]  ? do_raw_spin_trylock+0x6d/0xa0
[  191.233142]  worker_thread+0xeb/0x860
[  191.236848]  ? process_one_work+0xe20/0xe20
[  191.241095]  kthread+0x206/0x300
[  191.244352]  ? process_one_work+0xe20/0xe20
[  191.248587]  ? kthread_stop+0x570/0x570
[  191.252459]  ret_from_fork+0x3a/0x50
[  191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[  191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0
[  191.281024] ---[ end trace 7ea51094e099e006 ]---
[  191.285724] Kernel panic - not syncing: Fatal exception in interrupt
[  191.292168] Kernel Offset: disabled
[  191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

	ip link add h1 type veth peer name swp1
	ip link add h3 type veth peer name swp3

	ip link set dev h1 up
	ip address add 192.0.2.1/28 dev h1

	ip link add dev vh3 type vrf table 20
	ip link set dev h3 master vh3
	ip link set dev vh3 up
	ip link set dev h3 up

	ip link set dev swp3 up
	ip address add dev swp3 2001:db8:2::1/64

	ip link set dev swp1 up
	tc qdisc add dev swp1 clsact

	ip link add name gt6 type ip6erspan \
		local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
	ip link set dev gt6 up

	sleep 1

	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
		action mirred egress mirror dev gt6
	ping -I h1 192.0.2.2

Fixes: e41c7c68ea ("ip6erspan: make sure enough headroom at xmit.")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
Petr Machata
01b8d064d5 net: ip6_gre: Request headroom in __gre6_xmit()
__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
for generic IP-in-IP processing. However it doesn't make sure that there
is enough headroom to push the header to. That can lead to the panic
cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.

[  158.576725] kernel BUG at net/core/skbuff.c:104!
[  158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[  158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[  158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  158.620426] RIP: 0010:skb_panic+0xc3/0x100
[  158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286
[  158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000
[  158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18
[  158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19
[  158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66
[  158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68
[  158.666007] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[  158.674212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0
[  158.687228] Call Trace:
[  158.689752]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.694475]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.699141]  skb_push+0x78/0x90
[  158.702344]  __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.706872]  ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
[  158.711992]  ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
[  158.716668]  ? debug_check_no_locks_freed+0x210/0x210
[  158.721761]  ? print_irqtrace_events+0x120/0x120
[  158.726461]  ? sched_clock_cpu+0x18/0x210
[  158.730572]  ? sched_clock_cpu+0x18/0x210
[  158.734692]  ? cyc2ns_read_end+0x10/0x10
[  158.738705]  ? skb_network_protocol+0x76/0x200
[  158.743216]  ? netif_skb_features+0x1b2/0x550
[  158.747648]  dev_hard_start_xmit+0x137/0x770
[  158.752010]  sch_direct_xmit+0x2ef/0x5d0
[  158.755992]  ? pfifo_fast_dequeue+0x3fa/0x670
[  158.760460]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
[  158.765975]  ? __lock_is_held+0xa0/0x160
[  158.770002]  __qdisc_run+0x39e/0xfc0
[  158.773673]  ? _raw_spin_unlock+0x29/0x40
[  158.777781]  ? pfifo_fast_enqueue+0x24b/0x3e0
[  158.782191]  ? sch_direct_xmit+0x5d0/0x5d0
[  158.786372]  ? pfifo_fast_dequeue+0x670/0x670
[  158.790818]  ? __dev_queue_xmit+0x172/0x1770
[  158.795195]  ? preempt_count_sub+0xf/0xd0
[  158.799313]  __dev_queue_xmit+0x410/0x1770
[  158.803512]  ? ___slab_alloc+0x605/0x930
[  158.807525]  ? ___slab_alloc+0x605/0x930
[  158.811540]  ? memcpy+0x34/0x50
[  158.814768]  ? netdev_pick_tx+0x1c0/0x1c0
[  158.818895]  ? __skb_clone+0x2fd/0x3d0
[  158.822712]  ? __copy_skb_header+0x270/0x270
[  158.827079]  ? rcu_read_lock_sched_held+0x93/0xa0
[  158.831903]  ? kmem_cache_alloc+0x344/0x4d0
[  158.836199]  ? skb_clone+0x123/0x230
[  158.839869]  ? skb_split+0x820/0x820
[  158.843521]  ? tcf_mirred+0x554/0x930 [act_mirred]
[  158.848407]  tcf_mirred+0x554/0x930 [act_mirred]
[  158.853104]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[  158.860005]  ? __lock_acquire+0x706/0x26e0
[  158.864162]  ? mark_lock+0x13d/0xb40
[  158.867832]  tcf_action_exec+0xcf/0x2a0
[  158.871736]  tcf_classify+0xfa/0x340
[  158.875402]  __netif_receive_skb_core+0x8e1/0x1c60
[  158.880334]  ? nf_ingress+0x500/0x500
[  158.884059]  ? process_backlog+0x347/0x4b0
[  158.888241]  ? lock_acquire+0xd8/0x320
[  158.892050]  ? process_backlog+0x1b6/0x4b0
[  158.896228]  ? process_backlog+0xc2/0x4b0
[  158.900291]  process_backlog+0xc2/0x4b0
[  158.904210]  net_rx_action+0x5cc/0x980
[  158.908047]  ? napi_complete_done+0x2c0/0x2c0
[  158.912525]  ? rcu_read_unlock+0x80/0x80
[  158.916534]  ? __lock_is_held+0x34/0x160
[  158.920541]  __do_softirq+0x1d4/0x9d2
[  158.924308]  ? trace_event_raw_event_irq_handler_exit+0x140/0x140
[  158.930515]  run_ksoftirqd+0x1d/0x40
[  158.934152]  smpboot_thread_fn+0x32b/0x690
[  158.938299]  ? sort_range+0x20/0x20
[  158.941842]  ? preempt_count_sub+0xf/0xd0
[  158.945940]  ? schedule+0x5b/0x140
[  158.949412]  kthread+0x206/0x300
[  158.952689]  ? sort_range+0x20/0x20
[  158.956249]  ? kthread_stop+0x570/0x570
[  158.960164]  ret_from_fork+0x3a/0x50
[  158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[  158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110
[  158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
[  158.993641] Kernel panic - not syncing: Fatal exception in interrupt
[  159.000176] Kernel Offset: disabled
[  159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

	ip link add h1 type veth peer name swp1
	ip link add h3 type veth peer name swp3

	ip link set dev h1 up
	ip address add 192.0.2.1/28 dev h1

	ip link add dev vh3 type vrf table 20
	ip link set dev h3 master vh3
	ip link set dev vh3 up
	ip link set dev h3 up

	ip link set dev swp3 up
	ip address add dev swp3 2001:db8:2::1/64

	ip link set dev swp1 up
	tc qdisc add dev swp1 clsact

	ip link add name gt6 type ip6gretap \
		local 2001:db8:2::1 remote 2001:db8:2::2
	ip link set dev gt6 up

	sleep 1

	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
		action mirred egress mirror dev gt6
	ping -I h1 192.0.2.2

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 16:50:06 -04:00
William Tu
02f99df187 erspan: fix invalid erspan version.
ERSPAN only support version 1 and 2.  When packets send to an
erspan device which does not have proper version number set,
drop the packet.  In real case, we observe multicast packets
sent to the erspan pernet device, erspan0, which does not have
erspan version configured.

Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17 15:48:49 -04:00
David S. Miller
b9f672af14 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-05-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Provide a new BPF helper for doing a FIB and neighbor lookup
   in the kernel tables from an XDP or tc BPF program. The helper
   provides a fast-path for forwarding packets. The API supports
   IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
   implemented in this initial work, from David (Ahern).

2) Just a tiny diff but huge feature enabled for nfp driver by
   extending the BPF offload beyond a pure host processing offload.
   Offloaded XDP programs are allowed to set the RX queue index and
   thus opening the door for defining a fully programmable RSS/n-tuple
   filter replacement. Once BPF decided on a queue already, the device
   data-path will skip the conventional RSS processing completely,
   from Jakub.

3) The original sockmap implementation was array based similar to
   devmap. However unlike devmap where an ifindex has a 1:1 mapping
   into the map there are use cases with sockets that need to be
   referenced using longer keys. Hence, sockhash map is added reusing
   as much of the sockmap code as possible, from John.

4) Introduce BTF ID. The ID is allocatd through an IDR similar as
   with BPF maps and progs. It also makes BTF accessible to user
   space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
   through BPF_OBJ_GET_INFO_BY_FD, from Martin.

5) Enable BPF stackmap with build_id also in NMI context. Due to the
   up_read() of current->mm->mmap_sem build_id cannot be parsed.
   This work defers the up_read() via a per-cpu irq_work so that
   at least limited support can be enabled, from Song.

6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
   JIT conversion as well as implementation of an optimized 32/64 bit
   immediate load in the arm64 JIT that allows to reduce the number of
   emitted instructions; in case of tested real-world programs they
   were shrinking by three percent, from Daniel.

7) Add ifindex parameter to the libbpf loader in order to enable
   BPF offload support. Right now only iproute2 can load offloaded
   BPF and this will also enable libbpf for direct integration into
   other applications, from David (Beckett).

8) Convert the plain text documentation under Documentation/bpf/ into
   RST format since this is the appropriate standard the kernel is
   moving to for all documentation. Also add an overview README.rst,
   from Jesper.

9) Add __printf verification attribute to the bpf_verifier_vlog()
   helper. Though it uses va_list we can still allow gcc to check
   the format string, from Mathieu.

10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
    is a bash 4.0+ feature which is not guaranteed to be available
    when calling out to shell, therefore use a more portable variant,
    from Joe.

11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
    instead of relying on the gcc built-in, from Björn.

12) Fix a sock hashmap kmalloc warning reported by syzbot when an
    overly large key size is used in hashmap then causing overflows
    in htab->elem_size. Reject bogus attr->key_size early in the
    sock_hash_alloc(), from Yonghong.

13) Ensure in BPF selftests when urandom_read is being linked that
    --build-id is always enabled so that test_stacktrace_build_id[_nmi]
    won't be failing, from Alexei.

14) Add bitsperlong.h as well as errno.h uapi headers into the tools
    header infrastructure which point to one of the arch specific
    uapi headers. This was needed in order to fix a build error on
    some systems for the BPF selftests, from Sirio.

15) Allow for short options to be used in the xdp_monitor BPF sample
    code. And also a bpf.h tools uapi header sync in order to fix a
    selftest build failure. Both from Prashant.

16) More formally clarify the meaning of ID in the direct packet access
    section of the BPF documentation, from Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 22:47:11 -04:00
Eric Dumazet
d9f92772e8 xfrm6: avoid potential infinite loop in _decode_session6()
syzbot found a way to trigger an infinitie loop by overflowing
@offset variable that has been forced to use u16 for some very
obscure reason in the past.

We probably want to look at NEXTHDR_FRAGMENT handling which looks
wrong, in a separate patch.

In net-next, we shall try to use skb_header_pointer() instead of
pskb_may_pull().

watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last  enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last  enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline]
softirqs last  enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277
RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8
R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f
R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8
FS:  0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 _decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
 icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
 __netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
 netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
 napi_frags_finish net/core/dev.c:5226 [inline]
 napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
 tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
 call_write_iter include/linux/fs.h:1784 [inline]
 do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
 do_iter_write+0x185/0x5f0 fs/read_write.c:959
 vfs_writev+0x1c7/0x330 fs/read_write.c:1004
 do_writev+0x112/0x2f0 fs/read_write.c:1039
 __do_sys_writev fs/read_write.c:1112 [inline]
 __se_sys_writev fs/read_write.c:1109 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-05-14 09:42:07 +02:00
David S. Miller
4f6b15c3a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Fix handling of simultaneous open TCP connection in conntrack,
   from Jozsef Kadlecsik.

2) Insufficient sanitify check of xtables extension names, from
   Florian Westphal.

3) Skip unnecessary synchronize_rcu() call when transaction log
   is already empty, from Florian Westphal.

4) Incorrect destination mac validation in ebt_stp, from Stephen
   Hemminger.

5) xtables module reference counter leak in nft_compat, from
   Florian Westphal.

6) Incorrect connection reference counting logic in IPVS
   one-packet scheduler, from Julian Anastasov.

7) Wrong stats for 32-bits CPU in IPVS, also from Julian.

8) Calm down sparse error in netfilter core, also from Florian.

9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct
   and nfnetlink_cthelper, again from Florian.

10) Missing module alias in icmp and icmp6 xtables extensions,
    from Florian Westphal.

11) Base chain statistics in nf_tables may be unset/null, from Florian.

12) Fix handling of large matchinfo size in nft_compat, this includes
    one preparation for before this fix. From Florian.

13) Fix bogus EBUSY error when deleting chains due to incorrect reference
    counting from the preparation phase of the two-phase commit protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-13 20:28:47 -04:00
David S. Miller
b2d6cee117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The bpf syscall and selftests conflicts were trivial
overlapping changes.

The r8169 change involved moving the added mdelay from 'net' into a
different function.

A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts.  I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 20:53:22 -04:00
William Tu
d5db21a3e6 erspan: auto detect truncated ipv6 packets.
Currently the truncated bit is set only when 1) the mirrored packet
is larger than mtu and 2) the ipv4 packet tot_len is larger than
the actual skb->len.  This patch adds another case for detecting
whether ipv6 packet is truncated or not, by checking the ipv6 header
payload_len and the skb->len.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 16:03:49 -04:00
David Ahern
65a2022e89 net/ipv6: Add fib lookup stubs for use in bpf helper
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00