linux/net
Florian Westphal 7223ecd466 netfilter: nat: switch to new rhlist interface
I got offlist bug report about failing connections and high cpu usage.
This happens because we hit 'elasticity' checks in rhashtable that
refuses bucket list exceeding 16 entries.

The nat bysrc hash unfortunately needs to insert distinct objects that
share same key and are identical (have same source tuple), this cannot
be avoided.

Switch to the rhlist interface which is designed for this.

The nulls_base is removed here, I don't think its needed:

A (unlikely) false positive results in unneeded port clash resolution,
a false negative results in packet drop during conntrack confirmation,
when we try to insert the duplicate into main conntrack hash table.

Tested by adding multiple ip addresses to host, then adding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

... and then creating multiple connections, from same source port but
different addresses:

for i in $(seq 2000 2032);do nc -p 1234 192.168.7.1 $i > /dev/null  & done

(all of these then get hashed to same bysource slot)

Then, to test that nat conflict resultion is working:

nc -s 10.0.0.1 -p 1234 192.168.7.1 2000
nc -s 10.0.0.2 -p 1234 192.168.7.1 2000

tcp  .. src=10.0.0.1 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1024 [ASSURED]
tcp  .. src=10.0.0.2 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1025 [ASSURED]
tcp  .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2000 src=192.168.7.1 dst=192.168.7.10 sport=2000 dport=1234 [ASSURED]
tcp  .. src=192.168.7.10 dst=192.168.7.1 sport=1234 dport=2001 src=192.168.7.1 dst=192.168.7.10 sport=2001 dport=1234 [ASSURED]
[..]

-> nat altered source ports to 1024 and 1025, respectively.
This can also be confirmed on destination host which shows
ESTAB      0      0   192.168.7.1:2000      192.168.7.10:1024
ESTAB      0      0   192.168.7.1:2000      192.168.7.10:1025
ESTAB      0      0   192.168.7.1:2000      192.168.7.10:1234

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 870190a9ec ("netfilter: nat: convert nat bysrc hash to rhashtable")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24 14:43:34 +01:00
..
6lowpan 6lowpan: ndisc: no overreact if no short address is available 2016-09-19 20:19:34 +02:00
9p IB/core: add support to create a unsafe global rkey to ib_create_pd 2016-09-23 13:47:44 -04:00
802
8021q net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
appletalk appletalk: use IS_ENABLED() instead of checking for built-in or module 2016-09-10 21:19:10 -07:00
atm lec: use IS_ENABLED() instead of checking for built-in or module 2016-09-10 21:19:10 -07:00
ax25
batman-adv batman-adv: Detect missing primaryif during tp_send as error 2016-11-04 12:27:39 +01:00
bluetooth Bluetooth: Fix append max 11 bytes of name to scan rsp data 2016-10-19 18:42:37 +02:00
bridge bridge: multicast: restore perm router ports on multicast enable 2016-10-18 13:52:13 -04:00
caif
can can: bcm: fix warning in bcm_connect/proc_register 2016-10-31 20:48:19 +01:00
ceph libceph: initialize last_linger_id with a large integer 2016-11-10 20:13:08 +01:00
core rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit() 2016-11-23 20:18:36 -05:00
dcb
dccp ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped 2016-11-03 16:50:27 -04:00
decnet net: fix decnet rtnexthop parsing 2016-07-05 14:08:47 -07:00
dns_resolver
dsa net: dsa: add port fast ageing 2016-09-23 08:38:50 -04:00
ethernet net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
hsr net/hsr: Remove unused but set variable 2016-10-18 10:28:18 -04:00
ieee802154 ieee802154: 6lowpan: fix intra pan id check 2016-07-08 13:23:12 +02:00
ipv4 netfilter: Update ip_route_me_harder to consider L3 domain 2016-11-24 12:44:36 +01:00
ipv6 netfilter: Update nf_send_reset6 to consider L3 domain 2016-11-24 12:47:08 +01:00
ipx
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
iucv Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-07-29 17:38:46 -07:00
kcm Merge branch 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-07 15:36:58 -07:00
key
l2tp net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit" 2016-11-23 20:18:36 -05:00
l3mdev net: ipv6: Remove l3mdev_get_saddr6 2016-09-10 23:12:53 -07:00
lapb
llc llc: switch type to bool as the timeout is only tested versus 0 2016-09-17 10:05:05 -04:00
mac80211 mac80211: fix A-MSDU aggregation with fast-xmit + txq 2016-11-15 14:37:30 +01:00
mac802154 mac802154: use rate limited warnings for malformed frames 2016-09-19 20:19:34 +02:00
mpls mpls: move mpls_hdr to a common location 2016-10-03 02:00:21 -04:00
ncsi net/ncsi: Improve HNCDSC AEN handler 2016-10-20 11:23:08 -04:00
netfilter netfilter: nat: switch to new rhlist interface 2016-11-24 14:43:34 +01:00
netlabel
netlink genetlink: fix a memory leak on error path 2016-11-03 16:52:29 -04:00
netrom
nfc NFC: digital: Fix RTOX supervisor PDU handling 2016-07-11 02:02:03 +02:00
openvswitch openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev 2016-10-13 10:03:23 -04:00
packet packet: on direct_xmit, limit tso and csum to supported devices 2016-10-29 15:02:15 -04:00
phonet
qrtr
rds rds: debug messages are enabled by default 2016-10-29 15:55:57 -04:00
rfkill
rose rose: limit sk_filter trim to payload 2016-07-13 11:53:40 -07:00
rxrpc rxrpc: Fix checking of error from ip6_route_output() 2016-10-13 08:43:17 +01:00
sched net sched filters: pass netlink message flags in event notification 2016-11-17 13:42:12 -05:00
sctp sctp: change sk state only when it has assocs in sctp_shutdown 2016-11-14 16:22:33 -05:00
strparser strparser: Propagate correct error code in strp_recv() 2016-10-12 01:51:49 -04:00
sunrpc One fix for an NFS/RDMA crash. 2016-11-18 16:32:21 -08:00
switchdev switchdev: Execute bridge ndos only for bridge ports 2016-10-19 10:58:04 -04:00
tipc tipc: eliminate obsolete socket locking policy description 2016-11-19 22:15:41 -05:00
unix af_unix: conditionally use freezable blocking calls in read 2016-11-18 13:58:39 -05:00
vmw_vsock VSOCK: Don't dec ack backlog twice for rejected connections 2016-09-27 07:59:25 -04:00
wimax
wireless cfg80211: limit scan results cache size 2016-11-18 08:44:44 +01:00
x25 net: x25: remove null checks on arrays calling_ae and called_ae 2016-09-09 18:13:30 -07:00
xfrm proc: Reduce cache miss in xfrm_statistics_seq_show 2016-09-30 01:50:45 -04:00
compat.c
Kconfig strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
Makefile strparser: Stream parser for messages 2016-08-17 19:36:23 -04:00
socket.c xattr: Fix setting security xattrs on sockfs 2016-11-17 00:00:23 -05:00
sysctl_net.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00