linux/net
Daniel Borkmann 52c35befb6 net: sctp: wake up all assocs if sndbuf policy is per socket
SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the
reverse operation. If a sender runs out of wmem, it needs to
wait via sctp_wait_for_sndbuf(), and gets woken up by a call
to __sctp_write_space() mostly via sctp_wfree().

__sctp_write_space() is being called per association. Although
we assign sk->sk_write_space() to sctp_write_space(), which
is then being done per socket, it is only used if send space
is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
is set and therefore not invoked in sock_wfree().

Commit 4c3a5bdae2 ("sctp: Don't charge for data in sndbuf
again when transmitting packet") fixed an issue where in case
sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again
unless it is interrupted by a signal. However, a still
remaining issue is that if net.sctp.sndbuf_policy=0, that is
accounting per socket, and one-to-many sockets are in use,
the reclaimed write space from sctp_wfree() is 'unfairly'
handed back on the server to the association that is the lucky
one to be woken up again via __sctp_write_space(), while
the remaining associations are never be woken up again
(unless by a signal).

The effect disappears with net.sctp.sndbuf_policy=1, that
is wmem accounting per association, as it guarantees a fair
share of wmem among associations.

Therefore, if we have reclaimed memory in case of per socket
accounting, wake all related associations to a socket in a
fair manner, that is, traverse the socket association list
starting from the current neighbour of the association and
issue a __sctp_write_space() to everyone until we end up
waking ourselves. This guarantees that no association is
preferred over another and even if more associations are
taken into the one-to-many session, all receivers will get
messages from the server and are not stalled forever on
high load. This setting still leaves the advantage of per
socket accounting in touch as an association can still use
up global limits if unused by others.

Fixes: 4eb701dfc6 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-08 13:06:07 -04:00
..
9p 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers 2014-02-10 17:48:54 -08:00
802 neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-03-29 18:48:54 -04:00
appletalk appletalk: fix checkpatch error with indent 2014-02-14 16:18:32 -05:00
atm atm: replace del_timer by del_timer_sync 2014-03-27 15:28:06 -04:00
ax25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
batman-adv batman-adv: Start new development cycle 2014-03-22 09:18:59 +01:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-04-02 20:53:45 -07:00
bridge netfilter: Can't fail and free after table replacement 2014-04-05 17:46:22 +02:00
caif net: Include appropriate header file in caif/cfsrvl.c 2014-02-09 17:32:49 -08:00
can can: remove CAN FD compatibility for CAN 2.0 sockets 2014-03-03 14:29:52 +01:00
ceph net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
core netdev: remove potentially harmful checks 2014-04-07 15:52:07 -04:00
dcb dcb: use __dev_get_by_name instead of dev_get_by_name to find interface 2014-01-14 18:50:46 -08:00
dccp dccp: re-enable debug macro 2014-02-16 23:45:00 -05:00
decnet net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa dsa: Use ether_addr_copy 2014-01-21 18:13:05 -08:00
ethernet net: eth_type_trans() should use skb_header_pointer() 2014-01-16 15:30:31 -08:00
hsr hsr: replace del_timer by del_timer_sync 2014-03-27 15:28:06 -04:00
ieee802154 mac802154: make csma/cca parameters per-wpan 2014-04-01 16:25:51 -04:00
ipv4 netfilter: Can't fail and free after table replacement 2014-04-05 17:46:22 +02:00
ipv6 netfilter: Can't fail and free after table replacement 2014-04-05 17:46:22 +02:00
ipx ipx: implement shutdown() 2014-02-12 19:26:32 -05:00
irda net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
iucv af_iucv: recvmsg problem for SOCK_STREAM sockets 2014-03-20 00:06:55 -04:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-03-25 20:29:20 -04:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-04-02 20:53:45 -07:00
lapb
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-21 14:02:04 -04:00
mac802154 mac802154: fix duplicate #include headers 2014-04-07 13:18:44 -04:00
mpls
netfilter netfilter: nf_tables: fix wrong format in request_module() 2014-04-03 23:52:50 +02:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink netlink: autosize skb lengthes 2014-03-10 13:56:26 -04:00
netrom net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
nfc NFC: 3.15: First pull request 2014-03-17 13:16:50 -04:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-03-29 18:48:54 -04:00
packet packet: fix packet_direct_xmit for BQL enabled drivers 2014-04-03 14:29:12 -04:00
phonet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rds rds: prevent dereference of a NULL device in rds_iw_laddr_check 2014-03-31 16:25:52 -04:00
rfkill net: rfkill: move poll work to power efficient workqueue 2014-02-04 21:58:16 +01:00
rose net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rxrpc af_rxrpc: Keep rxrpc_call pointers in a hashtable 2014-03-04 10:36:53 +00:00
sched net-sysfs: expose number of carrier on/off changes 2014-03-31 16:24:52 -04:00
sctp net: sctp: wake up all assocs if sndbuf policy is per socket 2014-04-08 13:06:07 -04:00
sunrpc NFS client bugfixes for Linux 3.14 2014-02-19 12:13:02 -08:00
tipc tipc: Let tipc_release() return 0 2014-04-07 15:08:17 -04:00
unix net: unix: non blocking recvmsg() should not return -EINTR 2014-03-26 17:05:40 -04:00
vmw_vsock net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-21 14:02:04 -04:00
x25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-03-25 20:29:20 -04:00
compat.c net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types 2014-03-06 16:30:45 +01:00
Kconfig net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
Makefile net: move 6lowpan compression code to separate module 2014-01-15 15:36:38 -08:00
nonet.c
socket.c net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
sysctl_net.c