linux/include
Pavel Emelyanov cca7af3889 skb: Propagate pfmemalloc on skb from head page only
Hi.

I'm trying to send big chunks of memory from application address space via
TCP socket using vmsplice + splice like this

   mem = mmap(128Mb);
   vmsplice(pipe[1], mem); /* splice memory into pipe */
   splice(pipe[0], tcp_socket); /* send it into network */

When I'm lucky and a huge page splices into the pipe and then into the socket
_and_ client and server ends of the TCP connection are on the same host,
communicating via lo, the whole connection gets stuck! The sending queue
becomes full and app stops writing/splicing more into it, but the receiving
queue remains empty, and that's why.

The __skb_fill_page_desc observes a tail page of a huge page and erroneously
propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
contain garbage). Then this skb->pfmemalloc leaks through lo and due to the

    tcp_v4_rcv
    sk_filter
        if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
            return -ENOMEM
        goto release_and_discard;

no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
cloning clones the pfmemalloc flag as well.

That said, here's the proper page->pfmemalloc propagation onto socket: we
must check the huge-page's head page only, other pages' pfmemalloc and mapping
values do not contain what is expected in this place. However, I'm not sure
whether this fix is _complete_, since pfmemalloc propagation via lo also
oesn't look great.

Both, bit propagation from page to skb and this check in sk_filter, were
introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
Mel and stable@ are in Cc.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-14 11:53:32 -04:00
..
acpi ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type 2013-03-04 14:23:40 +01:00
asm-generic ImgTec Meta architecture changes for v3.9-rc1 2013-03-03 12:06:09 -08:00
clocksource ImgTec Meta architecture changes for v3.9-rc1 2013-03-03 12:06:09 -08:00
crypto
drm drm: Documentation typo fixes 2013-03-08 08:32:23 +10:00
keys
linux skb: Propagate pfmemalloc on skb from head page only 2013-03-14 11:53:32 -04:00
math-emu
media [media] media: ov7670: Add possibility to disable pixclk during hblank 2013-02-08 14:35:06 -02:00
memory
misc
net ipv4: fix definition of FIB_TABLE_HASHSZ 2013-03-13 10:47:09 -04:00
pcmcia
ras edac: add support for error type "Info" 2013-02-21 14:16:27 -03:00
rdma IB/core: Add "type 2" memory windows support 2013-02-21 11:51:45 -08:00
rxrpc
scsi SCSI for-linus on 20130301 2013-03-02 11:42:16 -08:00
sound arm-soc: late OMAP changes 2013-02-28 20:00:40 -08:00
target target: Rename spc_get_write_same_sectors -> sbc_get_write_same_sectors 2013-02-23 12:46:14 -08:00
trace Various bug fixes for ext4. The most important is a fix for the new 2013-03-02 19:33:21 -08:00
uapi UAPI disintegration 2012-12-20 2013-03-03 14:24:59 -08:00
video UAPI disintegration 2012-12-20 2013-03-03 14:24:59 -08:00
xen xen: event channel arrays are xen_ulong_t and not unsigned long 2013-02-20 08:45:07 -05:00
Kbuild