linux/net/core
Michal Hocko 2f064f3485 mm: make page pfmemalloc check more robust
Commit c48a11c7ad ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone.  Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21 14:30:10 -07:00
..
datagram.c net: Fix skb_set_peeked use-after-free bug 2015-08-06 21:55:47 -07:00
dev_addr_lists.c net: fix spelling for synchronized 2014-11-18 15:26:32 -05:00
dev_ioctl.c dev_ioctl: use sizeof(x) instead of sizeof x 2014-11-18 15:27:32 -05:00
dev.c net: call rcu_read_lock early in process_backlog 2015-07-10 18:16:36 -07:00
drop_monitor.c net: Replace get_cpu_var through this_cpu_ptr 2014-08-26 13:45:47 -04:00
dst.c net: ratelimit warnings about dst entry refcount underflow or overflow 2015-07-21 00:11:19 -07:00
ethtool.c net/ethtool: Add current supported tunable options 2015-06-11 00:36:37 -07:00
fib_rules.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-04-06 22:34:15 -04:00
filter.c bpf: disallow bpf tc programs access current->pid,uid 2015-06-15 20:51:20 -07:00
flow_dissector.c flow_dissector: Pre-initialize ip_proto in __skb_flow_dissect() 2015-06-28 16:53:54 -07:00
flow.c flowcache: Fix kernel panic in flow_cache_flush_task 2015-02-05 14:38:53 -08:00
gen_estimator.c net_sched: gen_estimator: extend pps limit 2015-07-08 13:59:20 -07:00
gen_stats.c gen_stats.c: Duplicate xstats buffer for later use 2015-02-19 15:45:53 -05:00
link_watch.c dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
Makefile net: bury net/core/iovec.c - nothing in there is used anymore 2015-02-04 01:34:15 -05:00
neighbour.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
net_namespace.c netns: make nsid_lock per net 2015-05-17 23:41:11 -04:00
net-procfs.c rps: selective flow shedding during softnet overflow 2013-05-20 13:48:04 -07:00
net-sysfs.c switchdev: don't use anonymous union on switchdev attr/obj structs 2015-05-13 14:20:59 -04:00
net-sysfs.h net: netdev_kobject_init: annotate with __init 2014-01-05 20:27:54 -05:00
net-traces.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netclassid_cgroup.c cgroup: net_cls: fix false-positive "suspicious RCU usage" 2015-07-25 00:13:18 -07:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
netprio_cgroup.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
pktgen.c net: pktgen: don't abuse current->state in pktgen_thread_worker() 2015-08-06 23:52:44 -07:00
ptp_classifier.c net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
request_sock.c inet: fix races with reqsk timers 2015-08-10 21:17:29 -07:00
rtnetlink.c rtnetlink: reject non-IFLA_VF_PORT attributes inside IFLA_VF_PORTS 2015-07-15 15:53:27 -07:00
scm.c net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
secure_seq.c net: remove a sparse error in secure_dccpv6_sequence_number() 2015-05-25 22:55:37 -04:00
skbuff.c mm: make page pfmemalloc check more robust 2015-08-21 14:30:10 -07:00
sock_diag.c sock_diag: define destruction multicast groups 2015-06-15 19:49:22 -07:00
sock.c net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket 2015-07-30 15:59:12 -07:00
stream.c tcp: set SOCK_NOSPACE under memory pressure 2015-05-09 17:38:36 -04:00
sysctl_net_core.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
timestamping.c net-timestamp: Make the clone operation stand-alone from phy timestamping 2014-09-05 17:43:45 -07:00
tso.c net: tso: fix unaligned access to crafted TCP header in helper API 2014-10-22 12:52:55 -04:00
utils.c net: fix inet_proto_csum_replace4() sparse errors 2015-05-25 22:56:47 -04:00