linux/include
David Gibson 90481622d7 hugepages: fix use after free bug in "quota" handling
hugetlbfs_{get,put}_quota() are badly named.  They don't interact with the
general quota handling code, and they don't much resemble its behaviour.
Rather than being about maintaining limits on on-disk block usage by
particular users, they are instead about maintaining limits on in-memory
page usage (including anonymous MAP_PRIVATE copied-on-write pages)
associated with a particular hugetlbfs filesystem instance.

Worse, they work by having callbacks to the hugetlbfs filesystem code from
the low-level page handling code, in particular from free_huge_page().
This is a layering violation of itself, but more importantly, if the
kernel does a get_user_pages() on hugepages (which can happen from KVM
amongst others), then the free_huge_page() can be delayed until after the
associated inode has already been freed.  If an unmount occurs at the
wrong time, even the hugetlbfs superblock where the "quota" limits are
stored may have been freed.

Andrew Barry proposed a patch to fix this by having hugepages, instead of
storing a pointer to their address_space and reaching the superblock from
there, had the hugepages store pointers directly to the superblock,
bumping the reference count as appropriate to avoid it being freed.
Andrew Morton rejected that version, however, on the grounds that it made
the existing layering violation worse.

This is a reworked version of Andrew's patch, which removes the extra, and
some of the existing, layering violation.  It works by introducing the
concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
finite logical pool of hugepages to allocate from.  hugetlbfs now creates
a subpool for each filesystem instance with a page limit set, and a
pointer to the subpool gets added to each allocated hugepage, instead of
the address_space pointer used now.  The subpool has its own lifetime and
is only freed once all pages in it _and_ all other references to it (i.e.
superblocks) are gone.

subpools are optional - a NULL subpool pointer is taken by the code to
mean that no subpool limits are in effect.

Previous discussion of this bug found in:  "Fix refcounting in hugetlbfs
quota handling.". See:  https://lkml.org/lkml/2011/8/11/28 or
http://marc.info/?l=linux-mm&m=126928970510627&w=1

v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
alloc_huge_page() - since it already takes the vma, it is not necessary.

Signed-off-by: Andrew Barry <abarry@cray.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
..
acpi Merge 3.3-rc2 into the driver-core-next branch. 2012-02-02 11:24:44 -08:00
asm-generic mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
crypto crypto: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:16 +08:00
drm drm/exynos: exynos_drm.h header file fixes 2012-02-15 10:29:12 +09:00
keys keys: add a "logon" key type 2012-01-17 22:39:40 -06:00
linux hugepages: fix use after free bug in "quota" handling 2012-03-21 17:54:59 -07:00
math-emu
media Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-01-18 12:53:54 -08:00
misc
mtd mtd: document that MEMWRITE ioctl is NAND-specific 2012-01-09 18:18:36 +00:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-03-20 21:04:47 -07:00
pcmcia
rdma Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next 2012-03-19 09:50:33 -07:00
rxrpc
scsi SCSI & usb-storage: add flags for VPD pages and REPORT LUNS 2012-02-08 17:36:41 -08:00
sound Merge branch 'fix/asoc' into for-linus 2012-01-31 15:13:14 +01:00
target target: Change target_submit_cmd() to return void 2012-02-07 06:41:04 +00:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-03-20 21:12:50 -07:00
video fbdev fixes for 3.3 2012-02-07 15:54:02 -08:00
xen Merge branch 'for-3.3/drivers' of git://git.kernel.dk/linux-block 2012-01-15 12:48:41 -08:00
Kbuild