linux/arch/x86/mm
Dave Hansen 1de4fa14ee x86, mpx: Cleanup unused bound tables
The previous patch allocates bounds tables on-demand.  As noted in
an earlier description, these can add up to *HUGE* amounts of
memory.  This has caused OOMs in practice when running tests.

This patch adds support for freeing bounds tables when they are no
longer in use.

There are two types of mappings in play when unmapping tables:
 1. The mapping with the actual data, which userspace is
    munmap()ing or brk()ing away, etc...
 2. The mapping for the bounds table *backing* the data
    (is tagged with VM_MPX, see the patch "add MPX specific
    mmap interface").

If userspace use the prctl() indroduced earlier in this patchset
to enable the management of bounds tables in kernel, when it
unmaps the first type of mapping with the actual data, the kernel
needs to free the mapping for the bounds table backing the data.
This patch hooks in at the very end of do_unmap() to do so.
We look at the addresses being unmapped and find the bounds
directory entries and tables which cover those addresses.  If
an entire table is unused, we clear associated directory entry
and free the table.

Once we unmap the bounds table, we would have a bounds directory
entry pointing at empty address space. That address space might
now be allocated for some other (random) use, and the MPX
hardware might now try to walk it as if it were a bounds table.
That would be bad.  So any unmapping of an enture bounds table
has to be accompanied by a corresponding write to the bounds
directory entry to invalidate it.  That write to the bounds
directory can fault, which causes the following problem:

Since we are doing the freeing from munmap() (and other paths
like it), we hold mmap_sem for write. If we fault, the page
fault handler will attempt to acquire mmap_sem for read and
we will deadlock.  To avoid the deadlock, we pagefault_disable()
when touching the bounds directory entry and use a
get_user_pages() to resolve the fault.

The unmapping of bounds tables happends under vm_munmap().  We
also (indirectly) call vm_munmap() to _do_ the unmapping of the
bounds tables.  We avoid unbounded recursion by disallowing
freeing of bounds tables *for* bounds tables.  This would not
occur normally, so should not have any practical impact.  Being
strict about it here helps ensure that we do not have an
exploitable stack overflow.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:54 +01:00
..
kmemcheck x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
amdtopology.c x86/mm/numa: Simplify some bit mangling 2013-04-10 19:06:26 +02:00
dump_pagetables.c x86-64, ptdump: Mark espfix area only if existent 2014-09-08 11:57:34 -07:00
extable.c x86, extable: Switch to relative exception table entries 2012-04-20 17:22:34 -07:00
fault.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-14 02:22:41 +02:00
gup.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
highmem_32.c mm: accurately calculate zone->managed_pages for highmem zones 2013-07-03 16:07:33 -07:00
hugetlbpage.c hugetlb: restrict hugepage_migration_support() to x86_64 2014-06-04 16:53:51 -07:00
init_32.c x86: remove the Xen-specific _PAGE_IOMAP PTE flag 2014-09-23 13:36:20 +00:00
init_64.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-14 02:22:41 +02:00
init.c x86/mm: Add tracepoints for TLB flushes 2014-07-31 08:48:51 -07:00
iomap_32.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
ioremap.c x86: use optimized ioresource lookup in ioremap function 2014-10-14 02:18:22 +02:00
kmmio.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
Makefile x86, mpx: Add MPX-specific mmap interface 2014-11-18 00:58:53 +01:00
memtest.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
mm_internal.h x86, mm: Move after_bootmem to mm_internel.h 2012-11-17 11:59:45 -08:00
mmap.c x86/mm: Apply the section attribute to the variable, not its type 2014-09-09 07:13:39 +02:00
mmio-mod.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
mpx.c x86, mpx: Cleanup unused bound tables 2014-11-18 00:58:54 +01:00
numa_32.c x86: Fix the initialization of physnode_map 2014-02-01 22:15:51 -08:00
numa_64.c x86, mm: kill numa_free_all_bootmem() 2012-11-17 11:59:47 -08:00
numa_emulation.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_internal.h x86-32, mm: Rip out x86_32 NUMA remapping code 2013-01-31 14:12:30 -08:00
numa.c Merge branch 'akpm' (patches from Andrew Morton) 2014-10-14 03:54:50 +02:00
pageattr-test.c x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels 2014-06-04 16:53:55 -07:00
pageattr.c x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE 2014-10-29 10:57:21 +01:00
pat_internal.h x86, pat: Fix memory leak in free_memtype 2010-05-26 11:26:04 -07:00
pat_rbtree.c rbtree: move augmented rbtree functionality to rbtree_augmented.h 2012-10-09 16:22:40 +09:00
pat.c x86: Do not try to sync identity map for non-mapped pages 2013-03-07 13:23:28 -08:00
pf_in.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
pf_in.h
pgtable_32.c x86: Remove set_pmd_pfn 2014-09-01 10:15:31 +02:00
pgtable.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
physaddr.c x86, mm: Make DEBUG_VIRTUAL work earlier in boot 2013-01-25 16:33:22 -08:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
srat.c x86/mm: Avoid duplicated pxm_to_node() calls 2014-02-09 15:32:31 +01:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86/mm: Fix sparse 'tlb_single_page_flush_ceiling' warning and make the variable read-mostly 2014-08-10 09:07:18 +02:00