linux/arch/x86/mm
Dave Hansen 62b5f7d013 mm/core, x86/mm/pkeys: Add execute-only protection keys support
Protection keys provide new page-based protection in hardware.
But, they have an interesting attribute: they only affect data
accesses and never affect instruction fetches.  That means that
if we set up some memory which is set as "access-disabled" via
protection keys, we can still execute from it.

This patch uses protection keys to set up mappings to do just that.
If a user calls:

	mmap(..., PROT_EXEC);
or
	mprotect(ptr, sz, PROT_EXEC);

(note PROT_EXEC-only without PROT_READ/WRITE), the kernel will
notice this, and set a special protection key on the memory.  It
also sets the appropriate bits in the Protection Keys User Rights
(PKRU) register so that the memory becomes unreadable and
unwritable.

I haven't found any userspace that does this today.  With this
facility in place, we expect userspace to move to use it
eventually.  Userspace _could_ start doing this today.  Any
PROT_EXEC calls get converted to PROT_READ inside the kernel, and
would transparently be upgraded to "true" PROT_EXEC with this
code.  IOW, userspace never has to do any PROT_EXEC runtime
detection.

This feature provides enhanced protection against leaking
executable memory contents.  This helps thwart attacks which are
attempting to find ROP gadgets on the fly.

But, the security provided by this approach is not comprehensive.
The PKRU register which controls access permissions is a normal
user register writable from unprivileged userspace.  An attacker
who can execute the 'wrpkru' instruction can easily disable the
protection provided by this feature.

The protection key that is used for execute-only support is
permanently dedicated at compile time.  This is fine for now
because there is currently no API to set a protection key other
than this one.

Despite there being a constant PKRU value across the entire
system, we do not set it unless this feature is in use in a
process.  That is to preserve the PKRU XSAVE 'init state',
which can lead to faster context switches.

PKRU *is* a user register and the kernel is modifying it.  That
means that code doing:

	pkru = rdpkru()
	pkru |= 0x100;
	mmap(..., PROT_EXEC);
	wrpkru(pkru);

could lose the bits in PKRU that enforce execute-only
permissions.  To avoid this, we suggest avoiding ever calling
mmap() or mprotect() when the PKRU value is expected to be
unstable.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: keescook@google.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210240.CB4BB5CA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:33 +01:00
..
kmemcheck x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
amdtopology.c
debug_pagetables.c x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only 2015-12-04 12:55:01 +01:00
dump_pagetables.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-11 17:16:01 -08:00
extable.c
fault.c mm/core, x86/mm/pkeys: Add execute-only protection keys support 2016-02-18 19:46:33 +01:00
gup.c mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys 2016-02-18 09:32:44 +01:00
highmem_32.c kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
hugetlbpage.c mm, hugetlb: don't require CMA for runtime gigantic pages 2016-02-05 18:10:40 -08:00
init_32.c x86/mm: Make kmap_prot into a #define 2016-01-20 11:39:14 +01:00
init_64.c Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix 2016-02-08 12:13:22 +01:00
init.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
iomap_32.c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 17:59:09 -07:00
ioremap.c x86/mm: Drop WARN from multi-BAR check 2015-12-29 12:34:38 +01:00
kasan_init_64.c x86/kasan: Write protect kasan zero shadow 2016-02-09 13:33:14 +01:00
kmmio.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
Makefile mm/core, x86/mm/pkeys: Add execute-only protection keys support 2016-02-18 19:46:33 +01:00
mm_internal.h x86: Enable PAT to use cache mode translation tables 2014-11-16 11:04:26 +01:00
mmap.c x86: mm: support ARCH_MMAP_RND_BITS 2016-01-14 16:00:49 -08:00
mmio-mod.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
mpx.c mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm 2016-02-16 10:11:12 +01:00
numa_32.c x86: Fix the initialization of physnode_map 2014-02-01 22:15:51 -08:00
numa_64.c
numa_emulation.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_internal.h
numa.c x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug() 2016-02-08 12:14:55 +01:00
pageattr-test.c x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular 2015-08-25 09:48:38 +02:00
pageattr.c x86/mm/pat: Avoid truncation when converting cpa->numpages to address 2016-01-29 15:03:09 +01:00
pat_internal.h x86/mm/pat: Convert to pr_*() usage 2015-05-27 14:40:59 +02:00
pat_rbtree.c x86/mm/pat: Change free_memtype() to support shrinking case 2016-01-05 11:10:23 +01:00
pat.c x86/mm: Honour passed pgprot in track_pfn_insert() and track_pfn_remap() 2016-02-09 15:25:36 +01:00
pf_in.c
pf_in.h
pgtable_32.c x86: Remove set_pmd_pfn 2014-09-01 10:15:31 +02:00
pgtable.c x86, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
physaddr.c
physaddr.h
pkeys.c mm/core, x86/mm/pkeys: Add execute-only protection keys support 2016-02-18 19:46:33 +01:00
setup_nx.c Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeys 2016-02-16 09:37:37 +01:00
srat.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
testmmiotrace.c
tlb.c x86/mm: Add barriers and document switch_mm()-vs-flush synchronization 2016-01-11 12:03:15 +01:00