Go to file
Mel Gorman 4647706ebe mm: always flush VMA ranges affected by zap_page_range
Nadav Amit report zap_page_range only specifies that the caller protect
the VMA list but does not specify whether it is held for read or write
with callers using either.  madvise holds mmap_sem for read meaning that
a parallel zap operation can unmap PTEs which are then potentially
skipped by madvise which potentially returns with stale TLB entries
present.  While the API could be extended, it would be a difficult API
to use.  This patch causes zap_page_range() to always consider flushing
the full affected range.  For small ranges or sparsely populated
mappings, this may result in one additional spurious TLB flush.  For
larger ranges, it is possible that the TLB has already been flushed and
the overhead is negligible.  Either way, this approach is safer overall
and avoids stale entries being present when madvise returns.

This can be illustrated with the following program provided by Nadav
Amit and slightly modified.  With the patch applied, it has an exit code
of 0 indicating a stale TLB entry did not leak to userspace.

---8<---

volatile int sync_step = 0;
volatile char *p;

static inline unsigned long rdtsc()
{
	unsigned long hi, lo;
	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
	 return lo | (hi << 32);
}

static inline void wait_rdtsc(unsigned long cycles)
{
	unsigned long tsc = rdtsc();

	while (rdtsc() - tsc < cycles);
}

void *big_madvise_thread(void *ign)
{
	sync_step = 1;
	while (sync_step != 2);
	madvise((void*)p, PAGE_SIZE * N_PAGES, MADV_DONTNEED);
}

int main(void)
{
	pthread_t aux_thread;

	p = mmap(0, PAGE_SIZE * N_PAGES, PROT_READ|PROT_WRITE,
		 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	memset((void*)p, 8, PAGE_SIZE * N_PAGES);

	pthread_create(&aux_thread, NULL, big_madvise_thread, NULL);
	while (sync_step != 1);

	*p = 8;		// Cache in TLB
	sync_step = 2;
	wait_rdtsc(100000);
	madvise((void*)p, PAGE_SIZE, MADV_DONTNEED);
	printf("data: %d (%s)\n", *p, (*p == 8 ? "stale, broken" : "cleared, fine"));
	return *p == 8 ? -1 : 0;
}
---8<---

Link: http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
arch metag/numa: remove the unused parent_node() macro 2017-09-06 17:27:24 -07:00
block Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
certs modsign: add markers to endif-statements in certs/Makefile 2017-07-14 11:01:37 +10:00
crypto crypto: algif_skcipher - only call put_page on referenced and used pages 2017-08-22 14:45:48 +08:00
Documentation mm, page_alloc: rip out ZONELIST_ORDER_ZONE 2017-09-06 17:27:25 -07:00
drivers zram: add config and doc file for writeback feature 2017-09-06 17:27:25 -07:00
firmware firmware/Makefile: force recompilation if makefile changes 2017-05-08 17:15:10 -07:00
fs ocfs2: clean up some dead code 2017-09-06 17:27:24 -07:00
include mm, memory_hotplug: get rid of zonelists_mutex 2017-09-06 17:27:26 -07:00
init mm, memory_hotplug: drop zone from build_all_zonelists 2017-09-06 17:27:25 -07:00
ipc Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2017-08-21 09:45:19 +02:00
kernel Device properties framework updates for v4.14-rc1 2017-09-05 12:50:00 -07:00
lib Driver core update for 4.14-rc1 2017-09-05 10:41:21 -07:00
mm mm: always flush VMA ranges affected by zap_page_range 2017-09-06 17:27:26 -07:00
net Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2017-09-05 11:54:41 -07:00
samples samples/bpf: fix bpf tunnel cleanup 2017-07-31 22:02:47 -07:00
scripts modpost: simplify sec_name() 2017-09-06 17:27:24 -07:00
security Now that IPC and other changes have landed, enable manual markings for 2017-07-19 08:55:18 -07:00
sound Merge branch 'parisc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2017-09-05 09:37:11 -07:00
tools ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
usr ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk. 2017-07-06 16:24:30 -07:00
virt KVM: update to new mmu_notifier semantic v2 2017-08-31 16:13:00 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support to generate LLVM assembly files 2017-04-25 08:13:52 +09:00
.mailmap power supply and reset changes for the v4.12 series (part 2) 2017-05-12 12:02:21 -07:00
COPYING
CREDITS avr32: remove support for AVR32 architecture 2017-05-01 09:27:15 +02:00
Kbuild kbuild: Consolidate header generation from ASM offset information 2017-04-13 05:43:37 +09:00
Kconfig
MAINTAINERS ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
Makefile Merge branch 'docs-next' of git://git.lwn.net/linux 2017-09-03 21:07:29 -07:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.