korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 08:44:21 +08:00

Kees Cook 7a46ec0e2f locking/refcounts, x86/asm: Implement fast refcount overflow protection

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

  http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

  http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc() after:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
  ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
  ...
  .text.unlikely:
  ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
  ffffffff816c36d7:       0f ff                   (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

  2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:

                     cycles         protections
  atomic_t            82249267387   none
  refcount_t-fast     82211446892   overflow, untested dec-to-zero
  refcount_t-full    144814735193   overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2017-08-17 10:40:26 +02:00
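
The saturation behaviour described in the commit message above can be modelled in portable userspace C. The following is only a rough sketch under stated assumptions, not the kernel implementation: the real code uses an x86 "js" instruction and a UD0 exception handler, whereas this model uses an ordinary branch. The names model_refcount, model_inc(), model_dec_and_test(), model_saturate() and MODEL_SATURATED are hypothetical and exist only for illustration. The rule "report only when the previous value was non-negative" is a simplification of the "first detection" wording in the message.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Saturation value named in the commit message. */
#define MODEL_SATURATED (INT_MIN / 2)

/* Hypothetical userspace stand-in for refcount_t; not the kernel type. */
struct model_refcount {
    atomic_int count;
    int reports;    /* reports emitted; stands in for the report and stack trace */
};

/* Models the error path: pin the counter, report only on first detection. */
static void model_saturate(struct model_refcount *r, bool first_detection)
{
    atomic_store(&r->count, MODEL_SATURATED);
    if (first_detection) {
        r->reports++;
        fprintf(stderr, "model_refcount: negative count, saturating\n");
    }
}

/* Models refcount_inc(): a plain atomic add followed by a sign check. */
static void model_inc(struct model_refcount *r)
{
    /* C11 atomic arithmetic on signed types wraps silently (no UB). */
    int old = atomic_fetch_add(&r->count, 1);

    /* The result is negative if old was INT_MAX (wrapped) or below -1. */
    if (old == INT_MAX || old < -1)
        model_saturate(r, old >= 0);
}

/* Models refcount_dec_and_test(): true only when the count reaches zero. */
static bool model_dec_and_test(struct model_refcount *r)
{
    int old = atomic_fetch_sub(&r->count, 1);

    /* The result is negative if old <= 0, except INT_MIN, which wraps to INT_MAX. */
    if (old <= 0 && old != INT_MIN) {
        model_saturate(r, old >= 0);
        return false;   /* a saturated counter never signals a release */
    }
    return old == 1;
}
```

In this model, a counter initialised to INT_MAX with atomic_init() is pinned to INT_MIN / 2 by the first model_inc() and produces one report. Further increments or decrements leave it pinned without additional reports, and model_dec_and_test() keeps returning false, so the release that would follow a wrap back to zero never fires.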
..
bpf	bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len	2017-07-29 23:29:41 -07:00
cgroup	cpuset: Make nr_cpusets private	2017-08-10 12:28:57 +02:00
configs	config: android-base: disable CONFIG_NFSD and CONFIG_NFS_FS	2017-06-09 11:47:38 +02:00
debug	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>	2017-03-02 08:42:34 +01:00
events	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2017-07-21 11:12:48 -07:00
gcov	gcov: support GCC 7.1	2017-05-12 15:57:15 -07:00
irq	genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"	2017-07-27 15:40:02 +02:00
livepatch	livepatch: Fix stacking of patches with respect to RCU	2017-06-20 10:42:19 +02:00
locking	locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease	2017-08-14 12:52:17 +02:00
power	mm: fix global NR_SLAB_.*CLAIMABLE counter reads	2017-08-10 15:54:06 -07:00
printk	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk	2017-07-05 11:11:26 -07:00
rcu	rcu: Remove RCU CPU stall warnings from Tiny RCU	2017-06-08 18:52:45 -07:00
sched	locking/lockdep: Apply crossrelease to completions	2017-08-10 12:29:10 +02:00
time	timers: Fix overflow in get_next_timer_interrupt	2017-08-01 14:20:53 +02:00
trace	trace: fix the errors caused by incompatible type of RCU variables	2017-07-20 09:27:29 -04:00
.gitignore
acct.c	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>	2017-03-02 08:42:39 +01:00
async.c	async: Adjust system_state checks	2017-05-23 10:01:37 +02:00
audit_fsnotify.c	Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2017-05-03 11:05:15 -07:00
audit_tree.c	Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2017-05-03 11:05:15 -07:00
audit_watch.c	Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2017-05-03 11:05:15 -07:00
audit.c	Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit	2017-07-20 10:22:26 -07:00
audit.h	audit: style fix	2017-06-12 18:07:43 -04:00
auditfilter.c	audit: kernel generated netlink traffic should have a portid of 0	2017-05-02 10:16:05 -04:00
auditsc.c	Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit	2017-07-05 11:24:05 -07:00
backtracetest.c
bounds.c
capability.c	capability: export has_capability	2017-01-12 07:01:56 -07:00
compat.c	Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-06 20:57:13 -07:00
configs.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
context_tracking.c
cpu_pm.c
cpu.c	smp/hotplug: Replace BUG_ON and react useful	2017-07-11 22:25:44 +02:00
crash_core.c	kdump: protect vmcoreinfo data under the crash memory	2017-07-12 16:26:00 -07:00
crash_dump.c
cred.c	doc: ReSTify credentials.txt	2017-05-18 10:30:19 -06:00
delayacct.c	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>	2017-03-02 08:42:39 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c	locking/lockdep: Implement the 'crossrelease' feature	2017-08-10 12:29:07 +02:00
extable.c	lib/extable.c: use bsearch() library function in search_extable()	2017-07-10 16:32:35 -07:00
fork.c	Merge branch 'linus' into locking/core, to resolve conflicts	2017-08-11 13:51:59 +02:00
freezer.c
futex_compat.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
futex.c	Merge branch 'linus' into locking/core, to pick up fixes	2017-08-10 12:20:53 +02:00
groups.c	kernel/groups.c: use sort library function	2017-07-10 16:32:34 -07:00
hung_task.c	kernel/hung_task.c: defer showing held locks	2017-05-08 17:15:10 -07:00
irq_work.c
jump_label.c	jump_label: Provide hotplug context variants	2017-08-10 12:28:59 +02:00
kallsyms.c	kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL)	2017-07-10 16:32:34 -07:00
kcmp.c	kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files	2017-07-12 16:26:01 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks	locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES	2016-10-25 11:31:51 +02:00
Kconfig.preempt
kcov.c	kcov: simplify interrupt check	2017-05-08 17:15:12 -07:00
kexec_core.c	kdump: protect vmcoreinfo data under the crash memory	2017-07-12 16:26:00 -07:00
kexec_file.c	kexec_file: adjust declaration of kexec_purgatory	2017-07-12 16:26:02 -07:00
kexec_internal.h	kexec_file: adjust declaration of kexec_purgatory	2017-07-12 16:26:02 -07:00
kexec.c	kdump: protect vmcoreinfo data under the crash memory	2017-07-12 16:26:00 -07:00
kmod.c	kmod: throttle kmod thread limit	2017-07-14 15:05:13 -07:00
kprobes.c	kprobes: Ensure that jprobe probepoints are at function entry	2017-07-08 11:05:35 +02:00
ksysfs.c	kexec: move vmcoreinfo out of the kernel's .bss section	2017-07-12 16:25:59 -07:00
kthread.c	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups	2017-03-17 10:18:47 -04:00
latencytop.c	sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h>	2017-03-02 08:42:39 +01:00
Makefile	kernel/watchdog: split up config options	2017-07-12 16:26:02 -07:00
membarrier.c	Fix: Disable sys_membarrier when nohz_full is enabled	2017-01-23 11:32:16 -08:00
memremap.c	mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory	2017-07-06 16:24:32 -07:00
module_signing.c
module-internal.h
module.c	Modules updates for v4.13	2017-07-12 17:22:01 -07:00
notifier.c	kernel/notifier.c: simplify expression	2017-02-24 17:46:56 -08:00
nsproxy.c	perf: Add PERF_RECORD_NAMESPACES to include namespaces related info	2017-03-13 15:57:41 -03:00
padata.c	padata: Avoid nested calls to cpus_read_lock() in pcrypt_init_padata()	2017-05-26 10:10:37 +02:00
panic.c	locking/refcounts, x86/asm: Implement fast refcount overflow protection	2017-08-17 10:40:26 +02:00
params.c	boot/param: Move next_arg() function to lib/cmdline.c for later reuse	2017-04-18 10:37:13 +02:00
pid_namespace.c	pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes	2017-05-13 17:26:01 -05:00
pid.c	pid: kill pidhash_size in pidhash_init()	2017-08-02 16:34:46 -07:00
profile.c	sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h>	2017-03-02 08:42:39 +01:00
ptrace.c	ptrace: Properly initialize ptracer_cred on fork	2017-05-23 07:40:44 -05:00
range.c
reboot.c
relay.c	Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-05-02 11:38:06 -07:00
resource.c
seccomp.c	seccomp: Switch from atomic_t to recount_t	2017-06-26 09:24:00 -07:00
signal.c	Fix compat_sys_sigpending breakage	2017-08-06 11:48:27 -07:00
smp.c	smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu()	2017-05-23 10:01:32 +02:00
smpboot.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>	2017-03-02 08:42:35 +01:00
smpboot.h
softirq.c	sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()	2017-04-11 09:06:32 +02:00
stacktrace.c	stacktrace/x86: add function for detecting reliable stack traces	2017-03-08 09:18:02 +01:00
stop_machine.c	stop_machine: Provide stop_machine_cpuslocked()	2017-05-26 10:10:36 +02:00
sys_ni.c	move aio compat to fs/aio.c	2016-12-22 22:58:37 -05:00
sys.c	fix a braino in compat_sys_getrlimit()	2017-07-12 09:15:00 -07:00
sysctl_binary.c	kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning()	2017-07-12 16:26:00 -07:00
sysctl.c	kernel/watchdog: split up config options	2017-07-12 16:26:02 -07:00
task_work.c
taskstats.c	taskstats: add e/u/stime for TGID command	2017-05-08 17:15:12 -07:00
test_kprobes.c
torture.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>	2017-03-02 08:42:27 +01:00
tracepoint.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>	2017-03-02 08:42:35 +01:00
tsacct.c	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>	2017-03-02 08:42:39 +01:00
ucount.c	ucount: Remove the atomicity from ucount->count	2017-03-06 15:26:37 -06:00
uid16.c	sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>	2017-03-02 08:42:31 +01:00
up.c
user_namespace.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>	2017-03-02 08:42:29 +01:00
user-return-notifier.c
user.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h>	2017-03-02 08:42:29 +01:00
utsname_sysctl.c	sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>	2017-03-03 01:45:36 +01:00
utsname.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
watchdog_hld.c	kernel/watchdog: split up config options	2017-07-12 16:26:02 -07:00
watchdog.c	kernel/watchdog.c: use better pr_fmt prefix	2017-07-14 15:05:13 -07:00
workqueue_internal.h
workqueue.c	locking/lockdep: Implement the 'crossrelease' feature	2017-08-10 12:29:07 +02:00