linux/kernel
Nicolas Viennot ebd6de6812 prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
Originally, only a local CAP_SYS_ADMIN could change the exe link,
making it difficult for doing checkpoint/restore without CAP_SYS_ADMIN.
This commit adds CAP_CHECKPOINT_RESTORE in addition to CAP_SYS_ADMIN
for permitting changing the exe link.

The following describes the history of the /proc/self/exe permission
checks as it may be difficult to understand what decisions lead to this
point.

* [1] May 2012: This commit introduces the ability of changing
  /proc/self/exe if the user is CAP_SYS_RESOURCE capable.
  In the related discussion [2], no clear thread model is presented for
  what could happen if the /proc/self/exe changes multiple times, or why
  would the admin be at the mercy of userspace.

* [3] Oct 2014: This commit introduces a new API to change
  /proc/self/exe. The permission no longer checks for CAP_SYS_RESOURCE,
  but instead checks if the current user is root (uid=0) in its local
  namespace. In the related discussion [4] it is said that "Controlling
  exe_fd without privileges may turn out to be dangerous. At least
  things like tomoyo examine it for making policy decisions (see
  tomoyo_manager())."

* [5] Dec 2016: This commit removes the restriction to change
  /proc/self/exe at most once. The related discussion [6] informs that
  the audit subsystem relies on the exe symlink, presumably
  audit_log_d_path_exe() in kernel/audit.c.

* [7] May 2017: This commit changed the check from uid==0 to local
  CAP_SYS_ADMIN. No discussion.

* [8] July 2020: A PoC to spoof any program's /proc/self/exe via ptrace
  is demonstrated

Overall, the concrete points that were made to retain capability checks
around changing the exe symlink is that tomoyo_manager() and
audit_log_d_path_exe() uses the exe_file path.

Christian Brauner said that relying on /proc/<pid>/exe being immutable (or
guarded by caps) in a sake of security is a bit misleading. It can only
be used as a hint without any guarantees of what code is being executed
once execve() returns to userspace. Christian suggested that in the
future, we could call audit_log() or similar to inform the admin of all
exe link changes, instead of attempting to provide security guarantees
via permission checks. However, this proposed change requires the
understanding of the security implications in the tomoyo/audit subsystems.

[1] b32dfe3771 ("c/r: prctl: add ability to set new mm_struct::exe_file")
[2] https://lore.kernel.org/patchwork/patch/292515/
[3] f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
[4] https://lore.kernel.org/patchwork/patch/479359/
[5] 3fb4afd9a5 ("prctl: remove one-shot limitation for changing exe link")
[6] https://lore.kernel.org/patchwork/patch/697304/
[7] 4d28df6152 ("prctl: Allow local CAP_SYS_ADMIN changing exe_file")
[8] https://github.com/nviennot/run_as_exe

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-6-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-19 20:14:42 +02:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-10 18:16:22 -07:00
cgroup cgroup: fix cgroup_sk_alloc() for sk_clone_lock() 2020-07-07 13:34:11 -07:00
configs compiler: remove CONFIG_OPTIMIZE_INLINING entirely 2020-04-07 10:43:42 -07:00
debug kgdb: enable arch to support XML packet. 2020-07-09 20:09:28 -07:00
dma Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-10 18:16:22 -07:00
events Merge branch 'akpm' (patches from Andrew) 2020-06-09 09:54:46 -07:00
gcov treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
irq treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
kcsan kcsan: Support distinguishing volatile accesses 2020-06-11 20:04:01 +02:00
livepatch livepatch: Make klp_apply_object_relocs static 2020-05-11 00:31:38 +02:00
locking The X86 entry, exception and interrupt code rework 2020-06-13 10:05:47 -07:00
power treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
printk Revert "kernel/printk: add kmsg SEEK_CUR handling" 2020-06-21 20:47:20 -07:00
rcu A single fix for a printk format warning in RCU. 2020-07-05 12:21:28 -07:00
sched Peter Zijlstra says: 2020-06-28 10:37:39 -07:00
time The X86 entry, exception and interrupt code rework 2020-06-13 10:05:47 -07:00
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-25 18:27:40 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acct.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
async.c
audit_fsnotify.c fsnotify: use helpers to access data by data_type 2020-03-23 18:19:06 +01:00
audit_tree.c
audit_watch.c \n 2020-04-06 08:58:42 -07:00
audit.c audit/stable-5.8 PR 20200601 2020-06-02 17:13:37 -07:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-04-22 15:23:10 -04:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-04-22 15:23:10 -04:00
auditsc.c audit: add subj creds to NETFILTER_CFG record to 2020-05-20 18:09:19 -04:00
backtracetest.c
bounds.c
capability.c
compat.c uaccess: Selectively open read or write user access 2020-05-01 12:35:21 +10:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-05-15 11:44:34 -07:00
cpu.c The changes in this cycle are: 2020-06-03 13:06:42 -07:00
crash_core.c
crash_dump.c crash_dump: Remove no longer used saved_max_pfn 2020-04-15 11:21:54 +02:00
cred.c exec: Teach prepare_exec_creds how exec treats uids & gids 2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c
fork.c fork: annotate data race in copy_process() 2020-06-26 01:05:29 +02:00
freezer.c
futex.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c mm: remove the pgprot argument to __vmalloc 2020-06-02 10:59:11 -07:00
hung_task.c kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected 2020-06-08 11:05:56 -07:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-08 15:59:57 -07:00
kcmp.c kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve 2020-03-25 10:04:01 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: check kcov_softirq in kcov_remote_stop() 2020-06-10 19:14:17 -07:00
kexec_core.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_elf.c
kexec_file.c kexec: do not verify the signature without the lockdown or mandatory signature 2020-06-26 00:27:36 -07:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-10 15:36:22 -07:00
kprobes.c kprobes: Do not expose probe addresses to non-CAP_SYSLOG 2020-07-08 16:00:22 -07:00
ksysfs.c
kthread.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
Makefile Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
module_signature.c
module_signing.c
module-internal.h
module.c Refactor kallsyms_show_value() users for correct cred 2020-07-09 13:09:30 -07:00
notifier.c mm: remove vmalloc_sync_(un)mappings() 2020-06-02 10:59:12 -07:00
nsproxy.c nsproxy: restore EINVAL for non-namespace file descriptor 2020-06-17 00:33:12 +02:00
padata.c padata: upgrade smp_mb__after_atomic to smp_mb in padata_do_serial 2020-06-18 17:09:54 +10:00
panic.c bug: Annotate WARN/BUG/stackfail as noinstr safe 2020-06-11 15:14:36 +02:00
params.c
pid_namespace.c pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid 2020-07-19 20:14:42 +02:00
pid.c pid: use checkpoint_restore_ns_capable() for set_tid 2020-07-19 20:14:42 +02:00
profile.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-18 13:51:39 +01:00
range.c
reboot.c printk: Collapse shutdown types into a single dump reason 2020-05-30 10:34:03 -07:00
relay.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
resource.c /dev/mem: Revoke mappings when a driver claims the region 2020-05-27 11:10:05 +02:00
rseq.c
scs.c scs: Report SCS usage in bytes rather than number of entries 2020-06-04 16:14:56 +01:00
seccomp.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
signal.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
smp.c smp, irq_work: Continue smp_call_function*() and irq_work*() integration 2020-06-28 17:01:20 +02:00
smpboot.c
smpboot.h
softirq.c x86/entry: Clarify irq_{enter,exit}_rcu() 2020-06-11 15:15:24 +02:00
stackleak.c
stacktrace.c
stop_machine.c stop_machine: Make stop_cpus() static 2020-01-17 10:19:21 +01:00
sys_ni.c
sys.c prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe 2020-07-19 20:14:42 +02:00
sysctl_binary.c
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted 2020-06-08 11:05:56 -07:00
task_work.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
taskstats.c
test_kprobes.c
torture.c CPU (hotplug) updates: 2020-03-30 18:06:39 -07:00
tracepoint.c
tsacct.c
ucount.c ucount: Make sure ucounts in /proc/sys/user don't regress again 2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
up.c smp/up: Make smp_call_function_single() match SMP semantics 2020-02-07 15:34:12 +01:00
user_namespace.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
watch_queue.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
watchdog_hld.c
watchdog.c kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases 2020-06-08 11:05:56 -07:00
workqueue_internal.h
workqueue.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00