linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-27 04:54:41 +08:00

Author	SHA1	Message	Date
Alexey Gladkov	fa10fed30f	proc: allow to mount many instances of proc in one pid namespace This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allow that we have to modernize procfs internals. 1) The main aim of this work is to have on embedded systems one supervisor for apps. Right now we have some lightweight sandbox support, however if we create pid namespacess we have to manages all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace without being able to notice each other. We only want to use mount namespaces, and we want procfs to behave more like a real mount point. 2) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts. This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not. By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which users can not. Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change LSMs should be able to analyze open/read/write/close... In the new patch set version I removed the 'newinstance' option as suggested by Eric W. Biederman. Selftest has been added to verify new behavior. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2020-04-22 10:51:21 -05:00
Masahiro Yamada	d198b34f38	.gitignore: add SPDX License Identifier Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-03-25 11:50:48 +01:00
Alexey Dobriyan	7dbbade1f2	proc: test /proc/sysvipc vs setns(CLONE_NEWIPC) I thought that /proc/sysvipc has the same bug as /proc/net commit `1fde6f21d9` proc: fix /proc/net/* after setns(2) However, it doesn't! /proc/sysvipc files do get_ipc_ns(current->nsproxy->ipc_ns); in their open() hook and avoid the problem. Keep the test, maybe /proc/sysvipc will become broken someday :-\ Link: http://lkml.kernel.org/r/20190706180146.GA21015@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-16 19:23:21 -07:00
Alexey Dobriyan	e483b02087	proc: test /proc//maps, smaps, smaps_rollup, statm Start testing VM related fiels found in per-process files. Do it by jiting small executable which brings its address space to precisely known state, then comparing /proc//maps, smaps, smaps_rollup, and statm files to expected values. Currently only x86_64 is supported. [adobriyan@gmail.com: exit correctly in /proc/*/maps test] Link: http://lkml.kernel.org/r/20190206073659.GB15311@avx2 Link: http://lkml.kernel.org/r/20190203165806.GA14568@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-05 21:07:22 -08:00
Alexey Dobriyan	1fde6f21d9	proc: fix /proc/net/* after setns(2) /proc entries under /proc/net/* can't be cached into dcache because setns(2) can change current net namespace. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: avoid vim miscolorization] [adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time] Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2 Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2 Fixes: `1da4d377f9` ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: Mateusz Stępień <mateusz.stepien@netrounds.com> Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-02-01 15:46:22 -08:00
Alexey Dobriyan	2cd36fb329	proc: test /proc/thread-self symlink Same story: I have WIP patch to make it faster, so better have a test as well. Link: http://lkml.kernel.org/r/20180627195209.GC18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:45 -07:00
Alexey Dobriyan	61d47c4e71	proc: test /proc/self symlink There are plans to change how /proc/self result is calculated, for that a test is necessary. Use direct system call because of this whole getpid caching story. Link: http://lkml.kernel.org/r/20180627195103.GB18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:45 -07:00
Alexey Dobriyan	b2f5de0334	tools/testing/selftests/proc: test /proc//fd a bit (+ PF_KTHREAD is ABI!) Test lookup in /proc/self/fd. "map_files" lookup story showed that lookup is not that simple. * Test that all those symlinks open the same file. Check with (st_dev, st_info). * Test that kernel threads do not have anything in their /proc//fd/ directory. Now this is where things get interesting. First, kernel threads aren't pinned by /proc/self or equivalent, thus some "atomicity" is required. Second, ->comm can contain whitespace and ')'. No, they are not escaped. Third, the only reliable way to check if process is kernel thread appears to be field #9 in /proc//stat. This field is struct task_struct::flags in decimal! Check is done by testing PF_KTHREAD flags like we do in kernel. PF_KTREAD value is a part of userspace ABI !!! Other methods for determining kernel threadness are not reliable: * RSS can be 0 if everything is swapped, even while reading from /proc/self. * ->total_vm CAN BE ZERO if process is finishing munmap(NULL, whole address space); * /proc/*/maps and similar files can be empty because unmapping everything works. Read returning 0 can't distinguish between kernel thread and such suicide process. Link: http://lkml.kernel.org/r/20180505000414.GA15090@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-06-07 17:34:38 -07:00
Alexey Dobriyan	1f5bd05476	proc: selftests: test /proc/uptime The only tests I could come up with for /proc/uptime are: - test that values increase monotonically for 1 second, - bounce around CPUs and test the same thing. Avoid glibc like plague for affinity given patches like this: https://marc.info/?l=linux-kernel&m=152130031912594&w=4 Link: http://lkml.kernel.org/r/20180317165235.GB3445@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00
Alexey Dobriyan	05c3f29283	proc: selftests: shotgun testing of read/readdir/readlink/write Perform reads with nearly everything in /proc, and some writing as well. Hopefully memleak checkers and KASAN will find something. [adobriyan@gmail.com: /proc/kmsg can and will block if read under root] Link: http://lkml.kernel.org/r/20180316232147.GA20146@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> [adobriyan@gmail.com: /proc/sysrq-trigger lives on the ground floor] Link: http://lkml.kernel.org/r/20180317164911.GA3445@avx2 Link: http://lkml.kernel.org/r/20180315201251.GA12396@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00
Alexey Dobriyan	5de3d401b7	proc: add selftest for last field of /proc/loadavg Test fork counter formerly known as ->last_pid, the only part of /proc/loadavg which can be tested. Testing in init pid namespace is not reliable because of background activity. Link: http://lkml.kernel.org/r/20180311152241.GA26247@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00
Alexey Dobriyan	35318db566	proc: fix /proc/*/map_files lookup some more I totally forgot that _parse_integer() accepts arbitrary amount of leading zeroes leading to the following lookups: OK # readlink /proc/1/map_files/56427ecba000-56427eddc000 /lib/systemd/systemd bogus # readlink /proc/1/map_files/00000000000056427ecba000-56427eddc000 /lib/systemd/systemd # readlink /proc/1/map_files/56427ecba000-00000000000056427eddc000 /lib/systemd/systemd Link: http://lkml.kernel.org/r/20180303215130.GA23480@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00
Alexey Dobriyan	c4219edf1d	proc: test /proc/self/syscall Read from /proc/self/syscall should yield read system call and correct args in the output as current is reading /proc/self/syscall. Link: http://lkml.kernel.org/r/20180226212145.GB742@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00
Alexey Dobriyan	9cd6565558	proc: test /proc/self/wchan This patch starts testing /proc. Many more tests to come (I promise). Read from /proc/self/wchan should always return "0" as current is in TASK_RUNNING state while reading /proc/self/wchan. Link: http://lkml.kernel.org/r/20180226212006.GA742@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:34 -07:00

14 Commits