systemd

mirror of https://github.com/systemd/systemd.git synced 2024-11-24 10:43:35 +08:00

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	fb6692ed33	Merge pull request #11927 from poettering/network-namespace-path Add NetworkNamespacePath= to unit files	2019-03-12 14:29:14 +01:00
Lennart Poettering	8df87b4383	man: document that ProtectHostname= disables hostname change notifications	2019-03-08 15:49:10 +01:00
Lennart Poettering	4107452e51	man: document NetworkNamespacePath=	2019-03-07 21:27:02 +01:00
Lennart Poettering	eb5149ba74	Merge pull request #11682 from topimiettinen/private-utsname core: ProtectHostname feature	2019-02-20 14:12:15 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Lennart Poettering	dcf3c3c3d9	core: export $PIDFILE env var for services, derived from PIDFile=	2019-02-15 11:32:19 +01:00
Zbigniew Jędrzejewski-Szmek	e0e2ecd5a8	man: move entries to the right section in systemd.directives They were in "miscellaneuos" because of the missing class= assignment. Probably introduced when the split into sections was done.	2019-02-13 11:17:41 +01:00
Yu Watanabe	d1698b82e6	man: add referecne to systemd-system.conf	2019-02-01 12:31:51 +01:00
Yu Watanabe	68d838f71d	man: fix volume num of journalctl	2019-02-01 12:30:36 +01:00
Topi Miettinen	10d44e72ec	Document weaknesses with MDWE and suggest hardening Closes #11473	2019-01-21 11:37:46 +01:00
Philip Withnall	35f2c0ba6a	man: Fix a typo in systemd.exec.xml Signed-off-by: Philip Withnall <withnall@endlessm.com>	2019-01-16 21:33:38 +09:00
Alex Mayer	8d7fac92f0	Docs: Add Missing Space Between Words	2019-01-03 03:07:50 +09:00
Zbigniew Jędrzejewski-Szmek	0b57803630	man: add note about systemd-vconsole-setup.service and tty as input/output Closes #10019.	2018-12-14 11:18:32 +01:00
Lennart Poettering	438311a518	man: document that env vars are not suitable for passing secrets Prompted by the thread around: https://lists.freedesktop.org/archives/systemd-devel/2018-November/041665.html	2018-11-14 09:12:49 +03:00
Lennart Poettering	0e18724eb1	man: emphasize the ReadOnlyPaths= mount propagation "hole" This changes the ProtectSystem= documentation to refer in more explicit words to the restrictions of ReadOnlyPath=, as sugegsted in #9857. THis also extends the paragraph in ReadOnlyPath= that explains the hole. Fixes: #9857	2018-10-30 15:30:18 +01:00
Lennart Poettering	d287820dec	man: document that various sandboxing settings are not available in --user services This is brief and doesn't go into detail, but should at least indicate to those searching for it that some stuff is not available. Fixes: #9870	2018-10-30 15:30:18 +01:00
Anita Zhang	90fc172e19	core: implement per unit journal rate limiting Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for services. If provided, these values get passed to the journald client context, and those values are used in the rate limiting function in the journal over the the journald.conf values. Part of #10230	2018-10-18 09:56:20 +02:00
Alan Jenkins	923f910115	man/systemd.exec: MountFlags=shared behaviour was changed (fixed?) The behaviour described was observed on Fedora 28 (systemd-238-9.git0e0aa59), with and without SELinux. I don't actually know why though! It contradicts my understanding of the code, including an explicit comment in the code. Testing in a VM upgraded to v239-792-g1327f272d, this behaviour goes away. Test case: # /etc/systemd/system/mount-test.service [Service] MountFlags=shared Type=oneshot ExecStart=/usr/bin/ls -l /proc/1/ns/mnt /proc/self/ns/mnt ExecStart=/usr/bin/grep ext4 /proc/self/mountinfo Weird old behaviour: new mount namespace but / is fully shared. lrwxrwxrwx. 1 root root 0 Sep 14 11:18 /proc/1/ns/mnt -> mnt:[4026531840] lrwxrwxrwx. 1 root root 0 Sep 14 11:48 /proc/self/ns/mnt -> mnt:[4026532851] 968 967 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/alan_dell_2016... Current behaviour: / is not fully shared lrwxrwxrwx. 1 root root 0 Sep 14 11:39 /proc/1/ns/mnt -> mnt:[4026531840] lrwxrwxrwx. 1 root root 0 Sep 14 11:41 /proc/self/ns/mnt -> mnt:[4026532329] 591 558 8:3 / / rw,relatime shared:313 master:1 - ext4 /dev/sda3 rw,secl...	2018-10-05 17:38:38 +02:00
Yu Watanabe	d491e65e74	man: document RUNTIME_DIRECTORY= or friends	2018-09-13 17:02:58 +09:00
Lennart Poettering	2d2224e407	man: document that most sandboxing options are best effort only	2018-08-21 20:00:33 +02:00
Yu Watanabe	fe65e88ba6	namespace: implicitly adds DeviceAllow= when RootImage= is set RootImage= may require the following settings ``` DeviceAllow=/dev/loop-control rw DeviceAllow=block-loop rwm DeviceAllow=block-blkext rwm ``` This adds the following settings implicitly when RootImage= is specified. Fixes #9737.	2018-08-06 14:02:31 +09:00
Zsolt Dollenstein	566b7d23eb	Add support for opening files for appending Addresses part of #8983	2018-07-20 03:54:22 -07:00
Lennart Poettering	9236cabf78	man: elaborate a bit on the effect of PrivateNetwork= Triggered by this thread: https://lists.freedesktop.org/archives/systemd-devel/2018-July/040992.html	2018-07-17 21:41:23 +02:00
Alexander Kurtz	1448dfa6bf	man: Mention that paths in unit files must be fully normalized. Related to issues #9107 and #9498 and PRs #9149 and #9157.	2018-07-05 22:55:26 +02:00
Zbigniew Jędrzejewski-Szmek	514094f933	man: drop mode line in file headers This is already included in .dir-locals, so we don't need it in the files themselves.	2018-07-03 01:32:25 +02:00
Lennart Poettering	705268414f	seccomp: add new system call filter, suitable as default whitelist for system services Currently we employ mostly system call blacklisting for our system services. Let's add a new system call filter group @system-service that helps turning this around into a whitelist by default. The new group is very similar to nspawn's default filter list, but in some ways more restricted (as sethostname() and suchlike shouldn't be available to most system services just like that) and in others more relaxed (for example @keyring is blocked in nspawn since it's not properly virtualized yet in the kernel, but is fine for regular system services).	2018-06-14 17:44:20 +02:00
Zbigniew Jędrzejewski-Szmek	fdbbee37d5	man: drop unused <authorgroup> tags from man sources Docbook styles required those to be present, even though the templates that we use did not show those names anywhere. But something changed semi-recently (I would suspect docbook templates, but there was only a minor version bump in recent years, and the changelog does not suggest anything related), and builds now work without those entries. Let's drop this dead weight. Tested with F26-F29, debian unstable. $ perl -i -0pe 's/\s<authorgroup>.<.authorgroup>//gms' man/*xml	2018-06-14 12:22:18 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Zbigniew Jędrzejewski-Szmek	70127be805	Merge pull request #9153 from poettering/private-mounts introduce PrivateMounts= setting and clean up documentation for MountFlags=	2018-06-13 08:20:18 +02:00
Michael Biebl	1b2ad5d9a5	doc: more spelling fixes	2018-06-12 16:31:30 +02:00
Lennart Poettering	2f2e14b251	man: document the new PrivateMounts= setting Also, extend the documentation on MountFlags= substantially, hopefully addressing all the questions of #4393 Fixes: #4393	2018-06-12 16:27:37 +02:00
Lennart Poettering	f86fae61ec	tree-wide: drop trailing whitespace	2018-06-12 13:05:38 +02:00
Bruno Vernay	8d00da49fb	Table is easier to grasp State goes in CONFIG for users 3rd review	2018-06-11 13:52:55 +02:00
Yu Watanabe	d3c8afd092	man: RuntimeDirectory= or friends accept dot contained paths	2018-06-04 01:44:04 +09:00
Yu Watanabe	617d253afa	load-fragment: make IOScheduling{Class,Priority}= accept the empty string	2018-05-31 11:09:41 +09:00
Lennart Poettering	cdc0f9be92	Merge pull request #8817 from yuwata/cleanup-nsflags core: allow to specify RestrictNamespaces= multiple times	2018-05-24 16:49:13 +02:00
Lucas Werkmeister	8d29bef6b5	man: fix reference in StandardOutput= Since StandardOutput=file:path is more similar to StandardInput= than StandardInputText=, and only StandardInput= is actually documented above StandardOutput= whereas StandardInputText= is documented below it, I assume the intention was to refer to the former.	2018-05-14 08:11:37 +02:00
Yu Watanabe	b086654c6a	man: fix merging rule for CapabilityBoundingSet=	2018-05-05 11:07:37 +09:00
Yu Watanabe	53255e53ce	man: mention that RestrictNamespaces= can be specified multiple times	2018-05-05 11:07:37 +09:00
Lennart Poettering	46b073298f	man: don't claim we'd set XDG_SEAT and XDG_VTNR as part of service management Previously, reading through systemd.exec(5) one might get the idea that XDG_SEAT and XDG_VTNR are part of the service management logic, but they are not, they are only set if pam_systemd is part of a PAM stack an pam_systemd is used. Hence, let's drop these env vars from the list of env vars, and instead add a paragraph after the list mentioning that pam_systemd might add more systemd-specific env vars if included in the PAM stack for a service that uses PAMName=.	2018-04-27 17:32:01 +02:00
Lennart Poettering	3e0bff7d0b	man: document BSD exit codes in systemd.exec(5) too Our own tools use them now, and we probably should encourage that, hence let's document them along with the other exit codes we use.	2018-04-27 17:32:01 +02:00
Lennart Poettering	5d13a15b1d	tree-wide: drop spurious newlines (#8764) Double newlines (i.e. one empty lines) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tigther.	2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Yu Watanabe	e568a92d99	man: suggests TemporaryFileSystem= when people want to nest bind mounts inside InaccessiblePaths= (#8288) Suggested by @sourcejedi in #8242. Closes #7895, #7153, and #2780.	2018-02-27 08:59:03 +01:00
Alan Jenkins	59e00b2a16	Merge pull request #7908 from yuwata/rfe-7895 core: add TemporaryFileSystem= setting and 'tmpfs' option to ProtectHome=	2018-02-21 08:57:11 +00:00
Yu Watanabe	e4da7d8c79	core: add new option 'tmpfs' to ProtectHome= This make ProtectHome= setting can take 'tmpfs'. This is mostly equivalent to `TemporaryFileSystem=/home /run/user /root`.	2018-02-21 09:18:17 +09:00
Yu Watanabe	c10b460b5a	man: add documents for TemporaryFileSystem=	2018-02-21 09:18:11 +09:00
Yu Watanabe	4ca763a902	core/namespace: make '-' prefix in Bind{,ReadOnly}Paths= work Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However, the prefix is completely ignored. This makes it work as expected.	2018-02-21 09:07:56 +09:00
Lennart Poettering	00f5ad93b5	core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172) Before this change all unit types would default to "private" in the system service manager and "inherit" to in the user service manager. With this change this is slightly altered: non-service units of the system service manager are now run with KeyringMode=shared. This appears to be the more appropriate choice as isolation is not as desirable for mount tools, which regularly consume key material. After all mounts are a shared resource themselves as they appear system-wide hence it makes a lot of sense to share their key material too. Fixes: #8159	2018-02-20 08:53:34 +01:00
Alan Jenkins	2428aaf8a2	seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060) The VDSO provided by the kernel for x32, uses x86-64 syscalls instead of x32 ones. I think we can safely allow this; the set of x86-64 syscalls should be very similar to the x32 ones. The real point is not to allow x86 syscalls, because some of those are inconveniently multiplexed and we're apparently not able to block the specific actions we want to.	2018-02-02 18:12:34 +00:00
Alan Jenkins	62a0680bf2	man: systemd.exec: cleanup "only X will be permitted" ... "but X=X+1" > Only system calls of the specified architectures will be permitted to > processes of this unit. (my emphasis) > Note that setting this option to a non-empty list implies that > native is included too. Attempting to use "implies" in the later sentence, in a way that contradicts the very clear meaning of the earlier sentence... it's too much.	2018-01-31 15:39:13 +00:00
Yu Watanabe	5af1644314	man: note that `systemctl show` does not overridden value Fixes #7694.	2017-12-19 16:07:04 +09:00
Yu Watanabe	69b528832a	man: LockPersonality= implies NoNewPrivileges=	2017-12-19 12:48:54 +09:00
Lennart Poettering	f95b0be742	man: "systemd" is to be written in all lower-case, even at beginnings of sentences This very important commit is very important.	2017-12-13 17:42:04 +01:00
Yu Watanabe	bf2d3d7cae	man: fix typo	2017-12-05 23:30:47 +09:00
Yu Watanabe	606df9a5a5	man: fix typo (#7511)	2017-11-30 12:02:20 +01:00
Lennart Poettering	b8afec2107	man: reorder/add sections to systemd.exec(5) (#7412 ) The long long list of settings is getting too confusing, let's add some sections and reorder things in them. This makes no changes regarding contents, it only reorders things, sometimes reindents them, and adds sections that made sense to me to some degree. Within each sections the settings are ordered by relevance (at least according to how relevant I personally find them), and not alphabetically.	2017-11-23 21:20:48 +01:00
Lennart Poettering	0133d5553a	Merge pull request #7198 from poettering/stdin-stdout Add StandardInput=data, StandardInput=file:... and more	2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek	572eb058cf	Add SPDX license identifiers to man pages	2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek	a6fabe384d	man: add link to kernel docs about no_new_privs	2017-11-19 11:58:45 +01:00
Lennart Poettering	fc8d038130	man: document all the new options we acquired	2017-11-17 11:13:44 +01:00
Lennart Poettering	8b8de13d54	man: document LogFieldMax= and LogExtraFields=	2017-11-16 12:40:17 +01:00
Lennart Poettering	4d14b2bd35	man: update SyslogXYZ= documentation a bit Let's clarify that these settings only apply to stdout/stderr logging. Always mention the journal before syslog (as the latter is in most ways just a legacy alias these days). Always mention the +console cases too.	2017-11-16 12:40:17 +01:00
Yu Watanabe	798499278a	man: fix wrong tag (#7358)	2017-11-16 11:35:30 +01:00
Lennart Poettering	b0e8cec2dd	man: document > /dev/stderr pitfalls (#7317) Fixes: #7254 See: #2473	2017-11-14 10:51:09 +01:00
Zbigniew Jędrzejewski-Szmek	b835eeb4ec	shared/seccomp: disallow pkey_mprotect the same as mprotect for W^X mappings (#7295) MemoryDenyWriteExecution policy could be be bypassed by using pkey_mprotect instead of mprotect to create an executable writable mapping. The impact is mitigated by the fact that the man page says "Note that this feature is fully available on x86-64, and partially on x86", so hopefully people do not rely on it as a sole security measure. Found by Karin Hossen and Thomas Imbert from Sogeti ESEC R&D. https://bugs.launchpad.net/bugs/1725348	2017-11-12 17:28:48 +01:00
Yu Watanabe	3df90f24cc	core: allow to specify errno number in SystemCallErrorNumber=	2017-11-11 21:54:24 +09:00
Yu Watanabe	8cfa775f4f	core: add support to specify errno in SystemCallFilter= This makes each system call in SystemCallFilter= blacklist optionally takes errno name or number after a colon. The errno takes precedence over the one given by SystemCallErrorNumber=. C.f. #7173. Closes #7169.	2017-11-11 21:54:12 +09:00
Yu Watanabe	fdfcb94631	man: update documents for RuntimeDirectory= and friends	2017-11-08 15:52:08 +09:00
Zbigniew Jędrzejewski-Szmek	895265ad7d	Merge pull request #7059 from yuwata/dynamic-user-7013 dynamic-user: permit the case static uid and gid are different	2017-10-18 08:37:12 +02:00
Yu Watanabe	3bd493dc93	man: comment a requirement about the static user or group when DynamicUser=yes	2017-10-18 15:30:00 +09:00
Jakub Wilk	dcfaecc70a	man: fix typos (#7029)	2017-10-10 21:59:03 +02:00
Lennart Poettering	44898c5358	seccomp: add three more seccomp groups @aio → asynchronous IO calls @sync → msync/fsync/... and friends @chown → changing file ownership (Also, change @privileged to reference @chown now, instead of the individual syscalls it contains)	2017-10-05 15:42:48 +02:00
Djalal Harouni	09d3020b0a	seccomp: remove '@credentials' syscall set (#6958) This removes the '@credentials' syscall set that was added in commit v234-468-gcd0ddf6f75. Most of these syscalls are so simple that we do not want to filter them. They work on the current calling process, doing only read operations, they do not have a deep kernel path. The problem may only be in 'capget' syscall since it can query arbitrary processes, and used to discover processes, however sending signal 0 to arbitrary processes can be used to discover if a process exists or not. It is unfortunate that Linux allows to query processes of different users. Lets put it now in '@process' syscall set, and later we may add it to a new '@basic-process' set that allows most basic process operations.	2017-10-03 07:20:05 +02:00
Lennart Poettering	4a62836033	man: document the new logic	2017-10-02 17:41:44 +02:00
Lennart Poettering	5aaeeffb5f	man: document that PAMName= and NotifyAccess=all don't mix well. See: #6045	2017-10-02 12:58:42 +02:00
Zbigniew Jędrzejewski-Szmek	3d7d3cbbda	Merge pull request #6832 from poettering/keyring-mode Add KeyringMode unit property to fix cryptsetup key caching	2017-09-15 21:24:48 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit\|private\|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Jan Synacek	91a8f867b6	doc: document service exit codes (Heavily reworked by Lennart while rebasing) Fixes: #3545 Replaces: #5159	2017-09-15 16:44:06 +02:00
Lennart Poettering	ab2116b140	core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824) If two separate log streams are connected to stdout and stderr, let's make sure $JOURNAL_STREAM points to the latter, as that's the preferred log destination, and the environment variable has been created in order to permit services to automatically upgrade from stderr based logging to native journal logging. Also, document this behaviour. Fixes: #6800	2017-09-15 08:26:38 +02:00
Lennart Poettering	21f0669163	Merge pull request #6801 from johnlinp/master man: explicitly distinguish "implicit dependencies" and "default dependencies"	2017-09-14 21:41:13 +02:00
Zbigniew Jędrzejewski-Szmek	8b5c528ce8	Merge pull request #6818 from poettering/nspawn-whitelist convert nspawn syscall blacklist into a whitelist (and related stuff)	2017-09-14 19:47:59 +02:00
Lennart Poettering	cd0ddf6f75	seccomp: add four new syscall groups These groups should be useful shortcuts for sets of closely related syscalls where it usually makes more sense to allow them altogether or not at all.	2017-09-14 15:45:21 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Zbigniew Jędrzejewski-Szmek	e124ccdf5b	man: rework grammatical form of sentences in a table in systemd.exec(5) "Currently, the following values are defined: xxx: in case <condition>" is awkward because "xxx" is always defined unconditionally. It is _used_ in case <condition> is true. Correct this and a bunch of other places where the sentence structure makes it unclear what is the subject of the sentence.	2017-09-13 23:06:20 +02:00
John Lin	45f09f939b	man: explicitly distinguish "implicit dependencies" and "default dependencies" Fixes: #6793	2017-09-13 11:39:09 +08:00
Lennart Poettering	38a7c3c0bd	man: complete and rework $SERVICE_RESULT documentation This reworks the paragraph describing $SERVICE_RESULT into a table, and adds two missing entries: "success" and "start-limit-hit". These two entries are then also added to the table explaining the $EXIT_CODE + $EXIT_STATUS variables. Fixes: #6597	2017-09-12 18:04:26 +02:00
Yu Watanabe	de7070b49a	man: add examples for CapabilityBoundingSet= Follow-up for `c792ec2e35`.	2017-09-04 16:20:55 +09:00
Yu Watanabe	e8d85bc062	man: LockPersonality= takes a boolean argument (#6718) Follow-up for `78e864e5b3`.	2017-09-01 09:38:41 +02:00
Yu Watanabe	ada5e27657	core: StateDirectory= and friends imply RequiresMountsFor=	2017-08-31 18:19:35 +09:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193) Add LockPersonality boolean to allow locking down personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Diogo Pereira	c29ebc1a10	Fix typo in man/systemd.exec.xml (#6683)	2017-08-28 18:38:29 +02:00
Lennart Poettering	6eaaeee93a	seccomp: add new @setuid seccomp group This new group lists all UID/GID credential changing syscalls (which are quite a number these days). This will become particularly useful in a later commit, which uses this group to optionally permit user credential changing to daemons in case ambient capabilities are not available.	2017-08-10 15:02:50 +02:00
Yu Watanabe	2d35b79cdc	man: DynamicUser= does not imply PrivateDevices= (#6510) Follow-up for `effbd6d2ea`.	2017-08-07 11:02:47 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Lennart Poettering	7398320f9a	Merge pull request #6328 from yuwata/runtime-preserve core: Allow preserving contents of RuntimeDirectory over process restart	2017-07-17 10:02:19 +02:00
Yu Watanabe	23a7448efa	core: support subdirectories in RuntimeDirectory= option	2017-07-17 16:30:53 +09:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
Lucas Werkmeister	ceabfb889d	Fix spelling (#6378)	2017-07-15 12:29:09 -04:00
Lennart Poettering	6297d07b82	Merge pull request #6300 from keszybz/refuse-to-load-some-units Refuse to load some units	2017-07-12 09:28:20 +02:00
Zbigniew Jędrzejewski-Szmek	b023856884	man: add warnings that Private*= settings are not always applied	2017-07-11 13:38:13 -04:00
Lennart Poettering	565dab8ef4	man: briefly document permitted user/group name syntax for User=/Group= and syusers.d (#6321) As discussed here: https://lists.freedesktop.org/archives/systemd-devel/2017-July/039237.html	2017-07-10 13:44:06 -04:00
Zbigniew Jędrzejewski-Szmek	189cd8c2ab	man: describe RuntimeDirectoryMode= Fixes #5509.	2017-06-17 15:23:02 -04:00
Zbigniew Jędrzejewski-Szmek	03c3c52040	man: update MemoryDenyWriteExecute description for executable stacks Without going into details, mention that libraries are also covered by the filters, and that executable stacks are a no no. Closes #5970.	2017-05-30 16:44:48 -04:00
Zbigniew Jędrzejewski-Szmek	98e9d71022	man: fix links to external man pages linkchecker ftw!	2017-05-07 11:29:40 -04:00
James Cowgill	a3645cc6dd	seccomp: add clone syscall definitions for mips (#5880) Also updates the documentation and adds a mention of ppc64 support which was enabled by #5325. Tested on Debian mipsel and mips64el. The other 4 mips architectures should have an identical user <-> kernel ABI to one of the 2 tested systems.	2017-05-03 18:35:45 +02:00
Mark Stosberg	b8e485faf1	man: document how to include an equals sign in a value provided to Environment= (#5710) It wasn't clear before how an equals sign in an "Environment=" value might be handled. Ref: http://stackoverflow.com/questions/43278883/how-to-write-systemd-environment-variables-value-which-contains/43280157	2017-04-11 23:19:06 +02:00
Torstein Husebø	6cf5a96489	man: fix typo (#5556)	2017-03-08 07:54:22 -05:00
Lennart Poettering	525872bfab	man: document that ProtectKernelTunables= and ProtectControlGroups= implies MountAPIVFS= See: #5384	2017-02-21 21:55:43 +01:00
AsciiWolf	28a0ad81ee	man: use https:// in URLs	2017-02-21 16:28:04 +01:00
Lennart Poettering	0b8fab97cf	man: improve documentation on seccomp regarding alternative ABIs Let's clarify that RestrictAddressFamilies= and MemoryDenyWriteExecute= are only fully effective if non-native system call architectures are disabled, since they otherwise may be used to circumvent the filters, as the filters aren't equally effective on all ABIs. Fixes: #5277	2017-02-09 18:42:17 +01:00
Lennart Poettering	23deef88b9	Revert "core/execute: set HOME, USER also for root users" This reverts commit `8b89628a10`. This broke #5246	2017-02-09 11:43:44 +01:00
Zbigniew Jędrzejewski-Szmek	fc6149a6ce	Merge pull request #4962 from poettering/root-directory-2 Add new MountAPIVFS= boolean unit file setting + RootImage=	2017-02-08 23:05:05 -05:00
Zbigniew Jędrzejewski-Szmek	ef3116b5d4	man: add more commas for clarify and reword a few sentences	2017-02-08 22:53:16 -05:00
Lennart Poettering	ae9d60ce4e	seccomp: on s390 the clone() parameters are reversed Add a bit of code that tries to get the right parameter order in place for some of the better known architectures, and skips restrict_namespaces for other archs. This also bypasses the test on archs where we don't know the right order. In this case I didn't bother with testing the case where no filter is applied, since that is hopefully just an issue for now, as there's nothing stopping us from supporting more archs, we just need to know which order is right. Fixes: #5241	2017-02-08 22:21:27 +01:00
Lennart Poettering	8a50cf6957	seccomp: MemoryDenyWriteExecute= should affect both mmap() and mmap2() (#5254) On i386 we block the old mmap() call entirely, since we cannot properly filter it. Thankfully it hasn't been used by glibc since quite some time. Fixes: #5240	2017-02-08 15:14:02 +01:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Lennart Poettering	5d997827e2	core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)	2017-02-07 11:22:05 +01:00
Lennart Poettering	142bd808a1	man: Document that RestrictAddressFamilies= doesn't work on s390/s390x/... We already say that it doesn't work on i386, but there are more archs like that apparently.	2017-02-06 14:17:12 +01:00
Zbigniew Jędrzejewski-Szmek	8b89628a10	core/execute: set HOME, USER also for root users This changes the environment for services running as root from: LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=ffbdec203c69499a9b83199333e31555 JOURNAL_STREAM=8:1614518 to LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin HOME=/root LOGNAME=root USER=root SHELL=/bin/sh INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c JOURNAL_STREAM=8:1616718 Making the environment special for the root user complicates things unnecessarily. This change simplifies both our logic (by making the setting of the variables unconditional), and should also simplify the logic in services (particularly scripts). Fixes #5124.	2017-02-03 11:49:22 -05:00
Brandon Philips	9806301614	man: fix spelling error parth -> path	2017-02-02 00:54:42 +01:00
Jakub Wilk	301a21a880	man: fix typos (#5109)	2017-01-19 16:54:22 +01:00
Zbigniew Jędrzejewski-Szmek	5b3637b44a	Merge pull request #4991 from poettering/seccomp-fix	2017-01-17 23:10:46 -05:00
Zbigniew Jędrzejewski-Szmek	374e692252	Merge pull request #5009 from ian-kelling/ian-mnt-namespace-doc	2017-01-11 15:23:00 -05:00
Ian Kelling	fa2a396620	doc: MountFlags= don't reference container which may not exist (#5011)	2017-01-03 21:32:31 +01:00
Ian Kelling	7141028d30	doc: correct "or" to "and" in MountFlags= description (#5010)	2017-01-03 21:31:20 +01:00
Ian Kelling	4b957756b8	man: document mount deletion between commands	2017-01-03 02:17:50 -08:00
Martin Pitt	56a9366d7d	Merge pull request #4994 from poettering/private-tmp-tmpfiles automatically clean up PrivateTmp= left-overs in /var/tmp on next boot	2016-12-29 11:18:38 +01:00
Lennart Poettering	9eb484fa40	man: add brief documentation for the (sd-pam) processes created due to PAMName= (#4967) A follow-up for #4942, adding a brief but more correct explanation of the processes.	2016-12-29 10:55:27 +01:00
Lennart Poettering	d71f050599	core: implicitly order units with PrivateTmp= after systemd-tmpfiles-setup.service Preparation for fixing #4401.	2016-12-27 23:25:24 +01:00
Lennart Poettering	bd2ab3f4f6	seccomp: add two new filter sets: @reboot and @swap These groupe reboot()/kexec() and swapon()/swapoff() respectively	2016-12-27 18:09:37 +01:00
Lennart Poettering	d2d6c096f6	core: add ability to define arbitrary bind mounts for services This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439	2016-12-14 00:54:10 +01:00
Jouke Witteveen	a4e26faf33	man: fix $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS documentation Note that any exit code is available through $EXIT_STATUS and not through $EXIT_CODE. This mimics siginfo.	2016-12-06 13:37:14 +01:00
Jouke Witteveen	7ed0a4c537	bus-util: add protocol error type explanation	2016-11-29 23:19:52 +01:00
Jouke Witteveen	e0c7d5f7be	man: document protocol error type for service failures (#4724)	2016-11-23 22:51:33 +01:00
Lennart Poettering	1a1b13c957	seccomp: add @filesystem syscall group (#4537) @filesystem groups various file system operations, such as opening files and directories for read/write and stat()ing them, plus renaming, deleting, symlinking, hardlinking.	2016-11-21 19:29:12 -05:00
Lennart Poettering	5327c910d2	namespace: simplify, optimize and extend handling of mounts for namespace This changes a couple of things in the namespace handling: It merges the BindMount and TargetMount structures. They are mostly the same, hence let's just use the same structue, and rely on C's implicit zero initialization of partially initialized structures for the unneeded fields. This reworks memory management of each entry a bit. It now contains one "const" and one "malloc" path. We use the former whenever we can, but use the latter when we have to, which is the case when we have to chase symlinks or prefix a root directory. This means in the common case we don't actually need to allocate any dynamic memory. To make this easy to use we add an accessor function bind_mount_path() which retrieves the right path string from a BindMount structure. While we are at it, also permit "+" as prefix for dirs configured with ReadOnlyPaths= and friends: if specified the root directory of the unit is implicited prefixed. This also drops set_bind_mount() and uses C99 structure initialization instead, which I think is more readable and clarifies what is being done. This drops append_protect_kernel_tunables() and append_protect_kernel_modules() as append_static_mounts() is now simple enough to be called directly. Prefixing with the root dir is now done in an explicit step in prefix_where_needed(). It will prepend the root directory on each entry that doesn't have it prefixed yet. The latter is determined depending on an extra bit in the BindMount structure.	2016-11-17 18:08:32 +01:00
Djalal Harouni	8526555680	doc: move ProtectKernelModules= documentation near ProtectKernelTunalbes=	2016-11-15 15:04:41 +01:00
Djalal Harouni	a7db8614f3	doc: note when no new privileges is implied	2016-11-15 15:04:35 +01:00
Lennart Poettering	add005357d	core: add new RestrictNamespaces= unit file setting This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.	2016-11-04 07:40:13 -06:00
Zbigniew Jędrzejewski-Szmek	cf88547034	Merge pull request #4548 from keszybz/seccomp-help systemd-analyze syscall-filter	2016-11-03 20:27:45 -04:00
Kees Cook	d974f949f1	doc: clarify NoNewPrivileges (#4562) Setting no_new_privs does not stop UID changes, but rather blocks gaining privileges through execve(). Also fixes a small typo.	2016-11-03 20:26:59 -04:00
Zbigniew Jędrzejewski-Szmek	d5efc18b60	seccomp-util, analyze: export comments as a help string Just to make the whole thing easier for users.	2016-11-03 09:35:36 -04:00
Zbigniew Jędrzejewski-Szmek	869feb3388	analyze: add syscall-filter verb This should make it easier for users to understand what each filter means as the list of syscalls is updated in subsequent systemd versions.	2016-11-03 09:35:35 -04:00
Lennart Poettering	2ca8dc15f9	man: document that too strict system call filters may affect the service manager If execve() or socket() is filtered the service manager might get into trouble executing the service binary, or handling any failures when this fails. Mention this in the documentation. The other option would be to implicitly whitelist all system calls that are required for these codepaths. However, that appears less than desirable as this would mean socket() and many related calls have to be whitelisted unconditionally. As writing system call filters requires a certain level of expertise anyway it sounds like the better option to simply document these issues and suggest that the user disables system call filters in the service temporarily in order to debug any such failures. See: #3993.	2016-11-02 08:55:24 -06:00
Lennart Poettering	133ddbbeae	seccomp: add two new syscall groups @resources contains various syscalls that alter resource limits and memory and scheduling parameters of processes. As such they are good candidates to block for most services. @basic-io contains a number of basic syscalls for I/O, similar to the list seccomp v1 permitted but slightly more complete. It should be useful for building basic whitelisting for minimal sandboxes	2016-11-02 08:50:00 -06:00
Lennart Poettering	aa6b9cec88	man: two minor fixes	2016-11-02 08:50:00 -06:00
Lennart Poettering	cd5bfd7e60	seccomp: include pipes and memfd in @ipc These system calls clearly fall in the @ipc category, hence should be listed there, simply to avoid confusion and surprise by the user.	2016-11-02 08:50:00 -06:00
Lennart Poettering	a8c157ff30	seccomp: drop execve() from @process list The system call is already part in @default hence implicitly allowed anyway. Also, if it is actually blocked then systemd couldn't execute the service in question anymore, since the application of seccomp is immediately followed by it.	2016-11-02 08:49:59 -06:00
Lennart Poettering	c79aff9a82	seccomp: add clock query and sleeping syscalls to "@default" group Timing and sleep are so basic operations, it makes very little sense to ever block them, hence don't.	2016-11-02 08:49:59 -06:00
Zbigniew Jędrzejewski-Szmek	aa34055ffb	seccomp: allow specifying arm64, mips, ppc (#4491) "Secondary arch" table for mips is entirely speculative…	2016-11-01 09:33:18 -06:00
Jakub Wilk	b17649ee5e	man: fix typos (#4527)	2016-10-31 08:08:08 -04:00
Djalal Harouni	fa1f250d6f	Merge pull request #4495 from topimiettinen/block-shmat-exec seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute	2016-10-28 15:41:07 +02:00
Topi Miettinen	d2ffa389b8	seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute shmat(..., SHM_EXEC) can be used to create writable and executable memory, so let's block it when MemoryDenyWriteExecute is set.	2016-10-26 18:59:14 +03:00
Zbigniew Jędrzejewski-Szmek	74388c2d11	man: document the default value of NoNewPrivileges= Fixes #4329.	2016-10-24 23:45:57 -04:00
Lennart Poettering	47da760efd	man: document default for User= Replaces: #4375	2016-10-20 13:21:25 +02:00
Luca Bruno	52c239d770	core/exec: add a named-descriptor option ("fd") for streams (#4179) This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.	2016-10-17 20:05:49 -04:00
Lennart Poettering	c7458f9399	man: avoid abbreviated "cgroups" terminology (#4396) Let's avoid the overly abbreviated "cgroups" terminology. Let's instead write: "Linux Control Groups (cgroups)" is the long form wherever the term is introduced in prose. Use "control groups" in the short form wherever the term is used within brief explanations. Follow-up to: #4381	2016-10-17 09:50:26 -04:00
Zbigniew Jędrzejewski-Szmek	74b47bbd5d	man: add crosslink between systemd.resource-control(5) and systemd.exec(5) Fixes #4379.	2016-10-15 18:38:20 -04:00
Lennart Poettering	8bfdf29b24	Merge pull request #4243 from endocode/djalal/sandbox-first-protection-kernelmodules-v1 core:sandbox: Add ProtectKernelModules= and some fixes	2016-10-13 18:36:29 +02:00
Thomas Hindoe Paaboel Andersen	2dd678171e	man: typo fixes A mix of fixes for typos and UK english	2016-10-12 23:02:44 +02:00
Djalal Harouni	c575770b75	core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules= Lets go further and make /lib/modules/ inaccessible for services that do not have business with modules, this is a minor improvment but it may help on setups with custom modules and they are limited... in regard of kernel auto-load feature. This change introduce NameSpaceInfo struct which we may embed later inside ExecContext but for now lets just reduce the argument number to setup_namespace() and merge ProtectKernelModules feature.	2016-10-12 14:11:16 +02:00
Djalal Harouni	ac246d9868	doc: minor hint about InaccessiblePaths= in regard of ProtectKernelTunables=	2016-10-12 13:52:40 +02:00
Djalal Harouni	2cd0a73547	core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yes The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw data through /proc, ioctl and some other exotic system calls...	2016-10-12 13:39:49 +02:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Zbigniew Jędrzejewski-Szmek	56b4c80b42	Merge pull request #4348 from poettering/docfixes Various smaller documentation fixes.	2016-10-11 13:49:15 -04:00
Lennart Poettering	f4c9356d13	man: beef up documentation on per-unit resource limits a bit Let's clarify that for user services some OS-defined limits bound the settings in the unit files. Fixes: #4232	2016-10-11 18:42:22 +02:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. 
The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
hbrueckner	6abfd30372	seccomp: add support for the s390 architecture (#4287) Add seccomp support for the s390 architecture (31-bit and 64-bit) to systemd. This requires libseccomp >= 2.3.1.	2016-10-05 13:58:55 +02:00
Stefan Schweter	cfaf4b75e0	man: remove consecutive duplicate words (#4268) This PR removes consecutive duplicate words from the man pages of: * `resolved.conf.xml` * `systemd.exec.xml` * `systemd.socket.xml`	2016-10-03 17:09:54 +02:00
Djalal Harouni	8f81a5f61b	core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= is set Instead of having a local syscall list, use the @raw-io group which contains the same set of syscalls to filter.	2016-09-25 12:52:27 +02:00
Djalal Harouni	49accde7bd	core:sandbox: add more /proc/* entries to ProtectKernelTunables= Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface, filesystems configuration and IRQ tuning readonly. Most of these interfaces now days should be in /sys but they are still available through /proc, so just protect them. This patch does not touch /proc/net/...	2016-09-25 11:30:11 +02:00
Djalal Harouni	9221aec8d0	doc: explicitly document that /dev/mem and /dev/port are blocked by PrivateDevices=true	2016-09-25 11:25:44 +02:00
Djalal Harouni	e778185bb5	doc: documentation fixes for ReadWritePaths= and ProtectKernelTunables= Documentation fixes for ReadWritePaths= and ProtectKernelTunables= as reported by Evgeny Vereshchagin.	2016-09-25 11:25:31 +02:00
Lennart Poettering	6757c06a1a	man: shorten the exit status table a bit Let's merge a couple of columns, to make the table a bit shorter. This effectively just drops whitespace, not contents, but makes the currently humungous table much much more compact.	2016-09-25 10:52:57 +02:00
Lennart Poettering	81c8aceed4	man: the exit code/signal is stored in $EXIT_CODE, not $EXIT_STATUS	2016-09-25 10:52:57 +02:00
Lennart Poettering	effbd6d2ea	man: rework documentation for ReadOnlyPaths= and related settings This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=, InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to the host file system. (Which wasn't true actually, as we didn't follow symlinks at all in the most recent releases, and we know do follow them, but relative to RootDirectory=). This also replaces all references to the fact that all fs namespacing options can be undone with enough privileges and disable propagation by a single one in the documentation of ReadOnlyPaths= and friends, and then directs the read to this in all other places. Moreover a hint is added to the documentation of SystemCallFilter=, suggesting usage of ~@mount in case any of the fs namespacing related options are used.	2016-09-25 10:42:18 +02:00
Lennart Poettering	b2656f1b1c	man: in user-facing documentaiton don't reference C function names Let's drop the reference to the cap_from_name() function in the documentation for the capabilities setting, as it is hardly helpful. Our readers are not necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the strings we accept are just the plain capability names as listed in capabilities(7) hence there's really no point in confusing the user with anything else.	2016-09-25 10:42:18 +02:00
Lennart Poettering	63bb64a056	core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1 Let's make sure that services that use DynamicUser=1 cannot leave files in the file system should the system accidentally have a world-writable directory somewhere. This effectively ensures that directories need to be whitelisted rather than blacklisted for access when DynamicUser=1 is set.	2016-09-25 10:42:18 +02:00
Lennart Poettering	3f815163ff	core: introduce ProtectSystem=strict Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a new setting "strict". If set, the entire directory tree of the system is mounted read-only, but the API file systems /proc, /dev, /sys are excluded (they may be managed with PrivateDevices= and ProtectKernelTunables=). Also, /home and /root are excluded as those are left for ProtectHome= to manage. In this mode, all "real" file systems (i.e. non-API file systems) are mounted read-only, and specific directories may only be excluded via ReadWriteDirectories=, thus implementing an effective whitelist instead of blacklist of writable directories. While we are at, also add /efi to the list of paths always affected by ProtectSystem=. This is a follow-up for `b52a109ad3` which added /efi as alternative for /boot. Our namespacing logic should respect that too.	2016-09-25 10:42:18 +02:00
Lennart Poettering	59eeb84ba6	core: add two new service settings ProtectKernelTunables= and ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.	2016-09-25 10:18:48 +02:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Zbigniew Jędrzejewski-Szmek	29df65f913	man: add "timeout" to status table (#3919)	2016-08-11 10:51:49 +02:00
Lennart Poettering	56bf97e10f	Merge pull request #3914 from keszybz/fix-man-links Fix man links	2016-08-07 11:17:56 +02:00
Zbigniew Jędrzejewski-Szmek	e64e1bfd86	man: add a table of possible exit statuses (#3910)	2016-08-07 11:14:40 +02:00
Zbigniew Jędrzejewski-Szmek	d87a2ef782	Merge pull request #3884 from poettering/private-users	2016-08-06 17:04:45 -04:00
Zbigniew Jędrzejewski-Szmek	0a07667d8d	man: provide html links to a bunch of external man pages	2016-08-06 16:39:53 -04:00
Lennart Poettering	136dc4c435	core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ExecStop=/ExecStopPost= commands This should simplify monitoring tools for services, by passing the most basic information about service result/exit information via environment variables, thus making it unnecessary to retrieve them explicitly via the bus.	2016-08-04 23:08:05 +02:00
Lennart Poettering	d251207d55	core: add new PrivateUsers= option to service execution This setting adds minimal user namespacing support to a service. When set the invoked processes will run in their own user namespace. Only a trivial mapping will be set up: the root user/group is mapped to root, and the user/group of the service will be mapped to itself, everything else is mapped to nobody. If this setting is used the service runs with no capabilities on the host, but configurable capabilities within the service. This setting is particularly useful in conjunction with RootDirectory= as the need to synchronize /etc/passwd and /etc/group between the host and the service OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the user of the service itself. But even outside the RootDirectory= case this setting is useful to substantially reduce the attack surface of a service. Example command to test this: systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh This runs a shell as user "foobar". When typing "ps" only processes owned by "root", by "foobar", and by "nobody" should be visible.	2016-08-03 20:42:04 +02:00
Zbigniew Jędrzejewski-Szmek	dadd6ecfa5	Merge pull request #3728 from poettering/dynamic-users	2016-07-25 16:40:26 -04:00
Lennart Poettering	43eb109aa9	core: change ExecStart=! syntax to ExecStart=+ (#3797) As suggested by @mbiebl we already use the "!" special char in unit file assignments for negation, hence we should not use it in a different context for privileged execution. Let's use "+" instead.	2016-07-25 16:53:33 +02:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Alessandro Puccetti	2a624c36e6	doc,core: Read{Write,Only}Paths= and InaccessiblePaths= This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`	2016-07-19 17:22:02 +02:00
Alessandro Puccetti	c4b4170746	namespace: unify limit behavior on non-directory paths Despite the name, `Read{Write,Only}Directories=` already allows for regular file paths to be masked. This commit adds the same behavior to `InaccessibleDirectories=` and makes it explicit in the doc. This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}` {dile,device}nodes and mounts on the appropriate one the paths specified in `InacessibleDirectories=`. Based on Luca's patch from https://github.com/systemd/systemd/pull/3327	2016-07-19 17:22:02 +02:00
Lennart Poettering	f4170c671b	execute: add a new easy-to-use RestrictRealtime= option to units It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and SCHED_DEADLINE is blocked, which my be used to lock up the system.	2016-06-23 01:45:45 +02:00
Lennart Poettering	7bce046bcf	core: set $JOURNAL_STREAM to the dev_t/ino_t of the journal stream of executed services This permits services to detect whether their stdout/stderr is connected to the journal, and if so talk to the journal directly, thus permitting carrying of metadata. As requested by the gtk folks: #2473	2016-06-15 23:00:27 +02:00
Lennart Poettering	1f9ac68b5b	core: improve seccomp syscall grouping a bit This adds three new seccomp syscall groups: @keyring for kernel keyring access, @cpu-emulation for CPU emulation features, for exampe vm86() for dosemu and suchlike, and @debug for ptrace() and related calls. Also, the @clock group is updated with more syscalls that alter the system clock. capset() is added to @privileged, and pciconfig_iobase() is added to @raw-io. Finally, @obsolete is a cleaned up. A number of syscalls that never existed on Linux and have no number assigned on any architecture are removed, as they only exist in the man pages and other operating sytems, but not in code at all. create_module() is moved from @module to @obsolete, as it is an obsolete system call. mem_getpolicy() is removed from the @obsolete list, as it is not obsolete, but simply a NUMA API.	2016-06-13 16:25:54 +02:00
Alessandro Puccetti	cf677fe686	core/execute: add the magic character '!' to allow privileged execution (#3493) This patch implements the new magic character '!'. By putting '!' in front of a command, systemd executes it with full privileges ignoring paramters such as User, Group, SupplementaryGroups, CapabilityBoundingSet, AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext, AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies. Fixes partially https://github.com/systemd/systemd/issues/3414 Related to https://github.com/coreos/rkt/issues/2482 Testing: 1. Create a user 'bob' 2. Create the unit file /etc/systemd/system/exec-perm.service (You can use the example below) 3. sudo systemctl start ext-perm.service 4. Verify that the commands starting with '!' were not executed as bob, 4.1 Looking to the output of ls -l /tmp/exec-perm 4.2 Each file contains the result of the id command. ````````````````````````````````````````````````````````````````` [Unit] Description=ext-perm [Service] Type=oneshot TimeoutStartSec=0 User=bob ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ; /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre" ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ; !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2" ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post" ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload" ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop" ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post" [Install] WantedBy=multi-user.target] `````````````````````````````````````````````````````````````````	2016-06-10 18:19:54 +02:00
Topi Miettinen	f3e4363593	core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379) New exec boolean MemoryDenyWriteExecute, when set, installs a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC and mprotect(2) with PAGE_EXEC.	2016-06-03 17:58:18 +02:00
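
Commit 7bce046bcf above says `$JOURNAL_STREAM` lets a service detect whether its stdout/stderr is connected to the journal. Below is a minimal C sketch of that check. It assumes the variable has the decimal `DEVICE:INODE` form shown in the environment dump of commit 8b89628a10 (`JOURNAL_STREAM=8:1614518`). The function names `journal_stream_matches()` and `stderr_is_journal()` are hypothetical and are not a systemd API.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: parse a "DEVICE:INODE" string (both decimal, as in
 * JOURNAL_STREAM=8:1614518) and compare it with a device/inode pair.
 * Returns false on any parse error or mismatch. */
bool journal_stream_matches(const char *value, dev_t dev, ino_t ino)
{
    char *end;
    uintmax_t want_dev, want_ino;

    assert(value != NULL);

    /* strtoumax() would accept leading blanks or a sign, so insist on a digit. */
    if (!isdigit((unsigned char) value[0]))
        return false;
    errno = 0;
    want_dev = strtoumax(value, &end, 10);
    if (errno != 0 || *end != ':')
        return false;

    const char *ino_str = end + 1;
    if (!isdigit((unsigned char) ino_str[0]))
        return false;
    errno = 0;
    want_ino = strtoumax(ino_str, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;

    return want_dev == (uintmax_t) dev && want_ino == (uintmax_t) ino;
}

/* Hypothetical helper: true only if $JOURNAL_STREAM is set and names the
 * same file that stderr currently refers to. */
bool stderr_is_journal(void)
{
    const char *value = getenv("JOURNAL_STREAM");
    struct stat st;

    if (value == NULL)
        return false;
    if (fstat(STDERR_FILENO, &st) < 0)
        return false;
    return journal_stream_matches(value, st.st_dev, st.st_ino);
}
```

The sketch compares both numbers against `fstat()` of stderr instead of merely checking whether the variable is set. A child process can inherit the variable while its stderr has been redirected elsewhere.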
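
Commit 136dc4c435 above exports `$SERVICE_RESULT`, `$EXIT_CODE` and `$EXIT_STATUS` to `ExecStop=`/`ExecStopPost=` commands. Commit a4e26faf33 clarifies that the exit status or signal is carried in `$EXIT_STATUS`, while `$EXIT_CODE` describes how the process ended. The hypothetical helper `describe_service_exit()` below sketches how a stop-post hook might fold the three variables into one log line. It assumes `$EXIT_CODE` is one of `exited`, `killed` or `dumped`, and prints any other value verbatim.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format a one-line summary from the values systemd
 * passes to ExecStop=/ExecStopPost= commands. Any of result, code and
 * status may be NULL (e.g. when run outside systemd). Returns the
 * snprintf() result. */
int describe_service_exit(const char *result, const char *code,
                          const char *status, char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (result == NULL)
        result = "unknown";
    if (code == NULL || status == NULL)
        return snprintf(buf, len, "result=%s", result);

    if (strcmp(code, "exited") == 0)   /* $EXIT_STATUS holds the exit status */
        return snprintf(buf, len, "result=%s, exited with status %s", result, status);
    if (strcmp(code, "killed") == 0)   /* $EXIT_STATUS holds a signal name */
        return snprintf(buf, len, "result=%s, killed by signal %s", result, status);
    if (strcmp(code, "dumped") == 0)
        return snprintf(buf, len, "result=%s, dumped core on signal %s", result, status);

    /* Unexpected $EXIT_CODE value: print it verbatim. */
    return snprintf(buf, len, "result=%s, %s %s", result, code, status);
}

/* Convenience wrapper reading the three variables from the environment. */
int describe_service_exit_from_env(char *buf, size_t len)
{
    return describe_service_exit(getenv("SERVICE_RESULT"), getenv("EXIT_CODE"),
                                 getenv("EXIT_STATUS"), buf, len);
}
```

The signal name is printed exactly as received, so the sketch makes no assumption about whether it carries a `SIG` prefix.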
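
Commit 4b58153dd2 above introduces the invocation ID, a random 128-bit value passed to services as `$INVOCATION_ID`. The environment dump in 8b89628a10 shows it as 32 lowercase hex digits (`INVOCATION_ID=ffbdec203c69499a9b83199333e31555`). Programs that can link against libsystemd would more naturally use `sd_id128_get_invocation()`, which that commit adds. The following is only a dependency-free sketch. The hypothetical `parse_id128()` accepts just the plain 32-digit form, not a dashed UUID spelling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parse exactly 32 hex digits (no dashes) into 16 bytes.
 * On malformed input returns false; out may then be partially written. */
bool parse_id128(const char *s, uint8_t out[16])
{
    assert(out != NULL);

    if (s == NULL || strlen(s) != 32)
        return false;
    for (size_t i = 0; i < 16; i++) {
        int hi = hex_value(s[2 * i]);
        int lo = hex_value(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        out[i] = (uint8_t) ((hi << 4) | lo);
    }
    return true;
}

/* Hypothetical helper: read the current unit's invocation ID from the environment. */
bool get_invocation_id(uint8_t out[16])
{
    return parse_id128(getenv("INVOCATION_ID"), out);
}
```

Each start of the unit produces a new ID. A parsed value is therefore only meaningful for correlating logs or bus lookups within the same runtime cycle, which is the use case the commit describes.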

518 Commits