Allow to setup new bind mounts for a service at runtime (via either
DBUS or a new 'systemctl bind' verb) with a new helper that forks into
the unit's mount namespace.
Add a new integration test to cover this.
Useful for zero-downtime addition to services that are running inside
mount namespaces, especially when using RootImage/RootDirectory.
If a service runs with a read-only root, a tmpfs is added on /run
to ensure we can create the airlock directory for incoming mounts
under /run/host/incoming.
We need a writable /run for most operations, but in case a read-only
RootImage (or similar) is used, by default there's no additional
tmpfs mount on /run. Change this behaviour and document it.
This adds the ability to specify truncate:PATH for StandardOutput= and
StandardError=, similar to the existing append:PATH. The code is mostly
copied from the related append: code. Fixes#8983.
Importing the full environment is convenient, but it doesn't work too well in
practice, because we get a metric ton of shell-specific crap that should never
end up in the global environment block:
$ systemctl --user show-environment
...
SHELL=/bin/zsh
AUTOJUMP_ERROR_PATH=/home/zbyszek/.local/share/autojump/errors.log
AUTOJUMP_SOURCED=1
CONDA_SHLVL=0
CVS_RSH=ssh
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
DESKTOP_SESSION=gnome
DISPLAY=:0
FPATH=/usr/share/Modules/init/zsh-functions:/usr/local/share/zsh/site-functions:/usr/share/zsh/site-functions:/usr/share/zsh/5.8/functions
GDMSESSION=gnome
GDM_LANG=en_US.UTF-8
GNOME_SETUP_DISPLAY=:1
GUESTFISH_INIT=$'\\e[1;34m'
GUESTFISH_OUTPUT=$'\\e[0m'
GUESTFISH_PS1=$'\\[\\e[1;32m\\]><fs>\\[\\e[0;31m\\] '
GUESTFISH_RESTORE=$'\\e[0m'
HISTCONTROL=ignoredups
HISTSIZE=1000
LOADEDMODULES=
OLDPWD=/home/zbyszek
PWD=/home/zbyszek
QTDIR=/usr/lib64/qt-3.3
QTINC=/usr/lib64/qt-3.3/include
QTLIB=/usr/lib64/qt-3.3/lib
QT_IM_MODULE=ibus
SDL_VIDEO_MINIMIZE_ON_FOCUS_LOSS=0
SESSION_MANAGER=local/unix:@/tmp/.ICE-unix/2612,unix/unix:/tmp/.ICE-unix/2612
SHLVL=0
STEAM_FRAME_FORCE_CLOSE=1
TERM=xterm-256color
USERNAME=zbyszek
WISECONFIGDIR=/usr/share/wise2/
...
Plenty of shell-specific and terminal-specific stuff that have no global
significance.
Let's start warning when this is used to push people towards importing only
specific variables.
Putative NEWS entry:
* systemctl import-environment will now emit a warning when called without
any arguments (i.e. to import the full environment block of the called
program). This command will usually be invoked from a shell, which means
that it'll inherit a bunch of variables which are specific to that shell,
and usually to the tty the shell is connected to, and don't have any
meaning in the global context of the system or user service manager.
Instead, only specific variables should be imported into the manager
environment block.
Similarly, programs which update the manager environment block by directly
calling the D-Bus API of the manager, should also push specific variables,
and not the full inherited environment.
This adds a general description of "philosphy" of keeping the environemnt
block small and hints about systemd-run -P env.
The list of generated variables is split out to a subsection. Viewing
the patch with ignoring whitespace changes is recommended.
We don't ignore invalid assignments (except in import-environment to some
extent), previous description was wrong.
For https://bugzilla.redhat.com/show_bug.cgi?id=1912046#c17.
This beefs up the READ_FULL_FILE_CONNECT_SOCKET logic of
read_full_file_full() a bit: when used a sender socket name may be
specified. If specified as NULL behaviour is as before: the client
socket name is picked by the kernel. But if specified as non-NULL the
client can pick a socket name to use when connecting. This is useful to
communicate a minimal amount of metainformation from client to server,
outside of the transport payload.
Specifically, these beefs up the service credential logic to pass an
abstract AF_UNIX socket name as client socket name when connecting via
READ_FULL_FILE_CONNECT_SOCKET, that includes the requesting unit name
and the eventual credential name. This allows servers implementing the
trivial credential socket logic to distinguish clients: via a simple
getpeername() it can be determined which unit is requesting a
credential, and which credential specifically.
Example: with this patch in place, in a unit file "waldo.service" a
configuration line like the following:
LoadCredential=foo:/run/quux/creds.sock
will result in a connection to the AF_UNIX socket /run/quux/creds.sock,
originating from an abstract namespace AF_UNIX socket:
@$RANDOM/unit/waldo.service/foo
(The $RANDOM is replaced by some randomized string. This is included in
the socket name order to avoid namespace squatting issues: the abstract
socket namespace is open to unprivileged users after all, and care needs
to be taken not to use guessable names)
The services listening on the /run/quux/creds.sock socket may thus
easily retrieve the name of the unit the credential is requested for
plus the credential name, via a simpler getpeername(), discarding the
random preifx and the /unit/ string.
This logic uses "/" as separator between the fields, since both unit
names and credential names appear in the file system, and thus are
designed to use "/" as outer separators. Given that it's a good safe
choice to use as separators here, too avoid any conflicts.
This is a minimal patch only: the new logic is used only for the unit
file credential logic. For other places where we use
READ_FULL_FILE_CONNECT_SOCKET it is probably a good idea to use this
scheme too, but this should be done carefully in later patches, since
the socket names become API that way, and we should determine the right
amount of info to pass over.
With new directive SystemCallLog= it's possible to list system calls to be
logged. This can be used for auditing or temporarily when constructing system
call filters.
---
v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
v4: skip useless debug messages, actually parse directive
v3: don't declare unused variables with old libseccomp
v2: fix build without seccomp or old libseccomp
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
Mention the JSON user record stuff. Mention pam_umask explicitly.
Mention that UMask= of the per-user user@.service instance can be used
too.
Fixes: #16963
Follow the same model established for RootImage and RootImageOptions,
and allow to either append a single list of options or tuples of
partition_number:options.
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.
Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
This reverts commit 0b57803630.
From https://github.com/systemd/systemd/pull/16503#issuecomment-660212813:
systemd-vconsole-setup (the binary) is supposed to run asynchronously by udev
therefore ordering early interactive services after systemd-vconsole-setup.service
has basically no effect.
Let's remove this paragraph. It's better to say nothing than to give pointless
advice.
https://tools.ietf.org/html/draft-knodel-terminology-02https://lwn.net/Articles/823224/
This gets rid of most but not occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as noun we
replace with with allow list/deny list, and when used as verb with
allow-list/deny-list.
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.
Fixes: #14972
This patch changes the way user managers set the default umask for the units it
manages.
Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.
Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.
Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.
Fixes#6077.
Let per-user service managers have user namespaces too.
For unprivileged users, user namespaces are set up much earlier
(before the mount, network, and UTS namespaces vs after) in
order to obtain capbilities in the new user namespace and enable use of
the other listed namespaces. However for privileged users (root), the
set up for the user namspace is still done at the end to avoid any
restrictions with combining namespaces inside a user namespace (see
inline comments).
Closes#10576
When wrong element types are used, directives are sometimes placed in the wrong
section. Also, strip part of text starting with "'", which is used in a few
places and which is displayed improperly in the index.
This partially reverts db11487d10 (the logic to
calculate the correct value is removed, we always use the same setting as for
the system manager). Distributions have an easy mechanism to override this if
they wish.
I think making this configurable is better, because different distros clearly
want different defaults here, and making this configurable is nice and clean.
If we don't make it configurable, distros which either have to carry patches,
or what would be worse, rely on some other configuration mechanism, like
/etc/profile. Those other solutions do not apply everywhere (they usually
require the shell to be used at some point), so it is better if we provide
a nice way to override the default.
Fixes #13469.
exec-condition and oom-kill were added without updating this table
Updated success to reflect the code, which also allows kills by signal in certain situations
Traditionally, user logins had a $PATH in which /bin was before /sbin, while
root logins had a $PATH with /sbin first. This allows the tricks that
consolehelper is doing to work. But even if we ignore consolehelper, having the
path in this order might have been used by admins for other purposes, and
keeping the order in user sessions will make it easier the adoption of systemd
user sessions a bit easier.
Fixes#733.
https://bugzilla.redhat.com/show_bug.cgi?id=1744059
OOM handling in manager_default_environment wasn't really correct.
Now the (theorertical) malloc failure in strv_new() is handled.
Please note that this has no effect on:
- systems with merged /bin-/sbin (e.g. arch)
- when there are no binaries that differ between the two locations.
E.g. on my F30 laptop there is exactly one program that is affected:
/usr/bin/setup -> consolehelper.
There is less and less stuff that relies on consolehelper, but there's still
some.
So for "clean" systems this makes no difference, but helps with legacy setups.
$ dnf repoquery --releasever=31 --qf %{name} --whatrequires usermode
anaconda-live
audit-viewer
beesu
chkrootkit
driftnet
drobo-utils-gui
hddtemp
mate-system-log
mock
pure-ftpd
setuptool
subscription-manager
system-config-httpd
system-config-rootpassword
system-switch-java
system-switch-mail
usermode-gtk
vpnc-consoleuser
wifi-radar
xawtv
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
These options are pretty much equivalent to "journal" and
"journal+console" anyway, let's simplify things, and drop them from the
documentation hence.
For compat reasons let's keep them in the code.
(Note that they are not 100% identical to 'journal', but I doubt the
distinction in behaviour is really relevant to keep this in the docs.
And we should probably should drop 'syslog' entirely from our codebase
eventually, but it's problematic as long as we semi-support udev on
non-systemd systems still.)
This makes the handling of this option match what we do in unit files. I think
consistency is important here. (As it happens, it is the only option in
system.conf that is "non-atomic", i.e. where there's a list of things which can
be split over multiple assignments. All other options are single-valued, so
there's no issue of how to handle multiple assignments.)
https://github.com/systemd/systemd/issues/7153#issuecomment-485252308
Apparently this is still confusing for people.
Longer-term, I think we should just make BindMount= automatically "upgrade"
(or "downgrade", depending on how you look at this), any InaccessiblePath=
mountpoints to "tmpfs". I don't see much point in forcing users to remember
this interaction. But let's at least document the status quo, we can always
update the docs if the code changes.
Let's be safe, rather than sorry. This way DynamicUser=yes services can
neither take benefit of, nor create SUID/SGID binaries.
Given that DynamicUser= is a recent addition only we should be able to
get away with turning this on, even though this is strictly speaking a
binary compatibility breakage.
Let's avoid confusion whether the root is at the top or of the bottom of
the directory tree. Moreover we use "innermost" further down for the
same concept, so let's stick to the same terminology here.
The "include" files had type "book" for some raeason. I don't think this
is meaningful. Let's just use the same everywhere.
$ perl -i -0pe 's^..DOCTYPE (book|refentry) PUBLIC "-//OASIS//DTD DocBook XML V4.[25]//EN"\s+"http^<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n "http^gms' man/*.xml
No need to waste space, and uniformity is good.
$ perl -i -0pe 's|\n+<!--\s*SPDX-License-Identifier: LGPL-2.1..\s*-->|\n<!-- SPDX-License-Identifier: LGPL-2.1+ -->|gms' man/*.xml