Let's handle gracefully if a client disconnects very early on.
This builds on #4120, but relaxes the condition checks further, since we
getpeername() might already fail during ExecStartPre= and friends.
Fixes: #7172
Presets are useful to initialize uninitialized /etc, but that doesn't
apply to the initrd.
Also, let's rename etc_empty → first_boot. After all, the variable
doesn't actually reflect whether /etc is really empty, it just reflects
whether /etc/machine-id existed originally or not. Moreover, we later on
directly initialize manager_set_first_boot() from it, hence let's just
name it the same way all through the codepath, to make this all less
confusing.
See: #7100
This function is really not a method of the Manager object (implemented
in manager.c), but just a helper in main.c. Hence let's not confusingly
name it the way methods are called.
We already made a similar change when talking about the "restart"
command, let's also do this for "systemctl reload" and friends.
Follow-up for: 6539dd7c42
See: #7126
We currently use the ownership of the top-level directory as a hint
whether we need to descent into the whole tree to chown() it recursively
or not. This is problematic with the previous chown()ing algorithm, as
when descending into the tree we'd first chown() and then descend
further down, which meant that the top-level directory would be chowned
first, and an aborted recursive chowning would appear on the next
invocation as successful, even though it was not. Let's reshuffle things
a bit, to make the re-chown()ing safe regarding interruptions:
a) We chown() the dir we are looking at last, and descent into all its
children first. That way we know that if the top-level dir is
properly owned everything inside of it is properly owned too.
b) Before starting a chown()ing operation, we mark the top-level
directory as owned by a special "busy" UID range, which we can use to
recognize whether a tree was fully chowned: if it is marked as busy,
it's definitely not fully chowned, as the busy ownership will only be
fixed as final step of the chowning.
Fixes: #6292
Linux doesn't have faccess(), hence let's emulate it. Linux has access()
and faccessat() but neither allows checking the access rights of an fd
passed in directly.
Before this, assigning empty string to Delegate= makes no change to the
controller list. This is inconsistent to the other options that take list
of strings. After this, when empty string is assigned to Delegate=, the
list of controllers is reset. Such behavior is consistent to other options
and useful for drop-in configs.
Closes#7334.
This option is likely to be very useful for systemd-run invocations,
hence let's add a shortcut for it.
With this new concepts it's now very easy to put together systemd-run
invocations that leave zero artifacts in the system, including when they
fail.
Right now, the option only takes one of two possible values "inactive"
or "inactive-or-failed", the former being the default, and exposing same
behaviour as the status quo ante. If set to "inactive-or-failed" units
may be collected by the GC logic when in the "failed" state too.
This logic should be a nicer alternative to using the "-" modifier for
ExecStart= and friends, as the exit data is collected and logged about
and only removed when the GC comes along. This should be useful in
particular for per-connection socket-activated services, as well as
"systemd-run" command lines that shall leave no artifacts in the
system.
I was thinking about whether to expose this as a boolean, but opted for
an enum instead, as I have the suspicion other tweaks like this might be
a added later on, in which case we extend this setting instead of having
to add yet another one.
Also, let's add some documentation for the GC logic.
When preparing for a restart we quickly go through the DEAD/INACTIVE
service state before entering AUTO_RESTART. When doing this, we need to
make sure we don't destroy the FD store. Previously this was done by
checking the failure state of the unit, and keeping the FD store around
when the unit failed, under the assumption that the restart logic will
then get into action.
This is not entirely correct howver, as there might be failure states
that will no result in restarts.
With this commit we slightly alter the logic: a ref counter for the fd
store is added, that is increased right before we handle the restart
logic, and decreased again right-after.
This should ensure that the fdstore lives exactly as long as it needs.
Follow-up for f0bfbfac43.
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089Fixes: #3041Fixes: #4441
Let's clarify that these settings only apply to stdout/stderr logging.
Always mention the journal before syslog (as the latter is in most ways
just a legacy alias these days). Always mention the +console cases too.
When we drop messages of a unit, we log about. Let's add some structured
data to that. Let's include how many messages we dropped, but more
importantly, let's link up the message we generate to the unit we
dropped the messages from by using the "OBJECT" logic, i.e. by
generating OBJECT_SYSTEMD_UNIT= fields and suchlike, that "journalctl
-u" and friends already look for.
Fixes: #6494