Commit Graph

1648 Commits

Author SHA1 Message Date
Luca Boccassi
fe077a1a58 units: add initrd directory to list of conditions for systemd-confext
systemd-sysext has the same check, but it was forgotten for confexts.
Needed to activate confexts from the ESP in the initrd.
2024-11-20 09:12:24 +01:00
Adrian Vovk
31616d00ef
sysupdated: Permit mount namespaces
dissect-image tries to use mount namespaces to dissect images without
polluting the host mounts. This change allows it to do that.
2024-11-06 15:44:11 -05:00
Zbigniew Jędrzejewski-Szmek
243b63d8a6 meson: add separate option for sysupdated, disable in release builds
This commit introduces a build-time option to enable/disable sysupdated
separately from sysupdate. 'auto' translated to enabled by default in
developer builds.
2024-10-31 21:08:08 +00:00
Luca Boccassi
5ff6841c23 logind: allow read/write to char-hvc devices
virtio console uses /dev/hvc* so we need access to write wall
messages
2024-10-22 23:44:47 +02:00
Mike Yuan
a375e14519 units/{user,capsule}@.service: issue daemon-reexec when notify-reloading
Closes #28367 (but not really in the exact form, see below)

We have the problem of restarting all user manager instances
after upgrade. Current approaches involve systemctl kill
with SIGRTMIN+25, which is async and feels rather ugly [1][2];
or systemctl --machine=user@ --user, which requires entering
each user session. Neither is particularly elegant.
Instead, let's just signal daemon-reexec when user@.service
is reloaded from system manager. Our long goal of dropping
daemon-reload in favor of reexec (see TODO) is unlikely to happen
due to user dbus restrictions, but here the synchronization
is done via READY=1.

[1] https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/blob/main/systemd.install?ref_type=heads#L37
[2] https://salsa.debian.org/systemd-team/systemd/-/blob/debian/master/debian/systemd.postinst#L24

#28367 would not really work for us now I come to think about it,
because all processes will be reparented to pid1 as soon as
original user manager process exits. This alternative approach
seems good enough for our use case.
2024-10-11 10:36:08 +02:00
Peter Hutterer
305272ab2b logind: add support for hidraw devices
Add support for opening /dev/hidraw devices via logind's TakeDevice().
Same semantics as our support for evdev devices, but it requires the
HIDIOCREVOKE ioctl in the kernel.
2024-10-03 09:36:57 +01:00
Daan De Meyer
df4d09f78e units: Order ldconfig after systemd-tmpfiles-setup.service
tmpfiles might be linking the configuration for ldconfig into /etc
so make sure it runs after it so that the configuration is guaranteed
to be in place.
2024-09-24 15:32:58 +02:00
Daan De Meyer
939137abb4 firstboot: Prompt for keymap
It's rather crucial to have a good firstboot experience that you
can immediately set the right keymap so let's make sure we prompt
for it.
2024-09-20 04:47:27 +09:00
Daan De Meyer
e196136bc5 units: Order ldconfig.service after systemd-confext.service
The configuration files required by ldconfig could be put into
place by systemd-confext.service (ldconfig only looks in /etc) so
let's order the service after systemd-confext.service to make sure
any config files are in place before the service runs.
2024-09-12 20:20:53 +02:00
Matteo Croce
6d9ef22acd emit a warning in networkd if managed sysctls are changed
Monitor the sysctl set by networkd for writes, if a sysctl is
overwritten with a different value than the one we set, emit a warning.
Writes are detected with an eBPF program attached as BPF_CGROUP_SYSCTL
which reports the sysctl writes only in net/.

The eBPF program only reports sysctl writes from a different cgroup than networkd.
To do this, it uses the `bpf_current_task_under_cgroup_proto()` helper,
which will be available allowed in BPF_CGROUP_SYSCTL from kernel 6.12[1].

Loading a BPF_CGROUP_SYSCTL program requires the CAP_SYS_ADMIN capability,
so drop it just after the program load, whether it loads successfully or not.

Writes are logged but permitted, in future the functionality can be
extended to also deny writes to managed sysctls.

[1] https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com/
2024-09-11 23:07:00 +02:00
Yu Watanabe
214c2508f3
Merge pull request #34336 from yuwata/nspawn-fuse-follow-ups
nspawn: follow-ups for FUSE support
2024-09-10 14:32:09 +09:00
Yu Watanabe
b86b90cec5 nspawn: sync DeviceAllow= setting with systemd-nspawn@.service
Follow-up for dc3223919f.
Addresses https://github.com/systemd/systemd/pull/34067#discussion_r1748592958.

Otherwise, containers started with and without --keep-unit option run in
different device policies.
2024-09-10 04:38:11 +09:00
Lennart Poettering
229d4a9806 shell: define three system credentials we can propagate into shell prompts and welcome messages 2024-09-09 19:03:48 +02:00
Luke T. Shumaker
dc3223919f nspawn: enable FUSE in containers
Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and
bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag
'fuse-update-4.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds,
2018-06-07).  This means that on such kernels it is safe to enable FUSE in
nspawn containers.

In outer_child(), before calling copy_devnodes(), check the FUSE version to
decide whether enable (>=7.27) or disable (<7.27) FUSE in the container.  We
look at the FUSE version instead of the kernel version in order to enable FUSE
support on older-versioned kernels that may have the mentioned patchset
backported ([as requested by @poettering][1]).  However, I am not sure that
this is safe; user-namespace support is not a documented part of the FUSE
protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant
to capture.  While the same patchset
 - added FUSE_ABORT_ERROR (which is all that the 7.27 version bump
   is documented as including),
 - bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
 - added user-namespace support
these 3 things are not inseparable; it is conceivable to me that a backport
could include the first 2 of those things and exclude the 3rd; perhaps it would
be safer to check the kernel version.

Do note that our get_fuse_version() function uses the fsopen() family of
syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if
nothing has been backported, then the minimum kernel version for FUSE-in-nspawn
is actually v5.2, not v4.18.

Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes()
copy in /dev/fuse if enabled.

Pass whether or not to enable FUSE back over fd_outer_socket to run_container()
so that it can pass that to append_machine_properties() (via either
register_machine() or allocate_scope()); have append_machine_properties()
append "DeviceAllow=/dev/fuse rw" if enabled.

For testing, simply check that /dev/fuse can be opened for reading and writing,
but that actually reading from it fails with EPERM.  The test assumes that if
FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel
with FUSE >= 7.27; I am unsure how to go about writing a test that validates
that the version check disables FUSE on old kernels.

[1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835

Closes #17607
2024-09-07 10:18:35 -06:00
Etienne Cordonnier
4ac1755be2 coredump: set ProtectHome to read-only
In 924453c225
ProtectHome was set to true for systemd-coredump in order to reduce risk, since an attacker could craft a malicious binary in order to compromise systemd-coredump.
At that point the object analysis was done in the main systemd-coredump process.
Because of this systemd-coredump is unable to product symbolicated call-stacks for binaries running under /home ("n/a" is shown instead of function names).

However, later in 61aea456c1 systemd-coredump was changed to do the object analysis in a forked process,
covering those security concerns.

Let's set ProtectHome to read-only so that systemd-coredump produces symbolicated call-stacks for processes running under /home.
2024-09-06 13:30:36 +02:00
Lennart Poettering
e7c8d78e7f
units: don't set LISTEN_FDNAMES for varlink services explicitly
Now that FileDescriptorName= is properly honored by Accept=yes sockets,
this explicit override is pointless.
2024-08-26 15:40:15 +02:00
Adrian Vovk
bf2c741fd7
sysupdate: Implement systemd-sysupdated dbus service
Co-authored-by: Tom Coldrick <thomas.coldrick@codethink.co.uk>
Co-authored-by: Abderrahim Kitouni <abderrahim.kitouni@codethink.co.uk>
2024-08-21 09:31:41 +01:00
Ronan Pigott
3d2157e707 units: drop "-p" flag from agetty's login options
This flag was added in db6aedab92 with the justification that locale
environment variables should be preserved by the user session. However,
the companion patch to drop the UnsetEnvironment= directive blocking
these variables was never merged, so the intended change was never
effected.

While the patch was ineffective toward its stated goal, the "-p" option
does have material negative consequences for the user session in
systemd — environment variables to support the use of
credentials and memory pressure directives, such as
$CREDENTIALS_DIRECTORY and $MEMORY_PRESSURE_WATCH, which are now
directly used by agetty and login, get leaked into the user session
potentially breaking applications that rely on these values.

E.g. systemd-ask-password fails from the tty when $CREDENTIALS_DIRECTORY
has been leaked from agetty, because it expects to be able to access
credentials in $CREDENTIALS_DIRECTORY.

This effectively reverts db6aedab92.

References: db6aedab92 (units: Tell login to preserve environment (#6023), 2017-05-24)
2024-08-15 16:49:02 +09:00
Daan De Meyer
e97429902d units: Import tty specific credentials for each getty unit
As explained in the previous commit, this allows us to configure
agetty and login for individual ttys instead of globally.
2024-07-31 15:52:29 +02:00
Lennart Poettering
56ea3c262c units: bring agetty command lines back into sync
Let's always rely on our own TTY reset logic and tty disallocation/clear
screen logic, thus always pass --noclear and --noreset.

Also, bring the list of baud rates to try into sync for console-getty
and serial-getty (the former might or might not be connected to rs232,
we can't know, hence assume the worst, and copy what
serial-getty@.service does)
2024-07-19 11:44:04 +02:00
Luca Boccassi
bbb0b72849 fsck: do not pull down mount units on soft-reboot
Otherwise they will pull down the disk too, which we don't want on soft-reboot
2024-07-09 20:59:35 +02:00
fwfy
496b4fa0e9
Remove extra period at the end of systemd-bsod's unit description. (#33632)
* Remove extra period at end of unit description.

Having an extra period at the end of this unit description makes log entries pertaining to it appear weirdly, as it seems the default expectation is that there is not to be a period at the end of a unit description.

e.g.: `systemd[1]: Started Displays emergency message in full screen..`
2024-07-06 10:17:20 +01:00
Lennart Poettering
29294d21cf units: add dep on systemd-logind.service by user@.service
Let's make sure logind is accessible by the time user@.service runs, and
that logind stays around as long as it does so.

Addresses an issue reported here:

https://lists.freedesktop.org/archives/systemd-devel/2024-June/050468.html

This addresses an issued introduced by
278e815bfa, which dropped the a dependency
from user@.service systemd-user-sessions.service without replacement.
While dropping that dependency does make sense, it should have been
replaced with the weaker dependency on systemd-logind.service, hence fix
that now.

user@.service is after all a logind concept, hence logind really should
be around for its lifetime.

systemd-user-sessions.service is a later milestone that only really
should apply to regular users (not root), hence it's too strong a
requirement.
2024-07-01 18:52:35 +02:00
Lennart Poettering
f596658811 importd: allow activation in early boot, and make it socket activatable
Previously, importd was only accessible via D-Bus, which required it to
be a late boot service. Now that we have Varlink we can rearrange things
to become early-boot activated, just after the image directories are
mounted.

This will later allow us to have generator that auto-downloads images on
boot.
2024-06-25 09:57:42 +02:00
Lennart Poettering
ec67cc9785 units: register vmspawn VMs started via systemd-vmspawn@.service by default with machined 2024-06-21 17:49:26 +02:00
Mike Yuan
b5c8cc0a3b man,units: drop "temporary" from description of systemd-tmpfiles
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.

As discussed in #33349
2024-06-15 19:08:35 +02:00
Daan De Meyer
d6518003f8 tpm2-setup: Don't fail if we can't access the TPM due to authorization failure
The TPM might be password/pin protected for various reasons even if
there is no SRK yet. Let's handle those cases gracefully instead of
failing the unit as it is enabled by default.
2024-06-12 18:31:21 +09:00
Daan De Meyer
4861eace12 presets: Don't enable systemd-homed-firstboot.service by default
Enabling this service by default means every CI image without a
regular user now gets stuck on first boot due to the password prompt
from systemd-homed-firstboot.service. Let's not enable the service
by default but instead require users to enable it explicitly if they
want its behavior.

Fixes #33249
2024-06-08 11:29:55 +01:00
Luca Boccassi
d6243ebedd journald: enable persistent FD Store to fix logging during soft-reboot
A unit with StandardOutput=journal (the default) will get its stdout/stderr sockets
disconnected when journald stops, as the file descriptors on journald's side are
not preserved (it works on restart, as the FD Store keeps them open during restarts).
Set FileDescriptorStorePreserve=yes so that the journal FD's stay open during a soft
reboot, and applications don't get broken stdout/stderr.
2024-06-03 16:30:54 +01:00
Zbigniew Jędrzejewski-Szmek
a37454bd90 man: update links to "API File Systems" 2024-05-28 14:48:56 +02:00
Zbigniew Jędrzejewski-Szmek
d5c17aceb3 various: update links to more wiki pages 2024-05-28 14:48:53 +02:00
Yu Watanabe
7ae27cefd7 unit: also stop systemd-journal-flush.service on soft-reboot
After soft-reboot, /var/log/journal may be initially read-only,
and becomes writable a bit later. In such case, runtime journal is
initially opened by journald. Hence, we need to flush to /var when it is
ready.
2024-05-26 03:11:24 +09:00
Yu Watanabe
37143fdf5a units: stop systemd-journald before systemd-soft-reboot.service
Typically, soft-reboot.target is never reached. So, without this change,
systemd-journald may be killed by PID1 on soft-reboot, and may cause
journal corruption.
2024-05-23 00:08:14 +09:00
Yu Watanabe
03a41c41ee Revert "units: do not soft-reboot before soft-reboot.target reached"
This reverts commit 4263d7617f.

Still I think this is the way to go. But the change was merged after -rc2,
and still discussion is continued. So, at least now let's revert it,
and do that after v256-final is released if approved.
2024-05-23 00:06:30 +09:00
Yu Watanabe
1cd904bbe0 units: add JobTimeoutAction= to exit.target and friends
For consistency with other targets, e.g. poweroff.target or
reboot.target.
2024-05-18 01:28:14 +09:00
Yu Watanabe
4263d7617f units: do not soft-reboot before soft-reboot.target reached
Otherwise, at the time systemd-soft-reboot.service succeeds,
services which has Conflicts= and Before=soft-reboot.target may
not be stopped yet, and may be SIGKILLed.

Especially, systemd-journald.service has the dependencies, thus
journal may be corrupted. See #32223.

Follow-up for 13ffc60749.

Fixes #32834.
2024-05-17 12:31:00 +09:00
Yu Watanabe
11a55b15bf units: drop dependencies of soft-reboot.target from systemd-journald@.service
The service deos not have DefaultDependencies=no. Hence it has dependencies
of shutdown.target, and dependencies of soft-reboot.target are not
necessary.

Follow-up for f89985ca49.
2024-05-17 11:57:53 +09:00
Yu Watanabe
61628287bd journal: explicitly sync namespaced journals before stopping socket units
Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.

To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.

Fixes #32604.
2024-05-02 19:41:01 +02:00
Dmitry V. Levin
c309b9e9c3 treewide: fix a few typos in NEWS, docs, comments, and log messages 2024-04-27 12:11:13 +02:00
Luca Boccassi
1ac79a1937 units: add Before=shutdown.target to systemd-networkd-persistent-storage.service
It's ordered with networkd, but just in case. Lintian complains:

W: systemd: systemd-service-file-shutdown-problems [usr/lib/systemd/system/systemd-networkd-persistent-storage.service]

Follow-up for 91676b6458
2024-04-26 22:16:33 +02:00
Lennart Poettering
ad7ac02035 units: merge two After= lines 2024-04-22 15:15:05 +02:00
Lennart Poettering
a6e9c37f5e tpm2-setup-early: order against pcrphase-initrd
Right now systemd-tpm2-setup-early and systemd-pcrphase-initrd.service
are not ordered against each other. However, they require the same slow
resource to operate: the TPM2. If we allow them to access the device
simultaneously, the kernel resource manager like has to save/restore TPM
state while they operate, slowing things down further.

hence, let's avoid all this mess, and just order them against each other
so that the shared resource is first used in full by one and then by the
other.

I opted to order systemd-pcrphase-initrd before
systemd-tpm2-setup-early, since there's value in having the former as
early as possible in userspace, to be a good marker for the transition
from kernel to first userspace. I can see no benefit in the opposite
order however.
2024-04-22 14:47:58 +02:00
Yu Watanabe
5700e755a9 units: introduce systemd-udev-load-credentials.service 2024-04-16 09:45:43 +09:00
Lennart Poettering
27dd678d2d units: order repart after systemd-tpm2-setup-early.service
This mimics what we do for systemd-cryptsetup@.service (see
src/shared/generator.c), and makes sense since repart might lock up the
root volume against a TPM, which ideally has its SRK already set up by
then.

More importantly though, this ensures that we ordered correctly after
tpm2.target (which systemd-tpm2-setup-early.service has a dependency
on), for systems where the TPM drivers are not compiled into the kernel.

See: https://lists.freedesktop.org/archives/systemd-devel/2024-April/050201.html
2024-04-15 22:33:45 +02:00
Mike Yuan
40611863e4
units/systemd-boot-check-no-failures.service: drop unneeded dep on shutdown.target 2024-04-10 23:40:53 +08:00
Lennart Poettering
702a52f4b5 mountfsd: add new systemd-mountfsd component 2024-04-06 16:08:24 +02:00
Lennart Poettering
8aee931e7a nsresourced: add new daemon for granting clients user namespaces and assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.

The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.

There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.

Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.

BPF LSM program with contributions from Alexei Starovoitov.
2024-04-06 16:08:24 +02:00
Mike Yuan
dfad86b838
units: introduce systemd-hibernate-clear.service that clears
stale HibernateLocation EFI variable

Currently, if the HibernateLocation EFI variable exists,
but we failed to resume from it, the boot carries on
without clearing the stale variable. Therefore, the subsequent
boots would still be waiting for the device timeout,
unless the variable is purged manually.

There's no point to keep trying to resume after a successful
switch-root, because the hibernation image state
would have been invalidated by then. OTOH, we don't
want to clear the variable prematurely either,
i.e. in initrd, since if the resume device is the same
as root one, the boot won't succeed and the user might
be able to try resuming again. So, let's introduce a
unit that only runs after switch-root and clears the var.

Fixes #32021
2024-04-03 22:07:43 +08:00
Mike Yuan
4f156b1078
units: remove implicit RequiresMountsFor= 2024-04-01 19:44:51 +08:00
Yu Watanabe
d30d0b04ae
Merge pull request #31951 from bluca/resolve_reload
resolved: support reloading configuration at runtime
2024-03-27 02:37:52 +09:00