Commit Graph

44 Commits

Author SHA1 Message Date
Yu Watanabe
b86b90cec5 nspawn: sync DeviceAllow= setting with systemd-nspawn@.service
Follow-up for dc3223919f.
Addresses https://github.com/systemd/systemd/pull/34067#discussion_r1748592958.

Otherwise, containers started with and without --keep-unit option run in
different device policies.
2024-09-10 04:38:11 +09:00
Luke T. Shumaker
dc3223919f nspawn: enable FUSE in containers
Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and
bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag
'fuse-update-4.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds,
2018-06-07).  This means that on such kernels it is safe to enable FUSE in
nspawn containers.

In outer_child(), before calling copy_devnodes(), check the FUSE version to
decide whether enable (>=7.27) or disable (<7.27) FUSE in the container.  We
look at the FUSE version instead of the kernel version in order to enable FUSE
support on older-versioned kernels that may have the mentioned patchset
backported ([as requested by @poettering][1]).  However, I am not sure that
this is safe; user-namespace support is not a documented part of the FUSE
protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant
to capture.  While the same patchset
 - added FUSE_ABORT_ERROR (which is all that the 7.27 version bump
   is documented as including),
 - bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
 - added user-namespace support
these 3 things are not inseparable; it is conceivable to me that a backport
could include the first 2 of those things and exclude the 3rd; perhaps it would
be safer to check the kernel version.

Do note that our get_fuse_version() function uses the fsopen() family of
syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if
nothing has been backported, then the minimum kernel version for FUSE-in-nspawn
is actually v5.2, not v4.18.

Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes()
copy in /dev/fuse if enabled.

Pass whether or not to enable FUSE back over fd_outer_socket to run_container()
so that it can pass that to append_machine_properties() (via either
register_machine() or allocate_scope()); have append_machine_properties()
append "DeviceAllow=/dev/fuse rw" if enabled.

For testing, simply check that /dev/fuse can be opened for reading and writing,
but that actually reading from it fails with EPERM.  The test assumes that if
FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel
with FUSE >= 7.27; I am unsure how to go about writing a test that validates
that the version check disables FUSE on old kernels.

[1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835

Closes #17607
2024-09-07 10:18:35 -06:00
Nick Rosbrook
411d8c72ec nspawn: set CoredumpReceive=yes on container's scope when --boot is set
When --boot is set, and --keep-unit is not, set CoredumpReceive=yes on
the scope allocated for the container. When --keep-unit is set, nspawn
does not allocate the container's unit, so the existing unit needs to
configure this setting itself.

Since systemd-nspawn@.service sets --boot and --keep-unit, add
CoredumpReceives=yes to that unit.
2023-10-13 15:28:50 -04:00
Lennart Poettering
1a3704dcc3 nspawn: port over to /supervisor/ subcgroup being delegated to nspawn
Let's make use of the new DelegateSubgroup= feature and delegate the
/supervisor/ subcgroup already to nspawn, so that moving the supervisor
process there is unnecessary.
2023-04-27 12:18:32 +02:00
Lennart Poettering
143a1f1039 units: change modprobe@dm-mod.service → modprobe@dm_mod.service
Follow-up for 8f1359bf85
2022-12-23 17:26:48 +01:00
Yu Watanabe
8f1359bf85 unit: use underbar for module name
For consistency with src/core/unit.c.
2022-12-19 12:12:02 +01:00
Lennart Poettering
047c2c14c5 units: drop After=systemd-resolved.service from systemd-nspawn@.service
resolved is now started as part of early boot hence we need no explicit
ordering anymore.
2022-02-24 10:37:11 +01:00
Zbigniew Jędrzejewski-Szmek
fa10451686 units: strip out the developer comment in .service unit again
The comment talks about upstream development steps and doesn't make
sense for users. We used special '## ' syntax to strip it out during
build, but it got inadvertently reformatted as a normal comment
in 3982becc92.
2021-05-19 10:24:43 +09:00
Zbigniew Jędrzejewski-Szmek
059cc610b7 meson: use jinja2 for unit templates
We don't need two (and half) templating systems anymore, yay!

I'm keeping the changes minimal, to make the diff manageable. Some enhancements
due to a better templating system might be possible in the future.

For handling of '## ' — see the next commit.
2021-05-19 10:24:43 +09:00
Yu Watanabe
db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Kevin P. Fleming
3b355677b8 RequireMountsFor in systemd-nspawn should wait for machine mount
This patch modifies the RequireMountsFor setting in systemd-nspawn@.service to wait for the machine instance directory to be mounted, not just /var/lib/machines.

Closes #14931
2020-03-02 19:37:51 +09:00
Zbigniew Jędrzejewski-Szmek
cdc6804b60 units: drop full paths for utilities in $PATH
This makes things a bit simpler and the build a bit faster, because we don't
have to rewrite files to do the trivial substitution. @rootbindir@ is always in
our internal $PATH that we use for non-absolute paths, so there should be no
functional change.
2020-01-20 16:50:16 +01:00
Iain Lane
625077264b units: Split modprobing out into a separate service unit
Devices referred to by `DeviceAllow=` sandboxing are resolved into their
corresponding major numbers when the unit is loaded by looking at
`/proc/devices`. If a reference is made to a device which is not yet
available, the `DeviceAllow` is ignored and the unit's processes cannot
access that device.

In both logind and nspawn, we have `DeviceAllow=` lines, and `modprobe`
in `ExecStartPre=` to load some kernel modules. Those kernel modules
cause device nodes to become available when they are loaded: the device
nodes may not exist when the unit itself is loaded. This means that the
unit's processes will not be able to access the device since the
`DeviceAllow=` will have been resolved earlier and denied it.

One way to fix this would be to re-evaluate the available devices and
re-apply the policy to the cgroup, but this cannot work atomically on
cgroupsv1. So we fall back to a second approach: instead of running
`modprobe` via `ExecStartPre`, we move this out to a separate unit and
order it before the units which want the module.

Closes #14322.
Fixes: #13943.
2020-01-07 18:37:30 +01:00
Zbigniew Jędrzejewski-Szmek
21d0dd5a89 meson: allow WatchdogSec= in services to be configured
As discussed on systemd-devel [1], in Fedora we get lots of abrt reports
about the watchdog firing [2], but 100% of them seem to be caused by resource
starvation in the machine, and never actual deadlocks in the services being
monitored. Killing the services not only does not improve anything, but it
makes the resource starvation worse, because the service needs cycles to restart,
and coredump processing is also fairly expensive. This adds a configuration option
to allow the value to be changed. If the setting is not set, there is no change.

My plan is to set it to some ridiculusly high value, maybe 1h, to catch cases
where a service is actually hanging.

[1] https://lists.freedesktop.org/archives/systemd-devel/2019-October/043618.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1300212
2019-10-25 17:20:24 +02:00
Lennart Poettering
31ea9c89d4 nspawn: explicitly load units beforehand so that DeviceAllow= syntax works
Yuck, but I don't see any prettier solution.

Fixes: #13130
2019-07-23 13:30:56 +02:00
Lennart Poettering
8fd010bb1b nspawn: turn on watchdog logic for nspawn too
It's a long-running daemon, and it's easy to enable, hence do it.
2017-12-07 12:34:46 +01:00
Zbigniew Jędrzejewski-Szmek
a7df2d1e43 Add SPDX license headers to unit files 2017-11-19 19:08:15 +01:00
Lennart Poettering
3982becc92 units: include DM devices in DeviceAllow fpor systemd-nspawn@.service
We need it to make LUKS devices work.

Fixes: #6525
2017-08-29 16:01:19 +02:00
Josef Gajdusek
be5bd2ec62 systemd-nspawn@.service: start after /var/lib/machines is mounted (#6079)
This fixes a race condition during boot, where an nspawn container would start
before /var/lib/machines got mounted resulting in a failure.
2017-06-06 11:18:22 -04:00
Lennart Poettering
dec718065b units: order systemd-nspawn@.service after systemd-resolved.service
This way, the nspawn internal check whether resolved is running will
succeed if it is enabled.

Fixes: #4649
2017-02-17 16:06:31 -05:00
Zbigniew Jędrzejewski-Szmek
9c0f732c62 Introduce '## ' as internal comment prefix in .in files and filter out a comment (#5289)
Sometimes we have comments which don't make sense outside of the systemd
codebase, so let's filter them out from the user-visible files.

Fixes #5286.
2017-02-09 16:28:37 +01:00
Alessandro Puccetti
54cd6556b3 nspawn: set DevicesPolicy closed and clean up duplicated devices 2016-07-22 16:08:26 +02:00
Martin Pitt
5c3c778014 Merge pull request #3764 from poettering/assorted-stuff-2
Assorted fixes
2016-07-22 09:10:04 +02:00
Alessandro Puccetti
31d28eabc1 nspawn: enable major=0/minor=0 devices inside the container (#3773)
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
2016-07-21 17:39:38 +02:00
Lennart Poettering
8d36b53a2d units: fix TasksMax=16384 for systemd-nspawn@.service
When a container scope is allocated via machined it gets 16K set already since
cf7d1a30e4. Make sure when a container is run as
system service it gets the same values.
2016-07-20 14:53:15 +02:00
Lennart Poettering
af88764ff8 units: turn on user namespace by default in systemd-nspawn@.service
Now that user namespacing is supported in a pretty automatic way, actually turn
it on by default if the systemd-nspawn@.service template is used.
2016-04-25 12:16:03 +02:00
Elias Probst
7a8c9e4457
Don't escape the name of the container in instances of
When using `%I` for instances of `systemd-nspawn@.service`, the result
will be `systemd-nspawn` trying to launch a container named e.g.
`fedora/23` instead of `fedora-23`.
Using `%i` instead prevents escaping `-` in a container name and uses
the unmodified container name from the machine store.
2016-02-26 20:39:10 +01:00
Lennart Poettering
541ec33075 nspawn: set TasksMax= for containers to 8192 by default 2015-11-16 11:58:04 +01:00
Lennart Poettering
a2c90f05f1 units: also whitelist "blkext" block devices for nspawn service
/dev/loop*p* block devices are of the "blkext" subsystem, not of loop,
hence whitelist this too.

Fixes #1446
2015-10-22 01:59:25 +02:00
Lennart Poettering
988a479642 nspawn: fix --image= when nspawn is run as service
nspawn needs access to /dev/loop to implement --image=, hence grant that
in the service file.

Fixes #1446.
2015-10-03 11:23:52 +02:00
Lennart Poettering
08acb521f3 units: make sure that .nspawn files override the default settings in systemd-nspawn@.service 2015-09-06 01:49:06 +02:00
Lennart Poettering
45d383a3b8 units: make sure systemd-nspawn@.slice instances are actually located in machine.slice
https://plus.google.com/112206451048767236518/posts/SYAueyXHeEX
2015-05-19 19:49:01 +02:00
Lennart Poettering
d3650f0c4b units: order nspawn containers after network.target
This way we know that any bridges and other user-created network devices
are in place, and can be properly added to the container.

In the long run this should be dropped, and replaced by direct calls
inside nspawn that cause the devices to be created when necessary.
2015-05-11 22:18:20 +02:00
Lennart Poettering
773ce3d89c nspawn: make sure we install the device policy if nspawn is run as unit as on the command line 2015-04-28 21:34:23 +02:00
Lennart Poettering
7d5fed66a6 units: turn on --network-veth by default for systemd-nspawn@.service
Given the recent improvements in networkd, it's probably the better
default now.
2015-02-13 14:35:50 +01:00
Lennart Poettering
6a140df004 units: rework systemd-nspawn@.service unit
- Unescape instance name so that we can take almost anything as instance
  name.

- Introduce "machines.target" which consists of all enabled nspawns and
  can be used to start/stop them altogether

- Look for container directory using -M instead of harcoding the path in
  /var/lib/container
2014-12-29 17:00:05 +01:00
Martin Pitt
574edc9006 nspawn: Add try-{host,guest} journal link modes
--link-journal={host,guest} fail if the host does not have persistent
journalling enabled and /var/log/journal/ does not exist. Even worse, as there
is no stdout/err any more, there is no error message to point that out.

Introduce two new modes "try-host" and "try-guest" which don't fail in this
case, and instead just silently skip the guest journal setup.

Change -j to mean "try-guest" instead of "guest", and fix the wrong --help
output for it (it said "host" before).

Change systemd-nspawn@.service.in to use "try-guest" so that this unit works
with both persistent and non-persistent journals on the host without failing.

https://bugs.debian.org/770275
2014-11-21 14:27:26 +01:00
Lennart Poettering
a931ad47a8 core: introduce new Delegate=yes/no property controlling creation of cgroup subhierarchies
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.

For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.

Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.

Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.

This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
2014-11-05 18:49:14 +01:00
Lennart Poettering
ce38dbc84b nspawn: when running in a service unit, use systemd for restarts
THis way we can remove cgroup priviliges after setup, but get them back
for the next restart, as we need it.
2014-07-03 12:51:07 +02:00
Jonathan Liu
d8e40d62ab units: use KillMode=mixed for systemd-nspawn@.service
This causes the container to shut down cleanly when the service is
stopped.
2014-05-30 09:36:29 -04:00
Lennart Poettering
c480d2f8bc units: make use of nspawn's --keep-unit switch in systemd-nspawn@.service 2014-02-11 21:13:51 +01:00
Zbigniew Jędrzejewski-Szmek
9cb74bcb23 man,units: fix installation of systemd-nspawn@.service and add example 2013-11-09 19:02:53 -05:00
Lennart Poettering
3331234adc nspawn: update unit file
ControlGroup= is obsolete, so let's drop it from the default nspawn unit
file.
2013-09-17 11:59:47 -05:00
Lennart Poettering
05947befce units: add an easy-to-use unit template file systemd-nspawn@.service for running containers as system services 2013-04-30 08:36:02 -03:00