glibc 2.30 (Aug 2019) added a wrapper for getdents64(). For older
versions let's define our own.
(This syscall exists since Linux 2.4, hence should be safe to use for
us)
In upstream, we have a linearly-growing list of net-naming-scheme defines;
we add a new one for every release where we make user-visible changes to the
naming scheme.
But the general idea was that downstream distributions could define their
own combinations (or even just their own names for existing combinations),
so provide stability for their users. So far this required patching of the
netif-naming-scheme.c and .h files to add the new lines.
With this patch, patching is not required:
$ meson configure build \
-Dextra-net-naming-schemes=gargoyle=v238+npar_ari+allow_rerenames,gargoyle2=gargoyle+nspawn_long_hash \
-Ddefault-net-naming-scheme=gargoyle2
or even
$ meson configure build \
-Dextra-net-naming-schemes=gargoyle=v238+npar_ari+allow_rerenames,gargoyle2=gargoyle+nspawn_long_hash,latest=v249 \
-Ddefault-net-naming-scheme=gargoyle2
The syntax is a comma-separated list of NAME=name+name+…
This syntax is a bit scary, but any typos result in compilation errors,
so I think it should be OK in practice.
With this approach, we don't allow users to define arbitrary combinations:
what is allowed is still defined at compilation time, so it's up to the
distribution maintainers to provide reasonable combinations. In this regard,
the only difference from status quo is that it's much easier to do (and harder
to do incorrectly, for example by forgetting to add a name to one of the
maps).
We used 'combo' type for the scheme list. For a while we forgot to add
new names, and recently aa0a23ec86 added v241, v243, v245, and v247.
I want to allow defining new values during configuration, which means
that we can't use meson to verify the list of options. So any value is
allowed, but then two tests are added: one that will fail compilation if some
invalid name is given (other than "latest"), and one that converts
DEFAULT_NET_NAMING_SCHEME to a NamingScheme pointer.
Before:
```
Compiling C object src/libsystemd-network/libsystemd-network.a.p/dhcp6-option.c.o
../src/libsystemd-network/dhcp6-option.c: In function ‘dhcp6_option_parse_ia’:
../src/libsystemd-network/dhcp6-option.c:633:70: warning: passing argument 3 of ‘dhcp6_option_parse’ makes pointer from integer without a cast [-Wint-conversion]
633 | r = dhcp6_option_parse(option_data, option_data_len, offset, &subopt, &subdata_len, &subdata);
| ^~~~~~
| |
| size_t {aka long unsigned int}
../src/libsystemd-network/dhcp6-option.c:358:25: note: expected ‘size_t *’ {aka ‘long unsigned int *’} but argument is of type ‘size_t’ {aka ‘long unsigned int’}
358 | size_t *offset,
| ~~~~~~~~^~~~~~
```
After:
```
../src/libsystemd-network/dhcp6-option.c: In function ‘dhcp6_option_parse_ia’:
../src/libsystemd-network/dhcp6-option.c:633:70: error: passing argument 3 of ‘dhcp6_option_parse’ makes pointer from integer without a cast [-Werror=int-conversion]
633 | r = dhcp6_option_parse(option_data, option_data_len, offset, &subopt, &subdata_len, &subdata);
| ^~~~~~
| |
| size_t {aka long unsigned int}
../src/libsystemd-network/dhcp6-option.c:358:25: note: expected ‘size_t *’ {aka ‘long unsigned int *’} but argument is of type ‘size_t’ {aka ‘long unsigned int’}
358 | size_t *offset,
| ~~~~~~~~^~~~~~
cc1: some warnings being treated as errors
```
Compilation would fail because we could have HAVE_SMACK_RUN_LABEL without
HAVE_SMACK. This doesn't make much sense, so let's just make -Dsmack=false
completely disable smack.
Also, the logic in smack-setup.c seems dubious: '#ifdef SMACK_RUN_LABEL'
would evaluate to true even if -Dsmack-run-label='' is used. I think
this was introduced in the conversion to meson:
8b197c3a8a added
AC_ARG_WITH(smack-run-label,
AS_HELP_STRING([--with-smack-run-label=STRING],
[run systemd --system with a specific SMACK label]),
[AC_DEFINE_UNQUOTED(SMACK_RUN_LABEL, ["$withval"], [Run with a smack label])],
[])
i.e. it really was undefined if not specified. And it was same
still in 72cdb3e783 when configure.ac
was dropped.
So let's use the single conditional HAVE_SMACK_RUN_LABEL everywhere.
Units are copied out via sendmsg datafd from images, but that means
the SELinux labels get lost in transit. Extract them and copy them over.
Given recvmsg cannot use multiple IOV transparently when the sizes are
variable, use a '\0' as a separator between the filename and the label.
upstream meson stopped allowing combining boolean with the plus
operator, and now requires using the logical and operator
reference:
43302d3296Fixes: #20632
Add support for systemd-pkcs11 based LUKS2 device activation
via libcryptsetup plugin. This make the feature (pkcs11 sealed
LUKS2 keyslot passphrase) usable from both systemd utilities
and cryptsetup cli.
The feature is configured via -Dlibcryptsetup-plugins combo
with default value set to 'auto'. It get's enabled automatically
when cryptsetup 2.4.0 or later is installed in build system.
Add support for systemd-fido2 based LUKS2 device activation
via libcryptsetup plugin. This make the feature (fido2 sealed
LUKS2 keyslot passphrase) usable from both systemd utilities
and cryptsetup cli.
The feature is configured via -Dlibcryptsetup-plugins combo
with default value set to 'auto'. It get's enabled automatically
when cryptsetup 2.4.0 or later is installed in build system.
The output is similar to our hand-crafted status message, but it's nice to use
the built-in functionality. After all, it was amended during development to
support our use case.
This undoes part of 4c890ad3cc: the
implementations of update-dbus-docs and update-man-rules are moved back to
man/meson.build, and alias_target() is used to keep the visible target names
unchanged.
The rules for man pages are reworked so that it's possible to invoke the
targets even if xstlproc is not available. After all, xsltproc is only needed
for the final formatted output, and not other processing.
Ubuntu Bionic 18.04 has 0.45, so it was below the previously required
minimum version already. Focal 20.04 has 0.53.2. Let's require that
and use various features that are available.
Otherwise the build sometimes fails in a racy way:
```
[274/1850] Compiling C object src/cryptsetup/cryptsetup-tokens/libcryptsetup-token-systemd-tpm2_static.a.p/cryptsetup-token-systemd-tpm2.c.o
FAILED: src/cryptsetup/cryptsetup-tokens/libcryptsetup-token-systemd-tpm2_static.a.p/cryptsetup-token-systemd-tpm2.c.o
cc -Isrc/cryptsetup/cryptsetup-tokens/libcryptsetup-token-systemd-tpm2_static.a.p (...) -c ../build/src/cryptsetup/cryptsetup-tokens/cryptsetup-token-systemd-tpm2.c
../build/src/cryptsetup/cryptsetup-tokens/cryptsetup-token-systemd-tpm2.c:12:10: fatal error: version.h: No such file or directory
12 | #include "version.h"
| ^~~~~~~~~~~
compilation terminated.
```
Follow-up to d1ae38d85a.
Add support for systemd-tpm2 based LUKS2 device activation
via libcryptsetup plugin. This make the feature (tpm2 sealed
LUKS2 keyslot passphrase) usable from both systemd utilities
and cryptsetup cli.
The feature is configured via -Dlibcryptsetup-plugins combo
with default value set to 'auto'. It get's enabled automatically
when cryptsetup 2.4.0 or later is installed in build system.
This closes an important gap: so far we would reexecute the system manager and
restart system services that were configured to do so, but we wouldn't do the
same for user managers or user services.
The scheme used for user managers is very similar to the system one, except
that there can be multiple user managers running, so we query the system
manager to get a list of them, and then tell each one to do the equivalent
operations: daemon-reload, disable --now, set-property Markers=+needs-restart,
reload-or-restart --marked.
The total time that can be spend on this is bounded: we execute the commands in
parallel over user managers and units, and additionally set SYSTEMD_BUS_TIMEOUT
to a lower value (15 s by default). User managers should not have too many
units running, and they should be able to do all those operations very
quickly (<< 1s). The final restart operation may take longer, but it's done
asynchronously, so we only wait for the queuing to happen.
The advantage of doing this synchronously is that we can wait for each step to
happen, and for example daemon-reloads can finish before we execute the service
restarts, etc. We can also order various steps wrt. to the phases in the rpm
transaction.
When this was initially proposed, we discussed a more relaxed scheme with bus
property notifications. Such an approach would be more complex because a bunch
of infrastructure would have to be added to system manager to propagate
appropriate notifications to the user managers, and then the user managers
would have to wait for them. Instead, now there is no new code in the managers,
all new functionality is contained in src/rpm/. The ability to call 'systemctl
--user user@' makes this approach very easy. Also, it would be very hard to
order the user manager steps and the rpm transaction steps.
Note: 'systemctl --user disable' is only called for a user managers that are
running. I don't see a nice way around this, and it shouldn't matter too much:
we'll just leave a dangling symlink in the case where the user enabled the
service manually.
A follow-up for https://bugzilla.redhat.com/show_bug.cgi?id=1792468 and
fa97d2fcf6.
Instead of embedding the commands to invoke directly in the macros,
let's use a helper script as indirection. This has a couple of advantages:
- the macro language is awkward, we need to suffix most commands by "|| :"
and "\", which is easy to get wrong. In the new scheme, the macro becomes
a single simple command.
- in the script we can use normal syntax highlighting, shellcheck, etc.
- it's also easier to test the invoked commands by invoking the helper
manually.
- most importantly, the logic is contained in the helper, i.e. we can
update systemd rpm and everything uses the new helper. Before, we would
have to rebuild all packages to update the macro definition.
This raises the question whether it makes sense to use the lua scriptlets when
the real work is done in a bash script. I think it's OK: we still have the
efficient lua scripts that do the short scripts, and we use a single shared
implementation in bash to do the more complex stuff.
The meson version is raised to 0.47 because that's needed for install_mode.
We were planning to raise the required version anyway…
This moves the /var/log/README content out of /var and into the
docs location, replacing the previous file with a symlink
created through a tmpfiles.d entry.
We disabled it in f73fb7b742 in response to an
apparent gcc bug. It seems that depending on the combination of optimization
options, gcc still ignores (void). But this seems to work fine with clang, so
let's re-enable the warning conditionally.
So, as it turns out AF_ALG is turned off in a lot of kernels/container
environments, including our CI. Hence, if we link against OpenSSL
anyway, let's just use that client side. It's also faster.
One of those days we should drop the khash code, and ust use OpenSSL,
once the licensing issues are resolved.
The general idea with users and groups created through sysusers is that an
appropriate number is picked when the allocation is made. The number that is
selected will be different on each system based on the order of creation of
users, installed packages, etc. Since system users and groups are not shared
between installations, this generally is not an issue. But it becomes a problem
for initrd: some file systems are shared between the initrd and the host (/run
and /dev are probably the only ones that matter). If the allocations are
different in the host and the initrd, and files survive switch-root, they will
have wrong ownership.
This makes the gids build-time-configurable for all groups and users where
state may survive the switch from initrd to the host.
In particular, all "hardware access" groups are like this: files in /dev will
be owned by them. Eventually the new udev would change ownership, but there
would be a momemnt where the files were owned by the wrong group. The
allocations are "soft-static" in the language of Fedora packaging guidelines:
the uid/gid will be used if possible, but we'll fall back to a different
one. TTY_GID is the exception, because the number is used directly.
Similarly, the possibility to configure "soft-static" uids is added for daemons
which may usefully run in the initramfs: systemd-network (lease information and
interface state is serialized to /run), systemd-resolve (stub files and
interface state), systemd-timesync (/run/systemd/timesync).
Journal files are owned by the group systemd-journal, and acls are granted
for wheel and adm.
systemd-oom and systemd-coredump are excluded from this patch: I assume that
oomd is not useful in the initrd, and coredump leaves no state (it only creates
a pipe in /run?).
The defaults are not changed: if nothing is configured, dynamic allocation will
be used. I looked at a Debian system, and the numbers are all different than
on Fedora.
For Fedora, see the list of uids and gids at https://pagure.io/setup/blob/master/f/uidgid.
In particular, systemd-network and systemd-resolve got soft-static numbers to
make it easy to transition from a non-host-specific initrd to a host system
already a few years back (https://bugzilla.redhat.com/show_bug.cgi?id=1102002).
I also requested static allocations for sgx, input, render in
https://pagure.io/packaging-committee/issue/1078,
https://pagure.io/setup/pull-request/27.
Debugging udev issues especially during the early boot is fairly
difficult. Currently, you need to enable (at least) debug logging and
start monitoring uevents, try to reproduce the issue and then analyze
and correlate two (usually) huge log files. This is not ideal.
This patch aims to provide much more focused debugging tool,
tracepoints. More often then not we tend to have at least the basic idea
about the issue we are trying to debug further, e.g. we know it is
storage related. Hence all of the debug data generated for network
devices is useless, adds clutter to the log files and generally
slows things down.
Using this set of tracepoints you can start asking very specific
questions related to event processing for given device or subsystem.
Tracepoints can be used with various tracing tools but I will provide
examples using bpftrace.
Another important aspect to consider is that using tracepoints you can
debug production systems. There is no need to install test packages with
added logging, no debuginfo packages, etc...
Example usage (you might be asking such questions during the debug session),
Q: How can I list all tracepoints?
A: bpftrace -l 'usdt:/usr/lib/systemd/systemd-udevd:udev:*'
Q: What are the arguments for each tracepoint?
A: Look at the code and search for use of DEVICE_TRACE_POINT macro.
Q: How many times we have executed external binary?
A: bpftrace -e 'usdt:/usr/lib/systemd/systemd-udevd:udev:spawn_exec { @cnt = count(); }'
Q: What binaries where executed while handling events for "dm-0" device?
A bpftrace -e 'usdt:/usr/lib/systemd/systemd-udevd:udev:spawn_exec / str(arg1) == "dm-0"/ { @cmds[str(arg4)] = count(); }'
Thanks to Thomas Weißschuh <thomas@t-8ch.de> for reviewing this patch
and contributions that allowed us to drop the dependency on dtrace tool
and made the resulting code much more concise.
In commit d895e10a a test was introduced to validate that prefix is a
child of rootprefix. However, it only works when rootprefix is "/".
Since the test is ignored when rootprefix is equal to prefix, this is
only noticed if specifying both -Drootprefix= and -Dprefix=, e.g.:
$ meson foo -Drootprefix=/foo -Dprefix=/foo/bar
meson.build:111:8: ERROR: Problem encountered: Prefix is not below
root prefix (now rootprefix=/foo prefix=/foo/bar)
On Debian, bpftool is installed in /usr/sbin, which is not in $PATH for
non-root users by default, so finding it fails.
Add a secondary, hard-coded '/usr/sbin/bpftool' after 'bpftool' so that
meson can find it.
https://packages.debian.org/sid/amd64/bpftool/filelist
This ensures that the fuzz test code is also built by default.
It also increases the test coverage a bit. Compiling the tests
*with* sanitizers is painfully slow, so this is not enabled. But
just compiling them sauté is hardly noticable. Running the tests
increases the test count and runtime:
622 tests, 26 s
to
922 tests, 35 s
I think this is acceptable.