This patch introduces the systemd pstore service which will archive the
contents of the Linux persistent storage filesystem, pstore, to other storage,
thus preserving the existing information contained in the pstore, and clearing
pstore storage for future error events.
Linux provides a persistent storage file system, pstore[1], that can store
error records when the kernel dies (or reboots or powers-off). These records in
turn can be referenced to debug kernel problems (currently the kernel stuffs
the tail of the dmesg, which also contains a stack backtrace, into pstore).
The pstore file system supports a variety of backends that map onto persistent
storage, such as the ACPI ERST[2, Section 18.5 Error Serialization] and UEFI
variables[3 Appendix N Common Platform Error Record]. The pstore backends
typically offer a relatively small amount of persistent storage, e.g. 64KiB,
which can quickly fill up and thus prevent subsequent kernel crashes from
recording errors. Thus there is a need to monitor and extract the pstore
contents so that future kernel problems can also record information in the
pstore.
The pstore service is independent of the kdump service. In cloud environments
specifically, host and guest filesystems are on remote filesystems (eg. iSCSI
or NFS), thus kdump relies [implicitly and/or explicitly] upon proper operation
of networking software *and* hardware *and* infrastructure. Thus it may not be
possible to capture a kernel coredump to a file since writes over the network
may not be possible.
The pstore backend, on the other hand, is completely local and provides a path
to store error records which will survive a reboot and aid in post-mortem
debugging.
Usage Notes:
This tool moves files from /sys/fs/pstore into /var/lib/systemd/pstore.
To enable kernel recording of error records into pstore, one must either pass
crash_kexec_post_notifiers[4] to the kernel command line or enable via 'echo Y
> /sys/module/kernel/parameters/crash_kexec_post_notifiers'. This option
invokes the recording of errors into pstore *before* an attempt to kexec/kdump
on a kernel crash.
Optionally, to record reboots and shutdowns in the pstore, one can either pass
the printk.always_kmsg_dump[4] to the kernel command line or enable via 'echo Y >
/sys/module/printk/parameters/always_kmsg_dump'. This option enables code on the
shutdown path to record information via pstore.
This pstore service is a oneshot service. When run, the service invokes
systemd-pstore which is a tool that performs the following:
- reads the pstore.conf configuration file
- collects the lists of files in the pstore (eg. /sys/fs/pstore)
- for certain file types (eg. dmesg) a handler is invoked
- for all other files, the file is moved from pstore
- In the case of dmesg handler, final processing occurs as such:
- files processed in reverse lexigraphical order to faciliate
reconstruction of original dmesg
- the filename is examined to determine which dmesg it is a part
- the file is appended to the reconstructed dmesg
For example, the following pstore contents:
root@vm356:~# ls -al /sys/fs/pstore
total 0
drwxr-x--- 2 root root 0 May 9 09:50 .
drwxr-xr-x 7 root root 0 May 9 09:50 ..
-r--r--r-- 1 root root 1610 May 9 09:49 dmesg-efi-155741337601001
-r--r--r-- 1 root root 1778 May 9 09:49 dmesg-efi-155741337602001
-r--r--r-- 1 root root 1726 May 9 09:49 dmesg-efi-155741337603001
-r--r--r-- 1 root root 1746 May 9 09:49 dmesg-efi-155741337604001
-r--r--r-- 1 root root 1686 May 9 09:49 dmesg-efi-155741337605001
-r--r--r-- 1 root root 1690 May 9 09:49 dmesg-efi-155741337606001
-r--r--r-- 1 root root 1775 May 9 09:49 dmesg-efi-155741337607001
-r--r--r-- 1 root root 1811 May 9 09:49 dmesg-efi-155741337608001
-r--r--r-- 1 root root 1817 May 9 09:49 dmesg-efi-155741337609001
-r--r--r-- 1 root root 1795 May 9 09:49 dmesg-efi-155741337710001
-r--r--r-- 1 root root 1770 May 9 09:49 dmesg-efi-155741337711001
-r--r--r-- 1 root root 1796 May 9 09:49 dmesg-efi-155741337712001
-r--r--r-- 1 root root 1787 May 9 09:49 dmesg-efi-155741337713001
-r--r--r-- 1 root root 1808 May 9 09:49 dmesg-efi-155741337714001
-r--r--r-- 1 root root 1754 May 9 09:49 dmesg-efi-155741337715001
results in the following:
root@vm356:~# ls -al /var/lib/systemd/pstore/155741337/
total 92
drwxr-xr-x 2 root root 4096 May 9 09:50 .
drwxr-xr-x 4 root root 40 May 9 09:50 ..
-rw-r--r-- 1 root root 1610 May 9 09:50 dmesg-efi-155741337601001
-rw-r--r-- 1 root root 1778 May 9 09:50 dmesg-efi-155741337602001
-rw-r--r-- 1 root root 1726 May 9 09:50 dmesg-efi-155741337603001
-rw-r--r-- 1 root root 1746 May 9 09:50 dmesg-efi-155741337604001
-rw-r--r-- 1 root root 1686 May 9 09:50 dmesg-efi-155741337605001
-rw-r--r-- 1 root root 1690 May 9 09:50 dmesg-efi-155741337606001
-rw-r--r-- 1 root root 1775 May 9 09:50 dmesg-efi-155741337607001
-rw-r--r-- 1 root root 1811 May 9 09:50 dmesg-efi-155741337608001
-rw-r--r-- 1 root root 1817 May 9 09:50 dmesg-efi-155741337609001
-rw-r--r-- 1 root root 1795 May 9 09:50 dmesg-efi-155741337710001
-rw-r--r-- 1 root root 1770 May 9 09:50 dmesg-efi-155741337711001
-rw-r--r-- 1 root root 1796 May 9 09:50 dmesg-efi-155741337712001
-rw-r--r-- 1 root root 1787 May 9 09:50 dmesg-efi-155741337713001
-rw-r--r-- 1 root root 1808 May 9 09:50 dmesg-efi-155741337714001
-rw-r--r-- 1 root root 1754 May 9 09:50 dmesg-efi-155741337715001
-rw-r--r-- 1 root root 26754 May 9 09:50 dmesg.txt
where dmesg.txt is reconstructed from the group of related
dmesg-efi-155741337* files.
Configuration file:
The pstore.conf configuration file has four settings, described below.
- Storage : one of "none", "external", or "journal". With "none", this
tool leaves the contents of pstore untouched. With "external", the
contents of the pstore are moved into the /var/lib/systemd/pstore,
as well as logged into the journal. With "journal", the contents of
the pstore are recorded only in the systemd journal. The default is
"external".
- Unlink : is a boolean. When "true", the default, then files in the
pstore are removed once processed. When "false", processing of the
pstore occurs normally, but the pstore files remain.
References:
[1] "Persistent storage for a kernel's dying breath",
March 23, 2011.
https://lwn.net/Articles/434821/
[2] "Advanced Configuration and Power Interface Specification",
version 6.2, May 2017.
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
[3] "Unified Extensible Firmware Interface Specification",
version 2.8, March 2019.
https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
[4] "The kernel’s command-line parameters",
https://static.lwn.net/kerneldoc/admin-guide/kernel-parameters.html
Some distros install nologin as /usr/sbin/nologin, others as
/sbin/nologin.
Since we can't really on merged-usr everywhere (where the path wouldn't
matter), make the path build time configurable via -Dnologin-path=.
Closes#13028
This means the the code needs to be kept compatible in the shared header,
but I think that still nicer than having two places to declare the same
things.
I added src/boot to -I, so that efi/foo.h needs to be used. This reduces
the potential for accidentally including the wrong header.
When using build/ directory inside of the source directory:
__FILE__: ../src/test/test-log.c
RELATIVE_SOURCE_PATH: ..
PROJECT_FILE: src/test/test-log.c
When using a build directory outside of the source directory:
__FILE__: ../../../home/zbyszek/src/systemd-work/src/test/test-log.c
RELATIVE_SOURCE_PATH: ../../../home/zbyszek/src/systemd-work
PROJECT_FILE: src/test/test-log.c
It was only used for exactly one thing: to substitute in the text in
/var/log/README. But it's use there was completely wrong, because the text
talks about "missing" log files from syslog, so even if we configured systemd
to log to a different directory, the "missing" log files would still be
"missing" from the old location.
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
Man page generation is generally very slow. I prefer to use -Dman=false when
developing systemd, and only build specific pages when introducing changes.
Those two little helper tools make it easy:
$ build/man/man systemd.link
$ build/man/html systemd.link
will show systemd.link.8 and systemd.link.html from the build directory build/.
Unfortunately the warning must be known, or otherwise the pragma generates a
warning or an error. So let's do a meson check for it.
Is it worth doing this to silence the warning? I think so, because apparently
the warning was already emitted by gcc-8.1, and with the recent push in gcc to
catch more such cases, we'll most likely only get more of those.
Same motivation as in other places: let's use a single logic to parse this.
Use path_equal() to compare the path.
A bug in error handling is fixed: if we failed after the GREEDY_REALLOC but
before the line that sets the last item to NULL, we would jump to
_cleanup_strv_free_ with the strv unterminated. Let's use GREEDY_REALLOC0
to avoid the issue.
Ld's man page says the following:
-u symbol
--undefined=symbol
Force symbol to be entered in the output file as an undefined symbol. Doing
this may, for example, trigger linking of additional modules from standard
libraries. -u may be repeated with different option arguments to enter
additional undefined symbols. This option is equivalent to the "EXTERN"
linker script command.
If this option is being used to force additional modules to be pulled into
the link, and if it is an error for the symbol to remain undefined, then the
option --require-defined should be used instead.
This would imply that it always requires an argument, which this does not
pass. Thus it will grab the next argument on the command line as its
argument. Before it took one of the many -lrt args (presumably) and now it
grabs something other random linker argument and things break.
[zj: this line was added in the first version of the meson configuration back
in 5c23128dab. AFAICT, this was a mistake. No
such flag appeared in Makefile.am at the time.]
https://github.com/mesonbuild/meson/issues/5113
With assertions disabled, we'd get a bunch of warnings that really bring no
value. With this change, a default meson build with -Db_ndebug=true generates
no warnings.
Due to this specific change: d0b6a10#diff-0203416587516c224c8fcfe8129e7caeR8,
systemd-nspawn uses libseccomp now if it is available. We we need to pass -I/usr/include
/libseccomp (or wherever seccomp.h is located) when compiling systemd-nspawn because
nspawn-settings.h does #include <seccomp.h>.
Fixes: #12060
Setting an access mode != 0666 is explicitly supported via -Dgroup-render-mode
In such a case, re-add the uaccess tag.
This is basically the same change that was done for /dev/kvm in
commit fa53e24130 and
ace5e3111c
and partially reverts the changes from
4e15a7343c
This avoids double compilation. Those files are tiny, so it doesn't save time,
but we avoid repeated warnings and errors, and it's generally cleaner to it
this way.
The number of commands in 'ninja -C build clean && ninja -C build' drops from
1462 to 1455 for me.
By defining rootprefix= we avoid a double slash in $systemdsystemunitdir and
other variables. This fixes a regression introduced in
1c2c7c6cb3 where the variables using rootprefix=/
would start with a double slash. This should be interpreted the same, but is
certainly ugly.
The rootprefix variable was added to systemd.pc in
1c2c7c6cb3, so there is no question of backwards
compatiblity. If people try to "override" the prefix and specify
--define-variable=rootprefix=/, they will get a double slash, which should be
OK, and is the same as --define-variable=rootprefix=/something/, which also
results in a double slash somewhere in the strings.
Let's move the shutdown binary into its own subdirectory in
src/shutdown, after all it is relatively isolated from the normal PID 1
sources, being a different binary and all.
Unfortunately it's not possible to move some of the code, since it is
shared with PID 1, that I wished we could move, but I still think it's
worth it.
Clang includes -W#warning in -Werror, so the #warning used for msan would
be an error.
v2:
- use -Wno-error=... so that the warning is still emitted, but not as an error.