linux/Documentation/userspace-api
Sargun Dhillon c2aa2dfef2 seccomp: Add wait_killable semantic to seccomp user notifier
This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
that makes it so that when notifications are received by the supervisor the
notifying process will transition to wait killable semantics. Although wait
killable isn't a set of semantics formally exposed to userspace, the
concept is searchable. If the notifying process is signaled prior to the
notification being received by the userspace agent, it will be handled as
normal.

One quirk about how this is handled is that the notifying process
only switches to TASK_KILLABLE if it receives a wakeup from either
an addfd or a signal. This is to avoid an unnecessary wakeup of
the notifying task.

The reasons behind switching into wait_killable only after userspace
receives the notification are:
* Avoiding unncessary work - Often, workloads will perform work that they
  may abort (request racing comes to mind). This allows for syscalls to be
  aborted safely prior to the notification being received by the
  supervisor. In this, the supervisor doesn't end up doing work that the
  workload does not want to complete anyways.
* Avoiding side effects - We don't want the syscall to be interruptible
  once the supervisor starts doing work because it may not be trivial
  to reverse the operation. For example, unmounting a file system may
  take a long time, and it's hard to rollback, or treat that as
  reentrant.
* Avoid breaking runtimes - Various runtimes do not GC when they are
  during a syscall (or while running native code that subsequently
  calls a syscall). If many notifications are blocked, and not picked
  up by the supervisor, this can get the application into a bad state.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503080958.20220-2-sargun@sargun.me
2022-05-03 14:11:58 -07:00
..
accelerators Documentation: ocxl.rst: change FPGA indirect article to an 2021-06-09 14:51:25 +02:00
ebpf docs/bpf: Add bpf() syscall command reference 2021-03-04 18:39:46 -08:00
ioctl platform-drivers-x86 for v5.18-1 2022-03-25 12:14:39 -07:00
media media: pixfmt-yuv-planar.rst: fix PIX_FMT labels 2022-03-18 07:27:37 +01:00
futex2.rst futex2: Documentation: Document sys_futex_waitv() uAPI 2021-10-07 13:51:13 +02:00
index.rst futex2: Documentation: Document sys_futex_waitv() uAPI 2021-10-07 13:51:13 +02:00
iommu.rst docs: IOMMU user API 2020-10-01 14:52:46 +02:00
landlock.rst docs: userspace-api: landlock.rst: avoid using ReST :doc:foo markup 2021-06-17 13:24:39 -06:00
no_new_privs.rst doc: ReSTify no_new_privs.txt 2017-05-18 10:30:09 -06:00
seccomp_filter.rst seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
spec_ctrl.rst Documentation: Add L1D flushing Documentation 2021-07-28 11:42:25 +02:00
sysfs-platform_profile.rst Documentation: Add documentation for new platform_profile sysfs attribute 2020-12-30 18:28:57 +01:00
unshare.rst doc-rst: fix inline emphasis in unshare.rst 2017-05-18 10:23:10 -06:00
vduse.rst VDUSE: fix documentation underline warning 2021-10-13 08:42:07 -04:00