Commit Graph

1710 Commits

Author SHA1 Message Date
Stephen Smalley
461698026f selinux: encapsulate policy state, refactor policy load
Encapsulate the policy state in its own structure (struct
selinux_policy) that is separately allocated but referenced from the
selinux_ss structure.  The policy state includes the SID table
(particularly the context structures), the policy database, and the
mapping between the kernel classes/permissions and the policy values.
Refactor the security server portion of the policy load logic to
cleanly separate loading of the new structures from committing the new
policy.  Unify the initial policy load and reload code paths as much
as possible, avoiding duplicated code.  Make sure we are taking the
policy read-lock prior to any dereferencing of the policy.  Move the
copying of the policy capability booleans into the state structure
outside of the policy write-lock because they are separate from the
policy and are read outside of any policy lock; possibly they should
be using at least READ_ONCE/WRITE_ONCE or smp_load_acquire/store_release.

These changes simplify the policy loading logic, reduce the size of
the critical section while holding the policy write-lock, and should
facilitate future changes to e.g. refactor the entire policy reload
logic including the selinuxfs code to make the updating of the policy
and the selinuxfs directory tree atomic and/or to convert the policy
read-write lock to RCU.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-17 20:48:57 -04:00
Stephen Smalley
339949be25 scripts/selinux,selinux: update mdp to enable policy capabilities
Presently mdp does not enable any SELinux policy capabilities
in the dummy policy it generates. Thus, policies derived from
it will by default lack various features commonly used in modern
policies such as open permission, extended socket classes, network
peer controls, etc.  Split the policy capability definitions out into
their own headers so that we can include them into mdp without pulling in
other kernel headers and extend mdp generate policycap statements for the
policy capabilities known to the kernel.  Policy authors may wish to
selectively remove some of these from the generated policy.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-17 20:42:00 -04:00
Linus Torvalds
74858abbb1 cap-checkpoint-restore-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygegQAKCRCRxhvAZXjc
 olWZAQCMPbhI/20LA3OYJ6s+BgBEnm89PymvlHcym6Z4AvTungD+KqZonIYuxWgi
 6Ttlv/fzgFFbXgJgbuass5mwFVoN5wM=
 =oK7d
 -----END PGP SIGNATURE-----

Merge tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull checkpoint-restore updates from Christian Brauner:
 "This enables unprivileged checkpoint/restore of processes.

  Given that this work has been going on for quite some time the first
  sentence in this summary is hopefully more exciting than the actual
  final code changes required. Unprivileged checkpoint/restore has seen
  a frequent increase in interest over the last two years and has thus
  been one of the main topics for the combined containers &
  checkpoint/restore microconference since at least 2018 (cf. [1]).

  Here are just the three most frequent use-cases that were brought forward:

   - The JVM developers are integrating checkpoint/restore into a Java
     VM to significantly decrease the startup time.

   - In high-performance computing environment a resource manager will
     typically be distributing jobs where users are always running as
     non-root. Long-running and "large" processes with significant
     startup times are supposed to be checkpointed and restored with
     CRIU.

   - Container migration as a non-root user.

  In all of these scenarios it is either desirable or required to run
  without CAP_SYS_ADMIN. The userspace implementation of
  checkpoint/restore CRIU already has the pull request for supporting
  unprivileged checkpoint/restore up (cf. [2]).

  To enable unprivileged checkpoint/restore a new dedicated capability
  CAP_CHECKPOINT_RESTORE is introduced. This solution has last been
  discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1]
  "Update on Task Migration at Google Using CRIU") with Adrian and
  Nicolas providing the implementation now over the last months. In
  essence, this allows the CRIU binary to be installed with the
  CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling
  unprivileged users to restore processes.

  To make this possible the following permissions are altered:

   - Selecting a specific PID via clone3() set_tid relaxed from userns
     CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

   - Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed
     from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.

   - Accessing /proc/pid/map_files relaxed from init userns
     CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE.

   - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns
     CAP_CHECKPOINT_RESTORE.

  Of these four changes the /proc/self/exe change deserves a few words
  because the reasoning behind even restricting /proc/self/exe changes
  in the first place is just full of historical quirks and tracking this
  down was a questionable version of fun that I'd like to spare others.

  In short, it is trivial to change /proc/self/exe as an unprivileged
  user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace()
  or by simply intercepting the elf loader in userspace during exec.
  Nicolas was nice enough to even provide a POC for the latter (cf. [3])
  to illustrate this fact.

  The original patchset which introduced PR_SET_MM_MAP had no
  permissions around changing the exe link. They too argued that it is
  trivial to spoof the exe link already which is true. The argument
  brought up against this was that the Tomoyo LSM uses the exe link in
  tomoyo_manager() to detect whether the calling process is a policy
  manager. This caused changing the exe links to be guarded by userns
  CAP_SYS_ADMIN.

  All in all this rather seems like a "better guard it with something
  rather than nothing" argument which imho doesn't qualify as a great
  security policy. Again, because spoofing the exe link is possible for
  the calling process so even if this were security relevant it was
  broken back then and would be broken today. So technically, dropping
  all permissions around changing the exe link would probably be
  possible and would send a clearer message to any userspace that relies
  on /proc/self/exe for security reasons that they should stop doing
  this but for now we're only relaxing the exe link permissions from
  userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE.

  There's a final uapi change in here. Changing the exe link used to
  accidently return EINVAL when the caller lacked the necessary
  permissions instead of the more correct EPERM. This pr contains a
  commit fixing this. I assume that userspace won't notice or care and
  if they do I will revert this commit. But since we are changing the
  permissions anyway it seems like a good opportunity to try this fix.

  With these changes merged unprivileged checkpoint/restore will be
  possible and has already been tested by various users"

[1] LPC 2018
     1. "Task Migration at Google Using CRIU"
        https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095
     2. "Securely Migrating Untrusted Workloads with CRIU"
        https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400
     LPC 2019
     1. "CRIU and the PID dance"
         https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s
     2. "Update on Task Migration at Google Using CRIU"
        https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s

[2] https://github.com/checkpoint-restore/criu/pull/1155

[3] https://github.com/nviennot/run_as_exe

* tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  selftests: add clone3() CAP_CHECKPOINT_RESTORE test
  prctl: exe link permission error changed from -EINVAL to -EPERM
  prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
  proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE
  pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
  pid: use checkpoint_restore_ns_capable() for set_tid
  capabilities: Introduce CAP_CHECKPOINT_RESTORE
2020-08-04 15:02:07 -07:00
Linus Torvalds
49e917deeb selinux/stable-5.9 PR 20200803
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl8okmsUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMaRA//XO7JKJEyLcpqRzhQP/QY50JXdQtE
 c9vKeb7y4wlfbTozRgjBN3Xj+tFbqANzX/rVsR1aKV+hExyEuUfNZ0Fl8MbPEccQ
 1RUCW2808/YRTYsl0g0DDZsc+vxVosfouk91pZfld9ZRnZbrNTGXFP7vuVyFKdBy
 wBX1FCL9q31wLc8Jk7f6otSbBvSCG0YXjkkxEM7LQx3oQ59s8dfOed41kDGpLoNk
 TS5BN/W3uuYEDIsIwTRZjU4h42dpc/wbxVMJhBg85rU/2bF4u5sDs2qgwqaa1tXs
 aRDH5J+eBMZRCkF4shxlDrrOWeXvEEtal9yYzQUx664tWDjZazoTLctCAe3PWI1i
 q61cG8PXw/5/oB6RyvPkRMLc5pU8P6/Xdfg6R6kOsGSq8bj+g30J6jqGXnW9FIVr
 5rIaooiw19vqH+ASVuq9oLmhuWJQyn6ImFqOkREJFWVaqufglWw9RWDCGFsLq9Tr
 w6HbA9UYCoWpdQBfRXpa086sSQm1wuCP39fIcY64uHpR5gPJuzyd8Tswz3tbEAtg
 v7vgIRtBpghhdcBLzIJgSIXJKR7W/Y49eFwNf3x0OTSeAIia6Z9paaQjXYl71I9V
 6oUiQgVE3lX2SkgMbOK2V5UsjbVkpjjv7MWxdm0mPQCU0Fmb8W2FN/wVR7FBlCZc
 yhde+bs4zTmPNFw=
 =ocPe
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "Beyond the usual smattering of bug fixes, we've got three small
  improvements worth highlighting:

   - improved SELinux policy symbol table performance due to a reworking
     of the insert and search functions

   - allow reading of SELinux labels before the policy is loaded,
     allowing for some more "exotic" initramfs approaches

   - improved checking an error reporting about process
     class/permissions during SELinux policy load"

* tag 'selinux-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: complete the inlining of hashtab functions
  selinux: prepare for inlining of hashtab functions
  selinux: specialize symtab insert and search functions
  selinux: Fix spelling mistakes in the comments
  selinux: fixed a checkpatch warning with the sizeof macro
  selinux: log error messages on required process class / permissions
  scripts/selinux/mdp: fix initial SID handling
  selinux: allow reading labels before policy is loaded
2020-08-04 14:18:01 -07:00
Adrian Reber
124ea650d3 capabilities: Introduce CAP_CHECKPOINT_RESTORE
This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.

Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has
been asked numerous times if it is possible to checkpoint/restore a
process as non-root. The answer usually was: 'almost'.

The main blocker to restore a process as non-root was to control the PID
of the restored process. This feature available via the clone3 system
call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by
CAP_SYS_ADMIN.

In the past two years, requests for non-root checkpoint/restore have
increased due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a
  resource manager distributing jobs where users are always running as
  non-root. There is a desire to provide a way to checkpoint and
  restore long running jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
  CRIU into a Java VM to decrease the startup time. These
  checkpoint/restore applications are not meant to be running with
  CAP_SYS_ADMIN.

We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
  See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
  Unfortunately, this helper delegation technique is impossible to use
  with clone3, and is thus prone to races.
  See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
  This has been demonstrated to work with cycling rates of 100,000 PIDs/s
  See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
  a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
  use in typical container environments (e.g., Kubernetes) as /proc is
  typically protected with read-only layers (e.g., /proc/sys) for
  hardening purposes. Read-only layers prevent additional /proc mounts
  (due to proc's SB_I_USERNS_VISIBLE property), making the use of new
  PID namespaces limited as certain applications need access to /proc
  matching their PID namespace.

The introduced capability allows to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
  for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
  CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
  recovering files that are unreachable via the file system such as
  deleted files, or memfd files.

See corresponding selftest for an example with clone3().

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-19 20:14:42 +02:00
Ondrej Mosnacek
54b27f9287 selinux: complete the inlining of hashtab functions
Move (most of) the definitions of hashtab_search() and hashtab_insert()
to the header file. In combination with the previous patch, this avoids
calling the callbacks indirectly by function pointers and allows for
better optimization, leading to a drastic performance improvement of
these operations.

With this patch, I measured a speed up in the following areas (measured
on x86_64 F32 VM with 4 CPUs):
  1. Policy load (`load_policy`) - takes ~150 ms instead of ~230 ms.
  2. `chcon -R unconfined_u:object_r:user_tmp_t:s0:c381,c519 /tmp/linux-src`
     where /tmp/linux-src is an extracted linux-5.7 source tarball -
     takes ~522 ms instead of ~576 ms. This is because of many
     symtab_search() calls in string_to_context_struct() when there are
     many categories specified in the context.
  3. `stress-ng --msg 1 --msg-ops 10000000` - takes 12.41 s instead of
     13.95 s (consumes 18.6 s of kernel CPU time instead of 21.6 s).
     This is thanks to security_transition_sid() being ~43% faster after
     this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-09 19:08:16 -04:00
Ondrej Mosnacek
24def7bb92 selinux: prepare for inlining of hashtab functions
Refactor searching and inserting into hashtabs to pave the way for
converting hashtab_search() and hashtab_insert() to inline functions in
the next patch. This will avoid indirect calls and allow the compiler to
better optimize individual callers, leading to a significant performance
improvement.

In order to avoid the indirect calls, the key hashing and comparison
callbacks need to be extracted from the hashtab struct and passed
directly to hashtab_search()/_insert() by the callers so that the
callback address is always known at compile time. The kernel's
rhashtable library (<linux/rhashtable*.h>) does the same thing.

This of course makes the hashtab functions slightly easier to misuse by
passing a wrong callback set, but unfortunately there is no better way
to implement a hash table that is both generic and efficient in C. This
patch tries to somewhat mitigate this by only calling the hashtab
functions in the same file where the corresponding callbacks are
defined (wrapping them into more specialized functions as needed).

Note that this patch doesn't bring any benefit without also moving the
definitions of hashtab_search() and -_insert() to the header file, which
is done in a follow-up patch for easier review of the hashtab.c changes
in this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-09 19:05:36 -04:00
Ondrej Mosnacek
237389e301 selinux: specialize symtab insert and search functions
This encapsulates symtab a little better and will help with further
refactoring later.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-08 20:21:43 -04:00
lihao
2c3d8dfece selinux: Fix spelling mistakes in the comments
Fix spelling mistakes in the comments
    quering==>querying

Signed-off-by: lihao <fly.lihao@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-08 12:15:52 -04:00
Ethan Edwards
65d96351b1 selinux: fixed a checkpatch warning with the sizeof macro
`sizeof buf` changed to `sizeof(buf)`

Signed-off-by: Ethan Edwards <ethancarteredwards@gmail.com>
[PM: rewrote the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-29 19:26:17 -04:00
Stephen Smalley
7383c0f94d selinux: log error messages on required process class / permissions
In general SELinux no longer treats undefined object classes or permissions
in the policy as a fatal error, instead handling them in accordance with
handle_unknown. However, the process class and process transition and
dyntransition permissions are still required to be defined due to
dependencies on these definitions for default labeling behaviors,
role and range transitions in older policy versions that lack an explicit
class field, and role allow checking.  Log error messages in these cases
since otherwise the policy load will fail silently with no indication
to the user as to the underlying cause.  While here, fix the checking for
process transition / dyntransition so that omitting either permission is
handled as an error; both are needed in order to ensure that role allow
checking is consistently applied.

Reported-by: bauen1 <j2468h@googlemail.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-23 20:57:01 -04:00
Jonathan Lebon
c8e222616c selinux: allow reading labels before policy is loaded
This patch does for `getxattr` what commit 3e3e24b420 ("selinux: allow
labeling before policy is loaded") did for `setxattr`; it allows
querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this
patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be
able to move the root filesystem for example, from xfs to ext4 on RAID,
on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be
able to read the SELinux labels first from the original root, and apply
them to the files of the new root. The previous commit enabled the
second part of this process; this commit enables the first part.

[1] https://github.com/coreos/fedora-coreos-tracker/issues/94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-23 20:42:38 -04:00
Linus Torvalds
817d914d17 selinux/stable-5.8 PR 20200621
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7vxoUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOs3Q/+JSNYoKZiOJax5u1ePEwk5JRasRij
 JkKNjspXueVcoVkVF8lOiSOmQm/7FyfDUi2Qk8U5Gmx7Pr6vQJZgghHxdPMsemCz
 mNbbR8UMm6ssTim19ULBQ3S0Sc1QMQvQCLNDZcv24ne2K8d9HTBrFGenqlLU4UtZ
 JrMLircBt39fVonooMrf9ycGlM8tUZwm8te+Jp7KL18GUKZT8hr0HKzu2WE6/qT4
 WBGNaWxqnfbajnDb41ix2rL+lb8Snqn94cxCjp248rn7M5fJRSCKmYaumBh5ViJ2
 VuD/ZQsTX5SSnc9YDpkUDXya9M1wzFwf64ku6Avga1BXS6lNWB1wqWueSTMfggiL
 2B+LVANWkGFfHtVAVA5xsxXjeJnYmIj/g8qSiHS/RSFJazr1b/cXWedvyewll/Nv
 rFRBsVzktV6BBrlTclcrsu9FmlZRAThNC3uYs/s5vbAja+wHEhCLuacO+jiducRP
 fqQCP2iF6MqC6B2I8WzVp3jU8k2t02i6ySaXmXjzrwOZSLvnOdvDBzE7e95yNLRg
 WLeGd/o2PdLpVoSNVHelFrqm8VZKYSCkWty9WppklnrIVVydKMJ3bgihXY4pADyf
 1ABtKUZgySZKZOpr1pQBqIivHuvKqUGFynj6PSRsngQBoq6V3XpJ7ZCBhuG7cNAT
 9BfnUkhFW7lW70I=
 =nILH
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
 "Three small patches to fix problems in the SELinux code, all found via
  clang.

  Two patches fix potential double-free conditions and one fixes an
  undefined return value"

* tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix undefined return of cond_evaluate_expr
  selinux: fix a double free in cond_read_node()/cond_read_list()
  selinux: fix double free
2020-06-21 15:41:24 -07:00
Tom Rix
8231b0b9c3 selinux: fix undefined return of cond_evaluate_expr
clang static analysis reports an undefined return

security/selinux/ss/conditional.c:79:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
        return s[0];
        ^~~~~~~~~~~

static int cond_evaluate_expr( ...
{
	u32 i;
	int s[COND_EXPR_MAXDEPTH];

	for (i = 0; i < expr->len; i++)
	  ...

	return s[0];

When expr->len is 0, the loop which sets s[0] never runs.

So return -1 if the loop never runs.

Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-17 17:36:40 -04:00
Tom Rix
aa449a7965 selinux: fix a double free in cond_read_node()/cond_read_list()
Clang static analysis reports this double free error

security/selinux/ss/conditional.c:139:2: warning: Attempt to free released memory [unix.Malloc]
        kfree(node->expr.nodes);
        ^~~~~~~~~~~~~~~~~~~~~~~

When cond_read_node fails, it calls cond_node_destroy which frees the
node but does not poison the entry in the node list.  So when it
returns to its caller cond_read_list, cond_read_list deletes the
partial list.  The latest entry in the list will be deleted twice.

So instead of freeing the node in cond_read_node, let list freeing in
code_read_list handle the freeing the problem node along with all of the
earlier nodes.

Because cond_read_node no longer does any error handling, the goto's
the error case are redundant.  Instead just return the error code.

Cc: stable@vger.kernel.org
Fixes: 60abd3181d ("selinux: convert cond_list to array")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-16 20:25:19 -04:00
Linus Torvalds
6c32978414 Notifications over pipes + Keyring notifications
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
 C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
 f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
 JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
 Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
 KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
 E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
 72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
 C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
 GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
 eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
 hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
 =YTGf
 -----END PGP SIGNATURE-----

Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull notification queue from David Howells:
 "This adds a general notification queue concept and adds an event
  source for keys/keyrings, such as linking and unlinking keys and
  changing their attributes.

  Thanks to Debarshi Ray, we do have a pull request to use this to fix a
  problem with gnome-online-accounts - as mentioned last time:

     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

  Without this, g-o-a has to constantly poll a keyring-based kerberos
  cache to find out if kinit has changed anything.

  [ There are other notification pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

  LSM hooks are included:

   - A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The
     LSM should use current's credentials. [Wanted by SELinux & Smack]

   - A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is
     given the credentials from the event generator (which may be the
     system) and the watch setter. [Wanted by Smack]

  I've provided SELinux and Smack with implementations of some of these
  hooks.

  WHY
  ===

  Key/keyring notifications are desirable because if you have your
  kerberos tickets in a file/directory, your Gnome desktop will monitor
  that using something like fanotify and tell you if your credentials
  cache changes.

  However, we also have the ability to cache your kerberos tickets in
  the session, user or persistent keyring so that it isn't left around
  on disk across a reboot or logout. Keyrings, however, cannot currently
  be monitored asynchronously, so the desktop has to poll for it - not
  so good on a laptop. This facility will allow the desktop to avoid the
  need to poll.

  DESIGN DECISIONS
  ================

   - The notification queue is built on top of a standard pipe. Messages
     are effectively spliced in. The pipe is opened with a special flag:

        pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem
     like it will ever be applicable in this context)[?]. It is given up
     front to make it a lot easier to prohibit splice&co from accessing
     the pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
         O_* flag if I can avoid it - should I add a pipe3() system call
         instead?

     The pipe is then configured::

        ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
        ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

   - It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the
     kernel has to be able to insert messages into the pipe *without*
     holding pipe->mutex and the code to make this work needs careful
     auditing.

   - sendfile(), splice() and vmsplice() are disabled on notification
     pipes because of the pipe->mutex issue and also because they
     sometimes want to revert what they just did - but one or more
     notification messages might've been interleaved in the ring.

   - The kernel inserts messages with the wait queue spinlock held. This
     means that pipe_read() and pipe_write() have to take the spinlock
     to update the queue pointers.

   - Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This allows multiple heterogeneous sources to share a common
     buffer; there are 16 million types available, of which I've used
     just a few, so there is scope for others to be used. Tags may be
     specified when a watchpoint is created to help distinguish the
     sources.

   - Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

   - Notification pipes don't interfere with each other; each may be
     bound to a different set of watches. Any particular notification
     will be copied to all the queues that are currently watching for it
     - and only those that are watching for it.

   - When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's
     insufficient space. read() will fabricate a loss notification
     message at an appropriate point later.

   - The notification pipe is created and then watchpoints are attached
     to it, using one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
        watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
        watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is
     a tag between 0 and 255.

   - Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed. In the latter case, a message will
     be generated indicating the enforced watch removal.

  Things I want to avoid:

   - Introducing features that make the core VFS dependent on the
     network stack or networking namespaces (ie. usage of netlink).

   - Dumping all this stuff into dmesg and having a daemon that sits
     there parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

   - Letting users see events they shouldn't be able to see.

  TESTING AND MANPAGES
  ====================

   - The keyutils tree has a pipe-watch branch that has keyctl commands
     for making use of notifications. Proposed manual pages can also be
     found on this branch, though a couple of them really need to go to
     the main manpages repository instead.

     If the kernel supports the watching of keys, then running "make
     test" on that branch will cause the testing infrastructure to spawn
     a monitoring process on the side that monitors a notifications pipe
     for all the key/keyring changes induced by the tests and they'll
     all be checked off to make sure they happened.

        https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

   - A test program is provided (samples/watch_queue/watch_test) that
     can be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  smack: Implement the watch_key and post_notification hooks
  selinux: Implement the watch_key security hook
  keys: Make the KEY_NEED_* perms an enum rather than a mask
  pipe: Add notification lossage handling
  pipe: Allow buffers to be marked read-whole-or-error for notifications
  Add sample notification program
  watch_queue: Add a key/keyring notification facility
  security: Add hooks to rule on setting a watch
  pipe: Add general notification queue support
  pipe: Add O_NOTIFICATION_PIPE
  security: Add a hook for the point of notification insertion
  uapi: General notification queue definitions
2020-06-13 09:56:21 -07:00
Tom Rix
65de50969a selinux: fix double free
Clang's static analysis tool reports these double free memory errors.

security/selinux/ss/services.c:2987:4: warning: Attempt to free released memory [unix.Malloc]
                        kfree(bnames[i]);
                        ^~~~~~~~~~~~~~~~
security/selinux/ss/services.c:2990:2: warning: Attempt to free released memory [unix.Malloc]
        kfree(bvalues);
        ^~~~~~~~~~~~~~

So improve the security_get_bools error handling by freeing these variables
and setting their return pointers to NULL and the return len to 0

Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-10 22:10:35 -04:00
Linus Torvalds
15a2bc4dbb Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
 "Last cycle for the Nth time I ran into bugs and quality of
  implementation issues related to exec that could not be easily be
  fixed because of the way exec is implemented. So I have been digging
  into exec and cleanup up what I can.

  I don't think I have exec sorted out enough to fix the issues I
  started with but I have made some headway this cycle with 4 sets of
  changes.

   - promised cleanups after introducing exec_update_mutex

   - trivial cleanups for exec

   - control flow simplifications

   - remove the recomputation of bprm->cred

  The net result is code that is a bit easier to understand and work
  with and a decrease in the number of lines of code (if you don't count
  the added tests)"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
  exec: Compute file based creds only once
  exec: Add a per bprm->file version of per_clear
  binfmt_elf_fdpic: fix execfd build regression
  selftests/exec: Add binfmt_script regression test
  exec: Remove recursion from search_binary_handler
  exec: Generic execfd support
  exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  exec: Move the call of prepare_binprm into search_binary_handler
  exec: Allow load_misc_binary to call prepare_binprm unconditionally
  exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  exec: Teach prepare_exec_creds how exec treats uids & gids
  exec: Set the point of no return sooner
  exec: Move handling of the point of no return to the top level
  exec: Run sync_mm_rss before taking exec_update_mutex
  exec: Fix spelling of search_binary_handler in a comment
  exec: Move the comment from above de_thread to above unshare_sighand
  exec: Rename flush_old_exec begin_new_exec
  exec: Move most of setup_new_exec into flush_old_exec
  exec: In setup_new_exec cache current in the local variable me
  ...
2020-06-04 14:07:08 -07:00
Linus Torvalds
cb8e59cc87 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

 2) Add GSO partial support to igc, from Sasha Neftin.

 3) Several cleanups and improvements to r8169 from Heiner Kallweit.

 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

 5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.

 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

 8) Add sriov and vf support to hinic, from Luo bin.

 9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

21) Add BPF iterators, from Yonghang Song.

22) Add cable test infrastructure, including ethool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
  selftests: net: ip_defrag: ignore EPERM
  net_failover: fixed rollback in net_failover_open()
  Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
  Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
  vmxnet3: allow rx flow hash ops only when rss is enabled
  hinic: add set_channels ethtool_ops support
  selftests/bpf: Add a default $(CXX) value
  tools/bpf: Don't use $(COMPILE.c)
  bpf, selftests: Use bpf_probe_read_kernel
  s390/bpf: Use bcr 0,%0 as tail call nop filler
  s390/bpf: Maintain 8-byte stack alignment
  selftests/bpf: Fix verifier test
  selftests/bpf: Fix sample_cnt shared between two threads
  bpf, selftests: Adapt cls_redirect to call csum_level helper
  bpf: Add csum_level helper for fixing up csum levels
  bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
  sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
  crypto/chtls: IPv6 support for inline TLS
  Crypto/chcr: Fixes a coccinile check error
  Crypto/chcr: Fixes compilations warnings
  ...
2020-06-03 16:27:18 -07:00
Linus Torvalds
f41030a20b selinux/stable-5.8 PR 20200601
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnLoUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPkjA//fHFbgHBbiZrS7v/vi61wdpEtGzmn
 /hr4Z5DpFmJdCTGeGItST8Xq4KEqlLGrMclk+PsG0H7BMJEEp+0XJ+begqNvC8PF
 +JzP+oBqoO0SoF5z0jOnBBtzK8R3vmVgcPO3dNdEgNBQG3T7/GQLUTX8DylBDOI1
 yFeuewRD7sK/rIg/S6t+B0ut7Uer5CjEIed4iQZ3eKIUqE6/C1zpmQj98MH9L5uh
 yN0tdF8aOZvgD6v1bfmvgAnnODFvvKcogDn+hvbqRhrDdhgt1DAErIjYeqRemQRc
 g7Xve4i7VivXC4o8nhUy00FWqzCB5tcydR0cwgg4iR/JgKvn18s0vRQV9SU7Nt+o
 pXOex6qHlFCJpjTop+DCBEkGK9V7UBMM6t6gwR/bpkDMYkgIjJrCIQTyw8/HrKKt
 fntryXf9juM0Owh/YOp5jKXPddhkfuztViJ+FnxsI2sho643Gg6/Wfy2slvJ0udH
 i0bnnacW/6pysf/eLrPsF89IacAGydkhdZwaSno3GLyCtXxrqJU4cs2wSpUq0Wiz
 g4kB4hpPXgrQszLriEsF0gRVcRu2nOF4ISXlUqfSw7i/nFT7+axYUjgBg9PpV1Mj
 GyLBSOQp1xs4S/oglfJ5nE4UtS4m187t4JVWOxfqqyWE/O2cqUPtaS+52m0aIWTH
 6HFWbmL5+Dxsm+4=
 =A+bX
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
 "The highlights:

   - A number of improvements to various SELinux internal data
     structures to help improve performance. We move the role
     transitions into a hash table. In the content structure we shift
     from hashing the content string (aka SELinux label) to the
     structure itself, when it is valid. This last change not only
     offers a speedup, but it helps us simplify the code some as well.

   - Add a new SELinux policy version which allows for a more space
     efficient way of storing the filename transitions in the binary
     policy. Given the default Fedora SELinux policy with the unconfined
     module enabled, this change drops the policy size from ~7.6MB to
     ~3.3MB. The kernel policy load time dropped as well.

   - Some fixes to the error handling code in the policy parser to
     properly return error codes when things go wrong"

* tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: netlabel: Remove unused inline function
  selinux: do not allocate hashtabs dynamically
  selinux: fix return value on error in policydb_read()
  selinux: simplify range_write()
  selinux: fix error return code in policydb_read()
  selinux: don't produce incorrect filename_trans_count
  selinux: implement new format of filename transitions
  selinux: move context hashing under sidtab
  selinux: hash context structure directly
  selinux: store role transitions in a hash table
  selinux: drop unnecessary smp_load_acquire() call
  selinux: fix warning Comparison to bool
2020-06-02 17:16:47 -07:00
Ingo Molnar
0bffedbce9 Linux 5.7-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7K9iEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGzTAH/0ifZEG4BQ8x/WlB
 8YLSLE6QQTSXYi25nyExuJbFkkKY5Tik8M2HD/36xwY/HnZOlH9jH6m0ntqZxpaA
 3EU9lr1ct79nCBMYhiJssvz8d9AOZXlyogFW9y2y9pmPjlmUtseZ7yGh1xD465cj
 B5Ty2w2W34cs7zF3og2xn5agOJMtWWXLXZ5mRa9EOquKC5zeYyRicmd0T+plYQD6
 hbRYmxFfDfppVnBCBARPNN0+NU5JJD94H+8bOuf1tl48XNrLiZMOicmtohKNQ6+W
 rZNpJNEGEp7KMtqWH0Nl3hmy3yfZHMwe1DXM/AZDqR7jTHZY4mZ0GEpLyfI9AU4n
 34jVHwU=
 =SmJ9
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28 07:58:12 +02:00
Eric W. Biederman
b8bff59926 exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true.  The function cap_bprm_set_creds
ignores bprm->calld_sed_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in a LSM hook
that is called exactly once for the entire of exec.  Modify the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.

Remove bprm->called_set_creds all of it's former users have been moved
to security_bprm_creds_for_exec.

Add or upate comments a appropriate to bring them up to date and
to reflect this change.

Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-20 14:45:31 -05:00
David Howells
3e412ccc22 selinux: Implement the watch_key security hook
Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
2020-05-19 15:47:15 +01:00
David Howells
8c0637e950 keys: Make the KEY_NEED_* perms an enum rather than a mask
Since the meaning of combining the KEY_NEED_* constants is undefined, make
it so that you can't do that by turning them into an enum.

The enum is also given some extra values to represent special
circumstances, such as:

 (1) The '0' value is reserved and causes a warning to trap the parameter
     being unset.

 (2) The key is to be unlinked and we require no permissions on it, only
     the keyring, (this replaces the KEY_LOOKUP_FOR_UNLINK flag).

 (3) An override due to CAP_SYS_ADMIN.

 (4) An override due to an instantiation token being present.

 (5) The permissions check is being deferred to later key_permission()
     calls.

The extra values give the opportunity for LSMs to audit these situations.

[Note: This really needs overhauling so that lookup_user_key() tells
 key_task_permission() and the LSM what operation is being done and leaves
 it to those functions to decide how to map that onto the available
 permits.  However, I don't really want to make these change in the middle
 of the notifications patchset.]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
cc: Paul Moore <paul@paul-moore.com>
cc: Stephen Smalley <stephen.smalley.work@gmail.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: keyrings@vger.kernel.org
cc: selinux@vger.kernel.org
2020-05-19 15:42:22 +01:00
Alexei Starovoitov
a17b53c4a4 bpf, capability: Introduce CAP_BPF
Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.
For backward compatibility include them in CAP_SYS_ADMIN as well.

The end result provides simple safety model for applications that use BPF:
- to load tracing program types
  BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
  use CAP_BPF and CAP_PERFMON
- to load networking program types
  BPF_PROG_TYPE_{SCHED_CLS, XDP, SK_SKB, etc}
  use CAP_BPF and CAP_NET_ADMIN

There are few exceptions from this rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
  tracing mechanism, hence this helper needs additional CAP_PERFMON
  if networking program is using this helper.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
  to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- bpf_probe_write_user() is allowed under CAP_SYS_ADMIN only.

CAPs are not checked at attach/detach time with two exceptions:
- loading BPF_PROG_TYPE_CGROUP_SKB is allowed for unprivileged users,
  hence CAP_NET_ADMIN is required at attach time.
- flow_dissector detach doesn't check prog FD at detach,
  hence CAP_NET_ADMIN is required at detach time.

CAP_SYS_ADMIN is required to iterate BPF objects (progs, maps, links) via get_next_id
command and convert them to file descriptor via GET_FD_BY_ID command.
This restriction guarantees that mutliple tasks with CAP_BPF are not able to
affect each other. That leads to clean isolation of tasks. For example:
task A with CAP_BPF and CAP_NET_ADMIN loads and attaches a firewall via bpf_link.
task B with the same capabilities cannot detach that firewall unless
task A explicitly passed link FD to task B via scm_rights or bpffs.
CAP_SYS_ADMIN can still detach/unload everything.

Two networking user apps with CAP_SYS_ADMIN and CAP_NET_ADMIN can
accidentely mess with each other programs and maps.
Two networking user apps with CAP_NET_ADMIN and CAP_BPF cannot affect each other.

CAP_NET_ADMIN + CAP_BPF allows networking programs access only packet data.
Such networking progs cannot access arbitrary kernel memory or leak pointers.

bpftool, bpftrace, bcc tools binaries should NOT be installed with
CAP_BPF and CAP_PERFMON, since unpriv users will be able to read kernel secrets.
But users with these two permissions will be able to use these tracing tools.

CAP_PERFMON is least secure, since it allows kprobes and kernel memory access.
CAP_NET_ADMIN can stop network traffic via iproute2.
CAP_BPF is the safest from security point of view and harmless on its own.

Having CAP_BPF and/or CAP_NET_ADMIN is not enough to write into arbitrary map
and if that map is used by firewall-like bpf prog.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.

Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-2-alexei.starovoitov@gmail.com
2020-05-15 17:29:41 +02:00
David S. Miller
d00f26b623 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-14

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.

2) support for narrow loads in bpf_sock_addr progs and additional
   helpers in cg-skb progs, from Andrey.

3) bpf benchmark runner, from Andrii.

4) arm and riscv JIT optimizations, from Luke.

5) bpf iterator infrastructure, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14 20:31:21 -07:00
YueHaibing
fe5a90b8c1 selinux: netlabel: Remove unused inline function
There's no callers in-tree.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-05-12 20:16:33 -04:00
Alexei Starovoitov
f87b87a1c9 CAP_PERFMON for BPF
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6zMIUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTEOEACWUbZlswquZqSAr6pESTjAbjSPZ58s
 QfmiY218aXvyvwJeINfJduYIjtVjPL9F0qbqmm5Sh4CFflgF9QRUZLxsyuu6K7uY
 Bsy7hjHWUCO79anxgjie1rqhCxT/orW39E0nlDW72AlrebVwPRc4PmERsV9bl/9z
 Z7M7aJzza60938K8qN24A63Q4wb3uCfygUYDGkeaN46jmwlHLj8Qwu/L2pVv2/3O
 7FHEHYFK0UuvVw6byRpbPoHDbBEGApYszRlEUMRtkk/7zsvdIQJGHQpPMD5PkGY1
 kS6x1a7sdnA7++A7Oin7Uq+0y7sgVNeJyPO9o+u9wZgePZL4t87YZlwIL5NlyXIL
 JHDrz0DAckSYEAYuPnFrXtYW2AQ9TZxVuIHGsJdKW8KOdkHkYUqtXwLuj+0xf8v/
 szeJR+l6mOJzDHML7W7KzZdl+AJB/+GE55cLaRPs0bBFyLpUs/vL8BjkoDiDm3/i
 okGIWQkh8PwpujiS/mHDHqoXuVpVHYAcHD0X+zuLIUVCzKf71Kq7y2fIiHyTR4Wo
 +x5aeOFWHYOC2DwFdUQ1EiOUCtLbORqq1CDsnIE4KCtsfx1K6IHmqI8X77D8CTEp
 oSPz0kd8kgBfHwPhCepQ7DnBA1cJTiRbs6/++frU0S5R2HhIOGeLN6hrhrMjNhYD
 07jorSqL6hjjog==
 =+J1X
 -----END PGP SIGNATURE-----

Merge tag 'perf-for-bpf-2020-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next

CAP_PERFMON for BPF
2020-05-06 17:12:44 -07:00
Ondrej Mosnacek
03414a49ad selinux: do not allocate hashtabs dynamically
It is simpler to allocate them statically in the corresponding
structure, avoiding unnecessary kmalloc() calls and pointer
dereferencing.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: manual merging required in policydb.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-05-01 16:34:57 -04:00
Ondrej Mosnacek
46619b44e4 selinux: fix return value on error in policydb_read()
The value of rc is still zero from the last assignment when the error
path is taken. Fix it by setting it to -ENOMEM before the
hashtab_create() call.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e67b2ec9f6 ("selinux: store role transitions in a hash table")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-05-01 16:08:46 -04:00
Ondrej Mosnacek
3348bd33e8 selinux: simplify range_write()
No need to traverse the hashtab to count its elements, hashtab already
tracks it for us.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-05-01 16:08:04 -04:00
Wei Yongjun
4c09f8b691 selinux: fix error return code in policydb_read()
Fix to return negative error code -ENOMEM from the kvcalloc() error
handling case instead of 0, as done elsewhere in this function.

Fixes: acdf52d97f ("selinux: convert to kvmalloc")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-05-01 15:02:14 -04:00
Linus Torvalds
39e16d9342 selinux/stable-5.7 PR 20200430
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6rPswUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMJBQ//dAU7VS01kQUUsFjd8xUIOk9aSbNy
 gjFkzcpbTsS4Mhqk0FSP3mfqDWP3lvxt9gx6WfnCf+a2KE3eTtf9bISW2OB2evIl
 9ydae6frJLiP6yIeAEZBb1PBQ6AxwBT8j8drKi57sOBC8rkmF66wiMaG2nybYW/j
 rvkOQCFtWj/A3b+Y7y8fVs8sjTHWvcsvkN7kwYmmdjyn7h/C1Tqc6TOOrt1jtLUG
 dgeak9bCIvK7JB/W6squ1iKqvkJhld7h5fZn6WB/6Xd1DKD1LVjGT8HsKpI/ei49
 0tAybqaLv8WxVc5ZGcAGoTt/X0hq3lXRiMG1Qgmed85wxjrLEpU12L6yprEtgtao
 0yY1JNizuC3Ehbi02o4gHf+RffLPWDrT8Kmu00/IuridCesNZCrEFpbAZmOwPU67
 nFufU0YlSnsVJ63C8TMhkI2eg/VejGjN4I2PEgcxEbZKBBW+nAcJfoKl1y2tzEo9
 ZNdZcetY9yJdpewjsF6VgsXs4qUrm1NUiG8pCXdK23+w/qYvZ4UqoYfoRNYIuRS0
 nRN40OkRYN6GzDZ+NCPYqgIhoEpps0p96VYQI6mp+PyOpwlCq8epKVjXD/TkteVG
 mevM2Ffy8xVaL47ufXxAHn+pA6F6Mdmo/rIwe+U5Olase96DTcFr90JPmVz58mcP
 6lWZhje3wS3mSpk=
 =HTrZ
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
 "Two more SELinux patches to fix problems in the v5.7-rcX releases.

  Wei Yongjun's patch fixes a return code in an error path, and my patch
  fixes a problem where we were not correctly applying access controls
  to all of the netlink messages in the netlink_send LSM hook"

* tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: properly handle multiple messages in selinux_netlink_send()
  selinux: fix error return code in cond_read_list()
2020-04-30 16:35:45 -07:00
Paul Moore
fb73974172 selinux: properly handle multiple messages in selinux_netlink_send()
Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control.  Prior to this patch, SELinux only inspected
the first message in the sk_buff.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-30 16:18:37 -04:00
Wei Yongjun
292fed1fc8 selinux: fix error return code in cond_read_list()
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 60abd3181d ("selinux: convert cond_list to array")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-27 17:44:39 -04:00
Ondrej Mosnacek
9521eb3ea1 selinux: don't produce incorrect filename_trans_count
I thought I fixed the counting in filename_trans_read_helper() to count
the compat rule count correctly in the final version, but it's still
wrong. To really count the same thing as in the compat path, we'd need
to add up the cardinalities of stype bitmaps of all datums.

Since the kernel currently doesn't implement an ebitmap_cardinality()
function (and computing the proper count would just waste CPU cycles
anyway), just document that we use the field only in case of the old
format and stop updating it in filename_trans_read_helper().

Fixes: 4300590243 ("selinux: implement new format of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-22 15:38:10 -04:00
Ingo Molnar
87cfeb1920 perf/core fixes and improvements:
kernel + tools/perf:
 
   Alexey Budankov:
 
   - Introduce CAP_PERFMON to kernel and user space.
 
 callchains:
 
   Adrian Hunter:
 
   - Allow using Intel PT to synthesize callchains for regular events.
 
   Kan Liang:
 
   - Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Add flamegraph.py script
 
 BPF:
 
   Jiri Olsa:
 
   - Synthesize bpf_trampoline/dispatcher ksymbol events.
 
 perf stat:
 
   Arnaldo Carvalho de Melo:
 
   - Honour --timeout for forked workloads.
 
   Stephane Eranian:
 
   - Force error in fallback on :k events, to avoid counting nothing when
     the user asks for kernel events but is not allowed to.
 
 perf bench:
 
   Ian Rogers:
 
   - Add event synthesis benchmark.
 
 tools api fs:
 
   Stephane Eranian:
 
  - Make xxx__mountpoint() more scalable
 
 libtraceevent:
 
   He Zhe:
 
   - Handle return value of asprintf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
 J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
 qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
 =FHm4
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

 - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 14:08:28 +02:00
Ondrej Mosnacek
4300590243 selinux: implement new format of filename transitions
Implement a new, more space-efficient way of storing filename
transitions in the binary policy. The internal structures have already
been converted to this new representation; this patch just implements
reading/writing an equivalent represntation from/to the binary policy.

This new format reduces the size of Fedora policy from 7.6 MB to only
3.3 MB (with policy optimization enabled in both cases). With the
unconfined module disabled, the size is reduced from 3.3 MB to 2.4 MB.

The time to load policy into kernel is also shorter with the new format.
On Fedora Rawhide x86_64 it dropped from 157 ms to 106 ms; without the
unconfined module from 115 ms to 105 ms.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-17 16:42:01 -04:00
Ondrej Mosnacek
225621c934 selinux: move context hashing under sidtab
Now that context hash computation no longer depends on policydb, we can
simplify things by moving the context hashing completely under sidtab.
The hash is still cached in sidtab entries, but not for the in-flight
context structures.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-17 16:04:38 -04:00
Ondrej Mosnacek
5007728980 selinux: hash context structure directly
Always hashing the string representation is inefficient. Just hash the
contents of the structure directly (using jhash). If the context is
invalid (str & len are set), then hash the string as before, otherwise
hash the structured data.

Since the context hashing function is now faster (about 10 times), this
patch decreases the overhead of security_transition_sid(), which is
called from many hooks.

The jhash function seemed as a good choice, since it is used as the
default hashing algorithm in rhashtable.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Vander Stoep <jeffv@google.com>
Tested-by: Jeff Vander Stoep <jeffv@google.com>
[PM: fixed some spelling errors in the comments pointed out by JVS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-17 16:04:34 -04:00
Ondrej Mosnacek
e67b2ec9f6 selinux: store role transitions in a hash table
Currently, they are stored in a linked list, which adds significant
overhead to security_transition_sid(). On Fedora, with 428 role
transitions in policy, converting this list to a hash table cuts down
its run time by about 50%. This was measured by running 'stress-ng --msg
1 --msg-ops 100000' under perf with and without this patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-17 15:20:22 -04:00
Linus Torvalds
9786cab674 selinux/stable-5.7 PR 20200416
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6YmC0UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPplBAAzu5Fi0grInLr/IGXQKN2ZWcnx6KC
 OIo28vpBhie0Q9tRtHTux2ec57IBYGAVomhZDGWcHvVHdm84T3/+/Fnb/cL9FIBy
 GX2XgQjvAIyIPsscnq47eHbGdAk8o9E1mxuGD7Sgyql5834j3XbRN1yoOMEXfIOg
 0sDjv7/4EzIymI/jiEaZ6LyVA/bXT2L0CcXEyLD4RSUJEgBaejrx8k1jAwz2w/De
 NoXUqSnRpzN+ti2T0u/kt77cnshmK7w5AyjedA340LAqtvpMIWseeFmeTvlxQeOK
 bIZaTmwgGdkKo8hdgayns1/A3FNSr9lnlOOfn04/SpGHpGOvmC/b+xrw3ENJLHJG
 r+hanFAKkUlYGVY3dK82g3gAbfRQL3n48Cb0qmujqlqfLLAwc5VG0AN8WfDm0c8D
 kZEe3Hbf7NAx9KUOIfclcqYvDaCE7F6DyXJs2ToO0rHDyuWXJ6T6kPQtSGdB7Qd3
 fzi8XsN6fS2yCxEDyymUxRt5V+cJ+eNUuc52p+RTes3xh+31TGeIWmRudeNFfDTx
 XawXjypvZTxOfoo+3WcLq0qPVp9bc3lzORKAX28nSGb/6Ytijctf5iS3f1VmZVM8
 whY7UiSkTCFwix4SE3MwzJ1+kzJVngHY2woYxC02E5Lw972tiVT8LORvLU6G6P2G
 Nf4aDz3SNGiYM3o=
 =/dym
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fix from Paul Moore:
 "One small SELinux fix to ensure we cleanup properly on an error
  condition"

* tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: free str on error in str_read()
2020-04-16 10:45:47 -07:00
Alexey Budankov
9807372822 capabilities: Introduce CAP_PERFMON to kernel and user space
Introduce the CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON
can assist CAP_SYS_ADMIN capability in its governing role for
performance monitoring and observability subsystems.

CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface that
is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
chances to misuse the credentials and makes the operation more secure.

Thus, CAP_PERFMON implements the principle of least privilege for
performance monitoring and observability operations (POSIX IEEE 1003.1e:
2.2.2.39 principle of least privilege: A security design principle that
  states that a process or program be granted only those privileges
(e.g., capabilities) necessary to accomplish its legitimate function,
and only for the time that such privileges are actually required)

CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual compute
environments), where root or CAP_SYS_ADMIN credentials are not available to
mass users of a system, and securely unblocks applicability and scalability
of system performance monitoring and observability operations beyond root
and CAP_SYS_ADMIN use cases.

CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
monitoring and observability operations and balances amount of CAP_SYS_ADMIN
credentials following the recommendations in the capabilities man page [1]
for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
developers, below." For backward compatibility reasons access to system
performance monitoring and observability subsystems of the kernel remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
usage for secure system performance monitoring and observability operations
is discouraged with respect to the designed CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2]. The bugs
in the software itself can be fixed following the standard kernel development
process [3] to maintain and harden security of system performance monitoring
and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16 12:19:06 -03:00
Ondrej Mosnacek
433e3aa377 selinux: drop unnecessary smp_load_acquire() call
In commit 66f8e2f03c ("selinux: sidtab reverse lookup hash table") the
corresponding load is moved under the spin lock, so there is no race
possible and we can read the count directly. The smp_store_release() is
still needed to avoid racing with the lock-free readers.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-15 18:27:35 -04:00
Ondrej Mosnacek
af15f14c8c selinux: free str on error in str_read()
In [see "Fixes:"] I missed the fact that str_read() may give back an
allocated pointer even if it returns an error, causing a potential
memory leak in filename_trans_read_one(). Fix this by making the
function free the allocated string whenever it returns a non-zero value,
which also makes its behavior more obvious and prevents repeating the
same mistake in the future.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1461665 ("Resource leaks")
Fixes: c3a276111e ("selinux: optimize storage of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-15 17:23:16 -04:00
Zou Wei
4b8503967e selinux: fix warning Comparison to bool
fix below warnings reported by coccicheck

security/selinux/ss/mls.c:539:39-43: WARNING: Comparison to bool
security/selinux/ss/services.c:1815:46-50: WARNING: Comparison to bool
security/selinux/ss/services.c:1827:46-50: WARNING: Comparison to bool

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-14 18:01:18 -04:00
Linus Torvalds
ff2ae607c6 SPDX patches for 5.7-rc1.
Here are 3 SPDX patches for 5.7-rc1.
 
 One fixes up the SPDX tag for a single driver, while the other two go
 through the tree and add SPDX tags for all of the .gitignore files as
 needed.
 
 Nothing too complex, but you will get a merge conflict with your current
 tree, that should be trivial to handle (one file modified by two things,
 one file deleted.)
 
 All 3 of these have been in linux-next for a while, with no reported
 issues other than the merge conflict.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
 m5AuCzO3Azt9KBi7NL+L
 =2Lm5
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX updates from Greg KH:
 "Here are three SPDX patches for 5.7-rc1.

  One fixes up the SPDX tag for a single driver, while the other two go
  through the tree and add SPDX tags for all of the .gitignore files as
  needed.

  Nothing too complex, but you will get a merge conflict with your
  current tree, that should be trivial to handle (one file modified by
  two things, one file deleted.)

  All three of these have been in linux-next for a while, with no
  reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  ASoC: MT6660: make spdxcheck.py happy
  .gitignore: add SPDX License Identifier
  .gitignore: remove too obvious comments
2020-04-03 13:12:26 -07:00
Linus Torvalds
b3aa112d57 selinux/stable-5.7 PR 20200330
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6Ch6wUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPdcg/9FDMS/n0Xl1HQBUyu26EwLu3aUpNE
 BdghXW1LKSTp7MrOENE60PGzZSAiC07ci1DqFd7zfLPZf2q5IwPwOBa/Avy8z95V
 oHKqcMT6WO1SPOm/PxZn16FCKyET4gZDTXvHBAyiyFsbk36R522ZY615P9T3eLu/
 ZA1NFsSjj68SqMCUlAWfeqjcbQiX63bryEpugOIg0qWy7R/+rtWxj9TjriZ+v9tq
 uC45UcjBqphpmoPG8BifA3jjyclwO3DeQb5u7E8//HPPraGeB19ntsymUg7ljoGk
 NrqCkZtv6E+FRCDTR5f0O7M1T4BWJodxw2NwngnTwKByLC25EZaGx80o+VyMt0eT
 Pog+++JZaa5zZr2OYOtdlPVMLc2ALL6p/8lHOqFU3GKfIf04hWOm6/Lb2IWoXs3f
 CG2b6vzoXYyjbF0Q7kxadb8uBY2S1Ds+CVu2HMBBsXsPdwbbtFWOT/6aRAQu61qn
 PW+f47NR8By3SO6nMzWts2SZEERZNIEdSKeXHuR7As1jFMXrHLItreb4GCSPay5h
 2bzRpxt2br5CDLh7Jv2pZnHtUqBWOSbslTix77+Z/hPKaNowvD9v3tc5hX87rDmB
 dYXROD6/KoyXFYDcMdphlvORFhqGqd5bEYuHHum/VjSIRh237+/nxFY/vZ4i4bzU
 2fvpAmUlVX1c4rw=
 =LlWA
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
 "We've got twenty SELinux patches for the v5.7 merge window, the
  highlights are below:

   - Deprecate setting /sys/fs/selinux/checkreqprot to 1.

     This flag was originally created to deal with legacy userspace and
     the READ_IMPLIES_EXEC personality flag. We changed the default from
     1 to 0 back in Linux v4.4 and now we are taking the next step of
     deprecating it, at some point in the future we will take the final
     step of rejecting 1.

   - Allow kernfs symlinks to inherit the SELinux label of the parent
     directory. In order to preserve backwards compatibility this is
     protected by the genfs_seclabel_symlinks SELinux policy capability.

   - Optimize how we store filename transitions in the kernel, resulting
     in some significant improvements to policy load times.

   - Do a better job calculating our internal hash table sizes which
     resulted in additional policy load improvements and likely general
     SELinux performance improvements as well.

   - Remove the unused initial SIDs (labels) and improve how we handle
     initial SIDs.

   - Enable per-file labeling for the bpf filesystem.

   - Ensure that we properly label NFS v4.2 filesystems to avoid a
     temporary unlabeled condition.

   - Add some missing XFS quota command types to the SELinux quota
     access controls.

   - Fix a problem where we were not updating the seq_file position
     index correctly in selinuxfs.

   - We consolidate some duplicated code into helper functions.

   - A number of list to array conversions.

   - Update Stephen Smalley's email address in MAINTAINERS"

* tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: clean up indentation issue with assignment statement
  NFS: Ensure security label is set for root inode
  MAINTAINERS: Update my email address
  selinux: avtab_init() and cond_policydb_init() return void
  selinux: clean up error path in policydb_init()
  selinux: remove unused initial SIDs and improve handling
  selinux: reduce the use of hard-coded hash sizes
  selinux: Add xfs quota command types
  selinux: optimize storage of filename transitions
  selinux: factor out loop body from filename_trans_read()
  security: selinux: allow per-file labeling for bpffs
  selinux: generalize evaluate_cond_node()
  selinux: convert cond_expr to array
  selinux: convert cond_av_list to array
  selinux: convert cond_list to array
  selinux: sel_avc_get_stat_idx should increase position index
  selinux: allow kernfs symlinks to inherit parent directory context
  selinux: simplify evaluate_cond_node()
  Documentation,selinux: deprecate setting checkreqprot to 1
  selinux: move status variables out of selinux_ss
2020-03-31 15:07:55 -07:00
Colin Ian King
c753924b62 selinux: clean up indentation issue with assignment statement
The assignment of e->type_names is indented one level too deep,
clean this up by removing the extraneous tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-30 19:57:07 -04:00
Masahiro Yamada
d198b34f38 .gitignore: add SPDX License Identifier
Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 11:50:48 +01:00
Paul Moore
5e729e111e selinux: avtab_init() and cond_policydb_init() return void
The avtab_init() and cond_policydb_init() functions always return
zero so mark them as returning void and update the callers not to
check for a return value.

Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-05 14:55:43 -05:00
Ondrej Mosnacek
34a2dab488 selinux: clean up error path in policydb_init()
Commit e0ac568de1 ("selinux: reduce the use of hard-coded hash sizes")
moved symtab initialization out of policydb_init(), but left the cleanup
of symtabs from the error path. This patch fixes the oversight.

Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-05 14:49:15 -05:00
Stephen Smalley
e3e0b582c3 selinux: remove unused initial SIDs and improve handling
Remove initial SIDs that have never been used or are no longer used by
the kernel from its string table, which is also used to generate the
SECINITSID_* symbols referenced in code.  Update the code to
gracefully handle the fact that these can now be NULL. Stop treating
it as an error if a policy defines additional initial SIDs unknown to
the kernel.  Do not load unused initial SID contexts into the sidtab.
Fix the incorrect usage of the name from the ocontext in error
messages when loading initial SIDs since these are not presently
written to the kernel policy and are therefore always NULL.

After this change, it is possible to safely reclaim and reuse some of
the unused initial SIDs without compatibility issues.  Specifically,
unused initial SIDs that were being assigned the same context as the
unlabeled initial SID in policies can be reclaimed and reused for
another purpose, with existing policies still treating them as having
the unlabeled context and future policies having the option of mapping
them to a more specific context.  For example, this could have been
used when the infiniband labeling support was introduced to define
initial SIDs for the default pkey and endport SIDs similar to the
handling of port/netif/node SIDs rather than always using
SECINITSID_UNLABELED as the default.

The set of safely reclaimable unused initial SIDs across all known
policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod
(24), policy (25), and scmp_packet (26); these initial SIDs were
assigned the same context as unlabeled in all known policies including
mls.  If only considering non-mls policies (i.e. assuming that mls
users always upgrade policy with their kernels), the set of safely
reclaimable unused initial SIDs further includes file_labels (6), init
(7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23).

Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately
became a fatal error in commit 24ed7fdae6 ("selinux: use separate
table for initial SID lookup") and even before that it could cause
problems on a policy reload (collision between the new initial SID and
one allocated at runtime) ever since commit 42596eafdd ("selinux:
load the initial SIDs upon every policy load") so we cannot safely
start adding new initial SIDs to policies beyond SECINITSID_NUM (27)
until such a time as all such kernels do not need to be supported and
only those that include this commit are relevant. That is not a big
deal since we haven't added a new initial SID since 2004 (v2.6.7) and
we have plenty of unused ones we can reclaim if we truly need one.

If we want to avoid the wasted storage in initial_sid_to_string[]
and/or sidtab->isids[] for the unused initial SIDs, we could introduce
an indirection between the kernel initial SID values and the policy
initial SID values and just map the policy SID values in the ocontexts
to the kernel values during policy_load_isids(). Originally I thought
we'd do this by preserving the initial SID names in the kernel policy
and creating a mapping at load time like we do for the security
classes and permissions but that would require a new kernel policy
format version and associated changes to libsepol/checkpolicy and I'm
not sure it is justified. Simpler approach is just to create a fixed
mapping table in the kernel from the existing fixed policy values to
the kernel values. Less flexible but probably sufficient.

A separate selinux userspace change was applied in
8677ce5e8f
to enable removal of most of the unused initial SID contexts from
policies, but there is no dependency between that change and this one.
That change permits removing all of the unused initial SID contexts
from policy except for the fs and sysctl SID contexts.  The initial
SID declarations themselves would remain in policy to preserve the
values of subsequent ones but the contexts can be dropped.  If/when
the kernel decides to reuse one of them, future policies can change
the name and start assigning a context again without breaking
compatibility.

Here is how I would envision staging changes to the initial SIDs in a
compatible manner after this commit is applied:

1. At any time after this commit is applied, the kernel could choose
to reclaim one of the safely reclaimable unused initial SIDs listed
above for a new purpose (i.e. replace its NULL entry in the
initial_sid_to_string[] table with a new name and start using the
newly generated SECINITSID_name symbol in code), and refpolicy could
at that time rename its declaration of that initial SID to reflect its
new purpose and start assigning it a context going
forward. Existing/old policies would map the reclaimed initial SID to
the unlabeled context, so that would be the initial default behavior
until policies are updated. This doesn't depend on the selinux
userspace change; it will work with existing policies and userspace.

2. In 6 months or so we'll have another SELinux userspace release that
will include the libsepol/checkpolicy support for omitting unused
initial SID contexts.

3. At any time after that release, refpolicy can make that release its
minimum build requirement and drop the sid context statements (but not
the sid declarations) for all of the unused initial SIDs except for
fs and sysctl, which must remain for compatibility on policy
reload with old kernels and for compatibility with kernels that were
still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this
kernel commit; it will work with previous kernels as well.

4. After N years for some value of N, refpolicy decides that it no
longer cares about policy reload compatibility for kernels that
predate this kernel commit, and refpolicy drops the fs and sysctl
SID contexts from policy too (but retains the declarations).

5. After M years for some value of M, the kernel decides that it no
longer cares about compatibility with refpolicies that predate step 4
(dropping the fs and sysctl SIDs), and those two SIDs also become
safely reclaimable.  This step is optional and need not ever occur unless
we decide that the need to reclaim those two SIDs outweighs the
compatibility cost.

6. After O years for some value of O, refpolicy decides that it no
longer cares about policy load (not just reload) compatibility for
kernels that predate this kernel commit, and both kernel and refpolicy
can then start adding and using new initial SIDs beyond 27. This does
not depend on the previous change (step 5) and can occur independent
of it.

Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-27 19:34:24 -05:00
Ondrej Mosnacek
e0ac568de1 selinux: reduce the use of hard-coded hash sizes
Instead allocate hash tables with just the right size based on the
actual number of elements (which is almost always known beforehand, we
just need to defer the hashtab allocation to the right time). The only
case when we don't know the size (with the current policy format) is the
new filename transitions hashtable. Here I just left the existing value.

After this patch, the time to load Fedora policy on x86_64 decreases
from 790 ms to 167 ms. If the unconfined module is removed, it decreases
from 750 ms to 122 ms. It is also likely that other operations are going
to be faster, mainly string_to_context_struct() or mls_compute_sid(),
but I didn't try to quantify that.

The memory usage of all hash table arrays increases from ~58 KB to
~163 KB (with Fedora policy on x86_64).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-27 19:23:20 -05:00
Richard Haines
e4cfa05e9b selinux: Add xfs quota command types
Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
permission check.

Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
filesystem quotaget permission check.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-22 14:41:21 -05:00
Ondrej Mosnacek
c3a276111e selinux: optimize storage of filename transitions
In these rules, each rule with the same (target type, target class,
filename) values is (in practice) always mapped to the same result type.
Therefore, it is much more efficient to group the rules by (ttype,
tclass, filename).

Thus, this patch drops the stype field from the key and changes the
datum to be a linked list of one or more structures that contain a
result type and an ebitmap of source types that map the given target to
the given result type under the given filename. The size of the hash
table is also incremented to 2048 to be more optimal for Fedora policy
(which currently has ~2500 unique (ttype, tclass, filename) tuples,
regardless of whether the 'unconfined' module is enabled).

Not only does this dramtically reduce memory usage when the policy
contains a lot of unconfined domains (ergo a lot of filename based
transitions), but it also slightly reduces memory usage of strongly
confined policies (modeled on Fedora policy with 'unconfined' module
disabled) and significantly reduces lookup times of these rules on
Fedora (roughly matches the performance of the rhashtable conversion
patch [1] posted recently to selinux@vger.kernel.org).

An obvious next step is to change binary policy format to match this
layout, so that disk space is also saved. However, since that requires
more work (including matching userspace changes) and this patch is
already beneficial on its own, I'm posting it separately.

Performance/memory usage comparison:

Kernel           | Policy load | Policy load   | Mem usage | Mem usage     | openbench
                 |             | (-unconfined) |           | (-unconfined) | (createfiles)
-----------------|-------------|---------------|-----------|---------------|--------------
reference        |       1,30s |         0,91s |      90MB |          77MB | 55 us/file
rhashtable patch |       0.98s |         0,85s |      85MB |          75MB | 38 us/file
this patch       |       0,95s |         0,87s |      75MB |          75MB | 40 us/file

(Memory usage is measured after boot. With SELinux disabled the memory
usage was ~60MB on the same system.)

[1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-22 11:22:32 -05:00
Ondrej Mosnacek
253050f57c selinux: factor out loop body from filename_trans_read()
It simplifies cleanup in the error path. This will be extra useful in
later patch.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-13 18:08:15 -05:00
Connor O'Brien
4ca54d3d30 security: selinux: allow per-file labeling for bpffs
Add support for genfscon per-file labeling of bpffs files. This allows
for separate permissions for different pinned bpf objects, which may
be completely unrelated to each other.

Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Steven Moreland <smoreland@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 22:02:54 -05:00
Ondrej Mosnacek
89d4d7c88d selinux: generalize evaluate_cond_node()
Both callers iterate the cond_list and call it for each node - turn it
into evaluate_cond_nodes(), which does the iteration for them.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 21:50:26 -05:00
Ondrej Mosnacek
8794d78390 selinux: convert cond_expr to array
Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 21:48:50 -05:00
Ondrej Mosnacek
2b3a003e15 selinux: convert cond_av_list to array
Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 21:42:27 -05:00
Ondrej Mosnacek
60abd3181d selinux: convert cond_list to array
Since it is fixed-size after allocation and we know the size beforehand,
using a plain old array is simpler and more efficient.

While there, also fix signedness of some related variables/parameters.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 21:39:41 -05:00
Linus Torvalds
a5650acb5f selinux/stable-5.6 PR 20200210
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl5ByJYUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMNLg/+Nj3gdWa5z0lsFuvUs87uM4HYHLtp
 9/wENEaqy7W92LAAEW5y+6sK8Pm9/CO3xf2d40SAr/xjwBo2Fbnfmx7QwDMUCQS6
 vyrLDiikSMyFU5j1AQDFuWjWF36FZlfqix+oXMETJ1tqxjIFxFN+rvbsDICPiQTl
 VxBG4CWWzNtSllYDDj3KYcVaGzNb3nleb8n436gFoHS++12BaypVA0kBlV3BqzPi
 mC0+gDuY06hwFmc/tAl8WndRwZpJ71rgsKJ5opgel61Zxf1J39QdTxeap9hhldJl
 5FbtoiGpnMXtHzpRGh6BHag/2gGX/0+J+t8ZATuk4GRtt1ueyYw9kNnAMblooJSV
 TwJlv1LLIKpAmSSJnKoN1AUAsxhS3dAmVyPuYWtmqEq8wq7YEYi5UnK1fH4ziI+b
 5LmYD0D/FoNE1Dr1TV78HmM9i0+AteH5FhkS0V6GD3v+wO6vCb3hTYaQpMWhhgnY
 /SEWbvLUO1fvQoKA8GT3UtmiUFdrqvrQFoL/l3weo9tvfyG3Walxo8s0w0r/DVUH
 AevIhUVi+snSLhI95fF26wskrZtEE9gwl/BoMjfKse0HK2t77xBE3kZW88NnSXOU
 Wk+ozfgibCnzk/Do/Y4iRr9GbgmCfscjOqlq1Yzyt/ZkUsG2egWHMzpritnzVEio
 NSQ6n6kpbkRgkHc=
 =0cTj
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
 "Two small fixes: one fixes a locking problem in the recently merged
  label translation code, the other fixes an embarrassing 'binderfs' /
  'binder' filesystem name check"

* tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix sidtab string cache locking
  selinux: fix typo in filesystem name
2020-02-10 16:51:35 -08:00
Vasily Averin
8d269a8e2a selinux: sel_avc_get_stat_idx should increase position index
If seq_file .next function does not change position index,
read after some lseek can generate unexpected output.

$ dd if=/sys/fs/selinux/avc/cache_stats # usual output
lookups hits misses allocations reclaims frees
817223 810034 7189 7189 6992 7037
1934894 1926896 7998 7998 7632 7683
1322812 1317176 5636 5636 5456 5507
1560571 1551548 9023 9023 9056 9115
0+1 records in
0+1 records out
189 bytes copied, 5,1564e-05 s, 3,7 MB/s

$# read after lseek to midle of last line
$ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
056 9115   <<<< end of last line
1560571 1551548 9023 9023 9056 9115  <<< whole last line once again
0+1 records in
0+1 records out
45 bytes copied, 8,7221e-05 s, 516 kB/s

$# read after lseek beyond  end of of file
$ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1
dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset
1560571 1551548 9023 9023 9056 9115  <<<< generates whole last line
0+1 records in
0+1 records out
36 bytes copied, 9,0934e-05 s, 396 kB/s

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Christian Göttsche
7470d0d13f selinux: allow kernfs symlinks to inherit parent directory context
Currently symlinks on kernel filesystems, like sysfs, are labeled on
creation with the parent filesystem root sid.

Allow symlinks to inherit the parent directory context, so fine-grained
kernfs labeling can be applied to symlinks too and checking contexts
doesn't complain about them.

For backward-compatibility this behavior is contained in a new policy
capability: genfs_seclabel_symlinks

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Ondrej Mosnacek
06c2efe2cf selinux: simplify evaluate_cond_node()
It never fails, so it can just return void.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Stephen Smalley
e9c38f9fc2 Documentation,selinux: deprecate setting checkreqprot to 1
Deprecate setting the SELinux checkreqprot tunable to 1 via kernel
parameter or /sys/fs/selinux/checkreqprot.  Setting it to 0 is left
intact for compatibility since Android and some Linux distributions
do so for security and treat an inability to set it as a fatal error.
Eventually setting it to 0 will become a no-op and the kernel will
stop using checkreqprot's value internally altogether.

checkreqprot was originally introduced as a compatibility mechanism
for legacy userspace and the READ_IMPLIES_EXEC personality flag.
However, if set to 1, it weakens security by allowing mappings to be
made executable without authorization by policy.  The default value
for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed
from 1 to 0 in commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android
and Linux distributions began explicitly setting
/sys/fs/selinux/checkreqprot to 0 some time ago.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Ondrej Mosnacek
4b36cb773a selinux: move status variables out of selinux_ss
It fits more naturally in selinux_state, since it reflects also global
state (the enforcing and policyload fields).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Linus Torvalds
c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Ondrej Mosnacek
39a706fbcf selinux: fix sidtab string cache locking
Avoiding taking a lock in an IRQ context is not enough to prevent
deadlocks, as discovered by syzbot:

===
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.5.0-syzkaller #0 Not tainted
-----------------------------------------------------
syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533

and this task is already holding:
ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91
which would create a new lock dependency:
 (&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&(&nf_conntrack_locks[i])->rlock){+.-.}

[...]

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&s->cache_lock)->rlock);
                               local_irq_disable();
                               lock(&(&nf_conntrack_locks[i])->rlock);
                               lock(&(&s->cache_lock)->rlock);
  <Interrupt>
    lock(&(&nf_conntrack_locks[i])->rlock);

 *** DEADLOCK ***
[...]
===

Fix this by simply locking with irqsave/irqrestore and stop giving up on
!in_task(). It makes the locking a bit slower, but it shouldn't make a
big difference in real workloads. Under the scenario from [1] (only
cache hits) it only increased the runtime overhead from the
security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65%
before introducing the cache).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259

Fixes: d97bd23c2d ("selinux: cache the SID -> context string translation")
Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-05 18:31:10 -05:00
Hridya Valsaraju
a20456aef8 selinux: fix typo in filesystem name
Correct the filesystem name to "binder" to enable genfscon per-file
labelling for binderfs.

Fixes: 7a4b519474 ("selinux: allow per-file labelling for binderfs")
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: slight style changes to the subj/description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-05 18:23:13 -05:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Stephen Smalley
98aa00345d selinux: fix regression introduced by move_mount(2) syscall
commit 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
introduced a new move_mount(2) system call and a corresponding new LSM
security_move_mount hook but did not implement this hook for any existing
LSM.  This creates a regression for SELinux with respect to consistent
checking of mounts; the existing selinux_mount hook checks mounton
permission to the mount point path.  Provide a SELinux hook
implementation for move_mount that applies this same check for
consistency.  In the future we may wish to add a new move_mount
filesystem permission and check as well, but this addresses
the immediate regression.

Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-20 07:42:37 -05:00
Ondrej Mosnacek
dd89b9d9f3 selinux: do not allocate ancillary buffer on first load
In security_load_policy(), we can defer allocating the newpolicydb
ancillary array to after checking state->initialized, thereby avoiding
the pointless allocation when loading policy the first time.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: merged portions by hand]
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 16:05:25 -05:00
Paul Moore
cb89e24658 selinux: remove redundant allocation and helper functions
This patch removes the inode, file, and superblock security blob
allocation functions and moves the associated code into the
respective LSM hooks.  This patch also removes the inode_doinit()
function as it was a trivial wrapper around
inode_doinit_with_dentry() and called from one location in the code.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 14:38:03 -05:00
Huaisheng Ye
df4779b5d2 selinux: remove redundant selinux_nlmsg_perm
selinux_nlmsg_perm is used for only by selinux_netlink_send. Remove
the redundant function to simplify the code.

Fix a typo by suggestion from Stephen.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 14:34:36 -05:00
Ondrej Mosnacek
ae3d8c2e27 selinux: fix wrong buffer types in policydb.c
Two places used u32 where there should have been __le32.

Fixes sparse warnings:
  CHECK   [...]/security/selinux/ss/services.c
[...]/security/selinux/ss/policydb.c:2669:16: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2669:16:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2669:16:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2674:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2674:24:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2674:24:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2675:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2675:24:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2675:24:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2676:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2676:24:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2676:24:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2681:32: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2681:32:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2681:32:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2701:16: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2701:16:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2701:16:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2706:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2706:24:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2706:24:    got restricted __le32 [usertype]
[...]/security/selinux/ss/policydb.c:2707:24: warning: incorrect type in assignment (different base types)
[...]/security/selinux/ss/policydb.c:2707:24:    expected unsigned int
[...]/security/selinux/ss/policydb.c:2707:24:    got restricted __le32 [usertype]

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 14:31:05 -05:00
Nikolay Aleksandrov
8dcea18708 net: bridge: vlan: add rtm definitions and dump support
This patch adds vlan rtm definitions:
 - NEWVLAN: to be used for creating vlans, setting options and
   notifications
 - DELVLAN: to be used for deleting vlans
 - GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15 13:48:17 +01:00
Ondrej Mosnacek
cfff75d897 selinux: reorder hooks to make runtime disable less broken
Commit b1d9e6b064 ("LSM: Switch to lists of hooks") switched the LSM
infrastructure to use per-hook lists, which meant that removing the
hooks for a given module was no longer atomic. Even though the commit
clearly documents that modules implementing runtime revmoval of hooks
(only SELinux attempts this madness) need to take special precautions to
avoid race conditions, SELinux has never addressed this.

By inserting an artificial delay between the loop iterations of
security_delete_hooks() (I used 100 ms), booting to a state where
SELinux is enabled, but policy is not yet loaded, and running these
commands:

    while true; do ping -c 1 <some IP>; done &
    echo -n 1 >/sys/fs/selinux/disable
    kill %1
    wait

...I was able to trigger NULL pointer dereferences in various places. I
also have a report of someone getting panics on a stock RHEL-8 kernel
after setting SELINUX=disabled in /etc/selinux/config and rebooting
(without adding "selinux=0" to kernel command-line).

Reordering the SELinux hooks such that those that allocate structures
are removed last seems to prevent these panics. It is very much possible
that this doesn't make the runtime disable completely race-free, but at
least it makes the operation much less fragile.

Cc: stable@vger.kernel.org
Fixes: b1d9e6b064 ("LSM: Switch to lists of hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 15:26:55 -05:00
Ondrej Mosnacek
65cddd5098 selinux: treat atomic flags more carefully
The disabled/enforcing/initialized flags are all accessed concurrently
by threads so use the appropriate accessors that ensure atomicity and
document that it is expected.

Use smp_load/acquire...() helpers (with memory barriers) for the
initialized flag, since it gates access to the rest of the state
structures.

Note that the disabled flag is currently not used for anything other
than avoiding double disable, but it will be used for bailing out of
hooks once security_delete_hooks() is removed.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 15:19:39 -05:00
Stephen Smalley
b78b7d59bd selinux: make default_noexec read-only after init
SELinux checks whether VM_EXEC is set in the VM_DATA_DEFAULT_FLAGS
during initialization and saves the result in default_noexec for use
in its mmap and mprotect hook function implementations to decide
whether to apply EXECMEM, EXECHEAP, EXECSTACK, and EXECMOD checks.
Mark default_noexec as ro_after_init to prevent later clearing it
and thereby disabling these checks.  It is only set legitimately from
init code.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 12:26:20 -05:00
Ravi Kumar Siddojigari
fe49c7e4f8 selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND.
Move cache based  pkey sid  retrieval code which was added
with commit "409dcf31" under CONFIG_SECURITY_INFINIBAND.
As its  going to alloc a new cache which impacts
low RAM devices which was enabled by default.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
[PM: checkpatch.pl cleanups, fixed capitalization in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 11:56:37 -05:00
Huaisheng Ye
b82f3f6894 selinux: remove redundant msg_msg_alloc_security
selinux_msg_msg_alloc_security only calls msg_msg_alloc_security but
do nothing else. And also msg_msg_alloc_security is just used by the
former.

Remove the redundant function to simplify the code.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 11:32:13 -05:00
Stephen Smalley
d41415eb5e Documentation,selinux: fix references to old selinuxfs mount point
selinuxfs was originally mounted on /selinux, and various docs and
kconfig help texts referred to nodes under it.  In Linux 3.0,
/sys/fs/selinux was introduced as the preferred mount point for selinuxfs.
Fix all the old references to /selinux/ to /sys/fs/selinux/.
While we are there, update the description of the selinux boot parameter
to reflect the fact that the default value is always 1 since
commit be6ec88f41 ("selinux: Remove SECURITY_SELINUX_BOOTPARAM_VALUE")
and drop discussion of runtime disable since it is deprecated.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-07 12:46:53 -05:00
Paul Moore
89b223bfb8 selinux: deprecate disabling SELinux and runtime
Deprecate the CONFIG_SECURITY_SELINUX_DISABLE functionality.  The
code was originally developed to make it easier for Linux
distributions to support architectures where adding parameters to the
kernel command line was difficult.  Unfortunately, supporting runtime
disable meant we had to make some security trade-offs when it came to
the LSM hooks, as documented in the Kconfig help text:

  NOTE: selecting this option will disable the '__ro_after_init'
  kernel hardening feature for security hooks.   Please consider
  using the selinux=0 boot parameter instead of enabling this
  option.

Fortunately it looks as if that the original motivation for the
runtime disable functionality is gone, and Fedora/RHEL appears to be
the only major distribution enabling this capability at build time
so we are now taking steps to remove it entirely from the kernel.
The first step is to mark the functionality as deprecated and print
an error when it is used (what this patch is doing).  As Fedora/RHEL
makes progress in transitioning the distribution away from runtime
disable, we will introduce follow-up patches over several kernel
releases which will block for increasing periods of time when the
runtime disable is used.  Finally we will remove the option entirely
once we believe all users have moved to the kernel cmdline approach.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-07 10:19:43 -05:00
Hridya Valsaraju
7a4b519474 selinux: allow per-file labelling for binderfs
This patch allows genfscon per-file labeling for binderfs.
This is required to have separate permissions to allow
access to binder, hwbinder and vndbinder devices which are
relocating to binderfs.

Acked-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-06 21:11:18 -05:00
liuyang34
7e78c87514 selinuxfs: use scnprintf to get real length for inode
The return value of snprintf maybe over the size of TMPBUFLEN, use
scnprintf instead in sel_read_class and sel_read_perm.

Signed-off-by: liuyang34 <liuyang34@xiaomi.com>
[PM: cleaned up the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-06 21:05:57 -05:00
YueHaibing
f126853402 selinux: remove set but not used variable 'sidtab'
security/selinux/ss/services.c: In function security_port_sid:
security/selinux/ss/services.c:2346:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_ib_endport_sid:
security/selinux/ss/services.c:2435:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_netif_sid:
security/selinux/ss/services.c:2480:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]
security/selinux/ss/services.c: In function security_fs_use:
security/selinux/ss/services.c:2831:17: warning: variable sidtab set but not used [-Wunused-but-set-variable]

Since commit 66f8e2f03c ("selinux: sidtab reverse lookup hash table")
'sidtab' is not used any more, so remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-24 14:34:01 -05:00
Paul Moore
15b590a81f selinux: ensure the policy has been loaded before reading the sidtab stats
Check to make sure we have loaded a policy before we query the
sidtab's hash stats.  Failure to do so could result in a kernel
panic/oops due to a dereferenced NULL pointer.

Fixes: 66f8e2f03c ("selinux: sidtab reverse lookup hash table")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-23 16:38:36 -05:00
Jaihind Yadav
030b995ad9 selinux: ensure we cleanup the internal AVC counters on error in avc_update()
In AVC update we don't call avc_node_kill() when avc_xperms_populate()
fails, resulting in the avc->avc_cache.active_nodes counter having a
false value.  In last patch this changes was missed , so correcting it.

Fixes: fa1aa143ac ("selinux: extended permissions for ioctls")
Signed-off-by: Jaihind Yadav <jaihindyadav@codeaurora.org>
Signed-off-by: Ravi Kumar Siddojigari <rsiddoji@codeaurora.org>
[PM: merge fuzz, minor description cleanup]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-21 10:59:21 -05:00
Stephen Smalley
5c108d4e18 selinux: randomize layout of key structures
Randomize the layout of key selinux data structures.
Initially this is applied to the selinux_state, selinux_ss,
policydb, and task_security_struct data structures.

NB To test/use this mechanism, one must install the
necessary build-time dependencies, e.g. gcc-plugin-devel on Fedora,
and enable CONFIG_GCC_PLUGIN_RANDSTRUCT in the kernel configuration.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Kees Cook <keescook@chromium.org>
[PM: double semi-colon fixed]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-18 21:26:06 -05:00
Stephen Smalley
6c5a682e64 selinux: clean up selinux_enabled/disabled/enforcing_boot
Rename selinux_enabled to selinux_enabled_boot to make it clear that
it only reflects whether SELinux was enabled at boot.  Replace the
references to it in the MAC_STATUS audit log in sel_write_enforce()
with hardcoded "1" values because this code is only reachable if SELinux
is enabled and does not change its value, and update the corresponding
MAC_STATUS audit log in sel_write_disable().  Stop clearing
selinux_enabled in selinux_disable() since it is not used outside of
initialization code that runs before selinux_disable() can be reached.
Mark both selinux_enabled_boot and selinux_enforcing_boot as __initdata
since they are only used in initialization code.

Wrap the disabled field in the struct selinux_state with
CONFIG_SECURITY_SELINUX_DISABLE since it is only used for
runtime disable.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-18 21:22:46 -05:00
Yang Guo
210a292874 selinux: remove unnecessary selinux cred request
task_security_struct was obtained at the beginning of may_create
and selinux_inode_init_security, no need to obtain again.
may_create will be called very frequently when create dir and file.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-12 08:50:39 -05:00
Paul Moore
d8db60cb23 selinux: ensure we cleanup the internal AVC counters on error in avc_insert()
Fix avc_insert() to call avc_node_kill() if we've already allocated
an AVC node and the code fails to insert the node in the cache.

Fixes: fa1aa143ac ("selinux: extended permissions for ioctls")
Reported-by: rsiddoji@codeaurora.org
Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-10 14:16:53 -05:00
Stephen Smalley
5298d0b9b9 selinux: clean up selinux_inode_permission MAY_NOT_BLOCK tests
Through a somewhat convoluted series of changes, we have ended up
with multiple unnecessary occurrences of (flags & MAY_NOT_BLOCK)
tests in selinux_inode_permission().  Clean it up and simplify.
No functional change.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-09 18:47:27 -05:00
Stephen Smalley
0188d5c025 selinux: fall back to ref-walk if audit is required
commit bda0be7ad9 ("security: make inode_follow_link RCU-walk aware")
passed down the rcu flag to the SELinux AVC, but failed to adjust the
test in slow_avc_audit() to also return -ECHILD on LSM_AUDIT_DATA_DENTRY.
Previously, we only returned -ECHILD if generating an audit record with
LSM_AUDIT_DATA_INODE since this was only relevant from inode_permission.
Move the handling of MAY_NOT_BLOCK to avc_audit() and its inlined
equivalent in selinux_inode_permission() immediately after we determine
that audit is required, and always fall back to ref-walk in this case.

Fixes: bda0be7ad9 ("security: make inode_follow_link RCU-walk aware")
Reported-by: Will Deacon <will@kernel.org>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-09 18:37:47 -05:00
Stephen Smalley
1a37079c23 selinux: revert "stop passing MAY_NOT_BLOCK to the AVC upon follow_link"
This reverts commit e46e01eebb ("selinux: stop passing MAY_NOT_BLOCK
to the AVC upon follow_link"). The correct fix is to instead fall
back to ref-walk if audit is required irrespective of the specific
audit data type.  This is done in the next commit.

Fixes: e46e01eebb ("selinux: stop passing MAY_NOT_BLOCK to the AVC upon follow_link")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-09 18:28:56 -05:00
Stephen Smalley
59438b4647 security,lockdown,selinux: implement SELinux lockdown
Implement a SELinux hook for lockdown.  If the lockdown module is also
enabled, then a denial by the lockdown module will take precedence over
SELinux, so SELinux can only further restrict lockdown decisions.
The SELinux hook only distinguishes at the granularity of integrity
versus confidentiality similar to the lockdown module, but includes the
full lockdown reason as part of the audit record as a hint in diagnosing
what triggered the denial.  To support this auditing, move the
lockdown_reasons[] string array from being private to the lockdown
module to the security framework so that it can be used by the lsm audit
code and so that it is always available even when the lockdown module
is disabled.

Note that the SELinux implementation allows the integrity and
confidentiality reasons to be controlled independently from one another.
Thus, in an SELinux policy, one could allow operations that specify
an integrity reason while blocking operations that specify a
confidentiality reason. The SELinux hook implementation is
stricter than the lockdown module in validating the provided reason value.

Sample AVC audit output from denials:
avc:  denied  { integrity } for pid=3402 comm="fwupd"
 lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0
 tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0

avc:  denied  { confidentiality } for pid=4628 comm="cp"
 lockdown_reason="/proc/kcore access"
 scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
 tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
 tclass=lockdown permissive=0

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
[PM: some merge fuzz do the the perf hooks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-09 17:53:58 -05:00