While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:
5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit
1.96% ls [kernel.kallsyms] [k] mls_level_isvalid
1.95% ls [kernel.kallsyms] [k] find_next_bit
The ebitmap_get_bit() was the hottest function in the perf-report
output. Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.
Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.
The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.
0.07% ls [kernel.kallsyms] [k] ebitmap_contains
0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit
0.01% ls [kernel.kallsyms] [k] mls_level_isvalid
0.01% ls [kernel.kallsyms] [k] find_next_bit
The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +0.9% | +2.6% |
+--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
When new objects are created we have great and flexible rules to
determine the type of the new object. We aren't quite as flexible or
mature when it comes to determining the user, role, and range. This
patch adds a new ability to specify the place a new objects user, role,
and range should come from. For users and roles it can come from either
the source or the target of the operation. aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)
examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[no rule]
unconfined_u:object_r:mnt_t:s0 test_none
[default user source]
unconfined_u:object_r:mnt_t:s0 test_user_source
[default user target]
system_u:object_r:mnt_t:s0 test_user_target
[default role source]
unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
unconfined_u:object_r:mnt_t:s0 test_role_target
[default range source low]
unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high
Signed-off-by: Eric Paris <eparis@redhat.com>
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary. My new email address is employer independent so we don't have
to worry about doing this again any time soon.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket SID would be computed on creation and no longer inherit
its creator's SID by default. Socket may have a different type but
needs to retain the creator's role and MLS attribute in order not
to break labeled networking and network access control.
The kernel value for a class would be used to determine if the class
if one of socket classes. If security_compute_sid is called from
userspace the policy value for a class would be mapped to the relevant
kernel value first.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
The sym_val_to_name type array can be quite large as it grows linearly with
the number of types. With known policies having over 5k types these
allocations are growing large enough that they are likely to fail. Convert
those to flex_array so no allocation is larger than PAGE_SIZE
Signed-off-by: Eric Paris <eparis@redhat.com>
Allow runtime switching between different policy types (e.g. from a MLS/MCS
policy to a non-MLS/non-MCS policy or viceversa).
Signed-off-by: Guido Trentalancia <guido@trentalancia.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Per https://bugzilla.redhat.com/show_bug.cgi?id=548145
there are sufficient range transition rules in modern (Fedora) policy to
make mls_compute_sid a significant factor on the shmem file setup path
due to the length of the range_tr list. Replace the simple range_tr
list with a hashtab inside the security server to help mitigate this
problem.
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
The last return is unreachable, remove the 'return'
in default, let it fall through.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Modify SELinux to dynamically discover class and permission values
upon policy load, based on the dynamic object class/perm discovery
logic from libselinux. A mapping is created between kernel-private
class and permission indices used outside the security server and the
policy values used within the security server.
The mappings are only applied upon kernel-internal computations;
similar mappings for the private indices of userspace object managers
is handled on a per-object manager basis by the userspace AVC. The
interfaces for compute_av and transition_sid are split for kernel
vs. userspace; the userspace functions are distinguished by a _user
suffix.
The kernel-private class indices are no longer tied to the policy
values and thus do not need to skip indices for userspace classes;
thus the kernel class index values are compressed. The flask.h
definitions were regenerated by deleting the userspace classes from
refpolicy's definitions and then regenerating the headers. Going
forward, we can just maintain the flask.h, av_permissions.h, and
classmap.h definitions separately from policy as they are no longer
tied to the policy values. The next patch introduces a utility to
automate generation of flask.h and av_permissions.h from the
classmap.h definitions.
The older kernel class and permission string tables are removed and
replaced by a single security class mapping table that is walked at
policy load to generate the mapping. The old kernel class validation
logic is completely replaced by the mapping logic.
The handle unknown logic is reworked. reject_unknown=1 is handled
when the mappings are computed at policy load time, similar to the old
handling by the class validation logic. allow_unknown=1 is handled
when computing and mapping decisions - if the permission was not able
to be mapped (i.e. undefined, mapped to zero), then it is
automatically added to the allowed vector. If the class was not able
to be mapped (i.e. undefined, mapped to zero), then all permissions
are allowed for it if allow_unknown=1.
avc_audit leverages the new security class mapping table to lookup the
class and permission names from the kernel-private indices.
The mdp program is updated to use the new table when generating the
class definitions and allow rules for a minimal boot policy for the
kernel. It should be noted that this policy will not include any
userspace classes, nor will its policy index values for the kernel
classes correspond with the ones in refpolicy (they will instead match
the kernel-private indices).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Trivial minor fixes that change C null character style.
Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi>
Signed-off-by: James Morris <jmorris@namei.org>
Formatting and syntax changes
whitespace, tabs to spaces, trailing space
put open { on same line as struct def
remove unneeded {} after if statements
change printk("Lu") to printk("llu")
convert asm/uaccess.h to linux/uaacess.h includes
remove unnecessary asm/bug.h includes
convert all users of simple_strtol to strict_strtol
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Introduce SELinux support for deferred mapping of security contexts in
the SID table upon policy reload, and use this support for inode
security contexts when the context is not yet valid under the current
policy. Only processes with CAP_MAC_ADMIN + mac_admin permission in
policy can set undefined security contexts on inodes. Inodes with
such undefined contexts are treated as having the unlabeled context
until the context becomes valid upon a policy reload that defines the
context. Context invalidation upon policy reload also uses this
support to save the context information in the SID table and later
recover it upon a subsequent policy reload that defines the context
again.
This support is to enable package managers and similar programs to set
down file contexts unknown to the system policy at the time the file
is created in order to better support placing loadable policy modules
in packages and to support build systems that need to create images of
different distro releases with different policies w/o requiring all of
the contexts to be defined or legal in the build host policy.
With this patch applied, the following sequence is possible, although
in practice it is recommended that this permission only be allowed to
specific program domains such as the package manager.
# rmdir baz
# rm bar
# touch bar
# chcon -t foo_exec_t bar # foo_exec_t is not yet defined
chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument
# mkdir -Z system_u:object_r:foo_exec_t baz
mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument
# cat setundefined.te
policy_module(setundefined, 1.0)
require {
type unconfined_t;
type unlabeled_t;
}
files_type(unlabeled_t)
allow unconfined_t self:capability2 mac_admin;
# make -f /usr/share/selinux/devel/Makefile setundefined.pp
# semodule -i setundefined.pp
# chcon -t foo_exec_t bar # foo_exec_t is not yet defined
# mkdir -Z system_u:object_r:foo_exec_t baz
# ls -Zd bar baz
-rw-r--r-- root root system_u:object_r:unlabeled_t bar
drwxr-xr-x root root system_u:object_r:unlabeled_t baz
# cat foo.te
policy_module(foo, 1.0)
type foo_exec_t;
files_type(foo_exec_t)
# make -f /usr/share/selinux/devel/Makefile foo.pp
# semodule -i foo.pp # defines foo_exec_t
# ls -Zd bar baz
-rw-r--r-- root root user_u:object_r:foo_exec_t bar
drwxr-xr-x root root system_u:object_r:foo_exec_t baz
# semodule -r foo
# ls -Zd bar baz
-rw-r--r-- root root system_u:object_r:unlabeled_t bar
drwxr-xr-x root root system_u:object_r:unlabeled_t baz
# semodule -i foo.pp
# ls -Zd bar baz
-rw-r--r-- root root user_u:object_r:foo_exec_t bar
drwxr-xr-x root root system_u:object_r:foo_exec_t baz
# semodule -r setundefined foo
# chcon -t foo_exec_t bar # no longer defined and not allowed
chcon: failed to change context of `bar' to `system_u:object_r:foo_exec_t': Invalid argument
# rmdir baz
# mkdir -Z system_u:object_r:foo_exec_t baz
mkdir: failed to set default file creation context to `system_u:object_r:foo_exec_t': Invalid argument
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This patch changes mls.c to fix whitespace and syntax issues. Things that
are fixed may include (does not not have to include)
whitespace at end of lines
spaces followed by tabs
spaces used instead of tabs
spacing around parenthesis
locateion of { around struct and else clauses
location of * in pointer declarations
removal of initialization of static data to keep it in the right section
useless {} in if statemetns
useless checking for NULL before kfree
fixing of the indentation depth of switch statements
and any number of other things I forgot to mention
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds support to the NetLabel LSM secattr struct for a secid token
and a type field, paving the way for full LSM/SELinux context support and
"static" or "fallback" labels. In addition, this patch adds a fair amount
of documentation to the core NetLabel structures used as part of the
NetLabel kernel API.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch removes the requirement that the new and related object types
differ in order to polyinstantiate by MLS level. This allows MLS
polyinstantiation to occur in the absence of explicit type_member rules or
when the type has not changed.
Potential users of this support include pam_namespace.so (directory
polyinstantiation) and the SELinux X support (property polyinstantiation).
Signed-off-by: Eamon Walsh <ewalsh@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Add more validity checks at policy load time to reject malformed
policies and prevent subsequent out-of-range indexing when in permissive
mode. Resolves the NULL pointer dereference reported in
https://bugzilla.redhat.com/show_bug.cgi?id=357541.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
* We add ebitmap_for_each_positive_bit() which enables to walk on
any positive bit on the given ebitmap, to improve its performance
using common bit-operations defined in linux/bitops.h.
In the previous version, this logic was implemented using a combination
of ebitmap_for_each_bit() and ebitmap_node_get_bit(), but is was worse
in performance aspect.
This logic is most frequestly used to compute a new AVC entry,
so this patch can improve SELinux performance when AVC misses are happen.
* struct ebitmap_node is redefined as an array of "unsigned long", to get
suitable for using find_next_bit() which is fasted than iteration of
shift and logical operation, and to maximize memory usage allocated
from general purpose slab.
* Any ebitmap_for_each_bit() are repleced by the new implementation
in ss/service.c and ss/mls.c. Some of related implementation are
changed, however, there is no incompatibility with the previous
version.
* The width of any new line are less or equal than 80-chars.
The following benchmark shows the effect of this patch, when we
access many files which have different security context one after
another. The number is more than /selinux/avc/cache_threshold, so
any access always causes AVC misses.
selinux-2.6 selinux-2.6-ebitmap
AVG: 22.763 [s] 8.750 [s]
STD: 0.265 0.019
------------------------------------------
1st: 22.558 [s] 8.786 [s]
2nd: 22.458 [s] 8.750 [s]
3rd: 22.478 [s] 8.754 [s]
4th: 22.724 [s] 8.745 [s]
5th: 22.918 [s] 8.748 [s]
6th: 22.905 [s] 8.764 [s]
7th: 23.238 [s] 8.726 [s]
8th: 22.822 [s] 8.729 [s]
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
This deletes mls_copy_context() in favor of mls_context_cpy() and
replaces mls_scopy_context() with mls_context_cpy_low().
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
The original NetLabel category bitmap was a straight char bitmap which worked
fine for the initial release as it only supported 240 bits due to limitations
in the CIPSO restricted bitmap tag (tag type 0x01). This patch converts that
straight char bitmap into an extensibile/sparse bitmap in order to lay the
foundation for other CIPSO tag types and protocols.
This patch also has a nice side effect in that all of the security attributes
passed by NetLabel into the LSM are now in a format which is in the host's
native byte/bit ordering which makes the LSM specific code much simpler; look
at the changes in security/selinux/ss/ebitmap.c as an example.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Upon inspection it looked like the error handling for mls_export_cat() was
rather poor. This patch addresses this by NULL'ing out kfree()'d pointers
before returning and checking the return value of the function everywhere
it is called.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Introduces support for policy version 21. This version of the binary
kernel policy allows for defining range transitions on security classes
other than the process security class. As always, backwards compatibility
for older formats is retained. The security class is read in as specified
when using the new format, while the "process" security class is assumed
when using an older policy format.
Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add NetLabel support to the SELinux LSM and modify the
socket_post_create() LSM hook to return an error code. The most
significant part of this patch is the addition of NetLabel hooks into
the following SELinux LSM hooks:
* selinux_file_permission()
* selinux_socket_sendmsg()
* selinux_socket_post_create()
* selinux_socket_sock_rcv_skb()
* selinux_socket_getpeersec_stream()
* selinux_socket_getpeersec_dgram()
* selinux_sock_graft()
* selinux_inet_conn_request()
The basic reasoning behind this patch is that outgoing packets are
"NetLabel'd" by labeling their socket and the NetLabel security
attributes are checked via the additional hook in
selinux_socket_sock_rcv_skb(). NetLabel itself is only a labeling
mechanism, similar to filesystem extended attributes, it is up to the
SELinux enforcement mechanism to perform the actual access checks.
In addition to the changes outlined above this patch also includes
some changes to the extended bitmap (ebitmap) and multi-level security
(mls) code to import and export SELinux TE/MLS attributes into and out
of NetLabel.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defines a routine that combines the Type Enforcement portion of
one sid with the MLS portion from the other sid to arrive at a new
sid. This would be used to define a sid for a security association
that is to be negotiated by IKE as well as for determing the sid for
open requests and connection-oriented child sockets.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch provides selinux interfaces that will allow the audit
system to perform filtering based on the process context (user, role, type,
sensitivity, and clearance). These interfaces will allow the selinux
module to perform efficient matches based on lower level selinux constructs,
rather than relying on context retrievals and string comparisons within
the audit module. It also allows for dominance checks on the mls portion
of the contexts that are impossible with only string comparisons.
Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix an off-by-one error in the MLS compatibility code that was causing
contexts with a MLS suffix to be rejected, preventing sharing partitions
between FC4 and FC5. Bug reported in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=188068
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch enables files created on a MLS-enabled SELinux system to be
accessible on a non-MLS SELinux system, by skipping the MLS component of
the security context in the non-MLS case.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch improves memory use by SELinux by both reducing the avtab node
size and reducing the number of avtab nodes. The memory savings are
substantial, e.g. on a 64-bit system after boot, James Morris reported the
following data for the targeted and strict policies:
#objs objsize kernmem
Targeted:
Before: 237888 40 9.1MB
After: 19968 24 468KB
Strict:
Before: 571680 40 21.81MB
After: 221052 24 5.06MB
The improvement in memory use comes at a cost in the speed of security
server computations of access vectors, but these computations are only
required on AVC cache misses, and performance measurements by James Morris
using a number of benchmarks have shown that the change does not cause any
significant degradation.
Note that a rebuilt policy via an updated policy toolchain
(libsepol/checkpolicy) is required in order to gain the full benefits of
this patch, although some memory savings benefits are immediately applied
even to older policies (in particular, the reduction in avtab node size).
Sources for the updated toolchain are presently available from the
sourceforge CVS tree (http://sourceforge.net/cvs/?group_id=21266), and
tarballs are available from http://www.flux.utah.edu/~sds.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement kernel labeling of the MLS (multilevel security) field of
security contexts for files which have no existing MLS field. This is to
enable upgrades of a system from non-MLS to MLS without performing a full
filesystem relabel including all of the mountpoints, which would be quite
painful for users.
With this patch, with MLS enabled, if a file has no MLS field, the kernel
internally adds an MLS field to the in-core inode (but not to the on-disk
file). This MLS field added is the default for the superblock, allowing
per-mountpoint control over the values via fixed policy or mount options.
This patch has been tested by enabling MLS without relabeling its
filesystem, and seems to be working correctly.
Signed-off-by: James Morris <jmorris@redhat.com>
Signed-off-by: Stephen Smalley <sds@epoch.ncsc.mil>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!