This patch adds the following structure:
struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};
That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.
I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.
That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.
This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
a) %d does _not_ produce a page worth of output
b) snprintf() doesn't return negatives - it used to in old glibc, but
that's the kernel...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull security subsystem updates from James Morris:
"New notable features:
- The seccomp work from Will Drewry
- PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
- Longer security labels for Smack from Casey Schaufler
- Additional ptrace restriction modes for Yama by Kees Cook"
Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
apparmor: fix long path failure due to disconnected path
apparmor: fix profile lookup for unconfined
ima: fix filename hint to reflect script interpreter name
KEYS: Don't check for NULL key pointer in key_validate()
Smack: allow for significantly longer Smack labels v4
gfp flags for security_inode_alloc()?
Smack: recursive tramsmute
Yama: replace capable() with ns_capable()
TOMOYO: Accept manager programs which do not start with / .
KEYS: Add invalidation support
KEYS: Do LRU discard in full keyrings
KEYS: Permit in-place link replacement in keyring list
KEYS: Perform RCU synchronisation on keys prior to key destruction
KEYS: Announce key type (un)registration
KEYS: Reorganise keys Makefile
KEYS: Move the key config into security/keys/Kconfig
KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
Yama: remove an unused variable
samples/seccomp: fix dependencies on arch macros
Yama: add additional ptrace scopes
...
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.
This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.
Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With this change, calling
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time. For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set. The same is true for file capabilities.
Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.
To determine if the NO_NEW_PRIVS bit is set, a task may call
prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)
This functionality is desired for the proposed seccomp filter patch
series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.
Another potential use is making certain privileged operations
unprivileged. For example, chroot may be considered "safe" if it cannot
affect privileged tasks.
Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use. It is fixed in a subsequent patch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
avc_add_callback now only called from initcalls, so replace the
weak GFP_ATOMIC to GFP_KERNEL, and mark this function __init
to make a warning when not been called from initcalls.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
We no longer need the distinction. We only need data after we decide to do an
audit. So turn the "late" audit data into just "data" and remove what we
currently have as "data".
Signed-off-by: Eric Paris <eparis@redhat.com>
It isn't needed. If you don't set the type of the data associated with
that type it is a pretty obvious programming bug. So why waste the cycles?
Signed-off-by: Eric Paris <eparis@redhat.com>
There are no legitimate users. Always use current and get back some stack
space for the common_audit_data.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_inode_has_perm is a hot path. Instead of declaring the
common_audit_data on the stack move it to a noinline function only used in
the rare case we need to send an audit message.
Signed-off-by: Eric Paris <eparis@redhat.com>
We pay a rather large overhead initializing the common_audit_data.
Since we only need this information if we actually emit an audit
message there is little need to set it up in the hot path. This patch
splits the functionality of avc_has_perm() into avc_has_perm_noaudit(),
avc_audit_required() and slow_avc_audit(). But we take care of setting
up to audit between required() and the actual audit call. Thus saving
measurable time in a hot path.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
We reset the bool names and values array to NULL, but do not reset the
number of entries in these arrays to 0. If we error out and then get back
into this function we will walk these NULL pointers based on the belief
that they are non-zero length.
Signed-off-by: Eric Paris <eparis@redhat.com>
cc: stable@kernel.org
I'm not really sure what the idea behind the sel_div function is, but it's
useless. Since a and b are both unsigned, it's impossible for a % b < 0.
That means that part of the function never does anything. Thus it's just a
normal /. Just do that instead. I don't even understand what that operation
was supposed to mean in the signed case however....
If it was signed:
sel_div(-2, 4) == ((-2 / 4) - ((-2 % 4) < 0))
((0) - ((-2) < 0))
((0) - (1))
(-1)
What actually happens:
sel_div(-2, 4) == ((18446744073709551614 / 4) - ((18446744073709551614 % 4) < 0))
((4611686018427387903) - ((2 < 0))
(4611686018427387903 - 0)
((unsigned int)4611686018427387903)
(4294967295)
Neither makes a whole ton of sense to me. So I'm getting rid of the
function entirely.
Signed-off-by: Eric Paris <eparis@redhat.com>
It's possible that the caller passed a NULL for scontext. However if this
is a defered mapping we might still attempt to call *scontext=kstrdup().
This is bad. Instead just return the len.
Signed-off-by: Eric Paris <eparis@redhat.com>
We know that some yum operation is causing CAP_MAC_ADMIN failures. This
implies that an RPM is laying down (or attempting to lay down) a file with
an invalid label. The problem is that we don't have any information to
track down the cause. This patch will cause such a failure to report the
failed label in an SELINUX_ERR audit message. This is similar to the
SELINUX_ERR reports on invalid transitions and things like that. It should
help run down problems on what is trying to set invalid labels in the
future.
Resulting records look something like:
type=AVC msg=audit(1319659241.138:71): avc: denied { mac_admin } for pid=2594 comm="chcon" capability=33 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=capability2
type=SELINUX_ERR msg=audit(1319659241.138:71): op=setxattr invalid_context=unconfined_u:object_r:hello:s0
type=SYSCALL msg=audit(1319659241.138:71): arch=c000003e syscall=188 success=no exit=-22 a0=a2c0e0 a1=390341b79b a2=a2d620 a3=1f items=1 ppid=2519 pid=2594 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="chcon" exe="/usr/bin/chcon" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=CWD msg=audit(1319659241.138:71): cwd="/root" type=PATH msg=audit(1319659241.138:71): item=0 name="test" inode=785879 dev=fc:03 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:admin_home_t:s0
Signed-off-by: Eric Paris <eparis@redhat.com>
In RH BZ 578841 we realized that the SELinux sandbox program was allowed to
truncate files outside of the sandbox. The reason is because sandbox
confinement is determined almost entirely by the 'open' permission. The idea
was that if the sandbox was unable to open() files it would be unable to do
harm to those files. This turns out to be false in light of syscalls like
truncate() and chmod() which don't require a previous open() call. I looked
at the syscalls that did not have an associated 'open' check and found that
truncate(), did not have a seperate permission and even if it did have a
separate permission such a permission owuld be inadequate for use by
sandbox (since it owuld have to be granted so liberally as to be useless).
This patch checks the OPEN permission on truncate. I think a better solution
for sandbox is a whole new permission, but at least this fixes what we have
today.
Signed-off-by: Eric Paris <eparis@redhat.com>
Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range. Thus to add default_type we need a policy.28.
Signed-off-by: Eric Paris <eparis@redhat.com>
When new objects are created we have great and flexible rules to
determine the type of the new object. We aren't quite as flexible or
mature when it comes to determining the user, role, and range. This
patch adds a new ability to specify the place a new objects user, role,
and range should come from. For users and roles it can come from either
the source or the target of the operation. aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)
examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[no rule]
unconfined_u:object_r:mnt_t:s0 test_none
[default user source]
unconfined_u:object_r:mnt_t:s0 test_user_source
[default user target]
system_u:object_r:mnt_t:s0 test_user_target
[default role source]
unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
unconfined_u:object_r:mnt_t:s0 test_role_target
[default range source low]
unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high
Signed-off-by: Eric Paris <eparis@redhat.com>
There is no reason the DAC perms on reading the policy file need to be root
only. There are selinux checks which should control this access.
Signed-off-by: Eric Paris <eparis@redhat.com>
It just bloats the audit data structure for no good reason, since the
only time those fields are filled are just before calling the
common_lsm_audit() function, which is also the only user of those
fields.
So just make them be the arguments to common_lsm_audit(), rather than
bloating that structure that is passed around everywhere, and is
initialized in hot paths.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of declaring the entire selinux_audit_data on the stack when we
start an operation on declare it on the stack if we are going to use it.
We know it's usefulness at the end of the security decision and can declare
it there.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After shrinking the common_audit_data stack usage for private LSM data I'm
not going to shrink the data union. To do this I'm going to move anything
larger than 2 void * ptrs to it's own structure and require it to be declared
separately on the calling stack. Thus hot paths which don't need more than
a couple pointer don't have to declare space to hold large unneeded
structures. I could get this down to one void * by dealing with the key
struct and the struct path. We'll see if that is helpful after taking care of
networking.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus found that the gigantic size of the common audit data caused a big
perf hit on something as simple as running stat() in a loop. This patch
requires LSMs to declare the LSM specific portion separately rather than
doing it in a union. Thus each LSM can be responsible for shrinking their
portion and don't have to pay a penalty just because other LSMs have a
bigger space requirement.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull second try at vfs part d#2 from Al Viro:
"Miklos' first series (with do_lookup() rewrite split into edible
chunks) + assorted bits and pieces.
The 'untangling of do_lookup()' series is is a splitup of what used to
be a monolithic patch from Miklos, so this series is basically "how do
I convince myself that his patch is correct (or find a hole in it)".
No holes found and I like the resulting cleanup, so in it went..."
Changes from try 1: Fix a boot problem with selinux, and commit messages
prettied up a bit.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
vfs: fix out-of-date dentry_unhash() comment
vfs: split __lookup_hash
untangling do_lookup() - take __lookup_hash()-calling case out of line.
untangling do_lookup() - switch to calling __lookup_hash()
untangling do_lookup() - merge d_alloc_and_lookup() callers
untangling do_lookup() - merge failure exits in !dentry case
untangling do_lookup() - massage !dentry case towards __lookup_hash()
untangling do_lookup() - get rid of need_reval in !dentry case
untangling do_lookup() - eliminate a loop.
untangling do_lookup() - expand the area under ->i_mutex
untangling do_lookup() - isolate !dentry stuff from the rest of it.
vfs: move MAY_EXEC check from __lookup_hash()
vfs: don't revalidate just looked up dentry
vfs: fix d_need_lookup/d_revalidate order in do_lookup
ext3: move headers to fs/ext3/
migrate ext2_fs.h guts to fs/ext2/ext2.h
new helper: ext2_image_size()
get rid of pointless includes of ext2_fs.h
ext2: No longer export ext2_fs.h to user space
mtdchar: kill persistently held vfsmount
...
Now that all the slow-path code is gone from these functions, we can
inline them into the main caller - avc_has_perm_flags().
Now the compiler can see that 'avc' is allocated on the stack for this
case, which helps register pressure a bit. It also actually shrinks the
total stack frame, because the stack frame that avc_has_perm_flags()
always needed (for that 'avc' allocation) is now sufficient for the
inlined functions too.
Inlining isn't bad - but mindless inlining of cold code (see the
previous commit) is.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The selinux AVC paths remain some of the hottest (and deepest) codepaths
at filename lookup time, and we make it worse by having the slow path
cases take up I$ and stack space even when they don't trigger. Gcc
tends to always want to inline functions that are just called once -
never mind that this might make for slower and worse code in the caller.
So this tries to improve on it a bit by making the slow-path cases
explicitly separate functions that are marked noinline, causing gcc to
at least no longer allocate stack space for them unless they are
actually called. It also seems to help register allocation a tiny bit,
since gcc now doesn't take the slow case code into account.
Uninlining the slow path may also allow us to inline the remaining hot
path into the one caller that actually matters: avc_has_perm_flags().
I'll have to look at that separately, but both avc_audit() and
avc_has_perm_noaudit() are now small and lean enough that inlining them
may make sense.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x32 support for x86-64 from Ingo Molnar:
"This tree introduces the X32 binary format and execution mode for x86:
32-bit data space binaries using 64-bit instructions and 64-bit kernel
syscalls.
This allows applications whose working set fits into a 32 bits address
space to make use of 64-bit instructions while using a 32-bit address
space with shorter pointers, more compressed data structures, etc."
Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
x32: Fix alignment fail in struct compat_siginfo
x32: Fix stupid ia32/x32 inversion in the siginfo format
x32: Add ptrace for x32
x32: Switch to a 64-bit clock_t
x32: Provide separate is_ia32_task() and is_x32_task() predicates
x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
x86/x32: Fix the binutils auto-detect
x32: Warn and disable rather than error if binutils too old
x32: Only clear TIF_X32 flag once
x32: Make sure TS_COMPAT is cleared for x32 tasks
fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
fs: Fix close_on_exec pointer in alloc_fdtable
x32: Drop non-__vdso weak symbols from the x32 VDSO
x32: Fix coding style violations in the x32 VDSO code
x32: Add x32 VDSO support
x32: Allow x32 to be configured
x32: If configured, add x32 system calls to system call tables
x32: Handle process creation
x32: Signal-related system calls
x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
...
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
selinux/xfrm.h needs to #include net/flow.h or else suffer:
In file included from security/selinux/ss/services.c:69:0:
security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
security/selinux/include/xfrm.h:53:14: error: 'flow_cache_genid' undeclared (first use in this function)
security/selinux/include/xfrm.h:53:14: note: each undeclared identifier is reported only once for each function it appears in
Signed-off-by: David Howells <dhowells@redhat.com>
avc_audit() did a lot of jumping around and had a big stack frame, all
for the uncommon case.
Split up the uncommon case (which we really can't make go fast anyway)
into its own slow function, and mark the conditional branches
appropriately for the common likely case.
This causes avc_audit() to no longer show up as one of the hottest
functions on the branch profiles (the new "perf -b" thing), and makes
the cycle profiles look really nice and dense too.
The whole audit path is still annoyingly very much one of the biggest
costs of name lookup, so these things are worth optimizing for. I wish
we could just tell people to turn it off, but realistically we do need
it: we just need to make sure that the overhead of the necessary evil is
as low as possible.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.
This:
(1) Removes the abuses of struct fd_set:
(a) Since we don't want to allocate a full fd_set the vast majority of the
time, we actually, in effect, just allocate a just-big-enough array of
unsigned longs and cast it to an fd_set type - so why bother with the
fd_set at all?
(b) Some places outside of the core fdtable handling code (such as
SELinux) want to look inside the array of unsigned longs hidden inside
the fd_set struct for more efficient iteration over the entire set.
(2) Eliminates the use of FD_*() macros in the kernel completely.
(3) Permits the __FD_*() macros to be deleted entirely where not exposed to
userspace.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
capabilities: remove __cap_full_set definition
security: remove the security_netlink_recv hook as it is equivalent to capable()
ptrace: do not audit capability check when outputing /proc/pid/stat
capabilities: remove task_ns_* functions
capabitlies: ns_capable can use the cap helpers rather than lsm call
capabilities: style only - move capable below ns_capable
capabilites: introduce new has_ns_capabilities_noaudit
capabilities: call has_ns_capability from has_capability
capabilities: remove all _real_ interfaces
capabilities: introduce security_capable_noaudit
capabilities: reverse arguments to security_capable
capabilities: remove the task from capable LSM hook entirely
selinux: sparse fix: fix several warnings in the security server cod
selinux: sparse fix: fix warnings in netlink code
selinux: sparse fix: eliminate warnings for selinuxfs
selinux: sparse fix: declare selinux_disable() in security.h
selinux: sparse fix: move selinux_complete_init
selinux: sparse fix: make selinux_secmark_refcount static
SELinux: Fix RCU deref check warning in sel_netport_insert()
Manually fix up a semantic mis-merge wrt security_netlink_recv():
- the interface was removed in commit fd77846152 ("security: remove
the security_netlink_recv hook as it is equivalent to capable()")
- a new user of it appeared in commit a38f7907b9 ("crypto: Add
userspace configuration API")
causing no automatic merge conflict, but Eric Paris pointed out the
issue.
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: (32 commits)
ima: fix invalid memory reference
ima: free duplicate measurement memory
security: update security_file_mmap() docs
selinux: Casting (void *) value returned by kmalloc is useless
apparmor: fix module parameter handling
Security: tomoyo: add .gitignore file
tomoyo: add missing rcu_dereference()
apparmor: add missing rcu_dereference()
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mpi/mpi-mpow: NULL dereference on allocation failure
digsig: build dependency fix
KEYS: Give key types their own lockdep class for key->sem
TPM: fix transmit_cmd error logic
TPM: NSC and TIS drivers X86 dependency fix
TPM: Export wait_for_stat for other vendor specific drivers
TPM: Use vendor specific function for status probe
tpm_tis: add delay after aborting command
tpm_tis: Check return code from getting timeouts/durations
tpm: Introduce function to poll for result of self test
...
Fix up trivial conflict in lib/Makefile due to addition of CONFIG_MPI
and SIGSIG next to CONFIG_DQL addition.