Commit 5d226df4 has introduced a performance regression of about
10% in the UnixBench pipe benchmark. It turns out that the call
to inode_security in selinux_file_permission can be moved below
the zero-mask test and that inode_security_revalidate can be
removed entirely, which brings us back to roughly the original
performance.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Nothing in there gives a damn about the buffer alignment - it
just parses its contents. So the use of get_zeroed_page()
doesn't buy us anything - might as well had been kmalloc(),
which makes that code equivalent to open-coded memdup_user_nul()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Any process is able to send netlink messages with invalid types.
Make the warning rate-limited to prevent too much log spam.
The warning is supposed to help to find misbehaving programs, so
print the triggering command name and pid.
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
[PM: subject line tweak to make checkpatch.pl happy]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Make validatetrans decisions available through selinuxfs.
"/validatetrans" is added to selinuxfs for this purpose.
This functionality is needed by file system servers
implemented in userspace or kernelspace without the VFS
layer.
Writing "$oldcontext $newcontext $tclass $taskcontext"
to /validatetrans is expected to return 0 if the transition
is allowed and -EPERM otherwise.
Signed-off-by: Andrew Perepechko <anserper@ya.ru>
CC: andrew.perepechko@seagate.com
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
When fetching an inode's security label, check if it is still valid, and
try reloading it if it is not. Reloading will fail when we are in RCU
context which doesn't allow sleeping, or when we can't find a dentry for
the inode. (Reloading happens via iop->getxattr which takes a dentry
parameter.) When reloading fails, continue using the old, invalid
label.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add a hook to invalidate an inode's security label when the cached
information becomes invalid.
Add the new hook in selinux: set a flag when a security label becomes
invalid.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add functions dentry_security and inode_security for accessing
inode->i_security. These functions initially don't do much, but they
will later be used to revalidate the security labels when necessary.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Make the inode argument of the inode_getsecid hook non-const so that we
can use it to revalidate invalid security labels.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Make the inode argument of the inode_getsecurity hook non-const so that
we can use it to revalidate invalid security labels.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
commit fa1aa143ac ("selinux: extended permissions for ioctls")
introduced a bug into the handling of conditional rules, skipping the
processing entirely when the caller does not provide an extended
permissions (xperms) structure. Access checks from userspace using
/sys/fs/selinux/access do not include such a structure since that
interface does not presently expose extended permission information.
As a result, conditional rules were being ignored entirely on userspace
access requests, producing denials when access was allowed by
conditional rules in the policy. Fix the bug by only skipping
computation of extended permissions in this situation, not the entire
conditional rules processing.
Reported-by: Laurent Bigonville <bigon@debian.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: fixed long lines in patch description]
Cc: stable@vger.kernel.org # 4.3
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
Generalize selinux_skb_sk() added in commit 212cd08953
("selinux: fix random read in selinux_ip_postroute_compat()")
so that we can use it other contexts.
Use it right away in selinux_netlbl_skbuff_setsid()
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...
In commit e446f9dfe1 ("net: synack packets can be attached to request
sockets"), I missed one remaining case of invalid skb->sk->sk_security
access.
Dmitry Vyukov got a KASan report pointing to it.
Add selinux_skb_sk() helper that is responsible to get back to the
listener if skb is attached to a request socket, instead of
duplicating the logic.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of struct file_security_struct is 16byte at my setup.
But, the real allocation size for per each file_security_struct
is 64bytes in my setup that kmalloc min size is 64bytes
because ARCH_DMA_MINALIGN is 64.
This allocation is called every times at file allocation(alloc_file()).
So, the total slack memory size(allocated size - request size)
is increased exponentially.
E.g) Min Kmalloc Size : 64bytes, Unit : bytes
Allocated Size | Request Size | Slack Size | Allocation Count
---------------------------------------------------------------
770048 | 192512 | 577536 | 12032
At the result, this change reduce memory usage 42bytes per each
file_security_struct
Signed-off-by: Sangwoo <sangwoo2.park@lge.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: removed extra subject prefix]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Fixes the following sparse warning:
security/selinux/hooks.c:3242:5: warning: symbol 'ioctl_has_perm' was
not declared. Should it be static?
Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
sprintf returns the number of characters printed (excluding '\0'), so
we can use that and avoid duplicating the length computation.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This is much simpler.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
security_context_to_sid() expects a const char* argument, so there's
no point in casting away the const qualifier of value.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
There seems to be a little confusion as to whether the scontext_len
parameter of security_context_to_sid() includes the nul-byte or
not. Reading security_context_to_sid_core(), it seems that the
expectation is that it does not (both the string copying and the test
for scontext_len being zero hint at that).
Introduce the helper security_context_str_to_sid() to do the strlen()
call and fix all callers.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Use the ATTR_FILE attribute to distinguish between truncate()
and ftruncate() system calls. The two other cases where
do_truncate is called with a filp (and therefore ATTR_FILE is set)
are for coredump files and for open(O_TRUNC). In both of those cases
the open permission has already been checked during file open and
therefore does not need to be repeated.
Commit 95dbf73931 ("SELinux: check OPEN on truncate calls")
fixed a major issue where domains were allowed to truncate files
without the open permission. However, it introduced a new bug where
a domain with the write permission can no longer ftruncate files
without the open permission, even when they receive an already open
file.
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Change the SELinux checkreqprot default value to 0 so that SELinux
performs access control checking on the actual memory protections
used by the kernel and not those requested by the application.
Signed-off-by: Paul Moore <pmoore@redhat.com>
This merge resolves conflicts with 75aec9df3a ("bridge: Remove
br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve
netns support in the network stack that reached upstream via David's
net-next tree.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/bridge/br_netfilter_hooks.c
since commit 8405a8fff3 ("netfilter: nf_qeueue: Drop queue entries on
nf_unregister_hook") all pending queued entries are discarded.
So we can simply remove all of the owner handling -- when module is
removed it also needs to unregister all its hooks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
selinux needs few changes to accommodate fact that SYNACK messages
can be attached to a request socket, lacking sk_security pointer
(Only syncookies are still attached to a TCP_LISTEN socket)
Adds a new sk_listener() helper, and use it in selinux and sch_fq
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported by: kernel test robot <ying.huang@linux.intel.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only pass the void *priv parameter out of the nf_hook_ops. That is
all any of the functions are interested now, and by limiting what is
passed it becomes simpler to change implementation details.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.
Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:
$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory
This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create a common helper function to determine the label for a new inode.
This is then used by:
- may_create()
- selinux_dentry_init_security()
- selinux_inode_init_security()
This will change the behaviour of the functions slightly, bringing them
all into line.
Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Initialize the security class of sock security structures
to the generic socket class. This is similar to what is
already done in inode_alloc_security for files. Generally
the sclass field will later by set by socket_post_create
or sk_clone or sock_graft, but for protocol implementations
that fail to call any of these for newly accepted sockets,
we want some sane default that will yield a legitimate
avc denied message with non-garbage values for class and
permission.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The inode_free_security() function just took the superblock's isec_lock
before checking and trying to remove the inode security struct from the
linked list. In many cases, the list was empty and so the lock taking
is wasteful as no useful work is done. On multi-socket systems with
a large number of CPUs, there can also be a fair amount of spinlock
contention on the isec_lock if many tasks are exiting at the same time.
This patch changes the code to check the state of the list first before
taking the lock and attempting to dequeue it. The list_del_init()
can be called more than once on the same list with no harm as long
as they are properly serialized. It should not be possible to have
inode_free_security() called concurrently with list_add(). For better
safety, however, we use list_empty_careful() here even though it is
still not completely safe in case that happens.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add extended permissions logic to selinux. Extended permissions
provides additional permissions in 256 bit increments. Extend the
generic ioctl permission check to use the extended permissions for
per-command filtering. Source/target/class sets including the ioctl
permission may additionally include a set of commands. Example:
allowxperm <source> <target>:<class> ioctl unpriv_app_socket_cmds
auditallowxperm <source> <target>:<class> ioctl priv_gpu_cmds
Where unpriv_app_socket_cmds and priv_gpu_cmds are macros
representing commonly granted sets of ioctl commands.
When ioctl commands are omitted only the permissions are checked.
This feature is intended to provide finer granularity for the ioctl
permission that may be too imprecise. For example, the same driver
may use ioctls to provide important and benign functionality such as
driver version or socket type as well as dangerous capabilities such
as debugging features, read/write/execute to physical memory or
access to sensitive data. Per-command filtering provides a mechanism
to reduce the attack surface of the kernel, and limit applications
to the subset of commands required.
The format of the policy binary has been modified to include ioctl
commands, and the policy version number has been incremented to
POLICYDB_VERSION_XPERMS_IOCTL=30 to account for the format
change.
The extended permissions logic is deliberately generic to allow
components to be reused e.g. netlink filters
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Nick Kralevich <nnk@google.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
commit 66fc130394 ("mm: shmem_zero_setup
skip security check and lockdep conflict with XFS") caused a regression
for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings. However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks. On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check
and no execmem check. Since the aforementioned commit now marks the
shmem zero inode with the S_PRIVATE flag, the file checks are disabled
and we have no checking at all on mprotect PROT_EXEC. Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case. This makes the mmap and mprotect checking
consistent for shared anonymous mappings, as well as for /dev/zero and
ashmem.
Cc: <stable@vger.kernel.org> # 4.1.x
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
At present we don't create efficient ebitmaps when importing NetLabel
category bitmaps. This can present a problem when comparing ebitmaps
since ebitmap_cmp() is very strict about these things and considers
these wasteful ebitmaps not equal when compared to their more
efficient counterparts, even if their values are the same. This isn't
likely to cause problems on 64-bit systems due to a bit of luck on
how NetLabel/CIPSO works and the default ebitmap size, but it can be
a problem on 32-bit systems.
This patch fixes this problem by being a bit more intelligent when
importing NetLabel category bitmaps by skipping over empty sections
which should result in a nice, efficient ebitmap.
Cc: stable@vger.kernel.org # 3.17
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces where young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.
Some hacks were put in place to reduce the worst of the damage could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.
There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
it's purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.
This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.
This set of changes also addresses an issue with how open file
descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
/proc/<pid>/fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is not now actively wrong.
There is also a short list of issues that have not been fixed yet that
I will mention briefly.
It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.
As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce exectuable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace
This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.
The mount points converted and their filesystems are:
/sys/hypervisor/s390/ s390_hypfs
/sys/kernel/config/ configfs
/sys/kernel/debug/ debugfs
/sys/firmware/efi/efivars/ efivarfs
/sys/fs/fuse/connections/ fusectl
/sys/fs/pstore/ pstore
/sys/kernel/tracing/ tracefs
/sys/fs/cgroup/ cgroup
/sys/kernel/security/ securityfs
/sys/fs/selinux/ selinuxfs
/sys/fs/smackfs/ smackfs
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull security subsystem updates from James Morris:
"The main change in this kernel is Casey's generalized LSM stacking
work, which removes the hard-coding of Capabilities and Yama stacking,
allowing multiple arbitrary "small" LSMs to be stacked with a default
monolithic module (e.g. SELinux, Smack, AppArmor).
See
https://lwn.net/Articles/636056/
This will allow smaller, simpler LSMs to be incorporated into the
mainline kernel and arbitrarily stacked by users. Also, this is a
useful cleanup of the LSM code in its own right"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add()
vTPM: set virtual device before passing to ibmvtpm_reset_crq
tpm_ibmvtpm: remove unneccessary message level.
ima: update builtin policies
ima: extend "mask" policy matching support
ima: add support for new "euid" policy condition
ima: fix ima_show_template_data_ascii()
Smack: freeing an error pointer in smk_write_revoke_subj()
selinux: fix setting of security labels on NFS
selinux: Remove unused permission definitions
selinux: enable genfscon labeling for sysfs and pstore files
selinux: enable per-file labeling for debugfs files.
selinux: update netlink socket classes
signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
selinux: Print 'sclass' as string when unrecognized netlink message occurs
Smack: allow multiple labels in onlycap
Smack: fix seq operations in smackfs
ima: pass iint to ima_add_violation()
ima: wrap event related data to the new ima_event_data structure
integrity: add validity checks for 'path' parameter
...
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
While testing my netfilter changes I noticed several files where
recompiling unncessarily because they unncessarily included
netfilter.h.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Before calling into the filesystem, vfs_setxattr calls
security_inode_setxattr, which ends up calling selinux_inode_setxattr in
our case. That returns -EOPNOTSUPP whenever SBLABEL_MNT is not set.
SBLABEL_MNT was supposed to be set by sb_finish_set_opts, which sets it
only if selinux_is_sblabel_mnt returns true.
The selinux_is_sblabel_mnt logic was broken by eadcabc697 "SELinux: do
all flags twiddling in one place", which didn't take into the account
the SECURITY_FS_USE_NATIVE behavior that had been introduced for nfs
with eb9ae68650 "SELinux: Add new labeling type native labels".
This caused setxattr's of security labels over NFSv4.2 to fail.
Cc: stable@kernel.org # 3.13
Cc: Eric Paris <eparis@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Reported-by: Richard Chan <rc556677@outlook.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the stable dependency]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Remove unused permission definitions from SELinux.
Many of these were only ever used in pre-mainline
versions of SELinux, prior to Linux 2.6.0. Some of them
were used in the legacy network or compat_net=1 checks
that were disabled by default in Linux 2.6.18 and
fully removed in Linux 2.6.30.
Permissions never used in mainline Linux:
file swapon
filesystem transition
tcp_socket { connectto newconn acceptfrom }
node enforce_dest
unix_stream_socket { newconn acceptfrom }
Legacy network checks, removed in 2.6.30:
socket { recv_msg send_msg }
node { tcp_recv tcp_send udp_recv udp_send rawip_recv rawip_send dccp_recv dccp_send }
netif { tcp_recv tcp_send udp_recv udp_send rawip_recv rawip_send dccp_recv dccp_send }
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Support per-file labeling of sysfs and pstore files based on
genfscon policy entries. This is safe because the sysfs
and pstore directory tree cannot be manipulated by userspace,
except to unlink pstore entries.
This provides an alternative method of assigning per-file labeling
to sysfs or pstore files without needing to set the labels from
userspace on each boot. The advantages of this approach are that
the labels are assigned as soon as the dentry is first instantiated
and userspace does not need to walk the sysfs or pstore tree and
set the labels on each boot. The limitations of this approach are
that the labels can only be assigned based on pathname prefix matching.
You can initially assign labels using this mechanism and then change
them at runtime via setxattr if allowed to do so by policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Suggested-by: Dominick Grift <dac.override@gmail.com>
Acked-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add support for per-file labeling of debugfs files so that
we can distinguish them in policy. This is particularly
important in Android where certain debugfs files have to be writable
by apps and therefore the debugfs directory tree can be read and
searched by all.
Since debugfs is entirely kernel-generated, the directory tree is
immutable by userspace, and the inodes are pinned in memory, we can
simply use the same approach as with proc and label the inodes from
policy based on pathname from the root of the debugfs filesystem.
Generalize the existing labeling support used for proc and reuse it
for debugfs too.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Update the set of SELinux netlink socket class definitions to match
the set of netlink protocols implemented by the kernel. The
ip_queue implementation for the NETLINK_FIREWALL and NETLINK_IP6_FW protocols
was removed in d16cf20e2f, so we can remove
the corresponding class definitions as this is dead code. Add new
classes for NETLINK_ISCSI, NETLINK_FIB_LOOKUP, NETLINK_CONNECTOR,
NETLINK_NETFILTER, NETLINK_GENERIC, NETLINK_SCSITRANSPORT, NETLINK_RDMA,
and NETLINK_CRYPTO so that we can distinguish among sockets created
for each of these protocols. This change does not define the finer-grained
nlsmsg_read/write permissions or map specific nlmsg_type values to those
permissions in the SELinux nlmsgtab; if finer-grained control of these
sockets is desired/required, that can be added as a follow-on change.
We do not define a SELinux class for NETLINK_ECRYPTFS as the implementation
was removed in 624ae52845.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
selinux_bprm_committed_creds()->__flush_signals() is not right, we
shouldn't clear TIF_SIGPENDING unconditionally. There can be other
reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and
potentially more.
Also change this code to check fatal_signal_pending() rather than
SIGNAL_GROUP_EXIT, it looks a bit better.
Now we can kill __flush_signals() before it finds another buggy user.
Note: this code looks racy, we can flush a signal which was sent after
the task SID has been updated.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This prints the 'sclass' field as string instead of index in unrecognized netlink message.
The textual representation makes it easier to distinguish the right class.
Signed-off-by: Marek Milkovic <mmilkovi@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: 80-char width fixes]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Instead of using a vector of security operations
with explicit, special case stacking of the capability
and yama hooks use lists of hooks with capability and
yama hooks included as appropriate.
The security_operations structure is no longer required.
Instead, there is a union of the function pointers that
allows all the hooks lists to use a common mechanism for
list management while retaining typing. Each module
supplies an array describing the hooks it provides instead
of a sparsely populated security_operations structure.
The description includes the element that gets put on
the hook list, avoiding the issues surrounding individual
element allocation.
The method for registering security modules is changed to
reflect the information available. The method for removing
a module, currently only used by SELinux, has also changed.
It should be generic now, however if there are potential
race conditions based on ordering of hook removal that needs
to be addressed by the calling module.
The security hooks are called from the lists and the first
failure is returned.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add a list header for each security hook. They aren't used until
later in the patch series. They are grouped together in a structure
so that there doesn't need to be an external address for each.
Macro-ize the initialization of the security_operations
for each security module in anticipation of changing out
the security_operations structure.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The security.h header file serves two purposes,
interfaces for users of the security modules and
interfaces for security modules. Users of the
security modules don't need to know about what's
in the security_operations structure, so pull it
out into it's own header, lsm_hooks.h
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
inode_follow_link now takes an inode and rcu flag as well as the
dentry.
inode is used in preference to d_backing_inode(dentry), particularly
in RCU-walk mode.
selinux_inode_follow_link() gets dentry_has_perm() and
inode_has_perm() open-coded into it so that it can call
avc_has_perm_flags() in way that is safe if LOOKUP_RCU is set.
Calling avc_has_perm_flags() with rcu_read_lock() held means
that when avc_has_perm_noaudit calls avc_compute_av(), the attempt
to rcu_read_unlock() before calling security_compute_av() will not
actually drop the RCU read-lock.
However as security_compute_av() is completely in a read_lock()ed
region, it should be safe with the RCU read-lock held.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This allows MAY_NOT_BLOCK to be passed, in RCU-walk mode, through
the new avc_has_perm_flags() to avc_audit() and thence the slow_avc_audit.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
No ->inode_follow_link() methods use the nameidata arg, and
it is about to become private to namei.c.
So remove from all inode_follow_link() functions.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
... except where that code acts as a filesystem driver, rather than
working with dentries given to it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
most of the ->d_inode uses there refer to the same inode IO would
go to, i.e. d_backing_inode()
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull security subsystem updates from James Morris:
"Highlights for this window:
- improved AVC hashing for SELinux by John Brooks and Stephen Smalley
- addition of an unconfined label to Smack
- Smack documentation update
- TPM driver updates"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (28 commits)
lsm: copy comm before calling audit_log to avoid race in string printing
tomoyo: Do not generate empty policy files
tomoyo: Use if_changed when generating builtin-policy.h
tomoyo: Use bin2c to generate builtin-policy.h
selinux: increase avtab max buckets
selinux: Use a better hash function for avtab
selinux: convert avtab hash table to flex_array
selinux: reconcile security_netlbl_secattr_to_sid() and mls_import_netlbl_cat()
selinux: remove unnecessary pointer reassignment
Smack: Updates for Smack documentation
tpm/st33zp24/spi: Add missing device table for spi phy.
tpm/st33zp24: Add proper wait for ordinal duration in case of irq mode
smack: Fix gcc warning from unused smack_syslog_lock mutex in smackfs.c
Smack: Allow an unconfined label in bringup mode
Smack: getting the Smack security context of keys
Smack: Assign smack_known_web as default smk_in label for kernel thread's socket
tpm/tpm_infineon: Use struct dev_pm_ops for power management
MAINTAINERS: Add Jason as designated reviewer for TPM
tpm: Update KConfig text to include TPM2.0 FIFO chips
tpm/st33zp24/dts/st33zp24-spi: Add dts documentation for st33zp24 spi phy
...
Pull networking updates from David Miller:
1) Add BQL support to via-rhine, from Tino Reichardt.
2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
can support hw switch offloading. From Floria Fainelli.
3) Allow 'ip address' commands to initiate multicast group join/leave,
from Madhu Challa.
4) Many ipv4 FIB lookup optimizations from Alexander Duyck.
5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
Borkmann.
6) Remove the ugly compat support in ARP for ugly layers like ax25,
rose, etc. And use this to clean up the neigh layer, then use it to
implement MPLS support. All from Eric Biederman.
7) Support L3 forwarding offloading in switches, from Scott Feldman.
8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
up route lookups even further. From Alexander Duyck.
9) Many improvements and bug fixes to the rhashtable implementation,
from Herbert Xu and Thomas Graf. In particular, in the case where
an rhashtable user bulk adds a large number of items into an empty
table, we expand the table much more sanely.
10) Don't make the tcp_metrics hash table per-namespace, from Eric
Biederman.
11) Extend EBPF to access SKB fields, from Alexei Starovoitov.
12) Split out new connection request sockets so that they can be
established in the main hash table. Much less false sharing since
hash lookups go direct to the request sockets instead of having to
go first to the listener then to the request socks hashed
underneath. From Eric Dumazet.
13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk.
14) Support stable privacy address generation for RFC7217 in IPV6. From
Hannes Frederic Sowa.
15) Hash network namespace into IP frag IDs, also from Hannes Frederic
Sowa.
16) Convert PTP get/set methods to use 64-bit time, from Richard
Cochran.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
fm10k: Bump driver version to 0.15.2
fm10k: corrected VF multicast update
fm10k: mbx_update_max_size does not drop all oversized messages
fm10k: reset head instead of calling update_max_size
fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
fm10k: update xcast mode before synchronizing multicast addresses
fm10k: start service timer on probe
fm10k: fix function header comment
fm10k: comment next_vf_mbx flow
fm10k: don't handle mailbox events in iov_event path and always process mailbox
fm10k: use separate workqueue for fm10k driver
fm10k: Set PF queues to unlimited bandwidth during virtualization
fm10k: expose tx_timeout_count as an ethtool stat
fm10k: only increment tx_timeout_count in Tx hang path
fm10k: remove extraneous "Reset interface" message
fm10k: separate PF only stats so that VF does not display them
fm10k: use hw->mac.max_queues for stats
fm10k: only show actual queues, not the maximum in hardware
fm10k: allow creation of VLAN on default vid
fm10k: fix unused warnings
...
Pull vfs update from Al Viro:
"Part one:
- struct filename-related cleanups
- saner iov_iter_init() replacements (and switching the syscalls to
use of those)
- ntfs switch to ->write_iter() (Anton)
- aio cleanups and splitting iocb into common and async parts
(Christoph)
- assorted fixes (me, bfields, Andrew Elble)
There's a lot more, including the completion of switchover to
->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
race fixes, etc, but that goes after #for-davem merge. David has
pulled it, and once it's in I'll send the next vfs pull request"
* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
sg_start_req(): use import_iovec()
sg_start_req(): make sure that there's not too many elements in iovec
blk_rq_map_user(): use import_single_range()
sg_io(): use import_iovec()
process_vm_access: switch to {compat_,}import_iovec()
switch keyctl_instantiate_key_common() to iov_iter
switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
vmsplice_to_user(): switch to import_iovec()
kill aio_setup_single_vector()
aio: simplify arguments of aio_setup_..._rw()
aio: lift iov_iter_init() into aio_setup_..._rw()
lift iov_iter into {compat_,}do_readv_writev()
NFS: fix BUG() crash in notify_change() with patch to chown_common()
dcache: return -ESTALE not -EBUSY on distributed fs race
NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
VFS: Add iov_iter_fault_in_multipages_readable()
drop bogus check in file_open_root()
switch security_inode_getattr() to struct path *
constify tomoyo_realpath_from_path()
...
When a new rtnl or xfrm command is added, this part of the code is frequently
missing. Let's help the developer with a build time test.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: 3a2dfbe8ac ("xfrm: Notify changes in UDP encapsulation via netlink")
CC: Martin Willi <martin@strongswan.org>
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: 5c79de6e79 ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: 97a64b4577 ("[XFRM]: Introduce XFRM_MSG_REPORT.")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These commands are missing.
Fixes: 28d8909bc7 ("[XFRM]: Export SAD info.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: ecfd6b1837 ("[XFRM]: Export SPD info")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new command is missing.
Fixes: 880a6fab8f ("xfrm: configure policy hash table thresholds by netlink")
Reported-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new command is missing.
Fixes: 9a9634545c ("netns: notify netns id events")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new commands are missing.
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can safely increase the avtab max buckets without
triggering high order allocations and have a hash function that
will make better use of the larger number of buckets, increase
the max buckets to 2^16.
Original:
101421 entries and 2048/2048 buckets used, longest chain length 374
With new hash function:
101421 entries and 2048/2048 buckets used, longest chain length 81
With increased max buckets:
101421 entries and 31078/32768 buckets used, longest chain length 12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This function, based on murmurhash3, has much better distribution than
the original. Using the current default of 2048 buckets, there are many
fewer collisions:
Before:
101421 entries and 2048/2048 buckets used, longest chain length 374
After:
101421 entries and 2048/2048 buckets used, longest chain length 81
The difference becomes much more significant when buckets are increased.
A naive attempt to expand the current function to larger outputs doesn't
yield any significant improvement; so this function is a prerequisite
for increasing the bucket size.
sds: Adapted from the original patches for libsepol to the kernel.
Signed-off-by: John Brooks <john.brooks@jolla.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously we shrank the avtab max hash buckets to avoid
high order memory allocations, but this causes avtab lookups to
degenerate to very long linear searches for the Fedora policy. Convert to
using a flex_array instead so that we can increase the buckets
without such limitations.
This change does not alter the max hash buckets; that is left to a
separate follow-on change.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Move the NetLabel secattr MLS category import logic into
mls_import_netlbl_cat() where it belongs, and use the
mls_import_netlbl_cat() function in security_netlbl_secattr_to_sid().
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Commit f01e1af445 ("selinux: don't pass in NULL avd to avc_has_perm_noaudit")
made this pointer reassignment unnecessary. Avd should continue to reference
the stack-based copy.
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Conflicts:
drivers/net/usb/asix_common.c
drivers/net/usb/sr9800.c
drivers/net/usb/usbnet.c
include/linux/usb/usbnet.h
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c
The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.
With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.
Signed-off-by: David S. Miller <davem@davemloft.net>
Return a negative error value like the rest of the entries in this function.
Cc: <stable@vger.kernel.org>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
This reverts commit ca10b9e9a8.
No longer needed after commit eb8895debe
("tcp: tcp_make_synack() should use sock_wmalloc")
When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc
Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use d_is_positive() rather than testing dentry->d_inode in SELinux to get rid
of direct references to d_inode outside of the VFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull misc VFS updates from Al Viro:
"This cycle a lot of stuff sits on topical branches, so I'll be sending
more or less one pull request per branch.
This is the first pile; more to follow in a few. In this one are
several misc commits from early in the cycle (before I went for
separate branches), plus the rework of mntput/dput ordering on umount,
switching to use of fs_pin instead of convoluted games in
namespace_unlock()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch the IO-triggering parts of umount to fs_pin
new fs_pin killing logics
allow attaching fs_pin to a group not associated with some superblock
get rid of the second argument of acct_kill()
take count and rcu_head out of fs_pin
dcache: let the dentry count go down to zero without taking d_lock
pull bumping refcount into ->kill()
kill pin_put()
mode_t whack-a-mole: chelsio
file->f_path.dentry is pinned down for as long as the file is open...
get rid of lustre_dump_dentry()
gut proc_register() a bit
kill d_validate()
ncpfs: get rid of d_validate() nonsense
selinuxfs: don't open-code d_genocide()
Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog. Nothing
major or unusual, except maybe the binder selinux stuff, which was all
acked by the proper selinux people and they thought it best to come
through this tree.
All of this has been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb
SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf
=lfmM
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc patches from Greg KH:
"Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog.
Nothing major or unusual, except maybe the binder selinux stuff, which
was all acked by the proper selinux people and they thought it best to
come through this tree.
All of this has been in linux-next with no reported issues for a while"
* tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
coresight: fix function etm_writel_cp14() parameter order
coresight-etm: remove check for unknown Kconfig macro
coresight: fixing CPU hwid lookup in device tree
coresight: remove the unnecessary function coresight_is_bit_set()
coresight: fix the debug AMBA bus name
coresight: remove the extra spaces
coresight: fix the link between orphan connection and newly added device
coresight: remove the unnecessary replicator property
coresight: fix the replicator subtype value
pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl'
mcb: Fix error path of mcb_pci_probe
virtio/console: verify device has config space
ti-st: clean up data types (fix harmless memory corruption)
mei: me: release hw from reset only during the reset flow
mei: mask interrupt set bit on clean reset bit
extcon: max77693: Constify struct regmap_config
extcon: adc-jack: Release IIO channel on driver remove
extcon: Remove duplicated include from extcon-class.c
Drivers: hv: vmbus: hv_process_timer_expiration() can be static
Drivers: hv: vmbus: serialize Offer and Rescind offer
...
If hashtab_create() returns a NULL pointer then we should return -ENOMEM
but instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
- add "pstore" and "debugfs" to list of in-core exceptions
- change fstype checks to boolean equation
- change from strncmp to strcmp for checking
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked the subject line prefix to "selinux"]
Signed-off-by: Paul Moore <pmoore@redhat.com>
While the filesystem labeling method is only printed at the KERN_DEBUG
level, this still appears in dmesg and on modern Linux distributions
that create a lot of tmpfs mounts for session handling, the dmesg can
easily be filled with a lot of "SELinux: initialized (dev X ..."
messages. This patch removes this notification for the normal case
but leaves the error message intact (displayed when mounting a
filesystem with an unknown labeling behavior).
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Remove the function avc_sidcmp() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[PM: rewrite the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Add security hooks to the binder and implement the hooks for SELinux.
The security hooks enable security modules such as SELinux to implement
controls over binder IPC. The security hooks include support for
controlling what process can become the binder context manager
(binder_set_context_mgr), controlling the ability of a process
to invoke a binder transaction/IPC to another process (binder_transaction),
controlling the ability of a process to transfer a binder reference to
another process (binder_transfer_binder), and controlling the ability
of a process to transfer an open file to another process (binder_transfer_file).
These hooks have been included in the Android kernel trees since Android 4.3.
(Updated to reflect upstream relocation and changes to the binder driver,
changes to the LSM audit data structures, coding style cleanups, and
to add inline documentation for the hooks).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Nick Kralevich <nnk@google.com>
Acked-by: Jeffrey Vander Stoep <jeffv@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull security layer updates from James Morris:
"In terms of changes, there's general maintenance to the Smack,
SELinux, and integrity code.
The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
which allows IMA appraisal to require signatures. Support for reading
keys from rootfs before init is call is also added"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: Remove security_ops extern
security: smack: fix out-of-bounds access in smk_parse_smack()
VFS: refactor vfs_read()
ima: require signature based appraisal
integrity: provide a hook to load keys when rootfs is ready
ima: load x509 certificate from the kernel
integrity: provide a function to load x509 certificate from the kernel
integrity: define a new function integrity_read_file()
Security: smack: replace kzalloc with kmem_cache for inode_smack
Smack: Lock mode for the floor and hat labels
ima: added support for new kernel cmdline parameter ima_template_fmt
ima: allocate field pointers array on demand in template_desc_init_fields()
ima: don't allocate a copy of template_fmt in template_desc_init_fields()
ima: display template format in meas. list if template name length is zero
ima: added error messages to template-related functions
ima: use atomic bit operations to protect policy update interface
ima: ignore empty and with whitespaces policy lines
ima: no need to allocate entry for comment
ima: report policy load status
ima: use path names cache
...
Convert WARN_ONCE() to printk() in selinux_nlmsg_perm().
After conversion from audit_log() in commit e173fb26, WARN_ONCE() was
deemed too alarmist, so switch it to printk().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: Changed to printk(WARNING) so we catch all of the different
invalid netlink messages. In Richard's defense, he brought this
point up earlier, but I didn't understand his point at the time.]
Signed-off-by: Paul Moore <pmoore@redhat.com>
sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount(). This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux: Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element of the inode security structure for call_rcu()
upon an inode_free_security(). But the underlying issue
was already present before that commit as a possible use-after-free
of isec.
Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element. However,
this would merely hide the issue and not truly fix the code.
This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially. Then,
if the inode is dropped subsequently, there will be no further
references to the isec.
Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull security subsystem updates from James Morris.
Mostly ima, selinux, smack and key handling updates.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
integrity: do zero padding of the key id
KEYS: output last portion of fingerprint in /proc/keys
KEYS: strip 'id:' from ca_keyid
KEYS: use swapped SKID for performing partial matching
KEYS: Restore partial ID matching functionality for asymmetric keys
X.509: If available, use the raw subjKeyId to form the key description
KEYS: handle error code encoded in pointer
selinux: normalize audit log formatting
selinux: cleanup error reporting in selinux_nlmsg_perm()
KEYS: Check hex2bin()'s return when generating an asymmetric key ID
ima: detect violations for mmaped files
ima: fix race condition on ima_rdwr_violation_check and process_measurement
ima: added ima_policy_flag variable
ima: return an error code from ima_add_boot_aggregate()
ima: provide 'ima_appraise=log' kernel option
ima: move keyring initialization to ima_init()
PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
PKCS#7: Better handling of unsupported crypto
KEYS: Overhaul key identification when searching for asymmetric keys
KEYS: Implement binary asymmetric key ID handling
...
Restructure to keyword=value pairs without spaces. Drop superfluous words in
text. Make invalid_context a keyword. Change result= keyword to seresult=.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Minor rewrite to the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Convert audit_log() call to WARN_ONCE().
Rename "type=" to nlmsg_type=" to avoid confusion with the audit record
type.
Added "protocol=" to help track down which protocol (NETLINK_AUDIT?) was used
within the netlink protocol family.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Rewrote the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
While SELinux largely ignores namespaces, for good reason, there are
some places where it needs to at least be aware of namespaces in order
to function correctly. Network namespaces are one example. Basic
awareness of network namespaces are necessary in order to match a
network interface's index number to an actual network device.
This patch corrects a problem with network interfaces added to a
non-init namespace, and can be reproduced with the following commands:
[NOTE: the NetLabel configuration is here only to active the dynamic
networking controls ]
# netlabelctl unlbl add default address:0.0.0.0/0 \
label:system_u:object_r:unlabeled_t:s0
# netlabelctl unlbl add default address:::/0 \
label:system_u:object_r:unlabeled_t:s0
# netlabelctl cipsov4 add pass doi:100 tags:1
# netlabelctl map add domain:lspp_test_netlabel_t \
protocol:cipsov4,100
# ip link add type veth
# ip netns add myns
# ip link set veth1 netns myns
# ip a add dev veth0 10.250.13.100/24
# ip netns exec myns ip a add dev veth1 10.250.13.101/24
# ip l set veth0 up
# ip netns exec myns ip l set veth1 up
# ping -c 1 10.250.13.101
# ip netns exec myns ping -c 1 10.250.13.100
Reported-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
security_file_set_fowner always returns 0, so make it f_setown and
__f_setown void return functions and fix up the error handling in the
callers.
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Push ipv4 and ipv6 nf hooks into single array and register/unregister
them via single call.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Paul Moore <pmoore@redhat.com>
A previous commit c0828e5048 ("selinux:
process labeled IPsec TCP SYN-ACK packets properly in
selinux_ip_postroute()") mistakenly left out a 'break' from a switch
statement which caused problems with IPv6 traffic.
Thanks to Florian Westphal for reporting and debugging the issue.
Reported-by: Florian Westphal <fwestpha@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
If the callee SID is bounded by the caller SID, then allowing
the transition to occur poses no risk of privilege escalation and we can
therefore safely allow the transition to occur. Add this exemption
for both the case where a transition was explicitly requested by the
application and the case where an automatic transition is defined in
policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull SElinux fixes from Paul Moore:
"Two small patches to fix a couple of build warnings in SELinux and
NetLabel. The patches are obvious enough that I don't think any
additional explanation is necessary, but it basically boils down to
the usual: I was stupid, and these patches fix some of the stupid.
Both patches were posted earlier this week to the SELinux list, and
that is where they sat as I didn't think there were noteworthy enough
to go upstream at this point in time, but DaveM would rather see them
upstream now so who am I to argue. As the patches are both very
small"
* 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux:
selinux: remove unused variabled in the netport, netnode, and netif caches
netlabel: fix the netlbl_catmap_setlong() dummy function
This patch removes the unused return code variable in the netport,
netnode, and netif initialization functions.
Reported-by: fengguang.wu@intel.com
Signed-off-by: Paul Moore <pmoore@redhat.com>
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel. This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.
There are no substantial code or logic changes in this patch.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow. At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly). This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.
Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.
One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap. NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
This reverts commit 4da6daf4d3.
Unfortunately, the commit in question caused problems with Bluetooth
devices, specifically it caused them to get caught in the newly
created BUG_ON() check. The AF_ALG problem still exists, but will be
addressed in a future patch.
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
The sock_graft() hook has special handling for AF_INET, AF_INET, and
AF_UNIX sockets as those address families have special hooks which
label the sock before it is attached its associated socket.
Unfortunately, the sock_graft() hook was missing a default approach
to labeling sockets which meant that any other address family which
made use of connections or the accept() syscall would find the
returned socket to be in an "unlabeled" state. This was recently
demonstrated by the kcrypto/AF_ALG subsystem and the newly released
cryptsetup package (cryptsetup v1.6.5 and later).
This patch preserves the special handling in selinux_sock_graft(),
but adds a default behavior - setting the sock's label equal to the
associated socket - which resolves the problem with AF_ALG and
presumably any other address family which makes use of accept().
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
When flushing the AVC, such as during a policy load, the various
network caches are also flushed, with each making a call to
synchronize_net() which has shown to be expensive in some cases.
This patch consolidates the network cache flushes into a single AVC
callback which only calls synchronize_net() once for each AVC cache
flush.
Reported-by: Jaejyn Shin <flagon22bass@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
With the introduction of fair queued rwlock, recursive read_lock()
may hang the offending process if there is a write_lock() somewhere
in between.
With recursive read_lock checking enabled, the following error was
reported:
=============================================
[ INFO: possible recursive locking detected ]
3.16.0-rc1 #2 Tainted: G E
---------------------------------------------
load_policy/708 is trying to acquire lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b32a>]
security_genfs_sid+0x3a/0x170
but task is already holding lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b48c>]
security_fs_use+0x2c/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(policy_rwlock);
lock(policy_rwlock);
This patch fixes the occurrence of recursive read_lock() of
policy_rwlock by adding a helper function __security_genfs_sid()
which requires caller to take the lock before calling it. The
security_fs_use() was then modified to call the new helper function.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The cond_read_node() should free the given node on error path as it's
not linked to p->cond_list yet. This is done via cond_node_destroy()
but it's not called when next_entry() fails before the expr loop.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The node->cur_state and len can be read in a single call of next_entry().
And setting len before reading is a dead write so can be eliminated.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(Minor tweak to the length parameter in the call to next_entry())
Signed-off-by: Paul Moore <pmoore@redhat.com>
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __packed
for __attribute__((packed)).
This patch is part of a large task I've taken to clean the gcc
specific attributes and use the the macros instead.
Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
There're some code duplication for reading a string value during
policydb_read(). Add str_read() helper to fix it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
ARRAY_SIZE is more concise to use when the size of an array is divided
by the size of its type or the size of its first element.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
Pull security layer updates from Serge Hallyn:
"This is a merge of James Morris' security-next tree from 3.14 to
yesterday's master, plus four patches from Paul Moore which are in
linux-next, plus one patch from Mimi"
* 'serge-next-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security:
ima: audit log files opened with O_DIRECT flag
selinux: conditionally reschedule in hashtab_insert while loading selinux policy
selinux: conditionally reschedule in mls_convert_context while loading selinux policy
selinux: reject setexeccon() on MNT_NOSUID applications with -EACCES
selinux: Report permissive mode in avc: denied messages.
Warning in scanf string typing
Smack: Label cgroup files for systemd
Smack: Verify read access on file open - v3
security: Convert use of typedef ctl_table to struct ctl_table
Smack: bidirectional UDS connect check
Smack: Correctly remove SMACK64TRANSMUTE attribute
SMACK: Fix handling value==NULL in post setxattr
bugfix patch for SMACK
Smack: adds smackfs/ptrace interface
Smack: unify all ptrace accesses in the smack
Smack: fix the subject/object order in smack_ptrace_traceme()
Minor improvement of 'smack_sb_kern_mount'
smack: fix key permission verification
KEYS: Move the flags representing required permission to linux/key.h
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
We presently prevent processes from using setexecon() to set the
security label of exec()'d processes when NO_NEW_PRIVS is enabled by
returning an error; however, we silently ignore setexeccon() when
exec()'ing from a nosuid mounted filesystem. This patch makes things
a bit more consistent by returning an error in the setexeccon()/nosuid
case.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
We cannot presently tell from an avc: denied message whether access was in
fact denied or was allowed due to global or per-domain permissive mode.
Add a permissive= field to the avc message to reflect this information.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
We presently prevent processes from using setexecon() to set the
security label of exec()'d processes when NO_NEW_PRIVS is enabled by
returning an error; however, we silently ignore setexeccon() when
exec()'ing from a nosuid mounted filesystem. This patch makes things
a bit more consistent by returning an error in the setexeccon()/nosuid
case.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
We cannot presently tell from an avc: denied message whether access was in
fact denied or was allowed due to global or per-domain permissive mode.
Add a permissive= field to the avc message to reflect this information.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Register a netlink per-protocol bind fuction for audit to check userspace
process capabilities before allowing a multicast group connection.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.
...and I can't even disagree. The names and command macros do suck.
We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".
The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.
This patch makes the following changes that I think are necessary before
v3.15 ships:
1) rename the command macros to their new names. These end up in the uapi
headers and so are part of the external-facing API. It turns out that
glibc doesn't actually use the fcntl.h uapi header, but it's hard to
be sure that something else won't. Changing it now is safest.
2) make the the /proc/locks output display these as type "OFDLCK"
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Pull file locking updates from Jeff Layton:
"Highlights:
- maintainership change for fs/locks.c. Willy's not interested in
maintaining it these days, and is OK with Bruce and I taking it.
- fix for open vs setlease race that Al ID'ed
- cleanup and consolidation of file locking code
- eliminate unneeded BUG() call
- merge of file-private lock implementation"
* 'locks-3.15' of git://git.samba.org/jlayton/linux:
locks: make locks_mandatory_area check for file-private locks
locks: fix locks_mandatory_locked to respect file-private locks
locks: require that flock->l_pid be set to 0 for file-private locks
locks: add new fcntl cmd values for handling file private locks
locks: skip deadlock detection on FL_FILE_PVT locks
locks: pass the cmd value to fcntl_getlk/getlk64
locks: report l_pid as -1 for FL_FILE_PVT locks
locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
locks: rename locks_remove_flock to locks_remove_file
locks: consolidate checks for compatible filp->f_mode values in setlk handlers
locks: fix posix lock range overflow handling
locks: eliminate BUG() call when there's an unexpected lock on file close
locks: add __acquires and __releases annotations to locks_start and locks_stop
locks: remove "inline" qualifier from fl_link manipulation functions
locks: clean up comment typo
locks: close potential race between setlease and open
MAINTAINERS: update entry for fs/locks.c
Pull security subsystem updates from James Morris:
"Apart from reordering the SELinux mmap code to ensure DAC is called
before MAC, these are minor maintenance updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: correctly label /proc inodes in use before the policy is loaded
selinux: put the mmap() DAC controls before the MAC controls
selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
evm: enable key retention service automatically
ima: skip memory allocation for empty files
evm: EVM does not use MD5
ima: return d_name.name if d_path fails
integrity: fix checkpatch errors
ima: fix erroneous removal of security.ima xattr
security: integrity: Use a more current logging style
MAINTAINERS: email updates and other misc. changes
ima: reduce memory usage when a template containing the n field is used
ima: restore the original behavior for sending data with ima template
Integrity: Pass commname via get_task_comm()
fs: move i_readcount
ima: use static const char array definitions
security: have cap_dentry_init_security return error
ima: new helper: file_inode(file)
kernel: Mark function as static in kernel/seccomp.c
capability: Use current logging styles
...
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.
This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.
Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.
This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.
This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.
These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.
The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.
Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Conflicts:
Documentation/devicetree/bindings/net/micrel-ks8851.txt
net/core/netpoll.c
The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.
In micrel-ks8851.txt we simply have overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is based on an earlier patch by Eric Paris, he describes
the problem below:
"If an inode is accessed before policy load it will get placed on a
list of inodes to be initialized after policy load. After policy
load we call inode_doinit() which calls inode_doinit_with_dentry()
on all inodes accessed before policy load. In the case of inodes
in procfs that means we'll end up at the bottom where it does:
/* Default to the fs superblock SID. */
isec->sid = sbsec->sid;
if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
if (opt_dentry) {
isec->sclass = inode_mode_to_security_class(...)
rc = selinux_proc_get_sid(opt_dentry,
isec->sclass,
&sid);
if (rc)
goto out_unlock;
isec->sid = sid;
}
}
Since opt_dentry is null, we'll never call selinux_proc_get_sid()
and will leave the inode labeled with the label on the superblock.
I believe a fix would be to mimic the behavior of xattrs. Look
for an alias of the inode. If it can't be found, just leave the
inode uninitialized (and pick it up later) if it can be found, we
should be able to call selinux_proc_get_sid() ..."
On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
However, with this patch in place we see the expected result:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
It turns out that doing the SELinux MAC checks for mmap() before the
DAC checks was causing users and the SELinux policy folks headaches
as users were seeing a lot of SELinux AVC denials for the
memprotect:mmap_zero permission that would have also been denied by
the normal DAC capability checks (CAP_SYS_RAWIO).
Example:
# cat mmap_test.c
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
int rc;
void *mem;
mem = mmap(0x0, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (mem == MAP_FAILED)
return errno;
printf("mem = %p\n", mem);
munmap(mem, 4096);
return 0;
}
# gcc -g -O0 -o mmap_test mmap_test.c
# ./mmap_test
mem = (nil)
# ausearch -m AVC | grep mmap_zero
type=AVC msg=audit(...): avc: denied { mmap_zero }
for pid=1025 comm="mmap_test"
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tclass=memprotect
This patch corrects things so that when the above example is run by a
user without CAP_SYS_RAWIO the SELinux AVC is no longer generated as
the DAC capability check fails before the SELinux permission check.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Move the flags representing required permission to linux/key.h as the perm
parameter of security_key_permission() is in terms of them - and not the
permissions mask flags used in key->perm.
Whilst we're at it:
(1) Rename them to be KEY_NEED_xxx rather than KEY_xxx to avoid collisions
with symbols in uapi/linux/input.h.
(2) Don't use key_perm_t for a mask of required permissions, but rather limit
it to the permissions mask attached to the key and arguments related
directly to that.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
security_xfrm_policy_alloc can be called in atomic context so the
allocation should be done with GFP_ATOMIC. Add an argument to let the
callers choose the appropriate way. In order to do so a gfp argument
needs to be added to the method xfrm_policy_alloc_security in struct
security_operations and to the internal function
selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
callers and leave GFP_KERNEL as before for the rest.
The path that needed the gfp argument addition is:
security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)
Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
add it to security_context_to_sid which is used inside and prior to this
patch did only GFP_KERNEL allocation. So add gfp argument to
security_context_to_sid and adjust all of its callers as well.
CC: Paul Moore <paul@paul-moore.com>
CC: Dave Jones <davej@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Fan Du <fan.du@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: LSM list <linux-security-module@vger.kernel.org>
CC: SELinux list <selinux@tycho.nsa.gov>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is based on an earlier patch by Eric Paris, he describes
the problem below:
"If an inode is accessed before policy load it will get placed on a
list of inodes to be initialized after policy load. After policy
load we call inode_doinit() which calls inode_doinit_with_dentry()
on all inodes accessed before policy load. In the case of inodes
in procfs that means we'll end up at the bottom where it does:
/* Default to the fs superblock SID. */
isec->sid = sbsec->sid;
if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
if (opt_dentry) {
isec->sclass = inode_mode_to_security_class(...)
rc = selinux_proc_get_sid(opt_dentry,
isec->sclass,
&sid);
if (rc)
goto out_unlock;
isec->sid = sid;
}
}
Since opt_dentry is null, we'll never call selinux_proc_get_sid()
and will leave the inode labeled with the label on the superblock.
I believe a fix would be to mimic the behavior of xattrs. Look
for an alias of the inode. If it can't be found, just leave the
inode uninitialized (and pick it up later) if it can be found, we
should be able to call selinux_proc_get_sid() ..."
On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
However, with this patch in place we see the expected result:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
It turns out that doing the SELinux MAC checks for mmap() before the
DAC checks was causing users and the SELinux policy folks headaches
as users were seeing a lot of SELinux AVC denials for the
memprotect:mmap_zero permission that would have also been denied by
the normal DAC capability checks (CAP_SYS_RAWIO).
Example:
# cat mmap_test.c
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
int rc;
void *mem;
mem = mmap(0x0, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (mem == MAP_FAILED)
return errno;
printf("mem = %p\n", mem);
munmap(mem, 4096);
return 0;
}
# gcc -g -O0 -o mmap_test mmap_test.c
# ./mmap_test
mem = (nil)
# ausearch -m AVC | grep mmap_zero
type=AVC msg=audit(...): avc: denied { mmap_zero }
for pid=1025 comm="mmap_test"
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tclass=memprotect
This patch corrects things so that when the above example is run by a
user without CAP_SYS_RAWIO the SELinux AVC is no longer generated as
the DAC capability check fails before the SELinux permission check.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian. On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu. So the values are all screwed up. Write the values in le
format like it should have been to start.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.
Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The usage of strict_strto*() is not preferred, because
strict_strto*() is obsolete. Thus, kstrto*() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy. In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.
Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo
Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled. Any subsequent access to foo
after doing the above will also trigger the BUG.
BUG output from Matthew Thode:
[ 473.893141] ------------[ cut here ]------------
[ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
[ 473.995314] invalid opcode: 0000 [#6] SMP
[ 474.027196] Modules linked in:
[ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
3.13.0-grsec #1
[ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
[ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[ 474.556058] Stack:
[ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[ 474.690461] Call Trace:
[ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[ 475.255884] RIP [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 475.296120] RSP <ffff8805c0ac3c38>
[ 475.328734] ---[ end trace f076482e9d754adc ]---
Reported-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS3IyXAAoJEHm+PkMAQRiGplAH/ilCikBrCHyZ2938NHNLm+j1
yhfYnEJHLNg7T69KEj3p0cNagO3v9RPWM6UYFBQ6uFIYNN1MBKO7U+mCZuMWzeO8
+tGMV3mn5wx+oYn1RnWCCweQx5AESEl6rYn8udPDKh7LfW5fCLV60jguUjVSQ9IQ
cvtKlWknbiHyM7t1GoYgzN7jlPrRQvcNQZ+Aogzz7uSnJAgwINglBAHS7WP2tiEM
HAU2FoE4b3MbfGaid1vypaYQPBbFebx7Bw2WxAuZhkBbRiUBKlgF0/SYhOTvH38a
Sjpj1EHKfjcuCBt9tht6KP6H56R25vNloGR2+FB+fuQBdujd/SPa9xDflEMcdG4=
=iXnG
-----END PGP SIGNATURE-----
Merge tag 'v3.13' into stable-3.14
Linux 3.13
Conflicts:
security/selinux/hooks.c
Trivial merge issue in selinux_inet_conn_request() likely due to me
including patches that I sent to the stable folks in my next tree
resulting in the patch hitting twice (I think). Thankfully it was an
easy fix this time, but regardless, lesson learned, I will not do that
again.
Pull audit update from Eric Paris:
"Again we stayed pretty well contained inside the audit system.
Venturing out was fixing a couple of function prototypes which were
inconsistent (didn't hurt anything, but we used the same value as an
int, uint, u32, and I think even a long in a couple of places).
We also made a couple of minor changes to when a couple of LSMs called
the audit system. We hoped to add aarch64 audit support this go
round, but it wasn't ready.
I'm disappearing on vacation on Thursday. I should have internet
access, but it'll be spotty. If anything goes wrong please be sure to
cc rgb@redhat.com. He'll make fixing things his top priority"
* git://git.infradead.org/users/eparis/audit: (50 commits)
audit: whitespace fix in kernel-parameters.txt
audit: fix location of __net_initdata for audit_net_ops
audit: remove pr_info for every network namespace
audit: Modify a set of system calls in audit class definitions
audit: Convert int limit uses to u32
audit: Use more current logging style
audit: Use hex_byte_pack_upper
audit: correct a type mismatch in audit_syscall_exit()
audit: reorder AUDIT_TTY_SET arguments
audit: rework AUDIT_TTY_SET to only grab spin_lock once
audit: remove needless switch in AUDIT_SET
audit: use define's for audit version
audit: documentation of audit= kernel parameter
audit: wait_for_auditd rework for readability
audit: update MAINTAINERS
audit: log task info on feature change
audit: fix incorrect set of audit_sock
audit: print error message when fail to create audit socket
audit: fix dangling keywords in audit_log_set_loginuid() output
audit: log on errors from filter user rules
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS3IyXAAoJEHm+PkMAQRiGplAH/ilCikBrCHyZ2938NHNLm+j1
yhfYnEJHLNg7T69KEj3p0cNagO3v9RPWM6UYFBQ6uFIYNN1MBKO7U+mCZuMWzeO8
+tGMV3mn5wx+oYn1RnWCCweQx5AESEl6rYn8udPDKh7LfW5fCLV60jguUjVSQ9IQ
cvtKlWknbiHyM7t1GoYgzN7jlPrRQvcNQZ+Aogzz7uSnJAgwINglBAHS7WP2tiEM
HAU2FoE4b3MbfGaid1vypaYQPBbFebx7Bw2WxAuZhkBbRiUBKlgF0/SYhOTvH38a
Sjpj1EHKfjcuCBt9tht6KP6H56R25vNloGR2+FB+fuQBdujd/SPa9xDflEMcdG4=
=iXnG
-----END PGP SIGNATURE-----
Merge tag 'v3.13' into next
Linux 3.13
Minor fixup needed in selinux_inet_conn_request()
Conflicts:
security/selinux/hooks.c
Pull security layer updates from James Morris:
"Changes for this kernel include maintenance updates for Smack, SELinux
(and several networking fixes), IMA and TPM"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
SELinux: Fix memory leak upon loading policy
tpm/tpm-sysfs: active_show() can be static
tpm: tpm_tis: Fix compile problems with CONFIG_PM_SLEEP/CONFIG_PNP
tpm: Make tpm-dev allocate a per-file structure
tpm: Use the ops structure instead of a copy in tpm_vendor_specific
tpm: Create a tpm_class_ops structure and use it in the drivers
tpm: Pull all driver sysfs code into tpm-sysfs.c
tpm: Move sysfs functions from tpm-interface to tpm-sysfs
tpm: Pull everything related to /dev/tpmX into tpm-dev.c
char: tpm: nuvoton: remove unused variable
tpm: MAINTAINERS: Cleanup TPM Maintainers file
tpm/tpm_i2c_atmel: fix coccinelle warnings
tpm/tpm_ibmvtpm: fix unreachable code warning (smatch warning)
tpm/tpm_i2c_stm_st33: Check return code of get_burstcount
tpm/tpm_ppi: Check return value of acpi_get_name
tpm/tpm_ppi: Do not compare strcmp(a,b) == -1
ima: remove unneeded size_limit argument from ima_eventdigest_init_common()
ima: update IMA-templates.txt documentation
ima: pass HASH_ALGO__LAST as hash algo in ima_eventdigest_init()
ima: change the default hash algorithm to SHA1 in ima_eventdigest_ng_init()
...
Two of the conditions in selinux_audit_rule_match() should never happen and
the third indicates a race that should be retried. Remove the calls to
audit_log() (which call audit_log_start()) and deal with the errors in the
caller, logging only once if the condition is met. Calling audit_log_start()
in this location makes buffer allocation and locking more complicated in the
calling tree (audit_filter_user()).
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running stress tests on adding and deleting ftrace instances I hit
this bug:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_inode_permission+0x85/0x160
PGD 63681067 PUD 7ddbe067 PMD 0
Oops: 0000 [#1] PREEMPT
CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
Call Trace:
security_inode_permission+0x1c/0x30
__inode_permission+0x41/0xa0
inode_permission+0x18/0x50
link_path_walk+0x66/0x920
path_openat+0xa6/0x6c0
do_filp_open+0x43/0xa0
do_sys_open+0x146/0x240
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
RIP selinux_inode_permission+0x85/0x160
CR2: 0000000000000020
Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.
in selinux_inode_permission():
isec = inode->i_security;
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: [PATCH] SELinux: Fix memory leak upon loading policy
Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<ffffffff812b345c>] security_load_policy+0x6c/0x500
[<ffffffff812a623c>] sel_write_load+0xac/0x750
[<ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
selinux: look for IPsec labels on both inbound and outbound packets
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
selinux: fix possible memory leak
This reverts commit 102aefdda4.
Tom London reports that it causes sync() to hang on Fedora rawhide:
https://bugzilla.redhat.com/show_bug.cgi?id=1033965
and Josh Boyer bisected it down to this commit. Reverting the commit in
the rawhide kernel fixes the problem.
Eric Paris root-caused it to incorrect subtype matching in that commit
breaking fuse, and has a tentative patch, but by now we're better off
retrying this in 3.14 rather than playing with it any more.
Reported-by: Tom London <selinux@gmail.com>
Bisected-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Anand Avati <avati@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert "selinux: consider filesystem subtype in policies"
This reverts commit 102aefdda4.
Explanation from Eric Paris:
SELinux policy can specify if it should use a filesystem's
xattrs or not. In current policy we have a specification that
fuse should not use xattrs but fuse.glusterfs should use
xattrs. This patch has a bug in which non-glusterfs
filesystems would match the rule saying fuse.glusterfs should
use xattrs. If both fuse and the particular filesystem in
question are not written to handle xattr calls during the mount
command, they will deadlock.
I have fixed the bug to do proper matching, however I believe a
revert is still the correct solution. The reason I believe
that is because the code still does not work. The s_subtype is
not set until after the SELinux hook which attempts to match on
the ".gluster" portion of the rule. So we cannot match on the
rule in question. The code is useless.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
We don't need to inspect the packet to determine if the packet is an
IPv4 packet arriving on an IPv6 socket when we can query the
request_sock directly.
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_netlbl_skbuff_setsid() we leverage a cached NetLabel
secattr whenever possible. However, we never check to ensure that
the desired SID matches the cached NetLabel secattr. This patch
checks the SID against the secattr before use and only uses the
cached secattr when the SID values match.
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Free 'ctx_str' when necessary.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull security subsystem updates from James Morris:
"In this patchset, we finally get an SELinux update, with Paul Moore
taking over as maintainer of that code.
Also a significant update for the Keys subsystem, as well as
maintenance updates to Smack, IMA, TPM, and Apparmor"
and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:
"Okay. There are a number of separate bits. I'll go over the big bits
and the odd important other bit, most of the smaller bits are just
fixes and cleanups. If you want the small bits accounting for, I can
do that too.
(1) Keyring capacity expansion.
KEYS: Consolidate the concept of an 'index key' for key access
KEYS: Introduce a search context structure
KEYS: Search for auth-key by name rather than target key ID
Add a generic associative array implementation.
KEYS: Expand the capacity of a keyring
Several of the patches are providing an expansion of the capacity of a
keyring. Currently, the maximum size of a keyring payload is one page.
Subtract a small header and then divide up into pointers, that only gives
you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
a keyring to store ID mapping data, that has proven to be insufficient to
the cause.
Whatever data structure I use to handle the keyring payload, it can only
store pointers to keys, not the keys themselves because several keyrings
may point to a single key. This precludes inserting, say, and rb_node
struct into the key struct for this purpose.
I could make an rbtree of records such that each record has an rb_node
and a key pointer, but that would use four words of space per key stored
in the keyring. It would, however, be able to use much existing code.
I selected instead a non-rebalancing radix-tree type approach as that
could have a better space-used/key-pointer ratio. I could have used the
radix tree implementation that we already have and insert keys into it by
their serial numbers, but that means any sort of search must iterate over
the whole radix tree. Further, its nodes are a bit on the capacious side
for what I want - especially given that key serial numbers are randomly
allocated, thus leaving a lot of empty space in the tree.
So what I have is an associative array that internally is a radix-tree
with 16 pointers per node where the index key is constructed from the key
type pointer and the key description. This means that an exact lookup by
type+description is very fast as this tells us how to navigate directly to
the target key.
I made the data structure general in lib/assoc_array.c as far as it is
concerned, its index key is just a sequence of bits that leads to a
pointer. It's possible that someone else will be able to make use of it
also. FS-Cache might, for example.
(2) Mark keys as 'trusted' and keyrings as 'trusted only'.
KEYS: verify a certificate is signed by a 'trusted' key
KEYS: Make the system 'trusted' keyring viewable by userspace
KEYS: Add a 'trusted' flag and a 'trusted only' flag
KEYS: Separate the kernel signature checking keyring from module signing
These patches allow keys carrying asymmetric public keys to be marked as
being 'trusted' and allow keyrings to be marked as only permitting the
addition or linkage of trusted keys.
Keys loaded from hardware during kernel boot or compiled into the kernel
during build are marked as being trusted automatically. New keys can be
loaded at runtime with add_key(). They are checked against the system
keyring contents and if their signatures can be validated with keys that
are already marked trusted, then they are marked trusted also and can
thus be added into the master keyring.
Patches from Mimi Zohar make this usable with the IMA keyrings also.
(3) Remove the date checks on the key used to validate a module signature.
X.509: Remove certificate date checks
It's not reasonable to reject a signature just because the key that it was
generated with is no longer valid datewise - especially if the kernel
hasn't yet managed to set the system clock when the first module is
loaded - so just remove those checks.
(4) Make it simpler to deal with additional X.509 being loaded into the kernel.
KEYS: Load *.x509 files into kernel keyring
KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate
The builder of the kernel now just places files with the extension ".x509"
into the kernel source or build trees and they're concatenated by the
kernel build and stuffed into the appropriate section.
(5) Add support for userspace kerberos to use keyrings.
KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
KEYS: Implement a big key type that can save to tmpfs
Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
We looked at storing it in keyrings instead as that confers certain
advantages such as tickets being automatically deleted after a certain
amount of time and the ability for the kernel to get at these tokens more
easily.
To make this work, two things were needed:
(a) A way for the tickets to persist beyond the lifetime of all a user's
sessions so that cron-driven processes can still use them.
The problem is that a user's session keyrings are deleted when the
session that spawned them logs out and the user's user keyring is
deleted when the UID is deleted (typically when the last log out
happens), so neither of these places is suitable.
I've added a system keyring into which a 'persistent' keyring is
created for each UID on request. Each time a user requests their
persistent keyring, the expiry time on it is set anew. If the user
doesn't ask for it for, say, three days, the keyring is automatically
expired and garbage collected using the existing gc. All the kerberos
tokens it held are then also gc'd.
(b) A key type that can hold really big tickets (up to 1MB in size).
The problem is that Active Directory can return huge tickets with lots
of auxiliary data attached. We don't, however, want to eat up huge
tracts of unswappable kernel space for this, so if the ticket is
greater than a certain size, we create a swappable shmem file and dump
the contents in there and just live with the fact we then have an
inode and a dentry overhead. If the ticket is smaller than that, we
slap it in a kmalloc()'d buffer"
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
KEYS: Fix keyring content gc scanner
KEYS: Fix error handling in big_key instantiation
KEYS: Fix UID check in keyctl_get_persistent()
KEYS: The RSA public key algorithm needs to select MPILIB
ima: define '_ima' as a builtin 'trusted' keyring
ima: extend the measurement list to include the file signature
kernel/system_certificate.S: use real contents instead of macro GLOBAL()
KEYS: fix error return code in big_key_instantiate()
KEYS: Fix keyring quota misaccounting on key replacement and unlink
KEYS: Fix a race between negating a key and reading the error set
KEYS: Make BIG_KEYS boolean
apparmor: remove the "task" arg from may_change_ptraced_domain()
apparmor: remove parent task info from audit logging
apparmor: remove tsk field from the apparmor_audit_struct
apparmor: fix capability to not use the current task, during reporting
Smack: Ptrace access check mode
ima: provide hash algo info in the xattr
ima: enable support for larger default filedata hash algorithms
ima: define kernel parameter 'ima_template=' to change configured default
ima: add Kconfig default measurement list template
...