Now that we track what sort of NEGOTIATE response was received, stop
mandating that every session on a socket use the same type of auth.
Push that decision out into the session setup code, and make the sectype
a per-session property. This should allow us to mix multiple sectypes on
a socket as long as they are compatible with the NEGOTIATE response.
With this too, we can now eliminate the ses->secFlg field since that
info is redundant and harder to work with than a securityEnum.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently, we determine this according to flags in the sec_mode, flags
in the global_secflags and via other methods. That makes the semantics
very hard to follow and there are corner cases where we don't handle
this correctly.
Add a new bool to the TCP_Server_Info that acts as a simple flag to tell
us whether signing is enabled on this connection or not, and fix up the
places that need to determine this to use that flag.
This is a bit weird for the SMB2 case, where signing is per-session.
SMB2 needs work in this area already though. The existing SMB2 code has
similar logic to what we're using here, so there should be no real
change in behavior. These changes should make it easier to implement
per-session signing in the future though.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
We have this to some degree already in secFlgs, but those get "or'ed" so
there's no way to know what the last option requested was. Add new fields
that will eventually supercede the secFlgs field in the cifs_ses.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently we have the overrideSecFlg field, but it's quite cumbersome
to work with. Add some new fields that will eventually supercede it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Track what sort of NEGOTIATE response we get from the server, as that
will govern what sort of authentication types this socket will support.
There are three possibilities:
LANMAN: server sent legacy LANMAN-type response
UNENCAP: server sent a newer-style response, but extended security bit
wasn't set. This socket will only support unencapsulated auth types.
EXTENDED: server sent a newer-style response with the extended security
bit set. This is necessary to support krb5 and ntlmssp auth types.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Add a new securityEnum value to cover the case where a sec= option
was not explicitly set.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Move the sanity checks for signed connections into a separate function.
SMB2's was a cut-and-paste job from CIFS code, so we can make them use
the same function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
...this also gets rid of some #ifdef ugliness too.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...cleanup.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This field is completely unused:
CIFS_SES_W9X is completely unused. CIFS_SES_LANMAN and CIFS_SES_OS2
are set but never checked. CIFS_SES_NT4 is checked, but never set.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
These look pretty cargo-culty to me, but let's be certain. Leave
them in place for now. Pop a WARN if it ever does happen. Also,
move to a more standard idiom for setting the "server" pointer.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...rc is always set to 0.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It turns out that CIFS_SESS_KEY_SIZE == CIFS_ENCPWD_SIZE, so this
memset doesn't do anything useful.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The field that held this was removed quite some time ago.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Some servers set max_vcs to 1 and actually do enforce that limit. Add a
new mount option to work around this behavior that forces a mount
request to open a new socket to the server instead of reusing an
existing one.
I'd prefer to come up with a solution that doesn't require this, so
consider this a debug patch that you can use to determine whether this
is the real problem.
Cc: Jim McDonough <jmcd@samba.org>
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fix new kernel-doc warning in fs/splice.c:
Warning(fs/splice.c:1298): No description found for parameter 'opos'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull VFS fixes from Al Viro:
"Several fixes + obvious cleanup (you've missed a couple of open-coded
can_lookup() back then)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
snd_pcm_link(): fix a leak...
use can_lookup() instead of direct checks of ->i_op->lookup
move exit_task_namespaces() outside of exit_notify()
fput: task_work_add() can fail if the caller has passed exit_task_work()
ncpfs: fix rmdir returns Device or resource busy
- Remove noisy warnings about experimental support which spams the logs
- Add padding to align directory and attr structures correctly
- Set block number on child buffer on a root btree split
- Disable verifiers during log recovery for non-CRC filesystems
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJRu4gPAAoJENaLyazVq6ZO0GwP/j7i8hEl6hoFZZJ2WX7niFCP
t0r218J9JZDCLSk7+rY26gmxOzifRHAIt5TRwwqSCbNnZbuQZsqFUpvDMSMY3XOj
4qnUlO6diRLonN5ixrOb5YMTQJ8YHG7cB4jvxBDAqPqEfNpRyqikxstcH6KBmtSU
duqhuQMdmHAjMUqfpdt5ewueOCmw6jI79ZqvMnEfSHW7YS7G4SrKYa71HkfRR6CD
+K/FqEoDO/9psbsFlrkQ4Uvqngp8c9c0wQULxreN0BSdRbVqHfrS6eAWGhT3K2HW
7ZGxEiTcwR5XCtDQjhw7vbZQEMeMcl6yZ6J7e+jJc53maySOOrqCaYyyrhzZFw4H
Xh52pcVJtGuGVBHDxpfhI5e7KI4DjEugQK9AaONy02bhhTh3r3CKu5pprDyenyHr
9s/DG8u/gJX8tm8DSBlIXv2iCvY4mTeesYkMaLHgC8uLXmItkRBoUaj1NQvnsTqo
EF1xVVqh3aiueD4+cvu3+x4J4dTFmYQ++Oi3Zt1YpjBBb/h3n3KFUfizhRIp9r43
R4UO5W3b6s4q/1oC+bO6Qlxfny9vcyz+UrkcLpbuo+cRTC3bKi85v2Gaaw69bcB1
1SZCFRuVvDvzffX6Nir699Dj/uU4GETvDw/+y/igcKcETx6L4AgQPV9y/izJq5zr
zLhC+OSCDvuOGaOmRvco
=bijX
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Ben Myers:
- Remove noisy warnings about experimental support which spams the logs
- Add padding to align directory and attr structures correctly
- Set block number on child buffer on a root btree split
- Disable verifiers during log recovery for non-CRC filesystems
* tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
xfs: don't shutdown log recovery on validation errors
xfs: ensure btree root split sets blkno correctly
xfs: fix implicit padding in directory and attr CRC formats
xfs: don't emit v5 superblock warnings on write
fput() assumes that it can't be called after exit_task_work() but
this is not true, for example free_ipc_ns()->shm_destroy() can do
this. In this case fput() silently leaks the file.
Change it to fallback to delayed_fput_work if task_work_add() fails.
The patch looks complicated but it is not, it changes the code from
if (PF_KTHREAD) {
schedule_work(...);
return;
}
task_work_add(...)
to
if (!PF_KTHREAD) {
if (!task_work_add(...))
return;
/* fallback */
}
schedule_work(...);
As for shm_destroy() in particular, we could make another fix but I
think this change makes sense anyway. There could be another similar
user, it is not safe to assume that task_work_add() can't fail.
Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Unfortunately, we cannot guarantee that items logged multiple times
and replayed by log recovery do not take objects back in time. When
they are taken back in time, the go into an intermediate state which
is corrupt, and hence verification that occurs on this intermediate
state causes log recovery to abort with a corruption shutdown.
Instead of causing a shutdown and unmountable filesystem, don't
verify post-recovery items before they are written to disk. This is
less than optimal, but there is no way to detect this issue for
non-CRC filesystems If log recovery successfully completes, this
will be undone and the object will be consistent by subsequent
transactions that are replayed, so in most cases we don't need to
take drastic action.
For CRC enabled filesystems, leave the verifiers in place - we need
to call them to recalculate the CRCs on the objects anyway. This
recovery problem can be solved for such filesystems - we have a LSN
stamped in all metadata at writeback time that we can to determine
whether the item should be replayed or not. This is a separate piece
of work, so is not addressed by this patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 9222a9cf86)
For CRC enabled filesystems, the BMBT is rooted in an inode, so it
passes through a different code path on root splits than the
freespace and inode btrees. This is much less traversed by xfstests
than the other trees. When testing on a 1k block size filesystem,
I've been seeing ASSERT failures in generic/234 like:
XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
which are generally preceded by a lblock check failure. I noticed
this in the bmbt stats:
$ pminfo -f xfs.btree.block_map
xfs.btree.block_map.lookup
value 39135
xfs.btree.block_map.compare
value 268432
xfs.btree.block_map.insrec
value 15786
xfs.btree.block_map.delrec
value 13884
xfs.btree.block_map.newroot
value 2
xfs.btree.block_map.killroot
value 0
.....
Very little coverage of root splits and merges. Indeed, on a 4k
filesystem, block_map.newroot and block_map.killroot are both zero.
i.e. the code is not exercised at all, and it's the only generic
btree infrastructure operation that is not exercised by a default run
of xfstests.
Turns out that on a 1k filesystem, generic/234 accounts for one of
those two root splits, and that is somewhat of a smoking gun. In
fact, it's the same problem we saw in the directory/attr code where
headers are memcpy()d from one block to another without updating the
self describing metadata.
Simple fix - when copying the header out of the root block, make
sure the block number is updated correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit ade1335afe)
Michael L. Semon has been testing CRC patches on a 32 bit system and
been seeing assert failures in the directory code from xfs/080.
Thanks to Michael's heroic efforts with printk debugging, we found
that the problem was that the last free space being left in the
directory structure was too small to fit a unused tag structure and
it was being corrupted and attempting to log a region out of bounds.
Hence the assert failure looked something like:
.....
#5 calling xfs_dir2_data_log_unused() 36 32
#1 4092 4095 4096
#2 8182 8183 4096
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
Where #1 showed the first region of the dup being logged (i.e. the
last 4 bytes of a directory buffer) and #2 shows the corrupt values
being calculated from the length of the dup entry which overflowed
the size of the buffer.
It turns out that the problem was not in the logging code, nor in
the freespace handling code. It is an initial condition bug that
only shows up on 32 bit systems. When a new buffer is initialised,
where's the freespace that is set up:
[ 172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
[ 172.316346] #9 calling xfs_dir2_data_log_unused()
[ 172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
[ 172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096
Note the offset of the first region being logged? It's 60 bytes into
the buffer. Once I saw that, I pretty much knew that the bug was
going to be caused by this.
Essentially, all direct entries are rounded to 8 bytes in length,
and all entries start with an 8 byte alignment. This means that we
can decode inplace as variables are naturally aligned. With the
directory data supposedly starting on a 8 byte boundary, and all
entries padded to 8 bytes, the minimum freespace in a directory
block is supposed to be 8 bytes, which is large enough to fit a
unused data entry structure (6 bytes in size). The fact we only have
4 bytes of free space indicates a directory data block alignment
problem.
And what do you know - there's an implicit hole in the directory
data block header for the CRC format, which means the header is 60
byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
padding. And while looking at the structures, I found the same
problem in the attr leaf header. Fix them both.
Note that this only affects 32 bit systems with CRCs enabled.
Everything else is just fine. Note that CRC enabled filesystems created
before this fix on such systems will not be readable with this fix
applied.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Debugged-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 8a1fd2950e)
We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:
XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!
And spamming the logs.
We don't need to check for whether we support v5 superblocks or
whether there are feature bits we don't support set as these are
only relevant when we first mount the filesytem. i.e. on superblock
read. Hence for the write verification we can just skip all the
checks (and hence verbose output) altogether.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 34510185ab)
Pull btrfs fixes from Chris Mason:
"This is an assortment of crash fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: stop all workers before cleaning up roots
Btrfs: fix use-after-free bug during umount
Btrfs: init relocate extent_io_tree with a mapping
btrfs: Drop inode if inode root is NULL
Btrfs: don't delete fs_roots until after we cleanup the transaction
Merge misc fixes from Andrew Morton:
"Bunch of fixes and one little addition to math64.h"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
include/linux/math64.h: add div64_ul()
mm: memcontrol: fix lockless reclaim hierarchy iterator
frontswap: fix incorrect zeroing and allocation size for frontswap_map
kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
mm: migration: add migrate_entry_wait_huge()
ocfs2: add missing lockres put in dlm_mig_lockres_handler
mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
aio: fix io_destroy() regression by using call_rcu()
rtc-at91rm9200: use shadow IMR on at91sam9x5
rtc-at91rm9200: add shadow interrupt mask
rtc-at91rm9200: refactor interrupt-register handling
rtc-at91rm9200: add configuration support
rtc-at91rm9200: add match-table compile guard
fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
cciss: fix broken mutex usage in ioctl
audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
...
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was a regression introduced by 36f5588905 ("aio: refcounting
cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
using RCU in the shutdown path, but the synchronize_rcu() was done in
the context of the io_destroy() syscall greatly increasing the time it
could block.
This patch switches it to call_rcu() and makes shutdown asynchronous
(more asynchronous than it was originally; before the refcount changes
io_destroy() would still wait on pending kiocbs).
Note that there's a global quota on the max outstanding kiocbs, and that
quota must be manipulated synchronously; otherwise io_setup() could
return -EAGAIN when there isn't quota available, and userspace won't
have any way of waiting until shutdown of the old kioctxs has finished
(besides busy looping).
So we release our quota before kioctx shutdown has finished, which
should be fine since the quota never corresponded to anything real
anyways.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While removing a non-empty directory, the kernel dumps a message:
(rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39
Suppress the error message from being printed in the dmesg so users
don't panic.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir,
ocfs2_prep_new_orphaned_file will release the inode_ac, then when the
caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to
a NULL ocfs2_alloc_context struct in the following functions. A kernel
panic happens.
Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ceph fixes from Sage Weil:
"There is a pair of fixes for double-frees in the recent bundle for
3.10, a couple of fixes for long-standing bugs (sleep while atomic and
an endianness fix), and a locking fix that can be triggered when osds
are going down"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix cleanup in rbd_add()
rbd: don't destroy ceph_opts in rbd_add()
ceph: ceph_pagelist_append might sleep while atomic
ceph: add cpu_to_le32() calls when encoding a reconnect capability
libceph: must hold mutex for reset_changed_osds()
Pull timer fixes from Thomas Gleixner:
- Trivial: unused variable removal
- Posix-timers: Add the clock ID to the new proc interface to make it
useful. The interface is new and should be functional when we reach
the final 3.10 release.
- Cure a false positive warning in the tick code introduced by the
overhaul in 3.10
- Fix for a persistent clock detection regression introduced in this
cycle
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Correct run-time detection of persistent_clock.
ntp: Remove unused variable flags in __hardpps
posix-timers: Show clock ID in proc file
tick: Cure broadcast false positive pending bit warning
Dave reported a panic because the extent_root->commit_root was NULL in the
caching kthread. That is because we just unset it in free_root_pointers, which
is not the correct thing to do, we have to either wait for the caching kthread
to complete or hold the extent_commit_sem lock so we know the thread has exited.
This patch makes the kthreads all stop first and then we do our cleanup. This
should fix the race. Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Dave reported a NULL pointer deref. This is caused because he thought he'd be
smart and add sanity checks to the extent_io bit operations, but he didn't
expect a tree to have a NULL mapping. To fix this we just need to init the
relocation's processed_blocks with the btree_inode->i_mapping. Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
There is a path where btrfs_drop_inode() is called with its inode's root
is NULL: In btrfs_new_inode(), when btrfs_set_inode_index() fails,
iput() is called. We should handle this case before taking look at the
root->root_item.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We get a use after free if we had a transaction to cleanup since there could be
delayed inodes which refer to their respective fs_root. Thanks
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* A couple of MAINTAINERS updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCgAGBQJRsU9rAAoJENaSAD2qAscK3ccP/ia/C8LawYZbgbZq8VVGY0li
LzBbO+OT2O47h7tmILOM2fgR86emt1FkKa+snWBGWiigGfeTyQX6g2/I9adEEY3D
H8n5DBc6d6TlBO2ca1dXUkIe/b2b3Mj4NwxZIXDLwd15nJDwW5aQCFMtlrJ0UmBp
q2Xuvj/AprXTz02jk7V7vcor5wd8VgveTnLTIclLQ1AwJ+Zt51nx5UJo4jqlQydF
1MO+tvKRo4LWs+dl1j1J4VkZIfy3frXO8m2W9wc+pJRE6FO1jyDrbjjyaP5BfelU
tfpKkx801tnp//1NTjOcakhifH/kCm43WQn4eV7qb386pV85nGEZrP1MdRoEVmv3
39NlKLi9sUvLZ2DgAl1qsdh36VD9PF1z1uKlcRjABJb/5IOM2RfvM2FltfqLtMM0
lenZOZb3r62CiKIoqz6CgQnU18XE3ICo/mBiuWm1FS2l/IgwKfU4dUV361KH4ykb
KzHKFVDY68SdPHkYJtEGJ86pgLA7N6Lu97c8+zesnGzYK1HLf5I7MvP9toM0zUXX
TEv68sbRGEOia7EGcyE9gSVZebhtneosi4/z73GJa93kYyc+6AleLL3y+X7MthwT
WbvbzmrsU1JeRO/VM99xRYDspDM3kxnTQlhyGOw+GunUY6Zk7nl2IiNcNTPfGP3w
QJhg09DfITcwOmf2ImFb
=NgKp
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
- Fixes how eCryptfs handles msync to sync both the upper and lower
file
- A couple of MAINTAINERS updates
* tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Check return of filemap_write_and_wait during fsync
Update eCryptFS maintainers
ecryptfs: fixed msync to flush data
Pull CIFS fix from Steve French:
"Fix one byte buffer overrun with prefixpaths on cifs mounts which can
cause a problem with mount depending on the string length"
* 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix off-by-one bug in build_unc_path_to_root
1d2ef59014 caused a regression in ncpfs such that
directories could no longer be removed. This was because ncp_rmdir checked
to see if a dentry could be unhashed before allowing it to be removed. Since
1d2ef59014 introduced a change that incremented
dentry->d_count causing it to always be greater than 1 unhash would always
fail. Thus causing the error path in ncp_rmdir to always be taken. Removing
this error path is safe as unhashing is still accomplished by calls to dput
from vfs_rmdir.
Signed-off-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- Rework of dquot CRCs
- Fix for remote attribute invalidation of a leaf
- Fix ordering of transaction replay in recovery
- Implement CRCs for inode unlinked list
- Disable noattr2/attr2 mount options when CRCs are enabled
- Bump the limitation of ACL entries for v5 superblocks
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJRsLfDAAoJENaLyazVq6ZOgBQP/jKupB/cOV3sewBsPDDPBR46
xg3qaps6zpEWtXGWnXe8HF/u57YfoA5K+YVwq6+jkIsYFjP3dDdLPDeEeC3HoB9I
VZPmV5VEACvUyD9WhMeSjAbRPAtweFFbTuZZqULv2SpG+tUaF8VUz7luUM4XpcFa
NtxccORMmBBN1j71Qod4+xxJ1BM/KtXV1RBMudtPAWr+//LKwLm/9HFavw2uXeQW
xgebmc95DXrFpjwXepHQXW/xTmVPclah034JC8kj+Q/VhmWvVIgo331gVL0M1+L9
U7OObUdJAJ+5VN872TgwbzWMSRCPKZ9PTh78LZksYtweb5GjD/x6yVXFSWAix/O7
q1EkYUfBR1YHZVTAfoE2QGCTkgJCsellBJWzIoov2/Qqq18QRA1EtznjtZ+WkqP6
dKzgcnDb7LEPjg6Y8L4ZKGhylXGkPmCScTprjgPIWoAUS2ytURkaG+CCK5sp/Ldn
KRu0zjbrcQrxAVkMo1E2hkOOZDD1qazJE5mfhOnQxLKsPmryM5tarzWmd9O+VjH8
tiJosS3JGoz1rLOGKYjlqEr9G0zc/3Bmaz7tFnaHeCWTKUHxo7AXwnEfPmKle29h
lbJFW86DU56QlNb/mLEE+v2ojC/PDpcYHddxG9Yo3B5CrDX8xJgXQ6ML0g86ceL3
tuyMnOF8opR72Wavc0co
=R+6z
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs
Pull more xfs updates from Ben Myers:
"Here are several fixes for filesystems with CRC support turned on:
fixes for quota, remote attributes, and recovery. There is also some
feature work related to CRCs: the implementation of CRCs for the inode
unlinked lists, disabling noattr2/attr2 options when appropriate, and
bumping the maximum number of ACLs.
I would have preferred to defer this last category of items to 3.11.
This would require setting a feature bit for the on-disk changes, so
there is some pressure to get these in 3.10. I believe this
represents the end of the CRC related queue.
- Rework of dquot CRCs
- Fix for remote attribute invalidation of a leaf
- Fix ordering of transaction replay in recovery
- Implement CRCs for inode unlinked list
- Disable noattr2/attr2 mount options when CRCs are enabled
- Bump the limitation of ACL entries for v5 superblocks"
* tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: increase number of ACL entries for V5 superblocks
xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
xfs: inode unlinked list needs to recalculate the inode CRC
xfs: fix log recovery transaction item reordering
xfs: fix remote attribute invalidation for a leaf
xfs: rework dquot CRCs
The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format. For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 5c87d4bc1a)
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit d3eaace84e)
The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in unlinked list operations always leaving a consistent
inode in the buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 0a32c26e72)
There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.
As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.
Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit a775ad7780)
When invalidating an attribute leaf block block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 59913f14df)