2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-26 14:14:01 +08:00
Commit Graph

37572 Commits

Author SHA1 Message Date
Trond Myklebust
df817ba357 NFSv4: fix open/lock state recovery error handling
The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-28 16:03:04 -04:00
Trond Myklebust
a4339b7b68 NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.

RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-28 16:03:03 -04:00
David Howells
f3f760314a NFS: Fabricate fscache server index key correctly
When fabricating a server index key for fscache, we should clear the index key
buffer before starting to fill it in, not in the middle.

Reported-by: James Pearson <james-p@moving-picture.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-25 21:25:18 -04:00
Fabian Frederick
2f3169fb18 nfs: fix duplicate proc entries
Commit 65b38851a1
("NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes")

updated the following function:
static int nfs_volume_list_open(struct inode *inode, struct file *file)

it used &nfs_server_list_ops instead of &nfs_volume_list_ops
which means cat /proc/fs/nfsfs/volumes = /proc/fs/nfsfs/servers

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Fixes: 65b38851a1 (NFS: Fix /proc/fs/nfsfs/servers and...)
Cc: stable@vger.kernel.org # 3.4.x+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-24 23:00:18 -04:00
Trond Myklebust
cd9288ffae NFSv4: Fix another bug in the close/open_downgrade code
James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.

Reported-by: James Drews <drews@engr.wisc.edu>
Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu
Fixes: aee7af356e (NFSv4: Fix problems with close in the presence...)
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-18 13:04:22 -04:00
Steve Dickson
080af20cc9 NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
There is a race between nfs4_state_manager() and
nfs_server_remove_lists() that happens during a nfsv3 mount.

The v3 mount notices there is already a supper block so
nfs_server_remove_lists() called which uses the nfs_client_lock
spin lock to synchronize access to the client list.

At the same time nfs4_state_manager() is running through
the client list looking for work to do, using the same
lock. When nfs4_state_manager() wins the race to the
list, a v3 client pointer is found and not ignored
properly which causes the panic.

Moving some protocol checks before the state checking
avoids the panic.

CC: Stable Tree <stable@vger.kernel.org>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-18 13:04:21 -04:00
NeilBrown
f39c010479 NFS: remove BUG possibility in nfs4_open_and_get_state
commit 4fa2c54b51
    NFS: nfs4_do_open should add negative results to the dcache.

used "d_drop(); d_add();" to ensure that a dentry was hashed
as a negative cached entry.
This is not safe if the dentry has an non-NULL ->d_inode.
It will trigger a BUG_ON in d_instantiate().
In that case, d_delete() is needed.

Also, only d_add if the dentry is currently unhashed, it seems
pointless removed and re-adding it unchanged.

Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 4fa2c54b51
Cc: Jeff Layton <jeff.layton@primarydata.com>
Link: http://lkml.kernel.org/r/20140908144525.GB19811@infradead.org
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-12 13:10:53 -04:00
Weston Andros Adamson
224ecbf5a6 pnfs: fix filelayout_retry_commit when idx > 0
filelayout_retry_commit was recently split out from alloc_ds_commits,
but was done in such a way that the bucket pointer always starts at
index 0 no matter what the @idx argument is set to.

The intention of the @idx argument is to retry commits starting at
bucket @idx. This is called when alloc_ds_commits fails for a bucket.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:43:45 -07:00
Jeff Layton
0c0e0d3c09 nfs: revert "nfs4: queue free_lock_state job submission to nfsiod"
This reverts commit 49a4bda22e.

Christoph reported an oops due to the above commit:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[<ffffffff81361ca6>]  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58  EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287]  ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287]  000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287]  0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287]  [<ffffffff810cc877>] process_one_work+0x1c7/0x490
[ 2187.044287]  [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490
[ 2187.044287]  [<ffffffff810cd569>] worker_thread+0x119/0x4f0
[ 2187.044287]  [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287]  [<ffffffff810cd450>] ? init_pwq+0x190/0x190
[ 2187.044287]  [<ffffffff810d3c6f>] kthread+0xdf/0x100
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287]  [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287]  RSP <ffff88007a4efd58>
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d858 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-08 17:00:32 -07:00
Cong Wang
21e81002f9 nfs: fix kernel warning when removing proc entry
I saw the following kernel warning:

[ 1852.321222] ------------[ cut here ]------------
[ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
[ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
[ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
[ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1852.354992] Workqueue: netns cleanup_net
[ 1852.358701]  0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
[ 1852.366474]  ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
[ 1852.373507]  ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
[ 1852.380224] Call Trace:
[ 1852.381976]  [<ffffffff819c03e9>] dump_stack+0x4d/0x66
[ 1852.385495]  [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93
[ 1852.389869]  [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b
[ 1852.393987]  [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e
[ 1852.397999]  [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b
[ 1852.402034]  [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56
[ 1852.406136]  [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d
[ 1852.409774]  [<ffffffff81785bc9>] ops_exit_list+0x44/0x55
[ 1852.413529]  [<ffffffff81786389>] cleanup_net+0xee/0x182
[ 1852.417198]  [<ffffffff81088c9e>] process_one_work+0x209/0x40d
[ 1852.502320]  [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d
[ 1852.587629]  [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7
[ 1852.673291]  [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f
[ 1852.759470]  [<ffffffff8108e079>] kthread+0xc9/0xd1
[ 1852.843099]  [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce
[ 1852.926518]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.008565]  [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0
[ 1853.076477]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().

Fixes: commit 65b38851a1 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Dan Aloni <dan@kernelim.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
[Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
as suggested by Al Viro]
Cc: stable@vger.kernel.org # 3.4.x : 65b38851a1: NFS: Fix /proc/fs/nfsfs/servers
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-08 16:41:36 -07:00
Linus Torvalds
9142eadefe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull filesystem fixes from Al Viro:
 "Several bugfixes (all of them -stable fodder).

  Alexey's one deals with double mutex_lock() in UFS (apparently, nobody
  has tried to test "ufs: sb mutex merge + mutex_destroy" on something
  like file creation/removal on ufs).  Mine deal with two kinds of
  umount bugs, in umount propagation and in handling of automounted
  submounts, both resulting in bogus transient EBUSY from umount"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ufs: fix deadlocks introduced by sb mutex merge
  fix EBUSY on umount() from MNT_SHRINKABLE
  get rid of propagate_umount() mistakenly treating slaves as busy.
2014-09-07 10:59:58 -07:00
Alexey Khoroshilov
9ef7db7f38 ufs: fix deadlocks introduced by sb mutex merge
Commit 0244756edc ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.

The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.

Found by Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-07 13:26:39 -04:00
Linus Torvalds
11e9739813 xfs: fixes for v3.17-rc3
Fix:
 - a direct IO read/buffered read data corruption
 - the associated fallout from the DIO data corruption fix
 - collapse range bugs that are potential data corruption issues.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJUCkM+AAoJEK3oKUf0dfodUBgP+gJu50/XV4TFRLPlCRxhvN61
 371i3ASls1y7ivhj40NzgbDDAZHM2q8Zqwd//318dFViHhWQDlH/1ga06kscRpZX
 d8cQEbFHApgUGQL5Gdq2l2hvAzYa75H0m6cq3jveyrN2rscjCSmAXwtlcEmx3AR6
 TnCpxuVL5asjEGZYb0KfQACq//rASHJbhukpo1gB4ccZ0boWHOVf5SxuS4remzs9
 y+rlPFNl5RD/WVdnJSvu9zu/nP6op3Ax5r7jZanoKbisKHfd7QOa+k65+Vz0Vq9G
 kxgfhz+yLfkOvcktq+41e1lVBln7fCIlcO9m3b53uxWPx5cla323893UiGYsA4F/
 j/gGlh1qaT6C/1M1JBWDLDx931S78XiR1Y+WbtAU1PO+GuO0IEap3+iqtS2+oNAv
 OrpThLOgqTspK6MJeToCzdn2lRT2BJpcKwxIyDK8g+p9N6qCpyw3DfiKyu0wipGH
 D2D3mtE6drSHNaSceFAz8CrQvPOR7Ygj92QGpGSfkohxap9h6VJR/wNp/oGnpmN0
 qgcxTrHvx3kw1hXssB4gjh6fBDnOUkac0isqxdow22Qt529t9sIanzMBvz+JxHQF
 zeqeFSh96lOXmB7UFBU+QyOhbDp3cJWChHrtY3Esw/+FmG6fxEy8z6pdZsiJYELr
 5tka2richPD+gyXzcZwP
 =tRQo
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
 "The fixes all address recently discovered data corruption issues.

  The original Direct IO issue was discovered by Chris Mason @ Facebook
  on a production workload which mixed buffered reads with direct reads
  and writes IO to the same file.  The fix for that exposed other issues
  with page invalidation (exposed by millions of fsx operations) failing
  due to dirty buffers beyond EOF.

  Finally, the collapse_range code could also cause problems due to
  racing writeback changing the extent map while it was being shifted
  around.  The commits for that problem are simple mitigation fixes that
  prevent the problem from occuring.  A more robust fix for 3.18 that
  addresses the underlying problem is currently being worked on by
  Brian.

  Summary of fixes:
   - a direct IO read/buffered read data corruption
   - the associated fallout from the DIO data corruption fix
   - collapse range bugs that are potential data corruption issues"

* tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: trim eofblocks before collapse range
  xfs: xfs_file_collapse_range is delalloc challenged
  xfs: don't log inode unless extent shift makes extent modifications
  xfs: use ranged writeback and invalidation for direct IO
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF
2014-09-06 12:13:17 -07:00
Anton Altaparmakov
10096fb108 Export sync_filesystem() for modular ->remount_fs() use
This patch changes sync_filesystem() to be EXPORT_SYMBOL().

The reason this is needed is that starting with 3.15 kernel, due to
Theodore Ts'o's commit 02b9984d64 ("fs: push sync_filesystem() down to
the file system's remount_fs()"), all file systems that have dirty data
to be written out need to call sync_filesystem() from their
->remount_fs() method when remounting read-only.

As this is now a generically required function rather than an internal
only function it should be EXPORT_SYMBOL() so that all file systems can
call it.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-05 08:16:21 -07:00
Linus Torvalds
b7fece1be8 Merge git://git.kvack.org/~bcrl/aio-fixes
Pull aio bugfixes from Ben LaHaise:
 "Two small fixes"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: block exit_aio() until all context requests are completed
  aio: add missing smp_rmb() in read_events_ring
2014-09-04 16:08:55 -07:00
Gu Zheng
6098b45b32 aio: block exit_aio() until all context requests are completed
It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in current implemention, so fix
it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
2014-09-04 16:54:47 -04:00
Linus Torvalds
70c8038dd6 Merge tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs bug fixes from Jaegeuk Kim:
 "This series includes patches to:

   - fix recovery routines
   - fix bugs related to inline_data/xattr
   - fix when casting the dentry names
   - handle EIO or ENOMEM correctly
   - fix memory leak
   - fix lock coverage"

* tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (28 commits)
  f2fs: reposition unlock_new_inode to prevent accessing invalid inode
  f2fs: fix wrong casting for dentry name
  f2fs: simplify by using a literal
  f2fs: truncate stale block for inline_data
  f2fs: use macro for code readability
  f2fs: introduce need_do_checkpoint for readability
  f2fs: fix incorrect calculation with total/free inode num
  f2fs: remove rename and use rename2
  f2fs: skip if inline_data was converted already
  f2fs: remove rewrite_node_page
  f2fs: avoid double lock in truncate_blocks
  f2fs: prevent checkpoint during roll-forward
  f2fs: add WARN_ON in f2fs_bug_on
  f2fs: handle EIO not to break fs consistency
  f2fs: check s_dirty under cp_mutex
  f2fs: unlock_page when node page is redirtied out
  f2fs: introduce f2fs_cp_error for readability
  f2fs: give a chance to mount again when encountering errors
  f2fs: trigger release_dirty_inode in f2fs_put_super
  f2fs: don't skip checkpoint if there is no dirty node pages
  ...
2014-09-03 10:10:28 -07:00
Jeff Moyer
2ff396be60 aio: add missing smp_rmb() in read_events_ring
We ran into a case on ppc64 running mariadb where io_getevents would
return zeroed out I/O events.  After adding instrumentation, it became
clear that there was some missing synchronization between reading the
tail pointer and the events themselves.  This small patch fixes the
problem in testing.

Thanks to Zach for helping to look into this, and suggesting the fix.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
2014-09-02 15:20:03 -04:00
Chao Yu
b73e52824c f2fs: reposition unlock_new_inode to prevent accessing invalid inode
As the race condition on the inode cache, following scenario can appear:
[Thread a]				[Thread b]
					->f2fs_mkdir
					  ->f2fs_add_link
					    ->__f2fs_add_link
					      ->init_inode_metadata failed here
->gc_thread_func
  ->f2fs_gc
    ->do_garbage_collect
      ->gc_data_segment
        ->f2fs_iget
          ->iget_locked
            ->wait_on_inode
					  ->unlock_new_inode
        ->move_data_page
					  ->make_bad_inode
					  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode
should be set as bad to avoid being accessed by other thread. But in above
scenario, it allows f2fs to access the invalid inode before this inode was set
as bad.
This patch fix the potential problem, and this issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-02 00:22:24 -07:00
Brian Foster
41b9d7263e xfs: trim eofblocks before collapse range
xfs_collapse_file_space() currently writes back the entire file
undergoing collapse range to settle things down for the extent shift
algorithm. While this prevents changes to the extent list during the
collapse operation, the writeback itself is not enough to prevent
unnecessary collapse failures.

The current shift algorithm uses the extent index to iterate the in-core
extent list. If a post-eof delalloc extent persists after the writeback
(e.g., a prior zero range op where the end of the range aligns with eof
can separate the post-eof blocks such that they are not written back and
converted), xfs_bmap_shift_extents() becomes confused over the encoded
br_startblock value and fails the collapse.

As with the full writeback, this is a temporary fix until the algorithm
is improved to cope with a volatile extent list and avoid attempts to
shift post-eof extents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:53 +10:00
Dave Chinner
1669a8ca21 xfs: xfs_file_collapse_range is delalloc challenged
If we have delalloc extents on a file before we run a collapse range
opertaion, we sync the range that we are going to collapse to
convert delalloc extents in that region to real extents to simplify
the shift operation.

However, the shift operation then assumes that the extent list is
not going to change as it iterates over the extent list moving
things about. Unfortunately, this isn't true because we can't hold
the ILOCK over all the operations. We can prevent new IO from
modifying the extent list by holding the IOLOCK, but that doesn't
prevent writeback from running....

And when writeback runs, it can convert delalloc extents is the
range of the file prior to the region being collapsed, and this
changes the indexes of all the extents in the file. That causes the
collapse range operation to Go Bad.

The right fix is to rewrite the extent shift operation not to be
dependent on the extent list not changing across the entire
operation, but this is a fairly significant piece of work to do.
Hence, as a short-term workaround for the problem, sync the entire
file before starting a collapse operation to remove all delalloc
ranges from the file and so avoid the problem of concurrent
writeback changing the extent list.

Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:53 +10:00
Brian Foster
ca446d880c xfs: don't log inode unless extent shift makes extent modifications
The file collapse mechanism uses xfs_bmap_shift_extents() to collapse
all subsequent extents down into the specified, previously punched out,
region. This function performs some validation, such as whether a
sufficient hole exists in the target region of the collapse, then shifts
the remaining exents downward.

The exit path of the function currently logs the inode unconditionally.
While we must log the inode (and abort) if an error occurs and the
transaction is dirty, the initial validation paths can generate errors
before the transaction has been dirtied. This creates an unnecessary
filesystem shutdown scenario, as the caller will cancel a transaction
that has been marked dirty.

Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications
are made to the inode bmap. Only log the inode in the exit path if
logflags has been set. This ensures we only have to cancel a dirty
transaction if modifications have been made and prevents an unnecessary
filesystem shutdown otherwise.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:53 +10:00
Dave Chinner
7d4ea3ce63 xfs: use ranged writeback and invalidation for direct IO
Now we are not doing silly things with dirtying buffers beyond EOF
and using invalidation correctly, we can finally reduce the ranges of
writeback and invalidation used by direct IO to match that of the IO
being issued.

Bring the writeback and invalidation ranges back to match the
generic direct IO code - this will greatly reduce the perturbation
of cached data when direct IO and buffered IO are mixed, but still
provide the same buffered vs direct IO coherency behaviour we
currently have.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:53 +10:00
Dave Chinner
834ffca6f7 xfs: don't zero partial page cache pages during O_DIRECT writes
Similar to direct IO reads, direct IO writes are using 
truncate_pagecache_range to invalidate the page cache. This is
incorrect due to the sub-block zeroing in the page cache that
truncate_pagecache_range() triggers.

This patch fixes things by using invalidate_inode_pages2_range
instead.  It preserves the page cache invalidation, but won't zero
any pages.

cc: stable@vger.kernel.org
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:52 +10:00
Chris Mason
85e584da32 xfs: don't zero partial page cache pages during O_DIRECT writes
xfs is using truncate_pagecache_range to invalidate the page cache
during DIO reads.  This is different from the other filesystems who
only invalidate pages during DIO writes.

truncate_pagecache_range is meant to be used when we are freeing the
underlying data structs from disk, so it will zero any partial
ranges in the page.  This means a DIO read can zero out part of the
page cache page, and it is possible the page will stay in cache.

buffered reads will find an up to date page with zeros instead of
the data actually on disk.

This patch fixes things by using invalidate_inode_pages2_range
instead.  It preserves the page cache invalidation, but won't zero
any pages.

[dchinner: catch error and warn if it fails. Comment.]

cc: stable@vger.kernel.org
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:52 +10:00
Dave Chinner
22e757a49c xfs: don't dirty buffers beyond EOF
generic/263 is failing fsx at this point with a page spanning
EOF that cannot be invalidated. The operations are:

1190 mapwrite   0x52c00 thru    0x5e569 (0xb96a bytes)
1191 mapread    0x5c000 thru    0x5d636 (0x1637 bytes)
1192 write      0x5b600 thru    0x771ff (0x1bc00 bytes)

where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO
write attempts to invalidate the cached page over this range, it
fails with -EBUSY and so any attempt to do page invalidation fails.

The real question is this: Why can't that page be invalidated after
it has been written to disk and cleaned?

Well, there's data on the first two buffers in the page (1k block
size, 4k page), but the third buffer on the page (i.e. beyond EOF)
is failing drop_buffers because it's bh->b_state == 0x3, which is
BH_Uptodate | BH_Dirty.  IOWs, there's dirty buffers beyond EOF. Say
what?

OK, set_buffer_dirty() is called on all buffers from
__set_page_buffers_dirty(), regardless of whether the buffer is
beyond EOF or not, which means that when we get to ->writepage,
we have buffers marked dirty beyond EOF that we need to clean.
So, we need to implement our own .set_page_dirty method that
doesn't dirty buffers beyond EOF.

This is messy because the buffer code is not meant to be shared
and it has interesting locking issues on the buffer dirty bits.
So just copy and paste it and then modify it to suit what we need.

Note: the solutions the other filesystems and generic block code use
of marking the buffers clean in ->writepage does not work for XFS.
It still leaves dirty buffers beyond EOF and invalidations still
fail. Hence rather than play whack-a-mole, this patch simply
prevents those buffers from being dirtied in the first place.

cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02 12:12:51 +10:00
Linus Torvalds
35e274458c File locking related bugfixes for v3.17 (pile #3)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUAmbRAAoJEAAOaEEZVoIVah4P/iupUO7Ae5ODMDMog/vOp+SM
 +sWnyqnEyeMlQlNDoHoef5TPQ28aKEAq1Sg7CsqlK3qZSYSSPhb4KFsGWLZe6D5A
 7iWSMKabdnuQ3qBCsb2Y6ZdB8IRAJz81sIAVI8F32NDmSs325wN/coVwfV4g8mQF
 QSpv78TjwBY0qNhNw06pS/FLV45IaPTDDgnTHRcOLrfHajDdGTdqrKI/L0ES1PFB
 0ZUtG3qMPS2XYRyS6ZQ0TZZrl2/HMA5/fOwqKspNKxYxKS+TOf/umKwPPjHBnMHo
 mfD1XnG64ECkNio9bpg2CkjUqaT8aloNPgDxuP15vEV6bZ5WBLKjGUOY2IvPa198
 do8CaAdp2Ql6kE2IyD+G+IkjqcxZ9H8hNH3cBM+3TzvxYqaiZKKZAky5UTau2LvG
 E5cyWhDPsVBGvAJXEPBf4vhIgzhaSuNox0+73nL4xU+L7bPTDzYIYyhS/InKO0X+
 ZwAZn2u62XQmUDI8b+zrgOAHfWB0hHlcIfIsIrDxotM24TPPbJ2k2Dz0hKCZAraR
 DYDYPJZg+/QyPc8bujL5Hwjh8MogdDt1vxd9B65MwQWRn791LSGbq6VSsUnsMoAa
 dhG5U+a5eI8oQ2gkMEEK45o2ljcnZ3BSim6SGdmZ6YrNyEdk63xA4GjozK1YhWzG
 tLkXb4/7zV/dR8VTOQR/
 =Wb1t
 -----END PGP SIGNATURE-----

Merge tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux

Pull file locking bugfx from Jeff Layton:
 "Just a bugfix for a bug that crept in to v3.15.  It's in a rather rare
  error path, and I'm not aware of anyone having hit it, but it's worth
  fixing for v3.17"

* tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux:
  locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease
2014-08-30 21:04:37 -07:00
Al Viro
81b6b06197 fix EBUSY on umount() from MNT_SHRINKABLE
We need the parents of victims alive until namespace_unlock() gets to
dput() of the (ex-)mountpoints.  However, that screws up the "is it
busy" checks in case when we have shrinkable mounts that need to be
killed.  Solution: go ahead and decrement refcounts of parents right
in umount_tree(), increment them again just before dropping rwsem in
namespace_unlock() (and let the loop in the end of namespace_unlock()
finally drop those references for good, as we do now).  Parents can't
get freed until we drop rwsem - at least one reference is kept until
then, both in case when parent is among the victims and when it is
not.  So they'll still be around when we get to namespace_unlock().

Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-30 18:32:05 -04:00
Al Viro
88b368f27a get rid of propagate_umount() mistakenly treating slaves as busy.
The check in __propagate_umount() ("has somebody explicitly mounted
something on that slave?") is done *before* taking the already doomed
victims out of the child lists.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-30 18:31:41 -04:00
Linus Torvalds
10f3291a1d Merge branch 'akpm' (fixes from Andrew Morton)
Merge patches from Andrew Morton:
 "22 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  kexec: purgatory: add clean-up for purgatory directory
  Documentation/kdump/kdump.txt: add ARM description
  flush_icache_range: export symbol to fix build errors
  tools: selftests: fix build issue with make kselftests target
  ocfs2: quorum: add a log for node not fenced
  ocfs2: o2net: set tcp user timeout to max value
  ocfs2: o2net: don't shutdown connection when idle timeout
  ocfs2: do not write error flag to user structure we cannot copy from/to
  x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatory
  drivers/rtc/rtc-s5m.c: re-add support for devices without irq specified
  xattr: fix check for simultaneous glibc header inclusion
  kexec: remove CONFIG_KEXEC dependency on crypto
  kexec: create a new config option CONFIG_KEXEC_FILE for new syscall
  x86,mm: fix pte_special versus pte_numa
  hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked
  mm/zpool: use prefixed module loading
  zram: fix incorrect stat with failed_reads
  lib: turn CONFIG_STACKTRACE into an actual option.
  mm: actually clear pmd_numa before invalidating
  memblock, memhotplug: fix wrong type in memblock_find_in_range_node().
  ...
2014-08-29 16:28:29 -07:00
Junxiao Bi
8c7b638cec ocfs2: quorum: add a log for node not fenced
For debug use, we can see from the log whether the fence decision is
made and why it is not fenced.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:17 -07:00
Junxiao Bi
8e9801dfe3 ocfs2: o2net: set tcp user timeout to max value
When tcp retransmit timeout(15mins), the connection will be closed.
Pending messages may be lost during this time.  So we set tcp user
timeout to override the retransmit timeout to the max value.  This is OK
for ocfs2 since we have disk heartbeat, if peer crash, the disk
heartbeat will timeout and it will be evicted, if disk heartbeat not
timeout and connection idle for a long time, then this means the cluster
enters split-brain state, since fence can't happen, we'd better keep the
connection and wait network recover.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:16 -07:00
Junxiao Bi
c43c363def ocfs2: o2net: don't shutdown connection when idle timeout
This patch series is to fix a possible message lost bug in ocfs2 when
network go bad.  This bug will cause ocfs2 hung forever even network
become good again.

The messages may lost in this case.  After the tcp connection is
established between two nodes, an idle timer will be set to check its
state periodically, if no messages are received during this time, idle
timer will timeout, it will shutdown the connection and try to
reconnect, so pending messages in tcp queues will be lost.  This
messages may be from dlm.  Dlm may get hung in this case.  This may
cause the whole ocfs2 cluster hung.

This is very possible to happen when network state goes bad.  Do the
reconnect is useless, it will fail if network state is still bad.  Just
waiting there for network recovering may be a good idea, it will not
lost messages and some node will be fenced until cluster goes into
split-brain state, for this case, Tcp user timeout is used to override
the tcp retransmit timeout.  It will timeout after 25 days, user should
have notice this through the provided log and fix the network, if they
don't, ocfs2 will fall back to original reconnect way.

This patch (of 3):

Some messages in the tcp queue maybe lost if we shutdown the connection
and reconnect when idle timeout.  If packets lost and reconnect success,
then the ocfs2 cluster maybe hung.

To fix this, we can leave the connection there and do the fence decision
when idle timeout, if network recover before fence dicision is made, the
connection survive without lost any messages.

This bug can be saw when network state go bad.  It may cause ocfs2 hung
forever if some packets lost.  With this fix, ocfs2 will recover from
hung if network becomes good again.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:16 -07:00
Ben Hutchings
2b462638e4 ocfs2: do not write error flag to user structure we cannot copy from/to
If we failed to copy from the structure, writing back the flags leaks 31
bits of kernel memory (the rest of the ir_flags field).

In any case, if we cannot copy from/to the structure, why should we
expect putting just the flags to work?

Also make sure ocfs2_info_handle_freeinode() returns the right error
code if the copy_to_user() fails.

Fixes: ddee5cdb70 ('Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8.')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:16 -07:00
Linus Torvalds
878e580e21 NFS client fixes for 3.17
Highlights:
 - NFSv3 stable fix for another POSIX ACL regression
 - NFSv4 stable fix for a regression with OPEN_DOWNGRADE
 - NFSv4 stable fix for bad close() behaviour when holding a delegation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUANWVAAoJEGcL54qWCgDygyQQALF755JgqEjVy+uRjmqoXn/q
 4Gc7fGEkSfGbP8BO0dJ9Qs4IX7WQTFNUw1x3xqR+wBrvjCFbQeMclI2XIBwYarBl
 7zNnBK9NpmnNh94cataR9WANTmHMm+3xSA3UmK7OnovFOSDviJpKMa3AkRIrJMX5
 ZKYvwN2CigcIefYwtQ2NqkDt0CGbt53zYavQ4hp+//LexaN5z0f2krVj8pPwquYZ
 3JX1sm+C6bKxyTbAyJ8cWCWnJ/gxDOzl2ZPjtWah4G3tVpO6CF5+07xbQ8+B5KOc
 Bm434dWJFlCYSXvmRgbC9i7d7mJU2+fI0rcUP2LDeA73oKDjndsmqmtq08hmPz5K
 FfIA7gko4SJXvYzNKyuoS8j5r+LCtEqKoCCwMucVRwy33rpinmlzw68WTsUm8YtK
 0qYDeAqeuCc9ZerGMMFfkmgigAd2cWhhUnL+V5tlpCEeFRnL1+jqnRxuBhLlzgN3
 SaikZfmncB6gNR6cGwMfceo1E2AoA1GuVy0am1yPsYMhRF6OPCxaLRR53WgioXrt
 DwKUqhQtcE0qN1MII44x0Yxl0oFMTTCl279exnjyCWMpGYX/SAI9ErOpsc+QpIxJ
 wQEL+xUHOV7B6gTt4Y6GbXtL7toBLcMmT71gz6OHcTNJN0OtMuqtp9jgy1iTG7Gm
 2Gd5Su14xza8DFEaDLca
 =3xjW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highlights:
   - NFSv3 stable fix for another POSIX ACL regression
   - NFSv4 stable fix for a regression with OPEN_DOWNGRADE
   - NFSv4 stable fix for bad close() behaviour when holding a delegation"

* tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv3: Fix another acl regression
  NFSv4: Don't clear the open state when we just did an OPEN_DOWNGRADE
  NFSv4: Fix problems with close in the presence of a delegation
2014-08-29 13:04:13 -07:00
Linus Torvalds
d4f03186c8 Ext4 bug fixes for 3.17, to provide better handling of memory
allocation failures, and to fix some journaling bugs involving journal
 checksums and FALLOC_FL_ZERO_RANGE.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJT/+hGAAoJENNvdpvBGATwlU8P/02752nzboRRtqYZBxh/rP6L
 QoawhKslb516QFcwgxBpmhf/uSg0XIIakANEFSvlJksj0hcSLNmHl3SjGB6EGyu0
 1qOjgXSULFFnLGkjJ9ptzn266irQRR2AX5+mBP1T/JV6L5dRFwylCWbSElxEjobt
 WhUe0TzXjazYviItOugh8tQYKrfWlfc0UnMSOU7abastStYkROPuvUUOg0fcQCW/
 jZpgFQDKO+TmIZ/QtP26Bogz27Cthe5d1XnA9555JOyYjxpRh3HnVaZXLXOtA2nf
 eQZmDpfXCnbqORLsqDQbq1+TMMFVjudQyIgHkmMojshTc2PWGZyl/KtgxDHCoBxz
 j3a/qafUPbkqEKTLOunDggkWvOKhah7Z6ZCxzamC3d5Cy2GtjUhhp+iyllf4Tmga
 OEWIPp/5F3/UfJj/0e3fcmj8tzTP8bOgVh4xC/Iwf3wugKzeGs9iaWEs02TJpCQk
 Yu+xqhHP05MGQuMXcQbPJy+DPq3a43Y/PBlzyF9ZmvJKqs0SxRIhgDnpRXDLla/m
 a2zYkzqZBog081idgy1KSJjL1XVBjHkcMwUaZ/mOCd/ok7pAhQIYPVeJEpBylf+l
 ABtTgn8qU6+QmkaeTypY8h3OAMES/PgA+PQp46zgmPJVopov0926QuMWzXb2Wbq9
 ZFGJziWdAZos5XWnMo6A
 =e6Ou
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Ext4 bug fixes for 3.17, to provide better handling of memory
  allocation failures, and to fix some journaling bugs involving
  journal checksums and FALLOC_FL_ZERO_RANGE"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix same-dir rename when inline data directory overflows
  jbd2: fix descriptor block size handling errors with journal_csum
  jbd2: fix infinite loop when recovering corrupt journal blocks
  ext4: update i_disksize coherently with block allocation on error path
  ext4: fix transaction issues for ext4_fallocate and ext_zero_range
  ext4: fix incorect journal credits reservation in ext4_zero_range
  ext4: move i_size,i_disksize update routines to helper function
  ext4: fix BUG_ON in mb_free_blocks()
  ext4: propagate errors up to ext4_find_entry()'s callers
2014-08-29 11:52:46 -07:00
Jaegeuk Kim
3304b56401 f2fs: fix wrong casting for dentry name
The dentry name type is unsigned char *.
If we don't match this type, some character codes can be changed by signed bit.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-29 00:26:50 -07:00
Darrick J. Wong
d80d448c6c ext4: fix same-dir rename when inline data directory overflows
When performing a same-directory rename, it's possible that adding or
setting the new directory entry will cause the directory to overflow
the inline data area, which causes the directory to be converted to an
extent-based directory.  Under this circumstance it is necessary to
re-read the directory when deleting the old dirent because the "old
directory" context still points to i_block in the inode table, which
is now an extent tree root!  The delete fails with an FS error, and
the subsequent fsck complains about incorrect link counts and
hardlinked directories.

Test case (originally found with flat_dir_test in the metadata_csum
test program):

# mkfs.ext4 -O inline_data /dev/sda
# mount /dev/sda /mnt
# mkdir /mnt/x
# touch /mnt/x/changelog.gz /mnt/x/copyright /mnt/x/README.Debian
# sync
# for i in /mnt/x/*; do mv $i $i.longer; done
# ls -la /mnt/x/
total 0
-rw-r--r-- 1 root root 0 Aug 25 12:03 changelog.gz.longer
-rw-r--r-- 1 root root 0 Aug 25 12:03 copyright
-rw-r--r-- 1 root root 0 Aug 25 12:03 copyright.longer
-rw-r--r-- 1 root root 0 Aug 25 12:03 README.Debian.longer

(Hey!  Why are there four files now??)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-28 22:22:29 -04:00
Darrick J. Wong
db9ee22036 jbd2: fix descriptor block size handling errors with journal_csum
It turns out that there are some serious problems with the on-disk
format of journal checksum v2.  The foremost is that the function to
calculate descriptor tag size returns sizes that are too big.  This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags.

Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.

Add a few function helpers so we don't have to open-code quite so
many pieces.

Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-28 22:22:29 -04:00
Darrick J. Wong
022eaa7517 jbd2: fix infinite loop when recovering corrupt journal blocks
When recovering the journal, don't fall into an infinite loop if we
encounter a corrupt journal block.  Instead, just skip the block and
return an error, which fails the mount and thus forces the user to run
a full filesystem fsck.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-28 22:22:28 -04:00
Dmitry Monakhov
6603120e96 ext4: update i_disksize coherently with block allocation on error path
In case of delalloc block i_disksize may be less than i_size. So we
have to update i_disksize each time we allocated and submitted some
blocks beyond i_disksize.  We weren't doing this on the error paths,
so fix this.

testcase: xfstest generic/019

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-28 22:20:41 -04:00
Dan Carpenter
922cedbd00 f2fs: simplify by using a literal
We can make the code a bit simpler because we know that "!retry" is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-28 09:25:29 -07:00
Dmitry Monakhov
c174e6d697 ext4: fix transaction issues for ext4_fallocate and ext_zero_range
After commit f282ac19d8 we use different transactions for
preallocation and i_disksize update which result in complain from fsck
after power-failure.  spotted by generic/019. IMHO this is regression
because fs becomes inconsistent, even more 'e2fsck -p' will no longer
works (which drives admins go crazy) Same transaction requirement
applies ctime,mtime updates

testcase: xfstest generic/019

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-27 18:40:00 -04:00
Dmitry Monakhov
69dc953640 ext4: fix incorect journal credits reservation in ext4_zero_range
Currently we reserve only 4 blocks but in worst case scenario
ext4_zero_partial_blocks() may want to zeroout and convert two
non adjacent blocks.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-27 18:33:49 -04:00
Linus Torvalds
1fb00cbca0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The biggest of these comes from Liu Bo, who tracked down a hang we've
  been hitting since moving to kernel workqueues (it's a btrfs bug, not
  in the generic code).  His patch needs backporting to 3.16 and 3.15
  stable, which I'll send once this is in.

  Otherwise these are assorted fixes.  Most were integrated last week
  during KS, but I wanted to give everyone the chance to test the
  result, so I waited for rc2 to come out before sending"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: fix task hang under heavy compressed write
  Btrfs: fix filemap_flush call in btrfs_file_release
  Btrfs: fix crash on endio of reading corrupted block
  btrfs: fix leak in qgroup_subtree_accounting() error path
  btrfs: Use right extent length when inserting overlap extent map.
  Btrfs: clone, don't create invalid hole extent map
  Btrfs: don't monopolize a core when evicting inode
  Btrfs: fix hole detection during file fsync
  Btrfs: ensure tmpfile inode is always persisted with link count of 0
  Btrfs: race free update of commit root for ro snapshots
  Btrfs: fix regression of btrfs device replace
  Btrfs: don't consider the missing device when allocating new chunks
  Btrfs: Fix wrong device size when we are resizing the device
  Btrfs: don't write any data into a readonly device when scrub
  Btrfs: Fix the problem that the replace destroys the seed filesystem
  btrfs: Return right extent when fiemap gives unaligned offset and len.
  Btrfs: fix wrong extent mapping for DirectIO
  Btrfs: fix wrong write range for filemap_fdatawrite_range()
  Btrfs: fix wrong missing device counter decrease
  Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs
  ...
2014-08-27 09:14:17 -07:00
Trond Myklebust
f87d928f6d NFSv3: Fix another acl regression
When creating a new object on the NFS server, we should not be sending
posix setacl requests unless the preceding posix_acl_create returned a
non-trivial acl. Doing so, causes Solaris servers in particular to
return an EINVAL.

Fixes: 013cdf1088 (nfs: use generic posix ACL infrastructure,,,)
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132786
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-26 16:17:48 -04:00
Trond Myklebust
412f6c4c26 NFSv4: Don't clear the open state when we just did an OPEN_DOWNGRADE
If we did an OPEN_DOWNGRADE, then the right thing to do on success, is
to apply the new open mode to the struct nfs4_state. Instead, we were
unconditionally clearing the state, making it appear to our state
machinery as if we had just performed a CLOSE.

Fixes: 226056c5c3 (NFSv4: Use correct locking when updating nfs4_state...)
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-26 16:17:48 -04:00
Trond Myklebust
aee7af356e NFSv4: Fix problems with close in the presence of a delegation
In the presence of delegations, we can no longer assume that the
state->n_rdwr, state->n_rdonly, state->n_wronly reflect the open
stateid share mode, and so we need to calculate the initial value
for calldata->arg.fmode using the state->flags.

Reported-by: James Drews <drews@engr.wisc.edu>
Fixes: 88069f77e1 (NFSv41: Fix a potential state leakage when...)
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-26 16:17:48 -04:00
Linus Torvalds
f01bfc977e NFS client fixes for 3.17
Highlights:
 
 - More fixes for read/write codepath regressions
   - Sleeping while holding the inode lock
   - Stricter enforcement of page contiguity when coalescing requests
   - Fix up error handling in the page coalescing code
 - Don't busy wait on SIGKILL in the file locking code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT+0LpAAoJEGcL54qWCgDyWfsP/imrpge47aZywi95chV8vgjM
 O85ITZbupTFwXbB7kE63CrcaxRGhFrSStk4UDhDCDkHfFb1ksjZaPR1mnkwvkR2p
 4+JUoq0fkPfeX21+rqKCYmnhstpne/N8K8FJBsEs3/TqiCBWxWOelLXdyWun4H5B
 9JBYQ7FYitUazeSiSiDXcl7Di/E09cFPi0H5VPKRyuNdYxySabnsBOELBE/28iXr
 egW1I9UKQR2EtBrvgazBbWE5XmB9XAm4X3sD1l0QD65mfSNkbnNhPFSiCdT7f/d6
 9uxECR0Y4wNYgYAfVLBew5/MXJajcv03BFMKmTUeGj9fOQzycpBT4Dx2KxEWqfnt
 Xk2nNbISxBnO0koMflmo+LPv2lv+Br3kQ+eZCHHKknvBrX2a6bJdTCZkwACVtND9
 LdbAveFQpdaeLrm/28TnRoE927r+VeAVM19yOSG8sNAskFFg4Yy51tR0e1GivkJT
 +qmmTRx+l78HjHvoPXOYdNgBC954r6APH5ST7su/7WxNClM36fEK6XxA9xbDLJWm
 wUzlGKvpwEeBJJhgjbQLwuU8BiksjFz/CaiObNvPOpc/d2GoKIhnTg19kNhg2R//
 UCDa2d5fep4z0Bo9p0s1KZm9pSBkkLjvRp9dm8WEIxLcdaF1jBK3dJECepm6ccvw
 dmEmEfjbMudVdt/ZhapJ
 =2wRt
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highlights:

   - more fixes for read/write codepath regressions
     * sleeping while holding the inode lock
     * stricter enforcement of page contiguity when coalescing requests
     * fix up error handling in the page coalescing code

   - don't busy wait on SIGKILL in the file locking code"

* tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait
  nfs: can_coalesce_requests must enforce contiguity
  nfs: disallow duplicate pages in pgio page vectors
  nfs: don't sleep with inode lock in lock_and_join_requests
  nfs: fix error handling in lock_and_join_requests
  nfs: use blocking page_group_lock in add_request
  nfs: fix nonblocking calls to nfs_page_group_lock
  nfs: change nfs_page_group_lock argument
2014-08-25 15:34:28 -07:00
Jaegeuk Kim
c2e69583a4 f2fs: truncate stale block for inline_data
This verifies to truncate any allocated blocks, offset[0], by inline_data.
Not figured out, but for making sure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-25 14:52:09 -07:00