Commit Graph

22283 Commits

Author SHA1 Message Date
Bob Peterson
556bb17998 GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc
The previous patches made function gfs2_dir_exhash_dealloc do nothing
but call function foreach_leaf.  This patch simplifies the code by
moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:44 +01:00
Bob Peterson
ec038c826b GFS2: pass leaf_bh into leaf_dealloc
Function foreach_leaf used to look up the leaf block address and get
a buffer_head.  Then it would call leaf_dealloc which did the same
lookup.  This patch combines the two operations by making foreach_leaf
pass the leaf bh to leaf_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:26 +01:00
Bob Peterson
d24a7a439a GFS2: Combine transaction from gfs2_dir_exhash_dealloc
At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
type to "file" to prevent directory corruption in case of a crash.
It was doing so in its own journal transaction.  This patch makes the
change occur when the last call is make to leaf_dealloc, since it needs
to rewrite the directory dinode at that time anyway.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:56 +01:00
Bob Peterson
0d95326d9b GFS2: remove *leaf_call_t and simplify leaf_dealloc
Since foreach_leaf is only called with leaf_dealloc as its only possible
call function, we can simplify the code by making it call leaf_dealloc
directly.  This simplifies the code and eliminates the need for
leaf_call_t, the generic call method.  This is a first small step in
simplifying the directory leaf deallocation code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:35 +01:00
Bob Peterson
95c8e17f2f GFS2: Dump better debug info if a bitmap inconsistency is detected
On rare occasions we encounter gfs2 problems where an
invalid bitmap state transition is attempted.  For example,
trying to "unlink" a free block.  In these cases, there
is really no useful information logged to debug the problem.
This patch adds more debug details that should allow us to
more closely examine the problem and possibly solve it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:12 +01:00
Linus Torvalds
f28c6179e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: filesystem hang caused by incorrect lock order
  GFS2: Don't try to deallocate unlinked inodes when mounted ro
  GFS2: directly write blocks past i_size
  GFS2: write_end error path fails to unlock transaction lock
2011-04-19 10:52:51 -07:00
Linus Torvalds
adff377bb1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
  Btrfs: fix free space cache leak
  Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
  Btrfs end_bio_extent_readpage should look for locked bits
  Btrfs: don't force chunk allocation in find_free_extent
  Btrfs: Check validity before setting an acl
  Btrfs: Fix incorrect inode nlink in btrfs_link()
  Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
  Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
  Btrfs: make uncache_state unconditional
  btrfs: using cached extent_state in set/unlock combinations
  Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
  Btrfs: fix subvolume mount by name problem when default mount subvolume is set
  fix user annotation in ioctl.c
  Btrfs: check for duplicate iov_base's when doing dio reads
  btrfs: properly handle overlapping areas in memmove_extent_buffer
  Btrfs: fix memory leaks in btrfs_new_inode()
  Btrfs: check for duplicate iov_base's when doing dio reads
  Btrfs: reuse the extent_map we found when calling btrfs_get_extent
  Btrfs: do not use async submit for small DIO io's
  Btrfs: don't split dio bios if we don't have to
  ...
2011-04-18 12:24:05 -07:00
Linus Torvalds
d8bdc59f21 proc: do proper range check on readdir offset
Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.

This is just cleanup, the previous commit fixed the real problem.

Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-18 10:36:54 -07:00
Bob Peterson
44ad37d69b GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:23:50 +01:00
Steven Whitehouse
001e8e8df4 GFS2: Don't try to deallocate unlinked inodes when mounted ro
This adds a couple of missing tests to avoid read-only nodes
from attempting to deallocate unlinked inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
2011-04-18 15:23:12 +01:00
Benjamin Marzinski
0ee532062f GFS2: directly write blocks past i_size
GFS2 was relying on the writepage code to write out the zeroed data for
fallocate.  However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
If it is, it will be ignored.  To work around this, gfs2 now calls
write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
is set, and it's writing past i_size.

This version is just a cleanup of my last version

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:52 +01:00
Bob Peterson
deab72d379 GFS2: write_end error path fails to unlock transaction lock
I did an audit of gfs2's transaction glock for bugzilla bug
658619 and ran across this:

In function gfs2_write_end, in the unlikely event that
gfs2_meta_inode_buffer returns an error, the code may forget
to unlock the transaction lock because the "failed" label
appears after the call to function gfs2_trans_end.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:35 +01:00
Chris Mason
f65647c29b Btrfs: fix free space cache leak
The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.

One loop was missed though, so it ended up leaking pages.  This fixes
it to use our page array instead of find_get_page.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-18 08:55:34 -04:00
Milton Miller
fff3e5ade4 fs: synchronize_rcu when unregister_filesystem success not failure
While checking unregister_filesystem for saftey vs extra calls for
"ext4: register ext2 and ext3 alias after ext4" I realized that
the synchronize_rcu() was called on the error path but not on
the success path.

Cc: stable (2.6.38)
Signed-off-by: Milton Miller <miltonm@bga.com>
[ This probably won't really make a difference since commit d863b50ab0
  ("vfs: call rcu_barrier after ->kill_sb()"), but it's the right thing
  to do.  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-17 10:42:01 -07:00
Josef Bacik
6d74119f1a Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
Everytime we try to allocate disk space we try and see if we can pre-emptively
allocate a chunk, but in the common case we don't allocate anything, so there is
no sense in taking the chunk_mutex at all.  So instead if we are allocating a
chunk, mark it in the space_info so we don't get two people trying to allocate
at the same time.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
2011-04-16 07:10:56 -04:00
Chris Mason
0d399205ed Btrfs end_bio_extent_readpage should look for locked bits
A recent commit caches the extent state in end_bio_extent_readpage,
but the search it does should look for locked extents.  This
fixes things to make it more effective.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-16 06:55:39 -04:00
Linus Torvalds
0ebc115da3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  net/9p: nwname should be an unsigned int
  9p: Fix sparse error
  fs/9p: Fix error reported by coccicheck
  9p: revert tsyncfs related changes
  fs/9p: Use write_inode for data sync on server
  fs/9p: Fix revalidate to return correct value
2011-04-15 20:31:15 -07:00
Tim Chen
c1530019e3 vfs: Fix absolute RCU path walk failures due to uninitialized seq number
During RCU walk in path_lookupat and path_openat, the rcu lookup
frequently failed if looking up an absolute path, because when root
directory was looked up, seq number was not properly set in nameidata.

We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in
directory entry's seq number.  We reverted to slow path walk that need
to take references.

With the following patch, I saw a 50% increase in an exim mail server
benchmark throughput on a 4-socket Nehalem-EX system.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@kernel.org (v2.6.38)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-15 15:28:12 -07:00
Aneesh Kumar K.V
936bb2d703 fs/9p: Fix error reported by coccicheck
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15 15:26:14 -05:00
Aneesh Kumar K.V
df5d8c80f1 9p: revert tsyncfs related changes
Now that we use write_inode to flush server
cache related to fid, we don't need tsyncfs either fort dotl or dotu
protocols. For dotu this helps to do a more efficient server flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15 15:26:14 -05:00
Aneesh Kumar K.V
c2ed388021 fs/9p: Use write_inode for data sync on server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15 15:26:14 -05:00
Aneesh Kumar K.V
f2eda2c6cc fs/9p: Fix revalidate to return correct value
revalidate should return > 0 on success. Also return 0 on ENOENT
to force do_revalidate to return NULL dentry;

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-04-15 15:26:13 -05:00
Chris Mason
0e4f8f8888 Btrfs: don't force chunk allocation in find_free_extent
find_free_extent likes to allocate in contiguous clusters,
which makes writeback faster, especially on SSD storage.  As
the FS fragments, these clusters become harder to find and we have
to decide between allocating a new chunk to make more clusters
or giving up on the cluster to allocate from the free space
we have.

Right now it creates too many chunks, and you can end up with
a whole FS that is mostly empty metadata chunks.  This commit
changes the allocation code to be more strict and only
allocate new chunks when we've made good use of the chunks we
already have.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-15 16:05:44 -04:00
Linus Torvalds
a970f5d513 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix compilation warnings when compiling with gcc 4.5
  UBIFS: fix oops when R/O file-system is fsync'ed
2011-04-15 07:44:22 -07:00
Linus Torvalds
7ebfa57f6d vfs: fix incorrect dentry_update_name_case() BUG_ON() test
The case we should be verifying when updating the dentry name is that
the _parent_ inode (the directory) semaphore is held, not the semaphore
for the dentry itself.  It's the directory locking that rename and
readdir() etc all care about.

The comment just above even says so - but then the BUG_ON() still
checked the dentry inode itself.

Very few people noticed, because this helper function really isn't used
for very much, so you had to be using ncpfs to ever hit it.

I think I should just remove the BUG_ON (the function really has just
one user), but let's run with it fixed for a while before getting rid of
it entirely.

Reported-and-tested-by: Bongani Hlope <bonganih@bankservafrica.com>
Reported-and-tested-by: Bernd Feige <bernd.feige@uniklinik-freiburg.de>
Cc: Petr Vandrovec <petr@vandrovec.name>,
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-15 07:34:26 -07:00
Bob Liu
b836aec53e ramfs: fix memleak on no-mmu arch
On no-mmu arch, there is a memleak during shmem test.  The cause of this
memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2
which makes iput() can't free that pages.

The simple test file is like this:

  int main(void)
  {
	int i;
	key_t k = ftok("/etc", 42);

	for ( i=0; i<100; ++i) {
		int id = shmget(k, 10000, 0644|IPC_CREAT);
		if (id == -1) {
			printf("shmget error\n");
		}
		if(shmctl(id, IPC_RMID, NULL ) == -1) {
			printf("shm  rm error\n");
			return -1;
		}
	}
	printf("run ok...\n");
	return 0;
  }

And the result:

  root:/> free
               total         used         free       shared      buffers
  Mem:         60320        17912        42408            0            0
  -/+ buffers:              17912        42408
  root:/> shmem
  run ok...
  root:/> free
               total         used         free       shared      buffers
  Mem:         60320        19096        41224            0            0
  -/+ buffers:              19096        41224
  root:/> shmem
  run ok...
  root:/> free
               total         used         free       shared      buffers
  Mem:         60320        20296        40024            0            0
  -/+ buffers:              20296        40024
  ...

After this patch the test result is:(no memleak anymore)

  root:/> free
               total         used         free       shared      buffers
  Mem:         60320        16668        43652            0            0
  -/+ buffers:              16668        43652
  root:/> shmem
  run ok...
  root:/> free
               total         used         free       shared      buffers
  Mem:         60320        16668        43652            0            0
  -/+ buffers:              16668        43652

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14 16:06:56 -07:00
Jeff Mahoney
ed5afeaf42 fs/fhandle.c: add <linux/personality.h> for ia64
force_o_largefile() on ia64 is defined in <asm/fcntl.h> and requires
<linux/personality.h>.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14 16:06:56 -07:00
Jiri Kosina
4471a675df brk: COMPAT_BRK: fix detection of randomized brk
5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK")
tried to get the whole logic of brk randomization for legacy
(libc5-based) applications finally right.

It turns out that the way to detect whether brk has actually been
randomized in the end or not introduced by that patch still doesn't work
for those binaries, as reported by Geert:

: /sbin/init from my old m68k ramdisk exists prematurely.
:
: Before the patch:
:
: | brk(0x80005c8e)                         = 0x80006000
:
: After the patch:
:
: | brk(0x80005c8e)                         = 0x80005c8e
:
: Old libc5 considers brk() to have failed if the return value is not
: identical to the requested value.

I don't like it, but currently see no better option than a bit flag in
task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2
case.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14 16:06:55 -07:00
Timo Warns
c340b1d640 fs/partitions/ldm.c: fix oops caused by corrupted partition table
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions.
A kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.

The patch validates the value of vblk_size.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Richard Russon <rich@flatcap.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-14 16:06:54 -07:00
Maksim Rayskiy
1dcffad741 UBIFS: fix compilation warnings when compiling with gcc 4.5
When compiling UBIFS with CONFIG_UBIFS_FS_DEBUG not set,
gcc-4.5.2 generates a slew of "warning: statement with no effect"
on references to non-void functions defined as 0.
To avoid these warnings, replace #defines with dummy inline functions.

Artem: massage the patch a bit, also remove the duplicate
       'dbg_check_lprops()' prototype.

Signed-off-by: Maksim Rayskiy <maksim.rayskiy@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-04-13 11:59:09 +03:00
Artem Bityutskiy
78530bf7f2 UBIFS: fix oops when R/O file-system is fsync'ed
This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an
file on R/O-mounter file-system. We (the UBIFS authors) incorrectly
thought that VFS would not propagate 'fsync()' down to the file-system
if it is read-only, but this is not the case.

It is easy to exploit this bug using the following simple perl script:

use strict;
use File::Sync qw(fsync sync);

die "File path is not specified" if not defined $ARGV[0];
my $path = $ARGV[0];

open FILE, "<", "$path" or die "Cannot open $path: $!";
fsync(\*FILE) or die "cannot fsync $path: $!";
close FILE or die "Cannot close $path: $!";

Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this
issue.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Reuben Dowle <Reuben.Dowle@navico.com>
Cc: stable@kernel.org
2011-04-13 10:43:32 +03:00
Miao Xie
329c5056be Btrfs: Check validity before setting an acl
Call posix_acl_valid() to check if an acl is valid or not.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13 14:25:35 +08:00
Miao Xie
3153495d8e Btrfs: Fix incorrect inode nlink in btrfs_link()
Link count of the inode is not decreased if btrfs_set_inode_index()
fails.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Singed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13 14:25:32 +08:00
Li Zefan
b9e03af0bc Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.

This also simplifies how we walk the btree path.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13 14:25:31 +08:00
Li Zefan
2e6a00356a Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.

This also simplifies how we walk the btree path.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-04-13 14:25:28 +08:00
Chris Mason
109b36a2bb Btrfs: make uncache_state unconditional
The extent_io code can take cached pointers into the extent state trees,
and these can make lookups much faster in common operations.  The
caching only happens when specific bits are set that prevent merging
and splitting of the extent state.

A help function was added to uncache the state, and it was testing
the same set of conditionals.  This can leak in very strange corner
cases where the lock bit goes away unexpectedly.

The uncaching should be unconditional.  Once we have a ref on the
extent we should always give it up.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-12 20:51:26 -04:00
Linus Torvalds
6faf9a5415 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3)
  [CIFS] Warn on requesting default security (ntlm) on mount
  [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
  cifs: wrap received signature check in srv_mutex
  cifs: clean up various nits in unicode routines (try #2)
  cifs: clean up length checks in check2ndT2
  cifs: set ra_pages in backing_dev_info
  cifs: fix broken BCC check in is_valid_oplock_break
  cifs: always do is_path_accessible check in cifs_mount
  various endian fixes to cifs
  Elminate sparse __CHECK_ENDIAN__ warnings on port conversion
  Max share size is too small
  Allow user names longer than 32 bytes
  cifs: replace /proc/fs/cifs/Experimental with a module parm
  cifs: check for private_data before trying to put it
2011-04-12 15:24:42 -07:00
Dave Chinner
0d88f6e804 nfs: don't call __mark_inode_dirty while holding i_lock
nfs_scan_commit() is called with the inode->i_lock held, but it then
calls __mark_inode_dirty() while still holding the lock. This causes
a deadlock.

Push the inode->i_lock into nfs_scan_commit() so it can protect only
the parts of the code it needs to and can be dropped before the call
to __mark_inode_dirty() to avoid the deadlock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Will Simoneau <simoneau@ele.uri.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12 14:17:24 -07:00
Linus Torvalds
be85bccaa5 Revert "vfs: Export file system uuid via /proc/<pid>/mountinfo"
This reverts commit 93f1c20bc8.

It turns out that libmount misparses it because it adds a '-' character
in the uuid string, which libmount then incorrectly confuses with the
separator string (" - ") at the end of all the optional arguments.

Upstream libmount (in the util-linux tree) has been fixed, but until
that fix actually percolates up to users, we'd better not expose this
change in the kernel.

Let's revisit this later (possibly by exposing the UUID without any '-'
characters in it, avoiding the user-space bug).

Reported-by: Dave Jones <davej@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karel Zak <kzak@redhat.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12 13:35:56 -07:00
Jeff Layton
ca83ce3d5b cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3)
This is more or less the same patch as before, but with some merge
conflicts fixed up.

If a process has a dirty page mapped into its page tables, then it has
the ability to change it while the client is trying to write the data
out to the server. If that happens after the signature has been
calculated then that signature will then be wrong, and the server will
likely reset the TCP connection.

This patch adds a page_mkwrite handler for CIFS that simply takes the
page lock. Because the page lock is held over the life of writepage and
writepages, this prevents the page from becoming writeable until
the write call has completed.

With this, we can also remove the "sign_zero_copy" module option and
always inline the pages when writing.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 14:19:55 +00:00
Steve French
d9b9420137 [CIFS] Warn on requesting default security (ntlm) on mount
Warn once if default security (ntlm) requested. We will
update the default to the stronger security mechanism
(ntlmv2) in 2.6.41.  Kerberos is also stronger than
ntlm, but more servers support ntlmv2 and ntlmv2
does not require an upcall, so ntlmv2 is a better
default.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 01:27:45 +00:00
Steve French
fd88ce9313 [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood
When the TCP_Server_Info is first allocated and connected, tcpStatus ==
CifsGood means that the NEGOTIATE_PROTOCOL request has completed and the
socket is ready for other calls. cifs_reconnect however sets tcpStatus
to CifsGood as soon as the socket is reconnected and the optional
RFC1001 session setup is done. We have no clear way to tell the
difference between these two states, and we need to know this in order
to know whether we can send an echo or not.

Resolve this by adding a new statusEnum value -- CifsNeedNegotiate. When
the socket has been connected but has not yet had a NEGOTIATE_PROTOCOL
request done, set it to this value. Once the NEGOTIATE is done,
cifs_negotiate_protocol will set tcpStatus to CifsGood.

This also fixes and cleans the logic in cifs_reconnect and
cifs_reconnect_tcon. The old code checked for specific states when what
it really wants to know is whether the state has actually changed from
CifsNeedReconnect.

Reported-and-Tested-by: JG <jg@cms.ac>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 01:01:14 +00:00
Jeff Layton
157c249114 cifs: wrap received signature check in srv_mutex
While testing my patchset to fix asynchronous writes, I hit a bunch
of signature problems when testing with signing on. The problem seems
to be that signature checks on receive can be running at the same
time as a process that is sending, or even that multiple receives can
be checking signatures at the same time, clobbering the same data
structures.

While we're at it, clean up the comments over cifs_calculate_signature
and add a note that the srv_mutex should be held when calling this
function.

This patch seems to fix the problems for me, but I'm not clear on
whether it's the best approach. If it is, then this should probably
go to stable too.

Cc: stable@kernel.org
Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:58:28 +00:00
Jeff Layton
581ade4d1c cifs: clean up various nits in unicode routines (try #2)
Minor revision to the original patch. Don't abuse the __le16 variable
on the stack by casting it to wchar_t and handing it off to char2uni.
Declare an actual wchar_t on the stack instead. This fixes a valid
sparse warning.

Fix the spelling of UNI_ASTERISK. Eliminate the unneeded len_remaining
variable in cifsConvertToUCS.

Also, as David Howells points out. We were better off making
cifsConvertToUCS *not* use put_unaligned_le16 since it means that we
can't optimize the mapped characters at compile time. Switch them
instead to use cpu_to_le16, and simply use put_unaligned to set them
in the string.

Reported-and-acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:57:12 +00:00
Jeff Layton
c0c7b905e9 cifs: clean up length checks in check2ndT2
Thus spake David Howells:

The code that follows this:

  	remaining = total_data_size - data_in_this_rsp;
	if (remaining == 0)
		return 0;
	else if (remaining < 0) {

generates better code if you drop the 'remaining' variable and compare
the values directly.

Clean it up per his recommendation...

Reported-and-acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:56:46 +00:00
Jeff Layton
2b6c26a0a6 cifs: set ra_pages in backing_dev_info
Commit 522440ed made cifs set backing_dev_info on the mapping attached
to new inodes. This change caused a fairly significant read performance
regression, as cifs started doing page-sized reads exclusively.

By virtue of the fact that they're allocated as part of cifs_sb_info by
kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any
readahead. This forces the normal read codepaths to use readpage instead
of readpages causing a four-fold increase in the number of read calls
with the default rsize.

Fix it by setting ra_pages in the BDI to the same value as that in the
default_backing_dev_info.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662

Cc: stable@kernel.org
Reported-and-Tested-by: Till <till2.schaefer@uni-dortmund.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:56:00 +00:00
Jeff Layton
8679b0dba7 cifs: fix broken BCC check in is_valid_oplock_break
The BCC is still __le16 at this point, and in any case we need to
use the get_bcc_le macro to make sure we don't hit alignment
problems.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:54:30 +00:00
Jeff Layton
7094564372 cifs: always do is_path_accessible check in cifs_mount
Currently, we skip doing the is_path_accessible check in cifs_mount if
there is no prefixpath. I have a report of at least one server however
that allows a TREE_CONNECT to a share that has a DFS referral at its
root. The reporter in this case was using a UNC that had no prefixpath,
so the is_path_accessible check was not triggered and the box later hit
a BUG() because we were chasing a DFS referral on the root dentry for
the mount.

This patch fixes this by removing the check for a zero-length
prefixpath.  That should make the is_path_accessible check be done in
this situation and should allow the client to chase the DFS referral at
mount time instead.

Cc: stable@kernel.org
Reported-and-Tested-by: Yogesh Sharma <ysharma@cymer.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:52:08 +00:00
Steve French
5443d130aa various endian fixes to cifs
make modules C=2 M=fs/cifs CF=-D__CHECK_ENDIAN__

Found for example:

 CHECK   fs/cifs/cifssmb.c
fs/cifs/cifssmb.c:728:22: warning: incorrect type in assignment (different base types)
fs/cifs/cifssmb.c:728:22:    expected unsigned short [unsigned] [usertype] Tid
fs/cifs/cifssmb.c:728:22:    got restricted __le16 [usertype] <noident>
fs/cifs/cifssmb.c:1883:45: warning: incorrect type in assignment (different base types)
fs/cifs/cifssmb.c:1883:45:    expected long long [signed] [usertype] fl_start
fs/cifs/cifssmb.c:1883:45:    got restricted __le64 [usertype] start
fs/cifs/cifssmb.c:1884:54: warning: restricted __le64 degrades to integer
fs/cifs/cifssmb.c:1885:58: warning: restricted __le64 degrades to integer
fs/cifs/cifssmb.c:1886:43: warning: incorrect type in assignment (different base types)
fs/cifs/cifssmb.c:1886:43:    expected unsigned int [unsigned] fl_pid
fs/cifs/cifssmb.c:1886:43:    got restricted __le32 [usertype] pid

In checking new smb2 code for missing endian conversions, I noticed
some endian errors had crept in over the last few releases into the
cifs code (symlink, ntlmssp, posix lock, and also a less problematic warning
in fscache).  A followon patch will address a few smb2 endian
problems.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:51:35 +00:00
Steve French
6da9791061 Elminate sparse __CHECK_ENDIAN__ warnings on port conversion
Ports are __be16 not unsigned short int

Eliminates the remaining fixable endian warnings:

~/cifs-2.6$ make modules C=1 M=fs/cifs CF=-D__CHECK_ENDIAN__
  CHECK   fs/cifs/connect.c
fs/cifs/connect.c:2408:23: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2408:23:    expected unsigned short *sport
fs/cifs/connect.c:2408:23:    got restricted __be16 *<noident>
fs/cifs/connect.c:2410:23: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2410:23:    expected unsigned short *sport
fs/cifs/connect.c:2410:23:    got restricted __be16 *<noident>
fs/cifs/connect.c:2416:24: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2416:24:    expected unsigned short [unsigned] [short] <noident>
fs/cifs/connect.c:2416:24:    got restricted __be16 [usertype] <noident>
fs/cifs/connect.c:2423:24: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2423:24:    expected unsigned short [unsigned] [short] <noident>
fs/cifs/connect.c:2423:24:    got restricted __be16 [usertype] <noident>
fs/cifs/connect.c:2326:23: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2326:23:    expected unsigned short [unsigned] sport
fs/cifs/connect.c:2326:23:    got restricted __be16 [usertype] sin6_port
fs/cifs/connect.c:2330:23: warning: incorrect type in assignment (different base types)
fs/cifs/connect.c:2330:23:    expected unsigned short [unsigned] sport
fs/cifs/connect.c:2330:23:    got restricted __be16 [usertype] sin_port
fs/cifs/connect.c:2394:22: warning: restricted __be16 degrades to integer

Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-12 00:49:08 +00:00