Commit Graph

26714 Commits

Author SHA1 Message Date
Jeff Layton
3af9d8f227 cifs: fix offset handling in cifs_iovec_write
In the recent update of the cifs_iovec_write code to use async writes,
the handling of the file position was broken. That patch added a local
"offset" variable to handle the offset, and then only updated the
original "*poffset" before exiting.

Unfortunately, it copied off the original offset from the beginning,
instead of doing so after generic_write_checks had been called. Fix
this by moving the initialization of "offset" after that in the
function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-19 22:16:33 -05:00
Linus Torvalds
c6f5c93098 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "One bugfix, and one minor header fix from Jeff Layton while we're
  here"

* 'for-3.4' of git://linux-nfs.org/~bfields/linux:
  nfsd: include cld.h in the headers_install target
  nfsd: don't fail unchecked creates of non-special files
2012-04-19 14:54:52 -07:00
Trond Myklebust
451146be93 NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
If the file wasn't opened for writing, then truncate and ftruncate
need to report the appropriate errors.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:09 -04:00
Trond Myklebust
55725513b5 NFSv4: Ensure that we check lock exclusive/shared type against open modes
Since we may be simulating flock() locks using NFS byte range locks,
we can't rely on the VFS having checked the file open mode for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:08 -04:00
Trond Myklebust
05ffe24f52 NFSv4: Ensure that the LOCK code sets exception->inode
All callers of nfs4_handle_exception() that need to handle
NFS4ERR_OPENMODE correctly should set exception->inode

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:00 -04:00
Linus Torvalds
dbfad21422 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use flexible array in fuse.h
  fuse: allow nanosecond granularity
  fuse: O_DIRECT support for files
  fuse: fix nlink after unlink
2012-04-18 17:29:05 -07:00
Stefan Behrens
25cd999e1a Btrfs: fix that check_int_data mount option was ignored
The bitfield member mount_opt was too small by one bit to hold the mount
option that enabled to include data extents in the integrity checker.
Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR
option was added (git rebase silently merges so that the increase of the
size of the bitfield member is lost), the bit limit was removed entirely.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18 19:22:38 +02:00
Stefan Behrens
5c84fc3c39 Btrfs: don't count CRC or header errors twice while scrubbing
Each CRC or header error was counted twice, this is now fixed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18 19:22:36 +02:00
Stefan Behrens
99ba55ad69 Btrfs: fix btrfs_ioctl_dev_info() crash on missing device
When a filesystem is mounted with the degraded option, it is
possible that some of the devices are not there.
btrfs_ioctl_dev_info() crashs in this case because the device
name is a NULL pointer. This ioctl was only used for scrub.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18 19:22:35 +02:00
Arne Jansen
b9688bb845 btrfs: don't return EINTR
It is basically a good thing if we are interruptible when waiting for
free space, but the generality in which it is implemented currently
leads to system calls being interruptible that are not documented this
way. For example git can't handle interrupted unlink(), leading to
corrupt repos under space pressure.
Instead we raise the bar to only be interruptible by SIGKILL.
Thanks to David Sterba for suggesting this.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18 19:22:33 +02:00
Dan Carpenter
253beebd5a Btrfs: double unlock bug in error handling
The caller expects this function to return with the lock held and
releases it immediately on error.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-04-18 19:22:31 +02:00
Josef Bacik
5cf1ab5613 Btrfs: always store the mirror we read the eb from
A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid.  This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success.  So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-04-18 19:22:30 +02:00
Julia Lawall
48d282326b fs/btrfs/volumes.c: add missing free_fs_devices
Free fs_devices as done in the error-handling code just below.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2012-04-18 19:22:28 +02:00
Sergei Trofimovich
8a3db1849e btrfs: fix early abort in 'remount'
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-04-18 19:22:26 +02:00
Ilya Dryomov
37db63a400 Btrfs: fix max chunk size check in chunk allocator
Fix a bug, where in case we need to adjust stripe_size so that the
length of the resulting chunk is less than or equal to max_chunk_size,
DUP chunks turn out to be only half as big as they could be.

Cc: Arne Jansen <sensille@gmx.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-04-18 19:22:25 +02:00
Jan Schmidt
b916a59adf Btrfs: add missing read locks in backref.c
iref_to_path and iterate_irefs both increment the eb's refcount to use it
after releasing the path. Both depend on consistent data remaining in the
extent buffer and need a read lock to protect it.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18 19:22:23 +02:00
Jan Schmidt
aefc1eb13e Btrfs: don't call free_extent_buffer twice in iterate_irefs
Avoid calling free_extent_buffer more than once when the iterator function
returns non-zero. The only code that uses this is scrub repair for corrupted
nodatasum blocks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18 19:22:21 +02:00
Jesper Juhl
4735fb2828 Btrfs: Make free_ipath() deal gracefully with NULL pointers
Make free_ipath() behave like most other freeing functions in the
kernel and gracefully do nothing when passed a NULL pointer.

Besides this making the bahaviour consistent with functions such as
kfree(), vfree(), btrfs_free_path() etc etc, it also fixes a real NULL
deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that
function we have this code:

...
        ipath = init_ipath(size, root, path);
        if (IS_ERR(ipath)) {
                ret = PTR_ERR(ipath);
                ipath = NULL;
                goto out;
        }
...
out:
        btrfs_free_path(path);
        free_ipath(ipath);
...

If we ever take the true branch of that 'if' statement we'll end up
passing a NULL pointer to free_ipath() which will subsequently
dereference it and we'll go "Boom" :-(
This patch will avoid that.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
2012-04-18 19:22:20 +02:00
Li Zefan
cdc6a39525 Btrfs: avoid possible use-after-free in clear_extent_bit()
clear_extent_bit()
{
    next_node = rb_next(&state->rb_node);
    ...
    clear_state_bit(state);  <-- this may free next_node
    if (next_node) {
        state = rb_entry(next_node);
        ...
    }
}

clear_state_bit() calls merge_state() which may free the next node
of the passing extent_state, so clear_extent_bit() may end up
referencing freed memory.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18 19:22:18 +02:00
Li Zefan
8e52acf704 Btrfs: retrurn void from clear_state_bit
Currently it returns a set of bits that were cleared, but this return
value is not used at all.

Moreover it doesn't seem to be useful, because we may clear the bits
of a few extent_states, but only the cleared bits of last one is
returned.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18 19:22:16 +02:00
David Sterba
871383be59 btrfs: add missing unlocks to transaction abort paths
Added in commit 49b25e0540
("btrfs: enhance transaction abort infrastructure")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
2012-04-18 19:22:14 +02:00
Liu Bo
8d082fb727 Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZE
Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE.
It will lead to hanging-on while writing something.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-04-18 19:22:13 +02:00
Arne Jansen
207a232cca btrfs: don't add both copies of DUP to reada extent tree
Normally when there are 2 copies of a block, we add both to the
reada extent tree and prefetch only the one that is easier to reach.
This way we can better utilize multiple devices.
In case of DUP this makes no sense as both copies reside on the
same device.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18 19:12:44 +02:00
Arne Jansen
8c9c2bf7a3 btrfs: fix race in reada
When inserting into the radix tree returns EEXIST, get the existing
entry without giving up the spinlock in between.
There was a race for both the zones trees and the extent tree.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18 19:12:44 +02:00
Li Zefan
848cce0d41 Btrfs: avoid setting ->d_op twice
Follow those instructions, and you'll trigger a warning in the
beginning of d_set_d_op():

  # mkfs.btrfs /dev/loop3
  # mount /dev/loop3 /mnt
  # btrfs sub create /mnt/sub
  # btrfs sub snap /mnt /mnt/snap
  # touch /mnt/snap/sub
  touch: cannot touch `tmp': Permission denied

__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and
then simple_lookup() reset it to simple_dentry_operations, which
triggered the warning.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18 19:12:44 +02:00
Fred Isaman
ca138f368a NFS: check for req==NULL in nfs_try_to_update_request cleanup
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-18 11:05:49 -04:00
Linus Torvalds
592fe89806 Ext4 regression fixes for 3.4
This fixes a scalability problem reported by Andi Kleen and Tim Chen;
 they were quite secretive about the precise nature of their workload,
 but they later admitted that it only showed up when they were using a
 large sparse file, so the amount of data I/O that was needed was close
 to zero.  I'm not sure how realistic this is and it's only a
 regression if you consider changes made since 2.6.39 to be a
 "regression" vis-a-vis the policy regarding post-merge window bug
 fixes, but Linus agreed it was worth fixing, so I'm including it in
 this pull request.
 
 This also fixes the journalled quota mount options, which I
 accidentally broke while I was cleaning up the mount option handling.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPjbs2AAoJENNvdpvBGATw/4UQAOMsxRlzbMnOAmohIBmesJiB
 nTPX41dNw0QCXstMFjvSyRbUJI0NZPGg6ZDhbtoQ/c42/izZOVNd7gh7QQYHRCt2
 Oqh6WS159P04ixcAe8NgCm5B5AV/C5Er/vSOUZENBaBo430vvrZasifsyUgQnx+b
 PxXlYsfPVzaQVeSxCu/68OeiBRLcLwmKZ6MicOaWt9GCNoCsWzgU+/LNskYuscPI
 841yQL6/BE7redU2E9qoEn/xjxx57hJOj2iiIAuqGPmNmRQqq3VtvTqNZHldNBLp
 Hz5mB3zSZsPg0uwvp+OxrnpP37NQCCn1L64UFIXxqGF47mcCYw7schAoGBtqwGQS
 neGUfkzG4beKk7kojyDawvnrrvVn4iCMaIkR1ZjzjPOk+QBPagckrS2nmuObbYzJ
 l4lmHq1v8nOh0clxqJPioNp5/Y13sbpEOY4tAa6sLpzKLKXF330RNuwwrKKHB7zo
 ZhCvSwVEjmxacgPCPhFJnR3fxtjoXWR8WvJs7H+gZB/QaC8hjEYw6xvvrkw8mAiS
 RNe3cYdYAz6kOJdtJrJaMzp/1CYdHydf+0WvNYCQ/d1poPr7uU5wQY61hdm2gFxh
 owsbVAOiEFjZWJHqrRRdcg2irTpINJTS3iRe4g/ltcvYzzRSeOcZNWvkKFspq3i8
 EUMHRBxLzPMRa+gU6Unm
 =jb3f
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 regression fixes from Ted Ts'o:
 "This fixes a scalability problem reported by Andi Kleen and Tim Chen;
  they were quite secretive about the precise nature of their workload,
  but they later admitted that it only showed up when they were using a
  large sparse file, so the amount of data I/O that was needed was close
  to zero.

  I'm not sure how realistic this is and it's only a regression if you
  consider changes made since 2.6.39 to be a "regression" vis-a-vis the
  policy regarding post-merge window bug fixes, but Linus agreed it was
  worth fixing, so I'm including it in this pull request.

  This also fixes the journalled quota mount options, which I
  accidentally broke while I was cleaning up the mount option handling."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix handling of journalled quota options
  ext4: address scalability issue by removing extent cache statistics
2012-04-17 13:30:34 -07:00
Linus Torvalds
d44c6d4fa9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A bunch of endianness fixes and a couple of nfsd error value fixes.

  Speaking of endianness stuff, I'm rather tempted to slap

	ccflags-y += -D__CHECK_ENDIAN__

  in fs/Makefile, if not making it default for the entire tree; nfsd
  regressions I've caught make one hell of a pile and we'd obviously
  benefit from having that kind of stuff caught earlier..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  lockd: fix the endianness bug
  ocfs2: ->e_leaf_clusters endianness breakage
  ocfs2: ->rl_count endianness breakage
  ocfs: ->rl_used breakage on big-endian
  ocfs2: ->l_next_free_req breakage on big-endian
  btrfs: btrfs_root_readonly() broken on big-endian
  ext4: fix endianness breakage in ext4_split_extent_at()
  nfsd: fix compose_entry_fh() failure exits
  nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
  nfsd: fix endianness breakage in TEST_STATEID handling
  nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
  nfsd: fix b0rken error value for setattr on read-only mount
2012-04-17 13:21:50 -07:00
Linus Torvalds
bc0cf58ec7 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Fix number parsing in cifs_parse_mount_options
  Cleanup handling of NULL value passed for a mount option
2012-04-17 09:19:29 -07:00
Theodore Ts'o
57f73c2c89 ext4: fix handling of journalled quota options
Commit 26092bf5 broke handling of journalled quota mount options by
trying to parse argument of every mount option as a number.  Fix this
by dealing with the quota options before we call match_int().

Thanks to Jan Kara for discovering this regression.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-04-16 18:55:26 -04:00
Theodore Ts'o
9cd70b347e ext4: address scalability issue by removing extent cache statistics
Andi Kleen and Tim Chen have reported that under certain circumstances
the extent cache statistics are causing scalability problems due to
cache line bounces.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-04-16 12:16:20 -04:00
Linus Torvalds
659e45d8a0 Merge branch 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull the minimal btrfs branch from Chris Mason:
 "We have a use-after-free in there, along with errors when mount -o
  discard is enabled, and a BUG_ON(we should compile with UP more
  often)."

* 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use commit root when loading free space cache
  Btrfs: fix use-after-free in __btrfs_end_transaction
  Btrfs: check return value of bio_alloc() properly
  Btrfs: remove lock assert from get_restripe_target()
  Btrfs: fix eof while discarding extents
  Btrfs: fix uninit variable in repair_eb_io_failure
  Revert "Btrfs: increase the global block reserve estimates"
2012-04-13 19:41:27 -07:00
Al Viro
e847469bf7 lockd: fix the endianness bug
comparing be32 values for < is not doing the right thing...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 13:50:52 -04:00
Al Viro
72094e43e3 ocfs2: ->e_leaf_clusters endianness breakage
le16, not le32...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 12:31:43 -04:00
Al Viro
28748b325d ocfs2: ->rl_count endianness breakage
le16, not le32...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 12:31:41 -04:00
Al Viro
e1bf4cc620 ocfs: ->rl_used breakage on big-endian
it's le16, not le32 or le64...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 12:31:38 -04:00
Al Viro
3a251f04fe ocfs2: ->l_next_free_req breakage on big-endian
It's le16, not le32...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 12:31:37 -04:00
Al Viro
6ed3cf2cdf btrfs: btrfs_root_readonly() broken on big-endian
->root_flags is __le64 and all accesses to it go through the helpers
that do proper conversions.  Except for btrfs_root_readonly(), which
checks bit 0 as in host-endian...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 11:54:32 -04:00
Sachin Prabhu
bfa890a3cd Fix number parsing in cifs_parse_mount_options
The function kstrtoul() used to parse number strings in the mount
option parser is set to expect a base 10 number . This treats the octal
numbers passed for mount options such as file_mode as base10 numbers
leading to incorrect behavior.

Change the 'base' argument passed to kstrtoul from 10 to 0 to
allow it to auto-detect the base of the number passed.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-13 10:03:29 -05:00
Al Viro
af1584f570 ext4: fix endianness breakage in ext4_split_extent_at()
->ee_len is __le16, so assigning cpu_to_le32() to it is going to do
Bad Things(tm) on big-endian hosts...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:08 -04:00
Al Viro
efe39651f0 nfsd: fix compose_entry_fh() failure exits
Restore the original logics ("fail on mountpoints, negatives and in
case of fh_compose() failures").  Since commit 8177e (nfsd: clean up
readdirplus encoding) that got broken -
	rv = fh_compose(fhp, exp, dchild, &cd->fh);
	if (rv)
	       goto out;
	if (!dchild->d_inode)
		goto out;
	rv = 0;
out:
is equivalent to
	rv = fh_compose(fhp, exp, dchild, &cd->fh);
out:
and the second check has no effect whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:02 -04:00
Al Viro
afcf6792af nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
PTR_ERR(NULL) is going to be 0...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
02f5fde5df nfsd: fix endianness breakage in TEST_STATEID handling
->ts_id_status gets nfs errno, i.e. it's already big-endian; no need
to apply htonl() to it.  Broken by commit 174568 (NFSD: Added TEST_STATEID
operation) last year...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
04da6e9d63 nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
nfsd_open() already returns an NFS error value; only vfs_test_lock()
result needs to be fed through nfserrno().  Broken by commit 55ef12
(nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT)
three years ago...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
96f6f98501 nfsd: fix b0rken error value for setattr on read-only mount
..._want_write() returns -EROFS on failure, _not_ an NFS error value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:00 -04:00
Josef Bacik
d53ba47484 Btrfs: use commit root when loading free space cache
A user reported that booting his box up with btrfs root on 3.4 was way
slower than on 3.3 because I removed the ideal caching code.  It turns out
that we don't load the free space cache if we're in a commit for deadlock
reasons, but since we're reading the cache and it hasn't changed yet we are
safe reading the inode and free space item from the commit root, so do that
and remove all of the deadlock checks so we don't unnecessarily skip loading
the free space cache.  The user reported this fixed the slowness.  Thanks,

Tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 20:54:01 -04:00
Linus Torvalds
f5ad501006 Driver core and kobject fixes for 3.4-rc2
Here are some minor fixes for the driver core and kobjects that people have
 reported recently.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk+HTLAACgkQMUfUDdst+ylwlACcCcVRLdGIyW7Qjd1VB8hAVCSy
 27EAniaKfzOiPAkrb9718D6gfdBTVU0B
 =c1Bm
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and kobject fixes from Greg KH:
 "Here are some minor fixes for the driver core and kobjects that people
  have reported recently.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kobject: provide more diagnostic info for kobject_add_internal() failures
  sysfs: handle 'parent deleted before child added'
  sysfs: Prevent crash on unset sysfs group attributes
  sysfs: Update the name hash for an entry after changing the namespace
  drivers/base: fix compiler warning in SoC export driver - idr should be ida
  drivers/base: Remove unneeded spin_lock_init call for soc_lock
2012-04-12 15:34:33 -07:00
Linus Torvalds
ccb1ec95e9 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "The itimer removal one is not strictly a fix, but I really wanted to
  avoid a rebase of the urgent ones."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "clocksource: Load the ACPI PM clocksource asynchronously"
  clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()
  itimer: Use printk_once instead of WARN_ONCE
  nohz: Fix stale jiffies update in tick_nohz_restart()
  tick: Document TICK_ONESHOT config option
  proc: stats: Use arch_idle_time for idle and iowait times if available
  itimer: Schedule silent NULL pointer fixup in setitimer() for removal
2012-04-12 15:16:26 -07:00
Dave Jones
4edc2ca388 Btrfs: fix use-after-free in __btrfs_end_transaction
49b25e0540 introduced a use-after-free bug
that caused spurious -EIO's to be returned.

Do the check before we free the transaction.

Cc: David Sterba <dsterba@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 16:03:56 -04:00
Tsutomu Itoh
e627ee7bcd Btrfs: check return value of bio_alloc() properly
bio_alloc() has the possibility of returning NULL.
So, it is necessary to check the return value.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 16:03:56 -04:00
Ilya Dryomov
c6664b42c4 Btrfs: remove lock assert from get_restripe_target()
This fixes a regression introduced by fc67c450.  spin_is_locked() always
returns 0 on UP kernels, which caused assert in get_restripe_target() to
be fired on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now, it's not clear if it's going to be needed
in future.

Reported-by: Bobby Powers <bobbypowers@gmail.com>
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 16:03:56 -04:00
Liu Bo
b89203f74b Btrfs: fix eof while discarding extents
We miscalculate the length of extents we're discarding, and it leads to
an eof of device.

Reported-by: Daniel Blueman <daniel@quora.org>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 16:03:56 -04:00
Chris Mason
d95603b262 Btrfs: fix uninit variable in repair_eb_io_failure
We'd have to be passing bogus extent buffers for this uninit variable to
actually be used, but set it to zero just in case.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 15:55:15 -04:00
Linus Torvalds
5ba7026b44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Regression fix in mtdchar_open(), fix for a really old leak
  (almost never hit in practice - it's a b0rken failure exit in
  simple_fill_super()) and a typo fix in vfs.txt (misspelled
  method type)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  typo fix in Documentation/filesystems/vfs.txt
  dentry leak in simple_fill_super() failure exit
  fix breakage in mtdchar_open(), sanitize failure exits
2012-04-12 12:07:39 -07:00
Chris Mason
8e62c2de6e Revert "Btrfs: increase the global block reserve estimates"
This reverts commit 5500cdbe14.

We've had a number of complaints of early enospc that bisect down
to this patch.  We'll hae to fix the reservations differently.

CC: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12 13:46:48 -04:00
Sachin Prabhu
4fe9e9639d Cleanup handling of NULL value passed for a mount option
Allow blank user= and ip= mount option. Also clean up redundant
checks for NULL values since the token parser will not actually
match mount options with NULL values unless explicitly specified.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-11 22:53:02 -05:00
J. Bruce Fields
9dc4e6c4d1 nfsd: don't fail unchecked creates of non-special files
Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:49:52 -04:00
Linus Torvalds
1b6150fe82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Allow caching of rindex glock
  GFS2: Make sure rindex is uptodate before starting transactions
  GFS2: use depends instead of select in kconfig
  GFS2: put glock reference in error patch of read_rindex_entry
2012-04-11 11:04:45 -07:00
Miklos Szeredi
0a2da9b2ef fuse: allow nanosecond granularity
Derrik Pates reports that an utimensat with a NULL argument results in the
current time being sent from the kernel with 1 second granularity.

Reported-by: Derrik Pates <demon@now.ai>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-04-11 11:45:06 +02:00
Dan Williams
3a198886ab sysfs: handle 'parent deleted before child added'
In scsi at least two cases of the parent device being deleted before the
child is added have been observed.

1/ scsi is performing async scans and the device is removed prior to the
   async can thread running (can happen with an in-opportune / unlikely
   unplug during initial scan).

2/ libsas discovery event running after the parent port has been torn
   down (this is a bug in libsas).

Result in crash signatures like:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
 Stack:
  00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
  ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
  ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a

In this scenario the parent is still valid (because we have a
reference), but it has been device_del()'d which means its kobj->sd
pointer is NULL'd via:

 device_del()->kobject_del()->sysfs_remove_dir()

...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():

 return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;

This scenario is being fixed in scsi/libsas, but if other subsystems
present the same ordering the system need not immediately crash.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10 14:48:51 -07:00
Bruno Prémont
5631f2c18f sysfs: Prevent crash on unset sysfs group attributes
Do not let the kernel crash when a device is registered with
sysfs while group attributes are not set (aka NULL).

Warn about the offender with some information about the offending
device.

This would warn instead of trying NULL pointer deref like:
 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<ffffffff81152673>] internal_create_group+0x83/0x1a0
 PGD 0
 Oops: 0000 [#1] SMP
 CPU 0
 Modules linked in:

 Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc1-x86_64 #3 HP ProLiant DL360 G4
 RIP: 0010:[<ffffffff81152673>]  [<ffffffff81152673>] internal_create_group+0x83/0x1a0
 RSP: 0018:ffff88019485fd70  EFLAGS: 00010202
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
 RDX: ffff880192e99908 RSI: ffff880192e99630 RDI: ffffffff81a26c60
 RBP: ffff88019485fdc0 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff880192e99908 R11: 0000000000000000 R12: ffffffff81a16a00
 R13: ffff880192e99908 R14: ffffffff81a16900 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88019bc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper/0 (pid: 1, threadinfo ffff88019485e000, task ffff880194878000)
 Stack:
  ffff88019485fdd0 ffff880192da9d60 0000000000000000 ffff880192e99908
  ffff880192e995d8 0000000000000001 ffffffff81a16a00 ffff880192da9d60
  0000000000000000 0000000000000000 ffff88019485fdd0 ffffffff811527be
 Call Trace:
  [<ffffffff811527be>] sysfs_create_group+0xe/0x10
  [<ffffffff81376ca6>] device_add_groups+0x46/0x80
  [<ffffffff81377d3d>] device_add+0x46d/0x6a0
  ...

Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10 14:48:51 -07:00
Bob Peterson
ca9248d833 GFS2: Allow caching of rindex glock
This patch allows caching of the rindex glock. We were previously
setting the GL_NOCACHE bit when the glock was released. That forced
the rindex inode to be invalidated, which caused us to re-read
rindex at the next access. However, it caused the glock to be
unnecessarily bounced around the cluster. This patch allows
the glock to remain cached, but it still causes the rindex to be
re-read once it has been written to by gfs2_grow.

Ben and I have tested single-node gfs2_grow cases and I've tested
clustered gfs2_grow cases on my four-node cluster.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-10 13:49:53 +01:00
Tom Goff
70fa4a62e9 sysfs: Update the name hash for an entry after changing the namespace
This is needed to allow renaming network devices that have been moved
to another network namespace.

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 15:33:00 -07:00
Al Viro
640946f203 dentry leak in simple_fill_super() failure exit
d_genocide() does _not_ evict dentries; it just removes extra ref
pinning each of those.  Normally it's followed by shrinking the
tree (it's done just before generic_shutdown_super() by kill_litter_super()),
but in case of simple_fill_super() nothing of that kind will follow.
Just do shrink_dcache_parent() manually.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-09 01:39:22 -04:00
Linus Torvalds
f68e556e23 Make the "word-at-a-time" helper functions more commonly usable
I have a new optimized x86 "strncpy_from_user()" that will use these
same helper functions for all the same reasons the name lookup code uses
them.  This is preparation for that.

This moves them into an architecture-specific header file.  It's
architecture-specific for two reasons:

 - some of the functions are likely to want architecture-specific
   implementations.  Even if the current code happens to be "generic" in
   the sense that it should work on any little-endian machine, it's
   likely that the "multiply by a big constant and shift" implementation
   is less than optimal for an architecture that has a guaranteed fast
   bit count instruction, for example.

 - I expect that if architectures like sparc want to start playing
   around with this, we'll need to abstract out a few more details (in
   particular the actual unaligned accesses).  So we're likely to have
   more architecture-specific stuff if non-x86 architectures start using
   this.

   (and if it turns out that non-x86 architectures don't start using
   this, then having it in an architecture-specific header is still the
   right thing to do, of course)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-06 13:54:56 -07:00
Linus Torvalds
23f347ef63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

 3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

 4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

 5) Ipv6 multicast code uses wrong loop index, from RongQing Li.

 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

 7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

 8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

 9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't.  Let the natural sequence
    generated by the splice paths, and the assosciated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers.  Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    it's downstream can't take data any more.

14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver
    changes, re-add.  From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...
2012-04-06 10:37:38 -07:00
Eric Dumazet
35f9c09fe9 tcp: tcp_sendpages() should call tcp_push() once
commit 2f53384424 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.

We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.

Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.

For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 19:04:27 -04:00
Linus Torvalds
5d32c88f0b Merge branch 'akpm' (Andrew's patch-bomb)
Merge batch of fixes from Andrew Morton:
 "The simple_open() cleanup was held back while I wanted for laggards to
  merge things.

  I still need to send a few checkpoint/restore patches.  I've been
  wobbly about merging them because I'm wobbly about the overall
  prospects for success of the project.  But after speaking with Pavel
  at the LSF conference, it sounds like they're further toward
  completion than I feared - apparently davem is at the "has stopped
  complaining" stage regarding the net changes.  So I need to go back
  and re-review those patchs and their (lengthy) discussion."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
  memcg swap: use mem_cgroup_uncharge_swap fix
  backlight: add driver for DA9052/53 PMIC v1
  C6X: use set_current_blocked() and block_sigmask()
  MAINTAINERS: add entry for sparse checker
  MAINTAINERS: fix REMOTEPROC F: typo
  alpha: use set_current_blocked() and block_sigmask()
  simple_open: automatically convert to simple_open()
  scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
  libfs: add simple_open()
  hugetlbfs: remove unregister_filesystem() when initializing module
  drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
  fs/xattr.c:setxattr(): improve handling of allocation failures
  fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
  fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
  sysrq: use SEND_SIG_FORCED instead of force_sig()
  proc: fix mount -t proc -o AAA
2012-04-05 15:30:34 -07:00
Stephen Boyd
234e340582 simple_open: automatically convert to simple_open()
Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op.  This leads to a
proliferation of the default_open() implementation across the entire
tree.

Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().

This replacement was done with the following semantic patch:

<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}

@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Stephen Boyd
20955e891d libfs: add simple_open()
debugfs and a few other drivers use an open-coded version of
simple_open() to pass a pointer from the file to the read/write file
ops.  Add support for this simple case to libfs so that we can remove
the many duplicate copies of this simple function.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Hillf Danton
7563ec4c21 hugetlbfs: remove unregister_filesystem() when initializing module
It was introduced by d1d5e05ffd ("hugetlbfs: return error code when
initializing module") but as Al pointed out, is a bad idea.

Quoted comments from Al:
 "Note that unregister_filesystem() in module init is *always* wrong;
  it's not an issue here (it's done too early to care about and
  realistically the box is not going anywhere - it'll panic when attempt
  to exec /sbin/init fails, if not earlier), but it's a damn bad
  example.

  Consider a normal fs module.  Somebody loads it and in parallel with
  that we get a mount attempt on that fs type.  It comes between
  register and failure exits that causes unregister; at that point we
  are screwed since grabbing a reference to module as done by mount is
  enough to prevent exit, but not to prevent the failure of init.  As
  the result, module will get freed when init fails, mounted fs of that
  type be damned."

So remove it.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Andrew Morton
44c824982f fs/xattr.c:setxattr(): improve handling of allocation failures
This allocation can be as large as 64k.

 - Add __GFP_NOWARN so the a falied kmalloc() is silent

 - Fall back to vmalloc() if the kmalloc() failed

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Andrew Morton
0d08d7b7e1 fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
This allocation can be as large as 64k.  As David points out, "falling
back to vmalloc here is much better solution than failing to retreive
the attribute - it will work no matter how fragmented memory gets.  That
means we don't get incomplete backups occurring after days or months of
uptime and successful backups".

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Dave Jones
703bf2d122 fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
This size is user controllable, up to a maximum of XATTR_LIST_MAX (64k).
So it's trivial for someone to trigger a stream of order:4 page
allocation errors.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Vasiliy Kulikov
99663be772 proc: fix mount -t proc -o AAA
The proc_parse_options() call from proc_mount() runs only once at boot
time.  So on any later mount attempt, any mount options are ignored
because ->s_root is already initialized.

As a consequence, "mount -o <options>" will ignore the options.  The
only way to change mount options is "mount -o remount,<options>".

To fix this, parse the mount options unconditionally.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Bob Peterson
5e2f7d617b GFS2: Make sure rindex is uptodate before starting transactions
This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-05 10:20:10 +01:00
Linus Torvalds
43f63c8711 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Fix UNC parsing on mount
  Remove unnecessary check for NULL in password parser
  CIFS: Fix VFS lock usage for oplocked files
  Revert "CIFS: Fix VFS lock usage for oplocked files"
  cifs: writing past end of struct in cifs_convert_address()
  cifs: silence compiler warnings showing up with gcc-4.7.0
  CIFS: Fix VFS lock usage for oplocked files
2012-04-04 18:37:09 -07:00
Linus Torvalds
66cfb32772 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/p4: Add format attributes
  tracing, sched, vfs: Fix 'old_pid' usage in trace_sched_process_exec()
2012-04-04 10:04:42 -07:00
Sachin Prabhu
e4b41fb9da Fix UNC parsing on mount
The code cleanup of cifs_parse_mount_options resulted in a new bug being
introduced in the parsing of the UNC. This results in vol->UNC being
modified before vol->UNC was allocated.

Reported-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-03 20:46:09 -05:00
Sachin Prabhu
1023807458 Remove unnecessary check for NULL in password parser
The password parser has an unnecessary check for a NULL value which
triggers warnings in source checking tools. The code contains artifacts
from the old parsing code which are no longer required.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-03 18:04:35 -05:00
Pavel Shilovsky
66189be74f CIFS: Fix VFS lock usage for oplocked files
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if the second process blocked on the lock in the
same time.

Fix it by using posix_lock_file rather than posix_lock_file_wait
under cinode->lock_mutex. If we request a blocking lock and
posix_lock_file indicates that there is another lock that prevents
us, wait untill that lock is released and restart our call.

Cc: stable@kernel.org
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-04-01 13:54:27 -05:00
Steve French
9ebb389d0a Revert "CIFS: Fix VFS lock usage for oplocked files"
Revert previous version of patch to incorporate feedback
so that we can merge version 3 of the patch instead.w

This reverts commit b5efb97846.
2012-04-01 13:52:54 -05:00
Dan Carpenter
2545e0720a cifs: writing past end of struct in cifs_convert_address()
"s6->sin6_scope_id" is an int bits but strict_strtoul() writes a long
so this can corrupt memory on 64 bit systems.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-03-31 17:32:18 -05:00
Jeff Layton
b2a3ad9ca5 cifs: silence compiler warnings showing up with gcc-4.7.0
gcc-4.7.0 has started throwing these warnings when building cifs.ko.

  CC [M]  fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSSMBSetCIFSACL’:
fs/cifs/cifssmb.c:3905:9: warning: array subscript is above array bounds [-Warray-bounds]
fs/cifs/cifssmb.c: In function ‘CIFSSMBSetFileInfo’:
fs/cifs/cifssmb.c:5711:8: warning: array subscript is above array bounds [-Warray-bounds]
fs/cifs/cifssmb.c: In function ‘CIFSSMBUnixSetFileInfo’:
fs/cifs/cifssmb.c:6001:25: warning: array subscript is above array bounds [-Warray-bounds]

This patch cleans up the code a bit by using the offsetof macro instead
of the funky "&pSMB->hdr.Protocol" construct.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-03-31 17:31:25 -05:00
Pavel Shilovsky
b5efb97846 CIFS: Fix VFS lock usage for oplocked files
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if another process blocked on the lock in the
same time.

Fix this by removing lock_mutex protection from waiting on a
blocked lock and protect only posix_lock_file call.

Cc: stable@kernel.org
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-03-31 17:30:48 -05:00
Linus Torvalds
8bb1f22952 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second try at vfs part d#2 from Al Viro:
 "Miklos' first series (with do_lookup() rewrite split into edible
  chunks) + assorted bits and pieces.

  The 'untangling of do_lookup()' series is is a splitup of what used to
  be a monolithic patch from Miklos, so this series is basically "how do
  I convince myself that his patch is correct (or find a hole in it)".
  No holes found and I like the resulting cleanup, so in it went..."

Changes from try 1: Fix a boot problem with selinux, and commit messages
prettied up a bit.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  vfs: fix out-of-date dentry_unhash() comment
  vfs: split __lookup_hash
  untangling do_lookup() - take __lookup_hash()-calling case out of line.
  untangling do_lookup() - switch to calling __lookup_hash()
  untangling do_lookup() - merge d_alloc_and_lookup() callers
  untangling do_lookup() - merge failure exits in !dentry case
  untangling do_lookup() - massage !dentry case towards __lookup_hash()
  untangling do_lookup() - get rid of need_reval in !dentry case
  untangling do_lookup() - eliminate a loop.
  untangling do_lookup() - expand the area under ->i_mutex
  untangling do_lookup() - isolate !dentry stuff from the rest of it.
  vfs: move MAY_EXEC check from __lookup_hash()
  vfs: don't revalidate just looked up dentry
  vfs: fix d_need_lookup/d_revalidate order in do_lookup
  ext3: move headers to fs/ext3/
  migrate ext2_fs.h guts to fs/ext2/ext2.h
  new helper: ext2_image_size()
  get rid of pointless includes of ext2_fs.h
  ext2: No longer export ext2_fs.h to user space
  mtdchar: kill persistently held vfsmount
  ...
2012-03-31 13:42:57 -07:00
J. Bruce Fields
c0d0259481 vfs: fix out-of-date dentry_unhash() comment
64252c75a2 "vfs: remove dget() from
dentry_unhash()" changed the implementation but not the comment.

Cc: Sage Weil <sage@newdream.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:17 -04:00
Miklos Szeredi
bad6118978 vfs: split __lookup_hash
Split __lookup_hash into two component functions:

 lookup_dcache - tries cached lookup, returns whether real lookup is needed
 lookup_real - calls i_op->lookup

This eliminates code duplication between d_alloc_and_lookup() and
d_inode_lookup().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:17 -04:00
Al Viro
81e6f52089 untangling do_lookup() - take __lookup_hash()-calling case out of line.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:17 -04:00
Al Viro
a32555466c untangling do_lookup() - switch to calling __lookup_hash()
now we have __lookup_hash() open-coded if !dentry case;
just call the damn thing instead...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
a6ecdfcfba untangling do_lookup() - merge d_alloc_and_lookup() callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
ec335e91a4 untangling do_lookup() - merge failure exits in !dentry case
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
d774a058d9 untangling do_lookup() - massage !dentry case towards __lookup_hash()
Reorder if-else cases for starters...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
08b0ab7c20 untangling do_lookup() - get rid of need_reval in !dentry case
Everything arriving into if (!dentry) will have need_reval = 1.
Indeed, the only way to get there with need_reval reset to 0 would
be via
	if (unlikely(d_need_lookup(dentry)))
		goto unlazy;
	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
		status = d_revalidate(dentry, nd);
	if (unlikely(status <= 0)) {
		if (status != -ECHILD)
			need_reval = 0;
		goto unlazy;
...
unlazy:
	/* no assignments to dentry */
	if (dentry && unlikely(d_need_lookup(dentry))) {
		dput(dentry);
		dentry = NULL;
	}
and if d_need_lookup() had already been false the first time around, it
will remain false on the second call as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
acc9cb3cd4 untangling do_lookup() - eliminate a loop.
d_lookup() *will* fail after successful d_invalidate(), if we are
holding i_mutex all along.  IOW, we don't need to jump back to
l: - we know what path will be taken there and can do that (i.e.
d_alloc_and_lookup()) directly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
37c17e1f37 untangling do_lookup() - expand the area under ->i_mutex
keep holding ->i_mutex over revalidation parts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
3f6c7c71a2 untangling do_lookup() - isolate !dentry stuff from the rest of it.
Duplicate the revalidation-related parts into if (!dentry) branch.
Next step will be to pull them under i_mutex.

This and the next 8 commits are more or less a splitup of patch
by Miklos; folks, when you are working with something that convoluted,
carve your patches up into easily reviewed steps, especially when
a lot of codepaths involved are rarely hit...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Miklos Szeredi
cda309de25 vfs: move MAY_EXEC check from __lookup_hash()
The only caller of __lookup_hash() that needs the exec permission check on
parent is lookup_one_len().

All lookup_hash() callers already checked permission in LOOKUP_PARENT walk.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Miklos Szeredi
3637c05d88 vfs: don't revalidate just looked up dentry
__lookup_hash() calls ->lookup() if the dentry needs lookup and on success
revalidates the dentry (all under dir->i_mutex).

While this is harmless it doesn't make a lot of sense.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Miklos Szeredi
fa4ee15951 vfs: fix d_need_lookup/d_revalidate order in do_lookup
Doing revalidate on a dentry which has not yet been looked up makes no sense.

Move the d_need_lookup() check before d_revalidate().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
4613ad180d ext3: move headers to fs/ext3/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
f7699f2b01 migrate ext2_fs.h guts to fs/ext2/ext2.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Al Viro
2f99c36986 get rid of pointless includes of ext2_fs.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Al Viro
22a71c3055 pstore: trim pstore_get_inode()
move mode-dependent parts to callers, kill unused arguments

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Al Viro
a2e1859adb aio: take final put_ioctx() into callers of io_destroy()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Al Viro
06af121eab aio: merge aio_cancel_all() with wait_for_all_aios()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Oleg Nesterov
6308191f6f tracing, sched, vfs: Fix 'old_pid' usage in trace_sched_process_exec()
1. TRACE_EVENT(sched_process_exec) forgets to actually use the
   old pid argument, it sets ->old_pid = p->pid.

2. search_binary_handler() uses the wrong pid number. tracepoint
   needs the global pid_t from the root namespace, while old_pid
   is the virtual pid number as it seen by the tracer/parent.

With this patch we have two pid_t's in search_binary_handler(),
not really nice. Perhaps we should switch to "struct pid*", but
in this case it would be better to cleanup the current code
first and move the "depth == 0" code outside.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Smith <dsmith@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20120330162636.GA4857@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-31 11:53:22 +02:00
Linus Torvalds
623ff7739e MTD merge for 3.4
Artem's cleanup of the MTD API continues apace.
 Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst others.
 More work on DiskOnChip G3, new driver for DiskOnChip G4.
 Clean up debug/warning printks in JFFS2 to use pr_<level>.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iEYEABECAAYFAk92K6UACgkQdwG7hYl686NrMACfWQJRWasR78MWKfkT2vWZwTFJ
 X5AAoKiSYO2pfo5gWJGOAahNC1zUqMX0
 =i3Vb
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6

Pull MTD changes from David Woodhouse:
 - Artem's cleanup of the MTD API continues apace.
 - Fixes and improvements for ST FSMC and SuperH FLCTL NAND, amongst
   others.
 - More work on DiskOnChip G3, new driver for DiskOnChip G4.
 - Clean up debug/warning printks in JFFS2 to use pr_<level>.

Fix up various trivial conflicts, largely due to changes in calling
conventions for things like dmaengine_prep_slave_sg() (new inline
wrapper to hide new parameter, clashing with rewrite of previously last
parameter that used to be an 'append' flag, and is now a bitmap of
'unsigned long flags').

(Also some header file fallout - like so many merges this merge window -
and silly conflicts with sparse fixes)

* tag 'for-linus-3.4' of git://git.infradead.org/mtd-2.6: (120 commits)
  mtd: docg3 add protection against concurrency
  mtd: docg3 refactor cascade floors structure
  mtd: docg3 increase write/erase timeout
  mtd: docg3 fix inbound calculations
  mtd: nand: gpmi: fix function annotations
  mtd: phram: fix section mismatch for phram_setup
  mtd: unify initialization of erase_info->fail_addr
  mtd: support ONFI multi lun NAND
  mtd: sm_ftl: fix typo in major number.
  mtd: add device-tree support to spear_smi
  mtd: spear_smi: Remove default partition information from driver
  mtd: Add device-tree support to fsmc_nand
  mtd: fix section mismatch for doc_probe_device
  mtd: nand/fsmc: Remove sparse warnings and errors
  mtd: nand/fsmc: Add DMA support
  mtd: nand/fsmc: Access the NAND device word by word whenever possible
  mtd: nand/fsmc: Use dev_err to report error scenario
  mtd: nand/fsmc: Use devm routines
  mtd: nand/fsmc: Modify fsmc driver to accept nand timing parameters via platform
  mtd: fsmc_nand: add pm callbacks to support hibernation
  ...
2012-03-30 17:31:56 -07:00
Linus Torvalds
10f3cb41d4 Merge git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Update CIFS version number to 1.77
  CIFS: Add missed forcemand mount option
  [CIFS] Fix trivial sparse warning with asyn i/o patch
  cifs: handle "sloppy" option appropriately
  cifs: use standard token parser for mount options
  cifs: remove /proc/fs/cifs/OplockEnabled
  cifs: convert cifs_iovec_write to use async writes
  cifs: call cifs_update_eof with i_lock held
  cifs: abstract out function to marshal up the iovec array for async writes
  cifs: fix up get_numpages
  cifs: make cifsFileInfo_get return the cifsFileInfo pointer
  cifs: fix allocation in cifs_write_allocate_pages
  cifs: allow caller to specify completion op when allocating writedata
  cifs: add pid field to cifs_writedata
  cifs: add new cifsiod_wq workqueue
  CIFS: Change mid_q_entry structure fields
  CIFS: Expand CurrentMid field
  CIFS: Separate protocol-specific code from cifs_readv_receive code
  CIFS: Separate protocol-specific code from demultiplex code
  CIFS: Separate protocol-specific code from transport routines
2012-03-30 16:24:38 -07:00
Linus Torvalds
9613bebb22 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes and features from Chris Mason:
 "We've merged in the error handling patches from SuSE.  These are
  already shipping in the sles kernel, and they give btrfs the ability
  to abort transactions and go readonly on errors.  It involves a lot of
  churn as they clarify BUG_ONs, and remove the ones we now properly
  deal with.

  Josef reworked the way our metadata interacts with the page cache.
  page->private now points to the btrfs extent_buffer object, which
  makes everything faster.  He changed it so we write an whole extent
  buffer at a time instead of allowing individual pages to go down,,
  which will be important for the raid5/6 code (for the 3.5 merge
  window ;)

  Josef also made us more aggressive about dropping pages for metadata
  blocks that were freed due to COW.  Overall, our metadata caching is
  much faster now.

  We've integrated my patch for metadata bigger than the page size.
  This allows metadata blocks up to 64KB in size.  In practice 16K and
  32K seem to work best.  For workloads with lots of metadata, this cuts
  down the size of the extent allocation tree dramatically and fragments
  much less.

  Scrub was updated to support the larger block sizes, which ended up
  being a fairly large change (thanks Stefan Behrens).

  We also have an assortment of fixes and updates, especially to the
  balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
  the defragging code (Liu Bo)."

Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
removal of the second argument to k[un]map_atomic() in commit
7ac687d9e0.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
  Btrfs: update the checks for mixed block groups with big metadata blocks
  Btrfs: update to the right index of defragment
  Btrfs: do not bother to defrag an extent if it is a big real extent
  Btrfs: add a check to decide if we should defrag the range
  Btrfs: fix recursive defragment with autodefrag option
  Btrfs: fix the mismatch of page->mapping
  Btrfs: fix race between direct io and autodefrag
  Btrfs: fix deadlock during allocating chunks
  Btrfs: show useful info in space reservation tracepoint
  Btrfs: don't use crc items bigger than 4KB
  Btrfs: flush out and clean up any block device pages during mount
  btrfs: disallow unequal data/metadata blocksize for mixed block groups
  Btrfs: enhance superblock sanity checks
  Btrfs: change scrub to support big blocks
  Btrfs: minor cleanup in scrub
  Btrfs: introduce common define for max number of mirrors
  Btrfs: fix infinite loop in btrfs_shrink_device()
  Btrfs: fix memory leak in resolver code
  Btrfs: allow dup for data chunks in mixed mode
  Btrfs: validate target profiles only if we are going to use them
  ...
2012-03-30 12:44:29 -07:00
Martin Schwidefsky
cb85a6ed67 proc: stats: Use arch_idle_time for idle and iowait times if available
Git commit a25cac5198 "proc: Consider NO_HZ when printing idle and
iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
and get_cpu_iowait_time_us if the system is running with nohz enabled.
For architectures which define arch_idle_time (currently s390 only)
this is a change for the worse. The result of arch_idle_time is supposed
to be the exact sleep time of the target cpu and should be used instead
of the value kept by the scheduler.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-30 15:43:33 +02:00
Linus Torvalds
a591afc01d Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x32 support for x86-64 from Ingo Molnar:
 "This tree introduces the X32 binary format and execution mode for x86:
  32-bit data space binaries using 64-bit instructions and 64-bit kernel
  syscalls.

  This allows applications whose working set fits into a 32 bits address
  space to make use of 64-bit instructions while using a 32-bit address
  space with shorter pointers, more compressed data structures, etc."

Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  x32: Fix alignment fail in struct compat_siginfo
  x32: Fix stupid ia32/x32 inversion in the siginfo format
  x32: Add ptrace for x32
  x32: Switch to a 64-bit clock_t
  x32: Provide separate is_ia32_task() and is_x32_task() predicates
  x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
  x86/x32: Fix the binutils auto-detect
  x32: Warn and disable rather than error if binutils too old
  x32: Only clear TIF_X32 flag once
  x32: Make sure TS_COMPAT is cleared for x32 tasks
  fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
  fs: Fix close_on_exec pointer in alloc_fdtable
  x32: Drop non-__vdso weak symbols from the x32 VDSO
  x32: Fix coding style violations in the x32 VDSO code
  x32: Add x32 VDSO support
  x32: Allow x32 to be configured
  x32: If configured, add x32 system calls to system call tables
  x32: Handle process creation
  x32: Signal-related system calls
  x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
  ...
2012-03-29 18:12:23 -07:00
Linus Torvalds
6268b325c3 Revert "ext4: don't release page refs in ext4_end_bio()"
This reverts commit b43d17f319.

Dave Jones reports that it causes lockups on his laptop, and his debug
output showed a lot of processes hung waiting for page_writeback (or
more commonly - processes hung waiting for a lock that was held during
that writeback wait).

The page_writeback hint made Ted suggest that Dave look at this commit,
and Dave verified that reverting it makes his problems go away.

Ted says:
 "That commit fixes a race which is seen when you write into fallocated
  (and hence uninitialized) disk blocks under *very* heavy memory
  pressure.  Furthermore, although theoretically it could trigger under
  normal direct I/O writes, it only seems to trigger if you are issuing
  a huge number of AIO writes, such that a just-written page can get
  evicted from memory, and then read back into memory, before the
  workqueue has a chance to update the extent tree.

  This race has been around for a little over a year, and no one noticed
  until two months ago; it only happens under fairly exotic conditions,
  and in fact even after trying very hard to create a simple repro under
  lab conditions, we could only reproduce the problem and confirm the
  fix on production servers running MySQL on very fast PCIe-attached
  flash devices.

  Given that Dave was able to hit this problem pretty quickly, if we
  confirm that this commit is at fault, the only reasonable thing to do
  is to revert it IMO."

Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-29 17:00:56 -07:00
Linus Torvalds
71db34fc43 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:

Highlights:
 - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
   moving us closer to a complete 4.1 implementation.
 - Bernd Schubert fixed a long-standing problem with readdir cookies on
   ext2/3/4.
 - Jeff Layton performed a long-overdue overhaul of the server reboot
   recovery code which will allow us to deprecate the current code (a
   rather unusual user of the vfs), and give us some needed flexibility
   for further improvements.
 - Like the client, we now support numeric uid's and gid's in the
   auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.

Plus miscellaneous bugfixes and cleanup.

Thanks to everyone!

There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5.  With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).

And the list of 4.1 todo's should be achievable for 3.5 as well:

   http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

though we may still want a bit more experience with it before turning it
on by default.

* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
  nfsd4: use auth_unix unconditionally on backchannel
  nfsd: fix NULL pointer dereference in cld_pipe_downcall
  nfsd4: memory corruption in numeric_name_to_id()
  sunrpc: skip portmap calls on sessions backchannel
  nfsd4: allow numeric idmapping
  nfsd: don't allow legacy client tracker init for anything but init_net
  nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
  nfsd: add the infrastructure to handle the cld upcall
  nfsd: add a header describing upcall to nfsdcld
  nfsd: add a per-net-namespace struct for nfsd
  sunrpc: create nfsd dir in rpc_pipefs
  nfsd: add nfsd4_client_tracking_ops struct and a way to set it
  nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
  NFSD: Fix nfs4_verifier memory alignment
  NFSD: Fix warnings when NFSD_DEBUG is not defined
  nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
  ext4: return 32/64-bit dir name hash according to usage type
  fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  ...
2012-03-29 14:53:25 -07:00
Linus Torvalds
18a06efae5 Merge branch 'akpm' (Andrew's patch-bomb)
Single fix for a commit from the first batch of patches through Andrew.

* emailed from Andrew Morton <akpm@linux-foundation.org>:
  pagemap: remove remaining unneeded spin_lock()
2012-03-29 14:07:08 -07:00
Naoya Horiguchi
10bdfb5ef7 pagemap: remove remaining unneeded spin_lock()
Commit 025c5b2451 ("thp: optimize away unnecessary page table
locking") moves spin_lock() into pmd_trans_huge_lock() in order to avoid
locking unless pmd is for thp.  So this spin_lock() is a bug.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-29 14:06:43 -07:00
Chris Mason
bc3f116fec Btrfs: update the checks for mixed block groups with big metadata blocks
Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB.  But these ended up in the wrong place
and it wasn't testing the feature flag correctly.

This updates the tests to make sure our sizes are matching

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 17:02:47 -04:00
Liu Bo
e1f041e14c Btrfs: update to the right index of defragment
When we use autodefrag, we forget to update the index which indicates
the last page we've dirty.  And we'll set dirty flags on a same set of
pages again and again.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:45 -04:00
Liu Bo
66c2689226 Btrfs: do not bother to defrag an extent if it is a big real extent
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072              10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082              10 eof
/mnt/btrfs/foobar: 1 extent found

So if we already find a big real extent, we're ok about that, just skip it.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:45 -04:00
Liu Bo
17ce6ef8d7 Btrfs: add a check to decide if we should defrag the range
If our file's layout is as follows:
| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:45 -04:00
Liu Bo
4cb13e5d6e Btrfs: fix recursive defragment with autodefrag option
$ mkfs.btrfs disk
$ mount disk /mnt -o autodefrag
$ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
$ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
  seek=$i conv=notrunc 2> /dev/null; done && sync

then we'll get to defrag "foobar" again and again.
So does option "-o autodefrag,compress".

Reasons:
When the cleaner kthread gets to fetch inodes from the defrag tree and defrag
them, it will dirty pages and submit them, this will comes to another DATA COW
where the processing inode will be inserted to the defrag tree again.

This patch sets a rule for COW code, i.e. insert an inode when we're really
going to make some defragments.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:45 -04:00
Liu Bo
1f12bd0632 Btrfs: fix the mismatch of page->mapping
commit 600a45e1d5
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on page, but it also introduces another bug.

A page may have been truncated after unlock & lock.
So we need to find it again to get the right one.

And since we've held i_mutex lock, inode size remains unchanged and
we can drop isize overflow checks.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:44 -04:00
Liu Bo
ecb8bea87d Btrfs: fix race between direct io and autodefrag
The bug is from running xfstests 209 with autodefrag.

The race is as follows:
       t1                       t2(autodefrag)
   direct IO
     invalidate pagecache
     dio(old data)             add_inode_defrag
     invalidate pagecache
   endio

   direct IO
     invalidate pagecache
                                run_defrag
                                  readpage(old data)
                                  set page dirty (old data)
     dio(new data, rewrite)
     invalidate pagecache (*)
     endio

t2(autodefrag) will get old data into pagecache via readpage and set
pagecache dirty.  Meanwhile, invalidate pagecache(*) will fail due to
dirty flags in pages.  So the old data may be flushed into disk by
flush thread, which will lead to data loss.

And so does the case of user defragment progs.

The patch fixes this race by holding i_mutex when we readpage and set page dirty.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:44 -04:00
Liu Bo
15d1ff8111 Btrfs: fix deadlock during allocating chunks
This deadlock comes from xfstests 251.

We'll hold the chunk_mutex throughout the whole of a chunk allocation.
But if we find that we've used up system chunk space, we need to allocate a
new system chunk, but this will lead to a recursion of chunk allocation and end
up with a deadlock on chunk_mutex.
So instead we need to allocate the system chunk first if we find we're in ENOSPC.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:44 -04:00
Liu Bo
2bcc0328c3 Btrfs: show useful info in space reservation tracepoint
o For space info, the type of space info is useful for debug.
o For transaction handle, its transid is useful.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29 09:57:44 -04:00
Jeff Layton
797a9d797f nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
Otherwise, we get a warning or error similar to this when building with
CONFIG_NFSD_V4 disabled:

    ERROR: "nfsd4_cld_block" [fs/nfsd/nfsd.ko] undefined!

Fix this by wrapping the calls to rpc_pipefs_notifier_register and
..._unregister in another function and providing no-op replacements
when CONFIG_NFSD_V4 is disabled.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-29 08:01:07 -04:00
Linus Torvalds
afb9bd704c Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull trivial exofs changes from Boaz Harrosh:
 "Just nothingness really.  The big exofs changes are reserved for the
  next merge window."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Cap on the memcpy() size
  exofs: (trivial) Fix typo in super.c
  exofs: fix endian conversion in exofs_sync_fs()
2012-03-28 20:04:27 -07:00
Linus Torvalds
58df9b387c NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix infinite loops in the mount code
 - Fix a userspace buffer overflow in __nfs4_get_acl_uncached
 - Fix a memory leak due to a double reference count in rpcb_getport_async()
 
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPc8DPAAoJEGcL54qWCgDyxAkP/2YMxAnZ+8kzuhKi5J073y/K
 G9H8v6zV6WaMut9YeX+8sX/4xJrFPSZJoRALSdEbI9wpIGMMHjIxQD8RuY4imt1L
 +YwB3rAke+mr2YAvW/2Q6HVTV75/00Kui+Jkgsa2wm/4Fyz8PfCHe71bBLG4UPJg
 ZgauN9rCIHIQpT/sbuN5mPU5C4jbOZa959CogWyUKeMnbuts1Z3qbS+aFB0wcxAW
 279E88oqDISYLo5XQiC/NJzLKmKJ3iDl1UbjqS6T2g74i1zX+lyxl0K9MdTFmAmm
 9UQ1pjIFIMySKJI/LahY6ZXM76hGLW4TbwO0q+fmhuv4cFBM3LWZPUm3ggPyzAp1
 O8TEeMNIfkNVGRYVnC7GcPFjnUlt9ahiPzk0lQ7l0RzWfXiDt8y/EUOwhqdxULqx
 S5SwLmSKDw9bQXFiHJtqe7lnB9hrVWNUvHqE1iTC20YTtQYovYGWaT8YLP7/y/Iz
 4jSLv8VeOzA6BP1jfQkacsBucxpn0fhPjwuCOZl2ooa+8jldsr8in8nwo0nVgrJF
 PGzJvyvGSR2BT2BL+QSUPX0Pn8qe7eAFePlS8zFS/p3X8rkDNxYI51VwuyXOQQl3
 lJHUyTbdTd/G4H5ObEU+ClIhxIvqgDrawIdXrbiMonoShsaFoVkuX7IFviPiiPDE
 s7KGNtpLh45+GZsBVe6M
 =yiL7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes for Linux 3.4 from Trond Myklebust

Highlights include:
- Fix infinite loops in the mount code
- Fix a userspace buffer overflow in __nfs4_get_acl_uncached
- Fix a memory leak due to a double reference count in rpcb_getport_async()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

* tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
  NFSv4.1: Fix layoutcommit error handling
  NFSv4: Fix two infinite loops in the mount code
  SUNRPC: Use the already looked-up xprt in rpcb_getport_async()
  NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit
  Fix length of buffer copied in __nfs4_get_acl_uncached
2012-03-28 19:02:35 -07:00
Linus Torvalds
8563f8786e Add an extra mount time sanity check, plus some code cleanups and bug fixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPc65LAAoJEJAch/D1fbHUEswP/RzwQ7NFsGe+RgC7YygUn+nP
 qQma8CUBXMPNKLzGl+BGqG+TtNdBDEVGvNrHYtfn54tI7N1qliHTcMqfHv4SF1nZ
 QFKwqhBpEOFSbHQg4ts/N01Pa2Ilqw2A4L/8CBjqUkEZ8qOorI8sLp1Xb254YoVk
 G4oP+dY/YIEXZxYhIerevIkpNElkqTB2dZAZ/uhNcdHkKIRyAvqyay6F04YdGqI+
 r2JfzhPS0T70PbrBHur1ed7iAHYOtgrxgB89CS3jJ5X1iG+iK8i+Xsn18fBFOQPd
 ULaSwfdJY5xALqxrEyuO1VxP1uEGAmn2+aOPQP/KIapLmIBGaZXHjC8H3uMqGhZ6
 /Y6ZnoH4NJBANn+HXN3iwqQZ8+cw+HUgzJdyZwp6d8SEBM1KsESWeR2t+U6Zvr8L
 sLS5inXjbS3O6B07GV58liyCFLEXtmEHj3GCtnnWvp44Vjax57hbegQzKxe8C+3D
 YqBf/fx9WKIA5Ojbx5fGUaz7BQ2fczMuzrwNQB05bZAdHqSuKs3dWpGpjtKcelwp
 k1BO+kstuwE/dRiAxWZ3lpMQ9GLNmAGg1DgqWEKRMXuThwgxhXwf0sAshhGYCmL6
 IdkUqC95Be8/D5i9yxbY8TGIV4rmhV8xDR9j8cIVHtiqbuGZT1jPhXUXXJVWw/y2
 Us/awa6sC0qlL18jHV35
 =ywVL
 -----END PGP SIGNATURE-----

Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next

Pull squashfs updates from Phillip Lougher:
 "Add an extra mount time sanity check, plus some code cleanups and bug
  fixes."

* tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
  Squashfs: add mount time sanity check for block_size and block_log match
  Squashfs: fix f_pos check in get_dir_index_using_offset
  Squashfs: get rid of obsolete definitions in header file
  Squashfs: remove redundant length initialisation in squashfs_lookup
  Squashfs: remove redundant length initialisation in squashfs_readdir
  Squashfs: update comment removing reference to zlib only
  Squashfs: use define instead of constant
2012-03-28 18:05:54 -07:00
Chris Mason
7ca4be45a0 Btrfs: don't use crc items bigger than 4KB
With the big metadata blocks, we can have crc items
that are much bigger than a page.  There are a few
places that we try to kmalloc memory to hold the
items during a split.

Items bigger than 4KB don't really have a huge benefit
in efficiency, but they do trigger larger order allocations.
This commits changes the csums to make sure they stay under
4KB.  This is not a format change, just a #define to limit
huge items.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-28 20:34:10 -04:00
Chris Mason
3c4bb26b21 Btrfs: flush out and clean up any block device pages during mount
Btrfs puts the filesystem metadata into its own address space, and
somehow the block device address space isn't getting onto disk properly
before a mount.  The end result is that a loop of mkfs and mounting the
filesystem will sometimes find stale or incorrect data.

This commit should fix it by sprinkling fdatawrites and invalidate_bdev
calls around.  This is a short term measure to make sure it is fixed.
The block devices really should be flushed and cleaned up higher in the
stack.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-28 20:33:58 -04:00
Chris Mason
98961a7e43 Merge git://git.jan-o-sch.net/btrfs-unstable into for-linus
Conflicts:
	fs/btrfs/transaction.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-28 20:33:40 -04:00
Chris Mason
1c691b330a Merge branch 'for-chris' of git://github.com/idryomov/btrfs-unstable into for-linus 2012-03-28 20:32:46 -04:00
Chris Mason
1d4284bd6e Merge branch 'error-handling' into for-linus
Conflicts:
	fs/btrfs/ctree.c
	fs/btrfs/disk-io.c
	fs/btrfs/extent-tree.c
	fs/btrfs/extent_io.c
	fs/btrfs/extent_io.h
	fs/btrfs/inode.c
	fs/btrfs/scrub.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-28 20:31:37 -04:00
David Sterba
65139ed992 btrfs: disallow unequal data/metadata blocksize for mixed block groups
With support for bigger metadata blocks, we must avoid mounting a
filesystem with different block size for mixed block groups, this causes
corruption (found by xfstests/083).

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-03-28 20:30:28 -04:00
David Sterba
fcd1f065da Btrfs: enhance superblock sanity checks
Validate checksum algorithm during mount and prevent BUG_ON later in
btrfs_super_csum_size.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-03-28 20:30:28 -04:00
Linus Torvalds
532bfc851a Merge branch 'akpm' (Andrew's patch-bomb)
Merge third batch of patches from Andrew Morton:
 - Some MM stragglers
 - core SMP library cleanups (on_each_cpu_mask)
 - Some IPI optimisations
 - kexec
 - kdump
 - IPMI
 - the radix-tree iterator work
 - various other misc bits.

 "That'll do for -rc1.  I still have ~10 patches for 3.4, will send
  those along when they've baked a little more."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  backlight: fix typo in tosa_lcd.c
  crc32: add help text for the algorithm select option
  mm: move hugepage test examples to tools/testing/selftests/vm
  mm: move slabinfo.c to tools/vm
  mm: move page-types.c from Documentation to tools/vm
  selftests/Makefile: make `run_tests' depend on `all'
  selftests: launch individual selftests from the main Makefile
  radix-tree: use iterators in find_get_pages* functions
  radix-tree: rewrite gang lookup using iterator
  radix-tree: introduce bit-optimized iterator
  fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
  nbd: rename the nbd_device variable from lo to nbd
  pidns: add reboot_pid_ns() to handle the reboot syscall
  sysctl: use bitmap library functions
  ipmi: use locks on watchdog timeout set on reboot
  ipmi: simplify locking
  ipmi: fix message handling during panics
  ipmi: use a tasklet for handling received messages
  ipmi: increase KCS timeouts
  ipmi: decrease the IPMI message transaction time in interrupt mode
  ...
2012-03-28 17:19:28 -07:00
Andrew Morton
4c619aa0ba fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
If CONFIG_NET_NS, CONFIG_UTS_NS and CONFIG_IPC_NS are disabled,
ns_entries[] becomes empty and things like
ns_entries[ARRAY_SIZE(ns_entries) - 1] will explode.

Reported-by: Richard Weinberger <richard@nod.at>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:37 -07:00
Gilad Ben-Yossef
42be35d039 fs: only send IPI to invalidate LRU BH when needed
In several code paths, such as when unmounting a file system (but not
only) we send an IPI to ask each cpu to invalidate its local LRU BHs.

For multi-cores systems that have many cpus that may not have any LRU BH
because they are idle or because they have not performed any file system
accesses since last invalidation (e.g.  CPU crunching on high perfomance
computing nodes that write results to shared memory or only using
filesystems that do not use the bh layer.) This can lead to loss of
performance each time someone switches the KVM (the virtual keyboard and
screen type, not the hypervisor) if it has a USB storage stuck in.

This patch attempts to only send an IPI to cpus that have LRU BH.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
Andrea Arcangeli
45f83cefe3 mm: thp: fix up pmd_trans_unstable() locations
pmd_trans_unstable() should be called before pmd_offset_map() in the
locations where the mmap_sem is held for reading.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
KAMEZAWA Hiroyuki
3748b2f15b procfs: fix /proc/statm
bda7bad62b ("procfs: speed up /proc/pid/stat, statm") broke /proc/statm
- 'text' is printed twice by mistake.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: Ulrich Drepper <drepper@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
J. Bruce Fields
4ca1f872cd nfsd4: use auth_unix unconditionally on backchannel
This isn't actually correct, but it works with the Linux client, and
agrees with the behavior we used to have before commit 80fc015bdf.

Later patches will implement the spec-mandated behavior (which is to use
the security parameters explicitly given by the client in create_session
or backchannel_ctl).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-28 19:14:36 -04:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
Linus Torvalds
f21ce8f844 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Pull XFS update (part 2) from Ben Myers:
 "Fixes for tracing of xfs_name strings, flag handling in
  open_by_handle, a log space hang with freeze/unfreeze, fstrim offset
  calculations, a section mismatch with xfs_qm_exit, an oops in
  xlog_recover_process_iunlinks, and a deadlock in xfs_rtfree_extent.

  There are also additional trace points for attributes, and the
  addition of a workqueue for allocation to work around kernel stack
  size limitations."

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: add lots of attribute trace points
  xfs: Fix oops on IO error during xlog_recover_process_iunlinks()
  xfs: fix fstrim offset calculations
  xfs: Account log unmount transaction correctly
  xfs: don't cache inodes read through bulkstat
  xfs: trace xfs_name strings correctly
  xfs: introduce an allocation workqueue
  xfs: Fix open flag handling in open_by_handle code
  xfs: fix deadlock in xfs_rtfree_extent
  fs: xfs: fix section mismatch in linux-next
2012-03-28 15:23:52 -07:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
David Howells
96f951edb1 Add #includes needed to permit the removal of asm/system.h
asm/system.h is a cause of circular dependency problems because it contains
commonly used primitive stuff like barrier definitions and uncommonly used
stuff like switch_to() that might require MMU definitions.

asm/system.h has been disintegrated by this point on all arches into the
following common segments:

 (1) asm/barrier.h

     Moved memory barrier definitions here.

 (2) asm/cmpxchg.h

     Moved xchg() and cmpxchg() here.  #included in asm/atomic.h.

 (3) asm/bug.h

     Moved die() and similar here.

 (4) asm/exec.h

     Moved arch_align_stack() here.

 (5) asm/elf.h

     Moved AT_VECTOR_SIZE_ARCH here.

 (6) asm/switch_to.h

     Moved switch_to() here.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Linus Torvalds
529b73fc0a trivial writeback fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPad6tAAoJECvKgwp+S8JaAFEP/jMTQmRUh59x8ABwhJjeWkGf
 zv8JyA98aDYauXTrDbYJFsmeCg3UBvhn9Ut2yZhU74EPsmDVyKu5l/gfrVuJqPVr
 cShzJHGxZUJHBa79mrY8Oe7OAHN1ZuaQUgM8b8sQP6OPUipZM/2QPv9nshlqggQq
 WHbcYaOqqvt1+rcRzG/rIxv9XSaarU5uz+nebECY1Pg10bYR0owrisVGzNUt1e8B
 EFH/g8mMOmO37oRDYdXBTrVPlfY3lMBafgYTrhCON6TA2ucTwsZqnlgL5AJNjJAK
 CQD03sK/kAVCC+VV4n4U9f+PnNoEiYrRx2gC9QScRGwRxyAGjfw/9UAEgY9IID59
 WHSZTgd3KkUeSST69FHu+B/Vx5QDbGH9vRhOrL6aDXcitmitLphJ0Pa32bAhPkPI
 qbNORGT+pDmbWjERLPy6YEPB2rkAt4FQrDjtpfOOMjREBrTMkJs+iWBABEXs1AM/
 ppc1LGUotZuzcabxKCwSdLCJX6Gb4A6cl4dKk1ClndIKLSb/Q9+CIjM66Q6TdT29
 Ftp5GRKv+Ga6OFfjku71yatJmYMxGZ9hW7Nnpzby3dbjWKqOE6Zglxrhg7VVb9zI
 I7VTqwKDXzVrkyx1Tu7lJZ2NMVx6Nt2tpTuJ3cq5d/CWd1MrhC948hwjmFh0C1GY
 uRn8KoBXIvcQUwy92seb
 =hjf/
 -----END PGP SIGNATURE-----

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull trivial writeback fixes from Wu Fengguang:
 "They've been tested in linux-next for 20 days actually."

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Remove outdated comment
  fs: Remove bogus wait in write_inode_now()
2012-03-28 10:07:27 -07:00
Linus Torvalds
69e1aaddd6 Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes
The changes to export dirty_writeback_interval are from Artem's s_dirt
 cleanup patch series.  The same is true of the change to remove the
 s_dirt helper functions which never got used by anyone in-tree.  I've
 run these changes by Al Viro, and am carrying them so that Artem can
 more easily fix up the rest of the file systems during the next merge
 window.  (Originally we had hopped to remove the use of s_dirt from
 ext4 during this merge window, but his patches had some bugs, so I
 ultimately ended dropping them from the ext4 tree.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPb39rAAoJENNvdpvBGATwVz8P/3V1NqSsk20VJOLbmEE45GxL
 GDzQJ6OsFG0UiQk6ISSrSdwxfav/KTCGySsU9UtAoOdPcBwnnsf8S7wc6OggwwuC
 hBFGwwFzk6YSQaZ58sUxWRGeOJuP/FPem6Id6buC4DQ1KIcznP/hEEgEnh/ir4Ec
 vrsfexY93TR8BE2Mi23v2epDVLU0B6bY/w9nDqbTXif3xN/gh/ypoHHouuM6Bs2n
 TyWHOwD15NwfnvRHd8PfDDqQM/D29x3QI0FMrWj9McpwIz4d4cBfhN4LQ/G+yLDY
 izv5DM10GbinwHPrsOTGVAW3KIdSS9rP3jCJGVuOrJZ9ufGXosvHuIYVhI7J3SBK
 JhBu6QEsN1IsvlVYpz9q8mqVKaDXQLsz2eaTw+i4yfmyOk1kOX7nIEOxYFF78G+V
 Of/W1SpIpJQaXvLHRcDj9fDj0fZTciUZA8v7/HOFS+co2dzIl0iZbcfBFp0/56RY
 sWdQoeRlx1ciVDPR+w2TQO5w3VWQw1gT5aqux0NiPj0XFoiUHScxgNGAYbqENMQw
 v9chvyDMlorqj0rF/Vey5SssgEDi7MTdYuYTi4YyMqr7pcvOJaO85pf+wH9g2eKW
 XhW33PhPGuwCJDP5Pg8Y0Z2Hp/Q3DCqhLqhGfTyAs/NG9+hR4wgp3VWb8CUqhA1t
 C/yzNeOYqScAefCzQx2V
 =+9zk
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates for 3.4 from Ted Ts'o:
 "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes

  The changes to export dirty_writeback_interval are from Artem's s_dirt
  cleanup patch series.  The same is true of the change to remove the
  s_dirt helper functions which never got used by anyone in-tree.  I've
  run these changes by Al Viro, and am carrying them so that Artem can
  more easily fix up the rest of the file systems during the next merge
  window.  (Originally we had hopped to remove the use of s_dirt from
  ext4 during this merge window, but his patches had some bugs, so I
  ultimately ended dropping them from the ext4 tree.)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits)
  vfs: remove unused superblock helpers
  mm: export dirty_writeback_interval
  ext4: remove useless s_dirt assignment
  ext4: write superblock only once on unmount
  ext4: do not mark superblock as dirty unnecessarily
  ext4: correct ext4_punch_hole return codes
  ext4: remove restrictive checks for EOFBLOCKS_FL
  ext4: always set then trimmed blocks count into len
  ext4: fix trimmed block count accunting
  ext4: fix start and len arguments handling in ext4_trim_fs()
  ext4: update s_free_{inodes,blocks}_count during online resize
  ext4: change some printk() calls to use ext4_msg() instead
  ext4: avoid output message interleaving in ext4_error_<foo>()
  ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
  ext4: add no_printk argument validation, fix fallout
  ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
  ext4: give more helpful error message in ext4_ext_rm_leaf()
  ext4: remove unused code from ext4_ext_map_blocks()
  ext4: rewrite punch hole to use ext4_ext_remove_space()
  jbd2: cleanup journal tail after transaction commit
  ...
2012-03-28 10:02:55 -07:00
Linus Torvalds
56b59b429b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates for 3.4-rc1 from Sage Weil:
 "Alex has been busy.  There are a range of rbd and libceph cleanups,
  especially surrounding device setup and teardown, and a few critical
  fixes in that code.  There are more cleanups in the messenger code,
  virtual xattrs, a fix for CRC calculation/checks, and lots of other
  miscellaneous stuff.

  There's a patch from Amon Ott to make inos behave a bit better on
  32-bit boxes, some decode check fixes from Xi Wang, and network
  throttling fix from Jim Schutt, and a couple RBD fixes from Josh
  Durgin.

  No new functionality, just a lot of cleanup and bug fixing."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits)
  rbd: move snap_rwsem to the device, rename to header_rwsem
  ceph: fix three bugs, two in ceph_vxattrcb_file_layout()
  libceph: isolate kmap() call in write_partial_msg_pages()
  libceph: rename "page_shift" variable to something sensible
  libceph: get rid of zero_page_address
  libceph: only call kernel_sendpage() via helper
  libceph: use kernel_sendpage() for sending zeroes
  libceph: fix inverted crc option logic
  libceph: some simple changes
  libceph: small refactor in write_partial_kvec()
  libceph: do crc calculations outside loop
  libceph: separate CRC calculation from byte swapping
  libceph: use "do" in CRC-related Boolean variables
  ceph: ensure Boolean options support both senses
  libceph: a few small changes
  libceph: make ceph_tcp_connect() return int
  libceph: encapsulate some messenger cleanup code
  libceph: make ceph_msgr_wq private
  libceph: encapsulate connection kvec operations
  libceph: move prepare_write_banner()
  ...
2012-03-28 10:01:29 -07:00
Linus Torvalds
9a7259d5c8 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3, UDF, and quota fixes from Jan Kara:
 "A couple of ext3 & UDF fixes and also one improvement in quota
  locking."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext3: fix start and len arguments handling in ext3_trim_fs()
  udf: Fix deadlock in udf_release_file()
  udf: Fix file entry logicalBlocksRecorded
  udf: Fix handling of i_blocks
  quota: Make quota code not call tty layer with dqptr_sem held
  udf: Init/maintain file entry checkpoint field
  ext3: Update ctime in ext3_splice_branch() only when needed
  ext3: Don't call dquot_free_block() if we don't update anything
  udf: Remove unnecessary OOM messages
2012-03-28 10:00:14 -07:00