2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-20 11:34:02 +08:00
Commit Graph

39052 Commits

Author SHA1 Message Date
David Sterba
c234a24de9 btrfs: cleanup, remove inode_ref_info helper
A simple wrapper around btrfs_find_item.

Signed-off-by: David Sterba <dsterba@suse.cz>
2015-01-14 19:23:47 +01:00
David Sterba
14692cc150 btrfs: cleanup, remove inode_item_info helper
It's only a simple wrapper around btrfs_find_item, the locally defined
key is not used.

Signed-off-by: David Sterba <dsterba@suse.cz>
2015-01-14 19:23:47 +01:00
David Sterba
381cf6587f btrfs: fix leak of path in btrfs_find_item
If btrfs_find_item is called with NULL path it allocates one locally but
does not free it. Affected paths are inserting an orphan item for a file
and for a subvol root.

Move the path allocation to the callers.

CC: <stable@vger.kernel.org> # 3.14+
Fixes: 3f870c2899 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality")
Signed-off-by: David Sterba <dsterba@suse.cz>
2015-01-14 19:23:46 +01:00
NeilBrown
52d304eb4e locks: fix NULL-deref in generic_delete_lease
commit 0efaa7e82f
  locks: generic_delete_lease doesn't need a file_lock at all

moves the call to fl->fl_lmops->lm_change() to a place in the
code where fl might be a non-lease lock.
When that happens, fl_lmops is NULL and an Oops ensures.

So add an extra test to restore correct functioning.

Reported-by: Linda Walsh <suse@tlinx.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=912569
Cc: stable@vger.kernel.org (v3.18)
Fixes: 0efaa7e82f
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-01-13 07:00:55 -05:00
Linus Torvalds
5ab551d662 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: group scheduling corner case fix, two deadline scheduler
  fixes, effective_load() overflow fix, nested sleep fix, 6144 CPUs
  system fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
  sched/deadline: Avoid double-accounting in case of missed deadlines
  sched/deadline: Fix migration of SCHED_DEADLINE tasks
  sched: Fix odd values in effective_load() calculations
  sched, fanotify: Deal with nested sleeps
  sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation
2015-01-11 11:51:49 -08:00
Linus Torvalds
dc9319f5a3 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.19' of git://linux-nfs.org/~bfields/linux:
  rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
  nfsd: fix fi_delegees leak when fi_had_conflict returns true
2015-01-09 18:10:48 -08:00
Linus Torvalds
20ebb34528 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "These are both pretty trivial: a sparse warning fix and size_t printk
  thing"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix sparse endianness warnings
  ceph: use %zu for len in ceph_fill_inline_data()
2015-01-09 17:55:00 -08:00
Linus Torvalds
03c751a5e1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "None of these are huge, but my commit does fix a regression from 3.18
  that could cause lost files during log replay.

  This also adds Dave Sterba to the list of Btrfs maintainers.  It
  doesn't mean we're doing things differently, but Dave has really been
  helping with the maintainer workload for years"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't delay inode ref updates during log replay
  Btrfs: correctly get tree level in tree_backref_for_extent
  Btrfs: call inode_dec_link_count() on mkdir error path
  Btrfs: abort transaction if we don't find the block group
  Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
  Btrfs: add more maintainers
2015-01-09 17:46:07 -08:00
Rasmus Villemoes
72392ed0eb kernfs: Fix kernfs_name_compare
Returning a difference from a comparison functions is usually wrong
(see acbbe6fbb2 "kcmp: fix standard comparison bug" for the long
story). Here there is the additional twist that if the void pointers
ns and kn->ns happen to differ by a multiple of 2^32,
kernfs_name_compare returns 0, falsely reporting a match to the
caller.

Technically 'hash - kn->hash' is ok since the hashes are restricted to
31 bits, but it's better to avoid that subtlety.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-09 15:51:08 -08:00
Peter Zijlstra
536ebe9ca9 sched, fanotify: Deal with nested sleeps
As per e23738a730 ("sched, inotify: Deal with nested sleeps").

fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:12 +01:00
David Drysdale
75069f2b5b vfs: renumber FMODE_NONOTIFY and add to uniqueness check
Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc.  The
clashing O_PATH value was added in commit 5229645bdc ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.

FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c).  So renumber it to avoid the clash.

All of this has happened before (commit 12ed2e36c9: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.

Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Xue jiufei
53dc20b9a3 ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong.  Parameter dir is the parent of
new_dentry not old_dentry.  We should get old_dir from old_dentry and
lookup old_dentry in old_dir in case another node remove the old dentry.

With this change, hard linking works again, when paths are relative with
at least one subdirectory.  This is how the problem was reproducable:

  # mkdir a
  # mkdir b
  # touch a/test
  # ln a/test b/test
  ln: failed to create hard link `b/test' => `a/test': No such file or  directory

However when creating links in the same dir, it worked well.

Now the link gets created.

Fixes: 0e048316ff ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Szabo Aron - UBIT <aron@ubit.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Tested-by: Aron Szabo <aron@ubit.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Joseph Qi
eb4f73b4ca ocfs2: remove bogus check in dlm_process_recovery_data
In dlm_process_recovery_data, only when dlm_new_lock failed the ret will
be set to -ENOMEM.  And in this case, newlock is definitely NULL.  So
test newlock is meaningless, remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Ilya Dryomov
0668ff52e2 ceph: use %zu for len in ceph_fill_inline_data()
len is size_t, should be printed with %zu.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:56 +03:00
Jeff Layton
94ae1db226 nfsd: fix fi_delegees leak when fi_had_conflict returns true
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.

Change the code to take the reference after the check and only if there
is no conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 13:38:21 -05:00
Linus Torvalds
3b421b80be Revert a potential seek_data/hole regression which shows up when using
ext4 to handle ext3 file systems, plus two minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUqxTQAAoJENNvdpvBGATwUC8QAIYfP02XYyGrBQIoAoCiRJji
 TmhAe2Amy9xHgBq2I8Yz3bi8j1wGuiqc57fYgwwVScf10tzmO/HuwnAZZDAmg9hK
 ZWp9WyPoyf+U/nbkIfC5mRh3Qz0dt1pt6R3uQDUlcUuAamdMBrdJnhkQC6WMbpU2
 fAqsJT3/yGrLnMF29eVqJzcxb5KORJ8hEcD7kwkvJwe4sGm3C7iDsjS0i63YWDz4
 QNclW6zF4THhmuVNxwRupOgMQNSq8sHg8U23nP4DZLvLE7GlgtwfDvehU7uBfw5n
 WO5UfsEYLoeODNmujUJCtjXNLpzDXmrtByyWbbTK7EX3MmV94ym4uu5lHLfyMiTc
 o2ppxcsKBVcOsPWnFwuhJ5p/Wyy0Uld9Q3P6b5ymhyzDhkuwcTURpeRxBRXHEgcm
 nY5GE1bBdO7OigDz/+DFL/Zgr8EO7hW72hrBaLDWMEbrrl0asZw/ReC/bnreMmm4
 sP87DB+MqRXzRs8aOPWmCofJwGSgCYmOq2nqNCAaxgk/ofvrDURrnZYfLrbzspGa
 hqE1W0X5hvQydcifi4qq2Na76+Js3atSY38EOH/HNknSqlQjysnkW4ajTDWk/GFy
 M/fKUCfIl1tmCMN2myZzl89E7uMSyod75ycd0BQy36iHPE14JVvk/u7GfcKHLs53
 1rAPpW90a72GX2Z9+xxA
 =uPYz
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Revert a potential seek_data/hole regression which shows up when using
  ext4 to handle ext3 file systems, plus two minor bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove spurious KERN_INFO from ext4_warning call
  Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
  ext4: prevent online resize with backup superblock
2015-01-06 14:05:40 -08:00
Miklos Szeredi
9759bd5189 fuse: add memory barrier to INIT
Theoretically we need to order setting of various fields in fc with
fc->initialized.

No known bug reports related to this yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-01-06 10:45:35 +01:00
Miklos Szeredi
21f621741a fuse: fix LOOKUP vs INIT compat handling
Analysis from Marc:

 "Commit 7078187a79 ("fuse: introduce fuse_simple_request() helper")
  from the above pull request triggers some EIO errors for me in some tests
  that rely on fuse

  Looking at the code changes and a bit of debugging info I think there's a
  general problem here that fuse_get_req checks and possibly waits for
  fc->initialized, and this was always called first.  But this commit
  changes the ordering and in many places fc->minor is now possibly used
  before fuse_get_req, and we can't be sure that fc has been initialized.
  In my case fuse_lookup_init sets req->out.args[0].size to the wrong size
  because fc->minor at that point is still 0, leading to the EIO error."

Fix by moving the compat adjustments into fuse_simple_request() to after
fuse_get_req().

This is also more readable than the original, since now compatibility is
handled in a single function instead of cluttering each operation.

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 7078187a79 ("fuse: introduce fuse_simple_request() helper")
2015-01-06 10:45:35 +01:00
Trond Myklebust
4e379d36c0 NFSv4: Remove incorrect check in can_open_delegated()
Remove an incorrect check for NFS_DELEGATION_NEED_RECLAIM in
can_open_delegated(). We are allowed to cache opens even in
a situation where we're doing reboot recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:54 -08:00
Chuck Lever
7a01edf005 NFS: Ignore transport protocol when detecting server trunking
Detect server trunking across transport protocols. Otherwise, an
RDMA mount and a TCP mount of the same server will end up with
separate nfs_clients using the same clientid4.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:54 -08:00
Trond Myklebust
55b9df93dd NFSv4/v4.1: Verify the client owner id during trunking detection
While we normally expect the NFSv4 client to always send the same client
owner to all servers, there are a couple of situations where that is not
the case:
 1) In NFSv4.0, switching between use of '-omigration' and not will cause
    the kernel to switch between using the non-uniform and uniform client
    strings.
 2) In NFSv4.1, or NFSv4.0 when using uniform client strings, if the
    uniquifier string is suddenly changed.

This patch will catch those situations by checking the client owner id
in the trunking detection code, and will do the right thing if it notices
that the strings differ.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust
ceb3a16c07 NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client
Ensure that we cache the NFSv4/v4.1 client owner_id so that we can
verify it when we're doing trunking detection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust
1fc0703af3 NFSv4.1: Fix client id trunking on Linux
Currently, our trunking code will check for session trunking, but will
fail to detect client id trunking. This is a problem, because it means
that the client will fail to recognise that the two connections represent
shared state, even if they do not permit a shared session.
By removing the check for the server minor id, and only checking the
major id, we will end up doing the right thing in both cases: we close
down the new nfs_client and fall back to using the existing one.

Fixes: 05f4c350ee ("NFS: Discover NFSv4 server trunking when mounting")
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 3.7.x
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Trond Myklebust
06bed7d18c LOCKD: Fix a race when initialising nlmsvc_timeout
This commit fixes a race whereby nlmclnt_init() first starts the lockd
daemon, and then calls nlm_bind_host() with the expectation that
nlmsvc_timeout has already been initialised. Unfortunately, there is no
no synchronisation between lockd() and lockd_up() to guarantee that this
is the case.

Fix is to move the initialisation of nlmsvc_timeout into lockd_create_svc

Fixes: 9a1b6bf818 ("LOCKD: Don't call utsname()->nodename...")
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org # 3.10.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-05 19:40:53 -08:00
Jakub Wilk
363307e6e5 ext4: remove spurious KERN_INFO from ext4_warning call
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 15:31:14 -05:00
Theodore Ts'o
ad7fefb109 Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
This reverts commit 14516bb7bb.

This was causing regression test failures with generic/285 with an ext3
filesystem using CONFIG_EXT4_USE_FOR_EXT23.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-01-02 15:16:00 -05:00
Chris Mason
6f8960541b Btrfs: don't delay inode ref updates during log replay
Commit 1d52c78afb (Btrfs: try not to ENOSPC on log replay) added a
check to skip delayed inode updates during log replay because it
confuses the enospc code.  But the delayed processing will end up
ignoring delayed refs from log replay because the inode itself wasn't
put through the delayed code.

This can end up triggering a warning at commit time:

WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

Which is repeated for each commit because we never process the delayed
inode ref update.

The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay.  The caller will do the ref
deletion immediately and everything will work properly.

Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afb
2015-01-02 14:47:56 -05:00
Filipe Manana
a1317f455a Btrfs: correctly get tree level in tree_backref_for_extent
If we are using skinny metadata, the block's tree level is in the offset
of the key and not in a btrfs_tree_block_info structure following the
extent item (it doesn't exist). Therefore fix it.

Besides returning the correct level in the tree, this also prevents reading
past the leaf's end in the case where the extent item is the last item in
the leaf (eb) and it has only 1 inline reference - this is because
sizeof(struct btrfs_tree_block_info) is greater than
sizeof(struct btrfs_extent_inline_ref).

Got it while running a scrub which produced the following warning:

    BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:56 -05:00
Wang Shilong
c7cfb8a540 Btrfs: call inode_dec_link_count() on mkdir error path
In btrfs_mkdir(), if it fails to create dir, we should
clean up existed items, setting inode's link properly
to make sure it could be cleaned up properly.

Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Josef Bacik
df95e7f0d9 Btrfs: abort transaction if we don't find the block group
We shouldn't BUG_ON() if there is corruption.  I hit this while testing my block
group patch and the abort worked properly.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Dan Carpenter
6b6d24b389 Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
The only way that "ret" is set is when we call scrub_pages_for_parity()
so the skip to "if (ret) " test doesn't make sense and causes a static
checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-01-02 14:47:55 -05:00
Linus Torvalds
5faa0154fe Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "A set of three minor cifs fixes"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: make new inode cache when file type is different
  Fix signed/unsigned pointer warning
  Convert MessageID in smb2_hdr to LE
2014-12-29 21:09:57 -08:00
Linus Torvalds
b9d4a35f0a Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF & isofs fixes from Jan Kara:
 "A couple of UDF fixes of handling of corrupted media and one iso9660
  fix of the same"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Reduce repeated dereferences
  udf: Check component length before reading it
  udf: Check path length when reading symlink
  udf: Verify symlink size before loading it
  udf: Verify i_size when loading inode
  isofs: Fix unchecked printing of ER records
2014-12-29 20:43:10 -08:00
Theodore Ts'o
011fa99404 ext4: prevent online resize with backup superblock
Prevent BUG or corrupted file systems after the following:

mkfs.ext4 /dev/vdc 100M
mount -t ext4 -o sb=40961 /dev/vdc /vdc
resize2fs /dev/vdc

We previously prevented online resizing using the old resize ioctl.
Move the code to ext4_resize_begin(), so the check applies for all of
the resize ioctl's.

Reported-by: Maxim Malkov <malkov@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-12-26 23:58:21 -05:00
Nakajima Akira
9e6d722f3d cifs: make new inode cache when file type is different
In spite of different file type,
 if file is same name and same inode number, old inode cache is used.
This causes that you can not cd directory, can not cat SymbolicLink.
So this patch is that if file type is different, return error.

Reproducible sample :
1. create file 'a' at cifs client.
2. repeat rm and mkdir 'a' 4 times at server, then direcotry 'a' having same inode number is created.
   (Repeat 4 times, then same inode number is recycled.)
   (When server is under RHEL 6.6, 1 time is O.K.  Always same inode number is recycled.)
3. ls -li at client, then you can not cd directory, can not remove directory.

SymbolicLink has same problem.

Bug link:
https://bugzilla.kernel.org/show_bug.cgi?id=90011

Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2014-12-22 14:16:21 -06:00
Jan Kara
3ee3039c5b udf: Reduce repeated dereferences
Replace repeated dereferences like dir->i_sb by storing superblock
pointer in a variable and using that.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-21 22:42:37 +01:00
Jan Kara
e237ec37ec udf: Check component length before reading it
Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.

Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-21 22:42:19 +01:00
Linus Torvalds
ecb5ec044a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #3 from Al Viro:
 "Assorted fixes and patches from the last cycle"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [regression] chunk lost from bd9b51
  vfs: make mounts and mountstats honor root dir like mountinfo does
  vfs: cleanup show_mountinfo
  init: fix read-write root mount
  unfuck binfmt_misc.c (broken by commit e6084d4)
  vm_area_operations: kill ->migrate()
  new helper: iter_is_iovec()
  move_extent_per_page(): get rid of unused w_flags
  lustre: get rid of playing with ->fs
  btrfs: filp_open() returns ERR_PTR() on failure, not NULL...
2014-12-19 18:19:19 -08:00
Linus Torvalds
298647e31a Fixes for filename decryption and encrypted view plus a cleanup
- The filename decryption routines were, at times, writing a zero byte one
   character past the end of the filename buffer
 - The encrypted view feature attempted, and failed, to roll its own form of
   enforcing a read-only mount instead of letting the VFS enforce it
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJUlGlWAAoJENaSAD2qAscKgdwQAJLCgt5xgoij8uH65mYVy6PE
 ++E8KggCHchhUcFh19EuuG++ENzwZedqKqbbRhmZt+svla08uPDO0jY+sYaOZzqB
 HMzZyyL4Fw/DiVsK6LUMGfN2CS7tua9ReK0dj4tUc3jwBplnQnGrQlUPbgdPRJJp
 XJDKHVtHGQPQC1dJAeProCJTg23jv5wXacly2I5VxZuh5DnDWjuo2KkzqGMKfadU
 9bxOf5DjVwQ/X6apgkNxG/8Q6J8a+80K7SGs/aRELIuRL0+mIj5AX9ht+p5Z0XI+
 E8jU4HM2biuqj8PtxK/WbyF1bHiREVyOLdneW+n3pcqGTLMZ3TLpDURkNd0cwZcd
 AGoZtUZZuQ/xvzE9IrEj+KYeINnZzPQpyJu9jXXp1CsQJ8u/wUG9e0kcJrUI1dZ/
 q26zhPaVhJ9x0go/BNOxzTUpOtuMWXttAVZGICthp74I5l+DgfSZ4J4CeQmFMMRs
 kbiTg5h4qsrD4jnEU/BCscpCLoz4qlF6+q/+lspwkg7OwvhHc0qSMJn3UjZ/eYzb
 ncDp6gc4Iju3RhWnVEZphA4ttbZJX7ahER4y0NdIG65Fa37AUEUxy0OmGfoyFOmC
 KVtlHkl2hKoCB5+ujLzL7WimH33ogtVhtOJTZlXzS0lsdSfSxUWU4/OZ22jTy0hL
 Bnx6uoNIPWsr6b5OJiA0
 =jTyq
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Fixes for filename decryption and encrypted view plus a cleanup

   - The filename decryption routines were, at times, writing a zero
     byte one character past the end of the filename buffer

   - The encrypted view feature attempted, and failed, to roll its own
     form of enforcing a read-only mount instead of letting the VFS
     enforce it"

* tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Remove unnecessary casts when parsing packet lengths
  eCryptfs: Force RO mount when encrypted view is enabled
2014-12-19 18:15:12 -08:00
Linus Torvalds
5c68eac68b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "This is part two of our merge window patches.

  These are all from Filipe, and fix some really hard to find races that
  can cause corruptions.  Most of them involved block group removal
  (balance) or discard"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove non-sense btrfs_error_discard_extent() function
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: always clear a block group node when removing it from the tree
  Btrfs: ensure deletion from pinned_chunks list is protected
2014-12-19 18:10:42 -08:00
Linus Torvalds
ac88ee3b6c Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core fix from Thomas Gleixner:
 "A single fix plugging a long standing race between proc/stat and
  proc/interrupts access and freeing of interrupt descriptors"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Prevent proc race against freeing of irq descriptors
2014-12-19 13:26:08 -08:00
Jan Kara
0e5cc9a40a udf: Check path length when reading symlink
Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 14:12:08 +01:00
Jan Kara
a1d47b2629 udf: Verify symlink size before loading it
UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 13:17:10 +01:00
Jan Kara
e159332b9a udf: Verify i_size when loading inode
Verify that inode size is sane when loading inode with data stored in
ICB. Otherwise we may get confused later when working with the inode and
inode size is too big.

CC: stable@vger.kernel.org
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 13:13:05 +01:00
Jan Kara
4e2024624e isofs: Fix unchecked printing of ER records
We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.

Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-19 11:29:24 +01:00
Linus Torvalds
018cb13eb3 Merge branch 'akpm' (patches from Andrew)
Merge misc patches from Andrew Morton:
 "A few stragglers"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  tools/testing/selftests/Makefile: alphasort the TARGETS list
  mm/zsmalloc: adjust order of functions
  ocfs2: fix journal commit deadlock
  ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
  ocfs2: reflink: fix slow unlink for refcounted file
  mm/memory.c:do_shared_fault(): add comment
  .mailmap: Santosh Shilimkar has moved
  .mailmap: update akpm@osdl.org
  lib/show_mem.c: add cma reserved information
  fs/proc/meminfo.c: include cma info in proc/meminfo
  mm: cma: split cma-reserved in dmesg log
  hfsplus: fix longname handling
  mm/mempolicy.c: remove unnecessary is_valid_nodemask()
2014-12-18 19:08:25 -08:00
Junxiao Bi
136f49b917 ocfs2: fix journal commit deadlock
For buffer write, page lock will be got in write_begin and released in
write_end, in ocfs2_write_end_nolock(), before it unlock the page in
ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
for the read lock of journal->j_trans_barrier.  Holding page lock and
ask for journal->j_trans_barrier breaks the locking order.

This will cause a deadlock with journal commit threads, ocfs2cmt will
get write lock of journal->j_trans_barrier first, then it wakes up
kjournald2 to do the commit work, at last it waits until done.  To
commit journal, kjournald2 needs flushing data first, it needs get the
cache page lock.

Since some ocfs2 cluster locks are holding by write process, this
deadlock may hung the whole cluster.

unlock pages before ocfs2_run_deallocs() can fix the locking order, also
put unlock before ocfs2_commit_trans() to make page lock is unlocked
before j_trans_barrier to preserve unlocking order.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Joseph Qi
1e58958160 ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
Commit ac4fef4d23 ("ocfs2/dlm: do not purge lockres that is queued for
assert master") may have the following possible race case:

  dlm_dispatch_assert_master       dlm_wq
  ========================================================================
  queue_work(dlm->quedlm_worker,
      &dlm->dispatched_work);
                                 dispatch work,
                                 dlm_lockres_drop_inflight_worker
                                 *BUG_ON(res->inflight_assert_workers == 0)*
  dlm_lockres_grab_inflight_worker
  inflight_assert_workers++

So ensure inflight_assert_workers to be increased first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Xue jiufei <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Junxiao Bi
f62f12b3a4 ocfs2: reflink: fix slow unlink for refcounted file
When running ocfs2 test suite multiple nodes reflink stress test, for a
4 nodes cluster, every unlink() for refcounted file needs about 700s.

The slow unlink is caused by the contention of refcount tree lock since
all nodes are unlink files using the same refcount tree.  When the
unlinking file have many extents(over 1600 in our test), most of the
extents has refcounted flag set.  In ocfs2_commit_truncate(), it will
execute the following call trace for every extents.  This means it needs
get and released refcount tree lock about 1600 times.  And when several
nodes are do this at the same time, the performance will be very low.

  ocfs2_remove_btree_range()
  --  ocfs2_lock_refcount_tree()
  ----  ocfs2_refcount_lock()
  ------  __ocfs2_cluster_lock()

ocfs2_refcount_lock() is costly, move it to ocfs2_commit_truncate() to
do lock/unlock once can improve a lot performance.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Wengang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:11 -08:00
Pintu Kumar
47f8f9297d fs/proc/meminfo.c: include cma info in proc/meminfo
This patch include CMA info (CMATotal, CMAFree) in /proc/meminfo.
Currently, in a CMA enabled system, if somebody wants to know the total
CMA size declared, there is no way to tell, other than the dmesg or
/var/log/messages logs.

With this patch we are showing the CMA info as part of meminfo, so that it
can be determined at any point of time.  This will be populated only when
CMA is enabled.

Below is the sample output from a ARM based device with RAM:512MB and CMA:16MB.

  MemTotal:         471172 kB
  MemFree:          111712 kB
  MemAvailable:     271172 kB
  .
  .
  .
  CmaTotal:          16384 kB
  CmaFree:            6144 kB

This patch also fix below checkpatch errors that were found during these changes.

  ERROR: space required after that ',' (ctx:ExV)
  199: FILE: fs/proc/meminfo.c:199:
  +       ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
          ^

  ERROR: space required after that ',' (ctx:ExV)
  202: FILE: fs/proc/meminfo.c:202:
  +       ,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
          ^

  ERROR: space required after that ',' (ctx:ExV)
  206: FILE: fs/proc/meminfo.c:206:
  +       ,K(totalcma_pages)
          ^

  total: 3 errors, 0 warnings, 2 checks, 236 lines checked

Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
Signed-off-by: Vishnu Pratap Singh <vishnu.ps@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:10 -08:00
Sougata Santra
89ac9b4d3d hfsplus: fix longname handling
Longname is not correctly handled by hfsplus driver.  If an attempt to
create a longname(>255) file/directory is made, it succeeds by creating a
file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key.  Thus
leaving the volume in an inconsistent state.  This patch fixes this issue.

Although lookup is always called first to create a negative entry, so just
doing a check in lookup would probably fix this issue.  I choose to
propagate error to other iops as well.

Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
hfsplus_cat_build_key, to avoid unncessary branching.

Thanks a lot.

  TEST:
  ------
  dir="TEST_DIR"
  cdir=`pwd`
  name255="_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
  _123456789_123456789_123456789_123456789_123456789_1234"
  name256="${name255}5"

  mkdir $dir
  cd $dir
  touch $name255
  rm -f $name255
  touch $name256
  ls -la
  cd $cdir
  rm -rf $dir

  RESULT:
  -------
  [sougata@ultrabook tmp]$ cdir=`pwd`
  [sougata@ultrabook tmp]$
  name255="_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
   > _123456789_123456789_123456789_123456789_123456789_1234"
  [sougata@ultrabook tmp]$ name256="${name255}5"
  [sougata@ultrabook tmp]$
  [sougata@ultrabook tmp]$ mkdir $dir
  [sougata@ultrabook tmp]$ cd $dir
  [sougata@ultrabook TEST_DIR]$ touch $name255
  [sougata@ultrabook TEST_DIR]$ rm -f $name255
  [sougata@ultrabook TEST_DIR]$ touch $name256
  [sougata@ultrabook TEST_DIR]$ ls -la
  ls: cannot access
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
  No such file or directory
  total 0
  drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
  drwxrwxrwx 1 root    root    6 Feb 20 19:56 ..
  -????????? ? ?       ?       ?            ?
  _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
  [sougata@ultrabook TEST_DIR]$ cd $cdir
  [sougata@ultrabook tmp]$ rm -rf $dir
  rm: cannot remove `TEST_DIR': Directory not empty

-ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
and incorrect keys, leaving the FS in an inconsistent state.  This patch
fixes this issue.

Signed-off-by: Sougata Santra <sougata@tuxera.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 19:08:10 -08:00
Eric W. Biederman
c297abfdf1 mnt: Fix a memory stomp in umount
While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.

Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.

This isn't likely to hit, but if it does I don't know how anyone could
track it down.

[ This happened because we don't have all the same operations for
  hlist's as we do for normal doubly-linked lists. In particular,
  list_splice() is easy on our standard doubly-linked lists, while
  hlist_splice() doesn't exist and needs both start/end entries of the
  hlist.  And commit 38129a13e6 incorrectly open-coded that missing
  hlist_splice().

  We should think about making these kinds of "mindless" conversions
  easier to get right by adding the missing hlist helpers   - Linus ]

Fixes: 38129a13e6 switch mnt_hash to hlist
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 11:22:02 -08:00
Linus Torvalds
44e8967d59 Ceph: remove left-over reject file
Neither Sage nor I noticed that Zheng Yan had mistakenly committed
fs/ceph/super.h.rej as part of commit 31c542a199 ("ceph: add inline
data to pagecache").

Remove it.

Requested-by: Yan, Zheng <ukernel@gmail.com>
Cc: Sage Weil <sweil@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-17 18:47:01 -08:00
Linus Torvalds
57666509b7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The big item here is support for inline data for CephFS and for
  message signatures from Zheng.  There are also several bug fixes,
  including interrupted flock request handling, 0-length xattrs, mksnap,
  cached readdir results, and a message version compat field.  Finally
  there are several cleanups from Ilya, Dan, and Markus.

  Note that there is another series coming soon that fixes some bugs in
  the RBD 'lingering' requests, but it isn't quite ready yet"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
  ceph: fix setting empty extended attribute
  ceph: fix mksnap crash
  ceph: do_sync is never initialized
  libceph: fixup includes in pagelist.h
  ceph: support inline data feature
  ceph: flush inline version
  ceph: convert inline data to normal data before data write
  ceph: sync read inline data
  ceph: fetch inline data when getting Fcr cap refs
  ceph: use getattr request to fetch inline data
  ceph: add inline data to pagecache
  ceph: parse inline data in MClientReply and MClientCaps
  libceph: specify position of extent operation
  libceph: add CREATE osd operation support
  libceph: add SETXATTR/CMPXATTR osd operations support
  rbd: don't treat CEPH_OSD_OP_DELETE as extent op
  ceph: remove unused stringification macros
  libceph: require cephx message signature by default
  ceph: introduce global empty snap context
  ceph: message versioning fixes
  ...
2014-12-17 16:03:12 -08:00
Linus Torvalds
87c31b39ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace related fixes from Eric Biederman:
 "As these are bug fixes almost all of thes changes are marked for
  backporting to stable.

  The first change (implicitly adding MNT_NODEV on remount) addresses a
  regression that was created when security issues with unprivileged
  remount were closed.  I go on to update the remount test to make it
  easy to detect if this issue reoccurs.

  Then there are a handful of mount and umount related fixes.

  Then half of the changes deal with the a recently discovered design
  bug in the permission checks of gid_map.  Unix since the beginning has
  allowed setting group permissions on files to less than the user and
  other permissions (aka ---rwx---rwx).  As the unix permission checks
  stop as soon as a group matches, and setgroups allows setting groups
  that can not later be dropped, results in a situtation where it is
  possible to legitimately use a group to assign fewer privileges to a
  process.  Which means dropping a group can increase a processes
  privileges.

  The fix I have adopted is that gid_map is now no longer writable
  without privilege unless the new file /proc/self/setgroups has been
  set to permanently disable setgroups.

  The bulk of user namespace using applications even the applications
  using applications using user namespaces without privilege remain
  unaffected by this change.  Unfortunately this ix breaks a couple user
  space applications, that were relying on the problematic behavior (one
  of which was tools/selftests/mount/unprivileged-remount-test.c).

  To hopefully prevent needing a regression fix on top of my security
  fix I rounded folks who work with the container implementations mostly
  like to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger <richard@nod.at>

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>

    > I tested this with Sandstorm.  It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :("

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Unbreak the unprivileged remount tests
  userns; Correct the comment in map_write
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  mnt: Clear mnt_expire during pivot_root
  mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
  mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
  umount: Do not allow unmounting rootfs.
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
2014-12-17 12:31:40 -08:00
Linus Torvalds
d6666be6f0 MTD updates for 3.19:
* Add device tree support for DoC3
 
  * SPI NOR:
 
     Refactoring, for better layering between spi-nor.c and its driver users
     (e.g., m25p80.c)
 
     New flash device support
 
     Support 6-byte ID strings
 
  * NAND
 
     New NAND driver for Allwinner SoC's (sunxi)
 
     GPMI NAND: add support for raw (no ECC) access, for testing purposes
 
     Add ATO manufacturer ID
 
     A few odd driver fixes
 
  * MTD tests:
 
     Allow testers to compensate for OOB bitflips in oobtest
 
     Fix a torturetest regression
 
  * nandsim: Support longer ID byte strings
 
 And more.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUj6PbAAoJEFySrpd9RFgtahcP/RGvknk9lnitaZI7+aZPP8Zs
 AopfiuisLNv3v87EEBAWGYjRYm6vuuhO1z3K55iOIlemBVzoMIP0jf68XGy9uXnL
 Ox6AHqxm55wwmc+CHry5/GssaqE6GzdPm8TBP+AGGNhHrhc+raJL55R0QJaoYVwX
 pUxkhWaa4lZ6CrOIMQ3n+MEnduilHZoFIcXSc1UtI0y9IXsf1m0Qs8M5jGN8BQ16
 HOVNJN9wOXF89swoi7bKsyAn+QFUYgdksAisncb6b9r5Ks5KHmcOuS1LM5X9YoUr
 KfeogHfDM68fcaHsSvMU1xmxjXGtE+HFJE52eYNPB1fNbT3JAC13FFj92GeSsZtE
 ekpCQh4OPLazT/2wCUHTQwC7T1dCilwyW7VJB9MSl7fSBo9P7jIiUHxVUdM43kez
 Di02/XWi4IULTwrgzZiTT8yplFrVdvkmKHAAFEIoaVWiF/l4DeSodLGUw7owmNYn
 rJPBPQulpPHRwKZY7gThJuOUXpgbT715GSbvmPYXimHBqmViiPkrhqQ/b/v4PRRs
 Nlfhwbswr7WBq6vmPkd6eOyfdFANmWcZQMp/++BCdI/7mhfaik72GWMTBSuJ7hN5
 PB+z95soHaKBWlbiDGGGPvuqJmPkOVq1R8itQdIYBWEh7eNSHecwVxyUJJ+V3oPv
 QkD7mEP2ZozZe3Ys2EJQ
 =gDW8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "Summary:
   - Add device tree support for DoC3

   - SPI NOR:
        Refactoring, for better layering between spi-nor.c and its
        driver users (e.g., m25p80.c)

        New flash device support

        Support 6-byte ID strings

   - NAND:
        New NAND driver for Allwinner SoC's (sunxi)

        GPMI NAND: add support for raw (no ECC) access, for testing
        purposes

        Add ATO manufacturer ID

        A few odd driver fixes

   - MTD tests:
        Allow testers to compensate for OOB bitflips in oobtest

        Fix a torturetest regression

   - nandsim: Support longer ID byte strings

  And more"

* tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd: (63 commits)
  mtd: tests: abort torturetest on erase errors
  mtd: physmap_of: fix potential NULL dereference
  mtd: spi-nor: allow NULL as chip name and try to auto detect it
  mtd: nand: gpmi: add raw oob access functions
  mtd: nand: gpmi: add proper raw access support
  mtd: nand: gpmi: add gpmi_copy_bits function
  mtd: spi-nor: factor out write_enable() for erase commands
  mtd: spi-nor: add support for s25fl128s
  mtd: spi-nor: remove the jedec_id/ext_id
  mtd: spi-nor: add id/id_len for flash_info{}
  mtd: nand: correct the comment of function nand_block_isreserved()
  jffs2: Drop bogus if in comment
  mtd: atmel_nand: replace memcpy32_toio/memcpy32_fromio with memcpy
  mtd: cafe_nand: drop duplicate .write_page implementation
  mtd: m25p80: Add support for serial flash Spansion S25FL132K
  MTD: m25p80: fix inconsistency in m25p_ids compared to spi_nor_ids
  mtd: spi-nor: improve wait-till-ready timeout loop
  mtd: delete unnecessary checks before two function calls
  mtd: nand: omap: Fix NAND enumeration on 3430 LDP
  mtd: nand: add ATO manufacturer info
  ...
2014-12-17 09:59:26 -08:00
Linus Torvalds
c103b21c20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
 "The first part makes sure we don't hold up umount with pending async
  requests.  In addition to being a cleanup, this is a small behavioral
  change (for the better) and unlikely to break anything.

  The second part prepares for a cleanup of the fuse device I/O code by
  adding a helper for simple request submission, with some savings in
  line numbers already realized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use file_inode() in fuse_file_fallocate()
  fuse: introduce fuse_simple_request() helper
  fuse: reduce max out args
  fuse: hold inode instead of path after release
  fuse: flush requests on umount
  fuse: don't wake up reserved req in fuse_conn_kill()
2014-12-17 09:41:32 -08:00
Yan, Zheng
0aeff37aba ceph: fix setting empty extended attribute
make sure 'value' is not null. otherwise __ceph_setxattr will remove
the extended attribute.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:18:49 +03:00
Yan, Zheng
275dd19ea4 ceph: fix mksnap crash
mksnap reply only contain 'target', does not contain 'dentry'. So
it's wrong to use req->r_reply_info.head->is_dentry to detect traceless
reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:09:53 +03:00
Dan Carpenter
021b77bee2 ceph: do_sync is never initialized
Probably this code was syncing a lot more often then intended because
the do_sync variable wasn't set to zero.

Cc: stable@vger.kernel.org # v3.11+
Fixes: c62988ec09 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng
65a22662bf ceph: support inline data feature
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng
e20d258d73 ceph: flush inline version
After converting inline data to normal data, client need to flush
the new i_inline_version (CEPH_INLINE_NONE) to MDS. This commit makes
cap messages (sent to MDS) contain inline_version and inline_data.
Client always converts inline data to normal data before data write,
so the inline data length part is always zero.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:53 +03:00
Yan, Zheng
28127bdd2f ceph: convert inline data to normal data before data write
Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares a xattr named 'inline_version' to prevent
old data overwrites newer data.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
83701246ae ceph: sync read inline data
we can't use getattr to fetch inline data while holding Fr cap,
because it can cause deadlock. If we need to sync read inline data,
drop cap refs first, then use getattr to fetch inline data.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
3738daa68a ceph: fetch inline data when getting Fcr cap refs
we can't use getattr to fetch inline data after getting Fcr caps,
because it can cause deadlock. The solution is try bringing inline
data to page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in page cache for later IO.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
01deead041 ceph: use getattr request to fetch inline data
Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data
in getattr reply will be copied to the page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
31c542a199 ceph: add inline data to pagecache
Request reply and cap message can contain inline data. add inline data
to the page cache if there is Fc cap.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
fb01d1f8b0 ceph: parse inline data in MClientReply and MClientCaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:52 +03:00
Yan, Zheng
715e4cd405 libceph: specify position of extent operation
allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:52 +03:00
Ilya Dryomov
ca3995ad13 ceph: remove unused stringification macros
These were used to report git versions a long time ago.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
Yan, Zheng
97c85a828f ceph: introduce global empty snap context
Current snaphost code does not properly handle moving inode from one
empty snap realm to another empty snap realm. After changing inode's
snap realm, some dirty pages' snap context can be not equal to inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs()

The fix is introduce a global empty snap context for all empty snap
realm. This avoids triggering the BUG() for filesystem with no snapshot.

Fixes: http://tracker.ceph.com/issues/9928

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:51 +03:00
John Spray
7cfa0313d0 ceph: message versioning fixes
There were two places we were assigning version in host byte order
instead of network byte order.

Also in MSG_CLIENT_SESSION we weren't setting compat_version in the
header to reflect continued compatability with older MDSs.

Fixes: http://tracker.ceph.com/issues/9945

Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 20:09:51 +03:00
Yan, Zheng
33d0733796 libceph: message signature support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:50 +03:00
SF Markus Elfring
e96a650a81 ceph, rbd: delete unnecessary checks before two function calls
The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[idryomov@redhat.com: squashed rbd.c hunk, changelog]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng
70db4f3629 ceph: introduce a new inode flag indicating if cached dentries are ordered
After creating/deleting/renaming file, offsets of sibling dentries may
change. So we can not use cached dentries to satisfy readdir. But we can
still use the cached dentries to conclude -ENOENT for lookup.

This patch introduces a new inode flag indicating if child dentries are
ordered. The flag is set at the same time marking a directory complete.
After creating/deleting/renaming file, we clear the flag on directory
inode. This prevents ceph_readdir() from using cached dentries to satisfy
readdir syscall.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:50 +03:00
Yan, Zheng
9280be24dc ceph: fix file lock interruption
When a lock operation is interrupted, current code sends a unlock request to
MDS to undo the lock operation. This method does not work as expected because
the unlock request can drop locks that have already been acquired.

The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR
requests to interrupt blocked file lock request. These requests do not drop
locks that have alread been acquired, they only interrupt blocked file lock
request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 20:09:49 +03:00
Dmitry V. Levin
9d4d65748a vfs: make mounts and mountstats honor root dir like mountinfo does
As we already show mountpoints relative to the root directory, thanks
to the change made back in 2000, change show_vfsmnt() and show_vfsstat()
to skip out-of-root mountpoints the same way as show_mountinfo() does.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:15 -05:00
Dmitry V. Levin
9ad4dc4f73 vfs: cleanup show_mountinfo
Starting with commit v3.2-rc4-1-g02125a8, seq_path_root() no longer
changes the value of its "struct path *root" argument.
Starting with commit v3.2-rc7-104-g8c9379e, the "struct path *root"
argument of seq_path_root() is const.
As result, the temporary variable "root" in show_mountinfo() that
holds a copy of struct path root is no longer needed.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:15 -05:00
Al Viro
7d65cf10e3 unfuck binfmt_misc.c (broken by commit e6084d4)
scanarg(s, del) never returns s; the empty field results in s + 1.
Restore the correct checks, and move NUL-termination into scanarg(),
while we are at it.

Incidentally, mixing "coding style cleanups" (for small values of cleanup)
with functional changes is a Bad Idea(tm)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:27:14 -05:00
Al Viro
50062175ff vm_area_operations: kill ->migrate()
the only instance this method has ever grown was one in kernfs -
one that call ->migrate() of another vm_ops if it exists.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 08:26:51 -05:00
Al Viro
b1bc6d7f16 move_extent_per_page(): get rid of unused w_flags
... and comparing get_fs() with KERNEL_DS used only to initialize that

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 06:43:56 -05:00
Al Viro
98af592f5b btrfs: filp_open() returns ERR_PTR() on failure, not NULL...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-17 06:43:56 -05:00
Linus Torvalds
603ba7e41b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
 "Next pile (and there'll be one or two more).

  The large piece in this one is getting rid of /proc/*/ns/* weirdness;
  among other things, it allows to (finally) make nameidata completely
  opaque outside of fs/namei.c, making for easier further cleanups in
  there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_venus_readdir(): use file_inode()
  fs/namei.c: fold link_path_walk() call into path_init()
  path_init(): don't bother with LOOKUP_PARENT in argument
  fs/namei.c: new helper (path_cleanup())
  path_init(): store the "base" pointer to file in nameidata itself
  make default ->i_fop have ->open() fail with ENXIO
  make nameidata completely opaque outside of fs/namei.c
  kill proc_ns completely
  take the targets of /proc/*/ns/* symlinks to separate fs
  bury struct proc_ns in fs/proc
  copy address of proc_ns_ops into ns_common
  new helpers: ns_alloc_inum/ns_free_inum
  make proc_ns_operations work with struct ns_common * instead of void *
  switch the rest of proc_ns_operations to working with &...->ns
  netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
  make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
  common object embedded into various struct ....ns
2014-12-16 15:53:03 -08:00
Linus Torvalds
31f48fc8f2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs and reiserfs fixes from Jan Kara:
 "A reiserfs and an isofs fix.  They arrived after I sent you my first
  pull request and I don't want to delay them unnecessarily till rc2"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  isofs: Fix infinite looping over CE entries
  reiserfs: destroy allocated commit workqueue
2014-12-16 15:46:01 -08:00
Linus Torvalds
0b233b7c79 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "A comparatively quieter cycle for nfsd this time, but still with two
  larger changes:

   - RPC server scalability improvements from Jeff Layton (using RCU
     instead of a spinlock to find idle threads).

   - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
     Schumaker, enabling fallocate on new clients"

* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
  nfsd4: fix xdr4 count of server in fs_location4
  nfsd4: fix xdr4 inclusion of escaped char
  sunrpc/cache: convert to use string_escape_str()
  sunrpc: only call test_bit once in svc_xprt_received
  fs: nfsd: Fix signedness bug in compare_blob
  sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
  sunrpc: convert to lockless lookup of queued server threads
  sunrpc: fix potential races in pool_stats collection
  sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
  sunrpc: require svc_create callers to pass in meaningful shutdown routine
  sunrpc: have svc_wake_up only deal with pool 0
  sunrpc: convert sp_task_pending flag to use atomic bitops
  sunrpc: move rq_cachetype field to better optimize space
  sunrpc: move rq_splice_ok flag into rq_flags
  sunrpc: move rq_dropme flag into rq_flags
  sunrpc: move rq_usedeferral flag to rq_flags
  sunrpc: move rq_local field to rq_flags
  sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
  nfsd: minor off by one checks in __write_versions()
  sunrpc: release svc_pool_map reference when serv allocation fails
  ...
2014-12-16 15:25:31 -08:00
Jan Kara
f54e18f1b8 isofs: Fix infinite looping over CE entries
Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.

Reported-by: P J P <ppandit@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-12-15 15:53:26 +01:00
Linus Torvalds
67e2c38838 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "In terms of changes, there's general maintenance to the Smack,
  SELinux, and integrity code.

  The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
  which allows IMA appraisal to require signatures.  Support for reading
  keys from rootfs before init is call is also added"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: Remove security_ops extern
  security: smack: fix out-of-bounds access in smk_parse_smack()
  VFS: refactor vfs_read()
  ima: require signature based appraisal
  integrity: provide a hook to load keys when rootfs is ready
  ima: load x509 certificate from the kernel
  integrity: provide a function to load x509 certificate from the kernel
  integrity: define a new function integrity_read_file()
  Security: smack: replace kzalloc with kmem_cache for inode_smack
  Smack: Lock mode for the floor and hat labels
  ima: added support for new kernel cmdline parameter ima_template_fmt
  ima: allocate field pointers array on demand in template_desc_init_fields()
  ima: don't allocate a copy of template_fmt in template_desc_init_fields()
  ima: display template format in meas. list if template name length is zero
  ima: added error messages to template-related functions
  ima: use atomic bit operations to protect policy update interface
  ima: ignore empty and with whitespaces policy lines
  ima: no need to allocate entry for comment
  ima: report policy load status
  ima: use path names cache
  ...
2014-12-14 20:36:37 -08:00
Linus Torvalds
e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds
7a02d08969 These patches optionally add LZ4 compression support to Squashfs.
LZ4 is a lightweight compression algorithm which can be used
 on embedded systems to reduce CPU and memory overhead (in comparison
 to the standard zlib compression).
 
 These patches add the wrapper code to allow Squashfs to use
 the existing LZ4 decompression code, and the necessary configuration
 option.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJUjgIJAAoJEJAch/D1fbHUArwP/iDiDSpqxdfTQHwUKxB57skO
 0iBzg6bXPlqwmsnllegg0SV0vOvFjpWiXWpOOtAVCJlfbov8DgUndsigpvG3UhD/
 qJpNXAJ+xSdOzBqj3bS7SOu2DwPY8Gz4rxGQcNN3PsuOVR/EUgAnNlv22ZHY10A5
 XQVyPbkwZ73TrZ2uKA8leWArFtCbM4oYGpxP+ramEox8nVFEOtixn5IcX5WkbGEL
 Yt0NRw8K8vDIIETWVariugUFE4C1olFk+YmqqAw7cmDGJ70cEg5jh9ocNkwDIZPj
 I9BNtkggBRMaCPwGsH6IvahMFUyWLQUgGayfY/fgbRiB9ZuYIQ1lyPDhzbgWczoE
 o34eXAIDdmfPrmYlDEBDkYnXXtwuYqdOVYOtcEnyFEYqpHfaeS2h2s9nTiM+rz21
 v0UEaDRmPtlkK/ZdLKUsrOf+8y9ejkT0R67swFaguHshL6EHey7X5ghmOuwCoL9x
 fzGWtPFR+Nbqga5T3dwf+apvyUVrPaw6gZu36NNim2779ZgpnIPzW6MEYUMhtXCn
 ef2+NvS9AeGyo7kiqlNQrihQWZSN0W/AiVsEeulzk5h+adzSNQ5eipzAO9DAAp16
 8muY4nq51bOGVaWzqJz/KacCmt7i0qUdmS1p4l2uqPp9gH/s/S91yrYn/iszf3AV
 CpwU2i9g3nQu9ecDc1Os
 =mr7J
 -----END PGP SIGNATURE-----

Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next

Pull squashfs update from Phillip Lougher:
 "These patches optionally add LZ4 compression support to Squashfs.

  LZ4 is a lightweight compression algorithm which can be used on
  embedded systems to reduce CPU and memory overhead (in comparison to
  the standard zlib compression).

  These patches add the wrapper code to allow Squashfs to use the
  existing LZ4 decompression code, and the necessary configuration
  option"

* tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
  Squashfs: Add LZ4 compression configuration option
  Squashfs: add LZ4 compression support
2014-12-14 14:42:53 -08:00
Linus Torvalds
7d22286ff7 Merge git://git.kvack.org/~bcrl/aio-next
Pull aio updates from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: Skip timer for io_getevents if timeout=0
  aio: Make it possible to remap aio ring
2014-12-14 13:36:57 -08:00
Kevin Cernekee
97c7134ae2 Fix signed/unsigned pointer warning
Commit 2ae83bf938 ("[CIFS] Fix setting time before epoch (negative
time values)") changed "u64 t" to "s64 t", which makes do_div() complain
about a pointer signedness mismatch:

      CC      fs/cifs/netmisc.o
    In file included from ./arch/mips/include/asm/div64.h:12:0,
                     from include/linux/kernel.h:124,
                     from include/linux/list.h:8,
                     from include/linux/wait.h:6,
                     from include/linux/net.h:23,
                     from fs/cifs/netmisc.c:25:
    fs/cifs/netmisc.c: In function ‘cifs_NTtimeToUnix’:
    include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                                ^
    fs/cifs/netmisc.c:941:22: note: in expansion of macro ‘do_div’
       ts.tv_nsec = (long)do_div(t, 10000000) * 100;

Introduce a temporary "u64 abs_t" variable to fix this.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
2014-12-14 14:55:57 -06:00
Sachin Prabhu
9235d09873 Convert MessageID in smb2_hdr to LE
We have encountered failures when When testing smb2 mounts on ppc64
machines when using both Samba as well as Windows 2012.

On poking around, the problem was determined to be caused by the
high endian MessageID passed in the header for smb2. On checking the
corresponding MID for smb1 is converted to LE before being sent on the
wire.

We have tested this patch successfully on a ppc64 machine.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
2014-12-14 14:55:45 -06:00
Fam Zheng
5f785de588 aio: Skip timer for io_getevents if timeout=0
In this case, it is basically a polling. Let's not involve timer at all
because that would hurt performance for application event loops.

In an arbitrary test I've done, io_getevents syscall elapsed time
reduces from 50000+ nanoseconds to a few hundereds.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-12-13 17:50:20 -05:00
Pavel Emelyanov
e4a0d3e720 aio: Make it possible to remap aio ring
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.

So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).

To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.

So, to make restore work we need to make sure that

a) ring is mapped at desired virtual address
b) kioctx->user_id matches this value

Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.

Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.

What do you think?

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-12-13 17:49:50 -05:00
Linus Torvalds
caf292ae5b Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block
Pull block driver core update from Jens Axboe:
 "This is the pull request for the core block IO changes for 3.19.  Not
  a huge round this time, mostly lots of little good fixes:

   - Fix a bug in sysfs blktrace interface causing a NULL pointer
     dereference, when enabled/disabled through that API.  From Arianna
     Avanzini.

   - Various updates/fixes/improvements for blk-mq:

        - A set of updates from Bart, mostly fixing buts in the tag
          handling.

        - Cleanup/code consolidation from Christoph.

        - Extend queue_rq API to be able to handle batching issues of IO
          requests. NVMe will utilize this shortly. From me.

        - A few tag and request handling updates from me.

        - Cleanup of the preempt handling for running queues from Paolo.

        - Prevent running of unmapped hardware queues from Ming Lei.

        - Move the kdump memory limiting check to be in the correct
          location, from Shaohua.

        - Initialize all software queues at init time from Takashi. This
          prevents a kobject warning when CPUs are brought online that
          weren't online when a queue was registered.

   - Single writeback fix for I_DIRTY clearing from Tejun.  Queued with
     the core IO changes, since it's just a single fix.

   - Version X of the __bio_add_page() segment addition retry from
     Maurizio.  Hope the Xth time is the charm.

   - Documentation fixup for IO scheduler merging from Jan.

   - Introduce (and use) generic IO stat accounting helpers for non-rq
     drivers, from Gu Zheng.

   - Kill off artificial limiting of max sectors in a request from
     Christoph"

* 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
  bio: modify __bio_add_page() to accept pages that don't start a new segment
  blk-mq: Fix uninitialized kobject at CPU hotplugging
  blktrace: don't let the sysfs interface remove trace from running list
  blk-mq: Use all available hardware queues
  blk-mq: Micro-optimize bt_get()
  blk-mq: Fix a race between bt_clear_tag() and bt_get()
  blk-mq: Avoid that __bt_get_word() wraps multiple times
  blk-mq: Fix a use-after-free
  blk-mq: prevent unmapped hw queue from being scheduled
  blk-mq: re-check for available tags after running the hardware queue
  blk-mq: fix hang in bt_get()
  blk-mq: move the kdump check to blk_mq_alloc_tag_set
  blk-mq: cleanup tag free handling
  blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map
  blk: introduce generic io stat accounting help function
  blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
  genhd: check for int overflow in disk_expand_part_tbl()
  blk-mq: add blk_mq_free_hctx_request()
  blk-mq: export blk_mq_free_request()
  blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
  ...
2014-12-13 14:14:23 -08:00
Jan Kara
37d469e767 fsnotify: remove destroy_list from fsnotify_mark
destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed.  However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark->g_list for queueing into destroy_list.  This saves
two pointers for each fsnotify_mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Jan Kara
0809ab69a2 fsnotify: unify inode and mount marks handling
There's a lot of common code in inode and mount marks handling.  Factor it
out to a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Heinrich Schuchardt
820c12d5d6 fallocate: create FAN_MODIFY and IN_MODIFY events
The fanotify and the inotify API can be used to monitor changes of the
file system.  System call fallocate() modifies files.  Hence it should
trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY)
events.  The most interesting case is FALLOC_FL_COLLAPSE_RANGE because
this value allows to create arbitrary file content from random data.

This patch adds the missing call to fsnotify_modify().

The FAN_MODIFY and IN_MODIFY event will be created when fallocate()
succeeds.  It will even be created if the file length remains unchanged,
e.g.  when calling fanotify with flag FALLOC_FL_KEEP_SIZE.

This logic was primarily chosen to keep the coding simple.

It resembles the logic of the write() system call.

When we call write() we always create a FAN_MODIFY event, even in the case
of overwriting with identical data.

Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was
actually changed.

Furthermore even if if the filesize remains unchanged, fallocate() may
influence whether a subsequent write() will succeed and hence the
fallocate() call may be considered a modification.

The fallocate(2) man page teaches: After a successful call, subsequent
writes into the range specified by offset and len are guaranteed not to
fail because of lack of disk space.

So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in
different outcomes of a subsequent write depending on the values of offset
and len.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@parisplace.org>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Fabian Frederick
92cab82b2c fs/affs/file.c: remove obsolete pagesize check
linux kernel doesn't manage page sizes below 4kb.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Fabian Frederick
9abb408307 fs/affs/file.c: add support to O_DIRECT
Based on ext2_direct_IO

Tested with O_DIRECT file open and sysbench/mariadb with 1% written
queries improvement (update_non_index test) on a volume created with
mkaffs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00