Commit Graph

40393 Commits

Author SHA1 Message Date
Firo Yang
c456aacf3c nfs: Remove unneeded casts in nfs
Don't unnecessarily cast allocation return value in
fs/nfs/inode.c::nfs_alloc_inode().

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:16 -04:00
Benjamin Coddington
ce85cfbed6 NFS: Don't attempt to decode missing directory entries
If a READDIR reply comes back without any page data, avoid a NULL pointer
dereference in xdr_copy_to_scratch().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffff813a378d>] memcpy+0xd/0x110
...
Call Trace:
	? xdr_inline_decode+0x7a/0xb0 [sunrpc]
	nfs3_decode_dirent+0x73/0x320 [nfsv3]
	nfs_readdir_page_filler+0xd5/0x4e0 [nfs]
	? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3]
	nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs]
	? mem_cgroup_commit_charge+0xac/0x160
	? nfs_readdir_xdr_to_array+0x330/0x330 [nfs]
	nfs_readdir_filler+0x22/0x90 [nfs]
	do_read_cache_page+0x7e/0x1a0
	read_cache_page+0x1c/0x20
	nfs_readdir+0x18e/0x660 [nfs]
	? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3]
	iterate_dir+0x97/0x130
	SyS_getdents+0x94/0x120
	? fillonedir+0xd0/0xd0
	system_call_fastpath+0x12/0x17

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:15 -04:00
Nicolas Iooss
3708f842e1 Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
This reverts commit 5a254d08b0.

Since commit 5a254d08b0 ("nfs: replace nfs_add_stats with
nfs_inc_stats when add one"), nfs_readpage and nfs_do_writepage use
nfs_inc_stats to increment NFSIOS_READPAGES and NFSIOS_WRITEPAGES
instead of nfs_add_stats.

However nfs_inc_stats does not do the same thing as nfs_add_stats with
value 1 because these functions work on distinct stats:
nfs_inc_stats increments stats from "enum nfs_stat_eventcounters" (in
server->io_stats->events) and nfs_add_stats those from "enum
nfs_stat_bytecounters" (in server->io_stats->bytes).

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Fixes: 5a254d08b0 ("nfs: replace nfs_add_stats with nfs_inc_stats...")
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:15 -04:00
Anna Schumaker
7b320382d0 NFS: Rename idmap.c to nfs4idmap.c
I added the nfs4 prefix to make it obvious that this file is built into
the NFS v4 module, and not the generic client.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:14 -04:00
Anna Schumaker
40c64c26a4 NFS: Move nfs_idmap.h into fs/nfs/
This file is only used internally to the NFS v4 module, so it doesn't
need to be in the global include path.  I also renamed it from
nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include
file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:14 -04:00
Anna Schumaker
f9ebd61855 NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
The idmapper is completely internal to the NFS v4 module, so this macro
will always evaluate to true.  This patch also removes unnecessary
includes of this file from the generic NFS client.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:13 -04:00
Anna Schumaker
7c61f0d389 NFS: Add a stub for GETDEVICELIST
d4b18c3e (pnfs: remove GETDEVICELIST implementation) removed the
GETDEVICELIST operation from the NFS client, but left a "hole" in the
nfs4_procedures array.  This caused /proc/self/mountstats to report an
operation named "51" where GETDEVICELIST used to be.  This patch adds a
stub to fix mountstats.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: d4b18c3e (pnfs: remove GETDEVICELIST implementation)
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:16:13 -04:00
Peng Tao
05f54903d9 nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
For flexfiles driver, we might choose to read from mirror index other
than 0 while mirror_count is always 1 for read.

Reported-by: Jean Spector <jean@primarydata.com>
Cc: <stable@vger.kernel.org> # v3.19+
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:05:19 -04:00
Peng Tao
1ccbad9f9f nfs: fix DIO good bytes calculation
For direct read that has IO size larger than rsize, we'll split
it into several READ requests and nfs_direct_good_bytes() would
count completed bytes incorrectly by eating last zero count reply.

Fix it by handling mirror and non-mirror cases differently such that
we only count mirrored writes differently.

This fixes 5fadeb47("nfs: count DIO good bytes correctly with mirroring").

Reported-by: Jean Spector <jean@primarydata.com>
Cc: <stable@vger.kernel.org> # v3.19+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 15:05:18 -04:00
Anna Schumaker
ea96d1ecbe nfs: Fetch MOUNTED_ON_FILEID when updating an inode
2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good
start to fixing a circular directory structure warning for NFS v4
"junctioned" mountpoints.  Unfortunately, further testing continued to
generate this error.

My server is configured like this:

anna@nfsd ~ % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  2.0G  6.5G  24% /
/dev/vdc1      1014M   33M  982M   4% /exports
/dev/vdc2      1014M   33M  982M   4% /exports/vol1
/dev/vdc3      1014M   33M  982M   4% /exports/vol1/vol2

anna@nfsd ~ % cat /etc/exports
/exports/          *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/     *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash)

I've been running chown across the entire mountpoint twice in a row to
hit this problem.  The first run succeeds, but the second one fails with
the circular directory warning along with:

anna@client ~ % dmesg
[Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed
              fsid 0:39: expected fileid 0x100080, got 0x80

WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on
fileid.

This patch fixes the issue by requesting an updated mounted-on fileid
from the server during nfs_update_inode(), and then checking that the
fileid stored in the nfs_inode matches either the fileid or mounted-on
fileid returned by the server.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 14:43:54 -04:00
Jeff Layton
5d05e54af3 nfs: fix high load average due to callback thread sleeping
Chuck pointed out a problem that crept in with commit 6ffa30d3f7 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.

Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.

With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.

Cc: <stable@vger.kernel.org>
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: commit 6ffa30d3f7 (“nfs: don't call blocking . . .”)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 14:38:07 -04:00
Anna Schumaker
f830f7ddd9 NFS: Reduce time spent holding the i_mutex during fallocate()
At the very least, we should not be taking the i_mutex until after
checking if the server even supports ALLOCATE or DEALLOCATE, allowing
v4.0 or v4.1 to exit without potentially waiting on a lock.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 14:36:28 -04:00
Anna Schumaker
9a51940bf6 NFS: Don't zap caches on fallocate()
This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE
operations so we can set the updated inode size and change attribute
directly.  DEALLOCATE will still need to release pagecache pages, so
nfs42_proc_deallocate() now calls truncate_pagecache_range() before
contacting the server.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23 14:36:28 -04:00
Linus Torvalds
a62d016cec Common MTD:
* Add Kconfig option for keeping both the 'master' and 'partition' MTDs
    registered as devices. This would really make a better default if we could
    do it over, as it allows a lot more flexibility in (1) determining the flash
    topology of the system from user-space and (2) adding temporary partitions
    at runtime (ioctl(BLKPG)). Unfortunately, this would possibly cause
    user-space breakage, as it will cause renumbering of the /dev/mtdX devices.
    We'll see if we can change this in the future, as there have already been a
    few people looking for this feature, and I know others have just been
    working around our current limitations instead of fixing them this way.
  * Along with the previous change, add some additional information to sysfs, so
    user-space can read the offset of each partition within its master device
 
 SPI NOR:
  * add new device tree compatible binding to represent the mostly-compatible
    class of SPI NOR flash which can be detected by their extended JEDEC ID
    bytes, cutting down the duplication of our ID tables
  * misc. new IDs
 
 Various other miscellaneous fixes and changes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVN9ypAAoJEFySrpd9RFgtaUQQAKmlCVMrxAKtF6U5jpzf07hA
 7ZrcMdUTSwS++dBIAgDl6JSuSGT5KRLrS1FOp60p+VAjbD9VFcRLUUQxahXW1tAh
 Dr8a3Akwd+lgIp77bZhWBY35dXmjIJ1GSzo7jdbJMDwAeDd3gBeSFTDoePsrCt6K
 0/NPOsQzCFDDr1lwuQh1LzkLLQfVAC3ImNCBm5smvyEfhxXqzC02HOLf8Z9VMGnY
 OxM9i0T6Ik3xeaaP/vH91sApmdn598gP5DB5cNr61YrZeVZmEoI4EWlOmagcYVC2
 Tef9Ng4YmHGXo65k7XcKRykAVWECYAGr4HKCDZ8tsbvpfdbQMS5wHEgxMsAdvb01
 aChcBNxf4w/Mh49fzjZppTlPN25FERRMnXt7CkUqQkqet9uDkD/5RNPl65ermeC7
 EKx2MoxnpXrfZ0EkSxqrfdzP0oQx0AqAkbCyLIN42Vbxl7ckFMN3WAPQ2NR2Aaoh
 SUiKwwaFFiK+C9qEytj0s+cmKPzsTzeQVYgp9NX64EfVQumqpsfbu6XIPV+FGy2i
 DvHvmTEvm4SpqMPSnhkmZ6DFSjuzvQdqzKtDyZmRppxHKgWUsXYdftGPMG0+ZbaG
 t4zysWfJG897TMVYLKY9pGqvouMuAVJ4kX1+iZbJc8dr4bwIzXIYuEGPLVv58gUO
 KjjlYk91/jFNmBW5anxC
 =aIsV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20150422' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "Common MTD:

   - Add Kconfig option for keeping both the 'master' and 'partition'
     MTDs registered as devices.  This would really make a better
     default if we could do it over, as it allows a lot more flexibility
     in (1) determining the flash topology of the system from user-space
     and (2) adding temporary partitions at runtime (ioctl(BLKPG)).

     Unfortunately, this would possibly cause user-space breakage, as it
     will cause renumbering of the /dev/mtdX devices.  We'll see if we
     can change this in the future, as there have already been a few
     people looking for this feature, and I know others have just been
     working around our current limitations instead of fixing them this
     way.

   - Along with the previous change, add some additional information to
     sysfs, so user-space can read the offset of each partition within
     its master device

  SPI NOR:

   - add new device tree compatible binding to represent the
     mostly-compatible class of SPI NOR flash which can be detected by
     their extended JEDEC ID bytes, cutting down the duplication of our
     ID tables

   - misc.  new IDs

  Various other miscellaneous fixes and changes"

* tag 'for-linus-20150422' of git://git.infradead.org/linux-mtd: (53 commits)
  mtd: spi-nor: Add support for Macronix mx25u6435f serial flash
  mtd: spi-nor: Add support for Winbond w25q64dw serial flash
  mtd: spi-nor: add support for the Winbond W25X05 flash
  mtd: spi-nor: support en25s64 device
  mtd: m25p80: bind to "nor-jedec" ID, for auto-detection
  Documentation: devicetree: m25p80: add "nor-jedec" binding
  mtd: Make MTD tests cancelable
  mtd: mtd_oobtest: Fix bitflip_limit usage in test case 3
  mtd: docg3: remove invalid __exit annotations
  mtd: fsl_ifc_nand: use msecs_to_jiffies for time conversion
  mtd: atmel_nand: don't map the ROM table if no pmecc table offset in DT
  mtd: atmel_nand: add a definition for the oob reserved bytes
  mtd: part: Remove partition overlap checks
  mtd: part: Add sysfs variable for offset of partition
  mtd: part: Create the master device node when partitioned
  mtd: ts5500_flash: Fix typo in MODULE_DESCRIPTION in ts5500_flash.c
  mtd: denali: Disable sub-page writes in Denali NAND driver
  mtd: pxa3xx_nand: cleanup wait_for_completion handling
  mtd: nand: gpmi: Check for scan_bbt() error
  mtd: nand: gpmi: fixup return type of wait_for_completion_timeout
  ...
2015-04-22 12:00:44 -07:00
Linus Torvalds
1204c46445 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This time around we have a collection of CephFS fixes from Zheng
  around MDS failure handling and snapshots, support for a new CRUSH
  straw2 algorithm (to sync up with userspace) and several RBD cleanups
  and fixes from Ilya, an error path leak fix from Taesoo, and then an
  assorted collection of cleanups from others"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
  rbd: rbd_wq comment is obsolete
  libceph: announce support for straw2 buckets
  crush: straw2 bucket type with an efficient 64-bit crush_ln()
  crush: ensuring at most num-rep osds are selected
  crush: drop unnecessary include from mapper.c
  ceph: fix uninline data function
  ceph: rename snapshot support
  ceph: fix null pointer dereference in send_mds_reconnect()
  ceph: hold on to exclusive caps on complete directories
  libceph: simplify our debugfs attr macro
  ceph: show non-default options only
  libceph: expose client options through debugfs
  libceph, ceph: split ceph_show_options()
  rbd: mark block queue as non-rotational
  libceph: don't overwrite specific con error msgs
  ceph: cleanup unsafe requests when reconnecting is denied
  ceph: don't zero i_wrbuffer_ref when reconnecting is denied
  ceph: don't mark dirty caps when there is no auth cap
  ceph: keep i_snap_realm while there are writers
  libceph: osdmap.h: Add missing format newlines
  ...
2015-04-22 11:30:10 -07:00
Yan, Zheng
ec137c10e7 ceph: fix uninline data function
For CEPH_OSD_CMPXATTR_MODE_U64, OSD expects the u64 to be encoded
as string in object's xattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-22 18:33:41 +03:00
Yan, Zheng
0ea611a3bc ceph: rename snapshot support
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-22 18:33:41 +03:00
Yan, Zheng
c0bd50e2ee ceph: fix null pointer dereference in send_mds_reconnect()
sb->s_root can be null when umounting

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-22 18:33:31 +03:00
Giuseppe Cantavenera
bb7ffbf29e nfsd: fix nsfd startup race triggering BUG_ON
nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...)
in fs/nfsd/nfs4recover.c was called before assigning ntfsd_net_id.
The following was observed on a MIPS 32-core processor:
kernel: Call Trace:
kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
kernel:
kernel:
        Code: 0040f809  00000000  2e020001 <00020336> 3c12c00d
                3c02801a  de100000 6442eb98  0040f809
kernel: ---[ end trace 7471374335809536 ]---

Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
registering rpc_pipefs_event(...) with the notifier chain.

Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com>
Reviewed-by: Kinlong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 16:16:03 -04:00
Mark Salter
135dd002c2 nfsd: eliminate NFSD_DEBUG
Commit f895b252d4 ("sunrpc: eliminate RPC_DEBUG") introduced
use of IS_ENABLED() in a uapi header which leads to a build
failure for userspace apps trying to use <linux/nfsd/debug.h>:

   linux/nfsd/debug.h:18:15: error: missing binary operator before token "("
  #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
                ^

Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG
is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG.

Cc: stable@vger.kernel.org
Fixes: f895b252d4 "sunrpc: eliminate RPC_DEBUG"
Signed-off-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 16:16:02 -04:00
J. Bruce Fields
6e4891dc28 nfsd4: fix READ permission checking
In the case we already have a struct file (derived from a stateid), we
still need to do permission-checking; otherwise an unauthorized user
could gain access to a file by sniffing or guessing somebody else's
stateid.

Cc: stable@vger.kernel.org
Fixes: dc97618ddd "nfsd4: separate splice and readv cases"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 16:16:01 -04:00
J. Bruce Fields
980608fb50 nfsd4: disallow SEEK with special stateids
If the client uses a special stateid then we'll pass a NULL file to
vfs_llseek.

Fixes: 24bab49122 " NFSD: Implement SEEK"
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 16:16:01 -04:00
J. Bruce Fields
5ba4a25ab7 nfsd4: disallow ALLOCATE with special stateids
vfs_fallocate will hit a NULL dereference if the client tries an
ALLOCATE or DEALLOCATE with a special stateid.  Fix that.  (We also
depend on the open to have broken any conflicting leases or delegations
for us.)

(If it turns out we need to allow special stateid's then we could do a
temporary open here in the special-stateid case, as we do for read and
write.  For now I'm assuming it's not necessary.)

Fixes: 95d871f03c "nfsd: Add ALLOCATE support"
Cc: stable@vger.kernel.org
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 15:44:06 -04:00
Linus Torvalds
8f443e2372 Revert "ocfs2: incorrect check for debugfs returns"
This reverts commit e2ac55b6a8.

Huang Ying reports that this causes a hang at boot with debugfs disabled.

It is true that the debugfs error checks are kind of confusing, and this
code certainly merits more cleanup and thinking about it, but there's
something wrong with the trivial "check not just for NULL, but for error
pointers too" patch.

Yes, with debugfs disabled, we will end up setting the o2hb_debug_dir
pointer variable to an error pointer (-ENODEV), and then continue as if
everything was fine.  But since debugfs is disabled, all the _users_ of
that pointer end up being compiled away, so even though the pointer can
not be dereferenced, that's still fine.

So it's confusing and somewhat questionable, but the "more correct"
error checks end up causing more trouble than they fix.

Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Chengyu Song <csong84@gatech.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-21 09:17:28 -07:00
Yan, Zheng
32ec439775 ceph: hold on to exclusive caps on complete directories
If a directory is complete, we want to keep the exclusive
cap. So that MDS does not end up revoking the shared cap
on every create/unlink operation.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:40 +03:00
Ilya Dryomov
ff7eeb82cc ceph: show non-default options only
Don't pollute /proc/mounts with default options (presently these are
dcache, nofsc and acl).  Leave the acl/noacl however - it's a bit of
a special case due to CONFIG_CEPH_FS_POSIX_ACL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:39 +03:00
Ilya Dryomov
ff40f9ae95 libceph, ceph: split ceph_show_options()
Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:38 +03:00
Yan, Zheng
1c841a96b5 ceph: cleanup unsafe requests when reconnecting is denied
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:37 +03:00
Yan, Zheng
a9f6eb6185 ceph: don't zero i_wrbuffer_ref when reconnecting is denied
remove_session_caps_cb() does not truncate dirty data in page
cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will
result negtive i_wrbuffer_ref/i_wrbuffer_ref_head

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Yan, Zheng
571ade336a ceph: don't mark dirty caps when there is no auth cap
No i_auth_cap means reconnecting to MDS was denied. So don't
add new dirty caps.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Yan, Zheng
db40cc1702 ceph: keep i_snap_realm while there are writers
when reconnecting to MDS is denied, we remove session caps
forcibly. But it's possible there are ongoing write, the
write code needs to reference i_snap_realm. So if there are
ongoing write, we keep i_snap_realm.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:35 +03:00
Sanidhya Kashyap
a149bb9a28 ceph: kstrdup() memory handling
Currently, there is no check for the kstrdup() for r_path2,
r_path1 and snapdir_name as various locations as there is a
possibility of failure during memory pressure. Therefore,
returning ENOMEM where the checks have been missed.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:34 +03:00
Taesoo Kim
c1d00b2d9c ceph: properly release page upon error
When ceph_update_writeable_page fails (including -EAGAIN), it
unlocks (w/ unlock_page) the page but does not 'release'
(w/ page_cache_release) properly.

Upon error, properly set *pagep to NULL, indicating an error.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:34 +03:00
Nicholas Mc Guire
57e95460f0 ceph: match wait_for_completion_timeout return type
return type of wait_for_completion_timeout is unsigned long not int. An
appropriately named unsigned long is added and the assignment fixed up.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:23 +03:00
Nicholas Mc Guire
3563dbdd99 ceph: use msecs_to_jiffies for time conversion
This is only an API consolidation and should make things more readable
it replaces var * HZ / 1000 by msecs_to_jiffies(var).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Fabian Frederick
e1eba3ea02 ceph: remove redundant declaration
ceph_aops was already defined extern in addr.c section

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Yan, Zheng
e2c3de046c ceph: fix dcache/nocache mount option
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Yan, Zheng
6e6f09231a ceph: drop cap releases in requests composed before cap reconnect
These cap releases are stale because MDS will re-establish client
caps according to the cap reconnect messages.

Note: MDS can detect stale cap messages, so these stale cap
releases are harmless even we don't drop them.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Linus Torvalds
6162e4b0be A few bug fixes and add support for file-system level encryption in ext4.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVMvVGAAoJEPL5WVaVDYGjjZgH/0Z4bdtQpuQKAd2EoSUhiOh4
 tReqE1IuTU+urrL9qNA4qUFhAKq0Iju0INrnoYNb1+YxZ2myvUrMY4y2GkapaKgZ
 SFYL8LTS7E79/LuR6q1SFmUYoXCjqpWeHb7rAZ9OluSNQhke8SWdywLnp/0q05Go
 6SDwYdT8trxGED/wYTGPy9zMHcYEYHqIIvfFZd3eYtRnaP42Zo5rUvISg3cP0ekG
 LiX2D9Bi9pyqxgMjTG0+0xiC3ohTfXOujyHbnLVQ7kdZmpzZKfQspoczEIUolYb4
 /Ic4qPQQdbtjooQ7uRYUOFXeVjt7HZuTb3aVmh90RWrEhsLsyBmNd9StLFVdlcg=
 =9f7Z
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A few bug fixes and add support for file-system level encryption in
  ext4"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
  ext4 crypto: enable encryption feature flag
  ext4 crypto: add symlink encryption
  ext4 crypto: enable filename encryption
  ext4 crypto: filename encryption modifications
  ext4 crypto: partial update to namei.c for fname crypto
  ext4 crypto: insert encrypted filenames into a leaf directory block
  ext4 crypto: teach ext4_htree_store_dirent() to store decrypted filenames
  ext4 crypto: filename encryption facilities
  ext4 crypto: implement the ext4 decryption read path
  ext4 crypto: implement the ext4 encryption write path
  ext4 crypto: inherit encryption policies on inode and directory create
  ext4 crypto: enforce context consistency
  ext4 crypto: add encryption key management facilities
  ext4 crypto: add ext4 encryption facilities
  ext4 crypto: add encryption policy and password salt support
  ext4 crypto: add encryption xattr support
  ext4 crypto: export ext4_empty_dir()
  ext4 crypto: add ext4 encryption Kconfig
  ext4 crypto: reserve codepoints used by the ext4 encryption feature
  ext4 crypto: add ext4_mpage_readpages()
  ...
2015-04-19 14:26:31 -07:00
Jann Horn
8b01fc86b9 fs: take i_mutex during prepare_binprm for set[ug]id executables
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19 13:46:21 -07:00
Linus Torvalds
dba94f2155 9p: patches for 4.1 merge window
Some accumulated cleanup patches for kerneldoc and unused variables
 as well as some lock bug fixes and adding privateport option for RDMA.
 
 A quick check shows some merge-conflicts versus current-tip on
    9p: use unsigned integers for nwqid/count
 If you would prefer I can rebase, remerge and fix the patch but didn't
 want to do that and look the for-next references.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJVMqXZAAoJEDZk62b0Tg6xBD4P/03nCkTxE5qDN9TVUSNdwHQD
 Oyq3JvvmfOORDHy7pZMp7wTdU4OLz+78RHYprpgJCk4Vs8Gcnl3hloeZ3L9l/W7J
 tz2Ek1noEE9uZLmeH6WPzSaba0sFOlnjbWPsLE8O84/zHOI/qj75s0UDPdrFRt1x
 LvMNQlTZqgUx0hogq1yLFKjp49bUzph78gMaJkoKK+30q9B4skPRRV93HLLzlo9j
 0dAGd0yhO8xUjtlm/ZkXIKiyeGeQ2XXj6UTnH6/4nwL29yVosWkGNjqIXkgz+ROu
 eyPvJqrjaBVtj8ZJkwfyZqM6xPrnsEbuSYUKLT2GcId87Ycebd7Wq1w+vhAO7l0H
 N1ZnzMGlQXHTszEhDGVCICCv1QU8b3ifvtA+nQYUly9JnDeIBcZGQ16g0oYQNoes
 1L6XKsrX4wdxROHYLqRJoNQ120KcaXAnRE3AmT8emiU8gl0KWW0TJ7WpLs9ICKRg
 cwgz1UzeGb/GGRtCv0gTlAE07fe/OjQVrSM3Q+ivTA+juRE2MWvluYh/WAMQHdFV
 FnJ5/sPKbcGK+IrHNWktkTLm2ZbbdcDnWHLmtk3egT3IubY5iLVpa5ADV47WsLAa
 viDp7N3mK0kZL8BJHgPs+aspRwMAHavme/EWzkuRTL048ABo8uTrM/BXiYsAaBBI
 GGh4+vEwcFDQdg2gMbF9
 =2sr2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9pfs updates from Eric Van Hensbergen:
 "Some accumulated cleanup patches for kerneldoc and unused variables as
  well as some lock bug fixes and adding privateport option for RDMA"

* tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  net/9p: add a privport option for RDMA transport.
  fs/9p: Initialize status in v9fs_file_do_lock.
  net/9p: Initialize opts->privport as it should be.
  net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()
  9p: use unsigned integers for nwqid/count
  9p: do not crash on unknown lock status code
  9p: fix error handling in v9fs_file_do_lock
  9p: remove unused variable in p9_fd_create()
  9p: kerneldoc warning fixes
2015-04-18 17:45:30 -04:00
Linus Torvalds
8f502d5b9e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull usernamespace mount fixes from Eric Biederman:
 "Way back in October Andrey Vagin reported that umount(MNT_DETACH)
  could be used to defeat MNT_LOCKED.  As I worked to fix this I
  discovered that combined with mount propagation and an appropriate
  selection of shared subtrees a reference to a directory on an
  unmounted filesystem is not necessary.

  That MNT_DETACH is allowed in user namespace in a form that can break
  MNT_LOCKED comes from my early misunderstanding what MNT_DETACH does.

  To avoid breaking existing userspace the conflict between MNT_DETACH
  and MNT_LOCKED is fixed by leaving mounts that are locked to their
  parents in the mount hash table until the last reference goes away.

  While investigating this issue I also found an issue with
  __detach_mounts.  The code was unnecessarily and incorrectly
  triggering mount propagation.  Resulting in too many mounts going away
  when a directory is deleted, and too many cpu cycles are burned while
  doing that.

  Looking some more I realized that __detach_mounts by only keeping
  mounts connected that were MNT_LOCKED it had the potential to still
  leak information so I tweaked the code to keep everything locked
  together that possibly could be.

  This code was almost ready last cycle but Al invented fs_pin which
  slightly simplifies this code but required rewrites and retesting, and
  I have not been in top form for a while so it took me a while to get
  all of that done.  Similiarly this pull request is late because I have
  been feeling absolutely miserable all week.

  The issue of being able to escape a bind mount has not yet been
  addressed, as the fixes are not yet mature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mnt: Update detach_mounts to leave mounts connected
  mnt: Fix the error check in __detach_mounts
  mnt: Honor MNT_LOCKED when detaching mounts
  fs_pin: Allow for the possibility that m_list or s_list go unused.
  mnt: Factor umount_mnt from umount_tree
  mnt: Factor out unhash_mnt from detach_mnt and umount_tree
  mnt: Fail collect_mounts when applied to unmounted mounts
  mnt: Don't propagate unmounts to locked mounts
  mnt: On an unmount propagate clearing of MNT_LOCKED
  mnt: Delay removal from the mount hash.
  mnt: Add MNT_UMOUNT flag
  mnt: In umount_tree reuse mnt_list instead of mnt_hash
  mnt: Don't propagate umounts in __detach_mounts
  mnt: Improve the umount_tree flags
  mnt: Use hlist_move_list in namespace_unlock
2015-04-18 11:20:31 -04:00
Linus Torvalds
06a60deca8 Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - in-memory extent_cache
   - fs_shutdown to test power-off-recovery
   - use inline_data to store symlink path
   - show f2fs as a non-misc filesystem

  Major fixes:
   - avoid CPU stalls on sync_dirty_dir_inodes
   - fix some power-off-recovery procedure
   - fix handling of broken symlink correctly
   - fix missing dot and dotdot made by sudden power cuts
   - handle wrong data index during roll-forward recovery
   - preallocate data blocks for direct_io

  ... and a bunch of minor bug fixes and cleanups"

* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
  f2fs: pass checkpoint reason on roll-forward recovery
  f2fs: avoid abnormal behavior on broken symlink
  f2fs: flush symlink path to avoid broken symlink after POR
  f2fs: change 0 to false for bool type
  f2fs: do not recover wrong data index
  f2fs: do not increase link count during recovery
  f2fs: assign parent's i_mode for empty dir
  f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
  f2fs: fix mismatching lock and unlock pages for roll-forward recovery
  f2fs: fix sparse warnings
  f2fs: limit b_size of mapped bh in f2fs_map_bh
  f2fs: persist system.advise into on-disk inode
  f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
  f2fs: preallocate fallocated blocks for direct IO
  f2fs: enable inline data by default
  f2fs: preserve extent info for extent cache
  f2fs: initialize extent tree with on-disk extent info of inode
  f2fs: introduce __{find,grab}_extent_tree
  f2fs: split set_data_blkaddr from f2fs_update_extent_cache
  f2fs: enable fast symlink by utilizing inline data
  ...
2015-04-18 11:17:20 -04:00
Ross Lagerwall
c57dcb566d efivarfs: Ensure VariableName is NUL-terminated
Some buggy firmware implementations update VariableNameSize on success
such that it does not include the final NUL character which results in
garbage in the efivarfs name entries.  Use kzalloc on the efivar_entry
(as is done in efivars.c) to ensure that the name is always
NUL-terminated.

The buggy firmware is:
BIOS Information
        Vendor: Intel Corp.
        Version: S1200RP.86B.02.02.0005.102320140911
        Release Date: 10/23/2014
        BIOS Revision: 4.6
System Information
        Manufacturer: Intel Corporation
        Product Name: S1200RP_SE

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-17 15:41:13 +01:00
Linus Torvalds
54e514b91b Merge branch 'akpm' (patches from Andrew)
Merge third patchbomb from Andrew Morton:

 - various misc things

 - a couple of lib/ optimisations

 - provide DIV_ROUND_CLOSEST_ULL()

 - checkpatch updates

 - rtc tree

 - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs

 - ptrace fixes

 - fork() fixes

 - seccomp cleanups

 - more mmap_sem hold time reductions from Davidlohr

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits)
  proc: show locks in /proc/pid/fdinfo/X
  docs: add missing and new /proc/PID/status file entries, fix typos
  drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
  Documentation/spi/spidev_test.c: fix warning
  drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
  .gitignore: ignore *.tar
  MAINTAINERS: add Mediatek SoC mailing list
  tomoyo: reduce mmap_sem hold for mm->exe_file
  powerpc/oprofile: reduce mmap_sem hold for exe_file
  oprofile: reduce mmap_sem hold for mm->exe_file
  mips: ip32: add platform data hooks to use DS1685 driver
  lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
  x86: switch to using asm-generic for seccomp.h
  sparc: switch to using asm-generic for seccomp.h
  powerpc: switch to using asm-generic for seccomp.h
  parisc: switch to using asm-generic for seccomp.h
  mips: switch to using asm-generic for seccomp.h
  microblaze: use asm-generic for seccomp.h
  arm: use asm-generic for seccomp.h
  seccomp: allow COMPAT sigreturn overrides
  ...
2015-04-17 09:04:38 -04:00
Andrey Vagin
6c8c90319c proc: show locks in /proc/pid/fdinfo/X
Let's show locks which are associated with a file descriptor in
its fdinfo file.

Currently we don't have a reliable way to determine who holds a lock.  We
can find some information in /proc/locks, but PID which is reported there
can be wrong.  For example, a process takes a lock, then forks a child and
dies.  In this case /proc/locks contains the parent pid, which can be
reused by another process.

$ cat /proc/locks
...
6: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
...

$ ps -C rpcbind
  PID TTY          TIME CMD
  332 ?        00:00:00 rpcbind

$ cat /proc/332/fdinfo/4
pos:	0
flags:	0100000
mnt_id:	22
lock:	1: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF

$ ls -l /proc/332/fd/4
lr-x------ 1 root root 64 Mar  5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock

$ ls -l /proc/324/fd/
total 0
lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0

You can see that the process with the 324 pid doesn't hold the lock.

This information is required for proper dumping and restoring file
locks.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:12 -04:00
Sanidhya Kashyap
c3fe5872eb bfs: correct return values
In case of failed memory allocation, the return should be ENOMEM instead
of ENOSPC.

Return -EIO when sb_bread() fails.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:09 -04:00
Sanidhya Kashyap
c8f33d0bec affs: kstrdup() memory handling
There is a possibility of kstrdup() failure upon memory pressure.
Therefore, returning ENOMEM even for new_opts.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: Taesoo kim <taesoo@gatech.edu>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:09 -04:00
Fabian Frederick
79bda4d510 fs/affs: use affs_test_opt()
Replace mount option test by affs_test_opt().

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:08 -04:00
Fabian Frederick
34f2483538 fs/affs/super.c: use affs_set_opt()
Replace direct mount option assignation by affs_set_opt() macro.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:08 -04:00
Fabian Frederick
6bf445ce35 fs/affs/affs.h: add mount option manipulation macros
Add clear/set/test affs mount option macros.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:08 -04:00
Fabian Frederick
a0016ff286 fs/affs: use AFFS_MOUNT prefix for mount options
Currently, affs still uses direct access on mount_options.  This patch
prepares to use affs_clear/set/test_opt() like other filesystems.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:08 -04:00
Sanidhya Kashyap
b796410630 adfs: return correct return values
Fix the wrong values returned by various functions such as EIO and ENOMEM.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Taesoo kim <taesoo@gatech.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:08 -04:00
Kirill Tkhai
dfcce791fb fs/exec.c:de_thread: move notify_count write under lock
We set sig->notify_count = -1 between RELEASE and ACQUIRE operations:

	spin_unlock_irq(lock);
	...
	if (!thread_group_leader(tsk)) {
		...
                for (;;) {
			sig->notify_count = -1;
                        write_lock_irq(&tasklist_lock);

There are no restriction on it so other processors may see this STORE
mixed with other STOREs in both areas limited by the spinlocks.

Probably, it may be reordered with the above

	sig->group_exit_task = tsk;
	sig->notify_count = zap_other_threads(tsk);

in some way.

Set it under tasklist_lock locked to be sure nothing will be reordered.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Davidlohr Bueso
6e399cd144 prctl: avoid using mmap_sem for exe_file serialization
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case.  For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe.  As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock.  Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Konstantin Khlebnikov
90f31d0ea8 mm: rcu-protected get_mm_exe_file()
This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Alexander Kuleshov
8de560def7 fs/fat: comment fix, fat_bits can be also 32
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:06 -04:00
Alexander Kuleshov
58932ef7f2 fs/fat: remove unnecessary includes
'fat.h' includes <linux/buffer_head.h> which includes <linux/fs.h> which
includes all the header files required for all *.c files fat filesystem.

[akpm@linux-foundation.org: fs/fat/iode.c needs seq_file.h]
[sfr@canb.auug.org.au: put one actually necessary include file back]
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:06 -04:00
Alexander Kuleshov
a40a7d9d07 fs/fat: remove unnecessary defintion
'*sb' never used, so let's remote it and pass inode->i_sb directly to the
MSDOS_SB.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Thomas Hebb
db579e76f0 hfsplus: don't store special "osx" xattr prefix on-disk
On Mac OS X, HFS+ extended attributes are not namespaced.  Since we want
to be compatible with OS X filesystems and yet still support the Linux
namespacing system, the hfsplus driver implements a special "osx"
namespace that is reported for any attribute that is not namespaced
on-disk.  However, the current code for getting and setting these
unprefixed attributes is broken.

hfsplus_osx_setattr() and hfsplus_osx_getattr() are passed names that have
already had their "osx." prefixes stripped by the generic functions.  The
functions first, quite correctly, check those names to make sure that they
aren't prefixed with a known namespace, which would allow namespace access
restrictions to be bypassed.  However, the functions then prepend "osx."
to the name they're given before passing it on to hfsplus_getattr() and
hfsplus_setattr().  Not only does this cause the "osx." prefix to be
stored on-disk, defeating its purpose, it also breaks the check for the
special "com.apple.FinderInfo" attribute, which is reported for all files,
and as a consequence makes some userspace applications (e.g.  GNU patch)
fail even when extended attributes are not otherwise in use.

There are five commits which have touched this particular code:

  127e5f5ae5 ("hfsplus: rework functionality of getting, setting and deleting of extended attributes")
  b168fff721 ("hfsplus: use xattr handlers for removexattr")
  bf29e886b2 ("hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes")
  fcacbd95e121 ("fs/hfsplus: move xattr_name allocation in hfsplus_getxattr()")
  ec1bbd346f18 ("fs/hfsplus: move xattr_name allocation in hfsplus_setxattr()")

The first commit creates the functions to begin with.  The namespace is
prepended by the original code, which I believe was correct at the time,
since hfsplus_?etattr() stripped the prefix if found.  The second commit
removes this behavior from hfsplus_?etattr() and appears to have been
intended to also remove the prefixing from hfsplus_osx_?etattr().
However, what it actually does is remove a necessary strncpy() call
completely, breaking the osx namespace entirely.  The third commit re-adds
the strncpy() call as it was originally, but doesn't mention it in its
commit message.  The final two commits refactor the code and don't affect
its functionality.

This commit does what b168fff attempted to do (prevent the prefix from
being added), but does it properly, instead of passing in an empty buffer
(which is what b168fff actually did).

Fixes: b168fff721 ("hfsplus: use xattr handlers for removexattr")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Sergei Antonov
059a704c43 hfsplus: fix expand when not enough available space
Fix a bug which is reproduced as follows. Create a file:

 echo abc > test_file

Try to expand the file beyond available space:

 truncate --size=<size exceeding available space> test_file

Since HFS+ does not support file size > allocated size, truncate should
fail.  However, it ends successfully.  The driver returns success despite
having been unable to allocate the requested space for the file.  Also
filesystem check finds an error:

 Checking catalog file.
 Incorrect size for file test_file
 (It should be 469094400 instead of 1000000000)

Add a piece of code analogous to code in the fat driver.  Now a proper
error is returned and filesystem remains consistent.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Chengyu Song
27a4e3884e hfsplus: incorrect return value
In case of memory allocation error, the return should be -ENOMEM, instead
of -ENOSPC.

Signed-off-by: Chengyu Song <csong84@gatech.edu>
Reviewed-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Fabian Frederick
7ce844a20e fs/hfsplus: replace if/BUG by BUG_ON
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Fabian Frederick
1ad8d63d63 fs/hfsplus: use bool instead of int for is_known_namespace() return value
is_known_namespace() only returns true/false.  Also remove inline and let
compiler decide what to do with static functions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Fabian Frederick
73d28d571d fs/hfsplus: atomically set inode->i_flags
According to commit 5f16f3225b ("ext4: atomically set inode->i_flags in
ext4_set_inode_flags()").

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Fabian Frederick
5e61473ea9 fs/hfsplus: move xattr_name allocation in hfsplus_setxattr()
security/trusted/user/osx setxattr did the same
xattr_name initialization. Move that operation in hfsplus_setxattr().

Tested with security/trusted/user getfattr/setfattr

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:05 -04:00
Fabian Frederick
a3cef4cd68 fs/hfsplus: move xattr_name allocation in hfsplus_getxattr()
security/trusted/user/osx getxattr did the same
xattr_name initialization. Move that operation in hfsplus_getxattr().

Tested with security/trusted/user getfattr/setfattr

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Dan Carpenter
f01fa5fb35 hfsplus: add missing curly braces in hfsplus_delete_cat()
This doesn't change how the code works, but clearly the curly braces were
intended.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Chengyu Song
13f244852f hfs: incorrect return values
In case of memory allocation error, the return should be -ENOMEM, instead
of -ENOSPC.

Signed-off-by: Chengyu Song <csong84@gatech.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
faea2c5311 nilfs2: use inode_set_flags() in nilfs_set_inode_flags()
Use inode_set_flags() to atomically set i_flags instead of clearing out
the S_IMMUTABLE, S_APPEND, etc.  flags and then setting them from the
FS_IMMUTABLE_FL, FS_APPEND_FL flags to avoid a race where an immutable
file has the immutable flag cleared for a brief window of time.

This is a similar fix to commit 5f16f3225b ("ext4: atomically set
inode->i_flags in ext4_set_inode_flags()").

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
0ce187c4f3 nilfs2: put out gfp mask manipulation from nilfs_set_inode_flags()
nilfs_set_inode_flags() function adjusts gfp-mask of inode->i_mapping as
well as i_flags, however, this coupling of operations is not appropriate.

For instance, nilfs_ioctl_setflags(), one of three callers of
nilfs_set_inode_flags(), doesn't need to reinitialize the gfp-mask at all.
 In addition, nilfs_new_inode(), another caller of
nilfs_set_inode_flags(), doesn't either because it has already initialized
the gfp-mask.

Only __nilfs_read_inode(), the remaining caller, needs it.  So, this moves
the gfp mask manipulation to __nilfs_read_inode() from
nilfs_set_inode_flags().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
3377f843cf nilfs2: fix gcc warning at nilfs_checkpoint_is_mounted()
Fix the following build warning:

 fs/nilfs2/super.c: In function 'nilfs_checkpoint_is_mounted':
 fs/nilfs2/super.c:1023:10: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
   if (cno < 0 || cno > nilfs->ns_cno)
           ^

This warning indicates that the comparision "cno < 0" is useless because
variable "cno" has an unsigned integer type "__u64".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
53a2c3bdf4 nilfs2: improve execution time of NILFS_IOCTL_GET_CPINFO ioctl
The older a filesystem gets, the slower lscp command becomes.  This is
because nilfs_cpfile_do_get_cpinfo() function meets more hole blocks
as the start offset of valid checkpoint numbers gets bigger.

This reduces the overhead by skipping hole blocks efficiently with
nilfs_mdt_find_block() helper.

A measurement result of this patch is as follows:

Before:
$ time lscp
                 CNO        DATE     TIME  MODE  FLG      BLKCNT       ICNT
             5769303  2015-02-22 19:31:33   cp    -          108          1
             5769304  2015-02-22 19:38:54   cp    -          108          1

real    0m0.182s
user    0m0.003s
sys     0m0.180s

After:
$ time lscp
                 CNO        DATE     TIME  MODE  FLG      BLKCNT       ICNT
             5769303  2015-02-22 19:31:33   cp    -          108          1
             5769304  2015-02-22 19:38:54   cp    -          108          1

real    0m0.003s
user    0m0.001s
sys     0m0.002s

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
fa33915c92 nilfs2: add helper to find existent block on metadata file
Add a new metadata file function, nilfs_mdt_find_block(), which finds
an existent block on a metadata file in a given range of blocks.  This
function skips continuous hole blocks efficiently by using
nilfs_bmap_seek_key().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:04 -04:00
Ryusuke Konishi
5b20384fb3 nilfs2: add bmap function to seek a valid key
Add a new bmap function, nilfs_bmap_seek_key(), which seeks a valid
entry and returns its key starting from a given key.  This function
can be used to skip hole blocks efficiently.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Ryusuke Konishi
3568a13f40 nilfs2: unify type of key arguments in bmap interface
The type of key arguments in block mapping interface varies depending
on function.  For instance, nilfs_bmap_lookup_at_level() takes "__u64"
for its key argument whereas nilfs_bmap_lookup() takes "unsigned
long".

This fits them to "__u64" to eliminate the variation.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Ryusuke Konishi
0de6d6b9a2 nilfs2: use bgl_lock_ptr()
Simplify nilfs_mdt_bgl_lock() by utilizing bgl_lock_ptr() helper in
<linux/blockgroup_lock.h>.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Ryusuke Konishi
ead8ecffa3 nilfs2: use set_mask_bits() for operations on buffer state bitmap
nilfs_forget_buffer(), nilfs_clear_dirty_page(), and
nilfs_segctor_complete_write() are using a bunch of atomic bit operations
against buffer state bitmap.

This reduces the number of them by utilizing set_mask_bits() macro.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Ryusuke Konishi
6fb7a61e98 nilfs2: do not use async write flag for segment summary buffers
The async write flag is introduced to nilfs2 in the commit 7f42ec3941
("nilfs2: fix issue with race condition of competition between segments
for dirty blocks"), but the flag only makes sense for data buffers and
btree node buffers.  It is not needed for segment summary buffers.

This gets rid of the latter uses as part of refactoring of atomic bit
operations on buffer state bitmap.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Fabian Frederick
f8ccad2164 befs: replace typedef befs_inode_info by structure
See Documentation/CodingStyle

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Fabian Frederick
038428fcf7 befs: replace typedef befs_sb_info by structure
See Documenation/CodingStyle

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Fabian Frederick
09ad0eae5e befs: replace typedef befs_mount_options by structure
See Documentation/CodingStyle

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:03 -04:00
Rasmus Villemoes
6ceafb880c fs/binfmt_misc.c: simplify entry_status()
sprintf() reliably returns the number of characters printed, so we don't
need to ask strlen() where we are.  Also replace calling sprintf("%02x")
in a loop with the much simpler bin2hex().

[akpm@linux-foundation.org: it's odd to include kernel.h after everything else]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:03:59 -04:00
Linus Torvalds
4fc8adcfec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull third hunk of vfs changes from Al Viro:
 "This contains the ->direct_IO() changes from Omar + saner
  generic_write_checks() + dealing with fcntl()/{read,write}() races
  (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
  repeatedly looking at ->f_flags, which can be changed by fcntl(2),
  check ->ki_flags - which cannot) + infrastructure bits for dhowells'
  d_inode annotations + Christophs switch of /dev/loop to
  vfs_iter_write()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
  block: loop: switch to VFS ITER_BVEC
  configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
  VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
  VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
  VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
  NFS: Don't use d_inode as a variable name
  VFS: Impose ordering on accesses of d_inode and d_flags
  VFS: Add owner-filesystem positive/negative dentry checks
  nfs: generic_write_checks() shouldn't be done on swapout...
  ocfs2: use __generic_file_write_iter()
  mirror O_APPEND and O_DIRECT into iocb->ki_flags
  switch generic_write_checks() to iocb and iter
  ocfs2: move generic_write_checks() before the alignment checks
  ocfs2_file_write_iter: stop messing with ppos
  udf_file_write_iter: reorder and simplify
  fuse: ->direct_IO() doesn't need generic_write_checks()
  ext4_file_write_iter: move generic_write_checks() up
  xfs_file_aio_write_checks: switch to iocb/iov_iter
  generic_write_checks(): drop isblk argument
  blkdev_write_iter: expand generic_file_checks() call in there
  ...
2015-04-16 23:27:56 -04:00
Linus Torvalds
84588e7a5d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and udf updates from Jan Kara:
 "The pull contains quota changes which complete unification of XFS and
  VFS quota interfaces (so tools can use either interface to manipulate
  any filesystem).  There's also a patch to support project quotas in
  VFS quota subsystem from Li Xi.

  Finally there's a bunch of UDF fixes and cleanups and tiny cleanup in
  reiserfs & ext3"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits)
  udf: Update ctime and mtime when directory is modified
  udf: return correct errno for udf_update_inode()
  ext3: Remove useless condition in if statement.
  vfs: Add general support to enforce project quota limits
  reiserfs: fix __RASSERT format string
  udf: use int for allocated blocks instead of sector_t
  udf: remove redundant buffer_head.h includes
  udf: remove else after return in __load_block_bitmap()
  udf: remove unused variable in udf_table_free_blocks()
  quota: Fix maximum quota limit settings
  quota: reorder flags in quota state
  quota: paranoia: check quota tree root
  quota: optimize i_dquot access
  quota: Hook up Q_XSETQLIM for id 0 to ->set_info
  xfs: Add support for Q_SETINFO
  quota: Make ->set_info use structure with neccesary info to VFS and XFS
  quota: Remove ->get_xstate and ->get_xstatev callbacks
  gfs2: Convert to using ->get_state callback
  xfs: Convert to using ->get_state callback
  quota: Wire up Q_GETXSTATE and Q_GETXSTATV calls to work with ->get_state
  ...
2015-04-16 22:19:33 -04:00
Linus Torvalds
d82312c808 Merge branch 'for-4.1/core' of git://git.kernel.dk/linux-block
Pull block layer core bits from Jens Axboe:
 "This is the core pull request for 4.1.  Not a lot of stuff in here for
  this round, mostly little fixes or optimizations.  This pull request
  contains:

   - An optimization that speeds up queue runs on blk-mq, especially for
     the case where there's a large difference between nr_cpu_ids and
     the actual mapped software queues on a hardware queue.  From Chong
     Yuan.

   - Honor node local allocations for requests on legacy devices.  From
     David Rientjes.

   - Cleanup of blk_mq_rq_to_pdu() from me.

   - exit_aio() fixup from me, greatly speeding up exiting multiple IO
     contexts off exit_group().  For my particular test case, fio exit
     took ~6 seconds.  A typical case of both exposing RCU grace periods
     to user space, and serializing exit of them.

   - Make blk_mq_queue_enter() honor the gfp mask passed in, so we only
     wait if __GFP_WAIT is set.  From Keith Busch.

   - blk-mq exports and two added helpers from Mike Snitzer, which will
     be used by the dm-mq code.

   - Cleanups of blk-mq queue init from Wei Fang and Xiaoguang Wang"

* 'for-4.1/core' of git://git.kernel.dk/linux-block:
  blk-mq: reduce unnecessary software queue looping
  aio: fix serial draining in exit_aio()
  blk-mq: cleanup blk_mq_rq_to_pdu()
  blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()
  block: remove redundant check about 'set->nr_hw_queues' in blk_mq_alloc_tag_set()
  block: allocate request memory local to request queue
  blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set
  blk-mq: export blk_mq_run_hw_queues
  blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
2015-04-16 21:49:16 -04:00
Linus Torvalds
d19d5efd8c powerpc updates for 4.1
- Numerous minor fixes, cleanups etc.
 - More EEH work from Gavin to remove its dependency on device_nodes.
 - Memory hotplug implemented entirely in the kernel from Nathan Fontenot.
 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.
 - Rewrite of VPHN parsing logic & tests from Greg Kurz.
 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.
 - Support for pstore on powernv from Hari Bathini.
 - Removal of old powerpc specific byte swap routines by David Gibson.
 - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing
   your firmware when it wasn't.
 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.
 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek.
 - Some fixes for migration from Tyrel Datwyler.
 - A new syscall to switch the cpu endian by Michael Ellerman.
 - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn.
 - A fix for the OPAL sensor driver from Cédric Le Goater.
 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.
 - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per
   machine.
 - Small patch from Sam Bobroff to explicitly abort non-suspended transactions
   on syscalls, plus a test to exercise it.
 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.
 - Small patch to enable the hard lockup detector from Anton Blanchard.
 - Fix from Dave Olson for missing L2 cache information on some CPUs.
 - Some fixes from Michael Ellerman to get Cell machines booting again.
 - Freescale updates from Scott: Highlights include BMan device tree nodes, an
   MSI erratum workaround, a couple minor performance improvements, config
   updates, and misc fixes/cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVL2cxAAoJEFHr6jzI4aWAR8cP/19VTo/CzCE4ffPSx7qR464n
 F+WFZcbNjIMXu6+B0YLuJZEsuWtKKrCit/MCg3+mSgE4iqvxmtI+HDD0445Buszj
 UD4E4HMdPrXQ+KUSUDORvRjv/FFUXIa94LSv/0g2UeMsPz/HeZlhMxEu7AkXw9Nf
 rTxsmRTsOWME85Y/c9ss7XHuWKXT3DJV7fOoK9roSaN3dJAuWTtG3WaKS0nUu0ok
 0M81D6ZczoD6ybwh2DUMPD9K6SGxLdQ4OzQwtW6vWzcQIBDfy5Pdeo0iAFhGPvXf
 T4LLPkv4cF4AwHsAC4rKDPHQNa+oZBoLlScrHClaebAlDiv+XYKNdMogawUObvSh
 h7avKmQr0Ygp1OvvZAaXLhuDJI9FJJ8lf6AOIeULgHsDR9SyKMjZWxRzPe11uarO
 Fyi0qj3oJaQu6LjazZraApu8mo+JBtQuD3z3o5GhLxeFtBBF60JXj6zAXJikufnl
 kk1/BUF10nKUhtKcDX767AMUCtMH3fp5hx8K/z9T5v+pobJB26Wup1bbdT68pNBT
 NjdKUppV6QTjZvCsA6U2/ECu6E9KeIaFtFSL2IRRoiI0dWBN5/5eYn3RGkO2ZFoL
 1NdwKA2XJcchwTPkpSRrUG70sYH0uM2AldNYyaLfjzrQqza7Y6lF699ilxWmCN/H
 OplzJAE5cQ8Am078veTW
 =03Yh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Numerous minor fixes, cleanups etc.

 - More EEH work from Gavin to remove its dependency on device_nodes.

 - Memory hotplug implemented entirely in the kernel from Nathan
   Fontenot.

 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

 - Rewrite of VPHN parsing logic & tests from Greg Kurz.

 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.

 - Support for pstore on powernv from Hari Bathini.

 - Removal of old powerpc specific byte swap routines by David Gibson.

 - Fix from Vasant Hegde to prevent the flash driver telling you it was
   flashing your firmware when it wasn't.

 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
   Stancek.

 - Some fixes for migration from Tyrel Datwyler.

 - A new syscall to switch the cpu endian by Michael Ellerman.

 - Large series from Wei Yang to implement SRIOV, reviewed and acked by
   Bjorn.

 - A fix for the OPAL sensor driver from Cédric Le Goater.

 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

 - Large series from Daniel Axtens to make our PCI hooks per PHB rather
   than per machine.

 - Small patch from Sam Bobroff to explicitly abort non-suspended
   transactions on syscalls, plus a test to exercise it.

 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

 - Small patch to enable the hard lockup detector from Anton Blanchard.

 - Fix from Dave Olson for missing L2 cache information on some CPUs.

 - Some fixes from Michael Ellerman to get Cell machines booting again.

 - Freescale updates from Scott: Highlights include BMan device tree
   nodes, an MSI erratum workaround, a couple minor performance
   improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
  powerpc/powermac: Fix build error seen with powermac smp builds
  powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
  powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
  powerpc/cell: Fix iommu breakage caused by controller_ops change
  powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
  powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
  powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
  powerpc/pseries: Correct memory hotplug locking
  powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
  powerpc: Add ppc64 hard lockup detector support
  oprofile: Disable oprofile NMI timer on ppc64
  powerpc/perf/hv-24x7: Add missing put_cpu_var()
  powerpc/perf/hv-24x7: Break up single_24x7_request
  powerpc/perf/hv-24x7: Define update_event_count()
  powerpc/perf/hv-24x7: Whitespace cleanup
  powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
  powerpc/perf/hv-24x7: Rename hv_24x7_event_update
  powerpc/perf/hv-24x7: Move debug prints to separate function
  powerpc/perf/hv-24x7: Drop event_24x7_request()
  powerpc/perf/hv-24x7: Use pr_devel() to log message
  ...

Conflicts:
	tools/testing/selftests/powerpc/Makefile
	tools/testing/selftests/powerpc/tm/Makefile
2015-04-16 13:53:32 -05:00
Jaegeuk Kim
10027551cc f2fs: pass checkpoint reason on roll-forward recovery
This patch adds CP_RECOVERY to remain recovery information for checkpoint.
And, it makes sure writing checkpoint in this case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:40 -07:00
Jaegeuk Kim
feb7cbb079 f2fs: avoid abnormal behavior on broken symlink
When f2fs_symlink was triggered and checkpoint was done before syncing its
link path, f2fs can get broken symlink like "xxx -> \0\0\0".
This incurs abnormal path_walk by VFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:40 -07:00
Jaegeuk Kim
d0cae97cb6 f2fs: flush symlink path to avoid broken symlink after POR
This patch tries to avoid broken symlink case after POR in best effort.
This results in performance regression.
But, if f2fs has inline_data and the target path is under 3KB-sized long,
the page would be stored in its inode_block, so that there would be no
performance regression.

Note that, if user wants to keep this file atomically, it needs to trigger
dir->fsync.
And, there is still a hole to produce broken symlink.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:35 -07:00
Dave Chinner
542c311813 Merge branch 'xfs-dio-extend-fix' into for-next
Conflicts:
	fs/xfs/xfs_file.c
2015-04-16 22:13:18 +10:00
Dave Chinner
0cefb29e6a xfs: using generic_file_direct_write() is unnecessary
generic_file_direct_write() does all sorts of things to make DIO
work "sorta ok" with mixed buffered IO workloads. We already do
most of this work in xfs_file_aio_dio_write() because of the locking
requirements, so there's only a couple of things it does for us.

The first thing is that it does a page cache invalidation after the
->direct_IO callout. This can easily be added to the XFS code.

The second thing it does is that if data was written, it updates the
iov_iter structure to reflect the data written, and then does EOF
size updates if necessary. For XFS, these EOF size updates are now
not necessary, as we do them safely and race-free in IO completion
context. That leaves just the iov_iter update, and that's also moved
to the XFS code.

Therefore we don't need to call generic_file_direct_write() and in
doing so remove redundant buffered writeback and page cache
invalidation calls from the DIO submission path. We also remove a
racy EOF size update, and make the DIO submission code in XFS much
easier to follow. Wins all round, really.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 22:03:27 +10:00
Dave Chinner
40c63fbc55 xfs: direct IO EOF zeroing needs to drain AIO
When we are doing AIO DIO writes, the IOLOCK only provides an IO
submission barrier. When we need to do EOF zeroing, we need to ensure
that no other IO is in progress and all pending in-core EOF updates
have been completed. This requires us to wait for all outstanding
AIO DIO writes to the inode to complete and, if necessary, run their
EOF updates.

Once all the EOF updates are complete, we can then restart
xfs_file_aio_write_checks() while holding the IOLOCK_EXCL, knowing
that EOF is up to date and we have exclusive IO access to the file
so we can run EOF block zeroing if we need to without interference.
This gives EOF zeroing the same exclusivity against other IO as we
provide truncate operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 22:03:17 +10:00
Dave Chinner
b9d59846f7 xfs: DIO write completion size updates race
xfs_end_io_direct_write() can race with other IO completions when
updating the in-core inode size. The IO completion processing is not
serialised for direct IO - they are done either under the
IOLOCK_SHARED for non-AIO DIO, and without any IOLOCK held at all
during AIO DIO completion. Hence the non-atomic test-and-set update
of the in-core inode size is racy and can result in the in-core
inode size going backwards if the race if hit just right.

If the inode size goes backwards, this can trigger the EOF zeroing
code to run incorrectly on the next IO, which then will zero data
that has successfully been written to disk by a previous DIO.

To fix this bug, we need to serialise the test/set updates of the
in-core inode size. This first patch introduces locking around the
relevant updates and checks in the DIO path. Because we now have an
ioend in xfs_end_io_direct_write(), we know exactly then we are
doing an IO that requires an in-core EOF update, and we know that
they are not running in interrupt context. As such, we do not need to
use irqsave() spinlock variants to protect against interrupts while
the lock is held.

Hence we can use an existing spinlock in the inode to do this
serialisation and so not need to grow the struct xfs_inode just to
work around this problem.

This patch does not address the test/set EOF update in
generic_file_write_direct() for various reasons - that will be done
as a followup with separate explanation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 22:03:07 +10:00
Dave Chinner
a06c277a13 xfs: DIO writes within EOF don't need an ioend
DIO writes that lie entirely within EOF have nothing to do in IO
completion. In this case, we don't need no steekin' ioend, and so we
can avoid allocating an ioend until we have a mapping that spans
EOF.

This means that IO completion has two contexts - deferred completion
to the dio workqueue that uses an ioend, and interrupt completion
that does nothing because there is nothing that can be done in this
context.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 22:00:00 +10:00
Dave Chinner
6dfa1b67e3 xfs: handle DIO overwrite EOF update completion correctly
Currently a DIO overwrite that extends the EOF (e.g sub-block IO or
write into allocated blocks beyond EOF) requires a transaction for
the EOF update. Thi is done in IO completion context, but we aren't
explicitly handling this situation properly and so it can run in
interrupt context. Ensure that we defer IO that spans EOF correctly
to the DIO completion workqueue, and now that we have an ioend in IO
completion we can use the common ioend completion path to do all the
work.

Note: we do not preallocate the append transaction as we can have
multiple mapping and allocation calls per direct IO. hence
preallocating can still leave us with nested transactions by
attempting to map and allocate more blocks after we've preallocated
an append transaction.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 21:59:34 +10:00
Dave Chinner
d5cc2e3f96 xfs: DIO needs an ioend for writes
Currently we can only tell DIO completion that an IO requires
unwritten extent completion. This is done by a hacky non-null
private pointer passed to Io completion, but the private pointer
does not actually contain any information that is used.

We also need to pass to IO completion the fact that the IO may be
beyond EOF and so a size update transaction needs to be done. This
is currently determined by checks in the io completion, but we need
to determine if this is necessary at block mapping time as we need
to defer the size update transactions to a completion workqueue,
just like unwritten extent conversion.

To do this, first we need to allocate and pass an ioend to to IO
completion. Add this for unwritten extent conversion; we'll do the
EOF updates in the next commit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 21:59:07 +10:00
Dave Chinner
1fdca9c211 xfs: move DIO mapping size calculation
The mapping size calculation is done last in __xfs_get_blocks(), but
we are going to need the actual mapping size we will use to map the
direct IO correctly in xfs_map_direct(). Factor out the calculation
for code clarity, and move the call to be the first operation in
mapping the extent to the returned buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 21:58:21 +10:00
Dave Chinner
a719370be5 xfs: factor DIO write mapping from get_blocks
Clarify and separate the buffer mapping logic so that the direct IO mapping is
not tangled up in propagating the extent status to teh mapping buffer. This
makes it easier to extend the direct IO mapping to use an ioend in future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-16 21:57:48 +10:00
Theodore Ts'o
6ddb244784 ext4 crypto: enable encryption feature flag
Also add the test dummy encryption mode flag so we can more easily
test the encryption patches using xfstests.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-16 01:56:00 -04:00