Pull Ceph updates from Sage Weil:
"There are several patches from Ilya fixing RBD allocation lifecycle
issues, a series adding a nocephx_sign_messages option (and associated
bug fixes/cleanups), several patches from Zheng improving the
(directory) fsync behavior, a big improvement in IO for direct-io
requests when striping is enabled from Caifeng, and several other
small fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: clear msg->con in ceph_msg_release() only
libceph: add nocephx_sign_messages option
libceph: stop duplicating client fields in messenger
libceph: drop authorizer check from cephx msg signing routines
libceph: msg signing callouts don't need con argument
libceph: evaluate osd_req_op_data() arguments only once
ceph: make fsync() wait unsafe requests that created/modified inode
ceph: add request to i_unsafe_dirops when getting unsafe reply
libceph: introduce ceph_x_authorizer_cleanup()
ceph: don't invalidate page cache when inode is no longer used
rbd: remove duplicate calls to rbd_dev_mapping_clear()
rbd: set device_type::release instead of device::release
rbd: don't free rbd_dev outside of the release callback
rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
libceph: use local variable cursor instead of &msg->cursor
libceph: remove con argument in handle_reply()
ceph: combine as many iovec as possile into one OSD request
ceph: fix message length computation
ceph: fix a comment typo
rbd: drop null test before destroy functions
Pull misc block fixes from Jens Axboe:
"Stuff that got collected after the merge window opened. This
contains:
- NVMe:
- Fix for non-striped transfer size setting for NVMe from
Sathyavathi.
- (Some) support for the weird Apple nvme controller in the
macbooks. From Stephan Günther.
- The error value leak for dax from Al.
- A few minor blk-mq tweaks from me.
- Add the new linux-block@vger.kernel.org mailing list to the
MAINTAINERS file.
- Discard fix for brd, from Jan.
- A kerneldoc warning for block core from Randy.
- An older fix from Vivek, converting a WARN_ON() to a rate limited
printk when a device is hot removed with dirty inodes"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't hardcode blk_qc_t -> tag mask
dax_io(): don't let non-error value escape via retval instead of EFAULT
block: fix blk-core.c kernel-doc warning
fs/block_dev.c: Remove WARN_ON() when inode writeback fails
NVMe: add support for Apple NVMe controller
NVMe: use split lo_hi_{read,write}q
blk-mq: mark __blk_mq_complete_request() static
MAINTAINERS: add reference to new linux-block list
NVMe: Increase the max transfer size when mdts is 0
brd: Refuse improperly aligned discard requests
This update contains:
o per-mount operational statistics in sysfs
o fixes for concurrent aio append write submission
o various logging fixes
o detection of zeroed logs and invalid log sequence numbers on v5 filesystems
o memory allocation failure message improvements
o a bunch of xattr/ACL fixes
o fdatasync optimisation
o miscellaneous other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJWQ7GzAAoJEK3oKUf0dfodJakP/3s3N5ngqRWa+PQwBQPdTO0r
MBQppSKXWdT7YLhiFt1ZRlvXiMQOIZPNx0yBS9mzQghL9sTGvcPdxjbQnNh6LUnE
fGC2Yzi/J8lM2M80ezk3JoFqdqAQ/U78ARA/VpZct4imrps/h+s2Klkx87xPJsiK
/wY56FXFtoUS1ADYhL8qCeiAGOFpyIttiDNOVW3O2ZXn4iJUsa2nLCoiFwF/yFvU
S85iUJWAsvVSW5WgfUufmodC4u+WOT+9isNRxEmBjpxYYAFrFb5+8DYY3Coh6z0V
HqYPhpzBOG9gXbAue5v+ccsp2w60atXIFUQkR2HFBblvxsDMkvsgycJWJgDNmJiw
RYDMBJ26epxUdTScUxijKiGfnnbZW5b+uzp6FvVsE4KPdP62ol7YNqxj8/FFIjQN
JBl2ooiczOgvhCdvdWmWNEGWHccBcJ8UJ2RzJ0owVIIJZZYwjkZNzeSieWzYc7tr
b9wBC4wnaYAK/V7aEGLJxMXVjkanrqAnaXf5ymICSFv8me/qAfZ2sLcY2P6SHuhO
Fmkj6R5Thh1SYxk3thgGFZg7LGuxJW9cmypvFGpKhIvEaNGIM6ScdIwO7kCHYWv7
3EkP42mmJLIYxKz/q2nHqt7R246YFraIRowLWptJUl32uyzO7SrdKbc8+o5WD4Wl
2byjE9TjXOa1jGuPa3kN
=zu+5
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
"There is nothing really major here - the only significant addition is
the per-mount operation statistics infrastructure. Otherwises there's
various ACL, xattr, DAX, AIO and logging fixes, and a smattering of
small cleanups and fixes elsewhere.
Summary:
- per-mount operational statistics in sysfs
- fixes for concurrent aio append write submission
- various logging fixes
- detection of zeroed logs and invalid log sequence numbers on v5 filesystems
- memory allocation failure message improvements
- a bunch of xattr/ACL fixes
- fdatasync optimisation
- miscellaneous other fixes and cleanups"
* tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
xfs: give all workqueues rescuer threads
xfs: fix log recovery op header validation assert
xfs: Fix error path in xfs_get_acl
xfs: optimise away log forces on timestamp updates for fdatasync
xfs: don't leak uuid table on rmmod
xfs: invalidate cached acl if set via ioctl
xfs: Plug memory leak in xfs_attrmulti_attr_set
xfs: Validate the length of on-disk ACLs
xfs: invalidate cached acl if set directly via xattr
xfs: xfs_filemap_pmd_fault treats read faults as write faults
xfs: add ->pfn_mkwrite support for DAX
xfs: DAX does not use IO completion callbacks
xfs: Don't use unwritten extents for DAX
xfs: introduce BMAPI_ZERO for allocating zeroed extents
xfs: fix inode size update overflow in xfs_map_direct()
xfs: clear PF_NOFREEZE for xfsaild kthread
xfs: fix an error code in xfs_fs_fill_super()
xfs: stats are no longer dependent on CONFIG_PROC_FS
xfs: simplify /proc teardown & error handling
xfs: per-filesystem stats counter implementation
...
the breakup of the big NFSv4 state lock in 3.17--thanks especially to
Andrew Elble and Jeff Layton for tracking down some of the remaining
races.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQ51oAAoJECebzXlCjuG+b1AP+wWamMNw8eS0N98+KslfTMNd
BFcOFp6L5Hv1VRuwl67V9UUNS+9y5rsgWh9gMnQe5yPZ+dVABbO6mKfh3f7HJ2zg
aGzmE9ZdTMejjRDpSHHMEqxYePlxGVgxVhr8CgnzgkXf6KBEy3emfDlFIocgFWdR
JGWEhfOoOa+H7b3Awq3KxlxhAajq1ic14un2CLxTYdvjshwhlIjnscY1F7vNiNRg
O/ELQRKCCSNZbwGSV/OhNUXx3VQPQUh1eMvIfSD3Fs6AtMybIWGW5Fc36jxZJKt2
kllcEfukRnGa+Ezl/hwBWd1pEVMwmkYhNRt+9LtH8uWG4+uT5i3Mxn4taDYg8618
plp2GtRoC3VwOvUKEcKl3HhlRBu5H4zJtx9x60NDzAgNUtoKG/Dl7rhm7o7QwUk8
W3k0jYAryoyx+12fYO0dssdM4pj1Gi7nRGR687lKzXXttktbQF88ZS9EHhI3oFiC
Ak8ilEeap8ND1KIJY6Z2xr925BPpw+P2GXbd/Mr5H0aX3a3WM3wLPXlToZbve5EP
haYnTqbHw9QzqbLDcki8s0hNgv+xwlQWopoInGijfr3IAHQMpKStI0WxhyTYsb8g
0xyRWA1COnj5WvgznbkMky7Q45T27q26EFgaS8+LEJ1rtEpmNDZOaycbwym6XQHk
1oyydIRSWM3c7eWnDvFG
=wRg0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Apologies for coming a little late in the merge window. Fortunately
this is another fairly quiet one:
Mainly smaller bugfixes and cleanup. We're still finding some bugs
from the breakup of the big NFSv4 state lock in 3.17 -- thanks
especially to Andrew Elble and Jeff Layton for tracking down some of
the remaining races"
* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
svcrpc: document lack of some memory barriers
nfsd: fix race with open / open upgrade stateids
nfsd: eliminate sending duplicate and repeated delegations
nfsd: remove recurring workqueue job to clean DRC
SUNRPC: drop stale comment in svc_setup_socket()
nfsd: ensure that seqid morphing operations are atomic wrt to copies
nfsd: serialize layout stateid morphing operations
nfsd: improve client_has_state to check for unused openowners
nfsd: fix clid_inuse on mount with security change
sunrpc/cache: make cache flushing more reliable.
nfsd: move include of state.h from trace.c to trace.h
sunrpc: avoid warning in gss_key_timeout
lockd: get rid of reference-counted NSM RPC clients
SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
lockd: create NSM handles per net namespace
nfsd: switch unsigned char flags in svc_fh to bools
nfsd: move svc_fh->fh_maxsize to just after fh_handle
nfsd: drop null test before destroy functions
nfsd: serialize state seqid morphing operations
Pull vfs update from Al Viro:
- misc stable fixes
- trivial kernel-doc and comment fixups
- remove never-used block_page_mkwrite() wrapper function, and rename
the function that is _actually_ used to not have double underscores.
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: 9p: cache.h: Add #define of include guard
vfs: remove stale comment in inode_operations
vfs: remove unused wrapper block_page_mkwrite()
binfmt_elf: Correct `arch_check_elf's description
fs: fix writeback.c kernel-doc warnings
fs: fix inode.c kernel-doc warning
fs/pipe.c: return error code rather than 0 in pipe_write()
fs/pipe.c: preserve alloc_file() error code
binfmt_elf: Don't clobber passed executable's file header
FS-Cache: Handle a write to the page immediately beyond the EOF marker
cachefiles: perform test on s_blocksize when opening cache file.
FS-Cache: Don't override netfs's primary_index if registering failed
FS-Cache: Increase reference of parent after registering, netfs success
debugfs: fix refcount imbalance in start_creating
If a block device is hot removed and later last reference to device
is put, we try to writeback the dirty inode. But device is gone and
that writeback fails.
Currently we do a WARN_ON() which does not seem to be the right thing.
Convert it to a ratelimited kernel warning.
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
[jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The include file was intended to have an include guard, but the #define
part is missing.
Signed-off-by: Tzvetelin Katchov <katchov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:
commit 24da4fab5a ("vfs: Create __block_page_mkwrite() helper passing
error values back")
This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.
Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Correct `arch_check_elf's description, mistakenly copied and pasted from
`arch_elf_pt_proc'.
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro
to after the function's opening brace. Also #undef this macro at the
end of the function.
..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix kernel-doc warning in fs/inode.c:
..//fs/inode.c:1606: warning: No description found for parameter 'inode'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
pipe_write() would return 0 if it failed to merge the beginning of the
data to write with the last, partially filled pipe buffer. It should
return an error code instead. Userspace programs could be confused by
write() returning 0 when called with a nonzero 'count'.
The EFAULT error case was a regression from f0d1bec9d5 ("new helper:
copy_page_from_iter()"), while the ops->confirm() error case was a much
older bug.
Test program:
#include <assert.h>
#include <errno.h>
#include <unistd.h>
int main(void)
{
int fd[2];
char data[1] = {0};
assert(0 == pipe(fd));
assert(1 == write(fd[1], data, 1));
/* prior to this patch, write() returned 0 here */
assert(-1 == write(fd[1], NULL, 1));
assert(errno == EFAULT);
}
Cc: stable@vger.kernel.org # at least v3.15+
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If sys_pipe() was unable to allocate a 'struct file', it always failed
with ENFILE, which means "The number of simultaneously open files in the
system would exceed a system-imposed limit." However, alloc_file()
actually returns an ERR_PTR value and might fail with other error codes.
Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
when there are few open files in the system. Update sys_pipe() to
preserve this error code.
In a prior submission of a similar patch (1) some concern was raised
about introducing a new error code for sys_pipe(). However, for most
system calls, programs cannot assume that new error codes will never be
introduced. In addition, ENOMEM was, in fact, already a possible error
code for sys_pipe(), in the case where the file descriptor table could
not be expanded due to insufficient memory.
(1) http://comments.gmane.org/gmane.linux.kernel/1357942
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header. Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.
This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.
With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'. With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.
Cc: stable@vger.kernel.org # all the way back
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Handle a write being requested to the page immediately beyond the EOF
marker on a cache object. Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.
The problem is actually in fscache where we check the index of the page
being written against store_limit. store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.
Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.
The assertion failure looks something like this:
CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
cachefiles requires that s_blocksize in the cache is not greater than
PAGE_SIZE, and performs the check every time a block is accessed.
Move the test to the place where the file is "opened", where other
file-validity tests are performed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only override netfs->primary_index when registering success.
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If netfs exist, fscache should not increase the reference of parent's
usage and n_children, otherwise, never be decreased.
v2: thanks David's suggest,
move increasing reference of parent if success
use kmem_cache_free() freeing primary_index directly
v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In debugfs' start_creating(), we pin the file system to safely access
its root. When we failed to create a file, we unpin the file system via
failed_creating() to release the mount count and eventually the reference
of the vfsmount.
However, when we run into an error during lookup_one_len() when still
in start_creating(), we only release the parent's mutex but not so the
reference on the mount. Looks like it was done in the past, but after
splitting portions of __create_file() into start_creating() and
end_creating() via 190afd81e4 ("debugfs: split the beginning and the
end of __create_file() off"), this seemed missed. Noticed during code
review.
Fixes: 190afd81e4 ("debugfs: split the beginning and the end of __create_file() off")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull block IO poll support from Jens Axboe:
"Various groups have been doing experimentation around IO polling for
(really) fast devices. The code has been reviewed and has been
sitting on the side for a few releases, but this is now good enough
for coordinated benchmarking and further experimentation.
Currently O_DIRECT sync read/write are supported. A framework is in
the works that allows scalable stats tracking so we can auto-tune
this. And we'll add libaio support as well soon. Fow now, it's an
opt-in feature for test purposes"
* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
direct-io: be sure to assign dio->bio_bdev for both paths
directio: add block polling support
NVMe: add blk polling support
block: add block polling support
blk-mq: return tag/queue combo in the make_request_fn handlers
block: change ->make_request_fn() and users to return a queue cookie
* access time support for UBIFS by Dongsheng Yang
* random cleanups and bug fixes all over the place
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJWQmnVAAoJEEtJtSqsAOnWpnkP/0pUN+Cx58QJoJXxaSgClH/l
QwYGznz4Fv3QuSafXJZy8CPzsTasaOOtoVKJPfPTvP8Drrl3FFbtdhqlqdx1YKqw
DuKeViwmE/HF49a5pxgtR8032l35s3zZUWwd2yKA3B6SVEIsjAoWlIutyzKA/DiR
R4mGVsfRevfzq1+l2YXGGZ8PJXMtRKnFTjzmIUJY0OeT3xCvl/uUFCid2x693aut
GpJ6AMKGYcU+BuQJmb89wvdRYCHPdYEJr7TWq+GBxU3gDPlLY9302IhPq9ftv8Lp
ClA0X3GZqVPR921XR+aA4ysRbfENT+UBewiBt2OEwQLnI6jxEn9EqsgErATxpOYy
PEduYuU56mqc67EE4r7+YHfkidMSa8KrR9s+GpzR6JStW9HNGFDfGfCAK7BcdvWT
+TuHfNk787Gbl4TO41iFI87s6zNQszYOpxBTEFYHx0FQWLNxD5c79UkisL+biqvJ
bX8+7pClJQXcY0CXqe56GVQLC5v4HbvMmnstiNk52ihxEB0NT6htzWe/GPb/+UZ1
K+kOAT54DPeag3FK4J1WZ57cYyJHdbyASB5vKvb+A1IwzrYFo8P35GekoWNK7eAJ
Od5mb++7pTEFSH7oSwsprPOmrHLRmXgRG8yhLRfqX1ZwFeyFdqGQxjce9O/QB1jo
jr4dxAY6Gnjdo1q07QSI
=9n2S
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS updates from Richard Weinberger:
- access time support for UBIFS by Dongsheng Yang
- random cleanups and bug fixes all over the place
* tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs:
ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs
ubifs: make ubifs_[get|set]xattr atomic
UBIFS: Delete unnecessary checks before the function call "iput"
UBI: Remove in vain semicolon
UBI: Fastmap: Fix PEB array type
UBIFS: Fix possible memory leak in ubifs_readdir()
fs/ubifs: remove unnecessary new_valid_dev check
ubi: fastmap: Implement produce_free_peb()
UBIFS: print verbose message when rescanning a corrupted node
UBIFS: call dbg_is_power_cut() instead of reading c->dbg->pc_happened
UBI: drop null test before destroy functions
UBI: Update comments to reflect UBI_METAONLY flag
UBI: Fix debug message
UBI: Fix typo in comment
UBI: Fastmap: Simplify expression
UBIFS: fix a typo in comment of ubifs_budget_req
UBIFS: use kmemdup rather than duplicating its implementation
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
updates of the NFIT at runtime.
2/ Teach the coredump implementation how to filter out DAX mappings.
3/ Introduce NUMA hints for allocations made by the pmem driver, and as
a side effect all devm allocations now hint their NUMA node by
default.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
+rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
EmttaYrDr2jJwIaGyw+H
=a2/L
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"Outside of the new ACPI-NFIT hot-add support this pull request is more
notable for what it does not contain, than what it does. There were a
handful of development topics this cycle, dax get_user_pages, dax
fsync, and raw block dax, that need more more iteration and will wait
for 4.5.
The patches to make devm and the pmem driver NUMA aware have been in
-next for several weeks. The hot-add support has not, but is
contained to the NFIT driver and is passing unit tests. The coredump
support is straightforward and was looked over by Jeff. All of it has
received a 0day build success notification across 107 configs.
Summary:
- Add support for the ACPI 6.0 NFIT hot add mechanism to process
updates of the NFIT at runtime.
- Teach the coredump implementation how to filter out DAX mappings.
- Introduce NUMA hints for allocations made by the pmem driver, and
as a side effect all devm allocations now hint their NUMA node by
default"
* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
coredump: add DAX filtering for FDPIC ELF coredumps
coredump: add DAX filtering for ELF coredumps
acpi: nfit: Add support for hot-add
nfit: in acpi_nfit_init, break on a 0-length table
pmem, memremap: convert to numa aware allocations
devm_memremap_pages: use numa_mem_id
devm: make allocations numa aware by default
devm_memremap: convert to return ERR_PTR
devm_memunmap: use devres_release()
pmem: kill memremap_pmem()
x86, mm: quiet arch_add_memory()
Pull i2c updates from Wolfram Sang:
- New drivers: UniPhier (with and without FIFO)
- some drivers got some bigger rework: ismt, designware, img-scb (rcar
had to be reverted because issues were showing up just lately)
- ACPI: reworked the device scanning and added support for muxes
... and quite a lot of driver bugfixes and cleanups this time. All
files touched outside of the i2c realm have proper acks.
* 'i2c/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (70 commits)
i2c: rcar: Revert the latest refactoring series
i2c: pnx: remove superfluous assignment
MAINTAINERS: i2c: drop i2c-pnx maintainer
MAINTAINERS: i2c: mark also subdirectories as maintained
i2c: cadence: enable driver for ARM64
i2c: i801: Document Intel DNV and Broxton
i2c: at91: manage unexpected RXRDY flag when starting a transfer
i2c: pnx: Use setup_timer instead of open coding it
i2c: add ACPI support for I2C mux ports
acpi: add acpi_preset_companion() stub
i2c: pxa: Add support for pxa910/988 & new configuration features
i2c: au1550: Convert to devm_kzalloc and devm_ioremap_resource
i2c-dev: Fix I2C_SLAVE ioctl comment
i2c-dev: Fix typo in ioctl name reference
i2c: sirf: tune the divider to make i2c bus freq more accurate
i2c: imx: Use -ENXIO as error in the NACK case
i2c: i801: Add support for Intel Broxton
i2c: i801: Add support for Intel DNV
i2c: mediatek: add i2c resume support
i2c: imx: implement bus recovery
...
btrfs sets ->submit_io(), and we failed to set the block dev for
that path. That resulted in a potential NULL dereference when
we later wait for IO in dio_await_one().
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We observed multiple open stateids on the server for files that
seemingly should have been closed.
nfsd4_process_open2() tests for the existence of a preexisting
stateid. If one is not found, the locks are dropped and a new
one is created. The problem is that init_open_stateid(), which
is also responsible for hashing the newly initialized stateid,
doesn't check to see if another open has raced in and created
a matching stateid. This fix is to enable init_open_stateid() to
return the matching stateid and have nfsd4_process_open2()
swap to that stateid and switch to the open upgrade path.
In testing this patch, coverage to the newly created
path indicates that the race was indeed happening.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.
List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.
So, let's:
1.) Not hand out duplicate delegations.
2.) Only send them to the client once.
RFC 5561:
9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."
10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client. A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We have a shrinker, we clean out the cache when nfsd is shut down, and
prune the chains on each request. A recurring workqueue job seems like
unnecessary overhead. Just remove it.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Highlights include:
Features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups
- Move socket data receive out of the bottom halves and into a workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC handles
errors identically.
- Fix a panic when blocks or object layouts reads return a bad data length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQPMXAAoJEGcL54qWCgDy6ZUQAL32vpgyMXe7R4jcxoQxm52+
tn8FrY8aBZAqucvQsIGCrYfE01W/s8goDTQdZODn0MCcoor12BTPVYNIR42/J/no
MNnRTDF0dJ4WG+inX9G87XGG6sFN3wDaQcCaexknkQZlFNF4KthxojzR2BgjmRVI
p3WKkLSNTt6DYQQ8eDetvKoDT0AjR/KCYm89tiE8GMhKYcaZl6dTazJxwOcp2CX9
YDW6+fQbsv8qp5v2ay03e88O/DSmcNRFoxy/KUGT9OwJgdN08IN8fTt6GG38yycT
D9tb9uObBRcll4PnucouadBcykGr6jAP0z8HklE266LH1dwYLOHQoDFdgAs0QGtq
nlySiKvToj6CYXonXoPOjZF3P/lxlkj5ViZ2enBxgxrPmyWl172cUSa6NTXOMO46
kPpxw50xa1gP5kkBVwIZ6XZuzl/5YRhB3BRP3g6yuJCbAwVBJvawYU7riC+6DEB9
zygVfm21vi9juUQXJ37zXVRBTtoFhFjuSxcAYxc63o181lWYShKQ3IiRYg+zTxnq
7DOhXa0ZNGvMgJJi0tH9Es3/S6TrGhyKh5gKY/o2XUjY0hCSsCSdP6jw6Mb9Ax1s
0LzByHAikxBKPt2OFeoUgwycI2xqow4iAfuFk071iP7n0nwC804cUHSkGxW67dBZ
Ve5Skkg1CV+oWQYxGmGZ
=py1V
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
New features:
- RDMA client backchannel from Chuck
- Support for NFSv4.2 file CLONE using the btrfs ioctl
Bugfixes + cleanups:
- Move socket data receive out of the bottom halves and into a
workqueue
- Refactor NFSv4 error handling so synchronous and asynchronous RPC
handles errors identically.
- Fix a panic when blocks or object layouts reads return a bad data
length
- Fix nfsroot so it can handle a 1024 byte long path.
- Fix bad usage of page offset in bl_read_pagelist
- Various NFSv4 callback cleanups+fixes
- Fix GETATTR bitmap verification
- Support hexadecimal number for sunrpc debug sysctl files"
* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
nfs: Fix GETATTR bitmap verification
nfs: Remove unused xdr page offsets in getacl/setacl arguments
fs/nfs: remove unnecessary new_valid_dev check
SUNRPC: fix variable type
NFS: Enable client side NFSv4.1 backchannel to use other transports
pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
SUNRPC: Remove the TCP-only restriction in bc_svc_process()
svcrdma: Add backward direction service for RPC/RDMA transport
xprtrdma: Handle incoming backward direction RPC calls
xprtrdma: Add support for sending backward direction RPC replies
xprtrdma: Pre-allocate Work Requests for backchannel
xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
SUNRPC: Abstract backchannel operations
xprtrdma: Saving IRQs no longer needed for rb_lock
xprtrdma: Remove reply tasklet
xprtrdma: Use workqueue to process RPC/RDMA replies
xprtrdma: Replace send and receive arrays
xprtrdma: Refactor reply handler error handling
...
Here is a list of patches we've accumulated for GFS2 for the current upstream
merge window. There are only six patches this time:
1. A cleanup patch from Andreas to remove the gl_spin #define in favor
of its value for the sake of clarity.
2. A fix from Andy Price to mark the inode dirty during fallocate.
3. A fix from Andy Price to set s_mode on mount failures to prevent
a stack trace.
4. A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
due to uninitialized storage.
5. A patch from me to protecting our freeing of the in-core directory
hash table to prevent double-free.
6. A fix for a page/block rounding problem that resulted in a metadata
coherency problem when the block size != page size.
I've got a lot more patches in various stages of review and testing,
but I'm afraid they'll have to wait until the next merge window. So
next time we're likely to have a lot more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWQMSyAAoJENeLYdPf93o7k+EH/inFFqkYxLCyXZngihTHdZvS
tYAwYxPJw6UgSrZ1dY6iwcmhy6YgT7a98RJdPA3Kj0SvJxQVBiJ5uc0VKpK0bj72
l7pVPkMEWCHs8u8RAIGfnik8y6IxOP35+EN0U/3ZLMG1Gc+Tmq9M8KLlnhfX980q
oaniaDJAUaSSW8RxD2AKabxjoJ0DKnE6MtDHsL/JWhp1j5co5BbbwOzBmBa2mLCI
RQ8YEvjqjtgm91g33pkxXJVMjAkFqLjRSVfomd5MSQWRUb+eGIpd3LFThHzVfm55
2f4j2kd2V4i7rTrh8Q3RdoYRoFgpXyxXQ3R2UYL59b7B2DvGTyAKyDxfU237ZXQ=
=yanT
-----END PGP SIGNATURE-----
Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"Here is a list of patches we've accumulated for GFS2 for the current
upstream merge window. There are only six patches this time:
1. A cleanup patch from Andreas to remove the gl_spin #define in favor
of its value for the sake of clarity.
2. A fix from Andy Price to mark the inode dirty during fallocate.
3. A fix from Andy Price to set s_mode on mount failures to prevent a
stack trace.
4 A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
due to uninitialized storage.
5. A patch from me to protecting our freeing of the in-core directory
hash table to prevent double-free.
6. A fix for a page/block rounding problem that resulted in a metadata
coherency problem when the block size != page size"
I've got a lot more patches in various stages of review and testing,
but I'm afraid they'll have to wait until the next merge window. So
next time we're likely to have a lot more"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Fix rgrp end rounding problem for bsize < page size
GFS2: Protect freeing directory hash table with i_lock spin_lock
gfs2: Remove gl_spin define
gfs2: Add missing else in trans_add_meta/data
GFS2: Set s_mode before parsing mount options
GFS2: fallocate: do not rely on file_update_time to mark the inode dirty
Pull ext2 fix from Jan Kara:
"Fix for DAX on ext2"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Add locking for DAX faults
Pull m68knommu/coldfire fix from Greg Ungerer:
"Only a single patch, fixes brk area setup problem in nommu
environments"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
fs/binfmt_elf_fdpic.c: fix brk area overlap with stack on NOMMU
The ELF binary loader in binfmt_elf.c requires an MMU, making it
impossible to use regular ELF binaries on NOMMU archs. However, the FDPIC
ELF loader in binfmt_elf_fdpic.c is fully capable as a loader for plain
ELF, which requires constant displacements between LOAD segments, since it
already supports FDPIC ELF files flagged as needing constant displacement.
This patch adjusts the FDPIC ELF loader to accept non-FDPIC ELF files on
NOMMU archs. They are treated identically to FDPIC ELF files with the
constant-displacement flag bit set, except for personality, which must
match the ABI of the program being loaded; the PER_LINUX_FDPIC personality
controls how the kernel interprets function pointers passed to sigaction.
Files that do not set a stack size requirement explicitly are given a
default stack size (matching the amount of committed stack the normal ELF
loader for MMU archs would give them) rather than being rejected; this is
necessary because plain ELF files generally do not declare stack
requirements in theit program headers.
Only ET_DYN (PIE) format ELF files are supported, since loading at a fixed
virtual address is not possible on NOMMU.
This patch was developed and tested on J2 (SH2-compatible) but should
be usable immediately on all archs where binfmt_elf_fdpic is
available. Moreover, by providing dummy definitions of the
elf_check_fdpic() and elf_check_const_displacement() macros for archs
which lack an FDPIC ABI, it should be possible to enable building of
binfmt_elf_fdpic on all other NOMMU archs and thereby give them ELF
binary support, but I have not yet tested this.
The motivation for using binfmt_elf_fdpic.c rather than adapting
binfmt_elf.c to NOMMU is that the former already has all the necessary
code to work properly on NOMMU and has already received widespread
real-world use and testing. I hope this is not controversial.
I'm not really happy with having to unset the FDPIC_FUNCPTRS
personality bit when loading non-FDPIC ELF. This bit should really
reset automatically on execve, since otherwise, executing non-ELF
binaries (e.g. bFLT) from an FDPIC process will leave the personality
in the wrong state and severely break signal handling. But that's a
separate, existing bug and I don't know the right place to fix it.
Signed-off-by: Rich Felker <dalias@libc.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Endo <oleg.endo@t-online.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() checks are not
needed. Remove them.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to
after the function's opening brace. Also #undef this macro at the end of
the function.
../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warning in fs/inode.c:
../fs/inode.c:1606: warning: No description found for parameter 'inode'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're consistently hitting deadlocks here with XFS on recent kernels.
After some digging through the crash files, it looks like everyone in
the system is waiting for XFS to reclaim memory.
Something like this:
PID: 2733434 TASK: ffff8808cd242800 CPU: 19 COMMAND: "java"
#0 [ffff880019c53588] __schedule at ffffffff818c4df2
#1 [ffff880019c535d8] schedule at ffffffff818c5517
#2 [ffff880019c535f8] _xfs_log_force_lsn at ffffffff81316348
#3 [ffff880019c53688] xfs_log_force_lsn at ffffffff813164fb
#4 [ffff880019c536b8] xfs_iunpin_wait at ffffffff8130835e
#5 [ffff880019c53728] xfs_reclaim_inode at ffffffff812fd453
#6 [ffff880019c53778] xfs_reclaim_inodes_ag at ffffffff812fd8c7
#7 [ffff880019c53928] xfs_reclaim_inodes_nr at ffffffff812fe433
#8 [ffff880019c53958] xfs_fs_free_cached_objects at ffffffff8130d3b9
#9 [ffff880019c53968] super_cache_scan at ffffffff811a6f73
#10 [ffff880019c539c8] shrink_slab at ffffffff811460e6
#11 [ffff880019c53aa8] shrink_zone at ffffffff8114a53f
#12 [ffff880019c53b48] do_try_to_free_pages at ffffffff8114a8ba
#13 [ffff880019c53be8] try_to_free_pages at ffffffff8114ad5a
#14 [ffff880019c53c78] __alloc_pages_nodemask at ffffffff8113e1b8
#15 [ffff880019c53d88] alloc_kmem_pages_node at ffffffff8113e671
#16 [ffff880019c53dd8] copy_process at ffffffff8104f781
#17 [ffff880019c53ec8] do_fork at ffffffff8105129c
#18 [ffff880019c53f38] sys_clone at ffffffff810515b6
#19 [ffff880019c53f48] stub_clone at ffffffff818c8e4d
xfs_log_force_lsn is waiting for logs to get cleaned, which is waiting
for IO, which is waiting for workers to complete the IO which is waiting
for worker threads that don't exist yet:
PID: 2752451 TASK: ffff880bd6bdda00 CPU: 37 COMMAND: "kworker/37:1"
#0 [ffff8808d20abbb0] __schedule at ffffffff818c4df2
#1 [ffff8808d20abc00] schedule at ffffffff818c5517
#2 [ffff8808d20abc20] schedule_timeout at ffffffff818c7c6c
#3 [ffff8808d20abcc0] wait_for_completion_killable at ffffffff818c6495
#4 [ffff8808d20abd30] kthread_create_on_node at ffffffff8106ec82
#5 [ffff8808d20abdf0] create_worker at ffffffff8106752f
#6 [ffff8808d20abe40] worker_thread at ffffffff810699be
#7 [ffff8808d20abec0] kthread at ffffffff8106ef59
#8 [ffff8808d20abf50] ret_from_fork at ffffffff818c8ac8
I think we should be using WQ_MEM_RECLAIM to make sure this thread
pool makes progress when we're not able to allocate new workers.
[dchinner: make all workqueues WQ_MEM_RECLAIM]
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>