linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00

Author	SHA1	Message	Date
Aneesh Kumar K.V	b271ec47bc	fs/9p: Update link count correctly on different file system operations Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	edd73cf544	fs/9p: Add drop_inode 9p callback We want to immediately drop the inode in non cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	e959b54901	fs/9p: Add direct IO support in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	fa6ea16160	fs/9p: Fix inode i_size update in file_write Only update inode i_size when we write towards end of file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:40 -05:00
Aneesh Kumar K.V	6b365604ca	fs/9p: set default readahead pages in cached mode We want to enable readahead in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	6b39f6d22f	fs/9p: Move writeback fid to v9fs_inode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	a78ce05d5d	fs/9p: Add v9fs_inode Switch to the fscache code to v9fs_inode. We will later use v9fs_inode in cache=loose mode to track the inode cache validity timeout. Ie if we find an inode in cache older that a specific jiffie range we will consider it stale Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	a12119087b	fs/9p: Don't set stat.st_blocks based on nrpages simple_getattr does set stat.st_blocks to a value derived from nrpages. That is not correct with 9p Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	5ffc0cb308	fs/9p: Add inode hashing We didn't add the inode to inode hash in 9p. We need to do that to get sync to work, otherwise __mark_inode_dirty will not add the inode to super block's dirty list. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:39 -05:00
Aneesh Kumar K.V	62d810b424	fs/9p: We need not writeback dirty pages during close Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	00ea2df43e	fs/9p: Implement syncfs call back for 9Pfs FIXME!! what about dotu ? Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	db5841d4a5	fs/9p: Mark file system with MS_SYNCHRONOUS only if it is not cached mode We should not mark file system synchronous if mounted cache=* option Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	a950a65264	fs/9p: Clarify cached dentry delete operation Update the comment to indicate that we don't want to cache negative dentries. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Aneesh Kumar K.V	7263cebed9	fs/9p: Add buffered write support for v9fs. We can now support writeable mmaps. Based on the original patch from Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	3cf387d780	fs/9p: Add fid to inode in cached mode The fid attached to inode will be opened O_RDWR mode and is used for dirty page writeback only. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	17311779ac	fs/9p: Add read write helper function We add read write helper function here which will be used later by the mmap patch Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	2efda7998b	fs/9p: [fscache] wait for page write in cached mode We need to call fscache_wait_on_page_write in launder_page for fscache Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:37 -05:00
Aneesh Kumar K.V	20656a49ef	fs/9p: increment inode->i_count in cached mode. We need to ihold even in cached mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Aneesh Kumar K.V	46848de024	fs/9p: set fs cache cookie in create path also We need to call v9fs_cache_inode_set_cookie in create path also Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Aneesh Kumar K.V	29236f4e18	fs/9p: set the cached file_operations struct during inode init With the old code we were not setting the file->f_op with cached file operations during creat. (format correction by jvrao@linux.vnet.ibm.com) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Venkateswararao Jujjuri (JV)	6752a1ebd1	[fs/9p] Make access=client default in 9p2000.L protocol Current code sets access=user as default for all protocol versions. This patch chagnes it to "client" only for dotl. User can always specify particular access mode with -o access= option. No change there. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	e782ef7109	[fs/9P] Add posixacl mount option The mount option access=client is overloaded as it assumes acl too. Adding posixacl option to enable POSIX ACLs makes it explicit and clear. Also it is convenient in the future to add other types of acls like richacls. Ideally, the access mode 'client' should be just like V9FS_ACCESS_USER except it underscores the location of access check. Traditional 9P protocol lets the server perform access checks but with this mode, all the access checks will be performed on the client itself. Server just follows the client's directive. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	9332685dff	[fs/9p] Ignore acl mount option when CONFIG_9P_FS_POSIX_ACL is not defined. If the kernel is not compiled with CONFIG_9P_FS_POSIX_ACL and the mount option is specified to enable ACLs current code fails the mount. This patch brings the behavior inline with other filesystems like ext3 by proceeding with the mount and log a warning to syslog. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	d344b0fb72	[fs/9p] Initialze cached acls both in cached/uncached mode. With create/mkdir/mknod in non cached mode we initialize the inode using v9fs_get_inode. v9fs_get_inode doesn't initialize the cache inode value to NULL. This is causing to trip on BUG_ON in v9fs_get_cached_acl. Fix is to initialize acls to NULL and not to leave them in ACL_NOT_CACHED state. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-03-15 09:57:33 -05:00
Venkateswararao Jujjuri (JV)	c61fa0d6d9	[fs/9p] Plug potential acl leak In v9fs_get_acl() if __v9fs_get_acl() gets only one of the dacl/pacl we are not releasing it. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2011-03-15 09:57:33 -05:00
Steven Whitehouse	7e32d02613	GFS2: Don't use _raw version of RCU dereference As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-03-15 08:58:17 +00:00
Al Viro	326be7b484	Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	65cfc67223	readlinkat(), fchownat() and fstatat() with empty relative pathnames For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	bcda76524c	Allow O_PATH for symlinks At that point we can't do almost nothing with them. They can be opened with O_PATH, we can manipulate such descriptors with dup(), etc. and we can see them in /proc//{fd,fdinfo}/. We can't (and won't be able to) follow /proc//fd/ symlinks for those; there's simply not enough information for pathname resolution to go on from such point - to resolve a symlink we need to know which directory does it live in. We will be able to do useful things with them after the next commit, though - readlinkat() and fchownat() will be possible to use with dfd being an O_PATH-opened symlink and empty relative pathname. Combined with open_by_handle() it'll give us a way to do realink-by-handle and lchown-by-handle without messing with more redundant syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	1abf0c718f	New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does NOT affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f2fa2ffc20	ext4: Copy fs UUID to superblock File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	03cb5f03dc	ext3: Copy fs UUID to superblock. File system UUID is made available to application via /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	93f1c20bc8	vfs: Export file system uuid via /proc/<pid>/mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V	f17b604207	fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	aae8a97d3e	fs: Don't allow to create hardlink for deleted file Add inode->i_nlink == 0 check in VFS. Some of the file systems do this internally. A followup patch will remove those instance. This is needed to ensure that with link by handle we don't allow to create hardlink of an unlinked file. The check also prevent a race between unlink and link Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	becfd1f375	vfs: Add open by file handle support [AV: duplicate of open() guts removed; file_open_root() used instead] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V	990d6c2d7a	vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:37 -04:00
Al Viro	f52e0c1130	New AT_... flag: AT_EMPTY_PATH For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 19:12:20 -04:00
Linus Torvalds	5f40d42094	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSROOT should default to "proto=udp" nfs4: remove duplicated #include NFSv4: nfs4_state_mark_reclaim_nograce() should be static NFSv4: Fix the setlk error handler NFSv4.1: Fix the handling of the SEQUENCE status bits NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses NFSv4.1 reclaim complete must wait for completion NFSv4: remove duplicate clientid in struct nfs_client NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY sunrpc: Propagate errors from xs_bind() through xs_create_sock() (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid nfs: fix compilation warning nfs: add kmalloc return value check in decode_and_add_ds SUNRPC: Remove resource leak in svc_rdma_send_error() nfs: close NFSv4 COMMIT vs. CLOSE race SUNRPC: Close a race in __rpc_wait_for_completion_task()	2011-03-14 11:19:50 -07:00
Timo Warns	1eafbfeb7b	Fix corrupted OSF partition table parsing The kernel automatically evaluates partition tables of storage devices. The code for evaluating OSF partitions contains a bug that leaks data from kernel heap memory to userspace for certain corrupted OSF partitions. In more detail: for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) { iterates from 0 to d_npartitions - 1, where d_npartitions is read from the partition table without validation and partition is a pointer to an array of at most 8 d_partitions. Add the proper and obvious validation. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org [ Changed the patch trivially to not repeat the whole le16_to_cpu() thing, and to use an explicit constant for the magic value '8' ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-14 10:14:28 -07:00
Maxim	6c474f7bc1	GFS2: Adding missing unlock_page() gfs2_write_begin() calls grab_cache_page_write_begin() that returns locked page. Correspondent error-handling path lacks for unlock_page() call: > out: > if (error == 0) > return 0; > > page_cache_release(page); The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin() failed for some reason. Reported-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V	5fe0c23788	exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	c8b91accfa	clean statfs-like syscalls up New helpers: user_statfs() and fd_statfs(), taking userland pathname and descriptor resp. and filling struct kstatfs. Syscalls of statfs family (native, compat and foreign - osf and hpux on alpha and parisc resp.) switched to those. Removes some boilerplate code, simplifies cleanup on errors... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	73d049a40f	open-style analog of vfs_path_lookup() new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Al Viro	5b6ca027d8	reduce vfs_path_lookup() to do_path_lookup() New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() and gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	5a18fff209	untangle do_lookup() That thing has devolved into rats nest of gotos; sane use of unlikely() gets rid of that horror and gives much more readable structure: * make a fast attempt to find a dentry; false negatives are OK. In RCU mode if everything went fine, we are done, otherwise just drop out of RCU. If we'd done (RCU) ->d_revalidate() and it had not refused outright (i.e. didn't give us -ECHILD), remember its result. * now we are not in RCU mode and hopefully have a dentry. If we do not, lock parent, do full d_lookup() and if that has not found anything, allocate and call ->lookup(). If we'd done that ->lookup(), remember that dentry is good and we don't need to revalidate it. * now we have a dentry. If it has ->d_revalidate() and we can't skip it, call it. * hopefully dentry is good; if not, either fail (in case of error) or try to invalidate it. If d_invalidate() has succeeded, drop it and retry everything as if original attempt had not found a dentry. * now we can finish it up - deal with mountpoint crossing and automount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	40b39136f0	path_openat: clean ELOOP handling a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	f374ed5fa8	do_last: kill a rudiment of old ->d_revalidate() workaround There used to be time when ->d_revalidate() couldn't return an error. So intents code had lookup_instantiate_filp() stash ERR_PTR(error) in nd->intent.open.filp and had it checked after lookup_hash(), to catch the otherwise silent failures. That had been introduced by commit `4af4c52f34`. These days ->d_revalidate() can and does propagate errors back to callers explicitly, so this check isn't needed anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	6c0d46c493	fold __open_namei_create() and open_will_truncate() into do_last() ... and clean up a bit more Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	ca344a894b	do_last: unify may_open() call and everyting after it We have a bunch of diverging codepaths in do_last(); some of them converge, but the case of having to create a new file duplicates large part of common tail of the rest and exits separately. Massage them so that they could be merged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:27 -04:00
Al Viro	9b44f1b392	move may_open() from __open_name_create() to do_last() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	0f9d1a10c3	expand finish_open() in its only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	5a202bcd75	sanitize pathname component hash calculation Lift it to lookup_one_len() and link_path_walk() resp. into the same place where we calculated default hash function of the same name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	6a96ba5441	kill __lookup_one_len() only one caller left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	fe2d35ff0d	switch non-create side of open() to use of do_last() Instead of path_lookupat() doing trailing symlink resolution, use the same scheme as on the O_CREAT side. Walk with LOOKUP_PARENT, then (in do_last()) look the final component up, then either open it or return error or, if it's a symlink, give the symlink back to path_openat() to be resolved there. The really messy complication here is RCU. We don't want to drop out of RCU mode before the final lookup, since we don't want to bounce parent directory ->d_count without a good reason. Result is _not_ pretty; later in the series we'll clean it up. For now we are roughly back where we'd been before the revert done by Nick's series - top-level logics of path_openat() is cleaned up, do_last() does actual opening, symlink resolution is done uniformly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	70e9b35711	get rid of nd->file Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	951361f954	get rid of the last LOOKUP_RCU dependencies in link_path_walk() New helper: terminate_walk(). An error has happened during pathname resolution and we either drop nd->path or terminate RCU, depending the mode we had been in. After that, nd is essentially empty. Switch link_path_walk() to using that for cleanup. Now the top-level logics in link_path_walk() is back to sanity. RCU dependencies are in the lower-level functions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:26 -04:00
Al Viro	a7472baba2	make nameidata_dentry_drop_rcu_maybe() always leave RCU mode Now we have do_follow_link() guaranteed to leave without dangling RCU and the next step will get LOOKUP_RCU logics completely out of link_path_walk(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	ef7562d528	make handle_dots() leave RCU mode on error Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	4455ca6223	clear RCU on all failure exits from link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	9856fa1b28	pull handling of . and .. into inlined helper getting LOOKUP_RCU checks out of link_path_walk()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	7bc055d1d5	kill out_dput: in link_path_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	13aab428a7	separate -ESTALE/-ECHILD retries in do_filp_open() from real work new helper: path_openat(). Does what do_filp_open() does, except that it tries only the walk mode (RCU/normal/force revalidation) it had been told to. Both create and non-create branches are using path_lookupat() now. Fixed the double audit_inode() in non-create branch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	47c805dc2d	switch do_filp_open() to struct open_flags take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	c3e380b0b3	Collect "operation mode" arguments of do_last() into a structure No point messing with passing shitloads of "operation mode" arguments to do_open() one by one, especially since they are not going to change during do_filp_open(). Collect them into a struct, fill it and pass to do_last() by reference. Make sure that lookup intent flags are correctly set and removed - we want them for do_last(), but they make no sense for __do_follow_link(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:25 -04:00
Al Viro	f1afe9efc8	clean up the failure exits after __do_follow_link() in do_filp_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	36f3b4f690	pull security_inode_follow_link() into __do_follow_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	086e183a64	pull dropping RCU on success of link_path_walk() into path_lookupat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	16c2cd7179	untangle the "need_reval_dot" mess instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	fe479a580d	merge component type recognition no need to do it in three places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	e41f7d4ee5	merge path_init and path_init_rcu Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	ee0827cd6b	sanitize path_walk() mess New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:24 -04:00
Al Viro	52094c8a06	take RCU-dependent stuff around exec_permission() into a new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Al Viro	c9c6cac0c2	kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Steven Whitehouse	c618e87a5f	GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>	2011-03-14 12:40:29 +00:00
Al Viro	c44ed965be	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, the compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix, so IMO it's OK for -final. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-13 16:29:07 -07:00
Al Viro	586ce098a2	compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-13 19:21:26 -04:00
Linus Torvalds	0e5b88cd99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling	2011-03-13 16:00:49 -07:00
Chris Mason	36e39c40b3	Btrfs: break out of shrink_delalloc earlier Josef had changed shrink_delalloc to exit after three shrink attempts, which wasn't quite enough because new writers could race in and steal free space. But it also fixed deadlocks and stalls as we tried to recover delalloc reservations. The code was tweaked to loop 1024 times, and would reset the counter any time a small amount of progress was made. This was too drastic, and with a lot of writers we can end up stuck in shrink_delalloc forever. The shrink_delalloc loop is fairly complex because the caller is looping too, and the caller will go ahead and force a transaction commit to make sure we reclaim space. This reworks things to exit shrink_delalloc when we've forced some writeback and the delalloc reservations have gone down. This means the writeback has not just started but has also finished at least some of the metadata changes required to reclaim delalloc space. If we've got this wrong, we're returning ENOSPC too early, which is a big improvement over the current behavior of hanging the machine. Test 224 in xfstests hammers on this nicely, and with 1000 writers trying to fill a 1GB drive we get our first ENOSPC at 93% full. The other writers are able to continue until we get 100%. This is a worst case test for btrfs because the 1000 writers are doing small IO, and the small FS size means we don't have a lot of room for metadata chunks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-12 07:08:42 -05:00
Chuck Lever	53d4737580	NFS: NFSROOT should default to "proto=udp" There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <bdowning@lavos.net> bisected to commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit `56463e50`. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. Tested-by: Brian Downing <bdowning@lavos.net> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> # 2.6.37 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:07 -05:00
Huang Weiyi	57df216bd8	nfs4: remove duplicated #include Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:37 -05:00
Trond Myklebust	f9feab1e18	NFSv4: nfs4_state_mark_reclaim_nograce() should be static There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Dave Chinner	d6a079e82e	GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 11:52:25 +00:00
Benjamin Marzinski	e4a7b7b0c9	GFS2: fix block allocation check for fallocate GFS2 fallocate wasn't properly checking if a blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:26:48 +00:00
Bob Peterson	fa1bbdea30	GFS2: Optimize glock multiple-dequeue code This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:24:54 +00:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jovi Zhang	43b7c3f051	nfs: fix compilation warning this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY\|O_CREAT\|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
Miao Xie	7e6b6465e6	btrfs: fix not enough reserved space btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
Daniel J Blueman	b4966b7770	btrfs: fix dip leak The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
J. Bruce Fields	d891eedbc3	fs/dcache: allow d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 05:18:54 -05:00
Marco Stornelli	1ca551c6ca	Check for immutable/append flag in fallocate path In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 04:22:15 -05:00

1 2 3 4 5 ...

21664 Commits