linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-12 05:24:12 +08:00

Author	SHA1	Message	Date
Trond Myklebust	34901f70d1	NFS: Writeback optimisation Schedule writes using WB_SYNC_NONE first, then come back for a second pass using WB_SYNC_ALL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-09 17:15:21 -04:00
Trond Myklebust	ed90ef51a3	NFS: Clean up NFS writeback flush code The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it to flush out the entire inode without sending a commit. We therefore replace nfs_sync_mapping_range with a more appropriate helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-09 17:15:18 -04:00
Trond Myklebust	f758c88519	NFS: Clean up nfs_writepages() Just call write_cache_pages directly instead of hacking the writeback control structure in order to find out if we were called from writepages() or directly from the VM. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-09 17:15:13 -04:00
Trond Myklebust	9cccef9505	NFS: Clean up write code... The addition of nfs_page_mkwrite means that We should no longer need to create requests inside nfs_writepage() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-09 17:15:11 -04:00
Trond Myklebust	1b3b4a1a2d	NFS: Fix a write request leak in nfs_invalidate_page() Ryusuke Konishi says: The recent truncate_complete_page() clears the dirty flag from a page before calling a_ops->invalidatepage(), ^^^^^^ static void truncate_complete_page(struct address_space mapping, struct page page) { ... cancel_dirty_page(page, PAGE_CACHE_SIZE); <--- Inserted here at kernel 2.6.20 if (PagePrivate(page)) do_invalidatepage(page, 0); ---> will call a_ops->invalidatepage() ... } and this is disturbing nfs_wb_page_priority() from calling nfs_writepage_locked() that is expected to handle the pending request (=nfs_page) associated with the page. int nfs_wb_page_priority(struct inode inode, struct page page, int how) { ... if (clear_page_dirty_for_io(page)) { ret = nfs_writepage_locked(page, &wbc); if (ret < 0) goto out; } ... } Since truncate_complete_page() will get rid of the page after a_ops->invalidatepage() returns, the request (=nfs_page) associated with the page becomes a garbage in nfs_inode->nfs_page_tree. ------------------------ Fix this by ensuring that nfs_wb_page_priority() recognises that it may also need to clear out non-dirty pages that have an nfs_page associated with them. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-09-01 10:14:54 -04:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Trond Myklebust	587142f85f	NFS: Replace NFS_I(inode)->req_lock with inode->i_lock There is no justification for keeping a special spinlock for the exclusive use of the NFS writeback code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	2aefa10431	NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	5c36968343	NFS cleanup: speed up nfs_scan_commit using radix tree tags Add a tag for requests that are waiting for a COMMIT Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	9fd367f0f3	NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKED Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	c03b402461	NFS: Convert struct nfs_page to use krefs Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	88be9f990f	NFS: Replace vfsmount and dentry in nfs_open_context with struct path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	44dd151d5c	NFS: Don't mark a written page as uptodate until it is on disk The write may fail, so we should not mark the page as uptodate until we are certain that the data has been accepted and written to disk by the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	7fe7f8487a	NFS: Avoid a deadlock situation on write When processes are allowed to attempt to lock a non-contiguous range of nfs write requests, it is possible for generic_writepages to 'wrap round' the address space, and call writepage() on a request that is already locked by the same process. We avoid the deadlock by checking if the page index is contiguous with the list of nfs write requests that is already held in our nfs_pageio_descriptor prior to attempting to lock a new request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-24 10:44:20 -04:00
Trond Myklebust	10afec9081	NFS: Fix some 'sparse' warnings... - fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared. Should it be static? - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared. Should it be static? - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not declared. Should it be static? - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:46 -04:00
Nate Diller	60945cb7c8	NFS: use zero_user_page Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:45 -04:00
Peter Zijlstra	277866a0e3	nfs: fix congestion control: use atomic_longs Change the atomic_t in struct nfs_server to atomic_long_t in anticipation of machines that can handle 8+TB of (4K) pages under writeback. However I suspect other things in NFS will start going bang by then. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Trond Myklebust	ca52fec152	NFS: Use pgoff_t in structures and functions that pass page cache offsets Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	724c439c20	NFS: Clean up nfs_sync_mapping_wait() It has no business touching wbc->pages_skipped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:08 -07:00
Trond Myklebust	8d5658c949	NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:07 -07:00
Trond Myklebust	c63c7b0513	NFS: Fix a race when doing NFS write coalescing Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize". In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:06 -07:00
Trond Myklebust	bcb71bba7e	NFS: Another cleanup of the read/write request coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	d8a5ad75cc	NFS: Cleanup the coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	91e59c368c	NFS: Don't wait for congestion in nfs_update_request() It is redundant, and will interfere with the call to balance_dirty_pages_ratelimited_nr in generic_file_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:03 -07:00
Trond Myklebust	d585158b60	NFS: Fix nfs_set_page_dirty() Be more careful about testing page->mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:02 -07:00
Trond Myklebust	2b82f190c8	NFS: Fix race in nfs_set_page_dirty Protect nfs_set_page_dirty() against races with nfs_inode_add_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:30 -07:00
Trond Myklebust	612c9384fd	NFS: Fix the 'desynchronized value of nfs_i.ncommit' error Redirtying a request that is already marked for commit will screw up the accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit. Ensure that all requests on the commit queue are labelled with the PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside nfs_page_mark_flush(). Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for atomicity reasons. Avoid dropping the spinlock until we're done marking the request in the radix tree and have added it to the ->dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Trond Myklebust	6d677e3504	NFS: Don't clear PG_writeback until after we've processed unstable writes Ensure that we don't release the PG_writeback lock until after the page has either been redirtied, or queued on the nfs_inode 'commit' list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Trond Myklebust	8e821cad12	NFS: clean up the unstable write code Get rid of the inlined #ifdefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Trond Myklebust	eb4cac10d9	NFS: Fix a list corruption problem We must remove the request from whatever list it is currently on before we can add it to the dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-15 16:48:11 -07:00
Trond Myklebust	5a6d41b32a	NFS: Ensure PG_writeback is cleared when writeback fails If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must ensure that the PG_writeback flag is cleared. Also ensure that we actually own the PG_writeback flag whenever we schedule a new writeback by making nfs_set_page_writeback() return the value of test_set_page_writeback(). The PG_writeback page flag ends up replacing the functionality of the PG_FLUSHING nfs_page flag, so we rip that out too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-14 21:46:48 -07:00
Peter Zijlstra	89a09141df	[PATCH] nfs: fix congestion control The current NFS client congestion logic is severly broken, it marks the backing device congested during each nfs_writepages() call but doesn't mirror this in nfs_writepage() which makes for deadlocks. Also it implements its own waitqueue. Replace this by a more regular congestion implementation that puts a cap on the number of active writeback pages and uses the bdi congestion waitqueue. Also always use an interruptible wait since it makes sense to be able to SIGKILL the process even for mounts without 'intr'. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Trond Myklebust	a301b77771	NFS: Don't use ClearPageUptodate() when writeback fails ClearPageUptodate() will just cause races here. What we really want to do is to invalidate the page cache. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-12 22:40:38 -08:00
Chuck Lever	a3f565b1e5	NFS: fix print format for tk_pid The tk_pid field is an unsigned short. The proper print format specifier for that type is %5u, not %4d. Also clean up some miscellaneous print formatting nits. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-03 15:35:09 -08:00
Trond Myklebust	7c85d9007d	NFS: Fixup some outdated comments... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-03 15:35:07 -08:00
Trond Myklebust	d30c8348a4	NFS: nfs_writepages() cleanup Strip out the call to nfs_commit_inode(), and allow that to be done by nfs_write_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-03 15:35:07 -08:00
Trond Myklebust	f40313ac39	NFS: Micro-optimisation for nfs_wb_page() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-03 15:35:07 -08:00
Trond Myklebust	02241bc47e	NFS: Ensure that ->writepage() uses flush_stable() when reclaiming pages Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-02-03 15:35:06 -08:00
Josef "Jeff" Sipek	01cce933d8	[PATCH] nfs: change uses of f_{dentry,vfsmnt} to use f_path Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the nfs client code. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:41 -08:00
Trond Myklebust	21b4e73692	Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus	2006-12-07 16:35:17 -05:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	e6b4f8da3a	[PATCH] slab: remove SLAB_NOFS SLAB_NOFS is an alias of GFP_NOFS. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:23 -08:00
Trond Myklebust	a18030445f	NFS: Clean up calls to mark_inode_dirty() part 2 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:42 -05:00
Trond Myklebust	3925675cb3	NFS: Fix up the dirty page accounting There is now no reason to account for the dirty pages in the NFS code, since the VM code will now do it for us via __set_page_dirty_nobuffers(), and set_page_writeback(). We still need to keep the accounting of stable writes, though. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:41 -05:00
Trond Myklebust	e507d9ebbb	NFS: Ensure the inode is marked as dirty if we break out of nfs_wb_all() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:40 -05:00
Trond Myklebust	61822ab5e3	NFS: Ensure we only call set_page_writeback() under the page lock Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:40 -05:00
Trond Myklebust	e261f51f25	NFS: Make nfs_updatepage() mark the page as dirty. This will ensure that we can call set_page_writeback() from within nfs_writepage(), which is always called with the page lock set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:39 -05:00
Trond Myklebust	4d770ccf42	NFS: Ensure that nfs_wb_page() calls writepage when necessary. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:39 -05:00
Trond Myklebust	1a54533ec8	NFS: Add nfs_set_page_dirty() We will want to allow nfs_writepage() to distinguish between pages that have been marked as dirty by the VM, and those that have been marked as dirty by nfs_updatepage(). In the former case, the entire page will want to be written out, and so any requests that were pending need to be flushed out first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:38 -05:00
Trond Myklebust	200baa2112	NFS: Remove nfs_writepage_sync() Maintaining two parallel ways of doing synchronous writes is rather pointless. This patch gets rid of the legacy nfs_writepage_sync(), and replaces it with the faster asynchronous writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:38 -05:00
Trond Myklebust	e21195a740	NFS: More cleanups of fs/nfs/write.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:37 -05:00
Trond Myklebust	87a4ce1608	NFS: Remove call to igrab() from nfs_writepage() We always ensure that the nfs_open_context holds a reference to the dentry, so the test in nfs_writepage() for whether or not the inode is referenced is redundant. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:37 -05:00
Trond Myklebust	49a70f2786	NFS: Cleanup: add common helper nfs_page_length() Clean up a lot of ad-hoc page length calculations in fs/nfs/write.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:36 -05:00
Trond Myklebust	277459d2e2	NFS: Store pointer to the nfs_page in page->private This will allow fast lookup of the nfs_page from the struct page instead of having to search the radix tree. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:36 -05:00
Trond Myklebust	1c75950b9a	NFS: cleanup of nfs_sync_inode_wait() Allow callers to directly pass it a struct writeback_control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:35 -05:00
Trond Myklebust	3f442547b7	NFS: Clean up nfs_scan_dirty() Pass down struct writeback control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:35 -05:00
Trond Myklebust	28c6925fce	NFS: Clean up nfs_flush_inode() Make it take a struct writepages argument, and rename to nfs_flush_mapping(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:34 -05:00
Frank Filz	a99b71c9c4	NFS: Remove use of the Big Kernel Lock around calls to rpc_execute. Remove use of the Big Kernel Lock around calls to rpc_execute. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:30 -05:00
Trond Myklebust	e8e058e830	NFS: Fix nfs_sync_inode_wait(FLUSH_INVALIDATE) Currently nfs_sync_inode_wait() will fail to loop correctly when we call nfs_sync_inode_wait with the FLUSH_INVALIDATE argument. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:28 -05:00
Trond Myklebust	8aca67f0ae	SUNRPC: Fix a potential race in rpc_wake_up_task() Use RCU to ensure that we can safely call rpc_finish_wakeup after we've called __rpc_do_wake_up_task. If not, there is a theoretical race, in which the rpc_task finishes executing, and gets freed first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-12-06 10:46:26 -05:00
Trond Myklebust	b6dff26a08	[PATCH] NFS: Fix oops in nfs_cancel_commit_list Fix two bugs: - nfs_inode_remove_request will call nfs_clear_request, so we cannot reference req->wb_page after it. Move the call to dec_zone_page_state so that it occurs while req->wb_page is still valid. - Calling nfs_clear_page_writeback is unnecessary since the radix tree tags will have been cleared by the call to nfs_inode_remove_request. Replace with a simple call to nfs_unlock_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:38 -07:00
Andrew Morton	3fcfab16c5	[PATCH] separate bdi congestion functions from queue congestion functions Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
David Howells	4cb50dc2ea	[PATCH] BLOCK: Remove no-longer necessary linux/mpage.h inclusions [try #6 ] Remove inclusions of linux/mpage.h that are no longer necessary due to the transfer of generic_writepages(). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:30 +02:00
Alexey Dobriyan	1a1d92c10d	[PATCH] Really ignore kmem_cache_destroy return value * Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Chuck Lever	f551e44ff1	NFS: add comments clarifying the use of nfs_post_op_update() Comments-only change to clarify a detail of the NFS protocol and how it is implemented in Linux. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:05 -04:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
David Howells	54ceac4515	NFS: Share NFS superblocks per-protocol per-server per-FSID The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:37 -04:00
Trond Myklebust	5c2d97cb31	NFS: Fix nfs_page use after free issues in fs/nfs/write.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-19 11:59:10 -04:00
Trond Myklebust	e9f7bee1df	[PATCH] NFS: large non-page-aligned direct I/O clobbers memory The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can allow data->npages to be one larger than rpages. This causes a page pointer to be written beyond the end of the pagevec in nfs_read_data (or nfs_write_data). Fix this by making nfs_(read\|write)_alloc() calculate the size of the pagevec array, and initialise data->npages. Also get rid of the redundant argument to nfs_commit_alloc(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-08 10:22:51 -07:00
Adrian Bunk	e4e20512cf	NFS: make 2 functions static nfs_writedata_free() and nfs_readdata_free() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)	2006-08-03 16:55:41 -04:00
Trond Myklebust	83715ad54f	NFS: Fix NFS page_state usage The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait() does not clear the nr_unstable page state counter for pages that are being released. Also fix a longstanding similar bug when nfs_commit_list() fails. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-07-05 13:17:12 -04:00
Linus Torvalds	22a3e233ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S	2006-06-30 15:39:30 -07:00
Christoph Lameter	fd39fc8561	[PATCH] zoned vm counters: conversion of nr_unstable to per zone counter Conversion of nr_unstable to a per zone counter We need to do some special modifications to the nfs code since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state. [akpm@osdl.org: bugfixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-30 11:25:36 -07:00
Christoph Lameter	b1e7a8fd85	[PATCH] zoned vm counters: conversion of nr_dirty to per zone counter This makes nr_dirty a per zone counter. Looping over all processors is avoided during writeback state determination. The counter aggregation for nr_dirty had to be undone in the NFS layer since we summed up the page counts from multiple zones. Someone more familiar with NFS should probably review what I have done. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-30 11:25:35 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
David Brownell	266bee8869	[PATCH] fix static linking of NFS Builds on ARM report link problems with common configurations like statically linked NFS (for nfsroot). The symptom is that __init section code references __exit section code; that won't work since the exit sections are discarded (since they can never be called). The best fix for these particular cases would be an "__init_or_exit" section annotation. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 14:07:19 -07:00
David Howells	f7b422b17e	NFS: Split fs/nfs/inode.c As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached patch splits it up into a number of files: () fs/nfs/inode.c Strictly inode specific functions. () fs/nfs/super.c Superblock management functions for NFS and NFS4, normal access, clones and referrals. The NFS4 superblock functions _could_ move out into a separate conditionally compiled file, but it's probably not worth it as there're so many common bits. () fs/nfs/namespace.c Some namespace-specific functions have been moved here. () fs/nfs/nfs4namespace.c NFS4-specific namespace functions (this could be merged into the previous file). This file is conditionally compiled. () fs/nfs/internal.h Inter-file declarations, plus a few simple utility functions moved from fs/nfs/inode.c. Additionally, all the in-.c-file externs have been moved here, and those files they were moved from now includes this file. For the most part, the functions have not been changed, only some multiplexor functions have changed significantly. I've also: () Added some extra banner comments above some functions. () Rearranged the function order within the files to be more logical and better grouped (IMO), though someone may prefer a different order. () Reduced the number of #ifdefs in .c files. (*) Added missing __init and __exit directives. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-06-09 09:34:33 -04:00
Trond Myklebust	d2ccddf042	NFS: Flesh out nfs_invalidate_page() In the case of a call to truncate_inode_pages(), we should really try to cancel any pending writes on the page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-06-09 09:34:14 -04:00
Chuck Lever	0d0b5cb36f	NFS: Optimize allocation of nfs_read/write_data structures Clean up use of page_array, and fix an off-by-one error noticed by Tom Talpey which causes kmalloc calls in cases where using the page_array is sufficient. Test plan: Normal client functional testing with r/wsize=32768. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-06-09 09:34:07 -04:00
Matthew Dobson	93d2341c75	[PATCH] mempool: use mempool_create_slab_pool() Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:00 -08:00
Trond Myklebust	c42de9dd67	NFS: Fix a race in nfs_sync_inode() Kudos to Neil Brown for spotting the problem: "in nfs_sync_inode, there is effectively the sequence: nfs_wait_on_requests nfs_flush_inode nfs_commit_inode This seems a bit racy to me as if the only requests are on the ->commit list, and nfs_commit_inode is called separately after nfs_wait_on_requests completes, and before nfs_commit_inode start (say: by nfs_write_inode) then none of these function will return >0, yet there will be some pending request that aren't waited for." The solution is to search for requests to wait upon, search for dirty requests, and search for uncommitted requests while holding the nfsi->req_lock The patch also cleans up nfs_sync_inode(), getting rid of the redundant FLUSH_WAIT flag. It turns out that we were always setting it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:51 -05:00
Trond Myklebust	7d46a49f51	NFS: Clean up nfs_flush_list() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:50 -05:00
Trond Myklebust	deb7d63826	NFS: Fix a race with PG_private and nfs_release_page() We don't need to set PG_private for readahead pages, since they never get unlocked while I/O is in progress. However there is a small race in nfs_readpage_release() whereby the page may be unlocked, and have PG_private set. Fix is to have PG_private set only for the case of writes... Also fix a bug in nfs_clear_page_writeback(): Don't attempt to clear the radix_tree tag if we've already deleted the radix tree entry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:50 -05:00
Trond Myklebust	3feb2d4939	NFS: Uninline nfs_writedata_(alloc\|free) and nfs_readdata_(alloc\|free) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:37 -05:00
Trond Myklebust	e17b1fc4b3	NFS: Make nfs_commit_alloc() extern We need to use nfs_commit_alloc() in fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:35 -05:00
Chuck Lever	462d5b3296	NFS: make direct write path generate write requests concurrently Duplicate infrastructure from direct read path that will allow write path to generate multiple write requests concurrently. This will enable us to add support for aio in this path. Temporarily we will lose the ability to do UNSTABLE writes followed by a COMMIT in the direct write path. However, all applications I am aware of that use NFS O_DIRECT currently write in relatively small chunks, so this should not be inconvenient in any way. Test plan: Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:32 -05:00
Trond Myklebust	788e7a89a0	NFS: Cleanup of NFS write code in preparation for asynchronous o_direct This patch inverts the callback hierarchy for NFS write calls. Instead of having the NFSv2/v3/v4-specific code set up the RPC callback ops, we allow the original caller to do so. This allows for more flexibility w.r.t. how to set up and tear down the nfs_write_data structure while still allowing the NFSv3/v4 code to perform error handling. The greater flexibility is needed by the asynchronous O_DIRECT code, which wants to be able to hold on to the original nfs_write_data structures after the WRITE RPC call has completed in order to be able to replay them if the COMMIT call determines that the server has rebooted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:27 -05:00
Chuck Lever	91d5b47023	NFS: add I/O performance counters Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:14 -05:00
Eric Sesterhenn	bd6475454c	NFS: kzalloc conversion in fs/nfs this converts fs/nfs to kzalloc() usage. compile tested with make allyesconfig Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:10 -05:00
Neil Brown	1dd594b21b	NFS: Fix buglet in fs/nfs/write.c I've been reading through fs/nfs/write.c trying to track down a bug that seems to be related to pages loosing a refcount and getting freed too early (you interested in detail??) and I spotted a little bug which the following patch should fix. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-03-20 13:44:04 -05:00
Trond Myklebust	70b9ecbdb9	NFS: Make stat() return updated mtimes after a write() The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:50 -05:00
Chuck Lever	40859d7ee6	NFS: support large reads and writes on the wire Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:49 -05:00
Trond Myklebust	963d8fe533	RPC: Clean up RPC task structure Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:39 -05:00
Trond Myklebust	abd3e641d5	NFS: Work correctly with single-page ->writepage() calls Ensure that we always initiate flushing of data before we exit a single-page ->writepage() call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-01-06 14:58:39 -05:00
Trond Myklebust	bb713d6d38	NFS: use set_page_writeback() in the appropriate places Ensure that we use set_page_writeback() in the appropriate places to help the VM in keeping its page radix_tree in sync. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-12-03 15:20:14 -05:00
Chuck Lever	0bbacc402e	NFS,SUNRPC,NLM: fix unused variable warnings when CONFIG_SYSCTL is disabled Fix some dprintk's so that NLM, NFS client, and RPC client compile cleanly if CONFIG_SYSCTL is disabled. Test plan: Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-11-04 15:39:48 -05:00
Trond Myklebust	d530838bfa	NFSv4: Fix problem with OPEN_DOWNGRADE RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for current openowner on the current file. Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file with O_WRONLY access mode. Fix the problem by replacing nfs4_find_state() with a modified version of nfs_find_open_context(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-11-04 15:33:38 -05:00
Trond Myklebust	0e574af1be	NFS: Cleanup initialisation of struct nfs_fattr Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-10-27 22:12:38 -04:00
Trond Myklebust	3da28eb1c6	[PATCH] NFS: Replace nfs_page insertion sort with a radix sort Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00

1 2 3 4

154 Commits