linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 08:44:21 +08:00

Author	SHA1	Message	Date
Jeff Mahoney	e00f730865	Btrfs: remove btrfs_init_path btrfs_init_path was initially used when the path objects were on the stack. Now all the work is done by btrfs_alloc_path and btrfs_init_path isn't required. This patch removes it, and just uses kmem_cache_zalloc to zero out the object. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 14:11:25 -05:00
Jeff Mahoney	7951f3cefb	Btrfs: balance_level checks !child after access The BUG_ON() is in the wrong spot. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 10:06:15 -05:00
Yan Zheng	b335b0034e	Btrfs: Avoid using __GFP_HIGHMEM with slab allocator btrfs_releasepage may call kmem_cache_alloc indirectly, and provide same GFP flags it gets to kmem_cache_alloc. So it's possible to use __GFP_HIGHMEM with the slab allocator. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2009-02-12 10:06:04 -05:00
Chris Mason	e1df36d2f1	Btrfs: don't clean old snapshots on sync(1) Cleaning old snapshots can make sync(1) somewhat slow, and some users and applications still use it in a global fsync kind of workload. This patch changes btrfs not to clean old snapshots during sync, which is safe from a FS consistency point of view. The major downside is that it makes it difficult to tell when old snapshots have been reaped and the space they were using has been reclaimed. A new ioctl will be added for this purpose instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:45:08 -05:00
Chris Mason	536ac8ae86	Btrfs: use larger metadata clusters in ssd mode Larger metadata clusters can significantly improve writeback performance on ssd drives with large erasure blocks. The larger clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle. On spinning media, lager metadata clusters end up spreading out the metadata more over time, which makes fsck slower, so we don't want this to be the default. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:41:38 -05:00
Chris Mason	b288052e17	Btrfs: process mount options on mount -o remount, Btrfs wasn't parsing any new mount options during remount, making it difficult to set mount options on a root drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-12 09:37:35 -05:00
Josef Bacik	eb09967089	Btrfs: make sure all pending extent operations are complete Theres a slight problem with finish_current_insert, if we set all to 1 and then go through and don't actually skip any of the extents on the pending list, we could exit right after we've added new extents. This is a problem because by inserting the new extents we could have gotten new COW's to happen and such, so we may have some pending updates to do or even more inserts to do after that. So this patch will only exit if we have never skipped any of the extents in the pending list, and we have no extents to insert, this will make sure that all of the pending work is truly done before we return. I've been running with this patch for a few days with all of my other testing and have not seen issues. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-12 09:27:38 -05:00
Chris Mason	284b066af4	Btrfs: don't use spin_is_contended Btrfs was using spin_is_contended to see if it should drop locks before doing extent allocations during btrfs_search_slot. The idea was to avoid expensive searches in the tree unless the lock was actually contended. But, spin_is_contended is specific to the ticket spinlocks on x86, so this is causing compile errors everywhere else. In practice, the contention could easily appear some time after we started doing the extent allocation, and it makes more sense to always drop the lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-09 16:22:03 -05:00
Chris Mason	42f15d77df	Btrfs: Make sure dir is non-null before doing S_ISGID checks The S_ISGID check in btrfs_new_inode caused an oops during subvol creation because sometimes the dir is null. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-06 11:35:57 -05:00
Chris Mason	806638bce9	Btrfs: Fix memory leak in cache_drop_leaf_ref The code wasn't doing a kfree on the sorted array Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-05 09:08:14 -05:00
Chris Mason	9b0d3ace33	Btrfs: don't return congestion in write_cache_pages as often On fast devices that go from congested to uncongested very quickly, pdflush is waiting too often in congestion_wait, and the FS is backing off to easily in write_cache_pages. For now, fix this on the btrfs side by only checking congestion after some bios have already gone down. Longer term a real fix is needed for pdflush, but that is a larger project. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:33:00 -05:00
Chris Mason	7b78c170dc	Btrfs: Only prep for btree deletion balances when nodes are mostly empty Whenever an item deletion is done, we need to balance all the nodes in the tree to make sure we don't end up with an empty node if a pointer is deleted. This balance prep happens from the root of the tree down so we can drop our locks as we go. reada_for_balance was triggering read-ahead on neighboring nodes even when no balancing was required. This adds an extra check to avoid calling balance_level() and avoid reada_for_balance() when a balance won't be required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:12:46 -05:00
Chris Mason	12f4daccfc	Btrfs: fix btrfs_unlock_up_safe to walk the entire path btrfs_unlock_up_safe would break out at the first NULL node entry or unlocked node it found in the path. Some of the callers have missing nodes at the lower levels of the path, so this commit fixes things to check all the nodes in the path before returning. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:42 -05:00
Chris Mason	4d081c41a4	Btrfs: change btrfs_del_leaf to drop locks earlier btrfs_del_leaf does two things. First it removes the pointer in the parent, and then it frees the block that has the leaf. It has the parent node locked for both operations. But, it only needs the parent locked while it is deleting the pointer. After that it can safely free the block without the parent locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:28 -05:00
Chris Mason	06d9a8d7c2	Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode btrfs_truncate_inode_items is setup to stop doing btree searches when it has finished removing the items for the inode. It used to detect the end of the inode by looking for an objectid that didn't match the one we were searching for. But, this would result in an extra search through the btree, which adds extra balancing and cow costs to the operation. This commit adds a check to see if we found the inode item, which means we can stop searching early. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:30:58 -05:00
Chris Mason	f03d9301f1	Btrfs: Don't try to compress pages past i_size The compression code had some checks to make sure we were only compressing bytes inside of i_size, but it wasn't catching every case. To make things worse, some incorrect math about the number of bytes remaining would make it try to compress more pages than the file really had. The fix used here is to fall back to the non-compression code in this case, which does all the proper cleanup of delalloc and other accounting. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:06 -05:00
Josef Bacik	811449496b	Btrfs: join the transaction in __btrfs_setxattr With selinux on we end up calling __btrfs_setxattr when we create an inode, which calls btrfs_start_transaction(). The problem is we've already called that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a wait_current_trans(). If btrfs-transaction has started committing it will wait for all handles to finish, while the other process is waiting for the transaction to commit. This is fixed by using btrfs_join_transaction, which won't wait for the transaction to commit. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-04 09:18:33 -05:00
Chris Ball	8c087b5183	Btrfs: Handle SGID bit when creating inodes Before this patch, new files/dirs would ignore the SGID bit on their parent directory and always be owned by the creating user's uid/gid. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:29:54 -05:00
Chris Mason	bd56b30205	Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks Every transaction in btrfs creates a new snapshot, and then schedules the snapshot from the last transaction for deletion. Snapshot deletion works by walking down the btree and dropping the reference counts on each btree block during the walk. If if a given leaf or node has a reference count greater than one, the reference count is decremented and the subtree pointed to by that node is ignored. If the reference count is one, walking continues down into that node or leaf, and the references of everything it points to are decremented. The old code would try to work in small pieces, walking down the tree until it found the lowest leaf or node to free and then returning. This was very friendly to the rest of the FS because it didn't have a huge impact on other operations. But it wouldn't always keep up with the rate that new commits added new snapshots for deletion, and it wasn't very optimal for the extent allocation tree because it wasn't finding leaves that were close together on disk and processing them at the same time. This changes things to walk down to a level 1 node and then process it in bulk. All the leaf pointers are sorted and the leaves are dropped in order based on their extent number. The extent allocation tree and commit code are now fast enough for this kind of bulk processing to work without slowing the rest of the FS down. Overall it does less IO and is better able to keep up with snapshot deletions under high load. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:27:02 -05:00
Chris Mason	b4ce94de9b	Btrfs: Change btree locking to use explicit blocking points Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop, most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first. The basic idea is: btrfs_tree_lock() returns with the spin lock held btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates. btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again. btrfs_tree_unlock() can be called on either blocking or spinning locks, it does the right thing based on the blocking bit. ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:25:08 -05:00
Chris Mason	c487685d7c	Btrfs: hash_lock is no longer needed Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:24:25 -05:00
Chris Mason	3935127c50	Btrfs: disable leak debugging checks in extent_io.c extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:24:05 -05:00
Chris Mason	b7a9f29fcf	Btrfs: sort references by byte number during btrfs_inc_ref When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before actual updates are done. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:23:45 -05:00
Chris Mason	b51912c91f	Btrfs: async threads should try harder to find work Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:23:24 -05:00
Jim Owens	0279b4cd86	Btrfs: selinux support Add call to LSM security initialization and save resulting security xattr for new inodes. Add xattr support to symlink inode ops. Set inode->i_op for existing special files. Signed-off-by: jim owens <jowens@hp.com>	2009-02-04 09:29:13 -05:00
Christian Hesse	bef62ef339	Btrfs: make btrfs acls selectable This patch adds a menu entry to kconfig to enable acls for btrfs. This allows you to enable FS_POSIX_ACL at kernel compile time. (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead) Signed-off-by: Christian Hesse <mail@earthworm.de> Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2009-02-04 09:28:28 -05:00
Chris Mason	a683705153	Btrfs: Catch missed bios in the async bio submission thread The async bio submission thread was missing some bios that were added after it had decided there was no work left to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:19:41 -05:00
Chris Mason	89f135d8b5	Btrfs: fix readdir on 32 bit machines After btrfs_readdir has gone through all the directory items, it sets the directory f_pos to the largest possible int. This way applications that mix readdir with creating new files don't end up in an endless loop finding the new directory items as they go. It was a workaround for a bug in git, but the assumption was that if git could make this looping mistake than it would be a common problem. The largest possible int chosen was INT_LIMIT(typeof(file->f_pos), and it is possible for that to be a larger number than 32 bit glibc expects to come out of readdir. This patches switches that to INT_LIMIT(off_t), which should keep applications happy on 32 and 64 bit machines. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-01-28 15:34:27 -05:00
Chris Mason	e4f722fa42	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable Fix fs/btrfs/super.c conflict around #includes	2009-01-28 20:29:43 -05:00
Jeff Layton	fa82a49127	nfsd: only set file_lock.fl_lmops in nfsd4_lockt if a stateowner is found nfsd4_lockt does a search for a lockstateowner when building the lock struct to test. If one is found, it'll set fl_owner to it. Regardless of whether that happens, it'll also set fl_lmops. Given that this lock is basically a "lightweight" lock that's just used for checking conflicts, setting fl_lmops is probably not appropriate for it. This behavior exposed a bug in DLM's GETLK implementation where it wasn't clearing out the fields in the file_lock before filling in conflicting lock info. While we were able to fix this in DLM, it still seems pointless and dangerous to set the fl_lmops this way when we may have a NULL lockstateowner. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@pig.fieldses.org>	2009-01-27 17:26:59 -05:00
J. Bruce Fields	b914152a6f	nfsd: fix cred leak on every rpc Since override_creds() took its own reference on new, we need to release our own reference. (Note the put_cred on the return value puts the old value of current->creds, not the new passed-in value). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-27 17:26:59 -05:00
J. Bruce Fields	bf935a7881	nfsd: fix null dereference on error path We're forgetting to check the return value from groups_alloc(). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-27 17:26:58 -05:00
Linus Torvalds	a90e8a75fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: initialize file_lock struct in GETLK before copying conflicting lock dlm: fix plock notify callback to lockd	2009-01-26 10:42:05 -08:00
Linus Torvalds	cc597bc3d3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: ocfs2: Remove ocfs2_dquot_initialize() and ocfs2_dquot_drop() quota: Improve locking	2009-01-26 10:41:00 -08:00
Linus Torvalds	ed80386295	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: klist.c: bit 0 in pointer can't be used as flag debugfs: introduce stub for debugfs_create_size_t() when DEBUG_FS=n sysfs: fix problems with binary files PNP: fix broken pnp lowercasing for acpi module aliases driver core: Convert '/' to '!' in dev_set_name()	2009-01-26 10:40:28 -08:00
Linus Torvalds	a1c70a756f	Merge branch 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc * 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc: (36 commits) fs/Kconfig: move 9p out fs/Kconfig: move afs out fs/Kconfig: move coda out fs/Kconfig: move the rest of ncpfs out fs/Kconfig: move smbfs out fs/Kconfig: move sunrpc out fs/Kconfig: move nfsd out fs/Kconfig: move nfs out fs/Kconfig: move ufs out fs/Kconfig: move sysv out fs/Kconfig: move romfs out fs/Kconfig: move qnx4 out fs/Kconfig: move hpfs out fs/Kconfig: move omfs out fs/Kconfig: move minix out fs/Kconfig: move vxfs out fs/Kconfig: move squashfs out fs/Kconfig: move cramfs out fs/Kconfig: move efs out fs/Kconfig: move bfs out ...	2009-01-26 10:08:50 -08:00
Vegard Nossum	3632dee2f8	inotify: clean up inotify_read and fix locking problems If userspace supplies an invalid pointer to a read() of an inotify instance, the inotify device's event list mutex is unlocked twice. This causes an unbalance which effectively leaves the data structure unprotected, and we can trigger oopses by accessing the inotify instance from different tasks concurrently. The best fix (contributed largely by Linus) is a total rewrite of the function in question: On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote: > The thing to notice is that: > > - locking is done in just one place, and there is no question about it > not having an unlock. > > - that whole double-while(1)-loop thing is gone. > > - use multiple functions to make nesting and error handling sane > > - do error testing after doing the things you always need to do, ie do > this: > > mutex_lock(..) > ret = function_call(); > mutex_unlock(..) > > .. test ret here .. > > instead of doing conditional exits with unlocking or freeing. > > So if the code is written in this way, it may still be buggy, but at least > it's not buggy because of subtle "forgot to unlock" or "forgot to free" > issues. > > This _always_ unlocks if it locked, and it always frees if it got a > non-error kevent. Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Robert Love <rlove@google.com> Cc: <stable@kernel.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-26 10:08:05 -08:00
Linus Torvalds	2d07d4d1bb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix poll notify fuse: destroy bdi on umount fuse: fuse_fill_super error handling cleanup fuse: fix missing fput on error fuse: fix NULL deref in fuse_file_alloc()	2009-01-26 09:49:22 -08:00
Miklos Szeredi	f6d47a1761	fuse: fix poll notify Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup(). This is not a big issue because fuse_notify_poll_wakeup() should be atomic, but it's cleaner this way, and later uses of notification will need to be able to finish the copying before performing some actions. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-01-26 15:00:59 +01:00
Miklos Szeredi	26c3679101	fuse: destroy bdi on umount If a fuse filesystem is unmounted but the device file descriptor remains open and a new mount reuses the old device number, then the mount fails with EEXIST and the following warning is printed in the kernel log: WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d() sysfs: duplicate filename '0:15' can not be created The cause is that the bdi belonging to the fuse filesystem was destoryed only after the device file was released. Fix this by calling bdi_destroy() from fuse_put_super() instead. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-01-26 15:00:59 +01:00
Miklos Szeredi	c2b8f00690	fuse: fuse_fill_super error handling cleanup Clean up error handling for the whole of fuse_fill_super() function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-01-26 15:00:58 +01:00
Miklos Szeredi	3ddf1e7f57	fuse: fix missing fput on error Fix the leaking file reference if allocation or initialization of fuse_conn failed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-01-26 15:00:58 +01:00
Dan Carpenter	bb875b38dc	fuse: fix NULL deref in fuse_file_alloc() ff is set to NULL and then dereferenced on line 65. Compile tested only. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org	2009-01-26 15:00:58 +01:00
Chris Mason	a717531942	Btrfs: do less aggressive btree readahead Just before reading a leaf, btrfs scans the node for blocks that are close by and reads them too. It tries to build up a large window of IO looking for blocks that are within a max distance from the top and bottom of the IO window. This patch changes things to just look for blocks within 64k of the target block. It will trigger less IO and make for lower latencies on the read size. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-01-22 09:23:10 -05:00
Alexey Dobriyan	0fcb440889	fs/Kconfig: move 9p out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:01 +03:00
Alexey Dobriyan	b2480c7fbf	fs/Kconfig: move afs out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:01 +03:00
Alexey Dobriyan	33a1a6fedf	fs/Kconfig: move coda out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:01 +03:00
Alexey Dobriyan	9d7d6447ef	fs/Kconfig: move the rest of ncpfs out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:01 +03:00
Alexey Dobriyan	213a41d404	fs/Kconfig: move smbfs out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:01 +03:00
Alexey Dobriyan	9098c24f35	fs/Kconfig: move sunrpc out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:16:00 +03:00

1 2 3 4 5 ...

12653 Commits