linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-25 05:34:00 +08:00

Author	SHA1	Message	Date
Al Viro	178ea73521	kill check_acl callback of generic_permission() its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:43:16 -04:00
Mark Fasheh	1748f843a0	btrfs: Don't BUG_ON alloc_path errors in btrfs_read_locked_inode btrfs_iget() also needed an update so that errors from btrfs_locked_inode() are caught and bubbled back up. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:45 -07:00
Mark Fasheh	0eb0e19cde	btrfs: Don't BUG_ON alloc_path errors in btrfs_truncate_inode_items I moved the path allocation up a few lines to the top of the function so that we couldn't get into the state where we've dropped delayed items and the extent cache but fail due to -ENOMEM. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:45 -07:00
Mark Fasheh	d8926bb3ba	btrfs: don't BUG_ON btrfs_alloc_path() errors This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation failure. All the sites that are fixed in this patch were checked by me to be fairly trivial to fix because of at least one of two criteria: - Callers of the function catch errors from it already so bubbling the error up will be handled. - Callers of the function might BUG_ON any nonzero return code in which case there is no behavior changed (but we still got to remove a BUG_ON) The following functions were updated: btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group, btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written, btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink, insert_reserved_file_extent, and run_delalloc_nocow Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-14 14:14:44 -07:00
Linus Torvalds	1acc9309eb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix oops when doing space balance Btrfs: don't panic if we get an error while balancing V2 btrfs: add missing options displayed in mount output	2011-07-08 23:25:45 -07:00
Miao Xie	149e2d76b4	btrfs: fix oops when doing space balance We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:51:53 -04:00
Linus Torvalds	af4087e0e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes	2011-06-27 13:32:14 -07:00
Miao Xie	2f7e33d432	btrfs: fix inconsonant inode information When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-27 11:34:27 -04:00
Chris Mason	1973f0faeb	Btrfs: make sure to record the transid in new inodes When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-24 13:13:29 -04:00
Linus Torvalds	90a800de0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: avoid delayed metadata items during commits btrfs: fix uninitialized return value btrfs: fix wrong reservation when doing delayed inode operations btrfs: Remove unused sysfs code btrfs: fix dereference of ERR_PTR value Btrfs: fix relocation races Btrfs: set no_trans_join after trying to expand the transaction Btrfs: protect the pending_snapshots list with trans_lock Btrfs: fix path leakage on subvol deletion Btrfs: drop the delalloc_bytes check in shrink_delalloc Btrfs: check the return value from set_anon_super	2011-06-20 08:58:53 -07:00
Josef Bacik	71d7aed014	Btrfs: fix path leakage on subvol deletion The delayed ref patch accidently removed the btrfs_free_path in btrfs_unlink_subvol, this puts it back and means we don't leak a path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-15 13:24:45 -04:00
Linus Torvalds	3c25fa740e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: use join_transaction in btrfs_evict_inode() Btrfs - use %pU to print fsid Btrfs: fix extent state leak on failed nodatasum reads btrfs: fix unlocked access of delalloc_inodes Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline Btrfs: clear current->journal_info on async transaction commit Btrfs: make sure to recheck for bitmaps in clusters btrfs: remove unneeded includes from scrub.c btrfs: reinitialize scrub workers btrfs: scrub: errors in tree enumeration Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: unlock the trans lock properly Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: fix duplicate checking logic Btrfs: fix the allocator loop logic Btrfs: fix bitmap regression Btrfs: don't commit the transaction if we dont have enough pinned bytes Btrfs: noinline the cluster searching functions Btrfs: cache bitmaps when searching for a cluster	2011-06-12 11:06:36 -07:00
Li Zefan	30b4caf5d7	Btrfs: use join_transaction in btrfs_evict_inode() The WARN_ON() in start_transaction() was triggered while balancing. The cause is btrfs_relocate_chunk() started a transaction and then called iput() on the inode that stores free space cache, and iput() called btrfs_start_transaction() again. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-11 08:31:55 -04:00
Jan Schmidt	08d2f347e8	Btrfs: fix extent state leak on failed nodatasum reads When encountering an EIO while reading from a nodatasum extent, we insert an error record into the inode's failure tree. btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd better clear the failure tree in that case, otherwise the kernel complains about BUG extent_state: Objects remaining on kmem_cache_close() on rmmod. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 19:00:53 -04:00
Linus Torvalds	e6ece70732	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) btrfs: fix uninitialized variable warning btrfs: add helper for fs_info->closing Btrfs: add mount -o inode_cache btrfs: scrub: add explicit plugging btrfs: use btrfs_ino to access inode number Btrfs: don't save the inode cache if we are deleting this root btrfs: false BUG_ON when degraded Btrfs: don't save the inode cache in non-FS roots Btrfs: make sure we don't overflow the free space cache crc page Btrfs: fix uninit variable in the delayed inode code btrfs: scrub: don't reuse bios and pages Btrfs: leave spinning on lookup and map the leaf Btrfs: check for duplicate entries in the free space cache Btrfs: don't try to allocate from a block group that doesn't have enough space Btrfs: don't always do readahead Btrfs: try not to sleep as much when doing slow caching Btrfs: kill BTRFS_I(inode)->block_group Btrfs: don't look at the extent buffer level 3 times in a row Btrfs: map the node block when looking for readahead targets Btrfs: set range_start to the right start in count_range_bits ...	2011-06-05 06:17:23 +09:00
David Sterba	7841cb2898	btrfs: add helper for fs_info->closing wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-06-04 08:11:22 -04:00
Linus Torvalds	36947a7682	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits) Cache xattr security drop check for write v2 fs: block_page_mkwrite should wait for writeback to finish mm: Wait for writeback when grabbing pages to begin a write configfs: remove unnecessary dentry_unhash on rmdir, dir rename fat: remove unnecessary dentry_unhash on rmdir, dir rename hpfs: remove unnecessary dentry_unhash on rmdir, dir rename minix: remove unnecessary dentry_unhash on rmdir, dir rename fuse: remove unnecessary dentry_unhash on rmdir, dir rename coda: remove unnecessary dentry_unhash on rmdir, dir rename afs: remove unnecessary dentry_unhash on rmdir, dir rename affs: remove unnecessary dentry_unhash on rmdir, dir rename 9p: remove unnecessary dentry_unhash on rmdir, dir rename ncpfs: fix rename over directory with dangling references ncpfs: document dentry_unhash usage ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename hostfs: remove unnecessary dentry_unhash on rmdir, dir rename hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename hfs: remove unnecessary dentry_unhash on rmdir, dir rename omfs: remove unnecessary dentry_unhash on rmdir, dir rneame udf: remove unnecessary dentry_unhash from rmdir, dir rename ...	2011-05-28 13:03:41 -07:00
Chris Mason	ff5714cca9	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-28 07:00:39 -04:00
Christoph Hellwig	aa38572954	fs: pass exact type of data dirties to ->dirty_inode Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-05-27 07:04:40 -04:00
Chris Mason	4cb5300bc8	Btrfs: add mount -o auto_defrag This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:15 -04:00
Chris Mason	d6c0cb379c	Merge branch 'cleanups_and_fixes' into inode_numbers Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 14:37:47 -04:00
Julia Lawall	b083916638	fs/btrfs: Add missing btrfs_free_path Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) $return <+...x...+>; \\| return@p2...; $ } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Tsutomu Itoh	1cd307990d	Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item Currently, btrfs_truncate_item and btrfs_extend_item returns only 0. So, the check by BUG_ON in the caller is unnecessary. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Sergei Trofimovich	27160b6b5a	btrfs: fix typo 'testeing' -> 'testing' Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:32 -04:00
Josef Bacik	d90c732122	Btrfs: leave spinning on lookup and map the leaf On lookup we only want to read the inode item, so leave the path spinning. Also we're just wholesale reading the leaf off, so map the leaf so we don't do a bunch of kmap/kunmaps. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:17 -04:00
Josef Bacik	026fd31782	Btrfs: don't always do readahead Our readahead is sort of sloppy, and really isn't always needed. For example if ls is doing a stating ls (which is the default) it's going to stat in non-disk order, so if say you have a directory with a stupid amount of files, readahead is going to do nothing but waste time in the case of doing the stat. Taking the unconditional readahead out made my test go from 57 minutes to 36 minutes. This means that everywhere we do loop through the tree we want to make sure we do set path->reada properly, so I went through and found all of the places where we loop through the path and set reada to 1. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:14 -04:00
Josef Bacik	d82a6f1d7e	Btrfs: kill BTRFS_I(inode)->block_group Originally this was going to be used as a way to give hints to the allocator, but frankly we can get much better hints elsewhere and it's not even used at all for anything usefull. In addition to be completely useless, when we initialize an inode we try and find a freeish block group to set as the inodes block group, and with a completely full 40gb fs this takes _forever_, so I imagine with say 1tb fs this is just unbearable. So just axe the thing altoghether, we don't need it and it saves us 8 bytes in the inode and saves us 500 microseconds per inode lookup in my testcase. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:12 -04:00
Josef Bacik	fcb80c2aff	Btrfs: fix how we do space reservation for truncate The ceph guys keep running into problems where we have space reserved in our orphan block rsv when freeing it up. This is because they tend to do snapshots alot, so their truncates tend to use a bunch of space, so when we go to do things like update the inode we have to steal reservation space in order to make the reservation happen. This happens because truncate can use as much space as it freaking feels like, but we still have to hold space for removing the orphan item and updating the inode, which will definitely always happen. So in order to fix this we need to split all of the reservation stuf up. So with this patch we have 1) The orphan block reserve which only holds the space for deleting our orphan item when everything is over. 2) The truncate block reserve which gets allocated and used specifically for the space that the truncate will use on a per truncate basis. 3) The transaction will always have 1 item's worth of data reserved so we can update the inode normally. Hopefully this will make the ceph problem go away. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:08 -04:00
Josef Bacik	7a7eaa40a3	Btrfs: take away the num_items argument from btrfs_join_transaction I keep forgetting that btrfs_join_transaction() just ignores the num_items argument, which leads me to sending pointless patches and looking stupid :). So just kill the num_items argument from btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither of them use it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:00:56 -04:00
Josef Bacik	74b2107543	Btrfs: make sure to use the delalloc reserve when filling delalloc In the prealloc filling code and compressed code we don't set trans->block_rsv to the delalloc block reserve properly, which is going to make us use metadata from the wrong pool, this patch fixes that. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:00:56 -04:00
Chris Mason	712673339a	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Conflicts: fs/btrfs/Makefile fs/btrfs/ctree.h fs/btrfs/volumes.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 06:30:52 -04:00
Chris Mason	945d8962ce	Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 12:33:42 -04:00
Chris Mason	dcc6d07322	Merge branch 'delayed_inode' into inode_numbers Conflicts: fs/btrfs/inode.c fs/btrfs/ioctl.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 07:07:01 -04:00
Miao Xie	16cdcec736	btrfs: implement delayed inode items operation Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:30:56 -04:00
Chris Mason	0965537308	Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers Conflicts: fs/btrfs/free-space-cache.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:27:38 -04:00
David Sterba	7a36ddec10	btrfs: use printk_ratelimited instead of printk_ratelimit As per printk_ratelimit comment, it should not be used. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-12 18:08:38 +02:00
Arne Jansen	a2de733c78	btrfs: scrub This adds an initial implementation for scrub. It works quite straightforward. The usermode issues an ioctl for each device in the fs. For each device, it enumerates the allocated device chunks. For each chunk, the contained extents are enumerated and the data checksums fetched. The extents are read sequentially and the checksums verified. If an error occurs (checksum or EIO), a good copy is searched for. If one is found, the bad copy will be rewritten. All enumerations happen from the commit roots. During a transaction commit, the scrubs get paused and afterwards continue from the new roots. This commit is based on the series originally posted to linux-btrfs with some improvements that resulted from comments from David Sterba, Ilya Dryomov and Jan Schmidt. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:45:20 +02:00
David Sterba	182608c829	btrfs: remove old unused commented out code Remove code which has been #if0-ed out for a very long time and does not seem to be related to current codebase anymore. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-06 12:34:10 +02:00
David Sterba	f2a97a9dbd	btrfs: remove all unused functions Remove static and global declarations and/or definitions. Reduces size of btrfs.ko by ~3.4kB. text data bss dec hex filename 402081 7464 200 409745 64091 btrfs.ko.base 398620 7144 200 405964 631cc btrfs.ko.remove-all Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-06 12:34:03 +02:00
David Sterba	b3b4aa74b5	btrfs: drop unused parameter from btrfs_release_path parameter tree root it's not used since commit `5f39d397df` ("Btrfs: Create extent_buffer interface for large blocksizes") Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:22 +02:00
David Sterba	172ddd60a6	btrfs: drop gfp parameter from alloc_extent_map pass GFP_NOFS directly to kmem_cache_alloc Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:21 +02:00
David Sterba	a8067e022a	btrfs: drop unused parameter from extent_map_tree_init the GFP flags are not stored anywhere and all allocations are done via alloc_extent_map(GFP_NOFS). Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:21 +02:00
David Sterba	f993c883ad	btrfs: drop unused argument from extent_io_tree_init all callers pass GFP_NOFS, but the GFP mask argument is not used in the function; GFP_ATOMIC is passed to radix tree initialization and it's the only correct one, since we're using the preload/insert mechanism of radix tree. Let's drop the gfp mask from btrfs function, this will not change behaviour. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:21 +02:00
David Sterba	c704005d88	btrfs: unify checking of IS_ERR and null use IS_ERR_OR_NULL when possible, done by this coccinelle script: @ match @ identifier id; @@ ( - BUG_ON(IS_ERR(id) \|\| !id); + BUG_ON(IS_ERR_OR_NULL(id)); \| - IS_ERR(id) \|\| !id + IS_ERR_OR_NULL(id) \| - !id \|\| IS_ERR(id) + IS_ERR_OR_NULL(id) ) Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:20 +02:00
David Sterba	306e16ce13	btrfs: rename variables clashing with global function names reported by gcc -Wshadow: page_index, page_offset, new_inode, dev_name Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:19 +02:00
Linus Torvalds	019793b755	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: cleanup error handling in inode.c Btrfs: put the right bio if we have an error Btrfs: free bitmaps properly when evicting the cache Btrfs: Free free_space item properly in btrfs_trim_block_group() btrfs: add missing spin_unlock to a rare exit path Btrfs: check return value of kmalloc() btrfs: fix wrong allocating flag when reading page Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()	2011-04-26 08:26:58 -07:00
Tsutomu Itoh	7cf96da3ec	Btrfs: cleanup error handling in inode.c The error processing of several places is changed like setting the error number only at the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-25 19:43:53 -04:00
Josef Bacik	64728bbbf8	Btrfs: put the right bio if we have an error In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put the orig_bio, not bio. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-25 19:43:52 -04:00
Tsutomu Itoh	8d413713ca	Btrfs: check return value of kmalloc() The check on the return value of kmalloc() is added to some places. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-25 19:43:52 -04:00
Li Zefan	82d5902d9c	Btrfs: Support reading/writing on disk free ino cache This is similar to block group caching. We dedicate a special inode in fs tree to save free ino cache. At the very first time we create/delete a file after mount, the free ino cache will be loaded from disk into memory. When the fs tree is commited, the cache will be written back to disk. To keep compatibility, we check the root generation against the generation of the special inode when loading the cache, so the loading will fail if the btrfs filesystem was mounted in an older kernel before. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-25 16:46:11 +08:00
Li Zefan	33345d0152	Btrfs: Always use 64bit inode number There's a potential problem in 32bit system when we exhaust 32bit inode numbers and start to allocate big inode numbers, because btrfs uses inode->i_ino in many places. So here we always use BTRFS_I(inode)->location.objectid, which is an u64 variable. There are 2 exceptions that BTRFS_I(inode)->location.objectid != inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2), and inode->i_ino will be used in those cases. Another reason to make this change is I'm going to use a special inode to save free ino cache, and the inode number must be > (u64)-256. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-25 16:46:09 +08:00
Li Zefan	581bb05094	Btrfs: Cache free inode numbers in memory Currently btrfs stores the highest objectid of the fs tree, and it always returns (highest+1) inode number when we create a file, so inode numbers won't be reclaimed when we delete files, so we'll run out of inode numbers as we keep create/delete files in 32bits machines. This fixes it, and it works similarly to how we cache free space in block cgroups. We start a kernel thread to read the file tree. By scanning inode items, we know which chunks of inode numbers are free, and we cache them in an rb-tree. Because we are searching the commit root, we have to carefully handle the cross-transaction case. The rb-tree is a hybrid extent+bitmap tree, so if we have too many small chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram of extents, and a bitmap will be used if we exceed this threshold. The extents threshold is adjusted in runtime. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-25 16:46:04 +08:00
Linus Torvalds	adff377bb1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: fix free space cache leak Btrfs: avoid taking the chunk_mutex in do_chunk_alloc Btrfs end_bio_extent_readpage should look for locked bits Btrfs: don't force chunk allocation in find_free_extent Btrfs: Check validity before setting an acl Btrfs: Fix incorrect inode nlink in btrfs_link() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir() Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr() Btrfs: make uncache_state unconditional btrfs: using cached extent_state in set/unlock combinations Btrfs: avoid taking the trans_mutex in btrfs_end_transaction Btrfs: fix subvolume mount by name problem when default mount subvolume is set fix user annotation in ioctl.c Btrfs: check for duplicate iov_base's when doing dio reads btrfs: properly handle overlapping areas in memmove_extent_buffer Btrfs: fix memory leaks in btrfs_new_inode() Btrfs: check for duplicate iov_base's when doing dio reads Btrfs: reuse the extent_map we found when calling btrfs_get_extent Btrfs: do not use async submit for small DIO io's Btrfs: don't split dio bios if we don't have to ...	2011-04-18 12:24:05 -07:00
Miao Xie	3153495d8e	Btrfs: Fix incorrect inode nlink in btrfs_link() Link count of the inode is not decreased if btrfs_set_inode_index() fails. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Singed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-13 14:25:32 +08:00
Li Zefan	b9e03af0bc	Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir() btrfs_next_leaf() can return -errno, and we should propagate it to userspace. This also simplifies how we walk the btree path. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-04-13 14:25:31 +08:00
Chris Mason	874d0d2633	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus	2011-04-11 20:46:03 -04:00
Arne Jansen	507903b818	btrfs: using cached extent_state in set/unlock combinations In several places the sequence (set_extent_uptodate, unlock_extent) is used. This leads to a duplicate lookup of the extent state. This patch lets set_extent_uptodate return a cached extent_state which can be passed to unlock_extent_cached. The occurences of the above sequences are updated to use the cache. Only end_bio_extent_readpage is updated that it first gets a cached state to pass it to the readpage_end_io_hook as the prototype requested and is later on being used for set/unlock. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-11 20:45:36 -04:00
Josef Bacik	a1b75f7d96	Btrfs: check for duplicate iov_base's when doing dio reads Apparently it is ok to submit a read to an IDE device with the same target page for different offsets. This is what Windows does under qemu. The problem is under DIO we expect them to be different buffers for checksumming reasons, and so this sort of thing will result in checksum errors, when in reality the file is fine. So when reading, check to make sure that all iov bases are different, and if they aren't fall back to buffered mode, since that will work out right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-11 20:25:06 -04:00
Yoshinori Sano	8fb27640d0	Btrfs: fix memory leaks in btrfs_new_inode() This patch fixes memory leaks in btrfs_new_inode(). Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-11 20:25:06 -04:00
Josef Bacik	93a54bc4c2	Btrfs: check for duplicate iov_base's when doing dio reads Apparently it is ok to submit a read to an IDE device with the same target page for different offsets. This is what Windows does under qemu. The problem is under DIO we expect them to be different buffers for checksumming reasons, and so this sort of thing will result in checksum errors, when in reality the file is fine. So when reading, check to make sure that all iov bases are different, and if they aren't fall back to buffered mode, since that will work out right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:43 -04:00
Josef Bacik	16d299ac74	Btrfs: reuse the extent_map we found when calling btrfs_get_extent In btrfs_get_block_direct we call btrfs_get_extent to lookup the extent for the range that we are looking for. If we don't find an extent, btrfs_get_extent will insert a extent_map for that area and mark it as a hole. So it does the job of allocating a new extent map and inserting it into the io tree. But if we're creating a new extent we free it up and redo all of that work. So instead pass the em to btrfs_new_extent_direct(), and if it will work just allocate the disk space and set it up properly and bypass the freeing/allocating of a new extent map and the expensive operation of inserting the thing into the io_tree. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:41 -04:00
Josef Bacik	1ae3993825	Btrfs: do not use async submit for small DIO io's When looking at our DIO performance Chris said that for small IO's doing the async submit stuff tends to be more overhead than it's worth. With this on top of my other fixes I get about a 17-20% speedup doing a sequential dd with 4k IO's. Basically if we don't have to split the bio for the map length it's small enough to be directly submitted, otherwise go back to the async submit. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:39 -04:00
Josef Bacik	02f57c7aed	Btrfs: don't split dio bios if we don't have to We have been unconditionally allocating a new bio and re-adding all pages from our original bio to the new bio. This is needed if our original bio is larger than our stripe size, but if it is smaller than the stripe size then there is no need to do this. So check the map length and if we are under that then go ahead and submit the original bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:37 -04:00
Josef Bacik	1ef30be142	Btrfs: do not call btrfs_update_inode in endio if nothing changed In the DIO code we often don't update the i_disk_size because the i_size isn't updated until after the DIO is completed, so basically we are allocating a path, doing a search, and updating the inode item for no reason since nothing changed. btrfs_ordered_update_i_size will return 1 if it didn't update i_disk_size, so only run btrfs_update_inode if btrfs_ordered_update_i_size returns 0. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:36 -04:00
Josef Bacik	12ddb96cb6	Btrfs: map the inode item when doing fill_inode_item Instead of calling kmap_atomic for every thing we set in the inode item, map the entire inode item at the start and unmap it at the end. This makes a sequential dd of 400mb O_DIRECT something like 1% faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-04-08 13:00:34 -04:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Linus Torvalds	884b8267d5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: don't warn in btrfs_add_orphan Btrfs: fix free space cache when there are pinned extents and clusters V2 Btrfs: Fix uninitialized root flags for subvolumes btrfs: clear __GFP_FS flag in the space cache inode Btrfs: fix memory leak in start_transaction() Btrfs: fix memory leak in btrfs_ioctl_start_sync() Btrfs: fix subvol_sem leak in btrfs_rename() Btrfs: Fix oops for defrag with compression turned on Btrfs: fix /proc/mounts info. Btrfs: fix compiler warning in file.c	2011-04-05 12:29:25 -07:00
Josef Bacik	c9ddec74aa	Btrfs: don't warn in btrfs_add_orphan When I moved the orphan adding to btrfs_truncate I missed the fact that during orphan cleanup we just add the orphan items to the orphan list without going through btrfs_orphan_add, which results in lots of warnings on mount if you have any orphan items that need to be truncated. Just remove this warning since it's ok, this will allow all of the normal space accounting take place. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-05 01:20:24 -04:00
Miao Xie	adae52b94e	btrfs: clear __GFP_FS flag in the space cache inode the object id of the space cache inode's key is allocated from the relative root, just like the regular file. So we can't identify space cache inode by checking the object id of the inode's key, and we have to clear __GFP_FS flag at the time we look up the space cache inode. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-05 01:19:43 -04:00
Johann Lombardi	b44c59a80d	Btrfs: fix subvol_sem leak in btrfs_rename() btrfs_rename() does not release the subvol_sem if the transaction failed to start. Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-05 01:19:42 -04:00
Li Zefan	fe3f566cd1	Btrfs: Fix oops for defrag with compression turned on When we defrag a file, whose size can be fit into an inline extent, with compression enabled, the compress type is set to be fs_info->compress_type, which is 0 if the btrfs filesystem is mounted without compress option. This leads to oops. Reported-by: Daniel Blueman <daniel.blueman@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-04-05 01:19:42 -04:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	212a17ab87	Merge branch 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (45 commits) Btrfs: fix __btrfs_map_block on 32 bit machines btrfs: fix possible deadlock by clearing __GFP_FS flag btrfs: check link counter overflow in link(2) btrfs: don't mess with i_nlink of unlocked inode in rename() Btrfs: check return value of btrfs_alloc_path() Btrfs: fix OOPS of empty filesystem after balance Btrfs: fix memory leak of empty filesystem after balance Btrfs: fix return value of setflags ioctl Btrfs: fix uncheck memory allocations btrfs: make inode ref log recovery faster Btrfs: add btrfs_trim_fs() to handle FITRIM Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Btrfs: make update_reserved_bytes() public btrfs: return EXDEV when linking from different subvolumes Btrfs: Per file/directory controls for COW and compression Btrfs: add datacow flag in inode flag btrfs: use GFP_NOFS instead of GFP_KERNEL Btrfs: check return value of read_tree_block() btrfs: properly access unaligned checksum buffer ... Fix up trivial conflicts in fs/btrfs/volumes.c due to plug removal in the block layer.	2011-03-28 15:31:05 -07:00
Miao Xie	1561deda68	btrfs: fix possible deadlock by clearing __GFP_FS flag Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause deadlock. Task1 open() ... btrfs_search_slot() ... btrfs_cow_block() ... alloc_page() wait for reclaiming shrink_slab() ... shrink_icache_memory() ... btrfs_evict_inode() ... btrfs_search_slot() If the path is locked by task1, the deadlock happens. So the btree's page cache is different with the file's page cache, it can not allocate pages by GFP_HIGHUSER_MOVABLE flag, we must clear __GFP_FS flag in GFP_HIGHUSER_MOVABLE flag. Reported-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:58 -04:00
Al Viro	c055e99eea	btrfs: check link counter overflow in link(2) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:56 -04:00
Al Viro	92986796d8	btrfs: don't mess with i_nlink of unlocked inode in rename() old_inode is not locked; it's not safe to play with its link count. Instead of bumping it and calling btrfs_unlink_inode(), add a variant of the latter that does not do btrfs_drop_nlink()/ btrfs_update_inode(), call it instead of btrfs_inc_nlink()/ btrfs_unlink_inode() and do btrfs_update_inode() ourselves. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:55 -04:00
Tsutomu Itoh	c2db1073fd	Btrfs: check return value of btrfs_alloc_path() Adding the check on the return value of btrfs_alloc_path() to several places. And, some of callers are modified by this change. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:54 -04:00
Yoshinori Sano	dac97e516c	Btrfs: fix uncheck memory allocations To make Btrfs code more robust, several return value checks where memory allocation can fail are introduced. I use BUG_ON where I don't know how to handle the error properly, which increases the number of using the notorious BUG_ON, though. Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:49 -04:00
Mark Fasheh	3ab3564f01	btrfs: return EXDEV when linking from different subvolumes btrfs_link returns EPERM if a cross-subvolume link is attempted. However, in this case I believe EXDEV to be the more appropriate value. >From the link(2) man page: EXDEV oldpath and newpath are not on the same mounted file system. (Linux permits a file system to be mounted at multiple points, but link() does not work across different mount points, even if the same file system is mounted on both.) This matters because an application may have different behaviors based on return codes. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:42 -04:00
Liu Bo	75e7cb7fe0	Btrfs: Per file/directory controls for COW and compression Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to Chris's comment, there should be just one true compression method(probably LZO) stored in the super. However, before this, we would wait for that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in ram today. After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to control file and directory's datacow and compression attribute. NOTE: - The compression type is selected by such rules: If we mount btrfs with compress options, ie, zlib/lzo, the type is it. Otherwise, we'll use the default compress type (zlib today). v1->v2: - rebase to the latest btrfs. v2->v3: - fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW will be screwed by inheritance from parent directory. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:41 -04:00
liubo	1abe9b8a13	Btrfs: add initial tracepoint support for btrfs Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = `29409280` (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:33 -04:00
Josef Bacik	c0da7aa1a2	Btrfs: mark the bio with an error if we have a failure in dio I noticed that dio_end_io calls the appropriate endio function with an error, but the endio functions don't actually do anything with that error, they assume that if there was an error then the bio will not be uptodate. So if we had checksum failures we would never pass back EIO. So if there is an error in our endio functions make sure to clear the uptodate flag on the bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-25 19:08:19 -04:00
Josef Bacik	98bc3149fa	Btrfs: don't allocate dip->csums when doing writes When doing direct writes we store the checksums in the ordered sum stuff in the ordered extent for writing them when the write completes, so we don't even use the dip->csums array. So if we're writing, don't bother allocating dip->csums since we won't use it anyway. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-25 19:08:18 -04:00
Linus Torvalds	6c51038900	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}	2011-03-24 10:16:26 -07:00
Josef Bacik	22a94d44bd	Btrfs: add checks to verify dir items are correct We need to make sure the dir items we get are valid dir items. So any time we try and read one check it with verify_dir_item, which will do various sanity checks to make sure it looks sane. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:41 -04:00
Josef Bacik	695a0d0da0	Btrfs: add a comment explaining what btrfs_cont_expand does Everytime I have to deal with btrfs_cont_expand I stare at it for 20 minutes trying to remember what exactly it does and why the hell we need it. So add a comment to save future-Josef some time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:33 -04:00
Josef Bacik	930f028abe	Btrfs: use mark_inode_dirty when expanding the file Mark_inode_dirty will call btrfs_dirty_inode which will take care of updating the inode. This makes setsize a little cleaner since we don't have to start a transaction and update the inode in there, we can just call mark_inode_dirty. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:32 -04:00
Josef Bacik	f0cd846e92	Btrfs: only add orphan items when truncating We don't need an orphan item when expanding files, we just need them for truncating them, so only add the orphan item in btrfs_truncate instead of in btrfs_setsize. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:30 -04:00
Josef Bacik	ded5db9de7	Btrfs: make sure to remove the orphan item from the in-memory list This fixes a problem where if truncate fails the inode will still be on the in memory orphan list. This is will make us complain when the inode gets destroyed because it's still on the orphan list. So if we fail just remove us from the in memory list and carry on. Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:28 -04:00
Josef Bacik	66b4ffd110	Btrfs: handle errors in btrfs_orphan_cleanup If we cannot truncate an inode for some reason we will never delete the orphan item associated with that inode, which means that we will loop forever in btrfs_orphan_cleanup. Instead of doing this just return error so we fail to mount. It sucks, but hey it's better than hanging. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:26 -04:00
Josef Bacik	3893e33b0b	Btrfs: cleanup error handling in the truncate path Now that we can handle having errors in the truncate path lets make sure we return errors instead of doing BUG_ON() and such. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:24 -04:00
Josef Bacik	a41ad394a0	Btrfs: convert to the new truncate sequence ->truncate() is going away, instead all of the work needs to be done in ->setattr(). So this converts us over to do this. It's fairly straightforward, just get rid of our .truncate inode operation and call btrfs_truncate() directly from btrfs_setsize. This works out better for us since truncate can technically return ENOSPC, and before we had no way of letting anybody know. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:22 -04:00
Josef Bacik	dc89e98244	Btrfs: use a slab for the free space entries Since we alloc/free free space entries a whole lot, lets use a slab to keep track of them. This makes some of my tests slightly faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:20 -04:00
Josef Bacik	57a45ced94	Btrfs: change reserved_extents to an atomic_t We track delayed allocation per inodes via 2 counters, one is outstanding_extents and reserved_extents. Outstanding_extents is already an atomic_t, but reserved_extents is not and is protected by a spinlock. So convert this to an atomic_t and instead of using a spinlock, use atomic_cmpxchg when releasing delalloc bytes. This makes our inode 72 bytes smaller, and reduces locking overhead (albiet it was minimal to begin with). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-03-17 14:21:18 -04:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Linus Torvalds	422e6c4bc4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc/<pid>/mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ...	2011-03-15 15:48:13 -07:00
James Morris	a002951c97	Merge branch 'next' into for-linus	2011-03-16 09:41:17 +11:00
Aneesh Kumar K.V	f17b604207	fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:44 -04:00
Linus Torvalds	0e5b88cd99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling	2011-03-13 16:00:49 -07:00
Miao Xie	7e6b6465e6	btrfs: fix not enough reserved space btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
Daniel J Blueman	b4966b7770	btrfs: fix dip leak The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-10 11:21:49 -05:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	7eaceaccab	block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:07 +01:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
Linus Torvalds	4660ba63f1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix fiemap bugs with delalloc Btrfs: set FMODE_EXCL in btrfs_device->mode Btrfs: make btrfs_rm_device() fail gracefully Btrfs: Avoid accessing unmapped kernel address Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs: put ENOSPC debugging under a mount option	2011-02-25 14:03:39 -08:00
Chris Mason	ec29ed5b40	Btrfs: fix fiemap bugs with delalloc The Btrfs fiemap code wasn't properly returning delalloc extents, so applications that trust fiemap to decide if there are holes in the file see holes instead of delalloc. This reworks the btrfs fiemap code, adding a get_extent helper that searches for delalloc ranges and also adding a helper for extent_fiemap that skips past holes in the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-23 16:23:20 -05:00
Linus Torvalds	007a14af26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check return value of alloc_extent_map() Btrfs - Fix memory leak in btrfs_init_new_device() btrfs: prevent heap corruption in btrfs_ioctl_space_info() Btrfs: Fix balance panic Btrfs: don't release pages when we can't clear the uptodate bits Btrfs: fix page->private races	2011-02-15 08:00:35 -08:00
Tsutomu Itoh	c26a920373	Btrfs: check return value of alloc_extent_map() I add the check on the return value of alloc_extent_map() to several places. In addition, alloc_extent_map() returns only the address or NULL. Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:21:37 -05:00
Linus Torvalds	cb5520f02c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits) Btrfs: Fix page count calculation btrfs: Drop __exit attribute on btrfs_exit_compress btrfs: cleanup error handling in btrfs_unlink_inode() Btrfs: exclude super blocks when we read in block groups Btrfs: make sure search_bitmap finds something in remove_from_bitmap btrfs: fix return value check of btrfs_start_transaction() btrfs: checking NULL or not in some functions Btrfs: avoid uninit variable warnings in ordered-data.c Btrfs: catch errors from btrfs_sync_log Btrfs: make shrink_delalloc a little friendlier Btrfs: handle no memory properly in prepare_pages Btrfs: do error checking in btrfs_del_csums Btrfs: use the global block reserve if we cannot reserve space Btrfs: do not release more reserved bytes to the global_block_rsv than we need Btrfs: fix check_path_shared so it returns the right value btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs: fix return value check of btrfs_join_transaction() fs/btrfs/inode.c: Add missing IS_ERR test btrfs: fix missing break in switch phrase btrfs: fix several uncheck memory allocations ...	2011-02-07 14:06:18 -08:00
Tsutomu Itoh	554233a6e0	btrfs: cleanup error handling in btrfs_unlink_inode() When btrfs_alloc_path() fails, btrfs_free_path() need not be called. Therefore, it changes the branch ahead. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:17:45 -05:00
Eric Paris	2a7dba391e	fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:12:29 -05:00
Tsutomu Itoh	98d5dc13e7	btrfs: fix return value check of btrfs_start_transaction() The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-01 07:17:27 -05:00
Josef Bacik	dedefd7215	Btrfs: fix check_path_shared so it returns the right value When running xfstests 224 I kept getting ENOSPC when trying to remove the files, and this is because we were returning ret from check_path_shared while it was uninitalized, which isn't right. Fix this to return 0 properly, and now xfstests 224 doesn't freak out when it tries to clean itself up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Tsutomu Itoh	3612b49598	btrfs: fix return value check of btrfs_join_transaction() The error check of btrfs_join_transaction()/btrfs_join_transaction_nolock() is added, and the mistake of the error check in several places is corrected. For more stable Btrfs, I think that we should reduce BUG_ON(). But, I think that long time is necessary for this. So, I propose this patch as a short-term solution. With this patch: - To more stable Btrfs, the part that should be corrected is clarified. - The panic isn't done by the NULL pointer reference etc. (even if BUG_ON() is increased temporarily) - The error code is returned in the place where the error can be easily returned. As a long-term plan: - BUG_ON() is reduced by using the forced-readonly framework, etc. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Julia Lawall	34d19bada0	fs/btrfs/inode.c: Add missing IS_ERR test After the conditional that precedes the following code, inode may be an ERR_PTR value. This can eg result from a memory allocation failure via the call to btrfs_iget, and thus does not imply that root is different than sub_root. Thus, an IS_ERR check is added to ensure that there is no dereference of inode in this case. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier f; @@ f(...) { ... return ERR_PTR(...); } @@ identifier r.f, fld; expression x; statement S1,S2; @@ x = f(...) ... when != IS_ERR(x) ( if (IS_ERR(x) \|\|...) S1 else S2 \| *x->fld ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Chris Mason	eab49bec41	Merge branch 'bug-fixes' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-28 16:24:59 -05:00
Miao Xie	b897abec03	Btrfs: Fix memory leak in writepage fixup work fixup, which is allocated when starting page write to fix up the extent without ORDERED bit set, should be freed after this work is done. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:10:30 +08:00
Linus Torvalds	eee2a817df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...	2011-01-17 14:43:43 -08:00
Christoph Hellwig	2fe17c1075	fallocate should be a file operation Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always commiting the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and remove the already unnedded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:31 -05:00
Christoph Hellwig	64c23e8687	make the feature checks in ->fallocate future proof Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:30 -05:00
Chris Mason	f892436eb2	Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-16 11:25:54 -05:00
Josef Bacik	23a8519b55	Btrfs: fail if we try to use hole punch Btrfs doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:44 -05:00
Al Viro	af53d29ac1	switch btrfs, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:47 -05:00
Nick Piggin	258a5aa8df	btrfs: provide simple rcu-walk ACL implementation This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:30 +11:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Li Zefan	b83cc9693f	Btrfs: Add readonly snapshots support Usage: Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call ioctl(BTRFS_I0CTL_SNAP_CREATE_V2). Implementation: - Set readonly bit of btrfs_root_item->flags. - Add readonly checks in btrfs_permission (inode_permission), btrfs_setattr, btrfs_set/remove_xattr and some ioctls. Changelog for v3: - Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags. - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2010-12-23 08:49:17 +08:00
Li Zefan	261507a02c	btrfs: Allow to add new compression algorithm Make the code aware of compression type, instead of always assuming zlib compression. Also make the zlib workspace function as common code for all compression types. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2010-12-22 23:15:45 +08:00
Linus Torvalds	e13cf63f2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: prevent RAID level downgrades when space is low Btrfs: account for missing devices in RAID allocation profiles Btrfs: EIO when we fail to read tree roots Btrfs: fix compiler warnings Btrfs: Make async snapshot ioctl more generic Btrfs: pwrite blocked when writing from the mmaped buffer of the same page Btrfs: Fix a crash when mounting a subvolume Btrfs: fix sync subvol/snapshot creation Btrfs: Fix page leak in compressed writeback path Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots Btrfs: fixup return code for btrfs_del_orphan_item Btrfs: do not do fast caching if we are allocating blocks for tree_root Btrfs: deal with space cache errors better Btrfs: fix use after free in O_DIRECT	2010-12-14 11:08:13 -08:00
Jan Beulich	3dd1462e82	Btrfs: fix compiler warnings ... regarding an unused function when !MIGRATION, and regarding a printk() format string vs argument mismatch. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-12-10 16:29:11 -05:00
Yan, Zheng	24ae63656a	Btrfs: Fix page leak in compressed writeback path "start + num_bytes >= actual_end" can happen when compressed page writeback races with file truncation. In that case we need unlock and release pages past the end of file. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-12-10 16:29:09 -05:00
Josef Bacik	955256f2c3	Btrfs: fix use after free in O_DIRECT This fixes a bug where we use dip after we have freed it. Instead just use the file_offset that was passed to the function. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2010-12-09 13:57:10 -05:00
Linus Torvalds	aa3fc52546	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: don't use migrate page without CONFIG_MIGRATION Btrfs: deal with DIO bios that span more than one ordered extent Btrfs: setup blank root and fs_info for mount time Btrfs: fix fiemap Btrfs - fix race between btrfs_get_sb() and umount Btrfs: update inode ctime when using links Btrfs: make sure new inode size is ok in fallocate Btrfs: fix typo in fallocate to make it honor actual size Btrfs: avoid NULL pointer deref in try_release_extent_buffer Btrfs: make btrfs_add_nondir take parent inode as an argument Btrfs: hold i_mutex when calling btrfs_log_dentry_safe Btrfs: use dget_parent where we can UPDATED Btrfs: fix more ESTALE problems with NFS Btrfs: handle NFS lookups properly btrfs: make 1-bit signed fileds unsigned btrfs: Show device attr correctly for symlinks btrfs: Set file size correctly in file clone btrfs: Check if dest_offset is block-size aligned before cloning file Btrfs: handle the space_cache option properly btrfs: Fix early enospc because 'unused' calculated with wrong sign. ...	2010-11-29 14:11:08 -08:00
Chris Mason	163cf09c2a	Btrfs: deal with DIO bios that span more than one ordered extent The new DIO bio splitting code has problems when the bio spans more than one ordered extent. This will happen as the generic DIO code merges our get_blocks calls together into a bigger single bio. This fixes things by walking forward in the ordered extent code finding all the overlapping ordered extents and completing them all at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-28 19:56:33 -05:00
Josef Bacik	bc1cbf1f86	Btrfs: update inode ctime when using links Currently we fail xfstest 236 because we're not updating the inode ctime on link. This is a simple fix, and makes it so we pass 236 now. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:00:07 -05:00
Josef Bacik	0ed42a63f3	Btrfs: make sure new inode size is ok in fallocate We have been failing xfstest 228 forever, because we don't check to make sure the new inode size is acceptable as far as RLIMIT is concerned. Just check to make sure it's ok to create a inode with this new size and error out if not. With this patch we now pass 228. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:00:07 -05:00
Josef Bacik	55a61d1d06	Btrfs: fix typo in fallocate to make it honor actual size There is a typo in __btrfs_prealloc_file_range() where we set the i_size to actual_len/cur_offset, and then just set it to cur_offset again, and do the same with btrfs_ordered_update_i_size(). This fixes it back to keeping i_size in a local variable and then updating i_size properly. Tested this with xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo stat'ing foo gives us a size of 1 instead of 4096 like it was. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 12:59:16 -05:00
Josef Bacik	a1b075d28d	Btrfs: make btrfs_add_nondir take parent inode as an argument Everybody who calls btrfs_add_nondir just passes in the dentry of the new file and then dereference dentry->d_parent->d_inode, but everybody who calls btrfs_add_nondir() are already passed the parent's inode. So instead of dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as an argument and pass that along so we don't have to worry about d_parent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:10 -05:00
Josef Bacik	6a91221304	Btrfs: use dget_parent where we can UPDATED There are lots of places where we do dentry->d_parent->d_inode without holding the dentry->d_lock. This could cause problems with rename. So instead we need to use dget_parent() and hold the reference to the parent as long as we are going to use it's inode and then dput it at the end. Signed-off-by: Josef Bacik <josef@redhat.com> Cc: raven@themaw.net Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:09 -05:00
Josef Bacik	7619585390	Btrfs: fix more ESTALE problems with NFS When creating new inodes we don't setup inode->i_generation. So if we generate an fh with a newly created inode we save the generation of 0, but if we flush the inode to disk and have to read it back when getting the inode on the server we'll have the right i_generation, so gens wont match and we get ESTALE. This patch properly sets inode->i_generation when we create the new inode and now I'm no longer getting ESTALE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:08 -05:00
Li Zefan	f209561ad8	btrfs: Show device attr correctly for symlinks Symlinks and files of other types show different device numbers, though they are on the same partition: $ touch tmp; ln -s tmp tmp2; stat tmp tmp2 File: `tmp' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 15h/21d Inode: 984027 Links: 1 --- snip --- File: `tmp2' -> `tmp' Size: 3 Blocks: 0 IO Block: 4096 symbolic link Device: 13h/19d Inode: 984028 Links: 1 Reported-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:07 -05:00
Miao Xie	e65e153554	btrfs: fix panic caused by direct IO btrfs paniced when we write >64KB data by direct IO at one time. Reproduce steps: # mkfs.btrfs /dev/sda5 /dev/sda6 # mount /dev/sda5 /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=100K count=1 oflag=direct Then btrfs paniced: mapping failed logical 1103155200 bio len 69632 len 12288 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:3010! [SNIP] Pid: 1992, comm: btrfs-worker-0 Not tainted 2.6.37-rc1 #1 D2399/PRIMERGY RIP: 0010:[<ffffffffa03d1462>] [<ffffffffa03d1462>] btrfs_map_bio+0x202/0x210 [btrfs] [SNIP] Call Trace: [<ffffffffa03ab3eb>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs] [<ffffffffa03a35ff>] run_one_async_done+0x9f/0xb0 [btrfs] [<ffffffffa03d3d20>] run_ordered_completions+0x80/0xc0 [btrfs] [<ffffffffa03d45a4>] worker_loop+0x154/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffff81083216>] kthread+0x96/0xa0 [<ffffffff8100cec4>] kernel_thread_helper+0x4/0x10 [<ffffffff81083180>] ? kthread+0x0/0xa0 [<ffffffff8100cec0>] ? kernel_thread_helper+0x0/0x10 We fix this problem by splitting bios when we submit bios. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:04 -05:00
Miao Xie	0c56fa9662	btrfs: fix free dip and dip->csums twice bio_endio() will free dip and dip->csums, so dip and dip->csums twice will be freed twice. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:02 -05:00
Linus Torvalds	925d169f5b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (39 commits) Btrfs: deal with errors from updating the tree log Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed Btrfs: make SNAP_DESTROY async Btrfs: add SNAP_CREATE_ASYNC ioctl Btrfs: add START_SYNC, WAIT_SYNC ioctls Btrfs: async transaction commit Btrfs: fix deadlock in btrfs_commit_transaction Btrfs: fix lockdep warning on clone ioctl Btrfs: fix clone ioctl where range is adjacent to extent Btrfs: fix delalloc checks in clone ioctl Btrfs: drop unused variable in block_alloc_rsv Btrfs: cleanup warnings from gcc 4.6 (nonbugs) Btrfs: Fix variables set but not read (bugs found by gcc 4.6) Btrfs: Use ERR_CAST helpers Btrfs: use memdup_user helpers Btrfs: fix raid code for removing missing drives Btrfs: Switch the extent buffer rbtree into a radix tree Btrfs: restructure try_release_extent_buffer() Btrfs: use the flusher threads for delalloc throttling Btrfs: tune the chunk allocation to 5% of the FS as metadata ... Fix up trivial conflicts in fs/btrfs/super.c and fs/fs-writeback.c, and remove use of INIT_RCU_HEAD in fs/btrfs/extent_io.c (that init macro was useless and removed in commit `5e8067adfd`: "rcu head remove init")	2010-10-30 09:05:48 -07:00
Chris Mason	6418c96107	Btrfs: deal with errors from updating the tree log During unlink we remove any references to the inode from the tree log. It can return -ENOENT and other errors, and this changes the unlink code to deal with it. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-30 07:34:24 -04:00
Andi Kleen	559af82114	Btrfs: cleanup warnings from gcc 4.6 (nonbugs) These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:14:37 -04:00
Andi Kleen	411fc6bcef	Btrfs: Fix variables set but not read (bugs found by gcc 4.6) These are all the cases where a variable is set, but not read which are really bugs. - Couple of incorrect error handling fixed. - One incorrect use of a allocation policy - Some other things Still needs more review. Found by gcc 4.6's new warnings. [akpm@linux-foundation.org: fix build. Might have been bitrot] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 15:14:31 -04:00
Chris Mason	6b5b817f10	Merge branch 'bug-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work Conflicts: fs/btrfs/extent-tree.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-10-29 09:27:49 -04:00

1 2 3 4 5 ...

651 Commits