linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-05 01:54:09 +08:00

Author	SHA1	Message	Date
Tristan Ye	78f94673d7	Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead. Truncate is just a special case of punching holes(from new i_size to end), we therefore could take advantage of the existing ocfs2_remove_btree_range() to reduce the comlexity and redundancy in alloc.c. The goal here is to make truncate more generic and straightforward. Several functions only used by ocfs2_commit_truncate() will smiply be removed. ocfs2_remove_btree_range() was originally used by the hole punching code, which didn't take refcount trees into account (definitely a bug). We therefore need to change that func a bit to handle refcount trees. It must take the refcount lock, calculate and reserve blocks for refcount tree changes, and decrease refcounts at the end. We replace ocfs2_lock_allocators() here by adding a new func ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to reserve. This will not hurt any other code using ocfs2_remove_btree_range() (such as dir truncate and hole punching). I merged the following steps into one patch since they may be logically doing one thing, though I know it looks a little bit fat to review. 1). Remove redundant code used by ocfs2_commit_truncate(), since we're moving to ocfs2_remove_btree_range anyway. 2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of accepting some extra blocks to reserve. 3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our needs. It's safe to do this since it's only being called by truncate. 4). Change ocfs2_remove_btree_range() a bit to take refcount case into account. 5). Finally, we change ocfs2_commit_truncate() to call ocfs2_remove_btree_range() in a proper way. The patch has been tested normally for sanity check, stress tests with heavier workload will be expected. Based on this patch, fixing the punching holes bug will be fairly easy. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-18 12:25:10 -07:00
Joel Becker	e4b963f10e	ocfs2: Wrap signal blocking in void functions. ocfs2 sometimes needs to block signals around dlm operations, but it currently does it with sigprocmask(). Even worse, it's checking the error code of sigprocmask(). The in-kernel sigprocmask() can only error if you get the SIG_* argument wrong. We don't. Wrap the sigprocmask() calls with ocfs2_[un]block_signals(). These functions are void, but they will BUG() if somehow sigprocmask() returns an error. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-10 11:50:10 -07:00
Mark Fasheh	e3b4a97dbe	ocfs2: use allocation reservations for directory data Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Mark Fasheh	4fe370afaa	ocfs2: use allocation reservations during file write Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Joel Becker	ec20cec7a3	ocfs2: Make ocfs2_journal_dirty() void. jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:17:29 -07:00
Joel Becker	d577632e65	ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). gcc warns that a variable is uninitialized. It's actually handled, but an early return fools gcc. Let's just initialize the variable to a garbage value that will crash if the usage is ever broken. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-03 19:15:49 -07:00
Joel Becker	f9221fd803	Merge branch 'skip_delete_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes	2010-04-30 13:37:29 -07:00
Dan Carpenter	0350cb078f	ocfs2: potential ERR_PTR dereference on error paths If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-04-23 14:42:06 -07:00
Li Dongyang	d4cd1871cf	ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-04-23 11:03:49 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Tristan Ye	b54c2ca475	Ocfs2: Handle deletion of reflinked oprhan inodes correctly. The rule is that all inodes in the orphan dir have ORPHANED_FL, otherwise we treated it as an ERROR. This rule works well except for some rare cases of reflink operation: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1215 The problem is caused by how reflink and our orphan_scan thread interact. * The orphan scan pulls the orphans into a queue first, then runs the queue at a later time. We only hold the orphan_dir's lock during scanning. * Reflink create a oprhaned target in orphan_dir as its first step. It removes the target and clears the flag as the final step. These two steps take the orphan_dir's lock, but it is not held for the duration. Based on the above semantics, a reflink inode can be moved out of the orphan dir and have its ORPHANED_FL cleared before the queue of orphans is run. This leads to a ERROR in ocfs2_query_wipde_inode(). This patch teaches ocfs2_query_wipe_inode() to detect previously orphaned reflink targets. If a reflink fails or a crash occurs during the relfink operation, the inode will retain ORPHANED_FL and will be properly wiped. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-23 18:22:55 -07:00
Christoph Hellwig	871a293155	dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	907f4554e2	dquot: move dquot initialization responsibility into the filesystem Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	9f75475802	dquot: cleanup dquot drop routine Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	257ba15ced	dquot: move dquot drop responsibility into the filesystem Currently clear_inode calls vfs_dq_drop directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the drop inside the ->clear_inode superblock operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	63936ddaa1	dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Sunil Mushran	2bd632165c	ocfs2/trivial: Remove trailing whitespaces Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-01-25 19:20:51 -08:00
Tao Ma	8b2c0dba51	ocfs2: Call refcount tree remove process properly. Now with xattr refcount support, we need to check whether we have xattr refcounted before we remove the refcount tree. Now the mechanism is: 1) Check whether i_clusters == 0, if no, exit. 2) check whether we have i_xattr_loc in dinode. if yes, exit. 2) Check whether we have inline xattr stored outside, if yes, exit. 4) Remove the tree. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:44 -07:00
Joel Becker	6136ca5f5f	ocfs2: Drop struct inode from ocfs2_extent_tree_operations. We can get to the inode from the caching information. Other parent types don't need it. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:57 -07:00
Joel Becker	0cf2f7632b	ocfs2: Pass struct ocfs2_caching_info to the journal functions. The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:50 -07:00
Joel Becker	292dd27ec7	ocfs2: move ip_created_trans to struct ocfs2_caching_info Similar ip_last_trans, ip_created_trans tracks the creation of a journal managed inode. This specifically tracks what transaction created the inode. This is so the code can know if the inode has ever been written to disk. This behavior is desirable for any journal managed object. We move it to struct ocfs2_caching_info as ci_created_trans so that any object using ocfs2_caching_info can rely on this behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:49 -07:00
Joel Becker	66fb345ddd	ocfs2: move ip_last_trans to struct ocfs2_caching_info We have the read side of metadata caching isolated to struct ocfs2_caching_info, now we need the write side. This means the journal functions. The journal only does a couple of things with struct inode. This change moves the ip_last_trans field onto struct ocfs2_caching_info as ci_last_trans. This field tells the journal whether a pending journal flush is required. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:49 -07:00
Joel Becker	8cb471e8f8	ocfs2: Take the inode out of the metadata read/write paths. We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:48 -07:00
Joel Becker	6e5a3d7538	ocfs2: Change metadata caching locks to an operations structure. We don't really want to cart around too many new fields on the ocfs2_caching_info structure. So let's wrap all our access of the parent object in a set of operations. One pointer on caching_info, and more flexibility to boot. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:48 -07:00
Joel Becker	47460d65a4	ocfs2: Make the ocfs2_caching_info structure self-contained. We want to use the ocfs2_caching_info structure in places that are not inodes. To do that, it can no longer rely on referencing the inode directly. This patch moves the flags to ocfs2_caching_info->ci_flags, stores pointers to the parent's locks on the ocfs2_caching_info, and renames the constants and flags to reflect its independant state. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:47 -07:00
Jan Kara	cb25797d45	ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:26 -07:00
wengang wang	6ca497a83e	ocfs2: fix rare stale inode errors when exporting via nfs For nfs exporting, ocfs2_get_dentry() returns the dentry for fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross cluster lock. this leads to the file system loading a stale inode. This patch fixes above problem. Solution is that in case of inode is not in memory, we get the cluster lock(PR) of alloc inode where the inode in question is allocated from (this causes node on which deletion is done sync the alloc inode) before reading out the inode itsself. then we check the bitmap in the group (the inode in question allcated from) to see if the bit is clear. if it's clear then it's stale. if the bit is set, we then check generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and allot bit. And if its not stale we read it out using ocfs2_iget(). The second read should then be from cache. And also we have to add a per superblock nfs_sync_lock to cover the lock for alloc inode and that for inode in question. this is because ocfs2_get_dentry() and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so that mutliple ocfs2_delete_inode() can run concurrently in normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:25 -07:00
Tao Ma	138211515c	ocfs2: Optimize inode allocation by remembering last group In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming new inode, we passed in directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	198a1ca3b7	ocfs2: Increase max links count Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	9b7895efac	ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:15 -07:00
Joel Becker	13723d00e3	ocfs2: Use metadata-specific ocfs2_journal_access_() functions. The per-metadata-type ocfs2_journal_access_() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	d6b32bbb3e	ocfs2: block read meta ecc. Add block check calls to the read_block validate functions. This is the almost all of the read-side checking of metaecc. xattr buckets are not checked yet. Writes are also unchecked, and so a read-write mount will quickly fail. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Jan Kara	a90714c150	ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	bbbd0eb34b	ocfs2: Mark system files as not subject to quota accounting Mark system files as not subject to quota accounting. This prevents possible recursions into quota code and thus deadlocks. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	1a224ad11e	ocfs2: Assign feature bits and system inodes to quota feature and quota files Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Joel Becker	970e4936d7	ocfs2: Validate metadata only when it's read from disk. Add an optional validation hook to ocfs2_read_blocks(). Now the validation function is only called when a block was actually read off of disk. It is not called when the buffer was in cache. We add a buffer state bit BH_NeedsValidate to flag these buffers. It must always be one higher than the last JBD2 buffer state bit. The dinode, dirblock, extent_block, and xattr_block validators are lifted to this scheme directly. The group_descriptor validator needs to be split into two pieces. The first part only needs the gd buffer and is passed to ocfs2_read_block(). The second part requires the dinode as well, and is called every time. It's only 3 compares, so it's tiny. This also allows us to clean up the non-fatal gd check used by resize.c. It now has no magic argument. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	b657c95c11	ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Sunil Mushran	ae0dff6830	ocfs2: Set journal descriptor to NULL after journal shutdown Patch sets journal descriptor to NULL after the journal is shutdown. This ensures that jbd2_journal_release_jbd_inode(), which removes the jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode() even after the journal has been shutdown. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-11-10 09:51:47 -08:00
Joel Becker	d4a8c93c82	ocfs2: Make cached block reads the common case. ocfs2_read_blocks() currently requires the CACHED flag for cached I/O. However, that's the common case. Let's flip it around and provide an IGNORE_CACHE flag for the special users. This has the added benefit of cleaning up the code some (ignore_cache takes on its special meaning earlier in the loop). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:58:22 -07:00
Joel Becker	07446dc72c	ocfs2: Move ocfs2_bread() into dir.c dir.c is the only place using ocfs2_bread(), so let's make it static to that file. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:58:03 -07:00
Joel Becker	0fcaa56a2a	ocfs2: Simplify ocfs2_read_block() More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:51:57 -07:00
Joel Becker	31d33073ca	ocfs2: Require an inode for ocfs2_read_block(s)(). Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:43:29 -07:00
Joel Becker	da1e90985a	ocfs2: Separate out sync reads from ocfs2_read_blocks() The ocfs2_read_blocks() function currently handles sync reads, cached, reads, and sometimes cached reads. We're going to add some functionality to it, so first we should simplify it. The uncached, synchronous reads are much easer to handle as a separate function, so we instroduce ocfs2_read_blocks_sync(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-14 11:29:10 -07:00
Mark Fasheh	a81cb88b64	ocfs2: Don't check for NULL before brelse() This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh	2008-10-13 17:02:44 -07:00
Joel Becker	2b4e30fbde	ocfs2: Switch over to JBD2. ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 17:02:43 -07:00
Tiger Yang	cf1d6c763f	ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 16:57:02 -07:00
Mark Fasheh	53da4939f3	ocfs2: POSIX file locks support This is actually pretty easy since fs/dlm already handles the bulk of the work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the underlying lock manager, so I only had to add the right calls. Cluster-aware POSIX locks ("plocks") can be turned off by the same means at UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume. Internally, the file system uses two sets of file_operations, depending on whether cluster aware plocks is required. This turns out to be easier than implementing local-only versions of ->lock. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-10-13 13:57:57 -07:00
Marcin Slusarz	4092d49f70	ocfs2: convert byte order of constant instead of variable Convert byte order of constant instead of variable it will be done at compile time vs run time. Remove unused le32_and_cpu. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Jan Kara	5fa0613ea5	ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Mark Fasheh	e63aecb651	ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:01 -08:00
Mark Fasheh	c934a92d05	ocfs2: Remove data locks The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:57 -08:00
Mark Fasheh	34d024f843	ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:34 -08:00
Mark Fasheh	a46043e08f	ocfs2: log valid inode # on bad inode If the inode block isn't valid then we don't want to print the value from that, instead print the block number which was passed in (which should always be correct). Also, turn this into a debug print for now - folks who hit an actual problem always have other logs indicating what the source is. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:02 -08:00
Joe Perches	2759236f84	[PATCH] fs/ocfs2: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:01 -08:00
Mark Fasheh	1afc32b952	ocfs2: Write support for inline data This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVP can simply zero that portion of the inode block being removed. Size increasing truncatesm, including RESVP have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:40 -07:00
Mark Fasheh	15b1e36bdb	ocfs2: Structure updates for inline data Add the disk, network and memory structures needed to support data in inode. Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing inline data. A new inode field, i_dyn_features, is added to facilitate tracking of dynamic inode state. Since it will be used often, we want to mirror it on ocfs2_inode_info, and transfer it via the meta data lvb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>	2007-10-12 11:54:39 -07:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Mark Fasheh	1ca1a111b1	ocfs2: fix sparse warnings in fs/ocfs2 None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:08 -07:00
Jan Kara	6e4b0d5692	[PATCH] Copy i_flags to ocfs2 inode flags on write Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ocfs2-specific ip_attr. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:58 -07:00
Joel Becker	ee19a77956	ocfs2: Wrap access of directory allocations with ip_alloc_sem. OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting local concurrent access of ocfs2 inodes. However, ocfs2 directories were not taking the semaphore while they accessed or modified the allocation tree. ocfs2_extend_dir() needs to take the semaphore in a write mode when it adds to the allocation. All other directory users get there via ocfs2_bread(), which takes the semaphore in read mode. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:42 -07:00
Mark Fasheh	8341897882	ocfs2: Cache extent records The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:10:40 -07:00
Mark Fasheh	8110b073a9	ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:40 -07:00
Mark Fasheh	49cb8d2d49	ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:41 -07:00
Mark Fasheh	60b11392f1	ocfs2: zero tail of sparse files on truncate Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:20 -07:00
Mark Fasheh	3a0782d09c	ocfs2: teach extend/truncate about sparse files For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into it's own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:56 -07:00
Mark Fasheh	363041a5f7	ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:31 -07:00
Mark Fasheh	6f16bf655c	ocfs2: small cleanup of ocfs2_request_delete() There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:55 -07:00
Tiger Yang	68e2b740c4	ocfs2: remove unused code Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:16 -07:00
Tiger Yang	500086300e	ocfs2: Remove delete inode vote Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:39:48 -07:00
Mark Fasheh	6a1bd4a578	ocfs2: cleanup ocfs2_iget() errors Get rid of some error prints in the ocfs2_iget() path from ocfs2_get_dentry(). NFSD can easily cause us to read stale inodes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-01-21 16:19:12 -08:00
Sunil Mushran	c271c5c22b	ocfs2: local mounts This allows users to format an ocfs2 file system with a special flag, OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it will not use any cluster services, nor will it require a cluster configuration, thus acting like a 'local' file system. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-07 17:37:53 -08:00
Tiger Yang	7f1a37e31f	ocfs2: core atime update functions This patch adds the core routines for updating atime in ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:51 -08:00
Mark Fasheh	1fabe1481f	ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straight forward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:28 -08:00
Mark Fasheh	65eff9ccf8	ocfs2: remove handle argument to ocfs2_start_trans() All callers either pass in NULL directly, or a local variable that is already set to NULL. The internals of ocfs2_start_trans() get a nice cleanup as a result. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:23 -08:00
Mark Fasheh	02dc1af44e	ocfs2: pass ocfs2_super * into ocfs2_commit_trans() This sets us up to remove handle->journal. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:08 -08:00
Mark Fasheh	4bcec1847a	ocfs2: remove unused handle argument from ocfs2_meta_lock_full() Now that this is unused and all callers pass NULL, we can safely remove it. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:28:05 -08:00
Mark Fasheh	02928a71ae	ocfs2: remove unused ocfs2_handle_add_inode() We can also delete the unused infrastructure which was once in place to support this functionality. ocfs2_inode_private loses ip_handle and ip_handle_list. ocfs2_journal_handle loses handle_list. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-12-01 18:27:55 -08:00
Theodore Ts'o	ba52de123d	[PATCH] inode-diet: Eliminate i_blksize from the inode structure This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:18 -07:00
Mark Fasheh	24c19ef404	ocfs2: Remove i_generation from inode lock names OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	379dfe9d0d	ocfs2: Hook rest of the file system into dentry locking API Actually replace the vote calls with the new dentry operations. Make any necessary adjustments to get the scheme to work. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	aa9588741d	ocfs2: implement directory read-ahead Uptodate.c now knows about read-ahead buffers. Use some more aggressive logic in ocfs2_readdir(). The two functions which currently use directory read-ahead are ocfs2_find_entry() and ocfs2_readdir(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-20 15:53:40 -07:00
Herbert Poetzl	ca4d147e62	ocfs2: add ext2 attributes Support immutable, and other attributes. Some renaming and other minor fixes done by myself. Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-20 15:48:39 -07:00
Mark Fasheh	b0697053f9	ocfs2: don't use MLF* in the file system Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-03-24 14:58:28 -08:00
Mark Fasheh	b4df6ed8db	[PATCH] ocfs2: fix orphan recovery deadlock Orphan dir recovery can deadlock with another process in ocfs2_delete_inode() in some corner cases. Fix this by tracking recovery state more closely and allowing it to handle inode wipes which might deadlock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-03-01 11:32:41 -08:00
Mark Fasheh	251b6eccbe	[OCFS2] Make ip_io_sem a mutex ip_io_sem is now ip_io_mutex. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-02-03 13:47:19 -08:00
Jes Sorensen	1b1dcc1b57	[PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem This patch converts the inode semaphore to a mutex. I have tested it on XFS and compiled as much as one can consider on an ia64. Anyway your luck with it might be different. Modified-by: Ingo Molnar <mingo@elte.hu> (finished the conversion) Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2006-01-09 15:59:24 -08:00
Mark Fasheh	ccd979bdbc	[PATCH] OCFS2: The Second Oracle Cluster Filesystem The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>	2006-01-03 11:45:47 -08:00

1 2 3

137 Commits