linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-24 21:24:00 +08:00

Author	SHA1	Message	Date
Mark Fasheh	83f92318fa	ocfs2: Add dir_resv_level mount option The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	b07f8f24df	ocfs2: change default reservation window sizes The default reservation size of 4 (32-bit windows) is a bit too ambitious. Scale it back to 16 bits (resv_level=2). I have been testing various sizes on a 4-node cluster which runs a mixed workload that is heavily threaded. With a 256MB local alloc, I get roughly the following levels of average file fragmentation: resv_level=0 70% resv_level=1 21% resv_level=2 23% resv_level=3 24% resv_level=4 60% resv_level=5 did not test resv_level=6 60% resv_level=2 seemed like a good compromise between not letting windows be too small, but not so big that heavier workloads will immediately suffer without tuning. This patch also change the behavior of directory reservations - they now track file reservations. The previous compromise of giving directory windows only 8 bits wound up fragmenting more at some window sizes because file allocations had smaller unused windows to poach from. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	6b82021b9e	ocfs2: increase the default size of local alloc windows I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters: localalloc= avg. fragmentation 8 48 32 16 64 10 120 7 On larger cluster sizes, the difference is more dramatic. The new default size top out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:07 -07:00
Mark Fasheh	73c8a80003	ocfs2: clean up localalloc mount option size parsing This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:18:06 -07:00
Mark Fasheh	a57c8fd2ad	ocfs2: remove ocfs2_local_alloc_in_range() Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:31 -07:00
Mark Fasheh	33d5d380d6	ocfs2: allocate btree internal block groups from the global bitmap Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:31 -07:00
Mark Fasheh	e3b4a97dbe	ocfs2: use allocation reservations for directory data Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Mark Fasheh	4fe370afaa	ocfs2: use allocation reservations during file write Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Mark Fasheh	d02f00cc05	ocfs2: allocation reservations This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inodes window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2010-05-05 18:17:30 -07:00
Joel Becker	ec20cec7a3	ocfs2: Make ocfs2_journal_dirty() void. jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-05-05 18:17:29 -07:00
Mark Fasheh	b4414eea0e	ocfs2: Clear undo bits when local alloc is freed When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-23 18:22:40 -07:00
Tao Ma	b23179681c	ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. You can't store a pointer that you haven't filled in yet and expect it to work. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-19 14:53:52 -07:00
Tao Ma	dfe4d3d6a6	ocfs2: Fix the update of name_offset when removing xattrs When replacing a xattr's value, in some case we wipe its name/value first and then re-add it. The wipe is done by ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or block. We currently adjust name_offset for all the entries which have (offset < name_offset). This does not adjust the entrie we're replacing. Since we are replacing the entry, we don't adjust the total entry count. When we calculate a new namevalue location, we trust the entries now-wrong offset in ocfs2_xa_get_free_start(). The solution is to also adjust the name_offset for the replaced entry, allowing ocfs2_xa_get_free_start() to calculate the new namevalue location correctly. The following script can trigger a kernel panic easily. echo 'y'\|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE=$MNT_DIR/$RANDOM for((i=0;i<76;i++)) do string_76="a$string_76" done string_78="aa$string_76" string_82="aaaa$string_78" touch $FILE setfattr -n 'user.test1234567890' -v $string_76 $FILE setfattr -n 'user.test1234567890' -v $string_78 $FILE setfattr -n 'user.test1234567890' -v $string_82 $FILE Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-19 14:53:51 -07:00
Mark Fasheh	b22b63ebaf	ocfs2: Always try for maximum bits with new local alloc windows What we were doing before was to ask for the current window size as the maximum allocation. This had the effect of limiting the amount of allocation we could get for the local alloc during times when the window size was shrunk due to fragmentation. In some cases, that could actually increase fragmentation by artificially limiting the number of bits we can accept. So while we still want to ask for a minimum number of bits equal to window size, there is no reason why we should limit the number of bits the local alloc should accept. Hence always allow the maximum number of local alloc bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-18 13:22:42 -07:00
Mark Fasheh	fcefd25ac8	ocfs2: set i_mode on disk during acl operations ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory inode, but never setting it on the disk copy. Thus, acls were some times not getting propagated between nodes. This patch fixes the issue by adding a helper function ocfs2_acl_set_mode() which does this the right way. ocfs2_set_acl() and ocfs2_init_acl() are then updated to call ocfs2_acl_set_mode(). Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-17 12:28:22 -07:00
Tao Ma	6527f8f848	ocfs2: Update i_blocks in reflink operations. In reflink, we need to upate i_blocks for the target inode. Reported-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-17 12:28:00 -07:00
Tao Ma	78c37eb0d5	ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. In ocfs2_validate_gd_parent, we check bg_chain against the cl_next_free_rec of the dinode. Actually in resize, we have the chance of bg_chain == cl_next_free_rec. So add some additional condition check for it. I also rename paramter "clean_error" to "resize", since the old one is not clearly enough to indicate that we should only meet with this case in resize. btw, the correpsonding bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-17 12:07:21 -07:00
Sachin Prabhu	ee860b6a65	[PATCH] Skip check for mandatory locks when unlocking ocfs2_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. ocfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-17 12:07:16 -07:00
Linus Torvalds	c32da02342	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX	2010-03-12 16:04:50 -08:00
Joe Perches	03affdef4f	fs/ocfs2/cluster/tcp.c: remove use of NIPQUAD, use %pI4 Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:27 -08:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Emese Revfy	52cf25d0ab	Driver core: Constify struct sysfs_ops in struct kobj_type Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-03-07 17:04:49 -08:00
Akinobu Mita	984b3f5746	bitops: rename for_each_bit() to for_each_set_bit() Rename for_each_bit to for_each_set_bit in the kernel source tree. To permit for_each_clear_bit(), should that ever be added. The patch includes a macro to map the old for_each_bit() onto the new for_each_set_bit(). This is a (very) temporary thing to ease the migration. [akpm@linux-foundation.org: add temporary for_each_bit()] Suggested-by: Alexey Dobriyan <adobriyan@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:23 -08:00
Linus Torvalds	e213e26ab3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c	2010-03-05 13:20:53 -08:00
Christoph Hellwig	871a293155	dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	907f4554e2	dquot: move dquot initialization responsibility into the filesystem Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	9f75475802	dquot: cleanup dquot drop routine Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	257ba15ced	dquot: move dquot drop responsibility into the filesystem Currently clear_inode calls vfs_dq_drop directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the drop inside the ->clear_inode superblock operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	b43fa8284d	dquot: cleanup dquot transfer routine Get rid of the transfer dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_transfer helper to __dquot_transfer and vfs_dq_transfer to dquot_transfer to have a consistent namespace, and make the new dquot_transfer return a normal negative errno value which all callers expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	63936ddaa1	dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Christoph Hellwig	5dd4056db8	dquot: cleanup space allocation / freeing routines Get rid of the alloc_space, free_space, reserve_space, claim_space and release_rsv dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Move shared logic into the common __dquot_alloc_space, dquot_claim_space_nodirty and __dquot_free_space low-level methods, and rationalize the wrappers around it to move as much as possible code into the common block for CONFIG_QUOTA vs not. Also rename all these helpers to be named dquot_* instead of vfs_dq_*. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Tristan Ye	9df5778ece	Ocfs2: Move ocfs2 ioctl definitions from ocfs2_fs.h to newly added ocfs2_ioctl.h Currently we were adding ioctl cmds/structures for ocfs2 into ocfs2_fs.h which was used for define ocfs2 on-disk layout. That sounds a little bit confusing, and it may be quickly polluted espcially when growing the ocfs2_info_request ioctls afterwards(it will grow i bet). As a result, such OCFS2 IOCs do need to be placed somewhere other than ocfs2_fs.h, a separated ocfs2_ioctl.h will be added to store such ioctl structures and definitions which could also be used from userspace to invoke ioctls call. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-03-02 14:10:20 -08:00
Wengang Wang	5051f76883	ocfs2: send SIGXFSZ if new filesize exceeds limit -v2 This patch makes ocfs2 send SIGXFSZ if new file size exceeds the rlimit. Processes may get SIGXFSZ on one node (in the cluster) while others will not on another if file size limits are different on the two nodes. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-27 20:08:51 -08:00
Sunil Mushran	6fcef3f04a	ocfs2/userdlm: Add tracing in userdlm Make use of the newly added BASTS masklog to trace ASTs and BASTs in userdlm. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-27 19:57:07 -08:00
Sunil Mushran	9b915181af	ocfs2: Use a separate masklog for AST and BASTs This patch adds a new masklog and uses it allow tracing ASTs and BASTs in the dlmglue layer. This has been found to be very useful in debugging cluster locking issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-27 19:57:06 -08:00
Srinivas Eeda	bc9838c4d4	dlm: allow dlm do recovery during shutdown If a node down event happens while dlm shutdown in progress, dlm recovery should be done before dlm is shutdown. We can't migrate unrecovered locks, obviously. But dlm_reco_thread only does recovery if the dlm_state is in DLM_CTXT_JOINED. dlm_reco_thread should do recovery if dlm_state is in DLM_CTXT_JOINED or DLM_CTXT_IN_SHUTDOWN. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:19 -08:00
Tao Ma	cbaee472f2	ocfs2: Only bug out in direct io write for reflinked extent. In ocfs2_direct_IO_get_blocks, we only need to bug out in case of we are going to write a recounted extent rec. What a silly bug introduced by me! Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-02-26 15:41:19 -08:00
Coly Li	66b116c9d8	ocfs2: fix warning in ocfs2_file_aio_write() This patch fixes a compiling warning in ocfs2_file_aio_write(). Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:18 -08:00
Joel Becker	cbe0e331fd	ocfs2_dlmfs: Enable the use of user cluster stacks. Unlike ocfs2, dlmfs has no permanent storage. It can't store off a cluster stack it is supposed to be using. So it can't specify the stack name in ocfs2_cluster_connect(). Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses the stack that is currently enabled. This is find for dlmfs, which will rely on the stack initialization. We add the "stackglue" capability to dlmfs's capability list. This lets userspace know dlmfs can be used with all cluster stacks. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:18 -08:00
Joel Becker	0016eedc41	ocfs2_dlmfs: Use the stackglue. Rather than directly using o2dlm, dlmfs can now use the stackglue. This allows it to use userspace cluster stacks and fs/dlm. This commit forces o2cb for now. A latter commit will bump the protocol version and allow non-o2cb stacks. This is one big sed, really. LKM_xxMODE becomes DLM_LOCK_xx. LKM_flag becomes DLM_LKF_flag. We also learn to check that the LVB is valid before reading it. Any DLM can lose the contents of the LVB during a complicated recovery. userdlm should be checking this. Now it does. dlmfs will return 0 from read(2) if the LVB was invalid. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:18 -08:00
Joel Becker	e8fce482f3	ocfs2_dlmfs: Don't honor truncate. The size of a dlmfs file is LVB_LEN We want folks using dlmfs to be able to use the LVB in places other than just write(2)/read(2). By ignoring truncate requests, we allow 'echo "contents" > /dlm/space/lockname' to work. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:18 -08:00
Joel Becker	553b5eb91a	ocfs2: Pass the locking protocol into ocfs2_cluster_connect(). Inside the stackglue, the locking protocol structure is hanging off of the ocfs2_cluster_connection. This takes it one further; the locking protocol is passed into ocfs2_cluster_connect(). Now different cluster connections can have different locking protocols with distinct asts. Note that all locking protocols have to keep their maximum protocol version in lock-step. With the protocol structure set in ocfs2_cluster_connect(), there is no need for the stackglue to have a static pointer to a specific protocol structure. We can change initialization to only pass in the maximum protocol version. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:17 -08:00
Joel Becker	e603cfb074	ocfs2: Remove the ast pointers from ocfs2_stack_plugins With the full ocfs2_locking_protocol hanging off of the ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers there. They don't need to get them from their plugin structure. The user plugin still needs the maximum locking protocol version, though. This changes the plugin structure so that it only holds the max version, not the entire ocfs2_locking_protocol pointer. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:16 -08:00
Joel Becker	110946c8fb	ocfs2: Hang the locking proto on the cluster conn and use it in asts. With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we have access to it in the ast and bast wrapper functions. Attach the ocfs2_locking_protocol to the conn. Now, instead of refering to a static variable for ast/bast pointers, the wrappers can look at the connection. This means different connections can have different ast/bast pointers, and it reduces the need for the static pointer. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:16 -08:00
Joel Becker	c0e4133851	ocfs2: Attach the connection to the lksb We're going to want it in the ast functions, so we convert union ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:14 -08:00
Joel Becker	a796d2862a	ocfs2: Pass lksbs back from stackglue ast/bast functions. The stackglue ast and bast functions tried to maintain the fiction that their arguments were void pointers. In reality, stack_user.c had to know that the argument was an ocfs2_lock_res in order to get the status off of the lksb. That's ugly. This changes stackglue to always pass the lksb as the argument to ast and bast functions. The caller can always use container_of() to get the ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is zero. They still get back the lockres in their ast. stackglue gets cleaner, and now can use the lksb itself. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:14 -08:00
Joel Becker	34a9dd7e29	ocfs2_dlmfs: Move to its own directory We're going to remove the tie between ocfs2_dlmfs and o2dlm. ocfs2_dlmfs doesn't belong in the fs/ocfs2/dlm directory anymore. Here we move it to fs/ocfs2/dlmfs. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:14 -08:00
Joel Becker	65b6f34034	ocfs2_dlmfs: Use poll() to signify BASTs. o2dlm's userspace filesystem is an easy way to use the DLM from userspace. It is intentionally simple. For example, it does not allow for asynchronous behavior or lock conversion. This is intentional to keep the interface simple. Because there is no asynchronous notification, there is no way for a process holding a lock to know another node needs the lock. This is the number one complaint of ocfs2_dlmfs users. Turns out, we can solve this very easily. We add poll() support to ocfs2_dlmfs. When a BAST is received, the lock's file descriptor will receive POLLIN. This is trivial to implement. Userdlm already has an appropriate waitqueue, and the lock knows when it is blocked. We add the "bast" capability to tell userspace this is available. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:14 -08:00
Joel Becker	14a437c2b6	ocfs2_dlmfs: Add capabilities parameter. Over time, dlmfs has added some features that were not part of the initial ABI. Unfortunately, some of these features are not detectable via standard usage. For example, Linux's default poll always returns POLLIN, so there is no way for a caller of poll(2) to know when dlmfs added poll support. Instead, we provide this list of new capabilities. Capabilities is a read-only attribute. We do it as a module parameter so we can discover it whether dlmfs is built in, loaded, or even not loaded (via modinfo). The ABI features are local to this machine's dlmfs mount. This is distinct from the locking protocol, which is concerned with inter-node interaction. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:13 -08:00
Joel Becker	399ff3a748	ocfs2: Handle errors while setting external xattr values. ocfs2 can store extended attribute values as large as a single file. It does this using a standard ocfs2 btree for the large value. However, the previous code did not handle all error cases cleanly. There are multiple problems to have. 1) We have trouble allocating space for a new xattr. This leaves us with an empty xattr. 2) We overwrote an existing local xattr with a value root, and now we have an error allocating the storage. This leaves us an empty xattr. where there used to be a value. The value is lost. 3) We have trouble truncating a reused value. This leaves us with the original entry pointing to the truncated original value. The value is lost. 4) We have trouble extending the storage on a reused value. This leaves us with the original value safely in place, but with more storage allocated when needed. This doesn't consider storing local xattrs (values that don't require a btree). Those only fail when the journal fails. Case (1) is easy. We just remove the xattr we added. We leak the storage because we can't safely remove it, but otherwise everything is happy. We'll print a warning about the leak. Case (4) is easy. We still have the original value in place. We can just leave the extra storage attached to this xattr. We return the error, but the old value is untouched. We print a warning about the storage. Case (2) and (3) are hard because we've lost the original values. In the old code, we ended up with values that could be partially read. That's not good. Instead, we just wipe the xattr entry and leak the storage. It stinks that the original value is lost, but now there isn't a partial value to be read. We'll print a big fat warning. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-02-26 15:41:13 -08:00

1 2 3 4 5 ...

1177 Commits