linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-23 04:34:11 +08:00

Author	SHA1	Message	Date
Christoph Hellwig	99428ad0f6	xfs: reindent xlog_write Reindent xlog_write to normal one tab indents and move all variable declarations into the closest enclosing block. Split from a bigger patch by Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-05-19 09:58:10 -05:00
Dave Chinner	b5203cd0a4	xfs: factor xlog_write xlog_write is a mess that takes a lot of effort to understand. It is a mass of nested loops with 4 space indents to get it to fit in 80 columns and lots of funky variables that aren't obvious what they mean or do. Break it down into understandable chunks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-05-19 09:58:09 -05:00
Dave Chinner	9b9fc2b760	xfs: log ticket reservation underestimates the number of iclogs When allocation a ticket for a transaction, the ticket is initialised with the worst case log space usage based on the number of bytes the transaction may consume. Part of this calculation is the number of log headers required for the iclog space used up by the transaction. This calculation makes an undocumented assumption that if the transaction uses the log header space reservation on an iclog, then it consumes either the entire iclog or it completes. That is - the transaction that is first in an iclog is the transaction that the log header reservation is accounted to. If the transaction is larger than the iclog, then it will use the entire iclog itself. Document this assumption. Further, the current calculation uses the rule that we can fit iclog_size bytes of transaction data into an iclog. This is in correct - the amount of space available in an iclog for transaction data is the size of the iclog minus the space used for log record headers. This means that the calculation is out by 512 bytes per 32k of log space the transaction can consume. This is rarely an issue because maximally sized transactions are extremely uncommon, and for 4k block size filesystems maximal transaction reservations are about 400kb. Hence the error in this case is less than the size of an iclog, so that makes it even harder to hit. However, anyone using larger directory blocks (16k directory blocks push the maximum transaction size to approx. 900k on a 4k block size filesystem) or larger block size (e.g. 64k blocks push transactions to the 3-4MB size) could see the error grow to more than an iclog and at this point the transaction is guaranteed to get a reservation underrun and shutdown the filesystem. Fix this by adjusting the calculation to calculate the correct number of iclogs required and account for them all up front. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-05-19 09:58:09 -05:00
Dave Chinner	43f5efc5b5	xfs: factor log item initialisation Each log item type does manual initialisation of the log item. Delayed logging introduces new fields that need initialisation, so factor all the open coded initialisation into a common function first. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-05-19 09:58:07 -05:00
Dave Chinner	b6f8dd49db	xfs: ensure that sync updates the log tail correctly Updates to the VFS layer removed an extra ->sync_fs call into the filesystem during the sync process (from the quota code). Unfortunately the sync code was unknowingly relying on this call to make sure metadata buffers were flushed via a xfs_buftarg_flush() call to move the tail of the log forward in memory before the final transactions of the sync process were issued. As a result, the old code would write a very recent log tail value to the log by the end of the sync process, and so a subsequent crash would leave nothing for log recovery to do. Hence in qa test 182, log recovery only replayed a small handle for inode fsync transactions in this case. However, with the removal of the extra ->sync_fs call, the log tail was now not moved forward with the inode fsync transactions near the end of the sync procese the first (and only) buftarg flush occurred after these transactions went to disk. The result is that log recovery now sees a large number of transactions for metadata that is already on disk. This usually isn't a problem, but when the transactions include inode chunk allocation, the inode create transactions and all subsequent changes are replayed as we cannt rely on what is on disk is valid. As a result, if the inode was written and contains unlogged changes, the unlogged changes are lost, thereby violating sync semantics. The fix is to always issue a transaction after the buftarg flush occurs is the log iѕ not idle or covered. This results in a dummy transaction being written that contains the up-to-date log tail value, which will be very recent. Indeed, it will be at least as recent as the old code would have left on disk, so log recovery will behave exactly as it used to in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-04-16 13:51:23 -05:00
Christoph Hellwig	35a8a72f06	xfs: stop passing opaque handles to xfs_log.c routines Currenly we pass opaque xfs_log_ticket_t handles instead of struct xlog_ticket pointers, and void pointers instead of struct xlog_in_core pointers to various log manager functions. Instead pass properly typed pointers after adding forward declarations for them to xfs_log.h, and adjust the touched function prototypes to the standard XFS style while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-03-01 16:35:32 -06:00
Christoph Hellwig	a14a348bff	xfs: cleanup up xfs_log_force calling conventions Remove the XFS_LOG_FORCE argument which was always set, and the XFS_LOG_URGE define, which was never used. Split xfs_log_force into a two helpers - xfs_log_force which forces the whole log, and xfs_log_force_lsn which forces up to the specified LSN. The underlying implementations already were entirely separate, as were the users. Also re-indent the new _xfs_log_force/_xfs_log_force which previously had a weird coding style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-21 13:44:49 -06:00
Christoph Hellwig	4139b3b337	xfs: kill XLOG_VEC_SET_TYPE This macro only obsfucates the log item type assignments, so kill it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-21 13:44:43 -06:00
Christoph Hellwig	873ff5501d	xfs: clean up log buffer writes Don't bother using XFS_bwrite as it doesn't provide much code for our use case. Instead opencode it and fold xlog_bdstrat_cb into the new xlog_bdstrat helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 15:34:54 -06:00
Dave Chinner	2ee1abad73	xfs: improve metadata I/O merging in the elevator Change all async metadata buffers to use [READ\|WRITE]_META I/O types so that the I/O doesn't get issued immediately. This allows merging of adjacent metadata requests but still prioritises them over bulk data. This shows a 10-15% improvement in sequential create speed of small files. Don't include the log buffers in this classification - leave them as sync types so they are issued immediately. Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-16 13:41:19 -06:00
Christoph Hellwig	0b1b213fcf	xfs: event tracing support Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile \| 8 fs/xfs/linux-2.6/xfs_acl.c \| 1 fs/xfs/linux-2.6/xfs_aops.c \| 52 - fs/xfs/linux-2.6/xfs_aops.h \| 2 fs/xfs/linux-2.6/xfs_buf.c \| 117 +-- fs/xfs/linux-2.6/xfs_buf.h \| 33 fs/xfs/linux-2.6/xfs_fs_subr.c \| 3 fs/xfs/linux-2.6/xfs_ioctl.c \| 1 fs/xfs/linux-2.6/xfs_ioctl32.c \| 1 fs/xfs/linux-2.6/xfs_iops.c \| 1 fs/xfs/linux-2.6/xfs_linux.h \| 1 fs/xfs/linux-2.6/xfs_lrw.c \| 87 -- fs/xfs/linux-2.6/xfs_lrw.h \| 45 - fs/xfs/linux-2.6/xfs_super.c \| 104 --- fs/xfs/linux-2.6/xfs_super.h \| 7 fs/xfs/linux-2.6/xfs_sync.c \| 1 fs/xfs/linux-2.6/xfs_trace.c \| 75 ++ fs/xfs/linux-2.6/xfs_trace.h \| 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h \| 4 fs/xfs/quota/xfs_dquot.c \| 110 --- fs/xfs/quota/xfs_dquot.h \| 21 fs/xfs/quota/xfs_qm.c \| 40 - fs/xfs/quota/xfs_qm_syscalls.c \| 4 fs/xfs/support/ktrace.c \| 323 --------- fs/xfs/support/ktrace.h \| 85 -- fs/xfs/xfs.h \| 16 fs/xfs/xfs_ag.h \| 14 fs/xfs/xfs_alloc.c \| 230 +----- fs/xfs/xfs_alloc.h \| 27 fs/xfs/xfs_alloc_btree.c \| 1 fs/xfs/xfs_attr.c \| 107 --- fs/xfs/xfs_attr.h \| 10 fs/xfs/xfs_attr_leaf.c \| 14 fs/xfs/xfs_attr_sf.h \| 40 - fs/xfs/xfs_bmap.c \| 507 +++------------ fs/xfs/xfs_bmap.h \| 49 - fs/xfs/xfs_bmap_btree.c \| 6 fs/xfs/xfs_btree.c \| 5 fs/xfs/xfs_btree_trace.h \| 17 fs/xfs/xfs_buf_item.c \| 87 -- fs/xfs/xfs_buf_item.h \| 20 fs/xfs/xfs_da_btree.c \| 3 fs/xfs/xfs_da_btree.h \| 7 fs/xfs/xfs_dfrag.c \| 2 fs/xfs/xfs_dir2.c \| 8 fs/xfs/xfs_dir2_block.c \| 20 fs/xfs/xfs_dir2_leaf.c \| 21 fs/xfs/xfs_dir2_node.c \| 27 fs/xfs/xfs_dir2_sf.c \| 26 fs/xfs/xfs_dir2_trace.c \| 216 ------ fs/xfs/xfs_dir2_trace.h \| 72 -- fs/xfs/xfs_filestream.c \| 8 fs/xfs/xfs_fsops.c \| 2 fs/xfs/xfs_iget.c \| 111 --- fs/xfs/xfs_inode.c \| 67 -- fs/xfs/xfs_inode.h \| 76 -- fs/xfs/xfs_inode_item.c \| 5 fs/xfs/xfs_iomap.c \| 85 -- fs/xfs/xfs_iomap.h \| 8 fs/xfs/xfs_log.c \| 181 +---- fs/xfs/xfs_log_priv.h \| 20 fs/xfs/xfs_log_recover.c \| 1 fs/xfs/xfs_mount.c \| 2 fs/xfs/xfs_quota.h \| 8 fs/xfs/xfs_rename.c \| 1 fs/xfs/xfs_rtalloc.c \| 1 fs/xfs/xfs_rw.c \| 3 fs/xfs/xfs_trans.h \| 47 + fs/xfs/xfs_trans_buf.c \| 62 - fs/xfs/xfs_vnodeops.c \| 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:16 -06:00
Christoph Hellwig	a8914f3a6d	xfs: fix spin_is_locked assert on uni-processor builds Without SMP or preemption spin_is_locked always returns false, so we can't do an assert with it. Instead use assert_spin_locked, which does the right thing on all builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Johannes Engel <jcnengel@googlemail.com> Tested-by: Johannes Engel <jcnengel@googlemail.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-12 01:08:27 -05:00
Dave Chinner	9d7fef74b2	xfs: inform the xfsaild of the push target before sleeping When trying to reserve log space, we find the amount of space we need, then go to sleep waiting for space. When we are woken, we try to push the tail of the log forward to make sure we have space available. Unfortunately, this means that if there is not space available, and everyone who needs space goes to sleep there is no-one left to push the tail of the log to make space available. Once we have a thread waiting for space to become available, the others queue up behind it in a FIFO, and none of them push the tail of the log. This can result in everyone going to sleep in xlog_grant_log_space() if the first sleeper races with the last I/O that moves the tail of the log forward. With no further I/O tomove the tail of the log, there is nothing to wake the sleepers and hence all transactions just stop. Fix this by making sure the xfsaild will create enough space for the transaction that is about to sleep by moving the push target far enough forwards to ensure that that the curent proceeees will have enough space available when it is woken. That is, we push the AIL before we go to sleep. Because we've inserted the log ticket into the queue before we've pushed and gone to sleep, subsequent transactions will wait behind this one. Hence we are guaranteed to have space available when we are woken. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-04-06 18:42:59 +02:00
Dave Chinner	a6cb767e24	xfs: validate log feature fields correctly If the large log sector size feature bit is set in the superblock by accident (say disk corruption), the then fields that are now considered valid are not checked on production kernels. The checks are present as ASSERT statements so cause a panic on a debug kernel. Change this so that the fields are validity checked if the feature bit is set and abort the log mount if the fields do not contain valid values. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-04-06 18:39:27 +02:00
Malcolm Parsons	9da096fd13	xfs: fix various typos Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-03-29 09:55:42 +02:00
Christoph Hellwig	21b699c895	xfs: cleanup log unmount handling Kill the current xfs_log_unmount wrapper and opencode the two function calls in the only caller. Rename the current xfs_log_unmount_dealloc to xfs_log_unmount as it undoes xfs_log_mount and the new name makes that more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-03-16 08:19:29 +01:00
Christoph Hellwig	264307520b	xfs: fix error handling in xfs_log_mount We can't just call xfs_log_unmount_dealloc on any failure because the ail thread which is torn down by xfs_log_unmount_dealloc might not be initialized yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reported-by: Lachlan McIlroy <lachlan@sgi.com>	2009-02-12 19:55:48 +01:00
Christoph Hellwig	7153f8ba2b	xfs: remove iclog calculation special cases Our default has been to always use 8 32KB log buffers for a while now, so remove the special casing for larger block size filesystem to use the same or even lower number of buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2009-02-09 08:36:46 +01:00
Christoph Hellwig	39e2defe73	reduce l_icloglock roundtrips All but one caller of xlog_state_want_sync drop and re-acquire l_icloglock around the call to it, just so that xlog_state_want_sync can acquire and drop it. Move all lock operation out of l_icloglock and assert that the lock is held when it is called. Note that it would make sense to extende this scheme to xlog_state_release_iclog, but the locking in there is more complicated and we'd like to keep the atomic_dec_and_lock optmization for those callers not having l_icloglock yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	b28708d6a0	[XFS] sanitize xlog_in_core_t definition Move all fields from xlog_iclog_fields_t into xlog_in_core_t instead of having them in a substructure and the using #defines to make it look like they were directly in xlog_in_core_t. Also document that xlog_in_core_2_t is grossly misnamed, and make all references to it typesafe. (First sent on Semptember 15th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:25 +11:00
Christoph Hellwig	bac8dca9f9	[XFS] fix NULL pointer dereference in xfs_log_force_umount xfs_log_force_umount may be called very early during log recovery where If we fail a buffer read in xlog_recover_do_inode_trans we abort the mount. But at that point log recovery has started delayed writeback of inode buffers. As part of the aborted mount we try to flush out all delwri buffers, but at that point we have already freed the superblock, and set mp->m_sb_bp to NULL, and xfs_log_force_umount which gets called after the inode buffer writeback trips over it. Make xfs_log_force_umount a little more careful when accessing mp->m_sb_bp to avoid this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:06:44 +11:00
Dave Chinner	cc09c0dc57	[XFS] Fix double free of log tickets When an I/O error occurs during an intermediate commit on a rolling transaction, xfs_trans_commit() will free the transaction structure and the related ticket. However, the duplicate transaction that gets used as the transaction continues still contains a pointer to the ticket. Hence when the duplicate transaction is cancelled and freed, we free the ticket a second time. Add reference counting to the ticket so that we hold an extra reference to the ticket over the transaction commit. We drop the extra reference once we have checked that the transaction commit did not return an error, thus avoiding a double free on commit error. Credit to Nick Piggin for tripping over the problem. SGI-PV: 989741 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-17 17:37:10 +11:00
Dave Chinner	644c3567d1	[XFS] handle memory allocation failures during log initialisation When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-11-10 16:50:24 +11:00
David Chinner	783a2f656f	[XFS] Finish removing the mount pointer from the AIL API Change all the remaining AIL API functions that are passed struct xfs_mount pointers to pass pointers directly to the struct xfs_ail being used. With this conversion, all external access to the AIL is via the struct xfs_ail. Hence the operation and referencing of the AIL is almost entirely independent of the xfs_mount that is using it - it is now much more tightly tied to the log and the items it is tracking in the log than it is tied to the xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32353a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:58 +11:00
David Chinner	a9c21c1b9d	[XFS] Given the log a pointer to the AIL When we need to go from the log to the AIL, we have to go via the xfs_mount. Add a xfs_ail pointer to the log so we can go directly to the AIL associated with the log. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32351a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:35 +11:00
David Chinner	c7e8f26827	[XFS] Move the AIL lock into the struct xfs_ail Bring the ail lock inside the struct xfs_ail. This means the AIL can be entirely manipulated via the struct xfs_ail rather than needing both the struct xfs_mount and the struct xfs_ail. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32350a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:23 +11:00
David Chinner	5b00f14fbd	[XFS] move the AIl traversal over to a consistent interface With the new cursor interface, it makes sense to make all the traversing code use the cursor interface and make the old one go away. This means more of the AIL interfacing is done by passing struct xfs_ail pointers around the place instead of struct xfs_mount pointers. We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered() as it is only checking if the AIL is empty. We can do that with a call to xfs_trans_ail_tail() instead, where a zero LSN returned indicates and empty AIL... SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32348a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:39:00 +11:00
David Chinner	27d8d5fe0e	[XFS] Use a cursor for AIL traversal. To replace the current generation number ensuring sanity of the AIL traversal, replace it with an external cursor that is linked to the AIL. Basically, we store the next item in the cursor whenever we want to drop the AIL lock to do something to the current item. When we regain the lock. the current item may already be free, so we can't reference it, but the next item in the traversal is already held in the cursor. When we move or delete an object, we search all the active cursors and if there is an item match we clear the cursor(s) that point to the object. This forces the traversal to restart transparently. We don't invalidate the cursor on insert because the cursor still points to a valid item. If the intem is inserted between the current item and the cursor it does not matter; the traversal is considered to be past the insertion point so it will be picked up in the next traversal. Hence traversal restarts pretty much disappear altogether with this method of traversal, which should substantially reduce the overhead of pushing on a busy AIL. Version 2 o add restart logic o comment cursor interface o minor cleanups SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32347a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:38:39 +11:00
Christoph Hellwig	73f6aa4d44	Fix barrier fail detection in XFS Currently we disable barriers as soon as we get a buffer in xlog_iodone that has the XBF_ORDERED flag cleared. But this can be the case not only for buffers where the barrier failed, but also the first buffer of a split log write in case of a log wraparound. Due to the disabled barriers we can easily get directory corruption on unclean shutdowns. So instead of using this check add a new buffer flag for failed barrier writes. This is a regression vs 2.6.26 caused by patch to use the right macro to check for the ORDERED flag, as we previously got true returned for every buffer. Thanks to Toei Rei for reporting the bug. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-10 11:08:07 -07:00
David Chinner	b5b8c9acd5	[XFS] Fix barrier status change detection. The current code in xlog_iodone() uses the wrong macro to check if the barrier has been cleared due to an EOPNOTSUPP error form the lower layer. SGI-PV: 986143 SGI-Modid: xfs-linux-melb:xfs-kern:31984a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-09-17 16:50:50 +10:00
Lachlan McIlroy	31bd61f2bb	[XFS] Move memory allocations for log tracing out of the critical path Memory allocations for log->l_grant_trace and iclog->ic_trace are done on demand when the first event is logged. In xlog_state_get_iclog_space() we call xlog_trace_iclog() under a spinlock and allocating memory here can cause us to sleep with a spinlock held and deadlock the system. For the log grant tracing we use KM_NOSLEEP but that means we can lose trace entries. Since there is no locking to serialize the log grant tracing we could race and have multiple allocations and leak memory. So move the allocations to where we initialize the log/iclog structures. Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace since it's not even used. SGI-PV: 983738 SGI-Modid: xfs-linux-melb:xfs-kern:31896a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-09-17 16:45:37 +10:00
Lachlan McIlroy	c6a7b0f8a4	[XFS] Fix use after free in xfs_log_done(). The ticket allocation code got reworked in 2.6.26 and we now free tickets whereas before we used to cache them so the use-after-free went undetected. SGI-PV: 985525 SGI-Modid: xfs-linux-melb:xfs-kern:31877a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-08-13 16:52:50 +10:00
Lachlan McIlroy	5695ef46ef	[XFS] Use KM_NOFS for debug trace buffers Use KM_NOFS to prevent recursion back into the filesystem which can cause deadlocks. In the case of xfs_iread() we hold the lock on the inode cluster buffer while allocating memory for the trace buffers. If we recurse back into XFS to flush data that may require a transaction to allocate extents which needs log space. This can deadlock with the xfsaild thread which can't push the tail of the log because it is trying to get the inode cluster buffer lock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31838a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-08-13 16:51:57 +10:00
Christoph Hellwig	4249023a5d	[XFS] cleanup xfs_mountfs Remove all the useless flags and code keyed off it in xfs_mountfs. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31831a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:49:32 +10:00
David Chinner	12017faf38	[XFS] clean up stale references to semaphores A lot of code has been converted away from semaphores, but there are still comments that reference semaphore behaviour. The log code is the worst offender. Update the comments to reflect what the code really does now. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31814a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:34:31 +10:00
Matthew Wilcox	d748c62367	[XFS] Convert l_flushsema to a sv_t The l_flushsema doesn't exactly have completion semantics, nor mutex semantics. It's used as a list of tasks which are waiting to be notified that a flush has completed. It was also being used in a way that was potentially racy, depending on the semaphore implementation. By using a sv_t instead of a semaphore we avoid the need for a separate counter, since we know we just need to wake everything on the queue. Original waitqueue implementation from Matthew Wilcox. Cleanup and conversion to sv_t by Christoph Hellwig. SGI-PV: 981507 SGI-Modid: xfs-linux-melb:xfs-kern:31059a Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:58:12 +10:00
Michael Nishimoto	d729eae893	[XFS] Ensure that 2 GiB xfs logs work properly. We found this while experimenting with 2GiB xfs logs. The previous code never assumed that xfs logs would ever get so large. SGI-PV: 981502 SGI-Modid: xfs-linux-melb:xfs-kern:31058a Signed-off-by: Michael Nishimoto <miken@agami.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:58:11 +10:00
Denys Vlasenko	f0e2d93c29	[XFS] Remove unused arg from kmem_free() kmem_free() function takes (ptr, size) arguments but doesn't actually use second one. This patch removes size argument from all callsites. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31050a Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:58:07 +10:00
Dave Chinner	49641f1acf	Fix reference counting race on log buffers When we release the iclog, we do an atomic_dec_and_lock to determine if we are the last reference and need to trigger update of log headers and writeout. However, in xlog_state_get_iclog_space() we also need to check if we have the last reference count there. If we do, we release the log buffer, otherwise we decrement the reference count. But the compare and decrement in xlog_state_get_iclog_space() is not atomic, so both places can see a reference count of 2 and neither will release the iclog. That leads to a filesystem hang. Close the race by replacing the atomic_read() and atomic_dec() pair with atomic_add_unless() to ensure that they are executed atomically. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Tim Shimmin <tes@sgi.com> Tested-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-11 11:37:18 -07:00
David Chinner	1bb7d6b5a8	[XFS] Catch log unmount failures. Unmounting the log can fail. unlikely, but it can. Catch all the error conditions an make sure it's propagated upwards. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30833a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:30 +10:00
David Chinner	b911ca0472	[XFS] Sanitise xfs_log_force error checking. xfs_log_force() is declared to return an error, but we almost never check it. We don't need to check it in most cases; if there's a log I/O error then we'll be shutting down the filesystem anyway and that means we'll catch the error somewhere else. However, on certain calls we should be returning an error - sync transactions, fsync, sync writes, etc. so this isn't a pure black and white distinction. Hence make xfs_log_force() a void function that issues a warning to the syslog on error, and call _xfs_log_force() in all the places where we actually care about the error status returned. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30832a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:20 +10:00
Harvey Harrison	34a622b2e1	[XFS] replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30775a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:26 +10:00
David Chinner	6b1d1a732f	[XFS] Fix lock inversion in forced shutdown. Recent changes to xlog_state_release_iclog() placed the grant_lock inside the icloglock. forced unmount of the log does this the opposite way around, but does not depend on the order for correct working. Fix the inversion by changing the order locks are gained in xfs_log_force_umount(). SGI-PV: 979661 SGI-Modid: xfs-linux-melb:xfs-kern:30773a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:04 +10:00
David Chinner	4679b2d36d	[XFS] Reorganise xlog_t for better cacheline isolation of contention To reduce contention on the log in large CPU count, separate out different parts of the xlog_t structure onto different cachelines. Move each lock onto a different cacheline along with all the members that are accessed/modified while that lock is held. Also, move the debugging code into debug code. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30772a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:53 +10:00
David Chinner	eb01c9cd87	[XFS] Remove the xlog_ticket allocator The ticket allocator is just a simple slab implementation internal to the log. It requires the icloglock to be held when manipulating it and this contributes to contention on that lock. Just kill the entire allocator and use a memory zone instead. While there, allow us to gracefully fail allocation with ENOMEM. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30771a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:39 +10:00
David Chinner	114d23aae5	[XFS] Per iclog callback chain lock Rather than use the icloglock for protecting the iclog completion callback chain, use a new per-iclog lock so that walking the callback chain doesn't require holding a global lock. This reduces contention on the icloglock during transaction commit and log I/O completion by reducing the number of times we need to hold the global icloglock during these operations. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30770a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:22 +10:00
David Chinner	155cc6b784	[XFS] Use atomics for iclog reference counting Now that we update the log tail LSN less frequently on transaction completion, we pass the contention straight to the global log state lock (l_iclog_lock) during transaction completion. We currently have to take this lock to decrement the iclog reference count. there is a reference count on each iclog, so we need to take �he global lock for all refcount changes. When large numbers of processes are all doing small trnasctions, the iclog reference counts will be quite high, and the state change that absolutely requires the l_iclog_lock is the except rather than the norm. Change the reference counting on the iclogs to use atomic_inc/dec so that we can use atomic_dec_and_lock during transaction completion and avoid the need for grabbing the l_iclog_lock for every reference count decrement except the one that matters - the last. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30505a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:38:10 +10:00
David Chinner	b589334c7a	[XFS] Prevent AIL lock contention during transaction completion When hundreds of processors attempt to commit transactions at the same time, they can contend on the AIL lock when updating the tail LSN held in the in-core log structure. At the moment, the tail LSN is only needed when actually writing out an iclog, so it really does not need to be updated on every single transaction completion - only those that result in switching iclogs and flushing them to disk. The result is that we reduce the number of times we need to grab the AIL lock and the log grant lock by up to two orders of magnitude on large processor count machines. The problem has previously been hidden by AIL lock contention walking the AIL list which was recently solved and uncovered this issue. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30504a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:38:01 +10:00
Eric Sandeen	6211870992	[XFS] remove shouting-indirection macros from xfs_sb.h Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-10 16:24:45 +10:00
Marcin Slusarz	413d57c990	xfs: convert beX_add to beX_add_cpu (new common API) remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 16:21:19 -08:00
David Chinner	249a8c1124	[XFS] Move AIL pushing into it's own thread When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster that pretty much any disk subsystem can handle. So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log. This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order. Also, by moving the pushing to a separate thread we can do more effectively overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of items we try to lock and/or push items that we cannot move. Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addresses separately; parallel access would cause problems to any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30371a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:51 +11:00
Tim Shimmin	e6a4b37f38	[XFS] Remove the BPCSHIFT and NB* based macros from XFS. The BPCSHIFT based macros, btoc, ctob, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. Initial patch and motivation from Nicolas Kaiser. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30096a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:17:58 +11:00
Niv Sardi	f7b7c3673e	[XFS] Remove bogus assert This assert is bogus. We can have a forced shutdown occur between the check for the XLOG_FORCED_SHUTDOWN and the ASSERT. Also, the logging system shouldn't care about the state of XFS_FORCED_SHUTDOWN, it should only check XLOG_FORCED_SHUTDOWN. The logging system has it's own forced shutdown flag so, for the case of a forced shutdown that's not due to a logging error, we can flush the log. SGI-PV: 972985 SGI-Modid: xfs-linux-melb:xfs-kern:30029a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:17:39 +11:00
David Chinner	a8272ce0c1	[XFS] Fix up sparse warnings. These are mostly locking annotations, marking things static, casts where needed and declaring stuff in header files. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30002a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:14:38 +11:00
Christoph Hellwig	b53e675dc8	[XFS] xlog_rec_header/xlog_rec_ext_header endianess annotations Mostly trivial conversion with one exceptions: h_num_logops was kept in native endian previously and only converted to big endian in xlog_sync, but we always keep it big endian now. With todays cpus fast byteswap instructions that's not an issue but the new variant keeps the code clean and maintainable. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29821a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:47 +11:00
Christoph Hellwig	67fcb7bfb6	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29820a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:38 +11:00
Christoph Hellwig	03bea6fe6c	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29819a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:10:31 +11:00
Eric Sandeen	007c61c686	[XFS] Remove spin.h remove spinlock init abstraction macro in spin.h, remove the callers, and remove the file. Move no-op spinlock_destroy to xfs_linux.h Cleanup spinlock locals in xfs_mount.c SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29751a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:45 +11:00
Eric Sandeen	c8b5ea289f	[XFS] Unwrap GRANT_LOCK. Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29741a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:41 +11:00
Eric Sandeen	b22cd72c95	[XFS] Unwrap LOG_LOCK. Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29740a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:32 +11:00
Christoph Hellwig	0adba5363c	[XFS] replace some large xfs_log_priv.h macros by proper functions ... or in the case of XLOG_TIC_ADD_OPHDR remove a useless macro entirely. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29511a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 12:17:56 +10:00
Christoph Hellwig	bd186aa901	[XFS] kill the vfs_flags member in struct bhv_vfs All flags are added to xfs_mount's m_flag instead. Note that the 32bit inode flag was duplicated in both of them, but only cleared in the mount when it was not nessecary due to the filesystem beeing small enough. Two flags are still required here - one to indicate the mount option setting, and one to indicate if it applies or not. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29507a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:45:57 +10:00
Eric Sandeen	1cb5125875	[XFS] choose single default logbuf count & size Remove sizing of logbuf size & count based on physical memory; this was never a very good gauge as it's looking at global memory, but deciding on sizing per-filesystem; no account is made of the total number of filesystems, for example. For now just take the largest "default" case, as was set for machines with >400MB - 8 x 32k buffers. This can always be tuned higher or lower with mount options if necessary. Removes one more user of xfs_physmem. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29323a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:38:23 +10:00
David Chinner	0bfefc46dc	[XFS] Barriers need to be dynamically checked and switched off If the underlying block device suddenly stops supporting barriers, we need to handle the -EOPNOTSUPP error in a sane manner rather than shutting down the filesystem. If we get this error, clear the barrier flag, reissue the I/O, and tell the world bad things are occurring. SGI-PV: 964544 SGI-Modid: xfs-linux-melb:xfs-kern:28568a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:23:45 +10:00
Christoph Hellwig	4b80916b29	[XFS] Fix sparse NULL vs 0 warnings Sparse now warns about comparing pointers to 0, so change all instance where that happens to NULL instead. SGI-PV: 968555 SGI-Modid: xfs-linux-melb:xfs-kern:29308a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-09-05 14:47:33 +10:00
David Chinner	92821e2ba4	[XFS] Lazy Superblock Counters When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? the answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means the when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an on-disk format change. As a result of this, lazy superblock counters are a mkfs option and at the moment on linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:28:50 +10:00
David Chinner	511105b3d7	[XFS] Fix vmalloc leak on mount/unmount. When setting the length of the iclogbuf to write out we should just be changing the desired byte count rather completely reassociating the buffer memory with the buffer. Reassociating the buffer memory changes the apparent length of the buffer and hence when we free the buffer, we don't free all the vmap()d space we originally allocated. SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28640a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:23:23 +10:00
David Chinner	3db296f341	[XFS] Fix use-after-free during log unmount. Don't reference the log buffer after running the callbacks as the callback can trigger the log buffers to be freed during unmount. SGI-PV: 964545 SGI-Modid: xfs-linux-melb:xfs-kern:28567a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:22:34 +10:00
Christoph Hellwig	1fa40b01ae	[XFS] Only use refcounted pages for I/O Many block drivers (aoe, iscsi) really want refcountable pages in bios, which is what almost everyone send down. XFS unfortunately has a few places where it sends down buffers that may come from kmalloc, which breaks them. Fix the places that use kmalloc()d buffers. SGI-PV: 964546 SGI-Modid: xfs-linux-melb:xfs-kern:28562a Signed-Off-By: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:21:14 +10:00
Tim Shimmin	955e47ad28	[XFS] Fixes the leak in reservation space because we weren't ungranting space for the unmount record - which becomes a problem in the freeze/thaw scenario. SGI-PV: 942533 SGI-Modid: xfs-linux-melb:xfs-kern:26815a Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:16 +10:00
Nathan Scott	efb8ad7e94	[XFS] Add a debug flag for allocations which are known to be larger than one page. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26800a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:05 +10:00
Nathan Scott	a3c6685eaa	[XFS] Ensure xlog_state_do_callback does not report spurious warnings on ramdisks. SGI-PV: 954802 SGI-Modid: xfs-linux-melb:xfs-kern:26627a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:14 +10:00
Nathan Scott	f5faad7994	[XFS] Fix remount vs no/barrier options by ensuring we clear unwanted flags from iclog buffers before submitting them for writing. SGI-PV: 954772 SGI-Modid: xfs-linux-melb:xfs-kern:26605a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-07-28 17:04:44 +10:00
Nathan Scott	5493a0fcba	[XFS] Fixup whitespace damage in log_write, remove final warning. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26366a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-28 11:17:28 +10:00
Nathan Scott	f6c2d1fa63	[XFS] Remove version 1 directory code. Never functioned on Linux, just pure bloat. SGI-PV: 952969 SGI-Modid: xfs-linux-melb:xfs-kern:26251a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-20 13:04:51 +10:00
Nathan Scott	34327e1384	[XFS] Cleanup a missed porting conversion, and freezing. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26109a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 17:11:55 +10:00
Nathan Scott	b83bd13881	[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters. SGI-PV: 9533338 SGI-Modid: xfs-linux-melb:xfs-kern:26106a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 16:48:30 +10:00
Nathan Scott	7d04a335b6	[XFS] Shutdown the filesystem if all device paths have gone. Made shutdown vop flags consistent with sync vop flags declarations too. SGI-PV: 939911 SGI-Modid: xfs-linux-melb:xfs-kern:26096a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 14:58:38 +10:00
Nathan Scott	c41564b5af	[XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all these typos. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:25539a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-29 08:55:14 +10:00
Jesper Juhl	014c2544e6	return statement cleanup - kill pointless parentheses This patch removes pointless parentheses from return statements. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-01-15 02:37:08 +01:00
Tim Shimmin	1259845d3f	[XFS] remove XFS_LOG_RES_DEBUG and turn on the res history all the time to get more useful error info on space for trans items SGI-PV: 947110 SGI-Modid: xfs-linux-melb:xfs-kern:24886a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-01-11 21:02:47 +11:00
Christoph Hellwig	dd954c69d1	[XFS] turn xlog helper macros into real functions SGI-PV: 946205 SGI-Modid: xfs-linux-melb:xfs-kern:203360a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-01-11 15:34:50 +11:00
Eric Sandeen	65be605419	[XFS] remove unused "readonly" arg from xlog_find_tail and xlog_recover SGI-PV: 946611 SGI-Modid: xfs-linux-melb:xfs-kern:203307a Signed-off-by: Eric Sandeen <sandeen@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-01-11 15:34:19 +11:00
Nathan Scott	cfcbbbd089	[XFS] Remove old, broken nolog-mode code - noone plans to ever fix it. SGI-PV: 944821 SGI-Modid: xfs-linux:xfs-kern:24213a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 15:12:04 +11:00
Nathan Scott	7b71876980	[XFS] Update license/copyright notices to match the prefered SGI boilerplate. SGI-PV: 913862 SGI-Modid: xfs-linux:xfs-kern:23903a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 14:58:39 +11:00
Nathan Scott	a844f4510d	[XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot. SGI-PV: 943122 SGI-Modid: xfs-linux:xfs-kern:23901a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 14:38:42 +11:00
Nathan Scott	fc1f8c1ca3	[XFS] Track external log/realtime device names for correct reporting in /proc/mounts. SGI-PV: 942984 SGI-Modid: xfs-linux:xfs-kern:23862a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 11:44:33 +11:00
Christoph Hellwig	f538d4da8d	[XFS] write barrier support Issue all log sync operations as ordered writes. In addition flush the disk cache on fsync if the sync cached operation didn't sync the log to disk (this requires some additional bookeping in the transaction and log code). If the device doesn't claim to support barriers, the filesystem has an extern log volume or the trial superblock write with barriers enabled failed we disable barriers and print a warning. We should probably fail the mount completely, but that could lead to nasty boot failures for the root filesystem. Not enabled by default yet, needs more destructive testing first. SGI-PV: 912426 SGI-Modid: xfs-linux:xfs-kern:198723a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 10:26:59 +11:00
Christoph Hellwig	da1650a5d6	[XFS] Add format checking to cmn_err and icmn_err SGI-PV: 942243 SGI-Modid: xfs-linux:xfs-kern:198658a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 10:21:35 +11:00
Tim Shimmin	7e9c639615	[XFS] 929956 add log debugging and tracing info SGI-PV: 931456 SGI-Modid: xfs-linux:xfs-kern:23155a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-09-02 16:42:05 +10:00
Tim Shimmin	32fb9b57ae	[XFS] Fix up the calculation of the reservation overhead to hopefully include all the components which make up the transaction in the ondisk log. Having this incomplete has shown up as problems on IRIX when some v2 log changes went in. The symptom was the msg of "xfs_log_write: reservation ran out. Need to up reservation" and was seen on synchronous writes on files with lots of holes (and therefore lots of extents). SGI-PV: 931457 SGI-Modid: xfs-linux:xfs-kern:23095a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-09-02 16:41:43 +10:00
Christoph Hellwig	ba0f32d460	[XFS] mark various symbols static Patch from Adrian Bunk SGI-PV: 936255 SGI-Modid: xfs-linux:xfs-kern:192760a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-06-21 15:36:52 +10:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

1 2 3

143 Commits