linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 03:33:59 +08:00

Author	SHA1	Message	Date
Lukas Czerner	28739eea9c	ext4: protect bb_first_free in ext4_trim_all_free() with group lock We should protect reading bd_info->bb_first_free with the group lock because otherwise we might miss some free blocks. This is not a big deal at all, but the change to do right thing is really simple, so lets do that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 18:28:07 -04:00
Lukas Czerner	7894408666	ext4: only load buddy bitmap in ext4_trim_fs() when it is needed Currently we are loading buddy ext4_mb_load_buddy() for every block group we are going through in ext4_trim_fs() in many cases just to find out that there is not enough space to be bothered with. As Amir Goldstein suggested we can use bb_free information directly from ext4_group_info. This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and rather get the ext4_group_info via ext4_get_group_info() and use the bb_free information directly from that. This avoids unnecessary call to load buddy in the case the group does not have enough free space to trim. Loading buddy is now moved to ext4_trim_all_free(). Tested by me with xfstests 251. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 18:16:27 -04:00
Linus Torvalds	dc522adbee	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: Fix comment to match the code in journal_start() jbd/jbd2: remove obsolete summarise_journal_usage. jbd: Fix forever sleeping process in do_get_write_access() ext2: fix error msg when mounting fs with too-large blocksize jbd: fix fsync() tid wraparound bug ext3: Fix fs corruption when make_indexed_dir() fails ext3: Fix lock inversion in ext3_symlink()	2011-05-24 15:11:46 -07:00
Linus Torvalds	df3256f9ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: make plock operation killable dlm: remove shared message stub for recovery dlm: delayed reply message warning dlm: Remove superfluous call to recalc_sigpending()	2011-05-24 15:04:00 -07:00
Eryu Guan	c867516de5	jbd2: Fix comment to match the code in jbd2__journal_start() jbd2__journal_start() returns an ERR_PTR() value rather than NULL on failure. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 17:09:58 -04:00
John W. Linville	31ec97d9ce	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2011-05-24 16:47:54 -04:00
Linus Torvalds	b0ca118dba	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (43 commits) TOMOYO: Fix wrong domainname validation. SELINUX: add /sys/fs/selinux mount point to put selinuxfs CRED: Fix load_flat_shared_library() to initialise bprm correctly SELinux: introduce path_has_perm flex_array: allow 0 length elements flex_arrays: allow zero length flex arrays flex_array: flex_array_prealloc takes a number of elements, not an end SELinux: pass last path component in may_create SELinux: put name based create rules in a hashtable SELinux: generic hashtab entry counter SELinux: calculate and print hashtab stats with a generic function SELinux: skip filename trans rules if ttype does not match parent dir SELinux: rename filename_compute_type argument to type instead of con SELinux: fix comment to state filename_compute_type takes an objname not a qstr SMACK: smack_file_lock can use the struct path LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH LSM: split LSM_AUDIT_DATA_FS into _PATH and _INODE SELINUX: Make selinux cache VFS RCU walks safe SECURITY: Move exec_permission RCU checks into security modules SELinux: security_read_policy should take a size_t not ssize_t ...	2011-05-24 13:38:19 -07:00
Sage Weil	db3540522e	ceph: fix cap flush race reentrancy In `e9964c10` we change cap flushing to do a delicate dance because some inodes on the cap_dirty list could be in a migrating state (got EXPORT but not IMPORT) in which we couldn't actually flush and move from dirty->flushing, breaking the while (!empty) { process first } loop structure. It worked for a single sync thread, but was not reentrant and triggered infinite loops when multiple syncers came along. Instead, move inodes with dirty to a separate cap_dirty_migrating list when in the limbo export-but-no-import state, allowing us to go back to the simple loop structure (which was reentrant). This is cleaner and more robust. Audited the cap_dirty users and this looks fine: list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we have dirty caps (which list we're on is irrelevant) and list_del_init() calls still do the right thing. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:12 -07:00
Sage Weil	45e3d3eeb6	ceph: avoid inode lookup on nfs fh reconnect If we get the inode from the MDS, we have a reference in req; don't do a fresh lookup. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:06 -07:00
Sage Weil	3c454cf216	ceph: use LOOKUPINO to make unconnected nfs fh more reliable If we are unable to locate an inode by ino, ask the MDS using the new LOOKUPINO command. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:05 -07:00
Linus Torvalds	eb08d8ff47	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: (52 commits) UBIFS: switch to dynamic printks UBIFS: fix kernel-doc comments UBIFS: fix extremely rare mount failure UBIFS: simplify LEB recovery function further UBIFS: always cleanup the recovered LEB UBIFS: clean up LEB recovery function UBIFS: fix-up free space on mount if flag is set UBIFS: add the fixup function UBIFS: add a superblock flag for free space fix-up UBIFS: share the next_log_lnum helper UBIFS: expect corruption only in last journal head LEBs UBIFS: synchronize write-buffer before switching to the next bud UBIFS: remove BUG statement UBIFS: change bud replay function conventions UBIFS: substitute the replay tree with a replay list UBIFS: simplify replay UBIFS: store free and dirty space in the bud replay entry UBIFS: remove unnecessary stack variable UBIFS: double check that buds are replied in order UBIFS: make 2 functions static ...	2011-05-24 11:51:07 -07:00
Christoph Hellwig	55a7bc5a30	xfs: do not discard alloc btree blocks Blocks for the allocation btree are allocated from and released to the AGFL, and thus frequently reused. Even worse, we do not have an easy way to avoid using an AGFL block when it is discarded due to the simple FILO list of free blocks, and thus can frequently stall on blocks that are currently undergoing a discard. Add a flag to the busy extent tracking structure to skip the discard for allocation btree blocks. In normal operation these blocks are reused frequently enough that there is no need to discard them anyway, but if they spill over to the allocation btree as part of a balance we "leak" blocks that we would otherwise discard. We could fix this by adding another flag and keeping these blocks in the rbtree even after they aren't busy any more so that we could discard them when they migrate out of the AGFL. Given that this would cause significant overhead I don't think it's worthwhile for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-24 11:17:22 -05:00
Christoph Hellwig	e84661aa84	xfs: add online discard support Now that we have reliable tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two-stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-24 11:17:13 -05:00
Jan Kara	93628ffb9b	ext4: fix waiting and sending of a barrier in ext4_sync_file() jbd2_log_start_commit() returns 1 only when we really start a transaction. But we also need to wait for a transaction when the commit is already running. Fix this problem by waiting for transaction commit unconditionally (which is just a quick check if the transaction is already committed). Also we have to be more careful with sending of a barrier because when a transaction is being committed in parallel to ext4_sync_file() running, we cannot be sure that the barrier the journalling code sends happens after we wrote all the data for fsync (note that not every data writeout needs to trigger metadata changes, thus commit of some metadata changes can be running while other data is still written out). So use the jbd2_trans_will_send_data_barrier() helper to detect the common cases when we can be sure the barrier will be issued by the commit code and issue the barrier ourselves in the remaining cases. Reported-by: Edward Goggin <egoggin@vmware.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 12:00:54 -04:00
Jan Kara	bbd2be3691	jbd2: Add function jbd2_trans_will_send_data_barrier() Provide a function which returns whether a transaction with given tid will send a flush to the filesystem device. The function will be used by ext4 to detect whether fsync needs to send a separate flush or not. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:59:18 -04:00
Jan Kara	81be12c817	jbd2: fix sending of data flush on journal commit In data=ordered mode, it's theoretically possible (however rare) that an inode is filed to transaction's t_inode_list and a flusher thread writes all the data and inode is reclaimed before the transaction starts to commit. In such a case, we could erroneously omit sending a flush to file system device when it is different from the journal device (because data can still be in disk cache only). Fix the problem by setting a flag in a transaction when some inode is added to it and then send disk flush in the commit code when the flag is set. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:52:40 -04:00
Yongqiang Yang	b221349fa8	ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly To get delayed-extent information, ext4_ext_fiemap_cb() looks up the pagecache and thus collects information starting from a page's head block. If blocksize < pagesize, the beginning blocks of a page may lie before the request range. So ext4_ext_fiemap_cb() should proceed ignoring them, because they have been handled before. If no mapped buffer in the range is found in the 1st page, we need to look up the 2nd page, otherwise delayed-extents after a hole will be ignored. Without this patch, xfstests 225 will hang on ext4 with 1K block. Reported-by: Amir Goldstein <amir73il@users.sourceforge.net> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:36:58 -04:00
James Morris	434d42cfd0	Merge branch 'next' into for-linus	2011-05-24 22:55:24 +10:00
Robin Dong	98ba073c60	ocfs2: change incorrect 'extern' keyword to 'static' in dlmfs Change function param_set_dlmfs_capabilities from 'extern' to 'static' since function param_get_dlmfs_capabilities is also 'static'. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:59:40 -07:00
Sunil Mushran	9f62e96084	ocfs2/dlm: dlm_is_lockres_migrateable() returns boolean Patch cleans up the gunk added by commit `388c4bcb4e`. dlm_is_lockres_migrateable() now returns 1 if lockresource is deemed migrateable and 0 if not. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:39 -07:00
Tao Ma	10fca35ff1	ocfs2: Add trace event for trim. Add the corresponding trace event for trim. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:20 -07:00
Tao Ma	55e67872b6	ocfs2: Add FITRIM ioctl. Add the corresponding ioctl function for FITRIM. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:19 -07:00
Tao Ma	e80de36d8d	ocfs2: Add ocfs2_trim_fs for SSD trim support. Add ocfs2_trim_fs to support trimming freed clusters in the volume. A range will be given and all the freed clusters greater than minlen will be discarded to the block layer. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:18 -07:00
Amerigo Wang	69a60c4d17	ocfs2: remove the /sys/o2cb symlink It is obsoleted since Dec 2005. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:14 -07:00
Tiger Yang	e2b0c215c2	ocfs2: clean up mount option about atime in ocfs2.txt As ocfs2 supports relatime and strictatime, we need to update the relevant documentation. Atime_quantum needs to work with strictatime, so only show it in procfs when mounted with strictatime. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:12 -07:00
Linus Torvalds	343800e7d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Fix statfs->f_namelen fat: Replace all printk with fat_msg() fat: Add fat_msg() function for preformated FAT messages fat: Convert fat_fs_error to use %pV fat: Fix possible null deref in fat_cache_add() fat: use new setup() for ->dir_ops too	2011-05-23 21:11:38 -07:00
Jeff Layton	3c1105df69	cifs: don't call mid_q_entry->callback under the Global_MidLock (try #5 ) Minor revision to the last version of this patch -- the only difference is the fix to the cFYI statement in cifs_reconnect. Holding the spinlock while we call this function means that it can't sleep, which really limits what it can do. Taking it out from under the spinlock also means less contention for this global lock. Change the semantics such that the Global_MidLock is not held when the callback is called. To do this requires that we take extra care not to have sync_mid_result remove the mid from the list when the mid is in a state where that has already happened. This prevents list corruption when the mid is sitting on a private list for reconnect or when cifsd is coming down. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-24 03:11:33 +00:00
Pavel Shilovsky	724d9f1cfb	CIFS: Simplify mount code for further shared sb capability Reorganize the code to get the mount options first and then get a superblock. This lets us use the shared superblock model further for equal mounts. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-24 03:07:42 +00:00
Eryu Guan	c2b67735e5	jbd: Fix comment to match the code in journal_start() journal_start returns an ERR_PTR() value rather than NULL on failure. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-24 00:27:53 +02:00
Linus Torvalds	a77febbef1	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: obey minleft values during extent allocation correctly xfs: reset buffer pointers before freeing them xfs: avoid getting stuck during async inode flushes xfs: fix xfs_itruncate_start tracing xfs: fix duplicate workqueue initialisation xfs: kill off xfs_printk() xfs: fix race condition in AIL push trigger xfs: make AIL target updates and compares 32bit safe. xfs: always push the AIL to the target xfs: exit AIL push work correctly when AIL is empty xfs: ensure reclaim cursor is reset correctly at end of AG xfs: add an x86 compat handler for XFS_IOC_ZERO_RANGE xfs: fix compiler warning in xfs_trace.h xfs: cleanup duplicate initializations xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy xfs: exact busy extent tracking xfs: do not immediately reuse busy extent ranges xfs: optimize AGFL refills	2011-05-23 15:19:16 -07:00
Linus Torvalds	99dff58562	Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (48 commits) serial: 8250_pci: add support for Cronyx Omega PCI multiserial board. tty/serial: Fix break handling for PORT_TEGRA tty/serial: Add explicit PORT_TEGRA type n_tracerouter and n_tracesink ldisc additions. Intel PTI implementaiton of MIPI 1149.7. Kernel documentation for the PTI feature. export kernel call get_task_comm(). tty: Remove to support serial for S5P6442 pch_phub: Support new device ML7223 8250_pci: Add support for the Digi/IBM PCIe 2-port Adapter ASoC: Update cx20442 for TTY API change pch_uart: Support new device ML7223 IOH parport: Use request_muxed_region for IT87 probe and lock tty/serial: add support for Xilinx PS UART n_gsm: Use print_hex_dump_bytes drivers/tty/moxa.c: Put correct tty value TTY: tty_io, annotate locking functions TTY: serial_core, remove superfluous set_task_state TTY: serial_core, remove invalid test Char: moxa, fix locking in moxa_write ... Fix up trivial conflicts in drivers/bluetooth/hci_ldisc.c and drivers/tty/serial/Makefile. I did the hci_ldisc thing as an evil merge, cleaning things up.	2011-05-23 12:23:20 -07:00
Theodore Ts'o	072bd7ea74	ext4: use truncate_setsize() unconditionally In commit `c8d46e41` (ext4: Add flag to files with blocks intentionally past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() before calling vmtruncate(). This caused any allocated but unwritten blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE flag to be dropped. This was done to make sure that EOFBLOCKS_FL would not be cleared while still leaving blocks past i_size allocated. This was not necessary, since ext4_truncate() guarantees that blocks past i_size will be dropped, even in the case where truncate() has increased i_size before calling ext4_truncate(). So fix this by removing the EOFBLOCKS_FL special case treatment in ext4_setattr(). In addition, use truncate_setsize() followed by a call to ext4_truncate() instead of using vmtruncate(). This is more efficient since it skips the call to inode_newsize_ok(), which has been checked already by inode_change_ok(). This is also a win in the case where EOFBLOCKS_FL is set since it avoids calling ext4_truncate() twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-23 15:13:02 -04:00
Pavel Shilovsky	37bb04e5a0	CIFS: Simplify connection structure search calls Use separate functions for comparison between existing structure and what we are requesting for to make server, session and tcon search code easier to use on next superblock match call. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-23 19:05:09 +00:00
Chris Mason	d6c0cb379c	Merge branch 'cleanups_and_fixes' into inode_numbers Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 14:37:47 -04:00
Linus Torvalds	30cb6d5f2e	Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: Reorder clock bases hrtimers: Avoid touching inactive timer bases hrtimers: Make struct hrtimer_cpu_base layout less stupid timerfd: Manage cancelable timers in timerfd clockevents: Move C3 stop test outside lock alarmtimer: Drop device refcount after rtc_open() alarmtimer: Check return value of class_find_device() timerfd: Allow timers to be cancelled when clock was set hrtimers: Prepare for cancel on clock was set timers	2011-05-23 11:30:28 -07:00
Christoph Hellwig	c02324a6ae	cifs: remove unused SMB2 config and mount options There's no SMB2 support in the CIFS filesystem driver, so there's no need to have a config and mount option for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-23 18:08:05 +00:00
Namhyung Kim	825cdcb1a5	splice: add wakeup_pipe_readers() Add and use wakeup_pipe_readers() to consolidate duplicated codes. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-05-23 19:58:53 +02:00
Xiao Guangrong	1f78160ce1	Btrfs: using rcu lock in the reader side of devices list fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	4622470565	Btrfs: drop unnecessary device lock Drop device_list_mutex for the reader side on the clone_fs_devices and btrfs_rm_device paths, since fs_info->volume_mutex can ensure the device list is not updated. btrfs_close_extra_devices is on the initialization path, where we cannot add or remove a device, so we can simply drop the mutex safely, like other initialization functions do (add_missing_dev, __find_device, __btrfs_open_devices ...). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	0c1daee085	Btrfs: fix the race between remove dev and alloc chunk On remove device path, it updates device->dev_alloc_list but does not hold chunk lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	c9513edb00	Btrfs: fix the race between reading and updating devices On btrfs_congested_fn and __unplug_io_fn paths, we should hold device_list_mutex to avoid remove/add device path to update fs_devices->devices On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in fs_devices->devices or fs_devices->devices is updated, so we should hold the mutex to avoid the reader side to reach them Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:42 -04:00
Xiao Guangrong	4f6c9328c6	Btrfs: fix bh leak on __btrfs_open_devices path 'bh' is not released if no error is detected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:42 -04:00
Xiao Guangrong	c7f895a2b2	Btrfs: fix unsafe usage of merge_state merge_state can free the current state if it can be merged with the next node, but in set_extent_bit(), after merge_state, we still use the current extent to get the next node and cache it into cached_state Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Xiao Guangrong	8233767a22	Btrfs: allocate extent state and check the result properly It doesn't allocate extent_state and check the result properly: - in set_extent_bit, it doesn't allocate extent_state if the path is not allowed wait - in clear_extent_bit, it doesn't check the result after atomic-ly allocate, we trigger BUG_ON() if it's fail - if allocate fail, we trigger BUG_ON instead of returning -ENOMEM since the return value of clear_extent_bit() is ignored by many callers Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Julia Lawall	b083916638	fs/btrfs: Add missing btrfs_free_path Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) $return <+...x...+>; \\| return@p2...; $ } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Tsutomu Itoh	37daa4f968	Btrfs: check return value of btrfs_inc_extent_ref() If return value of btrfs_inc_extent_ref() is not 0, BUG() is called. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:40 -04:00
Tsutomu Itoh	c00e9493f1	Btrfs: return error to caller if read_one_inode() fails When read_one_inode() fails, error code is returned to caller instead of BUG_ON(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:40 -04:00
Tsutomu Itoh	1cd307990d	Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item Currently, btrfs_truncate_item and btrfs_extend_item returns only 0. So, the check by BUG_ON in the caller is unnecessary. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Tsutomu Itoh	65a246c5ff	Btrfs: return error code to caller when btrfs_del_item fails The error code is returned instead of calling BUG_ON when btrfs_del_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Tsutomu Itoh	b0b802d7e3	Btrfs: return error code to caller when btrfs_previous_item fails The error code is returned instead of calling BUG_ON when btrfs_previous_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Sergei Trofimovich	27160b6b5a	btrfs: fix typo 'testeing' -> 'testing' Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:32 -04:00
Sergei Trofimovich	9694b3fcbb	btrfs: typo: 'btrfS' -> 'btrfs' Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:25 -04:00
Sergei Trofimovich	c4f675cd40	btrfs: don't spin in shrink_delalloc if there is nothing to free Observed as a large delay when --mixed filesystem is filled up. Test example: 1. create tiny --mixed FS: $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1 $ mkfs.btrfs --mixed 2G.img $ mount -oloop 2G.img /mnt/ut/ 2. Try to fill it up: $ dd if=/dev/urandom of=10M.file bs=10240 count=1024 $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution). No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning in shrink_delalloc. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:14 -04:00
Jamey Sharp	0f3b708c11	btrfs: Delete unused version.sh script. In 2008, commit `b4f6c45dfb` dropped the use of fs/btrfs/version.sh, but left the script behind. Kill it. Commit by Jamey Sharp and Josh Triplett. Signed-off-by: Jamey Sharp <jamey@minilop.net> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Hugo Mills	e215686715	btrfs: Ensure the tree search ioctl returns the right number of records Btrfs's tree search ioctl has a field to indicate that no more than a given number of records should be returned. The ioctl doesn't honour this, as the tested value is not incremented until the end of the copy_to_sk function. This patch removes an unnecessary local variable, and updates the num_found counter as each key is found in the tree. Signed-off-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Andi Kleen	0956c798ef	BTRFS: Remove unused node_lock `240f62c875` replaced the node_lock with rcu_read_lock, but forgot to remove the actual lock in the data structure. Remove it here. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Linus Torvalds	57d19e80f4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) b43: fix comment typo reqest -> request Haavard Skinnemoen has left Atmel cris: typo in mach-fs Makefile Kconfig: fix copy/paste-ism for dell-wmi-aio driver doc: timers-howto: fix a typo ("unsgined") perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course'). treewide: fix a few typos in comments regulator: change debug statement be consistent with the style of the rest Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations" audit: acquire creds selectively to reduce atomic op overhead rtlwifi: don't touch with treewide double semicolon removal treewide: cleanup continuations and remove logging message whitespace ath9k_hw: don't touch with treewide double semicolon removal include/linux/leds-regulator.h: fix syntax in example code tty: fix typo in descripton of tty_termios_encode_baud_rate xtensa: remove obsolete BKL kernel option from defconfig m68k: fix comment typo 'occcured' arch:Kconfig.locks Remove unused config option. treewide: remove extra semicolons ...	2011-05-23 09:12:26 -07:00
Tejun Heo	ff2a9941ca	block: move bd_set_size() above rescan_partitions() in __blkdev_get() `02e352287a` (block: rescan partitions on invalidated devices on -ENOMEDIA too) relocated partition rescan above explicit bd_set_size() to simplify condition check. As rescan_partitions() does its own bdev size setting, this doesn't break anything; however, rescan_partitions() prints out the following messages when adjusting bdev size, which can be confusing. sda: detected capacity change from 0 to 146815737856 sdb: detected capacity change from 0 to 146815737856 This patch restores the original order and removes the warning messages. stable: Please apply together with `02e352287a` (block: rescan partitions on invalidated devices on -ENOMEDIA too). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tony Luck <tony.luck@gmail.com> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-23 08:50:48 -07:00
David Teigland	901025d2f3	dlm: make plock operation killable Allow processes blocked on plock requests to be interrupted when they are killed. This leaves the problem of cleaning up the lock state in userspace. This has three parts: 1. Add a flag to unlock operations sent to userspace indicating the file is being closed. Userspace will then look for and clear any waiting plock operations that were abandoned by an interrupted process. 2. Queue an unlock-close operation (like in 1) to clean up userspace from an interrupted plock request. This is needed because the vfs will not send a cleanup-unlock if it sees no locks on the file, which it won't if the interrupted operation was the only one. 3. Do not use replies from userspace for unlock-close operations because they are unnecessary (they are just cleaning up for the process which did not make an unlock call). This also simplifies the new unlock-close generated from point 2. Signed-off-by: David Teigland <teigland@redhat.com>	2011-05-23 10:47:06 -05:00
Linus Torvalds	4d9dec4db2	Merge branch 'exec_rm_compat' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc * 'exec_rm_compat' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: exec: document acct_arg_size() exec: unify do_execve/compat_do_execve code exec: introduce struct user_arg_ptr exec: introduce get_user_arg_ptr() helper	2011-05-23 08:28:34 -07:00
Linus Torvalds	34b064569e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Wait properly when flushing the ail list GFS2: Wipe directory hash table metadata when deallocating a directory	2011-05-23 08:24:09 -07:00
liubo	8e531cdfeb	Btrfs: do not flush csum items of unchanged file data during treelog The current code relogs the entire inode every time during fsync log, and it is much better suited to small files rather than large ones. During my performance test, the fsync performance of large files sucks, and we can ascribe this to the tremendous amount of csum infos of the large ones, because we have to flush all of these csum infos into log trees even when there is only _one_ change in the whole file data. Apparently, to optimize fsync, we need to create a filter to skip the unnecessary csum ones, that is, the corresponding file data remains unchanged before this fsync. Here I have some test results to show, I use sysbench to do "random write + fsync". === sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run] === Sysbench args: - Number of threads: 1 - Extra file open flags: 0 - 2 files, 4Gb each - Block size 4Kb - Number of random requests for random IO: 10000 - Read/Write ratio for combined random IO test: 1.50 - Periodic FSYNC enabled, calling fsync() each 100 requests. - Calling fsync() at the end of test, Enabled. - Using synchronous I/O mode - Doing random write test Sysbench results: === Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total Read 0b Written 39.062Mb Total transferred 39.062Mb === a) without patch: (SPEED : 451.01Kb/sec) 112.75 Requests/sec executed b) with patch: (SPEED : 4.7533Mb/sec) 1216.84 Requests/sec executed PS: I've made a _sub transid_ stuff patch, but it does not perform as effectively as this patch, and I'm wondering where the problem is and trying to improve it more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 10:13:16 -04:00
Thomas Gleixner	9ec2690758	timerfd: Manage cancelable timers in timerfd Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the timer interrupt. Yes, I did not think about it, because the solution was so elegant. I didn't like the extra list in timerfd when it was proposed some time ago, but with a rcu based list the list walk it's less horrible than the original global lock, which was held over the list iteration. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org>	2011-05-23 13:59:53 +02:00
Tejun Heo	7e69723fef	block: move bd_set_size() above rescan_partitions() in __blkdev_get() `02e352287a` (block: rescan partitions on invalidated devices on -ENOMEDIA too) relocated partition rescan above explicit bd_set_size() to simplify condition check. As rescan_partitions() does its own bdev size setting, this doesn't break anything; however, rescan_partitions() prints out the following messages when adjusting bdev size, which can be confusing. sda: detected capacity change from 0 to 146815737856 sdb: detected capacity change from 0 to 146815737856 This patch restores the original order and removes the warning messages. stable: Please apply together with `02e352287a` (block: rescan partitions on invalidated devices on -ENOMEDIA too). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tony Luck <tony.luck@gmail.com> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Stable note: 2.6.39 only. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-05-23 13:26:07 +02:00
Chris Mason	712673339a	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Conflicts: fs/btrfs/Makefile fs/btrfs/ctree.h fs/btrfs/volumes.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 06:30:52 -04:00
Linus Torvalds	caebc160ce	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: use mark_buffer_dirty to mark btnode or meta data dirty nilfs2: always set back pointer to host inode in mapping->host nilfs2: get rid of NILFS_I_NILFS nilfs2: use list_first_entry nilfs2: use empty_aops for gc-inodes nilfs2: implement resize ioctl nilfs2: add truncation routine of segment usage file nilfs2: add routine to move secondary super block nilfs2: add ioctl which limits range of segment to be allocated nilfs2: zero fill unused portion of super root block nilfs2: super root size should change depending on inode size nilfs2: get rid of private page allocator nilfs2: merge list_del()/list_add_tail() to list_move_tail()	2011-05-22 22:43:01 -07:00
Artem Bityutskiy	56e46742e8	UBIFS: switch to dynamic printks Switch to debugging using dynamic printk (pr_debug()). There is no good reason to carry custom debugging prints when there is such a cool and powerful generic dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With dynamic printks we can switch on/off individual prints, per-file, per-function and per format messages. This means that instead of doing old-fashioned echo 1 > /sys/module/ubifs/parameters/debug_msgs to enable general messages, we can do: echo 'format "UBIFS DBG gen" +ptlf' > control to enable general messages and additionally ask the dynamic printk infrastructure to print process ID, line number and function name. So there is no reason to keep UBIFS-specific crud if there is more powerful generic thing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-23 08:22:20 +03:00
Jeff Layton	59ffd84141	cifs: add ignore_pend flag to cifs_call_async The current code always ignores the max_pending limit. Have it instead only optionally ignore the pending limit. For CIFSSMBEcho, we need to ignore it to make sure they always can go out. For async reads, writes and potentially other calls, we need to respect it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-23 02:59:16 +00:00
Jeff Layton	fcc31cb6f1	cifs: make cifs_send_async take a kvec array We'll need this for async writes, so convert the call to take a kvec array. CIFSSMBEcho is changed to put a kvec on the stack and pass in the SMB buffer using that. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-23 02:58:26 +00:00
Jeff Layton	2c8f981d93	cifs: consolidate SendReceive response checks Further consolidate the SendReceive code by moving the checks run over the packet into a separate function that all the SendReceive variants can call. We can also eliminate the check for a receive_len that's too big or too small. cifs_demultiplex_thread already checks that and disconnects the socket if that occurs, while setting the midStatus to MALFORMED. It'll never call this code if that's the case. Finally do a little cleanup. Use "goto out" on errors so that the flow of code in the normal case is more evident. Also switch the logErr variable in map_smb_to_linux_error to a bool. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-23 02:58:24 +00:00
Tao Ma	28e35e42fb	jbd2: Fix the wrong calculation of t_max_wait in update_t_max_wait t_max_wait is added in commit `8e85fb3f` to indicate how long we were waiting for new transaction to start. In commit `6d0bf005`, it is moved to another function named update_t_max_wait to avoid a build warning. But the wrong thing is that the original 'ts' is initialized in the start of function start_this_handle and we can calculate t_max_wait in the right way. while with this change, ts is initialized within the function and t_max_wait can never be calculated right. This patch moves the initialization of ts to the original beginning of start_this_handle and pass it to function update_t_max_wait so that it can be calculated right and the build warning is avoided also. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2011-05-22 21:45:26 -04:00
Eric Gouriou	f6d2f6b327	ext4: fix unbalanced up_write() in ext4_ext_truncate() error path ext4_ext_truncate() should not invoke up_write(&EXT4_I(inode)->i_data_sem) when ext4_orphan_add() returns an error, as it hasn't performed a down_write() yet. This trivial patch fixes this by moving the up_write() invocation above the out_stop label. Signed-off-by: Eric Gouriou <egouriou@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-22 21:33:00 -04:00
Vivek Haldar	77f4135f2a	ext4: count hits/misses of extent cache and expose in sysfs The number of hits and misses for each filesystem is exposed in /sys/fs/ext4/<dev>/extent_cache_{hits, misses}. Tested: fsstress, manual checks. Signed-off-by: Vivek Haldar <haldar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-22 21:24:16 -04:00
Yongqiang Yang	93917411be	ext4: make ext4_split_extent() handle error correctly Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2011-05-22 20:49:12 -04:00
Theodore Ts'o	373cd5c53d	ext4: don't show mount options in /proc/mounts if there is no journal After creating an ext4 file system without a journal: # mke2fs -t ext4 -O ^has_journal /dev/sda # mount -t ext4 /dev/sda /test the /proc/mounts will show: "/dev/sda /test ext4 rw,relatime,user_xattr,acl,barrier=1,data=writeback 0 0" which can fool users into thinking that the fs is using writeback mode. So don't set the writeback option when the journal has not been enabled; we don't depend on the writeback option being set, since ext4_should_writeback_data() in ext4_jbd2.h tests to see if the journal is not present before returning true. Reported-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-22 16:12:35 -04:00
Heiko Carstens	9ce6e0be06	fs: add missing prefetch.h include Fixes this build error on s390 and probably other archs as well: fs/inode.c: In function 'new_inode': fs/inode.c:894:2: error: implicit declaration of function 'spin_lock_prefetch' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> [ Happens on architectures that don't define their own prefetch functions in <asm/processor.h>, and instead rely on the default ones in <linux/prefetch.h> - Linus] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-22 11:26:02 -07:00
Chris Mason	aa2dfb372a	Merge branch 'allocator' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 12:36:34 -04:00
Chris Mason	945d8962ce	Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 12:33:42 -04:00
Chris Mason	0d0ca30f18	Btrfs: update the delayed inode code to use the btrfs_ino helper. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 07:11:22 -04:00
Chris Mason	dcc6d07322	Merge branch 'delayed_inode' into inode_numbers Conflicts: fs/btrfs/inode.c fs/btrfs/ioctl.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 07:07:01 -04:00
Steven Whitehouse	26b06a6958	GFS2: Wait properly when flushing the ail list The ail flush code has always relied upon log flushing to prevent it from spinning needlessly. This fixes it to wait on the last I/O request submitted (we don't need to wait for all of it) instead of either spinning with io_schedule or sleeping. As a result cpu usage of gfs2_logd is much reduced with certain workloads. Reported-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-21 19:21:07 +01:00
Miao Xie	16cdcec736	btrfs: implement delayed inode items operation Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. 
- Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. 
- Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:30:56 -04:00
Chris Mason	0965537308	Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers Conflicts: fs/btrfs/free-space-cache.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:27:38 -04:00
Steven Whitehouse	6d3117b412	GFS2: Wipe directory hash table metadata when deallocating a directory The deallocation code for directories in GFS2 is largely divided into two parts. The first part deallocates any directory leaf blocks and marks the directory as being a regular file when that is complete. The second stage was identical to deallocating regular files. Regular files have their data blocks in a different address space to directories, and thus what would have been normal data blocks in a regular file (the hash table in a GFS2 directory) were deallocated correctly. However, a reference to these blocks was left in the journal (assuming of course that some previous activity had resulted in those blocks being in the journal or ail list). This patch uses the i_depth as a test of whether the inode is an exhash directory (we cannot test the inode type as that has already been changed to a regular file at this stage in deallocation) The original issue was reported by Chris Hertel as an issue he encountered running bonnie++ Reported-by: Christopher R. Hertel <crh@samba.org> Cc: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-21 14:05:58 +01:00
Erez Zadok	1a4022f88d	VFS: move BUG_ON test for symlink nd->depth after current->link_count test This solves a serious VFS-level bug in nested_symlink (which was rewritten from do_follow_link), and follows the order of depth tests that existed before. The bug triggers a BUG_ON in fs/namei.c:1381, when running racer with symlink and rename ops. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-21 00:12:16 -07:00
Timo Warns	cae13fe4cc	Fix for buffer overflow in ldm_frag_add not sufficient As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer overflow in ldm_frag_add) is not sufficient. The original patch in commit `c340b1d640` ("fs/partitions/ldm.c: fix oops caused by corrupted partition table") does not consider that, for subsequent fragments, previously allocated memory is used. [1] http://lkml.org/lkml/2011/5/6/407 Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-20 16:40:36 -07:00
Linus Torvalds	8e7bfcbab3	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] define "_sdata" symbol pstore: Fix Kconfig dependencies for apei->pstore pstore: fix potential logic issue in pstore read interface pstore: fix pstore filesystem mount/remount issue pstore: fix one type of return value in pstore [IA64] fix build warning in arch/ia64/oprofile/backtrace.c	2011-05-20 13:39:00 -07:00
Linus Torvalds	91444f47b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (32 commits) [CIFS] Fix to problem with getattr caused by invalidate simplification patch [CIFS] Remove sparse warning [CIFS] Update cifs to version 1.72 cifs: Change key name to cifs.idmap, misc. clean-up cifs: Unconditionally copy mount options to superblock info cifs: Use kstrndup for cifs_sb->mountdata cifs: Simplify handling of submount options in cifs_mount. cifs: cifs_parse_mount_options: do not tokenize mount options in-place cifs: Add support for mounting Windows 2008 DFS shares cifs: Extract DFS referral expansion logic to separate function cifs: turn BCC into a static inlined function cifs: keep BCC in little-endian format cifs: fix some unused variable warnings in id_rb_search CIFS: Simplify invalidate part (try #5) CIFS: directio read/write cleanups consistently use smb_buf_length as be32 for cifs (try 3) cifs: Invoke id mapping functions (try #17 repost) cifs: Add idmap key and related data structures and functions (try #17 repost) CIFS: Add launder_page operation (try #3) Introduce smb2 mounts as vers=2 ...	2011-05-20 13:37:49 -07:00
Linus Torvalds	3ed4c0583d	Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits) signal: trivial, fix the "timespec declared inside parameter list" warning job control: reorganize wait_task_stopped() ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() signal: sys_sigprocmask() needs retarget_shared_pending() signal: cleanup sys_sigprocmask() signal: rename signandsets() to sigandnsets() signal: do_sigtimedwait() needs retarget_shared_pending() signal: introduce do_sigtimedwait() to factor out compat/native code signal: sys_rt_sigtimedwait: simplify the timeout logic signal: cleanup sys_rt_sigprocmask() x86: signal: sys_rt_sigreturn() should use set_current_blocked() x86: signal: handle_signal() should use set_current_blocked() signal: sigprocmask() should do retarget_shared_pending() signal: sigprocmask: narrow the scope of ->siglock signal: retarget_shared_pending: optimize while_each_thread() loop signal: retarget_shared_pending: consider shared/unblocked signals only signal: introduce retarget_shared_pending() ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/ signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending() ...	2011-05-20 13:33:21 -07:00
Linus Torvalds	6c1b8d94bc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits) GFS2: Move all locking inside the inode creation function GFS2: Clean up symlink creation GFS2: Clean up mkdir GFS2: Use UUID field in generic superblock GFS2: Rename ops_inode.c to inode.c GFS2: Inode.c is empty now, remove it GFS2: Move final part of inode.c into super.c GFS2: Move most of the remaining inode.c into ops_inode.c GFS2: Move gfs2_refresh_inode() and friends into glops.c GFS2: Remove gfs2_dinode_print() function GFS2: When adding a new dir entry, inc link count if it is a subdir GFS2: Make gfs2_dir_del update link count when required GFS2: Don't use gfs2_change_nlink in link syscall GFS2: Don't use a try lock when promoting to a higher mode GFS2: Double check link count under glock GFS2: Improve bug trap code in ->releasepage() GFS2: Fix ail list traversal GFS2: make sure fallocate bytes is a multiple of blksize GFS2: Add an AIL writeback tracepoint GFS2: Make writeback more responsive to system conditions ...	2011-05-20 13:28:45 -07:00
Linus Torvalds	268bb0ce3e	sanitize <linux/prefetch.h> usage Commit `e66eed651f` ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw(' -- '.[ch]') grep -L 'prefetchw(' $(git grep -l 'linux/prefetch.h' -- '.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-20 12:50:29 -07:00
Jens Axboe	698567f3fa	Merge commit 'v2.6.39' into for-2.6.40/core Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've had churn in the core area that makes it difficult to handle patches for eg cfq or blk-throttle. Instead of requiring that they be based in older versions with bugs that have been fixed later in the rc cycle, merge in 2.6.39 final. Also fixes up conflicts in the below files. Conflicts: drivers/block/paride/pcd.c drivers/cdrom/viocd.c drivers/ide/ide-cd.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-05-20 20:33:15 +02:00
Thomas Gleixner	250f972d85	Merge branch 'timers/urgent' into timers/core Reason: Get upstream fixes and kfree_rcu which is necessary for a follow up patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-05-20 20:08:05 +02:00
Lukas Czerner	1bb933fb1f	ext4: fix possible use-after-free in ext4_remove_li_request() We need to take reference to the s_li_request after we take a mutex, because it might be freed since then, hence result in accessing old already freed memory. Also we should protect the whole ext4_remove_li_request() because ext4_li_info might be in the process of being freed in ext4_lazyinit_thread(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2011-05-20 13:55:29 -04:00
Lukas Czerner	51ce651156	ext4: fix the mount option "init_itable=n" to work as expected for n=0 For some reason, when we set the mount option "init_itable=0" it behaves as if we had set init_itable=20 which is not right at all. Basically when we set it to zero we are saying to lazyinit thread not to wait between zeroing the inode table (except of cond_resched()) so this commit fixes that and removes the unnecessary condition. The 'n' should be also properly used on remount. When the n is not set at all, it means that the default multiplier EXT4_DEF_LI_WAIT_MULT is set instead. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Eric Sandeen <sandeen@redhat.com>	2011-05-20 13:55:16 -04:00
Lukas Czerner	e1290b3e62	ext4: Remove unnecessary wait_event ext4_run_lazyinit_thread() For some reason we have been waiting for lazyinit thread to start in the ext4_run_lazyinit_thread() but it is not needed since it was just unnecessary complexity, so get rid of it. We can also remove li_task and li_wait_task since it is not used anymore. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2011-05-20 13:49:51 -04:00
Lukas Czerner	4ed5c033c1	ext4: Use schedule_timeout_interruptible() for waiting in lazyinit thread In order to make lazyinit eat approx. 10% of io bandwidth at max, we are sleeping between zeroing each single inode table. For that purpose we are using timer which wakes up thread when it expires. It is set via add_timer() and this may cause troubles in the case that thread has been woken up earlier and in next iteration we call add_timer() on still running timer hence hitting BUG_ON in add_timer(). We could fix that by using mod_timer() instead however we can use schedule_timeout_interruptible() for waiting and hence simplifying things a lot. This commit exchanges the old "waiting mechanism" with simple schedule_timeout_interruptible(), setting the time to sleep. Hence we no longer need li_wait_daemon waiting queue and others, so get rid of it. Addresses-Red-Hat-Bugzilla: #699708 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2011-05-20 13:49:04 -04:00
Tony Luck	3935bb949f	Pull pstore into release branch	2011-05-20 10:34:50 -07:00
Steve French	156ecb2d8b	[CIFS] Fix to problem with getattr caused by invalidate simplification patch Fix to earlier "Simplify invalidate part (try #6)" patch That patch caused problems with connectathon test 5. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-20 17:00:01 +00:00
Artem Bityutskiy	bdc1a1b610	UBIFS: fix kernel-doc comments This is a minor fix for UBIFS kernel-doc comments - we forgot the "@" symbol for several 'struct ubifs_debug_info'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-20 08:30:13 +03:00
Linus Torvalds	39ab05c8e0	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (44 commits) debugfs: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning sysfs: remove "last sysfs file:" line from the oops messages drivers/base/memory.c: fix warning due to "memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION" memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION SYSFS: Fix erroneous comments for sysfs_update_group(). driver core: remove the driver-model structures from the documentation driver core: Add the device driver-model structures to kerneldoc Translated Documentation/email-clients.txt RAW driver: Remove call to kobject_put(). reboot: disable usermodehelper to prevent fs access efivars: prevent oops on unload when efi is not enabled Allow setting of number of raw devices as a module parameter Introduce CONFIG_GOOGLE_FIRMWARE driver: Google Memory Console driver: Google EFI SMI x86: Better comments for get_bios_ebda() x86: get_bios_ebda_length() misc: fix ti-st build issues params.c: Use new strtobool function to process boolean inputs debugfs: move to new strtobool ... Fix up trivial conflicts in fs/debugfs/file.c due to the same patch being applied twice, and an unrelated cleanup nearby.	2011-05-19 18:24:11 -07:00
Sage Weil	9d6fcb081a	ceph: check return value for start_request in writepages Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Sage Weil	6b4a3b517a	ceph: remove useless check rc is only ever 0 or negative in this method. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Sage Weil	da39822c65	ceph: fix broken comparison in readdir loop Both off and fi->offset are unsigned, so the difference is always >= 0. Compare them directly instead of the sign of the difference. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:04 -07:00
Sage Weil	3540303f87	ceph: fix rare potential cap leak If we grab new_cap, retake the lock, and find we already have a cap now for the given mds, release new_cap. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:03 -07:00
Sage Weil	ae59808301	ceph: use snprintf for dirstat content We allocate a buffer for rstats if the dirstat option is enabled. Use snprintf. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:02 -07:00
Sage Weil	1b36698577	libceph: remove unused variable Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:24:17 -07:00
Sage Weil	3b66378034	ceph: take reference on mds request r_unsafe_dir We put ourselves on an inode list for the parent directory of metadata operations so that an fsync on the directory will wait for metadata updates to commit to disk. We weren't holding a reference to that directory, however, and under certain workloads (fsstress in this case) the directory can go away. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:20:07 -07:00
Dave Chinner	bf59170a66	xfs: obey minleft values during extent allocation correctly When allocating an extent that is long enough to consume the remaining free space in an AG, we need to ensure that the allocation leaves enough space in the AG for any subsequent bmap btree blocks that are needed to track the new extent. These have to be allocated in the same AG as we only reserve enough blocks in an allocation transaction for modification of the freespace trees in a single AG. xfs_alloc_fix_minleft() has been considering blocks on the AGFL as free blocks available for extent and bmbt block allocation, which is not correct - blocks on the AGFL are there exclusively for the use of the free space btrees. As a result, when minleft is less than the number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim the given extent to leave minleft blocks available for bmbt allocation, and hence we can fail allocation during bmbt record insertion. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 12:03:48 -05:00
Dave Chinner	44396476a0	xfs: reset buffer pointers before freeing them When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to its original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 12:03:45 -05:00
Dave Chinner	ee58abdfcc	xfs: avoid getting stuck during async inode flushes When the underlying inode buffer is locked and xfs_sync_inode_attr() is doing a non-blocking flush, xfs_iflush() can return EAGAIN. When this happens, clear the error rather than returning it to xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk delaying for a short while and trying again. This can result in background walks getting stuck on the one AG until inode buffer is unlocked by some other means. This behaviour was noticed when analysing event traces followed by code inspection and verification of the fix via further traces. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 12:03:42 -05:00
Dave Chinner	e57375153d	xfs: fix xfs_itruncate_start tracing Variables are ordered incorrectly in trace call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 12:03:36 -05:00
Dave Chinner	1beb65ad45	xfs: fix duplicate workqueue initialisation The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 12:03:24 -05:00
Joe Perches	e69522a8cc	xfs: kill off xfs_printk() xfs_alert_tag() can be defined using xfs_alert(), and thereby avoid using xfs_printk() altogether. This is the only remaining use of xfs_printk(), so changing it this way means xfs_printk() can simply be eliminated. Also add format checking to the non-debug inline function xfs_debug. Miscellaneous function prototype argument alignment. (Updated to delete the definition of xfs_printk(), which is no longer used or needed.) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-19 11:38:09 -05:00
Steve French	ceec1e0fae	[CIFS] Remove sparse warning Move extern for cifsConvertToUCS to different header to prevent following warning: CHECK fs/cifs/cifs_unicode.c fs/cifs/cifs_unicode.c:267:1: warning: symbol 'cifsConvertToUCS' was not declared. Should it be static? Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:56 +00:00
Steve French	4e64fb33de	[CIFS] Update cifs to version 1.72 Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:56 +00:00
Shirish Pargaonkar	c4aca0c09f	cifs: Change key name to cifs.idmap, misc. clean-up Change idmap key name from cifs.cifs_idmap to cifs.idmap. Removed unused structure wksidarr and function match_sid(). Handle errors correctly in function init_cifs(). Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:56 +00:00
Sean Finney	f14bcf71d1	cifs: Unconditionally copy mount options to superblock info Previously mount options were copied and updated in the cifs_sb_info struct only when CONFIG_CIFS_DFS_UPCALL was enabled. Making this information generally available allows us to remove a number of ifdefs, extra function params, and temporary variables. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:55 +00:00
Sean Finney	5167f11ec9	cifs: Use kstrndup for cifs_sb->mountdata A relatively minor nit, but also clarified the "consensus" from the preceding comments that it is in fact better to try for the kstrdup early and cleanup while cleaning up is still a simple thing to do. Reviewed-By: Steve French <smfrench@gmail.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:55 +00:00
Sean Finney	046462abca	cifs: Simplify handling of submount options in cifs_mount. With CONFIG_DFS_UPCALL enabled, maintain the submount options in cifs_sb->mountdata, simplifying the code just a bit as well as making corner-case allocation problems less likely. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:55 +00:00
Sean Finney	b946845a9d	cifs: cifs_parse_mount_options: do not tokenize mount options in-place To keep strings passed to cifs_parse_mount_options re-usable (which is needed to clean up the DFS referral handling), tokenize a copy of the mount options instead. If values are needed from this tokenized string, they too must be duplicated (previously, some options were copied and others duplicated). Since we are not on the critical path and any cleanup is relatively easy, the extra memory usage shouldn't be a problem (and it is a bit simpler than trying to implement something smarter). Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:54 +00:00
Sean Finney	c1508ca236	cifs: Add support for mounting Windows 2008 DFS shares Windows 2008 CIFS servers do not always return PATH_NOT_COVERED when attempting to access a DFS share. Therefore, when checking for remote shares, unconditionally ask for a DFS referral for the UNC (w/out prepath) before continuing with previous behavior of attempting to access the UNC + prepath and checking for PATH_NOT_COVERED. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=31092 Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:54 +00:00
Sean Finney	dd61394586	cifs: Extract DFS referral expansion logic to separate function The logic behind the expansion of DFS referrals is now extracted from cifs_mount into a new static function, expand_dfs_referral. This will reduce duplicate code in upcoming commits. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sean Finney <seanius@seanius.net> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:54 +00:00
Jeff Layton	460458ce8e	cifs: turn BCC into a static inlined function It's a bad idea to have macro functions that reference variables more than once, as the arguments could have side effects. Turn BCC() into a static inlined function instead. While we're at it, make it return a void * to discourage anyone from dereferencing it as-is. Reported-and-acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:53 +00:00
Jeff Layton	820a803ffa	cifs: keep BCC in little-endian format This is the same patch as originally posted, just with some merge conflicts fixed up... Currently, the ByteCount is usually converted to host-endian on receive. This is confusing however, as we need to keep two sets of routines for accessing it, and keep track of when to use each routine. Munging received packets like this also limits when the signature can be calculated. Simplify the code by keeping the received ByteCount in little-endian format. This allows us to eliminate a set of routines for accessing it and we can now drop the *_le suffixes from the accessor functions since that's now implied. While we're at it, switch all of the places that read the ByteCount directly to use the get_bcc inline which should also clean up some unaligned accesses. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:53 +00:00
Jeff Layton	0e6e37a7a8	cifs: fix some unused variable warnings in id_rb_search fs/cifs/cifsacl.c: In function ‘id_rb_search’: fs/cifs/cifsacl.c:215:19: warning: variable ‘linkto’ set but not used [-Wunused-but-set-variable] fs/cifs/cifsacl.c:214:18: warning: variable ‘parent’ set but not used [-Wunused-but-set-variable] Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:52 +00:00
Pavel Shilovsky	6feb9891da	CIFS: Simplify invalidate part (try #5 ) Simplify many places when we call cifs_revalidate/invalidate to make it do what it exactly needs. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:52 +00:00
Pavel Shilovsky	0b81c1c405	CIFS: directio read/write cleanups The recently introduced strictcache mode brought new code that can be used efficiently by the directio part. That lets us add vectored operations and break unnecessary cifs_user_read and cifs_user_write. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:51 +00:00
Steve French	be8e3b0044	consistently use smb_buf_length as be32 for cifs (try 3) There is one big endian field in the cifs protocol, the RFC1001 length, which cifs code (unlike in the smb2 code) had been handling as u32 until the last possible moment, when it was converted to be32 (its native form) before sending on the wire. To remove the last sparse endian warning, and to make this consistent with the smb2 implementation (which always treats the fields in their native size and endianness), convert all uses of smb_buf_length to be32. This version incorporates Christoph's comment about using be32_add_cpu, and fixes a typo in the second version of the patch. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:51 +00:00
Shirish Pargaonkar	9409ae58e0	cifs: Invoke id mapping functions (try #17 repost) rb tree search and insertion routines. A SID which needs to be mapped, is looked up in one of the rb trees depending on whether SID is either owner or group SID. If found in the tree, a (mapped) id from that node is assigned to uid or gid as appropriate. If unmapped, an upcall is attempted to map the SID to an id. If upcall is successful, node is marked as mapped. If upcall fails, node stays marked as unmapped and a mapping is attempted again only after an arbitrary time period has passed. To map a SID, which can be either a Owner SID or a Group SID, key description starts with the string "os" or "gs" followed by SID converted to a string. Without "os" or "gs", cifs.upcall does not know whether SID needs to be mapped to either an uid or a gid. Nodes in rb tree have fields to prevent multiple upcalls for a SID. Searching, adding, and removing nodes is done within global locks. Whenever a node is either found or inserted in a tree, a reference is taken on that node. Shrinker routine prunes a node if it has expired but does not prune an expired node if its refcount is not zero (i.e. sid/id of that node is_being/will_be accessed). Thus a node, if its SID needs to be mapped by making an upcall, can safely stay and its fields accessed without shrinker pruning it. A reference (refcount) is put on the node without holding the spinlock but a reference is get on the node by holding the spinlock. Every time an existing mapped node is accessed or mapping is attempted, its timestamp is updated to prevent it from getting erased or to prevent multiple unnecessary repeat mapping retries respectively. For now, cifs.upcall is only used to map a SID to an id (uid or gid) but it would be used to obtain an SID for an id. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:51 +00:00
Shirish Pargaonkar	4d79dba0e0	cifs: Add idmap key and related data structures and functions (try #17 repost) Define (global) data structures to store ids, uids and gids, to which a SID maps. There are two separate trees, one for SID/uid and another one for SID/gid. A new type of key, cifs_idmap_key_type, is used. Keys are instantiated and searched using credential of the root by overriding and restoring the credentials of the caller requesting the key. Id mapping functions are invoked under config option of cifs acl. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:51 +00:00
Pavel Shilovsky	9ad1506b42	CIFS: Add launder_page operation (try #3 ) Adding this lets us drop filemap_write_and_wait from cifs_invalidate_mapping and simplify the code to properly process invalidate logic. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:50 +00:00
Steve French	1cb06d0b50	Introduce smb2 mounts as vers=2 As with Linux nfs client, which uses "nfsvers=" or "vers=" to indicate which protocol to use for mount, specifying "vers=smb2" or "vers=2" will force an smb2 mount. When vers is not specified cifs is used, i.e. "vers=cifs" or "vers=1". We can eventually autonegotiate down from smb2 to cifs when smb2 is stable enough to make it the default, but this is for the future. At that time we could also implement a "maxprotocol" mount option as smbclient and Samba have today, but that would be premature until smb2 is stable. Initially the smb2 Kconfig option will depend on "BROKEN" until the merge is complete, and then be "EXPERIMENTAL". When it is no longer experimental we can consider changing the default protocol to attempt first. Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:50 +00:00
Pavel Shilovsky	257fb1f15d	CIFS: Use invalidate_inode_pages2 instead of invalidate_remote_inode (try #4 ) Use invalidate_inode_pages2 that don't leave pages even if shrink_page_list() has a temp ref on them. It prevents a data coherency problem when cifs_invalidate_mapping didn't invalidate pages but the client thinks that a data from the cache is uptodate according to an oplock level (exclusive or II). Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:50 +00:00
Jeff Layton	fd5707e1b4	cifs: fix comment in validate_t2 The comment about checking the bcc is in the wrong place. Also make it match kernel coding style. Reported-and-acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:50 +00:00
Jeff Layton	4358b5678b	VFS: trivial: fix comment on s_maxbytes value warning check I originally intended to remove this warning in 2.6.34, but it's not in a high performance codepath and might help us to catch bugs later. Let's keep it, but fix the comment to allay confusion about its removal. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:49 +00:00
Steve French	b73b9a4ba7	[CIFS] Allow to set extended attribute cifs_acl (try #2 ) Allow setting cifs_acl on the server. Pass on to the server the ACL blob generated by an application. cifs is just a pass-through, it does not monitor or inspect the contents of the blob, server decides whether to enforce/apply the ACL blob composed by an application. If setting of ACL is successful, mark the inode for revalidation. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:49 +00:00
Steve French	43988d7685	[CIFS] Use ecb des kernel crypto APIs instead of local cifs functions (repost) Using kernel crypto APIs for DES encryption during LM and NT hash generation instead of local functions within cifs. Source file smbdes.c is deleted sans four functions, one of which uses ecb des functionality provided by kernel crypto APIs. Remove function SMBOWFencrypt. Add return codes to various functions such as calc_lanman_hash, SMBencrypt, and SMBNTencrypt. Includes fix noticed by Dan Carpenter. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: Dan Carpenter <error27@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:49 +00:00
Shirish Pargaonkar	257208736a	cifs: cleanup: Rename and remove config flags Remove config flag CIFS_EXPERIMENTAL. Do export operations under new config flag CIFS_NFSD_EXPORT Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:49 +00:00
Steve French	b34cb85cc2	Introduce SMB2 Kconfig option SMB2 is the followon to the CIFS (and SMB) protocols and the default for Windows since Windows Vista, and also now implemented by various non-Windows servers. SMB2 is more secure, has various performance advantages, including larger i/o sizes, flow control, better caching model and more. SMB2 also resolves some scalability limits in the cifs protocol and adds many new features while being much simpler (only a few dozen commands instead of hundreds) and since the protocol is clearer it is also more consistently implemented across servers and thus easier to optimize. After much discussion with Jeff Layton, Jeremy Allison and others at Connectathon, we decided to move the smb2 code from a distinct .ko and fstype into distinct C files that optionally build in cifs.ko. As a result the Kconfig gets simpler. To avoid destabilizing cifs, the smb2 code is going to be moved into its own experimental CONFIG_CIFS_SMB2 ifdef as it is merged and rereviewed. The changes to stable cifs (builds with the smb2 ifdef off) are expected to be fairly small. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:48 +00:00
Steve French	34c87901e1	Shrink stack space usage in cifs_construct_tcon We were reserving MAX_USERNAME (now 256) on stack for something which only needs to fit about 24 bytes ie string krb50x + printf version of uid Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:48 +00:00
Justin P. Mattock	fd62cb7e74	fs:cifs:connect.c remove one to many l's in the word. The patch below removes an extra "l" in the word. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:48 +00:00
Steve French	c52a95545c	Don't compile in unused reparse point symlink code Recent Windows versions now create symlinks more frequently and they do use this "reparse point" symlink mechanism. We can of course do symlinks nicely to Samba and other servers which support the CIFS Unix Extensions and we can also do SFU symlinks and "client only" "MF" symlinks optionally, but for recent Windows we currently can not handle the common "reparse point" symlinks fully, removing the caller for this. We will need to extend and reenable this "reparse point" worker code in cifs and fix cifs_symlink to call this. In the interim this code has been moved to its own config option so it is not compiled in by default until cifs_symlink fixed up (and tested) to use this. CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:48 +00:00
Steve French	0eff0e2677	Remove unused CIFSSMBNotify worker function The CIFSSMBNotify worker is unused, pending changes to allow it to be called via inotify, so move it into its own experimental config option so it does not get built in, until the necessary VFS support is fixed. It used to be used in dnotify, but according to Jeff, inotify needs minor changes before we can reenable this. CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:47 +00:00
Shirish Pargaonkar	9b6763e0aa	cifs: Remove unused inode number while fetching root inode ino is unused in function cifs_root_iget(). Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-19 14:10:47 +00:00
James Morris	12a5a2621b	Merge branch 'master' into next Conflicts: include/linux/capability.h Manually resolve merge conflict w/ thanks to Stephen Rothwell. Signed-off-by: James Morris <jmorris@namei.org>	2011-05-19 18:51:57 +10:00
Jonathan Cameron	a037439637	debugfs: move to new strtobool No functional changes requires that we eat errors from strtobool. If people want to not do this, then it should be fixed at a later date. V2: Simplification suggested by Rusty Russell removes the need for additional variable ret. Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-05-19 16:55:28 +09:30
Linus Torvalds	3f80fbff5f	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: configfs: Fix race between configfs_readdir() and configfs_d_iput() configfs: Don't try to d_delete() negative dentries. ocfs2/dlm: Target node death during resource migration leads to thread spin ocfs2: Skip mount recovery for hard-ro mounts ocfs2/cluster: Heartbeat mismatch message improved ocfs2/cluster: Increase the live threshold for global heartbeat ocfs2/dlm: Use negotiated o2dlm protocol version ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes. ocfs2: Initialize data_ac (might be used uninitialized)	2011-05-18 16:50:28 -07:00
Daniel Mack	c47d832bc0	nfsd: make local functions static This also fixes a number of sparse warnings. Signed-off-by: Daniel Mack <zonque@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-05-18 15:28:31 -04:00
Darrick J. Wong	0e499890c1	ext4: wait for writeback to complete while making pages writable In order to stabilize pages during disk writes, ext4_page_mkwrite must wait for writeback operations to complete before making a page writable. Furthermore, the function must return locked pages, and recheck the writeback status if the page lock is ever dropped. The "someone could wander in" part of this patch was suggested by Chris Mason. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-18 13:55:20 -04:00
Darrick J. Wong	7cb1a5351d	ext4: clean up some wait_on_page_writeback calls wait_on_page_writeback already checks the writeback bit, so callers of it needn't do that test. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-18 13:53:20 -04:00
Tao Ma	ed3ce80a52	ext4: don't warn about mnt_count if it has been disabled Currently, if we mkfs a new ext4 volume with s_max_mnt_count set to zero, and mount it for the first time, we will get the warning: maximal mount count reached, running e2fsck is recommended It is really misleading. So change the check so that it won't warn in that case. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-18 13:29:57 -04:00
Linus Torvalds	a2b9c1f620	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: don't delay blk_run_queue_async scsi: remove performance regression due to async queue run blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup block: rescan partitions on invalidated devices on -ENOMEDIA too cdrom: always check_disk_change() on open block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers	2011-05-18 06:49:02 -07:00
Joel Becker	24307aa1e7	configfs: Fix race between configfs_readdir() and configfs_d_iput() configfs_readdir() will use the existing inode numbers of inodes in the dcache, but it makes them up for attribute files that aren't currently instantiated. There is a race where a closing attribute file can be tearing down at the same time as configfs_readdir() is trying to get its inode number. We want to get the inode number of open attribute files, because they should match while instantiated. We can't lock down the transition where dentry->d_inode is set to NULL, so we just check for NULL there. We can, however, ensure that an inode we find isn't iput() in configfs_d_iput() until after we've accessed it. Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-18 04:08:16 -07:00
Joel Becker	df7f99670a	configfs: Don't try to d_delete() negative dentries. When configfs is faking mkdir() on its subsystem or default group objects, it starts by adding a negative dentry. It then tries to instantiate the group. If that should fail, it must clean up after itself. I was using d_delete() here, but configfs_attach_group() promises to return an empty dentry on error. d_delete() explodes with the entry dentry. Let's try d_drop() instead. The unhashing is what we want for our dentry. Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-18 03:30:58 -07:00
Jeff Layton	11379b5e33	cifs: fix cifsConvertToUCS() for the mapchars case As Metze pointed out, commit `84cdf74e` broke mapchars option: Commit "cifs: fix unaligned accesses in cifsConvertToUCS" (`84cdf74e80`) does multiple steps in just one commit (moving the function and changing it without testing). put_unaligned_le16(temp, &target[j]); is never called for any codepoint the goes via the 'default' switch statement. As a result we put just zero (or maybe uninitialized) bytes into the target buffer. His proposed patch looks correct, but doesn't apply to the current head of the tree. This patch should also fix it. Cc: <stable@kernel.org> # .38.x: `581ade4`: cifs: clean up various nits in unicode routines (try #2) Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-17 20:54:04 +00:00
Jeff Layton	221d1d7972	cifs: add fallback in is_path_accessible for old servers The is_path_accessible check uses a QPathInfo call, which isn't supported by ancient win9x era servers. Fall back to an older SMBQueryInfo call if it fails with the magic error codes. Cc: stable@kernel.org Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-05-17 18:51:14 +00:00
Tao Ma	9199e66528	jbd/jbd2: remove obsolete summarise_journal_usage. summarise_journal_usage seems to be obsolete for a long time, so remove it. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-17 13:47:42 +02:00
Jan Kara	2842bb20ee	jbd: Fix forever sleeping process in do_get_write_access() In do_get_write_access() we wait on BH_Unshadow bit for buffer to get from shadow state. The waking code in journal_commit_transaction() has a bug because it does not issue a memory barrier after the buffer is moved from the shadow state and before wake_up_bit() is called. Thus a waitqueue check can happen before the buffer is actually moved from the shadow state and waiting process may never be woken. Fix the problem by issuing proper barrier. CC: stable@kernel.org Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-17 13:47:42 +02:00
Robin Dong	4e299c1d91	ext2: fix error msg when mounting fs with too-large blocksize When ext2 mounts a filesystem, it attempts to set the block device blocksize with a call to sb_set_blocksize, which can fail for several reasons. The current failure message in ext2 prints: EXT2-fs (loop1): error: blocksize is too small which is not correct in all cases. This can be demonstrated by creating a filesystem with # mkfs.ext2 -b 8192 on a 4k page system, and attempting to mount it. Change the error message to a more generic: EXT2-fs (loop1): bad blocksize 8192 to match the error message in ext3. Signed-off-by: Robin Dong <sanbai@taobao.com> Reviewed-by: Coly Li <bosong.ly@taobao.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-17 13:47:42 +02:00
Ted Ts'o	d9b01934d5	jbd: fix fsync() tid wraparound bug If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that is valid for us to start: namely, the currently running transaction (if it exists). CC: stable@kernel.org Reported-by: Martin_Zielinski@McAfee.com Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-17 13:47:41 +02:00
Jan Kara	86c4f6d855	ext3: Fix fs corruption when make_indexed_dir() fails When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This led to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. CC: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2011-05-17 13:47:15 +02:00
Alexey Dobriyan	011159a0a7	airo: correct proc entry creation interfaces * use proc_mkdir_mode() instead of create_proc_entry(S_IFDIR|...), export proc_mkdir_mode() for that, oh well. * don't supply S_IFREG to proc_create_data(), it's unnecessary Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-05-16 14:25:28 -04:00
Chen Gong	06cf91b4b4	pstore: fix pstore filesystem mount/remount issue Currently after mount/remount operation on pstore filesystem, the content on pstore will be lost. It is because current ERST implementation doesn't support multi-user usage, which moves internal pointer to the end after accessing it. Adding multi-user support for pstore usage. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-05-16 11:05:00 -07:00
Chen Gong	8d38d74b64	pstore: fix one type of return value in pstore the return type of function _read_ in pstore is size_t, but in the callback function of _read_, the logic doesn't consider it too much, which means if negative value (assuming error here) is returned, it will be converted to positive because of type casting. ssize_t is enough for this function. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-05-16 11:04:51 -07:00
Allison Henderson	9b940f8e8c	ext4: ext4_ext_convert_to_initialized bug found in extended FSX testing This patch addresses bugs found while testing punch hole with the fsx test. The patch corrects the number of blocks that are zeroed out while splitting an extent, and also corrects the return value to return the number of blocks split out, instead of the number of blocks zeroed out. This patch has been tested in addition to the following patches: [Ext4 punch hole v7] [XFS Tests Punch Hole 1/1 v2] Add Punch Hole Testing to FSX The test ran successfully for 24 hours. Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-16 10:11:09 -04:00
Amir Goldstein	0b26859027	ext4: fix oops in ext4_quota_off() If quota is not enabled when ext4_quota_off() is called, we must not dereference quota file inode since it is NULL. Check properly for this. This fixes a bug in commit `21f976975c` (ext4: remove unnecessary [cm]time update of quota file), which was merged for 2.6.39-rc3. Reported-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-16 09:59:13 -04:00
Artem Bityutskiy	bbf2b37a98	UBIFS: fix extremely rare mount failure This patch fixes an extremely rare mount failure after a power cut, when mount fails with ENOSPC error because UBIFS could not find the GC LEB. In short, the reason for this failure is that after recovery the GC head LEB contains less free space than it had contained just before the power cut happened. As a result, if the FS is full, 'ubifs_rcvry_gc_commit()' is unable to find a dirty LEB to GC and a free LEB, so mount fails. This patch contains a huge comment with more detailed explanation, please refer to that comment. Since this is really really rare and unlikely situation, I do not send this patch to the stable tree, also because it requires a lot of preparation patches which I did before. So sending this to -stable would be too risky. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 15:48:48 +03:00
Artem Bityutskiy	43e0707386	UBIFS: simplify LEB recovery function further Further simplify 'ubifs_recover_leb()' by noticing that we have to call 'clean_buf()' in any case, and it is fine to call it if the offset is aligned to 'c->min_io_size'. Thus, we do not have to call it separately from every "if" - just call it once at the end. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 15:48:48 +03:00
Artem Bityutskiy	7c47bfd0db	UBIFS: always cleanup the recovered LEB Now when we call 'ubifs_recover_leb()' only for LEBs which are potentially corrupted (i.e., only for last buds, not for all of them), we can cleanup every LEB, not only those where we find corruption. The reason - unstable bits. Even though the LEB may look good now, it might contain unstable bits which may hit us a bit later. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 15:48:48 +03:00
Artem Bityutskiy	6179920695	UBIFS: clean up LEB recovery function This patch cleans up 'ubifs_recover_leb()' function and makes it more readable. Move things which are done only once out of the loop and kill unneeded 'switch' statement. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 15:48:48 +03:00
Matthew L. Creech	9d510db423	UBIFS: fix-up free space on mount if flag is set If a UBIFS filesystem is being mounted read-write, or is being remounted from read-only to read-write, check for the "space_fixup" flag and fix all LEBs containing empty space if necessary. Artem: tweaked the patch a bit Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 14:12:15 +03:00
Matthew L. Creech	6554a65781	UBIFS: add the fixup function This patch adds the 'ubifs_fixup_free_space()' function which scans all LEBs in the filesystem for those that are in-use but have one or more empty pages, then re-maps the LEBs in order to erase the empty portions. Afterward it removes the "space_fixup" flag from the UBIFS superblock. Artem: massaged the patch Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 14:12:15 +03:00
Matthew L. Creech	9f58d3503a	UBIFS: add a superblock flag for free space fix-up The 'space_fixup' flag can be set in the superblock of a new filesystem by mkfs.ubifs to indicate that any eraseblocks with free space remaining should be fixed-up the first time it's mounted (after which the flag is un-set). This means that the UBIFS image has been flashed by a "dumb" flasher and the free space has been actually programmed (writing all 0xFFs), so this free space cannot be used. UBIFS fixes the free space up by re-writing the contents of all LEBs with free space using the atomic LEB change UBI operation. Artem: improved commit message, add some more commentaries to the code. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 14:12:14 +03:00
Artem Bityutskiy	e11602ea3e	UBIFS: share the next_log_lnum helper We'll need to use the 'next_log_lnum()' helper function from log.c in the fixup code, so let's move it to misc.h. IOW, this is a preparation to the following free space fixup changes. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 14:12:12 +03:00
Artem Bityutskiy	91c66083fc	UBIFS: expect corruption only in last journal head LEBs This patch improves UBIFS recovery and teaches it to expect corruption only in the last buds. Indeed, currently we just recover all buds, which is incorrect because only the last buds can have corruptions in case of a power cut. So it is inconsistent with the rest of the recovery strategy which tries hard to distinguish between corruptions caused by power cuts and other types of corruptions. This patch also adds one quirk - a bit older UBIFS could have corruption in the next to last bud because of the way it switched buds: when bud A is full, it first searched for the next bud B, then wrote a reference node to the log about B, and then synchronized the write-buffer of A. So we could end up with buds A and B, where B is the last, but A had corruption. The UBIFS behavior was fixed, though, so currently it always first synchronizes A's write-buffer and only after this adds B to the log. However, to make sure that we handle unclean (after a power cut) UBIFS images belonging to older UBIFS - we need to add a quirk and keep it for some time: we need to check for the situation described above. Thankfully, it is easy to check for that situation. When UBIFS adds B to the log, it always first unmaps B, then maps it, and then syncs A's write-buffer. Thus, in that situation we can check that B is empty, in which case it is OK to have corruption in A. To check that B is empty it is enough to just read the first few bytes of the bud and compare them with 0xFFs. This quirk may be removed in a couple of years. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 14:11:25 +03:00
Artem Bityutskiy	cb14a18465	UBIFS: synchronize write-buffer before switching to the next bud Currently when UBIFS fills up the current bud (which is the last in the journal head) and switches to the next bud, it first writes the log reference node for the next bud and only after this synchronizes the write-buffer of the previous bud. This is not a big deal, but an unclean power cut may lead to a situation when we have corruption in a next-to-last bud, although it is much more logical that we have to have corruption only in the last bud. This patch also removes write-buffer synchronization from 'ubifs_wbuf_seek_nolock()' because this is not needed anymore (we synchronize the write-buffer explicitly everywhere now) and also because this is just prone to various errors. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:41 +03:00
Artem Bityutskiy	c49139d809	UBIFS: remove BUG statement Remove a 'BUG()' statement when we are unable to find a bud and add a similar 'ubifs_assert()' statement instead. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:41 +03:00
Artem Bityutskiy	e76a452640	UBIFS: change bud replay function conventions This is a minor preparation patch which changes 'replay_bud()' interface - instead of passing bud lnum, offs, jhead, etc directly, pass a pointer to the bud entry which contains all the information. The bud entry will be also needed in one of the following patches. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:41 +03:00
Artem Bityutskiy	debf12d541	UBIFS: substitute the replay tree with a replay list This patch simplifies replay even further - it removes the replay tree and adds the replay list instead. Indeed, we just do not need to use a tree here - all we need to do is to add all nodes to the list and then sort it. Using RB-tree is an overkill - more code and slower. And since we replay buds in order, we expect the nodes to follow in _mostly_ sorted order, so the merge sort becomes much cheaper in average than an RB-tree. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:40 +03:00
Artem Bityutskiy	074bcb9b5c	UBIFS: simplify replay This patch simplifies the replay code and makes it smaller. First of all, we can notice that we do not really need to create bud replay entries and insert them to the replay tree, because the only reason we do this is to set buds lprops correctly at the end. Instead, we can just walk the list of buds at the very end and set lprops for each bud. This allows us to get rid of the whole 'insert_ref_node()' function, the 'REPLAY_REF' flag, and several fields in 'struct replay_entry'. Then we can also notice that we do not need the 'flags' 'struct replay_entry' field, because there is only one flag - 'REPLAY_DELETION'. Instead, we can just add a 'deletion' bit field. As a result, this patch deletes many more lines than it adds. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:40 +03:00
Artem Bityutskiy	af1dd41264	UBIFS: store free and dirty space in the bud replay entry This is just a small preparation patch which adds 'free' and 'dirty' fields to 'struct bud_entry'. They will be used to set bud lprops. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:40 +03:00
Artem Bityutskiy	12f338914e	UBIFS: remove unnecessary stack variable This patch removes an unnecessary 'offs' variable from 'ubifs_wbuf_write_nolock()' - we can just keep 'wbuf->offs' up-to-date instead. This patch is very minor; the only motivation for it was that it is cleaner to keep wbuf->offs up-to-date by the time we call 'ubifs_leb_write()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:40 +03:00
Artem Bityutskiy	7703f09ded	UBIFS: double check that buds are replayed in order Commit `52c6e6f990` provides misleading information in the commit message - buds are replayed in order. And the real reason why that fix helped is probably because it made sure we seek head even in read-only mode (so deferred recovery will have seeked heads). This patch adds an assertion which will fire if we replay buds out of order. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:40 +03:00
Artem Bityutskiy	e9ef7b5f25	UBIFS: make 2 functions static This is a minor change which makes 2 functions static because they are not used outside the gc.c file: 'data_nodes_cmp()' and 'nondata_nodes_cmp()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:39 +03:00
Artem Bityutskiy	7a9c3e3993	UBIFS: improve commentary This is a tiny clean-up patch which improves replay commentaries. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:39 +03:00
Artem Bityutskiy	c839e29768	UBIFS: improve debugging messages Print a bit more information in some recovery and replay paths. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:39 +03:00
Artem Bityutskiy	12346037a7	UBIFS: dump more in the lprops debugging check Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:39 +03:00
Artem Bityutskiy	34bdc3e257	UBIFS: simplify lprops debugging check Now we return all errors from 'scan_check_cb()' directly, so we do not need 'struct scan_check_data' any more, and this patch removes it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:39 +03:00
Artem Bityutskiy	dcc50c8ee3	UBIFS: simplify error path in lprops debugging check Simplify error path in 'scan_check_cb()' and stop using the special 'data->err' field, but instead return the error code directly. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:38 +03:00
Artem Bityutskiy	8ca5175b02	UBIFS: improve debugging lprops scanning a little When doing the lprops extra check ('dbg_check_lprops()') we scan the whole media. We even scan empty and freeable LEBs which may contain garbage, which we handle after scanning. This patch teaches the lprops checking function ('scan_check_cb()') to avoid scanning for free and freeable LEBs and save time. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-16 10:31:38 +03:00
Linus Torvalds	eed631e0d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix FS_IOC_SETFLAGS ioctl Btrfs: fix FS_IOC_GETFLAGS ioctl fs: remove FS_COW_FL Btrfs: fix easily get into ENOSPC in mixed case Prevent oopsing in posix_acl_valid()	2011-05-15 10:22:10 -07:00
Allison Henderson	6976a6f2ac	ext4: don't dereference null pointer when make_indexed_dir() fails Fix for a null pointer bug found while running punch hole tests Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-15 00:19:41 -04:00
Li Zefan	ebcb904dfe	Btrfs: fix FS_IOC_SETFLAGS ioctl Steps to reproduce the bug: - Call FS_IOC_SETFLAGS ioctl with flags=FS_COMPR_FL - Call FS_IOC_SETFLAGS ioctl with flags=0 - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set! Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:28 -04:00
Li Zefan	d0092bdda8	Btrfs: fix FS_IOC_GETFLAGS ioctl As we've added per file compression/cow support. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:27 -04:00
Li Zefan	e1e8fb6a1f	fs: remove FS_COW_FL FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:26 -04:00
liubo	1aba86d67f	Btrfs: fix easily get into ENOSPC in mixed case When a btrfs disk is created by mixed data & metadata option, it will have no pure data or pure metadata space info. In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920 (Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at the very beginning. The problem is that this initialization does not take the mixed case into account, which causes btrfs to easily get into ENOSPC in mixed case. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:26 -04:00
Daniel J Blueman	f5de939149	Prevent oopsing in posix_acl_valid() If posix_acl_from_xattr() returns an error code, a negative address is dereferenced causing an oops; fix by checking for error code first. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:18 -04:00
J Freyensee	7d74f492e4	export kernel call get_task_comm(). This allows drivers who call this function to be compiled modularly. Otherwise, a driver who is interested in this type of functionality has to implement their own get_task_comm() call, causing code duplication in the Linux source tree. Signed-off-by: J Freyensee <james_p_freyensee@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-05-13 16:30:59 -07:00
Stephen Boyd	c42d223714	debugfs: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning: In file included from arch/x86/include/asm/uaccess.h:573, from include/linux/uaccess.h:5, from include/linux/highmem.h:7, from include/linux/pagemap.h:10, from fs/debugfs/file.c:18: In function 'copy_from_user', inlined from 'write_file_bool' at fs/debugfs/file.c:435: arch/x86/include/asm/uaccess_64.h:65: warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct presumably due to buf_size being signed causing GCC to fail to see that buf_size can't become negative. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-05-13 16:15:35 -07:00
Greg Kroah-Hartman	82a3242e11	sysfs: remove "last sysfs file:" line from the oops messages On some arches (x86, sh, arm, unicore, powerpc) the oops message would print out the last sysfs file accessed. This was very useful in finding a number of sysfs and driver core bugs in the 2.5 and early 2.6 development days, but it has been a number of years since this file has actually helped in debugging anything that couldn't also be trivially determined from the stack traceback. So it's time to delete the line. This is good as we need all the space we can get for oops messages at times on consoles. Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-05-13 16:05:51 -07:00
Linus Torvalds	cf70cc5b9d	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4.1: Ensure that layoutget uses the correct gfp modes NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP	2011-05-13 15:19:39 -07:00
Linus Torvalds	26cf46be95	vfs: micro-optimize acl_permission_check() It's a hot function, and we're better off not mixing types in the mask calculations. The compiler just ends up mixing 16-bit and 32-bit operations, for no good reason. So do everything in 'unsigned int' rather than mixing 'unsigned int' masking with a 'umode_t' (16-bit) mode variable. This, together with the parent commit (`47a150edc2`: "Cache user_ns in struct cred") makes acl_permission_check() much nicer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-13 11:51:01 -07:00
Sunil Mushran	df016c665b	ocfs2/dlm: Target node death during resource migration leads to thread spin During resource migration, if the target node were to die, the thread doing the migration spins until the target node is not removed from the domain map. This patch slows the spin by making the thread wait for the recovery to kick in. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:27:30 -07:00
Sunil Mushran	10b3dd7611	ocfs2: Skip mount recovery for hard-ro mounts Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:27:14 -07:00
Sunil Mushran	33c12a5436	ocfs2/cluster: Heartbeat mismatch message improved If o2hb finds unexpected values in the heartbeat slot, it prints a message "ERROR: Device "dm-6": another node is heartbeating in our slot!" This message could be misleading. This patch adds two more messages to help users better diagnose the problem. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:27:02 -07:00
Sunil Mushran	76d9fc2954	ocfs2/cluster: Increase the live threshold for global heartbeat We have seen isolated cases (very few, I might add) of o2hb not detecting all live nodes on startup. One plausible reasoning for it is that other node had a hb io delay at the same time. The live threshold set at 2 (as low as it can be) could be increased to ameliorate the situation. But increasing the threshold directly affects mount time. Currently it takes around 5 secs to mount a volume in o2cb cluster with local heartbeat. Increasing the threshold will make mounts even slower. As the issue itself is rare, we have left things as they are for the local heartbeat mode. However we can improve the situation for global heartbeat mode as in that mode, we start the heartbeat much before the mount. This patch doubles the live threshold for the start of the first region in global heartbeat mode. Addresses internal Oracle bug#10635585. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:26:48 -07:00
Sunil Mushran	4da6dc2936	ocfs2/dlm: Use negotiated o2dlm protocol version Patch fixes a bug in the o2dlm protocol negotiation in that it is using the builtin version rather than the negotiated version during the domain join. This causes join errors when a node having kernel >= 2.6.37 joins a cluster with nodes having kernels < 2.6.37. This only affects the o2cb cluster stack. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reported-by: Jacek Stepniewski <Jacek.Stepniewski@agora.pl> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:26:37 -07:00
Tristan Ye	9a790ba1ec	ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes. In the case of removing a partial extent record which covers a hole, current punching-hole logic will try to remove more than the length of whole extent record, which leads to the failure of following assert(fs/ocfs2/alloc.c): 5507 BUG_ON(cpos < le32_to_cpu(rec->e_cpos) || trunc_range > rec_range); This patch tries to skip existing hole at the last attempt of removing a partial extent record, what's more, it also adds some necessary comments for better understanding of punching-hole codes. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:26:20 -07:00
Marcus Meissner	5d44670fac	ocfs2: Initialize data_ac (might be used uninitialized) CLANG found that there is a path that has data_ac uninitialized, this place 2917 /* This gets us the dx_root */ 2918 ret = ocfs2_reserve_new_metadata_blocks(osb, 1, &meta_ac); 2919 if (ret) { 3 Taking true branch 2920 mlog_errno(ret); 2921 goto out; 4 Control jumps to line 3168 2922 } Goes to the out: label without data_ac being initialized. Ciao, Marcus Signed-Off-By: Marcus Meissner <meissner@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-13 11:26:15 -07:00
Artem Bityutskiy	eaeee242c5	UBIFS: fix a rare memory leak in ro to rw remounting path When re-mounting from R/O mode to R/W mode and the LEB count in the superblock is not up to date because the underlying UBI volume became larger, we re-write the superblock. We allocate RAM for these purposes, but never free it. So this is a memory leak, although a very rare one. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org	2011-05-13 19:23:57 +03:00
Artem Bityutskiy	c1f1f91d21	UBIFS: fix inode size debugging check failure This patch fixes a problem with the following symptoms: UBIFS: deferred recovery completed UBIFS error (pid 15676): dbg_check_synced_i_size: ui_size is 11481088, synced_i_size is 11459081, but inode is clean UBIFS error (pid 15676): dbg_check_synced_i_size: i_ino 128, i_mode 0x81a4, i_size 11481088 It happens when additional debugging checks are enabled and we are recovering from a power cut. When we fixup corrupted inode size during recovery, we change them in-place and we change ui_size as well, but not synced_i_size, which causes this failure. This patch makes sure we change both fields and fixes the issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:57 +03:00
Artem Bityutskiy	45cd5cddbf	UBIFS: fix debugging FS checking failure When the debugging self-checks are enabled, we go through the whole file-system after mount and check/validate every single node referred to by the index. This is implemented by the 'dbg_check_filesystem()' function. However, this function fails if we mount "unclean" file-system, i.e., if we mount the file-system after a power cut. It fails with the following symptoms: UBIFS DBG (pid 8171): ubifs_recover_size: ino 937 size 3309925 -> 3317760 UBIFS: recovery deferred UBIFS error (pid 8171): check_leaf: data node at LEB 1000:0 is not within inode size 3309925 The reason of failure is that recovery fixed up the inode size in memory, but not on the flash so far. So the value on the flash is incorrect so far, and would be corrected when we re-mount R/W. But 'check_leaf()' ignores this fact and tries to validate the size of the on-flash inode, which is incorrect, so it fails. This patch teaches the checking code to look at the VFS inode cache first, and if there is the inode in question, use that inode instead of the inode on the flash media. This fixes the issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:57 +03:00
Artem Bityutskiy	69f8a75a7d	UBIFS: remove an unneeded check In 'ubifs_recover_size()' we have an "if (!e->inode && c->ro_mount)" statement. But if 'c->ro_mount' is true, then '!e->inode' must always be true as well. So we can remove the unnecessary '!e->inode' test and put an 'ubifs_assert(!e->inode)' instead. This patch also removes an extra trailing white-space in a debugging print, as well as adds few empty lines to 'ubifs_recover_size()' to make it a bit more readable. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:57 +03:00
Artem Bityutskiy	4c9545200a	UBIFS: fix debugging message When recovering the inode size, one of the debugging messages was printed incorrectly; this patch fixes it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:56 +03:00
Artem Bityutskiy	fe79c05f03	UBIFS: refactor ubifs_rcvry_gc_commit This commit refactors and cleans up 'ubifs_rcvry_gc_commit()' which was quite untidy, also removes the commentary which was not 100% correct. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:56 +03:00
Artem Bityutskiy	447442139c	UBIFS: split ubifs_rcvry_gc_commit Split the 'ubifs_rcvry_gc_commit()' function and introduce a 'grab_empty_leb()' helper. This cleans 'ubifs_rcvry_gc_commit()' a little and makes it a bit less of spaghetti. Also, add a commentary which explains why it is crucial to first search for an empty LEB and then run commit. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:56 +03:00
Artem Bityutskiy	ec06814265	UBIFS: dump the stack on errors in failure mode too When UBIFS is in the failure mode (used for power cut emulation testing) we for some reasons do not dump the stack in many places, e.g., in assertions. Probably at early days we had too many of them and disabled this to make the development easier, but then never enabled. Nowadays I sometimes observe assertion failures during power cut testing, but the useful stackdump is not printed, which is bad. This patch makes UBIFS always print the stackdump when debugging is enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:56 +03:00
Artem Bityutskiy	6d5904e062	UBIFS: print useful debugging messages when cannot recover gc_lnum If we fail to recover the gc_lnum we just return an error and then it is difficult to figure out why this happened. This patch adds useful debugging information which should make it easier to debug the failure. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:56 +03:00
Artem Bityutskiy	bcdca3e10a	UBIFS: remove dead GC LEB recovery piece of code This patch removes a piece of code in 'ubifs_rcvry_gc_commit()' which is never executed. We call 'ubifs_find_dirty_leb()' function with min_space = wbuf->offs, so if it returns us an LEB, it is guaranteed to have at least 'wbuf->offs' bytes of free+dirty space. So we can remove the subsequent code which deals with "returned LEB has less than 'wbuf->offs' bytes of free+dirty space". This simplifies 'ubifs_rcvry_gc_commit()' a little. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	2405f59481	UBIFS: remove duplicated code We have duplicated code in 'ubifs_garbage_collect()' and 'ubifs_rcvry_gc_commit()', which is about handling the special case of free LEB. In both cases we just want to garbage-collect the LEB using 'ubifs_garbage_collect_leb()'. This patch teaches 'ubifs_garbage_collect_leb()' to handle free LEB's so that the caller does not have to do this and the duplicated code is removed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	2cd0a60cf4	UBIFS: remove strange commentary Remove the following commentary from 'ubifs_file_mmap()': /* 'generic_file_mmap()' takes care of NOMMU case */ I do not understand what it means, and I could not find anything related to NOMMU in 'generic_file_mmap()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	341e262f90	UBIFS: do not change debugfs file position This patch is a tiny improvement which removes few bytes of code. UBIFS debugfs files are non-seekable and the file position is ignored, so do not increase it in the write handler. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	1321657d8f	UBIFS: fix oops in lprops dump function The 'dbg_dump_lprop()' is trying to detect journal head LEBs when printing, so it looks at the write-buffers. However, if we are in R/O mode, we de-allocate the write-buffers, so 'dbg_dump_lprop()' oopses. This patch fixes the issue. Note, this patch is not critical, it is only about the debugging code path, and it is unlikely that anyone but UBIFS developers would ever hit this issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	3b2f9a019e	UBIFS: use ro_mount instead of MS_RDONLY We have our own flags indicating R/O mode, and c->ro_mode is equivalent to MS_RDONLY. Let's be consistent and use UBIFS flags everywhere. This patch is just a minor cleanup. Additionally, add a comment that we are surprised with VFS behavior - as a reminder to look at this some day. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:55 +03:00
Artem Bityutskiy	1a29af8bd7	UBIFS: use EROFS when emulating failures When the debugging failure emulation is enabled and UBIFS decides to emulate an I/O error, it uses EIO error code. In which case UBIFS switches into R/O mode later on. The result for the user-space is that when a failure is emulated, the file-system sometimes returns EIO and sometimes EROFS. This makes it more difficult to implement user-space tests for the failure mode. Let's be consistent and return EROFS in all the cases. This patch is an improvement for the debugging code and does not affect the functionality at all. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:54 +03:00
Sedat Dilek	14ffd5d0b0	UBIFS: make xattr operations names consistent This is just a tiny clean-up patch. The variable name for empty address space operations is "empty_aops". Let's use consistent names for empty inode and file operations: "empty_iops" and "empty_fops", instead of inconsistent "none_inode_operations" and "none_file_operations". Artem: re-write the commit message. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:54 +03:00
Artem Bityutskiy	cdd8ad6e9e	UBIFS: introduce lsave debugging Try to improve UBIFS testing coverage by randomly picking LEBs to store in lsave, rather than picking them optimally. Create a debugging version of 'populate_lsave()' for these purposes and enable it when general debugging self-checks are enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:54 +03:00
Artem Bityutskiy	bc3f07f090	UBIFS: make force in-the-gaps to be a general self-check UBIFS can force itself to use the 'in-the-gaps' commit method - the last resort method which is normally invoked very very rarely. Currently this "force in-the-gaps" debugging feature is a separate test mode. But it is a bit saner to make it to be the "general" self-test check instead. This patch is just a clean-up which should make the debugging code look a bit nicer and easier to use - we have way too many debugging options. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:54 +03:00
Artem Bityutskiy	f1bd66afb1	UBIFS: improve space checking debugging feature This patch improves the 'dbg_check_space_info()' function which checks whether the amount of space before re-mounting and after re-mounting is the same (remounting from R/O to R/W modes and vice-versa). The problem is that 'dbg_check_space_info()' does not save the budgeting information before re-mounting, so when an error is reported, we do not know why the amount of free space changed. This patch makes the following changes: 1. Teaches 'dbg_dump_budg()' function to accept a 'struct ubifs_budg_info' argument and print out this argument. This way we may ask it to print any saved budgeting info, not only the current one. 2. Accordingly changes all the callers of 'dbg_dump_budg()' to comply with the changed interface. 3. Introduce a 'saved_bi' (saved budgeting info) field to 'struct ubifs_debug_info' and save the budgeting info before re-mounting there. 4. Change 'dbg_check_space_info()' and make it print both old and new budgeting information. 5. Additionally, save 'c->igx_gc_cnt' and print it if an error happens. This value contributes to the amount of free space, so we have to print it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:54 +03:00
Artem Bityutskiy	8c3067e445	UBIFS: rearrange the budget dump Re-arrange the budget dump and make sure we first dump all the 'struct ubifs_budg_info' fields, and then the other information. Additionally, print the 'uncommitted_idx' variable. This change is required for the following dumping function enhancement where it will be possible to dump saved 'struct ubifs_budg_info' objects, not only the current one. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Artem Bityutskiy	8ff83089f8	UBIFS: simplify dbg_dump_budg calling conventions The current 'dbg_dump_budg()' calling convention is that the 'c->space_lock' spinlock is held. However, none of the callers actually use it from contexts which have 'c->space_lock' locked, so all callers have to explicitly lock and unlock the spinlock. This is not very sensible convention. This patch changes it and makes 'dbg_dump_budg()' lock the spinlock instead of imposing this to the callers. This simplifies the code a little. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Artem Bityutskiy	b137545c44	UBIFS: introduce a separate structure for budgeting info This patch separates out all the budgeting-related information from 'struct ubifs_info' to 'struct ubifs_budg_info'. This way the code looks a bit cleaner. However, the main driver for this is that we want to save budgeting information and print it later, so a separate data structure for this is helpful. This patch is a preparation for the further debugging output improvements. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Artem Bityutskiy	cc64f774b4	UBIFS: use __packed instead of __attribute__((packed)) There was an attempt to standardize various "__attribute__" and other macros in order to have potentially portable and more consistent code, see commit `82ddcb0405`. Note, that commit refers to Robert Love's blog post, but the URL is broken, the valid URL is: http://blog.rlove.org/2005/10/with-little-help-from-your-compiler.html Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)): "WARNING: __packed is preferred over __attribute__((packed))" It is not a big deal for UBIFS to use __packed, so let's do it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Artem Bityutskiy	c43615702f	UBIFS: fix minor stylistic issues Fix several minor stylistic issues: * lines longer than 80 characters * space before closing parenthesis ')' * spaces in the indentations Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Artem Bityutskiy	1bbfc848a9	UBIFS: make debugfs files non-seekable Turn the debugfs files UBIFS maintains into non-seekable. Indeed, none of them is supposed to be seek'ed. Do this by making the '.lseek()' handler to be 'no_llseek()' and by using 'nonseekable_open()' in the '.open()' operation. This does mean an API break but this debugging API is only used by a couple of test scripts which do not rely on the 'llseek()' operation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-05-13 19:23:53 +03:00
Arne Jansen	73c5de0051	btrfs: quasi-round-robin for chunk allocation In a multi device setup, the chunk allocator currently always allocates chunks on the devices in the same order. This leads to a very uneven distribution, especially with RAID1 or RAID10 and an uneven number of devices. This patch always sorts the devices before allocating, and allocates the stripes on the devices with the most available space, as long as there is enough space available. In a low space situation, it first tries to maximize striping. The patch also simplifies the allocator and reduces the checks for corner cases. The simplification is done by several means. First, it defines the properties of each RAID type upfront. These properties are used afterwards instead of differentiating cases in several places. Second, the old allocator defined a minimum stripe size for each block group type, tried to find a large enough chunk, and if this fails just allocates a smaller one. This is now done in one step. The largest possible chunk (up to max_chunk_size) is searched and allocated. Because we now have only one pass, the allocation of the map (struct map_lookup) is moved down to the point where the number of stripes is already known. This way we avoid reallocation of the map. We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.	2011-05-13 15:36:14 +02:00
Arne Jansen	a9c9bf6827	btrfs: heed alloc_start currently alloc_start is disregarded if the requested chunk size is bigger than (device size - alloc_start), but smaller than the device size. The only situation where I see this could have made sense was when a chunk equal the size of the device has been requested. This was possible as the allocator failed to take alloc_start into account when calculating the request chunk size. As this gets fixed by this patch, the workaround is not necessary anymore.	2011-05-13 15:36:12 +02:00
Arne Jansen	bcd53741cc	btrfs: move btrfs_cmp_device_free_bytes to super.c this function won't be used here anymore, so move it to super.c where it is used for df-calculation	2011-05-13 15:36:05 +02:00
Steven Whitehouse	f2741d9898	GFS2: Move all locking inside the inode creation function Now that there are no longer any exceptions to the normal inode creation code path, we can move the parts of the locking code which were duplicated in mkdir/mknod/create/symlink into the inode create function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 12:11:17 +01:00
Steven Whitehouse	160b4026dc	GFS2: Clean up symlink creation This moves the symlink specific parts of inode creation into the function where we initialise the rest of the dinode. As a result we have one less place where we need to look up the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 10:34:59 +01:00
Steven Whitehouse	e2d0a13bba	GFS2: Clean up mkdir This moves the initialisation of the directory into the inode creation functions to avoid having to duplicate the lookup of the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-13 09:55:55 +01:00
David Sterba	4ea028859b	btrfs: use unsigned type for single bit bitfield Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-12 18:14:53 +02:00
David Sterba	7a36ddec10	btrfs: use printk_ratelimited instead of printk_ratelimit As per printk_ratelimit comment, it should not be used. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-12 18:08:38 +02:00
Linus Torvalds	6eaed0a438	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix oops in revalidate when called with NULL nameidata	2011-05-12 08:06:53 -07:00
Arne Jansen	8628764e1a	btrfs: add readonly flag setting the readonly flag prevents writes in case an error is detected Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:48:31 +02:00
Ilya Dryomov	96e369208e	btrfs scrub: make fixups sync btrfs scrub - make fixups sync, don't reuse fixup bios Fixups are already sync for csum failures, this patch makes them sync for EIO case as well. Fixups are now sharing pages with the parent sbio - instead of allocating a separate page to do a fixup we grab the page from the sbio buffer. Fixup bios are no longer reused. struct fixup is no longer needed, instead pass [sbio pointer, index]. Originally this was added to look at the possibility of sharing the code between drive swap and scrub, but it actually fixes a serious bug in scrub code where errors that could be corrected were ignored and reported as uncorrectable. btrfs scrub - restore bios properly after media errors The current code reallocates a bio after a media error. This is a temporary measure introduced in v3 after a serious problem related to bio reuse was found in v2 of scrub patchset. Basically we did not reset bv_offset and bv_len fields of the bio_vec structure. They are changed in case I/O error happens, for example, at offset 512 or 1024 into the page. Also bi_flags field wasn't properly setup before reusing the bio. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:48:28 +02:00
Jan Schmidt	475f63874d	btrfs: new ioctls for scrub adds ioctls necessary to start and cancel scrubs, to get current progress and to get info about devices to be scrubbed. Note that the scrub is done per-device and that the ioctl only returns after the scrub for this device is finished or has been canceled. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:45:38 +02:00
Arne Jansen	a2de733c78	btrfs: scrub This adds an initial implementation for scrub. It works quite straightforward. The usermode issues an ioctl for each device in the fs. For each device, it enumerates the allocated device chunks. For each chunk, the contained extents are enumerated and the data checksums fetched. The extents are read sequentially and the checksums verified. If an error occurs (checksum or EIO), a good copy is searched for. If one is found, the bad copy will be rewritten. All enumerations happen from the commit roots. During a transaction commit, the scrubs get paused and afterwards continue from the new roots. This commit is based on the series originally posted to linux-btrfs with some improvements that resulted from comments from David Sterba, Ilya Dryomov and Jan Schmidt. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:45:20 +02:00
Trond Myklebust	a75b9df9d3	NFSv4.1: Ensure that layoutget uses the correct gfp modes Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 22:52:13 -04:00
Linus Torvalds	3568bd9720	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: do not use i_wrbuffer_ref as refcount for Fb cap ceph: fix list_add in ceph_put_snap_realm ceph: print debug message before put mds session	2011-05-11 19:13:34 -07:00
Andy Adamson	2887fe4552	NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list Prevents an infinite loop as list was never emptied. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 14:20:13 -04:00
Andy Adamson	a8a4ae3a89	NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP Free the slot and resend the RPC with new session <slot#,seq#>. For nfs4_async_handle_error, return -EAGAIN and set the task->tk_status to 0 to restart the async rpc in the rpc_restart_call_prepare state which resets the slot. For nfs4_handle_exception, retrying a call that uses nfs4_call_sync will reset the slot via nfs41_call_sync_prepare. For open/close/lock/locku/delegreturn/layoutcommit/unlink/rename/write cachethis is true, so these operations will not trigger an NFS4ERR_RETRY_UNCACHED_REP. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 14:01:33 -04:00
Henry C Chang	d3d0720d4a	ceph: do not use i_wrbuffer_ref as refcount for Fb cap We increment i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, and ceph client hangs. This bug can be reproduced occasionally by running blogbench. Add a new field i_wb_ref to inode and dedicate it to Fb reference counting. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:48 -07:00
Henry C Chang	a26a185d27	ceph: fix list_add in ceph_put_snap_realm Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:36 -07:00
Henry C Chang	7d8e18a69d	ceph: print debug message before put mds session The mds session, s, could be freed during ceph_put_mds_session. Move dout before ceph_put_mds_session. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-11 10:44:34 -07:00
Eric W. Biederman	a00eaf11a2	ns proc: Add support for the ipc namespace Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-10 14:35:47 -07:00
Eric W. Biederman	34482e89a5	ns proc: Add support for the uts namespace Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-10 14:35:35 -07:00
Eric W. Biederman	13b6f57623	ns proc: Add support for the network namespace. Implementing file descriptors for the network namespace is simple and straight forward. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-10 14:34:26 -07:00
Eric W. Biederman	6b4e306aa3	ns: proc files for namespace naming policy. Create files under /proc/<pid>/ns/ to allow controlling the namespaces of a process. This addresses three specific problems that can make namespaces hard to work with. - Namespaces require a dedicated process to pin them in memory. - It is not possible to use a namespace unless you are the child of the original creator. - Namespaces don't have names that userspace can use to talk about them. The namespace files under /proc/<pid>/ns/ can be opened and the file descriptor can be used to talk about a specific namespace, and to keep the specified namespace alive. A namespace can be kept alive by either holding the file descriptor open or bind mounting the file someplace else. aka: mount --bind /proc/self/ns/net /some/filesystem/path mount --bind /proc/self/fd/<N> /some/filesystem/path This allows namespaces to be named with userspace policy. It requires additional support to make use of these filedescriptors and that will be coming in the following patches. Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-10 14:31:44 -07:00
Robert P. J. Day	1f8e1cdac6	SYSFS: Fix erroneous comments for sysfs_update_group(). Fix what is clearly a simple copy-and-paste error in commenting the sysfs_update_group() routine. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-05-10 14:22:00 -07:00
Linus Torvalds	675badfc48	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix race condition in AIL push trigger xfs: make AIL target updates and compares 32bit safe. xfs: always push the AIL to the target xfs: exit AIL push work correctly when AIL is empty xfs: ensure reclaim cursor is reset correctly at end of AG	2011-05-10 11:56:35 -07:00
Miklos Szeredi	d24339059d	fuse: fix oops in revalidate when called with NULL nameidata Some cases (e.g. ecryptfs) can call ->dentry_revalidate with NULL nameidata. https://bugzilla.kernel.org/show_bug.cgi?id=34732 Tyler Hicks pointed out that this bug was introduced by commit `e7c0a16786` "fuse: make fuse_dentry_revalidate() RCU aware" Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-05-10 17:35:58 +02:00
Steven Whitehouse	32e471ef10	GFS2: Use UUID field in generic superblock The VFS superblock structure now has a UUID field, so we can use that in preference to the UUID field in the GFS2 superblock now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-10 15:01:59 +01:00
Ryusuke Konishi	5fc7b14177	nilfs2: use mark_buffer_dirty to mark btnode or meta data dirty This replaces nilfs_mdt_mark_buffer_dirty and nilfs_btnode_mark_dirty macros with mark_buffer_dirty and gets rid of nilfs_mark_buffer_dirty, an own mark buffer dirty function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:57 +09:00
Ryusuke Konishi	aa405b1f42	nilfs2: always set back pointer to host inode in mapping->host In the current nilfs, page cache for btree nodes and meta data files do not set a valid back pointer to the host inode in mapping->host. This will change it so that every address space in nilfs uses mapping->host to hold its host inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:57 +09:00
Ryusuke Konishi	0ef28f9aec	nilfs2: get rid of NILFS_I_NILFS This replaces all references of NILFS_I_NILFS(inode)->ns_bdev with inode->i_sb->s_bdev and unfolds remaining uses of NILFS_I_NILFS inline function. Before 2.6.37, referring to a nilfs object from inodes needed a conditional judgement, and NILFS_I_NILFS was helpful to simplify it. But now we can simply do it by going through a super block instance like inode->i_sb->s_fs_info. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:56 +09:00
Ryusuke Konishi	0cc1283881	nilfs2: use list_first_entry This uses list_first_entry macro instead of list_entry if it's used to get the first entry. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:56 +09:00
Ryusuke Konishi	293ce0ed8c	nilfs2: use empty_aops for gc-inodes Applies empty_aops for address space operations of gc-inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:56 +09:00
Ryusuke Konishi	4e33f9eab0	nilfs2: implement resize ioctl This adds resize ioctl which makes online resize possible. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:46 +09:00
Ryusuke Konishi	78eb64c247	nilfs2: add truncation routine of segment usage file When shrinking the filesystem, segments to be truncated must be tested to see if they are busy or not, and unneeded sufile blocks should be deleted. This adds routines for the truncation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:45 +09:00
Ryusuke Konishi	cfb0a4bfd8	nilfs2: add routine to move secondary super block After resizing the filesystem, the secondary super block must be moved to a new location. This adds a helper function for this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:45 +09:00
Ryusuke Konishi	619205da5b	nilfs2: add ioctl which limits range of segment to be allocated This adds a new ioctl command which limits range of segment to be allocated. This is intended to gather data within a range of the partition before shrinking the filesystem, or to control new log location for some purpose. If a range is specified by the ioctl, segment allocator of nilfs tries to allocate new segments from the range unless no free segments are available there. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:45 +09:00
Ryusuke Konishi	56eb553885	nilfs2: zero fill unused portion of super root block The super root block is newly-allocated each time it is written back to disk, so unused portion of the block should be cleared. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:45 +09:00
Ryusuke Konishi	6c6de1aa65	nilfs2: super root size should change depending on inode size The size of super root structure depends on inode size, so NILFS_SR_BYTES macro should be a function of the inode size. This fixes the issue. Even though a different size value will be written for a possible future filesystem with extended inode, but fortunately this does not break disk format compatibility. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:44 +09:00
Ryusuke Konishi	1cb2d38cb3	nilfs2: get rid of private page allocator Previously, nilfs was cloning pages for mmapped region to freeze their data and ensure consistency of checksum during writeback cycles. A private page allocator was used for this page cloning. But, we no longer need to do that since clear_page_dirty_for_io function sets up pte so that vm_ops->page_mkwrite function is called right before the mmapped pages are modified and nilfs_page_mkwrite function can safely wait for the pages to be written back to disk. So, this stops making a copy of mmapped pages during writeback, and eliminates the private page allocation and deallocation functions from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:44 +09:00
Nicolas Kaiser	eaae0f37d8	nilfs2: merge list_del()/list_add_tail() to list_move_tail() Merge list_del() + list_add_tail() to list_move_tail(). Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:21:44 +09:00
Ryusuke Konishi	349dbc3669	nilfs2: fix infinite loop in nilfs_palloc_freev function After having applied commit `9954e7af14` ("nilfs2: add free entries count only if clear bit operation succeeded"), a free routine of nilfs came to fall into an infinite loop, outputting the same message endlessly: nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed ... That patch broke the routine so that a loop counter is never updated in an abnormal state. This fixes the regression. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-05-10 22:19:50 +09:00
Steven Whitehouse	2ab9cd1c63	GFS2: Rename ops_inode.c to inode.c This is the final part of the ops_inode.c/inode.c reordering. We are left with a single file called inode.c which now contains all the inode operations, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-10 13:12:49 +01:00
Steven Whitehouse	64ea540258	GFS2: Inode.c is empty now, remove it Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-10 13:09:53 +01:00
Justin P. Mattock	70f23fd66b	treewide: fix a few typos in comments - kenrel -> kernel - whetehr -> whether - ttt -> tt - sss -> ss Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-05-10 10:16:21 +02:00
Amir Goldstein	44183d4231	ext4: remove alloc_semp After taking care of all group init races, all that remains is to remove alloc_semp from ext4_allocation_context and ext4_buddy structs. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-09 21:52:36 -04:00
Amir Goldstein	9b8b7d353f	ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches After online resize which adds new groups, some of the groups in a buddy page may be initialized and uptodate, while other (new ones) may be uninitialized. The indication for init of new block groups is when ext4_mb_init_cache() is called with an uptodate buddy page. In this case, initialized groups on that buddy page must be skipped when initializing the buddy cache. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-09 21:49:42 -04:00
Amir Goldstein	2de8807b25	ext4: synchronize ext4_mb_init_group() with buddy page lock The old routines ext4_mb_[get\|put]_buddy_cache_lock(), which used to take grp->alloc_sem for all groups on the buddy page have been replaced with the routines ext4_mb_[get\|put]_buddy_page_lock(). The new routines take both buddy and bitmap page locks to protect against concurrent init of groups on the same buddy page. The GROUP_NEED_INIT flag is tested again under page lock to check if the group was initialized by another caller. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-09 21:48:13 -04:00
Amir Goldstein	e73a347b77	ext4: implement ext4_add_groupblocks() by freeing blocks The old implementation used to take grp->alloc_sem and set the GROUP_NEED_INIT flag, so that the buddy cache would be reloaded. The new implementation updates the buddy cache by freeing the added blocks and making them available for use, so there is no need to reload the buddy cache and there is no need to take grp->alloc_sem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-09 21:40:01 -04:00
Dave Chinner	7ac956576d	xfs: fix race condition in AIL push trigger The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One is caused by a race condition in determining whether there is a push in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows: Thread 1 push work smp_wmb() smp_rmb() check ailp->xa_target unchanged update ailp->xa_target test/set PUSHING bit does not queue clear PUSHING bit does not requeue Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> (cherry picked from commit `e4d3c4a43b`)	2011-05-09 18:35:04 -05:00
Dave Chinner	fe0da76731	xfs: make AIL target updates and compares 32bit safe. The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> (cherry picked from commit `fd5670f22f`)	2011-05-09 18:35:04 -05:00
Dave Chinner	50e86686df	xfs: always push the AIL to the target The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> (cherry picked from commit `cb64026b6e`)	2011-05-09 18:35:03 -05:00
Dave Chinner	9e7004e741	xfs: exit AIL push work correctly when AIL is empty The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> (cherry picked from commit `ea35a20021`)	2011-05-09 18:35:03 -05:00
Dave Chinner	228d62dd3f	xfs: ensure reclaim cursor is reset correctly at end of AG On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-By: Christian Kujau <lists@nerdbynature.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> (cherry picked from commit `b223221956`)	2011-05-09 18:35:03 -05:00
Mikulas Patocka	a09a79f668	Don't lock guardpage if the stack is growing up Linux kernel excludes guard page when performing mlock on a VMA with down-growing stack. However, some architectures have up-growing stack and locking the guard page should be excluded in this case too. This patch fixes lvm2 on PA-RISC (and possibly other architectures with up-growing stack). lvm2 calculates number of used pages when locking and when unlocking and reports an internal error if the numbers mismatch. [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the grows-up case, and to move things around a bit to clean it all up and share the infrastructure with the /proc bits. Tested on ia64 that has both grow-up and grow-down segments - Linus ] Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-09 16:22:07 -07:00
Dave Chinner	e4d3c4a43b	xfs: fix race condition in AIL push trigger The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One is caused by a race condition in determining whether there is a push in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows (steps in time order): Thread 1: smp_wmb(); push work: smp_rmb(); push work: check ailp->xa_target unchanged; Thread 1: update ailp->xa_target; Thread 1: test/set PUSHING bit; Thread 1: does not queue; push work: clear PUSHING bit; push work: does not requeue. Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-05-09 12:17:04 -05:00
Dave Chinner	fd5670f22f	xfs: make AIL target updates and compares 32bit safe. The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have a function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-05-09 12:17:04 -05:00
Dave Chinner	cb64026b6e	xfs: always push the AIL to the target The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-05-09 12:17:04 -05:00
Dave Chinner	ea35a20021	xfs: exit AIL push work correctly when AIL is empty The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-05-09 12:17:03 -05:00
Dave Chinner	b223221956	xfs: ensure reclaim cursor is reset correctly at end of AG On a 32 bit highmem PowerPC machine, the XFS inode cache was growing without bound and exhausting low memory causing the OOM killer to be triggered. After some effort, the problem was reproduced on a 32 bit x86 highmem machine. The problem is that the per-ag inode reclaim index cursor was not getting reset to the start of the AG if the radix tree tag lookup found no more reclaimable inodes. Hence every further reclaim attempt started at the same index beyond where any reclaimable inodes lay, and no further background reclaim ever occurred from the AG. Without background inode reclaim the VM driven cache shrinker simply cannot keep up with cache growth, and OOM is the result. While the change that exposed the problem was the conversion of the inode reclaim to use work queues for background reclaim, it was not the cause of the bug. The bug was introduced when the cursor code was added, just waiting for some weird configuration to strike.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-By: Christian Kujau <lists@nerdbynature.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-05-09 12:17:03 -05:00
Linus Torvalds	7f4238a0ef	Merge branch 'hpfs' * hpfs: HPFS: Remove unused variable HPFS: Move declaration up, so that there are no out-of-scope pointers HPFS: Fix some unaligned accesses HPFS: Fix endianity. Make hpfs work on big-endian machines HPFS: Implement fsync for hpfs HPFS: Fix a bug that filesystem was not marked dirty when remounting it HPFS: Restrict uid and gid to 16-bit values HPFS: When marking or clearing the dirty bit, sync the filesystem HPFS: Use types with defined width HPFS: Remove mark_inode_dirty HPFS: Remove CR/LF conversion option HPFS: Remove remaining locks HPFS: Introduce a global mutex and lock it on every callback from VFS. HPFS: Make HPFS compile on preempt and SMP	2011-05-09 09:07:55 -07:00
Mikulas Patocka	88f4e9e870	HPFS: Remove unused variable Remove unused variable Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-09 09:04:24 -07:00
Mikulas Patocka	c351481744	HPFS: Move declaration up, so that there are no out-of-scope pointers Move declaration up, so that there are no out-of-scope pointers Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-09 09:04:24 -07:00
Mikulas Patocka	d0969d1949	HPFS: Fix some unaligned accesses Fix some unaligned accesses Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-09 09:04:24 -07:00
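
The stall described in the "xfs: always push the AIL to the target" entries above can be reproduced in a toy model. The following is a minimal sketch in Python, not kernel code: `push_ail` and `trigger_would_queue` are hypothetical names, and the AIL is reduced to a sorted list of integer LSNs. It only illustrates why a strict `<` comparison in the push loop, combined with a trigger that fires only when the target increases, leaves items whose LSN equals the target unpushed.

```python
def push_ail(ail_lsns, target, inclusive):
    """Run one pass of a simplified push loop (illustrative model only).

    ail_lsns:  LSNs of log items, sorted ascending (tail of the AIL first).
    target:    current push target.
    inclusive: False models the old 'LSN < target' test,
               True models the fixed 'LSN <= target' test.
    Returns a (pushed, remaining) pair of lists.
    """
    pushed, remaining = [], []
    for lsn in ail_lsns:
        should_push = lsn <= target if inclusive else lsn < target
        (pushed if should_push else remaining).append(lsn)
    return pushed, remaining


def trigger_would_queue(new_target, current_target):
    # Per the commit message, the trigger only queues push work
    # when the target increases (new target > current).
    return new_target > current_target
```

With a target of 10 and tail items that all carry LSN 10, `push_ail([10, 10, 12], 10, inclusive=False)` pushes nothing, and `trigger_would_queue(10, 10)` is false, so nothing restarts the work until the target reaches at least 11. With `inclusive=True` the two items at LSN 10 are pushed.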

23026 Commits