Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
  category, of note is a fix for on-line resizing file systems where the
  block size is smaller than the page size (i.e., file systems 1k blocks
  on x86, or more interestingly file systems with 4k blocks on Power or
  ia64 systems.)

  In the cleanup category, ext4's punch hole implementation was
  significantly improved by Lukas Czerner, and now supports bigalloc
  file systems.  In addition, Jan Kara significantly cleaned up the
  write submission code path.  We also improved error checking and added
  a few sanity checks.

  In the optimizations category, two major optimizations deserve
  mention.  The first is that ext4_writepages() is now used for
  nodelalloc and ext3 compatibility mode.  This allows writes to be
  submitted much more efficiently as a single bio request, instead of
  being sent as individual 4k writes into the block layer (which then
  relied on the elevator code to coalesce the requests in the block
  queue).  Secondly, the extent cache shrink mechanism, which was
  introduced in 3.9, no longer has a scalability bottleneck caused by
  the i_es_lru spinlock.  Other optimizations include some changes to
  reduce CPU usage and to avoid issuing empty commits unnecessarily."
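
The bio batching described above has a direct user-space analogue: one
vectored request replaces many small ones, saving per-request overhead.
The sketch below is illustrative only (the file path and buffer sizes
are made up) and models the effect, not ext4's actual bio code:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Write four "pages" with one syscall instead of four. */
    int main(void)
    {
            char buf[4][4096];
            struct iovec iov[4];
            int fd, i;

            fd = open("/tmp/batch-demo", O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0)
                    return 1;

            for (i = 0; i < 4; i++) {
                    memset(buf[i], 'a' + i, sizeof(buf[i]));
                    iov[i].iov_base = buf[i];
                    iov[i].iov_len = sizeof(buf[i]);
            }

            /* One 16k request; the analogue of submitting a single bio. */
            if (writev(fd, iov, 4) < 0)
                    return 1;
            return close(fd);
    }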

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  ext4: optimize starting extent in ext4_ext_rm_leaf()
  jbd2: invalidate handle if jbd2_journal_restart() fails
  ext4: translate flag bits to strings in tracepoints
  ext4: fix up error handling for mpage_map_and_submit_extent()
  jbd2: fix theoretical race in jbd2__journal_restart
  ext4: only zero partial blocks in ext4_zero_partial_blocks()
  ext4: check error return from ext4_write_inline_data_end()
  ext4: delete unnecessary C statements
  ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
  jbd2: move superblock checksum calculation to jbd2_write_superblock()
  ext4: pass inode pointer instead of file pointer to punch hole
  ext4: improve free space calculation for inline_data
  ext4: reduce object size when !CONFIG_PRINTK
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time
  ext4: implement error handling of ext4_mb_new_preallocation()
  ext4: fix corruption when online resizing a fs with 1K block size
  ext4: delete unused variables
  ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
  jbd2: remove debug dependency on debug_fs and update Kconfig help text
  jbd2: use a single printk for jbd_debug()
  ...
Commit 9e239bb939 by Linus Torvalds, 2013-07-02 09:39:34 -07:00
63 changed files with 2724 additions and 2242 deletions

--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -189,7 +189,7 @@ prototypes:
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata);
 	sector_t (*bmap)(struct address_space *, sector_t);
-	int (*invalidatepage) (struct page *, unsigned long);
+	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -310,8 +310,8 @@ filesystems and by the swapper. The latter will eventually go away. Please,
 keep it that way and don't breed new callers.

 	->invalidatepage() is called when the filesystem must attempt to drop
 some or all of the buffers from the page when it is being truncated. It
 returns zero on success. If ->invalidatepage is zero, the kernel uses
 block_invalidatepage() instead.

 	->releasepage() is called when the kernel is about to try to drop the

--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -549,7 +549,7 @@ struct address_space_operations
 -------------------------------

 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.22, the following members are defined:
+your filesystem. The following members are defined:

 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -566,7 +566,7 @@ struct address_space_operations {
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata);
 	sector_t (*bmap)(struct address_space *, sector_t);
-	int (*invalidatepage) (struct page *, unsigned long);
+	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
 	int (*releasepage) (struct page *, int);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
@@ -685,14 +685,14 @@ struct address_space_operations {
   invalidatepage: If a page has PagePrivate set, then invalidatepage
 	will be called when part or all of the page is to be removed
 	from the address space.  This generally corresponds to either a
-	truncation or a complete invalidation of the address space
-	(in the latter case 'offset' will always be 0).
-	Any private data associated with the page should be updated
-	to reflect this truncation.  If offset is 0, then
-	the private data should be released, because the page
-	must be able to be completely discarded.  This may be done by
-	calling the ->releasepage function, but in this case the
+	truncation, punch hole or a complete invalidation of the address
+	space (in the latter case 'offset' will always be 0 and 'length'
+	will be PAGE_CACHE_SIZE). Any private data associated with the page
+	should be updated to reflect this truncation. If offset is 0 and
+	length is PAGE_CACHE_SIZE, then the private data should be released,
+	because the page must be able to be completely discarded. This may
+	be done by calling the ->releasepage function, but in this case the
 	release MUST succeed.

   releasepage: releasepage is called on PagePrivate pages to indicate
 	that the page should be freed if possible.  ->releasepage
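
The documented contract reduces to one test: per-page private data may
only be dropped when the whole page is covered. A user-space model of
that decision follows; the PAGE_CACHE_SIZE value and function names
here are stand-ins for illustration, not kernel code:

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_CACHE_SIZE 4096u   /* stand-in for the kernel constant */

    /* Model of an ->invalidatepage(page, offset, length) callback:
     * release per-page private state only on a full-page invalidation. */
    static bool should_release_private(unsigned int offset, unsigned int length)
    {
            return offset == 0 && length == PAGE_CACHE_SIZE;
    }

    int main(void)
    {
            /* whole-page truncate vs. a 1k punch hole inside the page */
            printf("full page:  %d\n", should_release_private(0, 4096));    /* 1 */
            printf("punch hole: %d\n", should_release_private(1024, 1024)); /* 0 */
            return 0;
    }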

--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -148,13 +148,14 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
  * @offset: offset in the page
  */

-static void v9fs_invalidate_page(struct page *page, unsigned long offset)
+static void v9fs_invalidate_page(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	/*
 	 * If called with zero offset, we should release
 	 * the private state assocated with the page
 	 */
-	if (offset == 0)
+	if (offset == 0 && length == PAGE_CACHE_SIZE)
 		v9fs_fscache_invalidate_page(page);
 }

--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -19,7 +19,8 @@
 #include "internal.h"

 static int afs_readpage(struct file *file, struct page *page);
-static void afs_invalidatepage(struct page *page, unsigned long offset);
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+			       unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 static int afs_launder_page(struct page *page);

@@ -310,16 +311,17 @@ static int afs_launder_page(struct page *page)
  * - release a page and clean up its private data if offset is 0 (indicating
  *   the entire page)
  */
-static void afs_invalidatepage(struct page *page, unsigned long offset)
+static void afs_invalidatepage(struct page *page, unsigned int offset,
+			       unsigned int length)
 {
 	struct afs_writeback *wb = (struct afs_writeback *) page_private(page);

-	_enter("{%lu},%lu", page->index, offset);
+	_enter("{%lu},%u,%u", page->index, offset, length);

 	BUG_ON(!PageLocked(page));

 	/* we clean up only if the entire page is being invalidated */
-	if (offset == 0) {
+	if (offset == 0 && length == PAGE_CACHE_SIZE) {
 #ifdef CONFIG_AFS_FSCACHE
 		if (PageFsCache(page)) {
 			struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1013,7 +1013,8 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	return try_release_extent_buffer(page);
 }

-static void btree_invalidatepage(struct page *page, unsigned long offset)
+static void btree_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	struct extent_io_tree *tree;
 	tree = &BTRFS_I(page->mapping->host)->io_tree;

--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2957,7 +2957,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	pg_offset = i_size & (PAGE_CACHE_SIZE - 1);
 	if (page->index > end_index ||
 	   (page->index == end_index && !pg_offset)) {
-		page->mapping->a_ops->invalidatepage(page, 0);
+		page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
 		unlock_page(page);
 		return 0;
 	}

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7493,7 +7493,8 @@ static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 	return __btrfs_releasepage(page, gfp_flags & GFP_NOFS);
 }

-static void btrfs_invalidatepage(struct page *page, unsigned long offset)
+static void btrfs_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *tree;

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1454,7 +1454,8 @@ static void discard_buffer(struct buffer_head * bh)
  * block_invalidatepage - invalidate part or all of a buffer-backed page
  *
  * @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
  *
  * block_invalidatepage() is called when all or part of the page has become
  * invalidated by a truncate operation.
@@ -1465,21 +1466,34 @@ static void discard_buffer(struct buffer_head * bh)
  * point.  Because the caller is about to free (and possibly reuse) those
  * blocks on-disk.
  */
-void block_invalidatepage(struct page *page, unsigned long offset)
+void block_invalidatepage(struct page *page, unsigned int offset,
+			  unsigned int length)
 {
 	struct buffer_head *head, *bh, *next;
 	unsigned int curr_off = 0;
+	unsigned int stop = length + offset;

 	BUG_ON(!PageLocked(page));
 	if (!page_has_buffers(page))
 		goto out;

+	/*
+	 * Check for overflow
+	 */
+	BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
 	head = page_buffers(page);
 	bh = head;
 	do {
 		unsigned int next_off = curr_off + bh->b_size;
 		next = bh->b_this_page;

+		/*
+		 * Are we still fully in range ?
+		 */
+		if (next_off > stop)
+			goto out;
+
 		/*
 		 * is this block fully invalidated?
 		 */
@@ -1501,6 +1515,7 @@ out:
 }
 EXPORT_SYMBOL(block_invalidatepage);

+
 /*
  * We attach and possibly dirty the buffers atomically wrt
  * __set_page_dirty_buffers() via private_lock.  try_to_free_buffers
@@ -2841,7 +2856,7 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
 		 * they may have been added in ext3_writepage().  Make them
 		 * freeable here, so the page does not leak.
 		 */
-		do_invalidatepage(page, 0);
+		do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
 		unlock_page(page);
 		return 0; /* don't care */
 	}
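
The new range walk in block_invalidatepage() is plain interval
arithmetic: a buffer is discarded only if it lies entirely inside
[offset, offset + length). A stand-alone model of that walk (the block
and page sizes are illustrative):

    #include <stdio.h>

    #define PAGE_SIZE 4096u

    /* Model the buffer walk in block_invalidatepage(): print which
     * blocks of a page are discarded for a given (offset, length). */
    static void invalidate_range(unsigned int blocksize,
                                 unsigned int offset, unsigned int length)
    {
            unsigned int stop = offset + length;    /* end of the range */
            unsigned int curr_off = 0;

            while (curr_off < PAGE_SIZE) {
                    unsigned int next_off = curr_off + blocksize;

                    if (next_off > stop)            /* past the range: stop */
                            break;
                    if (curr_off >= offset)         /* fully inside: discard */
                            printf("discard block at %u\n", curr_off);
                    curr_off = next_off;
            }
    }

    int main(void)
    {
            /* punch 2k out of a page of 1k blocks: drops blocks 1024, 2048 */
            invalidate_range(1024, 1024, 2048);
            return 0;
    }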

--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -143,7 +143,8 @@ static int ceph_set_page_dirty(struct page *page)
  * dirty page counters appropriately.  Only called if there is private
  * data on the page.
  */
-static void ceph_invalidatepage(struct page *page, unsigned long offset)
+static void ceph_invalidatepage(struct page *page, unsigned int offset,
+				unsigned int length)
 {
 	struct inode *inode;
 	struct ceph_inode_info *ci;
@@ -163,20 +164,20 @@ static void ceph_invalidatepage(struct page *page, unsigned long offset)
 	if (!PageDirty(page))
 		pr_err("%p invalidatepage %p page not dirty\n", inode, page);

-	if (offset == 0)
+	if (offset == 0 && length == PAGE_CACHE_SIZE)
 		ClearPageChecked(page);

 	ci = ceph_inode(inode);
-	if (offset == 0) {
-		dout("%p invalidatepage %p idx %lu full dirty page %lu\n",
-		     inode, page, page->index, offset);
+	if (offset == 0 && length == PAGE_CACHE_SIZE) {
+		dout("%p invalidatepage %p idx %lu full dirty page\n",
+		     inode, page, page->index);
 		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
 		ceph_put_snap_context(snapc);
 		page->private = 0;
 		ClearPagePrivate(page);
 	} else {
-		dout("%p invalidatepage %p idx %lu partial dirty page\n",
-		     inode, page, page->index);
+		dout("%p invalidatepage %p idx %lu partial dirty page %u(%u)\n",
+		     inode, page, page->index, offset, length);
 	}
 }

--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3546,11 +3546,12 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
 	return cifs_fscache_release_page(page, gfp);
 }

-static void cifs_invalidate_page(struct page *page, unsigned long offset)
+static void cifs_invalidate_page(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);

-	if (offset == 0)
+	if (offset == 0 && length == PAGE_CACHE_SIZE)
 		cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
 }

--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -953,9 +953,11 @@ static int exofs_releasepage(struct page *page, gfp_t gfp)
 	return 0;
 }

-static void exofs_invalidatepage(struct page *page, unsigned long offset)
+static void exofs_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
-	EXOFS_DBGMSG("page 0x%lx offset 0x%lx\n", page->index, offset);
+	EXOFS_DBGMSG("page 0x%lx offset 0x%x length 0x%x\n",
+		     page->index, offset, length);
 	WARN_ON(1);
 }

--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1825,19 +1825,20 @@ ext3_readpages(struct file *file, struct address_space *mapping,
 	return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
 }

-static void ext3_invalidatepage(struct page *page, unsigned long offset)
+static void ext3_invalidatepage(struct page *page, unsigned int offset,
+				unsigned int length)
 {
 	journal_t *journal = EXT3_JOURNAL(page->mapping->host);

-	trace_ext3_invalidatepage(page, offset);
+	trace_ext3_invalidatepage(page, offset, length);

 	/*
 	 * If it's a full truncate we just forget about the pending dirtying
 	 */
-	if (offset == 0)
+	if (offset == 0 && length == PAGE_CACHE_SIZE)
 		ClearPageChecked(page);

-	journal_invalidatepage(journal, page, offset);
+	journal_invalidatepage(journal, page, offset, length);
 }

 static int ext3_releasepage(struct page *page, gfp_t wait)

--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -576,11 +576,8 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 		if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
 					(block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
 						+((char *)de - bh->b_data))) {
-			/* On error, skip the f_pos to the next block. */
-			dir_file->f_pos = (dir_file->f_pos |
-					(dir->i_sb->s_blocksize - 1)) + 1;
-			brelse (bh);
-			return count;
+			/* silently ignore the rest of the block */
+			break;
 		}
 		ext3fs_dirhash(de->name, de->name_len, hinfo);
 		if ((hinfo->hash < start_hash) ||

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -682,11 +682,15 @@ ext4_fsblk_t ext4_count_free_clusters(struct super_block *sb)

 static inline int test_root(ext4_group_t a, int b)
 {
-	int num = b;
-
-	while (a > num)
-		num *= b;
-	return num == a;
+	while (1) {
+		if (a < b)
+			return 0;
+		if (a == b)
+			return 1;
+		if ((a % b) != 0)
+			return 0;
+		a = a / b;
+	}
 }

 static int ext4_group_sparse(ext4_group_t group)
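
The rewritten test_root() decides whether a is an exact power of b by
dividing a down instead of multiplying an accumulator up toward it, so
it cannot overflow for large group numbers. A compilable stand-alone
version (types simplified to plain unsigned):

    #include <stdio.h>

    /* Is 'a' an exact power of 'b' (b, b^2, b^3, ...)?  Mirrors the
     * divide-down loop above: dividing 'a' never overflows, unlike
     * the old approach of multiplying an accumulator up toward 'a'. */
    static int test_root(unsigned int a, unsigned int b)
    {
            while (1) {
                    if (a < b)
                            return 0;
                    if (a == b)
                            return 1;
                    if ((a % b) != 0)
                            return 0;
                    a = a / b;
            }
    }

    int main(void)
    {
            printf("%d\n", test_root(49, 7));   /* 1: 7^2 */
            printf("%d\n", test_root(50, 7));   /* 0 */
            printf("%d\n", test_root(7, 7));    /* 1: 7^1 */
            return 0;
    }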

--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -176,39 +176,29 @@ struct ext4_map_blocks {
 	unsigned int m_flags;
 };

-/*
- * For delayed allocation tracking
- */
-struct mpage_da_data {
-	struct inode *inode;
-	sector_t b_blocknr;		/* start block number of extent */
-	size_t b_size;			/* size of extent */
-	unsigned long b_state;		/* state of the extent */
-	unsigned long first_page, next_page;	/* extent of pages */
-	struct writeback_control *wbc;
-	int io_done;
-	int pages_written;
-	int retval;
-};
-
 /*
  * Flags for ext4_io_end->flags
  */
 #define	EXT4_IO_END_UNWRITTEN	0x0001
-#define EXT4_IO_END_ERROR	0x0002
-#define EXT4_IO_END_DIRECT	0x0004
+#define EXT4_IO_END_DIRECT	0x0002

 /*
- * For converting uninitialized extents on a work queue.
+ * For converting uninitialized extents on a work queue. 'handle' is used for
+ * buffered writeback.
  */
 typedef struct ext4_io_end {
 	struct list_head	list;		/* per-file finished IO list */
+	handle_t		*handle;	/* handle reserved for extent
+						 * conversion */
 	struct inode		*inode;		/* file being written to */
-	struct bio		*bio;		/* Linked list of completed
-						 * bios covering the extent */
 	unsigned int		flag;		/* unwritten or not */
 	loff_t			offset;		/* offset in the file */
 	ssize_t			size;		/* size of the extent */
 	struct kiocb		*iocb;		/* iocb struct for AIO */
 	int			result;		/* error value for AIO */
+	atomic_t		count;		/* reference counter */
 } ext4_io_end_t;

 struct ext4_io_submit {
@@ -580,11 +570,6 @@ enum {
 #define EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER	0x0010
 #define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER	0x0020

-/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED	0x0001
-
 /*
  * ioctl commands
  */
@@ -879,6 +864,7 @@ struct ext4_inode_info {
 	rwlock_t i_es_lock;
 	struct list_head i_es_lru;
 	unsigned int i_es_lru_nr;	/* protected by i_es_lock */
+	unsigned long i_touch_when;	/* jiffies of last accessing */

 	/* ialloc */
 	ext4_group_t	i_last_alloc_group;
@@ -903,12 +889,22 @@ struct ext4_inode_info {
 	qsize_t i_reserved_quota;
 #endif

-	/* completed IOs that might need unwritten extents handling */
-	struct list_head i_completed_io_list;
+	/* Lock protecting lists below */
 	spinlock_t i_completed_io_lock;
+	/*
+	 * Completed IOs that need unwritten extents handling and have
+	 * transaction reserved
+	 */
+	struct list_head i_rsv_conversion_list;
+	/*
+	 * Completed IOs that need unwritten extents handling and don't have
+	 * transaction reserved
+	 */
+	struct list_head i_unrsv_conversion_list;
 	atomic_t i_ioend_count;	/* Number of outstanding io_end structs */
 	atomic_t i_unwritten; /* Nr. of inflight conversions pending */
-	struct work_struct i_unwritten_work;	/* deferred extent conversion */
+	struct work_struct i_rsv_conversion_work;
+	struct work_struct i_unrsv_conversion_work;

 	spinlock_t i_block_reservation_lock;

@@ -1245,7 +1241,6 @@ struct ext4_sb_info {
 	unsigned int s_mb_stats;
 	unsigned int s_mb_order2_reqs;
 	unsigned int s_mb_group_prealloc;
-	unsigned int s_max_writeback_mb_bump;
 	unsigned int s_max_dir_size_kb;
 	/* where last allocation was done - for stream allocation */
 	unsigned long s_mb_last_group;
@@ -1281,8 +1276,10 @@ struct ext4_sb_info {
 	struct flex_groups *s_flex_groups;
 	ext4_group_t s_flex_groups_allocated;

-	/* workqueue for dio unwritten */
-	struct workqueue_struct *dio_unwritten_wq;
+	/* workqueue for unreserved extent convertions (dio) */
+	struct workqueue_struct *unrsv_conversion_wq;
+	/* workqueue for reserved extent conversions (buffered io) */
+	struct workqueue_struct *rsv_conversion_wq;

 	/* timer for periodic error stats printing */
 	struct timer_list s_err_report;
@@ -1307,6 +1304,7 @@ struct ext4_sb_info {
 	/* Reclaim extents from extent status tree */
 	struct shrinker s_es_shrinker;
 	struct list_head s_es_lru;
+	unsigned long s_es_last_sorted;
 	struct percpu_counter s_extent_cache_cnt;
 	spinlock_t s_es_lru_lock ____cacheline_aligned_in_smp;
 };
@@ -1342,6 +1340,9 @@ static inline void ext4_set_io_unwritten_flag(struct inode *inode,
 					      struct ext4_io_end *io_end)
 {
 	if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
+		/* Writeback has to have coversion transaction reserved */
+		WARN_ON(EXT4_SB(inode->i_sb)->s_journal && !io_end->handle &&
+			!(io_end->flag & EXT4_IO_END_DIRECT));
 		io_end->flag |= EXT4_IO_END_UNWRITTEN;
 		atomic_inc(&EXT4_I(inode)->i_unwritten);
 	}
@@ -1999,7 +2000,6 @@ static inline unsigned char get_dtype(struct super_block *sb, int filetype)

 /* fsync.c */
 extern int ext4_sync_file(struct file *, loff_t, loff_t, int);
-extern int ext4_flush_unwritten_io(struct inode *);

 /* hash.c */
 extern int ext4fs_dirhash(const char *name, int len, struct
@@ -2088,7 +2088,7 @@ extern int ext4_change_inode_journal_flag(struct inode *, int);
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
 extern int ext4_can_truncate(struct inode *inode);
 extern void ext4_truncate(struct inode *);
-extern int ext4_punch_hole(struct file *file, loff_t offset, loff_t length);
+extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length);
 extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks);
 extern void ext4_set_inode_flags(struct inode *);
 extern void ext4_get_inode_flags(struct ext4_inode_info *);
@@ -2096,9 +2096,12 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
-		struct address_space *mapping, loff_t from,
-		loff_t length, int flags);
+extern int ext4_block_truncate_page(handle_t *handle,
+		struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+		struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+			     loff_t lstart, loff_t lend);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
@@ -2111,7 +2114,7 @@ extern ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 				const struct iovec *iov, loff_t offset,
 				unsigned long nr_segs);
 extern int ext4_ind_calc_metadata_amount(struct inode *inode, sector_t lblock);
-extern int ext4_ind_trans_blocks(struct inode *inode, int nrblocks, int chunk);
+extern int ext4_ind_trans_blocks(struct inode *inode, int nrblocks);
 extern void ext4_ind_truncate(handle_t *, struct inode *inode);
 extern int ext4_free_hole_blocks(handle_t *handle, struct inode *inode,
 				 ext4_lblk_t first, ext4_lblk_t stop);
@@ -2166,42 +2169,96 @@ extern int ext4_alloc_flex_bg_array(struct super_block *sb,
 				    ext4_group_t ngroup);
 extern const char *ext4_decode_error(struct super_block *sb, int errno,
 				     char nbuf[16]);
 extern __printf(4, 5)
 void __ext4_error(struct super_block *, const char *, unsigned int,
 		  const char *, ...);
-#define ext4_error(sb, message...)	__ext4_error(sb, __func__,	\
-						     __LINE__, ## message)
 extern __printf(5, 6)
-void ext4_error_inode(struct inode *, const char *, unsigned int, ext4_fsblk_t,
+void __ext4_error_inode(struct inode *, const char *, unsigned int, ext4_fsblk_t,
 		      const char *, ...);
 extern __printf(5, 6)
-void ext4_error_file(struct file *, const char *, unsigned int, ext4_fsblk_t,
+void __ext4_error_file(struct file *, const char *, unsigned int, ext4_fsblk_t,
 		     const char *, ...);
 extern void __ext4_std_error(struct super_block *, const char *,
 			     unsigned int, int);
 extern __printf(4, 5)
 void __ext4_abort(struct super_block *, const char *, unsigned int,
 		  const char *, ...);
-#define ext4_abort(sb, message...)	__ext4_abort(sb, __func__, \
-						       __LINE__, ## message)
 extern __printf(4, 5)
 void __ext4_warning(struct super_block *, const char *, unsigned int,
 		    const char *, ...);
-#define ext4_warning(sb, message...)	__ext4_warning(sb, __func__, \
-						       __LINE__, ## message)
 extern __printf(3, 4)
-void ext4_msg(struct super_block *, const char *, const char *, ...);
+void __ext4_msg(struct super_block *, const char *, const char *, ...);
 extern void __dump_mmp_msg(struct super_block *, struct mmp_struct *mmp,
 			   const char *, unsigned int, const char *);
-#define dump_mmp_msg(sb, mmp, msg)	__dump_mmp_msg(sb, mmp, __func__, \
-						       __LINE__, msg)
 extern __printf(7, 8)
 void __ext4_grp_locked_error(const char *, unsigned int,
 			     struct super_block *, ext4_group_t,
 			     unsigned long, ext4_fsblk_t,
 			     const char *, ...);
-#define ext4_grp_locked_error(sb, grp, message...) \
-	__ext4_grp_locked_error(__func__, __LINE__, (sb), (grp), ## message)
+
+#ifdef CONFIG_PRINTK
+
+#define ext4_error_inode(inode, func, line, block, fmt, ...)		\
+	__ext4_error_inode(inode, func, line, block, fmt, ##__VA_ARGS__)
+#define ext4_error_file(file, func, line, block, fmt, ...)		\
+	__ext4_error_file(file, func, line, block, fmt, ##__VA_ARGS__)
+#define ext4_error(sb, fmt, ...)					\
+	__ext4_error(sb, __func__, __LINE__, fmt, ##__VA_ARGS__)
+#define ext4_abort(sb, fmt, ...)					\
+	__ext4_abort(sb, __func__, __LINE__, fmt, ##__VA_ARGS__)
+#define ext4_warning(sb, fmt, ...)					\
+	__ext4_warning(sb, __func__, __LINE__, fmt, ##__VA_ARGS__)
+#define ext4_msg(sb, level, fmt, ...)					\
+	__ext4_msg(sb, level, fmt, ##__VA_ARGS__)
+#define dump_mmp_msg(sb, mmp, msg)					\
+	__dump_mmp_msg(sb, mmp, __func__, __LINE__, msg)
+#define ext4_grp_locked_error(sb, grp, ino, block, fmt, ...)		\
+	__ext4_grp_locked_error(__func__, __LINE__, sb, grp, ino, block, \
+				fmt, ##__VA_ARGS__)
+
+#else
+
+#define ext4_error_inode(inode, func, line, block, fmt, ...)		\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_error_inode(inode, "", 0, block, " ");			\
+} while (0)
+#define ext4_error_file(file, func, line, block, fmt, ...)		\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_error_file(file, "", 0, block, " ");			\
+} while (0)
+#define ext4_error(sb, fmt, ...)					\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_error(sb, "", 0, " ");					\
+} while (0)
+#define ext4_abort(sb, fmt, ...)					\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_abort(sb, "", 0, " ");					\
+} while (0)
+#define ext4_warning(sb, fmt, ...)					\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_warning(sb, "", 0, " ");					\
+} while (0)
+#define ext4_msg(sb, level, fmt, ...)					\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_msg(sb, "", " ");					\
+} while (0)
+#define dump_mmp_msg(sb, mmp, msg)					\
+	__dump_mmp_msg(sb, mmp, "", 0, "")
+#define ext4_grp_locked_error(sb, grp, ino, block, fmt, ...)		\
+do {									\
+	no_printk(fmt, ##__VA_ARGS__);					\
+	__ext4_grp_locked_error("", 0, sb, grp, ino, block, " ");	\
+} while (0)
+
+#endif
+
 extern void ext4_update_dynamic_rev(struct super_block *sb);
 extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb,
 					__u32 compat);
@@ -2312,6 +2369,7 @@ struct ext4_group_info *ext4_get_group_info(struct super_block *sb,
 {
 	 struct ext4_group_info ***grp_info;
 	 long indexv, indexh;
+	 BUG_ON(group >= EXT4_SB(sb)->s_groups_count);
 	 grp_info = EXT4_SB(sb)->s_group_info;
 	 indexv = group >> (EXT4_DESC_PER_BLOCK_BITS(sb));
 	 indexh = group & ((EXT4_DESC_PER_BLOCK(sb)) - 1);
@@ -2598,8 +2656,7 @@ struct ext4_extent;

 extern int ext4_ext_tree_init(handle_t *handle, struct inode *);
 extern int ext4_ext_writepage_trans_blocks(struct inode *, int);
-extern int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks,
-				       int chunk);
+extern int ext4_ext_index_trans_blocks(struct inode *inode, int extents);
 extern int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 			       struct ext4_map_blocks *map, int flags);
 extern void ext4_ext_truncate(handle_t *, struct inode *);
@@ -2609,8 +2666,8 @@ extern void ext4_ext_init(struct super_block *);
 extern void ext4_ext_release(struct super_block *);
 extern long ext4_fallocate(struct file *file, int mode, loff_t offset,
 			  loff_t len);
-extern int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
-					  ssize_t len);
+extern int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
+					  loff_t offset, ssize_t len);
 extern int ext4_map_blocks(handle_t *handle, struct inode *inode,
 			   struct ext4_map_blocks *map, int flags);
 extern int ext4_ext_calc_metadata_amount(struct inode *inode,
@@ -2650,12 +2707,15 @@ extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,

 /* page-io.c */
 extern int __init ext4_init_pageio(void);
-extern void ext4_add_complete_io(ext4_io_end_t *io_end);
 extern void ext4_exit_pageio(void);
-extern void ext4_ioend_shutdown(struct inode *);
-extern void ext4_free_io_end(ext4_io_end_t *io);
 extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
-extern void ext4_end_io_work(struct work_struct *work);
+extern ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end);
+extern int ext4_put_io_end(ext4_io_end_t *io_end);
+extern void ext4_put_io_end_defer(ext4_io_end_t *io_end);
+extern void ext4_io_submit_init(struct ext4_io_submit *io,
+				struct writeback_control *wbc);
+extern void ext4_end_io_rsv_work(struct work_struct *work);
+extern void ext4_end_io_unrsv_work(struct work_struct *work);
 extern void ext4_io_submit(struct ext4_io_submit *io);
 extern int ext4_bio_write_page(struct ext4_io_submit *io,
 			       struct page *page,
@@ -2668,20 +2728,17 @@ extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
 extern int ext4_mmp_csum_verify(struct super_block *sb,
 				struct mmp_struct *mmp);

-/* BH_Uninit flag: blocks are allocated but uninitialized on disk */
+/*
+ * Note that these flags will never ever appear in a buffer_head's state flag.
+ * See EXT4_MAP_... to see where this is used.
+ */
 enum ext4_state_bits {
 	BH_Uninit	/* blocks are allocated but uninitialized on disk */
 	 = BH_JBDPrivateStart,
 	BH_AllocFromCluster,	/* allocated blocks were part of already
-				 * allocated cluster. Note that this flag will
-				 * never, ever appear in a buffer_head's state
-				 * flag. See EXT4_MAP_FROM_CLUSTER to see where
-				 * this is used. */
+				 * allocated cluster. */
 };

-BUFFER_FNS(Uninit, uninit)
-TAS_BUFFER_FNS(Uninit, uninit)
-
 /*
  * Add new method to test whether block and inode bitmaps are properly
  * initialized. With uninit_bg reading the block from disk is not enough
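
The !CONFIG_PRINTK branch above leans on a standard kernel trick:
no_printk() emits no code, but still lets the compiler type-check the
format string against its arguments, which is how object size shrinks
without losing format diagnostics. A user-space model of the same
pattern (the macro and program names here are illustrative):

    #include <stdio.h>

    /* Compiles to nothing, but keeps printf-style format checking. */
    #define no_printk(fmt, ...)                         \
    do {                                                \
            if (0)                                      \
                    printf(fmt, ##__VA_ARGS__);         \
    } while (0)

    #ifdef VERBOSE
    #define log_error(fmt, ...) printf(fmt, ##__VA_ARGS__)
    #else
    #define log_error(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
    #endif

    int main(void)
    {
            /* Without -DVERBOSE the call folds away entirely, yet "%d"
             * is still checked against the argument at compile time. */
            log_error("error code %d\n", 5);
            return 0;
    }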

--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -38,31 +38,43 @@ static void ext4_put_nojournal(handle_t *handle)
 /*
  * Wrappers for jbd2_journal_start/end.
  */
-handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
-				  int type, int nblocks)
+static int ext4_journal_check_start(struct super_block *sb)
 {
 	journal_t *journal;

 	might_sleep();
-
-	trace_ext4_journal_start(sb, nblocks, _RET_IP_);
 	if (sb->s_flags & MS_RDONLY)
-		return ERR_PTR(-EROFS);
-
+		return -EROFS;
 	WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
 	journal = EXT4_SB(sb)->s_journal;
-	if (!journal)
-		return ext4_get_nojournal();
 	/*
 	 * Special case here: if the journal has aborted behind our
 	 * backs (eg. EIO in the commit thread), then we still need to
 	 * take the FS itself readonly cleanly.
 	 */
-	if (is_journal_aborted(journal)) {
+	if (journal && is_journal_aborted(journal)) {
 		ext4_abort(sb, "Detected aborted journal");
-		return ERR_PTR(-EROFS);
+		return -EROFS;
 	}
-	return jbd2__journal_start(journal, nblocks, GFP_NOFS, type, line);
+	return 0;
+}
+
+handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
+				  int type, int blocks, int rsv_blocks)
+{
+	journal_t *journal;
+	int err;
+
+	trace_ext4_journal_start(sb, blocks, rsv_blocks, _RET_IP_);
+	err = ext4_journal_check_start(sb);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	journal = EXT4_SB(sb)->s_journal;
+	if (!journal)
+		return ext4_get_nojournal();
+	return jbd2__journal_start(journal, blocks, rsv_blocks, GFP_NOFS,
+				   type, line);
 }

 int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
@@ -86,6 +98,30 @@ int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
 	return err;
 }

+handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line,
+					int type)
+{
+	struct super_block *sb;
+	int err;
+
+	if (!ext4_handle_valid(handle))
+		return ext4_get_nojournal();
+
+	sb = handle->h_journal->j_private;
+	trace_ext4_journal_start_reserved(sb, handle->h_buffer_credits,
+					  _RET_IP_);
+	err = ext4_journal_check_start(sb);
+	if (err < 0) {
+		jbd2_journal_free_reserved(handle);
+		return ERR_PTR(err);
+	}
+
+	err = jbd2_journal_start_reserved(handle, type, line);
+	if (err < 0)
+		return ERR_PTR(err);
+	return handle;
+}
+
 void ext4_journal_abort_handle(const char *caller, unsigned int line,
 			       const char *err_fn, struct buffer_head *bh,
 			       handle_t *handle, int err)

--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -134,7 +134,8 @@ static inline int ext4_jbd2_credits_xattr(struct inode *inode)
 #define EXT4_HT_MIGRATE          8
 #define EXT4_HT_MOVE_EXTENTS     9
 #define EXT4_HT_XATTR           10
-#define EXT4_HT_MAX             11
+#define EXT4_HT_EXT_CONVERT     11
+#define EXT4_HT_MAX             12

 /**
  *   struct ext4_journal_cb_entry - Base structure for callback information.
@@ -265,7 +266,7 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
 	__ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb))

 handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
-				  int type, int nblocks);
+				  int type, int blocks, int rsv_blocks);
 int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle);

 #define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096)
@@ -300,21 +301,37 @@ static inline int ext4_handle_has_enough_credits(handle_t *handle, int needed)
 }

 #define ext4_journal_start_sb(sb, type, nblocks)			\
-	__ext4_journal_start_sb((sb), __LINE__, (type), (nblocks))
+	__ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0)

 #define ext4_journal_start(inode, type, nblocks)			\
-	__ext4_journal_start((inode), __LINE__, (type), (nblocks))
+	__ext4_journal_start((inode), __LINE__, (type), (nblocks), 0)
+
+#define ext4_journal_start_with_reserve(inode, type, blocks, rsv_blocks) \
+	__ext4_journal_start((inode), __LINE__, (type), (blocks), (rsv_blocks))

 static inline handle_t *__ext4_journal_start(struct inode *inode,
 					     unsigned int line, int type,
-					     int nblocks)
+					     int blocks, int rsv_blocks)
 {
-	return __ext4_journal_start_sb(inode->i_sb, line, type, nblocks);
+	return __ext4_journal_start_sb(inode->i_sb, line, type, blocks,
+				       rsv_blocks);
 }

 #define ext4_journal_stop(handle) \
 	__ext4_journal_stop(__func__, __LINE__, (handle))

+#define ext4_journal_start_reserved(handle, type) \
+	__ext4_journal_start_reserved((handle), __LINE__, (type))
+
+handle_t *__ext4_journal_start_reserved(handle_t *handle, unsigned int line,
+					int type);
+
+static inline void ext4_journal_free_reserved(handle_t *handle)
+{
+	if (ext4_handle_valid(handle))
+		jbd2_journal_free_reserved(handle);
+}
+
 static inline handle_t *ext4_journal_current_handle(void)
 {
 	return journal_current_handle();

--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2125,7 +2125,8 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
 		next_del = ext4_find_delayed_extent(inode, &es);
 		if (!exists && next_del) {
 			exists = 1;
-			flags |= FIEMAP_EXTENT_DELALLOC;
+			flags |= (FIEMAP_EXTENT_DELALLOC |
+				  FIEMAP_EXTENT_UNKNOWN);
 		}
 		up_read(&EXT4_I(inode)->i_data_sem);

@@ -2328,17 +2329,15 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode, int nrblocks,
 }

 /*
- * How many index/leaf blocks need to change/allocate to modify nrblocks?
+ * How many index/leaf blocks need to change/allocate to add @extents extents?
  *
- * if nrblocks are fit in a single extent (chunk flag is 1), then
- * in the worse case, each tree level index/leaf need to be changed
- * if the tree split due to insert a new extent, then the old tree
- * index/leaf need to be updated too
+ * If we add a single extent, then in the worse case, each tree level
+ * index/leaf need to be changed in case of the tree split.
  *
- * If the nrblocks are discontiguous, they could cause
- * the whole tree split more than once, but this is really rare.
+ * If more extents are inserted, they could cause the whole tree split more
+ * than once, but this is really rare.
  */
-int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
+int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
 {
 	int index;
 	int depth;
@@ -2349,7 +2348,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)

 	depth = ext_depth(inode);

-	if (chunk)
+	if (extents <= 1)
 		index = depth * 2;
 	else
 		index = depth * 3;
@@ -2357,20 +2356,24 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
 	return index;
 }

+static inline int get_default_free_blocks_flags(struct inode *inode)
+{
+	if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+		return EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET;
+	else if (ext4_should_journal_data(inode))
+		return EXT4_FREE_BLOCKS_FORGET;
+	return 0;
+}
+
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 			      struct ext4_extent *ex,
-			      ext4_fsblk_t *partial_cluster,
+			      long long *partial_cluster,
 			      ext4_lblk_t from, ext4_lblk_t to)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned short ee_len =  ext4_ext_get_actual_len(ex);
 	ext4_fsblk_t pblk;
-	int flags = 0;
-
-	if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
-		flags |= EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET;
-	else if (ext4_should_journal_data(inode))
-		flags |= EXT4_FREE_BLOCKS_FORGET;
+	int flags = get_default_free_blocks_flags(inode);

 	/*
 	 * For bigalloc file systems, we never free a partial cluster
@@ -2388,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 	 * partial cluster here.
 	 */
 	pblk = ext4_ext_pblock(ex) + ee_len - 1;
-	if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+	if ((*partial_cluster > 0) &&
+	    (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
 		ext4_free_blocks(handle, inode, NULL,
 				 EXT4_C2B(sbi, *partial_cluster),
 				 sbi->s_cluster_ratio, flags);
@@ -2414,41 +2418,46 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
 	    && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
 		/* tail removal */
 		ext4_lblk_t num;
+		unsigned int unaligned;

 		num = le32_to_cpu(ex->ee_block) + ee_len - from;
 		pblk = ext4_ext_pblock(ex) + ee_len - num;
-		ext_debug("free last %u blocks starting %llu\n", num, pblk);
+		/*
+		 * Usually we want to free partial cluster at the end of the
+		 * extent, except for the situation when the cluster is still
+		 * used by any other extent (partial_cluster is negative).
+		 */
+		if (*partial_cluster < 0 &&
+		    -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+			flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+		ext_debug("free last %u blocks starting %llu partial %lld\n",
+			  num, pblk, *partial_cluster);
 		ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
 		/*
 		 * If the block range to be freed didn't start at the
 		 * beginning of a cluster, and we removed the entire
-		 * extent, save the partial cluster here, since we
-		 * might need to delete if we determine that the
-		 * truncate operation has removed all of the blocks in
-		 * the cluster.
+		 * extent and the cluster is not used by any other extent,
+		 * save the partial cluster here, since we might need to
+		 * delete if we determine that the truncate operation has
+		 * removed all of the blocks in the cluster.
+		 *
+		 * On the other hand, if we did not manage to free the whole
+		 * extent, we have to mark the cluster as used (store negative
+		 * cluster number in partial_cluster).
 		 */
-		if (pblk & (sbi->s_cluster_ratio - 1) &&
-		    (ee_len == num))
+		unaligned = pblk & (sbi->s_cluster_ratio - 1);
+		if (unaligned && (ee_len == num) &&
+		    (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
 			*partial_cluster = EXT4_B2C(sbi, pblk);
-		else
+		else if (unaligned)
+			*partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+		else if (*partial_cluster > 0)
 			*partial_cluster = 0;
-	} else if (from == le32_to_cpu(ex->ee_block)
-		   && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-		/* head removal */
-		ext4_lblk_t num;
-		ext4_fsblk_t start;
-
-		num = to - from;
-		start = ext4_ext_pblock(ex);
-
-		ext_debug("free first %u blocks starting %llu\n", num, start);
-		ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
-	} else {
-		printk(KERN_INFO "strange request: removal(2) "
-			"%u-%u from %u:%u\n",
-			from, to, le32_to_cpu(ex->ee_block), ee_len);
-	}
+	} else
+		ext4_error(sbi->s_sb, "strange request: removal(2) "
+			   "%u-%u from %u:%u\n",
+			   from, to, le32_to_cpu(ex->ee_block), ee_len);
 	return 0;
 }
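
A convention worth spelling out, since several hunks here depend on it:
partial_cluster == 0 means nothing is pending, > 0 names a cluster whose
remaining blocks may still need freeing, and < 0 marks that same cluster
as still referenced by another extent and therefore off-limits. A small
stand-alone model of that encoding (names and the free operation are
illustrative, not ext4's code):

    #include <stdio.h>

    /* Model of the partial_cluster convention used above:
     *   == 0  nothing pending
     *   >  0  cluster may still have to be freed
     *   <  0  cluster is known to be in use by another extent
     */
    static long long partial_cluster;

    static void mark_maybe_free(long long cluster)
    {
            /* don't overwrite an "in use" marker for the same cluster */
            if (partial_cluster != -cluster)
                    partial_cluster = cluster;
    }

    static void mark_in_use(long long cluster)
    {
            partial_cluster = -cluster;
    }

    static void maybe_free(void)
    {
            if (partial_cluster > 0)
                    printf("free cluster %lld\n", partial_cluster);
            else
                    printf("nothing to free (%lld)\n", partial_cluster);
            partial_cluster = 0;
    }

    int main(void)
    {
            mark_maybe_free(42);
            maybe_free();           /* frees cluster 42 */

            mark_in_use(42);
            mark_maybe_free(42);    /* refused: cluster still in use */
            maybe_free();           /* nothing to free */
            return 0;
    }
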
@ -2461,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle * @handle: The journal handle
* @inode: The files inode * @inode: The files inode
* @path: The path to the leaf * @path: The path to the leaf
* @partial_cluster: The cluster which we'll have to free if all extents
* has been released from it. It gets negative in case
* that the cluster is still used.
* @start: The first block to remove * @start: The first block to remove
* @end: The last block to remove * @end: The last block to remove
*/ */
static int static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode, ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster, struct ext4_ext_path *path,
long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end) ext4_lblk_t start, ext4_lblk_t end)
{ {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@ -2479,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len; unsigned short ex_ee_len;
unsigned uninitialized = 0; unsigned uninitialized = 0;
struct ext4_extent *ex; struct ext4_extent *ex;
ext4_fsblk_t pblk;
/* the header must be checked already in ext4_ext_remove_space() */ /* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end); ext_debug("truncate since %u in leaf to %u\n", start, end);
@ -2490,7 +2504,9 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
return -EIO; return -EIO;
} }
/* find where to start removing */ /* find where to start removing */
ex = EXT_LAST_EXTENT(eh); ex = path[depth].p_ext;
if (!ex)
ex = EXT_LAST_EXTENT(eh);
ex_ee_block = le32_to_cpu(ex->ee_block); ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex); ex_ee_len = ext4_ext_get_actual_len(ex);
@ -2517,6 +2533,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
/* If this extent is beyond the end of the hole, skip it */ /* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) { if (end < ex_ee_block) {
/*
* We're going to skip this extent and move to another,
* so if this extent is not cluster aligned we have
* to mark the current cluster as used to avoid
* accidentally freeing it later on
*/
pblk = ext4_ext_pblock(ex);
if (pblk & (sbi->s_cluster_ratio - 1))
*partial_cluster =
-((long long)EXT4_B2C(sbi, pblk));
ex--; ex--;
ex_ee_block = le32_to_cpu(ex->ee_block); ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex); ex_ee_len = ext4_ext_get_actual_len(ex);
@ -2592,7 +2618,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent)); sizeof(struct ext4_extent));
} }
le16_add_cpu(&eh->eh_entries, -1); le16_add_cpu(&eh->eh_entries, -1);
} else } else if (*partial_cluster > 0)
*partial_cluster = 0; *partial_cluster = 0;
err = ext4_ext_dirty(handle, inode, path + depth); err = ext4_ext_dirty(handle, inode, path + depth);
@ -2610,17 +2636,13 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path); err = ext4_ext_correct_indexes(handle, inode, path);
/* /*
* If there is still a entry in the leaf node, check to see if * Free the partial cluster only if the current extent does not
* it references the partial cluster. This is the only place * reference it. Otherwise we might free used cluster.
* where it could; if it doesn't, we can free the cluster.
*/ */
if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) && if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) != (EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) { *partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET; int flags = get_default_free_blocks_flags(inode);
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
flags |= EXT4_FREE_BLOCKS_METADATA;
ext4_free_blocks(handle, inode, NULL, ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster), EXT4_C2B(sbi, *partial_cluster),
@@ -2664,7 +2686,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
     struct super_block *sb = inode->i_sb;
     int depth = ext_depth(inode);
     struct ext4_ext_path *path = NULL;
-    ext4_fsblk_t partial_cluster = 0;
+    long long partial_cluster = 0;
     handle_t *handle;
     int i = 0, err = 0;
@@ -2676,7 +2698,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
         return PTR_ERR(handle);
 again:
-    trace_ext4_ext_remove_space(inode, start, depth);
+    trace_ext4_ext_remove_space(inode, start, end, depth);
     /*
      * Check if we are removing extents inside the extent tree. If that
@@ -2844,17 +2866,14 @@ again:
         }
     }
-    trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
-            path->p_hdr->eh_entries);
+    trace_ext4_ext_remove_space_done(inode, start, end, depth,
+            partial_cluster, path->p_hdr->eh_entries);
     /* If we still have something in the partial cluster and we have removed
      * even the first extent, then we should free the blocks in the partial
      * cluster as well. */
-    if (partial_cluster && path->p_hdr->eh_entries == 0) {
-        int flags = EXT4_FREE_BLOCKS_FORGET;
-        if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
-            flags |= EXT4_FREE_BLOCKS_METADATA;
+    if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
+        int flags = get_default_free_blocks_flags(inode);
         ext4_free_blocks(handle, inode, NULL,
                  EXT4_C2B(EXT4_SB(sb), partial_cluster),
@@ -4363,7 +4382,7 @@ out2:
     }
 out3:
-    trace_ext4_ext_map_blocks_exit(inode, map, err ? err : allocated);
+    trace_ext4_ext_map_blocks_exit(inode, flags, map, err ? err : allocated);
     return err ? err : allocated;
 }
@@ -4446,7 +4465,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
         return -EOPNOTSUPP;
     if (mode & FALLOC_FL_PUNCH_HOLE)
-        return ext4_punch_hole(file, offset, len);
+        return ext4_punch_hole(inode, offset, len);
     ret = ext4_convert_inline_data(inode);
     if (ret)
@@ -4548,10 +4567,9 @@ retry:
  * function, to convert the fallocated extents after IO is completed.
  * Returns 0 on success.
  */
-int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
-                    ssize_t len)
+int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
+                   loff_t offset, ssize_t len)
 {
-    handle_t *handle;
     unsigned int max_blocks;
     int ret = 0;
     int ret2 = 0;
@@ -4566,16 +4584,32 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
     max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
               map.m_lblk);
     /*
-     * credits to insert 1 extent into extent tree
+     * This is somewhat ugly but the idea is clear: When transaction is
+     * reserved, everything goes into it. Otherwise we rather start several
+     * smaller transactions for conversion of each extent separately.
      */
-    credits = ext4_chunk_trans_blocks(inode, max_blocks);
+    if (handle) {
+        handle = ext4_journal_start_reserved(handle,
+                             EXT4_HT_EXT_CONVERT);
+        if (IS_ERR(handle))
+            return PTR_ERR(handle);
+        credits = 0;
+    } else {
+        /*
+         * credits to insert 1 extent into extent tree
+         */
+        credits = ext4_chunk_trans_blocks(inode, max_blocks);
+    }
     while (ret >= 0 && ret < max_blocks) {
         map.m_lblk += ret;
         map.m_len = (max_blocks -= ret);
-        handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, credits);
-        if (IS_ERR(handle)) {
-            ret = PTR_ERR(handle);
-            break;
+        if (credits) {
+            handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
+                            credits);
+            if (IS_ERR(handle)) {
+                ret = PTR_ERR(handle);
+                break;
+            }
         }
         ret = ext4_map_blocks(handle, inode, &map,
                       EXT4_GET_BLOCKS_IO_CONVERT_EXT);
@@ -4586,10 +4620,13 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
                      inode->i_ino, map.m_lblk,
                      map.m_len, ret);
         ext4_mark_inode_dirty(handle, inode);
-        ret2 = ext4_journal_stop(handle);
-        if (ret <= 0 || ret2 )
+        if (credits)
+            ret2 = ext4_journal_stop(handle);
+        if (ret <= 0 || ret2)
             break;
     }
+    if (!credits)
+        ret2 = ext4_journal_stop(handle);
     return ret > 0 ? ret2 : ret;
 }
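The control flow of the rewritten function, as a hedged userspace sketch; start_txn()/stop_txn() and the credit number are made-up placeholders for ext4_journal_start()/ext4_journal_stop(), not real APIs:

    #include <stdio.h>

    /* Placeholder "journal" calls; the point is the control flow. */
    struct handle { int id; };
    static struct handle *start_txn(int credits) { static struct handle h; (void)credits; return &h; }
    static int stop_txn(struct handle *h) { (void)h; return 0; }

    static int convert_range(struct handle *handle, int nchunks)
    {
        int credits, ret2 = 0;

        /* When a transaction is reserved, everything goes into it;
         * otherwise start a small transaction per chunk. */
        if (handle)
            credits = 0;
        else
            credits = 4;   /* invented per-chunk credit estimate */

        for (int i = 0; i < nchunks; i++) {
            if (credits)
                handle = start_txn(credits);
            printf("converting chunk %d under handle %p\n", i, (void *)handle);
            if (credits)
                ret2 = stop_txn(handle);
        }
        if (!credits)                      /* uses up the reserved handle */
            ret2 = stop_txn(handle);
        return ret2;
    }

    int main(void)
    {
        struct handle reserved = { 1 };
        convert_range(&reserved, 2);       /* reserved-handle path */
        convert_range(NULL, 2);            /* per-chunk transactions */
        return 0;
    }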
@@ -4659,7 +4696,7 @@ static int ext4_xattr_fiemap(struct inode *inode,
         error = ext4_get_inode_loc(inode, &iloc);
         if (error)
             return error;
-        physical = iloc.bh->b_blocknr << blockbits;
+        physical = (__u64)iloc.bh->b_blocknr << blockbits;
         offset = EXT4_GOOD_OLD_INODE_SIZE +
                 EXT4_I(inode)->i_extra_isize;
         physical += offset;
@@ -4667,7 +4704,7 @@ static int ext4_xattr_fiemap(struct inode *inode,
         flags |= FIEMAP_EXTENT_DATA_INLINE;
         brelse(iloc.bh);
     } else { /* external block */
-        physical = EXT4_I(inode)->i_file_acl << blockbits;
+        physical = (__u64)EXT4_I(inode)->i_file_acl << blockbits;
         length = inode->i_sb->s_blocksize;
     }
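Both casts in this hunk (and the (loff_t) casts in the file.c hunks further down) fix the same class of bug: on configurations where the block number is effectively a 32-bit quantity, the shift happens in 32-bit arithmetic and the high bits are lost before the result is widened to 64 bits. A minimal demonstration:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t blocknr = 0x00400000;   /* a block number past 4M */
        unsigned blockbits = 12;         /* 4k blocks */
        uint64_t wrong, right;

        /* Without a cast the shift is done in 32 bits: the result
         * wraps to 0 before being assigned to the 64-bit variable. */
        wrong = blocknr << blockbits;
        right = (uint64_t)blocknr << blockbits;

        printf("wrong: 0x%" PRIx64 "\n", wrong);   /* 0x0          */
        printf("right: 0x%" PRIx64 "\n", right);   /* 0x400000000  */
        return 0;
    }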

[next file]

@@ -10,6 +10,7 @@
  *  Ext4 extents status tree core functions.
  */
 #include <linux/rbtree.h>
+#include <linux/list_sort.h>
 #include "ext4.h"
 #include "extents_status.h"
 #include "ext4_extents.h"
@@ -291,7 +292,6 @@ out:
     read_unlock(&EXT4_I(inode)->i_es_lock);
-    ext4_es_lru_add(inode);
     trace_ext4_es_find_delayed_extent_range_exit(inode, es);
 }
@@ -672,7 +672,6 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 error:
     write_unlock(&EXT4_I(inode)->i_es_lock);
-    ext4_es_lru_add(inode);
     ext4_es_print_tree(inode);
     return err;
@@ -734,7 +733,6 @@ out:
     read_unlock(&EXT4_I(inode)->i_es_lock);
-    ext4_es_lru_add(inode);
     trace_ext4_es_lookup_extent_exit(inode, es, found);
     return found;
 }
@@ -878,12 +876,28 @@ int ext4_es_zeroout(struct inode *inode, struct ext4_extent *ex)
                   EXTENT_STATUS_WRITTEN);
 }
+static int ext4_inode_touch_time_cmp(void *priv, struct list_head *a,
+                     struct list_head *b)
+{
+    struct ext4_inode_info *eia, *eib;
+    eia = list_entry(a, struct ext4_inode_info, i_es_lru);
+    eib = list_entry(b, struct ext4_inode_info, i_es_lru);
+
+    if (eia->i_touch_when == eib->i_touch_when)
+        return 0;
+    if (time_after(eia->i_touch_when, eib->i_touch_when))
+        return 1;
+    else
+        return -1;
+}
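This comparator feeds list_sort() so the LRU ends up ordered by i_touch_when; time_after() keeps the comparison correct across jiffies wraparound. A self-contained model of that comparison, where time_after_ul() mirrors the kernel macro's signed-difference trick:

    #include <stdio.h>

    /* Wraparound-safe "a is later than b": signed difference of
     * unsigned counters, the same idea as the kernel's time_after(). */
    static int time_after_ul(unsigned long a, unsigned long b)
    {
        return (long)(b - a) < 0;
    }

    static int touch_cmp(unsigned long a, unsigned long b)
    {
        if (a == b)
            return 0;
        return time_after_ul(a, b) ? 1 : -1;   /* oldest sorts first */
    }

    int main(void)
    {
        unsigned long near_wrap = (unsigned long)-10;  /* just before wrap */
        unsigned long after_wrap = 5;                  /* just after wrap  */

        printf("naive less-than calls after_wrap older: %d\n",
               after_wrap < near_wrap);                /* prints 1 (wrong) */
        printf("wraparound-safe compare calls it newer: %d\n",
               touch_cmp(after_wrap, near_wrap) > 0);  /* prints 1 (right) */
        return 0;
    }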
 static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
     struct ext4_sb_info *sbi = container_of(shrink,
                     struct ext4_sb_info, s_es_shrinker);
     struct ext4_inode_info *ei;
-    struct list_head *cur, *tmp, scanned;
+    struct list_head *cur, *tmp;
+    LIST_HEAD(skiped);
     int nr_to_scan = sc->nr_to_scan;
     int ret, nr_shrunk = 0;
@@ -893,23 +907,41 @@ static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
     if (!nr_to_scan)
         return ret;
-    INIT_LIST_HEAD(&scanned);
     spin_lock(&sbi->s_es_lru_lock);
+
+    /*
+     * If the inode that is at the head of LRU list is newer than
+     * last_sorted time, that means that we need to sort this list.
+     */
+    ei = list_first_entry(&sbi->s_es_lru, struct ext4_inode_info, i_es_lru);
+    if (sbi->s_es_last_sorted < ei->i_touch_when) {
+        list_sort(NULL, &sbi->s_es_lru, ext4_inode_touch_time_cmp);
+        sbi->s_es_last_sorted = jiffies;
+    }
     list_for_each_safe(cur, tmp, &sbi->s_es_lru) {
-        list_move_tail(cur, &scanned);
+        /*
+         * If we have already reclaimed all extents from extent
+         * status tree, just stop the loop immediately.
+         */
+        if (percpu_counter_read_positive(&sbi->s_extent_cache_cnt) == 0)
+            break;
         ei = list_entry(cur, struct ext4_inode_info, i_es_lru);
-        read_lock(&ei->i_es_lock);
-        if (ei->i_es_lru_nr == 0) {
-            read_unlock(&ei->i_es_lock);
+        /* Skip the inode that is newer than the last_sorted time */
+        if (sbi->s_es_last_sorted < ei->i_touch_when) {
+            list_move_tail(cur, &skiped);
             continue;
         }
-        read_unlock(&ei->i_es_lock);
+
+        if (ei->i_es_lru_nr == 0)
+            continue;
         write_lock(&ei->i_es_lock);
         ret = __es_try_to_reclaim_extents(ei, nr_to_scan);
+        if (ei->i_es_lru_nr == 0)
+            list_del_init(&ei->i_es_lru);
         write_unlock(&ei->i_es_lock);
         nr_shrunk += ret;
@@ -917,7 +949,9 @@ static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
         if (nr_to_scan == 0)
             break;
     }
-    list_splice_tail(&scanned, &sbi->s_es_lru);
+
+    /* Move the newer inodes into the tail of the LRU list. */
+    list_splice_tail(&skiped, &sbi->s_es_lru);
     spin_unlock(&sbi->s_es_lru_lock);
     ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt);
@@ -925,21 +959,19 @@ static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
     return ret;
 }
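The reworked walk skips inodes touched after the last sort onto a side list and splices them back at the tail, so one pass never rescans them. A simplified singly-linked userspace model of that skip-and-splice pattern (the wraparound-safe time comparison shown earlier is omitted here for brevity):

    #include <stdio.h>

    struct node { unsigned long when; struct node *next; };

    /* Reclaim old entries; park too-new entries on a side list and
     * splice them back at the tail, like list_splice_tail() above. */
    static struct node *walk(struct node *lru, unsigned long last_sorted)
    {
        struct node *keep = NULL, **keep_tail = &keep;
        struct node *skipped = NULL, **skip_tail = &skipped;

        while (lru) {
            struct node *n = lru;
            lru = lru->next;
            n->next = NULL;
            if (n->when > last_sorted) {        /* too new: skip it */
                *skip_tail = n;
                skip_tail = &n->next;
            } else {
                printf("reclaiming entry touched at %lu\n", n->when);
                *keep_tail = n;
                keep_tail = &n->next;
            }
        }
        *keep_tail = skipped;                   /* splice at the tail */
        return keep;
    }

    int main(void)
    {
        struct node c = { 30, NULL }, b = { 10, &c }, a = { 20, &b };

        for (struct node *n = walk(&a, 25); n; n = n->next)
            printf("lru order now: %lu\n", n->when);
        return 0;
    }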
-void ext4_es_register_shrinker(struct super_block *sb)
+void ext4_es_register_shrinker(struct ext4_sb_info *sbi)
 {
-    struct ext4_sb_info *sbi;
-
-    sbi = EXT4_SB(sb);
     INIT_LIST_HEAD(&sbi->s_es_lru);
     spin_lock_init(&sbi->s_es_lru_lock);
+    sbi->s_es_last_sorted = 0;
     sbi->s_es_shrinker.shrink = ext4_es_shrink;
     sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
     register_shrinker(&sbi->s_es_shrinker);
 }
-void ext4_es_unregister_shrinker(struct super_block *sb)
+void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi)
 {
-    unregister_shrinker(&EXT4_SB(sb)->s_es_shrinker);
+    unregister_shrinker(&sbi->s_es_shrinker);
 }
 void ext4_es_lru_add(struct inode *inode)
@@ -947,11 +979,14 @@ void ext4_es_lru_add(struct inode *inode)
     struct ext4_inode_info *ei = EXT4_I(inode);
     struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+    ei->i_touch_when = jiffies;
+
+    if (!list_empty(&ei->i_es_lru))
+        return;
+
     spin_lock(&sbi->s_es_lru_lock);
     if (list_empty(&ei->i_es_lru))
         list_add_tail(&ei->i_es_lru, &sbi->s_es_lru);
-    else
-        list_move_tail(&ei->i_es_lru, &sbi->s_es_lru);
     spin_unlock(&sbi->s_es_lru_lock);
 }
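The new unlocked list_empty() test is a classic check-then-lock fast path: the common already-queued case takes no lock, and the state is re-checked under the lock before inserting. A minimal pthreads sketch of the same shape (here, as in the kernel, the unlocked read is benign: a stale answer only costs one redundant lock round-trip):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool on_list;   /* models list_empty() on the inode's lru entry */

    static void lru_add(void)
    {
        if (on_list)                     /* unlocked fast path */
            return;
        pthread_mutex_lock(&lru_lock);
        if (!on_list) {                  /* re-check under the lock */
            on_list = true;
            printf("added to LRU\n");
        }
        pthread_mutex_unlock(&lru_lock);
    }

    int main(void)
    {
        lru_add();   /* inserts */
        lru_add();   /* fast path, no lock taken */
        return 0;
    }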

[next file]

@@ -39,6 +39,7 @@
                  EXTENT_STATUS_DELAYED | \
                  EXTENT_STATUS_HOLE)
+struct ext4_sb_info;
 struct ext4_extent;
 struct extent_status {
@@ -119,8 +120,8 @@ static inline void ext4_es_store_status(struct extent_status *es,
     es->es_pblk = block;
 }
-extern void ext4_es_register_shrinker(struct super_block *sb);
-extern void ext4_es_unregister_shrinker(struct super_block *sb);
+extern void ext4_es_register_shrinker(struct ext4_sb_info *sbi);
+extern void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi);
 extern void ext4_es_lru_add(struct inode *inode);
 extern void ext4_es_lru_del(struct inode *inode);

[next file]

@@ -312,7 +312,7 @@ static int ext4_find_unwritten_pgoff(struct inode *inode,
     blkbits = inode->i_sb->s_blocksize_bits;
     startoff = *offset;
     lastoff = startoff;
-    endoff = (map->m_lblk + map->m_len) << blkbits;
+    endoff = (loff_t)(map->m_lblk + map->m_len) << blkbits;
     index = startoff >> PAGE_CACHE_SHIFT;
     end = endoff >> PAGE_CACHE_SHIFT;
@@ -457,7 +457,7 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
         ret = ext4_map_blocks(NULL, inode, &map, 0);
         if (ret > 0 && !(map.m_flags & EXT4_MAP_UNWRITTEN)) {
             if (last != start)
-                dataoff = last << blkbits;
+                dataoff = (loff_t)last << blkbits;
             break;
         }
@@ -468,7 +468,7 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
         ext4_es_find_delayed_extent_range(inode, last, last, &es);
         if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
             if (last != start)
-                dataoff = last << blkbits;
+                dataoff = (loff_t)last << blkbits;
             break;
         }
@@ -486,7 +486,7 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
         }
         last++;
-        dataoff = last << blkbits;
+        dataoff = (loff_t)last << blkbits;
     } while (last <= end);
     mutex_unlock(&inode->i_mutex);
@@ -540,7 +540,7 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
         ret = ext4_map_blocks(NULL, inode, &map, 0);
         if (ret > 0 && !(map.m_flags & EXT4_MAP_UNWRITTEN)) {
             last += ret;
-            holeoff = last << blkbits;
+            holeoff = (loff_t)last << blkbits;
             continue;
         }
@@ -551,7 +551,7 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
         ext4_es_find_delayed_extent_range(inode, last, last, &es);
         if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
             last = es.es_lblk + es.es_len;
-            holeoff = last << blkbits;
+            holeoff = (loff_t)last << blkbits;
             continue;
         }
@@ -566,7 +566,7 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
                           &map, &holeoff);
         if (!unwritten) {
             last += ret;
-            holeoff = last << blkbits;
+            holeoff = (loff_t)last << blkbits;
             continue;
         }
     }

[next file]

@@ -73,32 +73,6 @@ static int ext4_sync_parent(struct inode *inode)
     return ret;
 }
-/**
- * __sync_file - generic_file_fsync without the locking and filemap_write
- * @inode:    inode to sync
- * @datasync:    only sync essential metadata if true
- *
- * This is just generic_file_fsync without the locking.  This is needed for
- * nojournal mode to make sure this inodes data/metadata makes it to disk
- * properly.  The i_mutex should be held already.
- */
-static int __sync_inode(struct inode *inode, int datasync)
-{
-    int err;
-    int ret;
-
-    ret = sync_mapping_buffers(inode->i_mapping);
-    if (!(inode->i_state & I_DIRTY))
-        return ret;
-    if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
-        return ret;
-
-    err = sync_inode_metadata(inode, 1);
-    if (ret == 0)
-        ret = err;
-    return ret;
-}
 /*
  * akpm: A new design for ext4_sync_file().
  *
@@ -116,7 +90,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
     struct inode *inode = file->f_mapping->host;
     struct ext4_inode_info *ei = EXT4_I(inode);
     journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
-    int ret, err;
+    int ret = 0, err;
     tid_t commit_tid;
     bool needs_barrier = false;
@@ -124,25 +98,24 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
     trace_ext4_sync_file_enter(file, datasync);
-    ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-    if (ret)
-        return ret;
-    mutex_lock(&inode->i_mutex);
-
-    if (inode->i_sb->s_flags & MS_RDONLY)
-        goto out;
-
-    ret = ext4_flush_unwritten_io(inode);
-    if (ret < 0)
+    if (inode->i_sb->s_flags & MS_RDONLY) {
+        /* Make sure that we read updated s_mount_flags value */
+        smp_rmb();
+        if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED)
+            ret = -EROFS;
         goto out;
+    }
     if (!journal) {
-        ret = __sync_inode(inode, datasync);
+        ret = generic_file_fsync(file, start, end, datasync);
         if (!ret && !hlist_empty(&inode->i_dentry))
             ret = ext4_sync_parent(inode);
         goto out;
     }
+    ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+    if (ret)
+        return ret;
     /*
      * data=writeback,ordered:
      *  The caller's filemap_fdatawrite()/wait will sync the data.
@@ -172,8 +145,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
         if (!ret)
             ret = err;
     }
 out:
-    mutex_unlock(&inode->i_mutex);
     trace_ext4_sync_file_exit(inode, ret);
     return ret;
 }
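The smp_rmb() above pairs with the smp_wmb() added in super.c later in this diff, so a reader that observes MS_RDONLY is guaranteed to also observe EXT4_MF_FS_ABORTED. A userspace analog of that pairing using C11 fences (all names here are invented; this sketches the ordering rule, not kernel code):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    static atomic_int mount_flags;   /* bit 0: ABORTED */
    static atomic_int sb_flags;      /* bit 0: RDONLY  */

    static int writer(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&mount_flags, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);    /* smp_wmb() analog */
        atomic_store_explicit(&sb_flags, 1, memory_order_relaxed);
        return 0;
    }

    int main(void)
    {
        thrd_t t;
        thrd_create(&t, writer, NULL);

        /* The reader may simply run before the writer; the guarantee is
         * only that if RDONLY is seen, ABORTED is seen as well. */
        if (atomic_load_explicit(&sb_flags, memory_order_relaxed)) {
            atomic_thread_fence(memory_order_acquire);  /* smp_rmb() analog */
            printf("aborted=%d\n",
                   atomic_load_explicit(&mount_flags, memory_order_relaxed));
        }
        thrd_join(t, NULL);
        return 0;
    }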

[next file]

@@ -747,7 +747,8 @@ repeat_in_this_group:
         if (!handle) {
             BUG_ON(nblocks <= 0);
             handle = __ext4_journal_start_sb(dir->i_sb, line_no,
-                             handle_type, nblocks);
+                             handle_type, nblocks,
+                             0);
             if (IS_ERR(handle)) {
                 err = PTR_ERR(handle);
                 ext4_std_error(sb, err);

[next file]

@@ -624,7 +624,7 @@ cleanup:
         partial--;
     }
 out:
-    trace_ext4_ind_map_blocks_exit(inode, map, err);
+    trace_ext4_ind_map_blocks_exit(inode, flags, map, err);
     return err;
 }
@@ -675,11 +675,6 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 retry:
     if (rw == READ && ext4_should_dioread_nolock(inode)) {
-        if (unlikely(atomic_read(&EXT4_I(inode)->i_unwritten))) {
-            mutex_lock(&inode->i_mutex);
-            ext4_flush_unwritten_io(inode);
-            mutex_unlock(&inode->i_mutex);
-        }
         /*
          * Nolock dioread optimization may be dynamically disabled
          * via ext4_inode_block_unlocked_dio(). Check inode's state
@@ -779,27 +774,18 @@ int ext4_ind_calc_metadata_amount(struct inode *inode, sector_t lblock)
     return (blk_bits / EXT4_ADDR_PER_BLOCK_BITS(inode->i_sb)) + 1;
 }
-int ext4_ind_trans_blocks(struct inode *inode, int nrblocks, int chunk)
+/*
+ * Calculate number of indirect blocks touched by mapping @nrblocks logically
+ * contiguous blocks
+ */
+int ext4_ind_trans_blocks(struct inode *inode, int nrblocks)
 {
-    int indirects;
-
-    /* if nrblocks are contiguous */
-    if (chunk) {
-        /*
-         * With N contiguous data blocks, we need at most
-         * N/EXT4_ADDR_PER_BLOCK(inode->i_sb) + 1 indirect blocks,
-         * 2 dindirect blocks, and 1 tindirect block
-         */
-        return DIV_ROUND_UP(nrblocks,
-                    EXT4_ADDR_PER_BLOCK(inode->i_sb)) + 4;
-    }
     /*
-     * if nrblocks are not contiguous, worse case, each block touch
-     * a indirect block, and each indirect block touch a double indirect
-     * block, plus a triple indirect block
+     * With N contiguous data blocks, we need at most
+     * N/EXT4_ADDR_PER_BLOCK(inode->i_sb) + 1 indirect blocks,
+     * 2 dindirect blocks, and 1 tindirect block
      */
-    indirects = nrblocks * 2 + 1;
-    return indirects;
+    return DIV_ROUND_UP(nrblocks, EXT4_ADDR_PER_BLOCK(inode->i_sb)) + 4;
 }
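The retained formula is easy to sanity-check in isolation: ceil(N / addresses-per-block) indirect blocks for the data, plus one extra indirect, two double-indirect and one triple-indirect in the worst case. A tiny standalone version, where addrs_per_block plays the role of EXT4_ADDR_PER_BLOCK():

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    /* Worst-case metadata blocks touched when mapping nrblocks logically
     * contiguous data blocks through an indirect-block tree. */
    static int ind_trans_blocks(int nrblocks, int addrs_per_block)
    {
        return DIV_ROUND_UP(nrblocks, addrs_per_block) + 4;
    }

    int main(void)
    {
        /* 4k blocks with 4-byte block numbers: 1024 addresses per block */
        printf("1 block:     %d\n", ind_trans_blocks(1, 1024));    /* 5 */
        printf("2048 blocks: %d\n", ind_trans_blocks(2048, 1024)); /* 6 */
        return 0;
    }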
 /*
@@ -940,11 +926,13 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
                  __le32 *last)
 {
     __le32 *p;
-    int    flags = EXT4_FREE_BLOCKS_FORGET | EXT4_FREE_BLOCKS_VALIDATED;
+    int    flags = EXT4_FREE_BLOCKS_VALIDATED;
     int    err;
     if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
-        flags |= EXT4_FREE_BLOCKS_METADATA;
+        flags |= EXT4_FREE_BLOCKS_FORGET | EXT4_FREE_BLOCKS_METADATA;
+    else if (ext4_should_journal_data(inode))
+        flags |= EXT4_FREE_BLOCKS_FORGET;
     if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), block_to_free,
                    count)) {
View File

@ -72,7 +72,7 @@ static int get_max_inline_xattr_value_size(struct inode *inode,
entry = (struct ext4_xattr_entry *) entry = (struct ext4_xattr_entry *)
((void *)raw_inode + EXT4_I(inode)->i_inline_off); ((void *)raw_inode + EXT4_I(inode)->i_inline_off);
free += le32_to_cpu(entry->e_value_size); free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
goto out; goto out;
} }
@ -1810,7 +1810,7 @@ int ext4_inline_data_fiemap(struct inode *inode,
if (error) if (error)
goto out; goto out;
physical = iloc.bh->b_blocknr << inode->i_sb->s_blocksize_bits; physical = (__u64)iloc.bh->b_blocknr << inode->i_sb->s_blocksize_bits;
physical += (char *)ext4_raw_inode(&iloc) - iloc.bh->b_data; physical += (char *)ext4_raw_inode(&iloc) - iloc.bh->b_data;
physical += offsetof(struct ext4_inode, i_block); physical += offsetof(struct ext4_inode, i_block);
length = i_size_read(inode); length = i_size_read(inode);

[one file's diff suppressed because it is too large]

[next file]

@@ -2105,6 +2105,7 @@ repeat:
         group = ac->ac_g_ex.fe_group;
         for (i = 0; i < ngroups; group++, i++) {
+            cond_resched();
             /*
              * Artificially restricted ngroups for non-extent
              * files makes group > ngroups possible on first loop.
@@ -4405,17 +4406,20 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 repeat:
         /* allocate space in core */
         *errp = ext4_mb_regular_allocator(ac);
+        if (*errp)
+            goto discard_and_exit;
+
+        /* as we've just preallocated more space than
+         * user requested originally, we store allocated
+         * space in a special descriptor */
+        if (ac->ac_status == AC_STATUS_FOUND &&
+            ac->ac_o_ex.fe_len < ac->ac_b_ex.fe_len)
+            *errp = ext4_mb_new_preallocation(ac);
         if (*errp) {
+        discard_and_exit:
             ext4_discard_allocated_blocks(ac);
             goto errout;
         }
-        /* as we've just preallocated more space than
-         * user requested orinally, we store allocated
-         * space in a special descriptor */
-        if (ac->ac_status == AC_STATUS_FOUND &&
-            ac->ac_o_ex.fe_len < ac->ac_b_ex.fe_len)
-            ext4_mb_new_preallocation(ac);
     }
     if (likely(ac->ac_status == AC_STATUS_FOUND)) {
         *errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs);
@@ -4612,10 +4616,11 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
         BUG_ON(bh && (count > 1));
         for (i = 0; i < count; i++) {
+            cond_resched();
             if (!bh)
                 tbh = sb_find_get_block(inode->i_sb,
                             block + i);
-            if (unlikely(!tbh))
+            if (!tbh)
                 continue;
             ext4_forget(handle, flags & EXT4_FREE_BLOCKS_METADATA,
                     inode, tbh, block + i);
View File

@ -912,7 +912,6 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
struct page *pagep[2] = {NULL, NULL}; struct page *pagep[2] = {NULL, NULL};
handle_t *handle; handle_t *handle;
ext4_lblk_t orig_blk_offset; ext4_lblk_t orig_blk_offset;
long long offs = orig_page_offset << PAGE_CACHE_SHIFT;
unsigned long blocksize = orig_inode->i_sb->s_blocksize; unsigned long blocksize = orig_inode->i_sb->s_blocksize;
unsigned int w_flags = 0; unsigned int w_flags = 0;
unsigned int tmp_data_size, data_size, replaced_size; unsigned int tmp_data_size, data_size, replaced_size;
@ -940,8 +939,6 @@ again:
orig_blk_offset = orig_page_offset * blocks_per_page + orig_blk_offset = orig_page_offset * blocks_per_page +
data_offset_in_page; data_offset_in_page;
offs = (long long)orig_blk_offset << orig_inode->i_blkbits;
/* Calculate data_size */ /* Calculate data_size */
if ((orig_blk_offset + block_len_in_page - 1) == if ((orig_blk_offset + block_len_in_page - 1) ==
((orig_inode->i_size - 1) >> orig_inode->i_blkbits)) { ((orig_inode->i_size - 1) >> orig_inode->i_blkbits)) {

View File

@ -918,11 +918,8 @@ static int htree_dirblock_to_tree(struct file *dir_file,
bh->b_data, bh->b_size, bh->b_data, bh->b_size,
(block<<EXT4_BLOCK_SIZE_BITS(dir->i_sb)) (block<<EXT4_BLOCK_SIZE_BITS(dir->i_sb))
+ ((char *)de - bh->b_data))) { + ((char *)de - bh->b_data))) {
/* On error, skip the f_pos to the next block. */ /* silently ignore the rest of the block */
dir_file->f_pos = (dir_file->f_pos | break;
(dir->i_sb->s_blocksize - 1)) + 1;
brelse(bh);
return count;
} }
ext4fs_dirhash(de->name, de->name_len, hinfo); ext4fs_dirhash(de->name, de->name_len, hinfo);
if ((hinfo->hash < start_hash) || if ((hinfo->hash < start_hash) ||

[next file]

@@ -45,165 +45,6 @@ void ext4_exit_pageio(void)
     kmem_cache_destroy(io_end_cachep);
 }
-/*
- * This function is called by ext4_evict_inode() to make sure there is
- * no more pending I/O completion work left to do.
- */
-void ext4_ioend_shutdown(struct inode *inode)
-{
-    wait_queue_head_t *wq = ext4_ioend_wq(inode);
-
-    wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0));
-    /*
-     * We need to make sure the work structure is finished being
-     * used before we let the inode get destroyed.
-     */
-    if (work_pending(&EXT4_I(inode)->i_unwritten_work))
-        cancel_work_sync(&EXT4_I(inode)->i_unwritten_work);
-}
-
-void ext4_free_io_end(ext4_io_end_t *io)
-{
-    BUG_ON(!io);
-    BUG_ON(!list_empty(&io->list));
-    BUG_ON(io->flag & EXT4_IO_END_UNWRITTEN);
-
-    if (atomic_dec_and_test(&EXT4_I(io->inode)->i_ioend_count))
-        wake_up_all(ext4_ioend_wq(io->inode));
-    kmem_cache_free(io_end_cachep, io);
-}
-
-/* check a range of space and convert unwritten extents to written. */
-static int ext4_end_io(ext4_io_end_t *io)
-{
-    struct inode *inode = io->inode;
-    loff_t offset = io->offset;
-    ssize_t size = io->size;
-    int ret = 0;
-
-    ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
-           "list->prev 0x%p\n",
-           io, inode->i_ino, io->list.next, io->list.prev);
-
-    ret = ext4_convert_unwritten_extents(inode, offset, size);
-    if (ret < 0) {
-        ext4_msg(inode->i_sb, KERN_EMERG,
-             "failed to convert unwritten extents to written "
-             "extents -- potential data loss! "
-             "(inode %lu, offset %llu, size %zd, error %d)",
-             inode->i_ino, offset, size, ret);
-    }
-
-    /* Wake up anyone waiting on unwritten extent conversion */
-    if (atomic_dec_and_test(&EXT4_I(inode)->i_unwritten))
-        wake_up_all(ext4_ioend_wq(inode));
-    if (io->flag & EXT4_IO_END_DIRECT)
-        inode_dio_done(inode);
-    if (io->iocb)
-        aio_complete(io->iocb, io->result, 0);
-    return ret;
-}
-
-static void dump_completed_IO(struct inode *inode)
-{
-#ifdef EXT4FS_DEBUG
-    struct list_head *cur, *before, *after;
-    ext4_io_end_t *io, *io0, *io1;
-
-    if (list_empty(&EXT4_I(inode)->i_completed_io_list)) {
-        ext4_debug("inode %lu completed_io list is empty\n",
-               inode->i_ino);
-        return;
-    }
-
-    ext4_debug("Dump inode %lu completed_io list\n", inode->i_ino);
-    list_for_each_entry(io, &EXT4_I(inode)->i_completed_io_list, list) {
-        cur = &io->list;
-        before = cur->prev;
-        io0 = container_of(before, ext4_io_end_t, list);
-        after = cur->next;
-        io1 = container_of(after, ext4_io_end_t, list);
-
-        ext4_debug("io 0x%p from inode %lu,prev 0x%p,next 0x%p\n",
-               io, inode->i_ino, io0, io1);
-    }
-#endif
-}
-
-/* Add the io_end to per-inode completed end_io list. */
-void ext4_add_complete_io(ext4_io_end_t *io_end)
-{
-    struct ext4_inode_info *ei = EXT4_I(io_end->inode);
-    struct workqueue_struct *wq;
-    unsigned long flags;
-
-    BUG_ON(!(io_end->flag & EXT4_IO_END_UNWRITTEN));
-    wq = EXT4_SB(io_end->inode->i_sb)->dio_unwritten_wq;
-
-    spin_lock_irqsave(&ei->i_completed_io_lock, flags);
-    if (list_empty(&ei->i_completed_io_list))
-        queue_work(wq, &ei->i_unwritten_work);
-    list_add_tail(&io_end->list, &ei->i_completed_io_list);
-    spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
-}
-
-static int ext4_do_flush_completed_IO(struct inode *inode)
-{
-    ext4_io_end_t *io;
-    struct list_head unwritten;
-    unsigned long flags;
-    struct ext4_inode_info *ei = EXT4_I(inode);
-    int err, ret = 0;
-
-    spin_lock_irqsave(&ei->i_completed_io_lock, flags);
-    dump_completed_IO(inode);
-    list_replace_init(&ei->i_completed_io_list, &unwritten);
-    spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
-
-    while (!list_empty(&unwritten)) {
-        io = list_entry(unwritten.next, ext4_io_end_t, list);
-        BUG_ON(!(io->flag & EXT4_IO_END_UNWRITTEN));
-        list_del_init(&io->list);
-
-        err = ext4_end_io(io);
-        if (unlikely(!ret && err))
-            ret = err;
-        io->flag &= ~EXT4_IO_END_UNWRITTEN;
-        ext4_free_io_end(io);
-    }
-    return ret;
-}
-
-/*
- * work on completed aio dio IO, to convert unwritten extents to extents
- */
-void ext4_end_io_work(struct work_struct *work)
-{
-    struct ext4_inode_info *ei = container_of(work, struct ext4_inode_info,
-                          i_unwritten_work);
-    ext4_do_flush_completed_IO(&ei->vfs_inode);
-}
-
-int ext4_flush_unwritten_io(struct inode *inode)
-{
-    int ret;
-    WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex) &&
-             !(inode->i_state & I_FREEING));
-    ret = ext4_do_flush_completed_IO(inode);
-    ext4_unwritten_wait(inode);
-    return ret;
-}
-
-ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
-{
-    ext4_io_end_t *io = kmem_cache_zalloc(io_end_cachep, flags);
-    if (io) {
-        atomic_inc(&EXT4_I(inode)->i_ioend_count);
-        io->inode = inode;
-        INIT_LIST_HEAD(&io->list);
-    }
-    return io;
-}
-
 /*
  * Print an buffer I/O error compatible with the fs/buffer.c. This
  * provides compatibility with dmesg scrapers that look for a specific
@@ -219,21 +60,11 @@ static void buffer_io_error(struct buffer_head *bh)
             (unsigned long long)bh->b_blocknr);
 }
-static void ext4_end_bio(struct bio *bio, int error)
+static void ext4_finish_bio(struct bio *bio)
 {
-    ext4_io_end_t *io_end = bio->bi_private;
-    struct inode *inode;
     int i;
-    int blocksize;
-    sector_t bi_sector = bio->bi_sector;
+    int error = !test_bit(BIO_UPTODATE, &bio->bi_flags);
-    BUG_ON(!io_end);
-    inode = io_end->inode;
-    blocksize = 1 << inode->i_blkbits;
-    bio->bi_private = NULL;
-    bio->bi_end_io = NULL;
-    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-        error = 0;
     for (i = 0; i < bio->bi_vcnt; i++) {
         struct bio_vec *bvec = &bio->bi_io_vec[i];
         struct page *page = bvec->bv_page;
@@ -259,7 +90,7 @@ static void ext4_end_bio(struct bio *bio, int error)
         bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
         do {
             if (bh_offset(bh) < bio_start ||
-                bh_offset(bh) + blocksize > bio_end) {
+                bh_offset(bh) + bh->b_size > bio_end) {
                 if (buffer_async_write(bh))
                     under_io++;
                 continue;
@@ -273,10 +104,235 @@ static void ext4_end_bio(struct bio *bio, int error)
         if (!under_io)
             end_page_writeback(page);
     }
-    bio_put(bio);
+}
+static void ext4_release_io_end(ext4_io_end_t *io_end)
+{
+    struct bio *bio, *next_bio;
+
+    BUG_ON(!list_empty(&io_end->list));
+    BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
+    WARN_ON(io_end->handle);
+
+    if (atomic_dec_and_test(&EXT4_I(io_end->inode)->i_ioend_count))
+        wake_up_all(ext4_ioend_wq(io_end->inode));
+    for (bio = io_end->bio; bio; bio = next_bio) {
+        next_bio = bio->bi_private;
+        ext4_finish_bio(bio);
+        bio_put(bio);
+    }
+    if (io_end->flag & EXT4_IO_END_DIRECT)
+        inode_dio_done(io_end->inode);
+    if (io_end->iocb)
+        aio_complete(io_end->iocb, io_end->result, 0);
+    kmem_cache_free(io_end_cachep, io_end);
+}
+
+static void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
+{
+    struct inode *inode = io_end->inode;
+
+    io_end->flag &= ~EXT4_IO_END_UNWRITTEN;
+    /* Wake up anyone waiting on unwritten extent conversion */
+    if (atomic_dec_and_test(&EXT4_I(inode)->i_unwritten))
+        wake_up_all(ext4_ioend_wq(inode));
+}
+
+/*
+ * Check a range of space and convert unwritten extents to written. Note that
+ * we are protected from truncate touching same part of extent tree by the
+ * fact that truncate code waits for all DIO to finish (thus exclusion from
+ * direct IO is achieved) and also waits for PageWriteback bits. Thus we
+ * cannot get to ext4_ext_truncate() before all IOs overlapping that range are
+ * completed (happens from ext4_free_ioend()).
+ */
+static int ext4_end_io(ext4_io_end_t *io)
+{
+    struct inode *inode = io->inode;
+    loff_t offset = io->offset;
+    ssize_t size = io->size;
+    handle_t *handle = io->handle;
+    int ret = 0;
+
+    ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
+           "list->prev 0x%p\n",
+           io, inode->i_ino, io->list.next, io->list.prev);
+
+    io->handle = NULL;  /* Following call will use up the handle */
+    ret = ext4_convert_unwritten_extents(handle, inode, offset, size);
+    if (ret < 0) {
+        ext4_msg(inode->i_sb, KERN_EMERG,
+             "failed to convert unwritten extents to written "
+             "extents -- potential data loss! "
+             "(inode %lu, offset %llu, size %zd, error %d)",
+             inode->i_ino, offset, size, ret);
+    }
+    ext4_clear_io_unwritten_flag(io);
+    ext4_release_io_end(io);
+    return ret;
+}
+
+static void dump_completed_IO(struct inode *inode, struct list_head *head)
+{
+#ifdef EXT4FS_DEBUG
+    struct list_head *cur, *before, *after;
+    ext4_io_end_t *io, *io0, *io1;
+
+    if (list_empty(head))
+        return;
+
+    ext4_debug("Dump inode %lu completed io list\n", inode->i_ino);
+    list_for_each_entry(io, head, list) {
+        cur = &io->list;
+        before = cur->prev;
+        io0 = container_of(before, ext4_io_end_t, list);
+        after = cur->next;
+        io1 = container_of(after, ext4_io_end_t, list);
+
+        ext4_debug("io 0x%p from inode %lu,prev 0x%p,next 0x%p\n",
+               io, inode->i_ino, io0, io1);
+    }
+#endif
+}
+
+/* Add the io_end to per-inode completed end_io list. */
+static void ext4_add_complete_io(ext4_io_end_t *io_end)
+{
+    struct ext4_inode_info *ei = EXT4_I(io_end->inode);
+    struct workqueue_struct *wq;
+    unsigned long flags;
+
+    BUG_ON(!(io_end->flag & EXT4_IO_END_UNWRITTEN));
+    spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+    if (io_end->handle) {
+        wq = EXT4_SB(io_end->inode->i_sb)->rsv_conversion_wq;
+        if (list_empty(&ei->i_rsv_conversion_list))
+            queue_work(wq, &ei->i_rsv_conversion_work);
+        list_add_tail(&io_end->list, &ei->i_rsv_conversion_list);
+    } else {
+        wq = EXT4_SB(io_end->inode->i_sb)->unrsv_conversion_wq;
+        if (list_empty(&ei->i_unrsv_conversion_list))
+            queue_work(wq, &ei->i_unrsv_conversion_work);
+        list_add_tail(&io_end->list, &ei->i_unrsv_conversion_list);
+    }
+    spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
+}
+
+static int ext4_do_flush_completed_IO(struct inode *inode,
+                      struct list_head *head)
+{
+    ext4_io_end_t *io;
+    struct list_head unwritten;
+    unsigned long flags;
+    struct ext4_inode_info *ei = EXT4_I(inode);
+    int err, ret = 0;
+
+    spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+    dump_completed_IO(inode, head);
+    list_replace_init(head, &unwritten);
+    spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
+
+    while (!list_empty(&unwritten)) {
+        io = list_entry(unwritten.next, ext4_io_end_t, list);
+        BUG_ON(!(io->flag & EXT4_IO_END_UNWRITTEN));
+        list_del_init(&io->list);
+
+        err = ext4_end_io(io);
+        if (unlikely(!ret && err))
+            ret = err;
+    }
+    return ret;
+}
+
+/*
+ * work on completed IO, to convert unwritten extents to extents
+ */
+void ext4_end_io_rsv_work(struct work_struct *work)
+{
+    struct ext4_inode_info *ei = container_of(work, struct ext4_inode_info,
+                          i_rsv_conversion_work);
+    ext4_do_flush_completed_IO(&ei->vfs_inode, &ei->i_rsv_conversion_list);
+}
+
+void ext4_end_io_unrsv_work(struct work_struct *work)
+{
+    struct ext4_inode_info *ei = container_of(work, struct ext4_inode_info,
+                          i_unrsv_conversion_work);
+    ext4_do_flush_completed_IO(&ei->vfs_inode, &ei->i_unrsv_conversion_list);
+}
+
+ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
+{
+    ext4_io_end_t *io = kmem_cache_zalloc(io_end_cachep, flags);
+    if (io) {
+        atomic_inc(&EXT4_I(inode)->i_ioend_count);
+        io->inode = inode;
+        INIT_LIST_HEAD(&io->list);
+        atomic_set(&io->count, 1);
+    }
+    return io;
+}
+
+void ext4_put_io_end_defer(ext4_io_end_t *io_end)
+{
+    if (atomic_dec_and_test(&io_end->count)) {
+        if (!(io_end->flag & EXT4_IO_END_UNWRITTEN) || !io_end->size) {
+            ext4_release_io_end(io_end);
+            return;
+        }
+        ext4_add_complete_io(io_end);
+    }
+}
+
+int ext4_put_io_end(ext4_io_end_t *io_end)
+{
+    int err = 0;
+
+    if (atomic_dec_and_test(&io_end->count)) {
+        if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
+            err = ext4_convert_unwritten_extents(io_end->handle,
+                        io_end->inode, io_end->offset,
+                        io_end->size);
+            io_end->handle = NULL;
+            ext4_clear_io_unwritten_flag(io_end);
+        }
+        ext4_release_io_end(io_end);
+    }
+    return err;
+}
+
+ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end)
+{
+    atomic_inc(&io_end->count);
+    return io_end;
+}
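ext4_get_io_end()/ext4_put_io_end_defer() are plain reference counting: each in-flight bio holds a reference, and whichever path drops the last one performs the release or defers the conversion. A compact userspace sketch of that pattern with C11 atomics (io_end_get/io_end_put are illustrative names, not the ext4 functions):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct io_end {
        atomic_int count;
        int unwritten;   /* models EXT4_IO_END_UNWRITTEN */
    };

    static struct io_end *io_end_get(struct io_end *io)
    {
        atomic_fetch_add(&io->count, 1);
        return io;
    }

    static void io_end_put(struct io_end *io)
    {
        if (atomic_fetch_sub(&io->count, 1) == 1) {   /* last reference */
            if (io->unwritten)
                printf("queueing extent conversion\n");
            else
                printf("releasing io_end\n");
            free(io);
        }
    }

    int main(void)
    {
        struct io_end *io = malloc(sizeof(*io));
        atomic_init(&io->count, 1);
        io->unwritten = 0;

        io_end_get(io);   /* reference for the in-flight bio */
        io_end_put(io);   /* bio completes */
        io_end_put(io);   /* submitter drops the last reference: frees */
        return 0;
    }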
+static void ext4_end_bio(struct bio *bio, int error)
+{
+    ext4_io_end_t *io_end = bio->bi_private;
+    sector_t bi_sector = bio->bi_sector;
+
+    BUG_ON(!io_end);
+    bio->bi_end_io = NULL;
+    if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+        error = 0;
+
+    if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
+        /*
+         * Link bio into list hanging from io_end. We have to do it
+         * atomically as bio completions can be racing against each
+         * other.
+         */
+        bio->bi_private = xchg(&io_end->bio, bio);
+    } else {
+        ext4_finish_bio(bio);
+        bio_put(bio);
+    }
+
     if (error) {
-        io_end->flag |= EXT4_IO_END_ERROR;
+        struct inode *inode = io_end->inode;
+
         ext4_warning(inode->i_sb, "I/O error writing to inode %lu "
                  "(offset %llu size %ld starting block %llu)",
                  inode->i_ino,
@@ -285,13 +341,7 @@ static void ext4_end_bio(struct bio *bio, int error)
                  (unsigned long long)
                  bi_sector >> (inode->i_blkbits - 9));
     }
-
-    if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
-        ext4_free_io_end(io_end);
-        return;
-    }
-
-    ext4_add_complete_io(io_end);
+    ext4_put_io_end_defer(io_end);
 }
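The xchg() in the new ext4_end_bio() is a lock-free push: each completing bio atomically swaps itself in as the list head and threads the previous head through bi_private. The same idea with C11 atomic_exchange():

    #include <stdatomic.h>
    #include <stdio.h>

    struct bio {
        struct bio *_Atomic next;   /* stands in for bi_private */
        int id;
    };

    static struct bio *_Atomic head;

    static void link_bio(struct bio *b)
    {
        /* atomic_exchange is the C11 analog of the kernel's xchg():
         * racing callers each become head exactly once. */
        atomic_store(&b->next, atomic_exchange(&head, b));
    }

    int main(void)
    {
        struct bio a = { NULL, 1 }, b = { NULL, 2 };

        link_bio(&a);
        link_bio(&b);
        for (struct bio *p = atomic_load(&head); p;
             p = atomic_load(&p->next))
            printf("bio %d\n", p->id);   /* prints 2 then 1 */
        return 0;
    }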
 void ext4_io_submit(struct ext4_io_submit *io)
@@ -305,43 +355,38 @@ void ext4_io_submit(struct ext4_io_submit *io)
         bio_put(io->io_bio);
     }
     io->io_bio = NULL;
-    io->io_op = 0;
+}
+
+void ext4_io_submit_init(struct ext4_io_submit *io,
+             struct writeback_control *wbc)
+{
+    io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+    io->io_bio = NULL;
     io->io_end = NULL;
 }
-static int io_submit_init(struct ext4_io_submit *io,
-              struct inode *inode,
-              struct writeback_control *wbc,
-              struct buffer_head *bh)
+static int io_submit_init_bio(struct ext4_io_submit *io,
+                  struct buffer_head *bh)
 {
-    ext4_io_end_t *io_end;
-    struct page *page = bh->b_page;
     int nvecs = bio_get_nr_vecs(bh->b_bdev);
     struct bio *bio;
-    io_end = ext4_init_io_end(inode, GFP_NOFS);
-    if (!io_end)
-        return -ENOMEM;
     bio = bio_alloc(GFP_NOIO, min(nvecs, BIO_MAX_PAGES));
+    if (!bio)
+        return -ENOMEM;
     bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
     bio->bi_bdev = bh->b_bdev;
-    bio->bi_private = io->io_end = io_end;
     bio->bi_end_io = ext4_end_bio;
+    bio->bi_private = ext4_get_io_end(io->io_end);
-    io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh);
     io->io_bio = bio;
-    io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
     io->io_next_block = bh->b_blocknr;
     return 0;
 }
 static int io_submit_add_bh(struct ext4_io_submit *io,
                 struct inode *inode,
-                struct writeback_control *wbc,
                 struct buffer_head *bh)
 {
-    ext4_io_end_t *io_end;
     int ret;
     if (io->io_bio && bh->b_blocknr != io->io_next_block) {
@@ -349,18 +394,14 @@ submit_and_retry:
         ext4_io_submit(io);
     }
     if (io->io_bio == NULL) {
-        ret = io_submit_init(io, inode, wbc, bh);
+        ret = io_submit_init_bio(io, bh);
         if (ret)
             return ret;
     }
-    io_end = io->io_end;
-    if (test_clear_buffer_uninit(bh))
-        ext4_set_io_unwritten_flag(inode, io_end);
-    io->io_next_block++;
+    io->io_end->size += bh->b_size;
     ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size, bh_offset(bh));
     if (ret != bh->b_size)
         goto submit_and_retry;
+    io->io_next_block++;
     return 0;
 }
@@ -432,7 +473,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
     do {
         if (!buffer_async_write(bh))
             continue;
-        ret = io_submit_add_bh(io, inode, wbc, bh);
+        ret = io_submit_add_bh(io, inode, bh);
         if (ret) {
             /*
              * We only get here on ENOMEM. Not much else

[next file]

@@ -79,12 +79,20 @@ static int verify_group_input(struct super_block *sb,
     ext4_fsblk_t end = start + input->blocks_count;
     ext4_group_t group = input->group;
     ext4_fsblk_t itend = input->inode_table + sbi->s_itb_per_group;
-    unsigned overhead = ext4_group_overhead_blocks(sb, group);
-    ext4_fsblk_t metaend = start + overhead;
+    unsigned overhead;
+    ext4_fsblk_t metaend;
     struct buffer_head *bh = NULL;
     ext4_grpblk_t free_blocks_count, offset;
     int err = -EINVAL;
+    if (group != sbi->s_groups_count) {
+        ext4_warning(sb, "Cannot add at group %u (only %u groups)",
+                 input->group, sbi->s_groups_count);
+        return -EINVAL;
+    }
+
+    overhead = ext4_group_overhead_blocks(sb, group);
+    metaend = start + overhead;
     input->free_blocks_count = free_blocks_count =
         input->blocks_count - 2 - overhead - sbi->s_itb_per_group;
@@ -96,10 +104,7 @@ static int verify_group_input(struct super_block *sb,
                free_blocks_count, input->reserved_blocks);
     ext4_get_group_no_and_offset(sb, start, NULL, &offset);
-    if (group != sbi->s_groups_count)
-        ext4_warning(sb, "Cannot add at group %u (only %u groups)",
-                 input->group, sbi->s_groups_count);
-    else if (offset != 0)
+    if (offset != 0)
         ext4_warning(sb, "Last group not full");
     else if (input->reserved_blocks > input->blocks_count / 5)
         ext4_warning(sb, "Reserved blocks too high (%u)",
@@ -1551,11 +1556,10 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
     int reserved_gdb = ext4_bg_has_super(sb, input->group) ?
         le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
     struct inode *inode = NULL;
-    int gdb_off, gdb_num;
+    int gdb_off;
     int err;
     __u16 bg_flags = 0;
-    gdb_num = input->group / EXT4_DESC_PER_BLOCK(sb);
     gdb_off = input->group % EXT4_DESC_PER_BLOCK(sb);
     if (gdb_off == 0 && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
@@ -1656,12 +1660,10 @@ errout:
         err = err2;
     if (!err) {
-        ext4_fsblk_t first_block;
-        first_block = ext4_group_first_block_no(sb, 0);
         if (test_opt(sb, DEBUG))
             printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
                    "blocks\n", ext4_blocks_count(es));
-        update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr - first_block,
+        update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr,
                    (char *)es, sizeof(struct ext4_super_block), 0);
     }
     return err;

[next file]

@@ -69,6 +69,7 @@ static void ext4_mark_recovery_complete(struct super_block *sb,
 static void ext4_clear_journal_err(struct super_block *sb,
                    struct ext4_super_block *es);
 static int ext4_sync_fs(struct super_block *sb, int wait);
+static int ext4_sync_fs_nojournal(struct super_block *sb, int wait);
 static int ext4_remount(struct super_block *sb, int *flags, char *data);
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int ext4_unfreeze(struct super_block *sb);
@@ -398,6 +399,11 @@ static void ext4_handle_error(struct super_block *sb)
     }
     if (test_opt(sb, ERRORS_RO)) {
         ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
+        /*
+         * Make sure updated value of ->s_mount_flags will be visible
+         * before ->s_flags update
+         */
+        smp_wmb();
         sb->s_flags |= MS_RDONLY;
     }
     if (test_opt(sb, ERRORS_PANIC))
@@ -422,9 +428,9 @@ void __ext4_error(struct super_block *sb, const char *function,
     ext4_handle_error(sb);
 }
-void ext4_error_inode(struct inode *inode, const char *function,
-              unsigned int line, ext4_fsblk_t block,
-              const char *fmt, ...)
+void __ext4_error_inode(struct inode *inode, const char *function,
+            unsigned int line, ext4_fsblk_t block,
+            const char *fmt, ...)
 {
     va_list args;
     struct va_format vaf;
@@ -451,9 +457,9 @@ void ext4_error_inode(struct inode *inode, const char *function,
     ext4_handle_error(inode->i_sb);
 }
-void ext4_error_file(struct file *file, const char *function,
-             unsigned int line, ext4_fsblk_t block,
-             const char *fmt, ...)
+void __ext4_error_file(struct file *file, const char *function,
+               unsigned int line, ext4_fsblk_t block,
+               const char *fmt, ...)
 {
     va_list args;
     struct va_format vaf;
@@ -570,8 +576,13 @@ void __ext4_abort(struct super_block *sb, const char *function,
     if ((sb->s_flags & MS_RDONLY) == 0) {
         ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-        sb->s_flags |= MS_RDONLY;
         EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED;
+        /*
+         * Make sure updated value of ->s_mount_flags will be visible
+         * before ->s_flags update
+         */
+        smp_wmb();
+        sb->s_flags |= MS_RDONLY;
         if (EXT4_SB(sb)->s_journal)
             jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
         save_error_info(sb, function, line);
@@ -580,7 +591,8 @@ void __ext4_abort(struct super_block *sb, const char *function,
         panic("EXT4-fs panic from previous error\n");
 }
-void ext4_msg(struct super_block *sb, const char *prefix, const char *fmt, ...)
+void __ext4_msg(struct super_block *sb,
+        const char *prefix, const char *fmt, ...)
 {
     struct va_format vaf;
     va_list args;
@@ -750,8 +762,10 @@ static void ext4_put_super(struct super_block *sb)
     ext4_unregister_li_request(sb);
     dquot_disable(sb, -1, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED);
-    flush_workqueue(sbi->dio_unwritten_wq);
-    destroy_workqueue(sbi->dio_unwritten_wq);
+    flush_workqueue(sbi->unrsv_conversion_wq);
+    flush_workqueue(sbi->rsv_conversion_wq);
+    destroy_workqueue(sbi->unrsv_conversion_wq);
+    destroy_workqueue(sbi->rsv_conversion_wq);
     if (sbi->s_journal) {
         err = jbd2_journal_destroy(sbi->s_journal);
@@ -760,7 +774,7 @@ static void ext4_put_super(struct super_block *sb)
         ext4_abort(sb, "Couldn't clean up the journal");
     }
-    ext4_es_unregister_shrinker(sb);
+    ext4_es_unregister_shrinker(sbi);
     del_timer(&sbi->s_err_report);
     ext4_release_system_zone(sb);
     ext4_mb_release(sb);
@@ -849,6 +863,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
     rwlock_init(&ei->i_es_lock);
     INIT_LIST_HEAD(&ei->i_es_lru);
     ei->i_es_lru_nr = 0;
+    ei->i_touch_when = 0;
     ei->i_reserved_data_blocks = 0;
     ei->i_reserved_meta_blocks = 0;
     ei->i_allocated_meta_blocks = 0;
@@ -859,13 +874,15 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
     ei->i_reserved_quota = 0;
 #endif
     ei->jinode = NULL;
-    INIT_LIST_HEAD(&ei->i_completed_io_list);
+    INIT_LIST_HEAD(&ei->i_rsv_conversion_list);
+    INIT_LIST_HEAD(&ei->i_unrsv_conversion_list);
     spin_lock_init(&ei->i_completed_io_lock);
     ei->i_sync_tid = 0;
     ei->i_datasync_tid = 0;
     atomic_set(&ei->i_ioend_count, 0);
     atomic_set(&ei->i_unwritten, 0);
-    INIT_WORK(&ei->i_unwritten_work, ext4_end_io_work);
+    INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
+    INIT_WORK(&ei->i_unrsv_conversion_work, ext4_end_io_unrsv_work);
     return &ei->vfs_inode;
 }
@@ -1093,6 +1110,7 @@ static const struct super_operations ext4_nojournal_sops = {
     .dirty_inode    = ext4_dirty_inode,
     .drop_inode    = ext4_drop_inode,
     .evict_inode    = ext4_evict_inode,
+    .sync_fs    = ext4_sync_fs_nojournal,
     .put_super    = ext4_put_super,
     .statfs        = ext4_statfs,
     .remount_fs    = ext4_remount,
@@ -1908,7 +1926,6 @@ static int ext4_fill_flex_info(struct super_block *sb)
     struct ext4_sb_info *sbi = EXT4_SB(sb);
     struct ext4_group_desc *gdp = NULL;
     ext4_group_t flex_group;
-    unsigned int groups_per_flex = 0;
     int i, err;
     sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
@@ -1916,7 +1933,6 @@ static int ext4_fill_flex_info(struct super_block *sb)
         sbi->s_log_groups_per_flex = 0;
         return 1;
     }
-    groups_per_flex = 1U << sbi->s_log_groups_per_flex;
     err = ext4_alloc_flex_bg_array(sb, sbi->s_groups_count);
     if (err)
@@ -2164,19 +2180,22 @@ static void ext4_orphan_cleanup(struct super_block *sb,
         list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
         dquot_initialize(inode);
         if (inode->i_nlink) {
-            ext4_msg(sb, KERN_DEBUG,
-                "%s: truncating inode %lu to %lld bytes",
-                __func__, inode->i_ino, inode->i_size);
+            if (test_opt(sb, DEBUG))
+                ext4_msg(sb, KERN_DEBUG,
+                    "%s: truncating inode %lu to %lld bytes",
+                    __func__, inode->i_ino, inode->i_size);
             jbd_debug(2, "truncating inode %lu to %lld bytes\n",
                   inode->i_ino, inode->i_size);
             mutex_lock(&inode->i_mutex);
+            truncate_inode_pages(inode->i_mapping, inode->i_size);
             ext4_truncate(inode);
             mutex_unlock(&inode->i_mutex);
             nr_truncates++;
         } else {
-            ext4_msg(sb, KERN_DEBUG,
-                "%s: deleting unreferenced inode %lu",
-                __func__, inode->i_ino);
+            if (test_opt(sb, DEBUG))
+                ext4_msg(sb, KERN_DEBUG,
+                    "%s: deleting unreferenced inode %lu",
+                    __func__, inode->i_ino);
             jbd_debug(2, "deleting unreferenced inode %lu\n",
                   inode->i_ino);
             nr_orphans++;
@@ -2377,7 +2396,10 @@ struct ext4_attr {
     ssize_t (*show)(struct ext4_attr *, struct ext4_sb_info *, char *);
     ssize_t (*store)(struct ext4_attr *, struct ext4_sb_info *,
              const char *, size_t);
-    int offset;
+    union {
+        int offset;
+        int deprecated_val;
+    } u;
 };
 static int parse_strtoull(const char *buf,
@@ -2446,7 +2468,7 @@ static ssize_t inode_readahead_blks_store(struct ext4_attr *a,
 static ssize_t sbi_ui_show(struct ext4_attr *a,
                struct ext4_sb_info *sbi, char *buf)
 {
-    unsigned int *ui = (unsigned int *) (((char *) sbi) + a->offset);
+    unsigned int *ui = (unsigned int *) (((char *) sbi) + a->u.offset);
     return snprintf(buf, PAGE_SIZE, "%u\n", *ui);
 }
@@ -2455,7 +2477,7 @@ static ssize_t sbi_ui_store(struct ext4_attr *a,
                 struct ext4_sb_info *sbi,
                 const char *buf, size_t count)
 {
-    unsigned int *ui = (unsigned int *) (((char *) sbi) + a->offset);
+    unsigned int *ui = (unsigned int *) (((char *) sbi) + a->u.offset);
     unsigned long t;
     int ret;
@@ -2504,12 +2526,20 @@ static ssize_t trigger_test_error(struct ext4_attr *a,
     return count;
 }
+static ssize_t sbi_deprecated_show(struct ext4_attr *a,
+                   struct ext4_sb_info *sbi, char *buf)
+{
+    return snprintf(buf, PAGE_SIZE, "%d\n", a->u.deprecated_val);
+}
+
 #define EXT4_ATTR_OFFSET(_name,_mode,_show,_store,_elname) \
 static struct ext4_attr ext4_attr_##_name = { \
     .attr = {.name = __stringify(_name), .mode = _mode }, \
     .show    = _show, \
     .store    = _store, \
-    .offset = offsetof(struct ext4_sb_info, _elname), \
+    .u = { \
+        .offset = offsetof(struct ext4_sb_info, _elname),\
+    }, \
 }
 #define EXT4_ATTR(name, mode, show, store) \
 static struct ext4_attr ext4_attr_##name = __ATTR(name, mode, show, store)
@@ -2520,6 +2550,14 @@ static struct ext4_attr ext4_attr_##name = __ATTR(name, mode, show, store)
 #define EXT4_RW_ATTR_SBI_UI(name, elname) \
     EXT4_ATTR_OFFSET(name, 0644, sbi_ui_show, sbi_ui_store, elname)
 #define ATTR_LIST(name) &ext4_attr_##name.attr
+#define EXT4_DEPRECATED_ATTR(_name, _val) \
+static struct ext4_attr ext4_attr_##_name = { \
+    .attr = {.name = __stringify(_name), .mode = 0444 }, \
+    .show    = sbi_deprecated_show, \
+    .u = { \
+        .deprecated_val = _val, \
+    }, \
+}
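The union-based attribute lets one show() routine serve both live tunables (read through an offset into the superblock info) and deprecated ones (return a canned value). A self-contained sketch of that offsetof() pattern, with invented struct names rather than the ext4 ones:

    #include <stddef.h>
    #include <stdio.h>

    struct sb_info { unsigned int mb_stats; unsigned int mb_max_to_scan; };

    struct attr {
        const char *name;
        union {
            int offset;          /* live tunable: field offset   */
            int deprecated_val;  /* dead tunable: canned value   */
        } u;
        int deprecated;
    };

    static unsigned int attr_show(const struct attr *a,
                                  const struct sb_info *sbi)
    {
        if (a->deprecated)
            return a->u.deprecated_val;
        /* Read the field through its recorded byte offset. */
        return *(const unsigned int *)((const char *)sbi + a->u.offset);
    }

    int main(void)
    {
        struct sb_info sbi = { .mb_stats = 7, .mb_max_to_scan = 200 };
        struct attr live = { "mb_max_to_scan",
                 { .offset = offsetof(struct sb_info, mb_max_to_scan) }, 0 };
        struct attr dead = { "max_writeback_mb_bump",
                 { .deprecated_val = 128 }, 1 };

        printf("%s = %u\n", live.name, attr_show(&live, &sbi));  /* 200 */
        printf("%s = %u\n", dead.name, attr_show(&dead, &sbi));  /* 128 */
        return 0;
    }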
 EXT4_RO_ATTR(delayed_allocation_blocks);
 EXT4_RO_ATTR(session_write_kbytes);
@@ -2534,7 +2572,7 @@ EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
 EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
-EXT4_RW_ATTR_SBI_UI(max_writeback_mb_bump, s_max_writeback_mb_bump);
+EXT4_DEPRECATED_ATTR(max_writeback_mb_bump, 128);
 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
 EXT4_ATTR(trigger_fs_error, 0200, NULL, trigger_test_error);
@@ -3763,7 +3801,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
     sbi->s_err_report.data = (unsigned long) sb;
     /* Register extent status tree shrinker */
-    ext4_es_register_shrinker(sb);
+    ext4_es_register_shrinker(sbi);
     err = percpu_counter_init(&sbi->s_freeclusters_counter,
             ext4_count_free_clusters(sb));
@@ -3787,7 +3825,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
     }
     sbi->s_stripe = ext4_get_stripe_size(sbi);
-    sbi->s_max_writeback_mb_bump = 128;
     sbi->s_extent_max_zeroout_kb = 32;
     /*
@@ -3915,12 +3952,20 @@ no_journal:
      * The maximum number of concurrent works can be high and
      * concurrency isn't really necessary. Limit it to 1.
      */
-    EXT4_SB(sb)->dio_unwritten_wq =
-        alloc_workqueue("ext4-dio-unwritten", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
-    if (!EXT4_SB(sb)->dio_unwritten_wq) {
-        printk(KERN_ERR "EXT4-fs: failed to create DIO workqueue\n");
+    EXT4_SB(sb)->rsv_conversion_wq =
+        alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+    if (!EXT4_SB(sb)->rsv_conversion_wq) {
+        printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
         ret = -ENOMEM;
-        goto failed_mount_wq;
+        goto failed_mount4;
+    }
+
+    EXT4_SB(sb)->unrsv_conversion_wq =
+        alloc_workqueue("ext4-unrsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+    if (!EXT4_SB(sb)->unrsv_conversion_wq) {
+        printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
+        ret = -ENOMEM;
+        goto failed_mount4;
     }
     /*
@@ -4074,14 +4119,17 @@ failed_mount4a:
     sb->s_root = NULL;
 failed_mount4:
     ext4_msg(sb, KERN_ERR, "mount failed");
-    destroy_workqueue(EXT4_SB(sb)->dio_unwritten_wq);
+    if (EXT4_SB(sb)->rsv_conversion_wq)
+        destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq);
+    if (EXT4_SB(sb)->unrsv_conversion_wq)
+        destroy_workqueue(EXT4_SB(sb)->unrsv_conversion_wq);
 failed_mount_wq:
     if (sbi->s_journal) {
         jbd2_journal_destroy(sbi->s_journal);
         sbi->s_journal = NULL;
     }
 failed_mount3:
-    ext4_es_unregister_shrinker(sb);
+    ext4_es_unregister_shrinker(sbi);
     del_timer(&sbi->s_err_report);
     if (sbi->s_flex_groups)
         ext4_kvfree(sbi->s_flex_groups);
@ -4517,19 +4565,52 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
{ {
int ret = 0; int ret = 0;
tid_t target; tid_t target;
bool needs_barrier = false;
struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_sb_info *sbi = EXT4_SB(sb);
trace_ext4_sync_fs(sb, wait); trace_ext4_sync_fs(sb, wait);
flush_workqueue(sbi->dio_unwritten_wq); flush_workqueue(sbi->rsv_conversion_wq);
flush_workqueue(sbi->unrsv_conversion_wq);
/* /*
* Writeback quota in non-journalled quota case - journalled quota has * Writeback quota in non-journalled quota case - journalled quota has
* no dirty dquots * no dirty dquots
*/ */
dquot_writeback_dquots(sb, -1); dquot_writeback_dquots(sb, -1);
/*
* Data writeback is possible w/o journal transaction, so barrier must
* being sent at the end of the function. But we can skip it if
* transaction_commit will do it for us.
*/
target = jbd2_get_latest_transaction(sbi->s_journal);
if (wait && sbi->s_journal->j_flags & JBD2_BARRIER &&
!jbd2_trans_will_send_data_barrier(sbi->s_journal, target))
needs_barrier = true;
if (jbd2_journal_start_commit(sbi->s_journal, &target)) { if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
if (wait) if (wait)
jbd2_log_wait_commit(sbi->s_journal, target); ret = jbd2_log_wait_commit(sbi->s_journal, target);
} }
if (needs_barrier) {
int err;
err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
if (!ret)
ret = err;
}
return ret;
}
static int ext4_sync_fs_nojournal(struct super_block *sb, int wait)
{
int ret = 0;
trace_ext4_sync_fs(sb, wait);
flush_workqueue(EXT4_SB(sb)->rsv_conversion_wq);
flush_workqueue(EXT4_SB(sb)->unrsv_conversion_wq);
dquot_writeback_dquots(sb, -1);
if (wait && test_opt(sb, BARRIER))
ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
return ret; return ret;
} }
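A note on the ext4_attr hunk above: moving `offset` into a union lets one attribute structure serve both live tunables (which read through an offset into ext4_sb_info) and retired ones like max_writeback_mb_bump (which now just echo a fixed value back to sysfs readers). The sketch below is a minimal userspace rendering of that dispatch, with stand-in struct and field names; it is an illustration of the pattern, not the kernel code.

    #include <stdio.h>
    #include <stddef.h>

    struct sb_info {
        unsigned int mb_stream_request;
    };

    struct attr {
        const char *name;
        int deprecated;
        union {                     /* mirrors the new ext4_attr union */
            int offset;             /* live tunable: offset into sb_info */
            int deprecated_val;     /* retired tunable: canned value */
        } u;
    };

    static void show(const struct attr *a, const struct sb_info *sbi)
    {
        if (a->deprecated)  /* like sbi_deprecated_show(): no storage behind it */
            printf("%s: %d\n", a->name, a->u.deprecated_val);
        else                /* like sbi_ui_show(): read through the offset */
            printf("%s: %u\n", a->name,
                   *(const unsigned int *)((const char *)sbi + a->u.offset));
    }

    int main(void)
    {
        struct sb_info sbi = { .mb_stream_request = 16 };
        struct attr live = { .name = "mb_stream_req",
            .u.offset = offsetof(struct sb_info, mb_stream_request) };
        struct attr dead = { .name = "max_writeback_mb_bump",
            .deprecated = 1, .u.deprecated_val = 128 };

        show(&live, &sbi);
        show(&dead, &sbi);
        return 0;
    }

The design point is backward compatibility: scripts that read the old sysfs file keep working, while the s_max_writeback_mb_bump field itself can be deleted from the superblock info.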


@@ -698,7 +698,8 @@ static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
 							get_data_block_ro);
 }
 
-static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_data_page(struct page *page, unsigned int offset,
+				      unsigned int length)
 {
 	struct inode *inode = page->mapping->host;
 	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);


@@ -1205,7 +1205,8 @@ static int f2fs_set_node_page_dirty(struct page *page)
 	return 0;
 }
 
-static void f2fs_invalidate_node_page(struct page *page, unsigned long offset)
+static void f2fs_invalidate_node_page(struct page *page, unsigned int offset,
+				      unsigned int length)
 {
 	struct inode *inode = page->mapping->host;
 	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);


@@ -110,7 +110,7 @@ static int gfs2_writepage_common(struct page *page,
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_CACHE_SIZE-1);
 	if (page->index > end_index || (page->index == end_index && !offset)) {
-		page->mapping->a_ops->invalidatepage(page, 0);
+		page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
 		goto out;
 	}
 	return 1;
@@ -299,7 +299,8 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping,
 		/* Is the page fully outside i_size? (truncate in progress) */
 		if (page->index > end_index || (page->index == end_index && !offset)) {
-			page->mapping->a_ops->invalidatepage(page, 0);
+			page->mapping->a_ops->invalidatepage(page, 0,
+							     PAGE_CACHE_SIZE);
 			unlock_page(page);
 			continue;
 		}
@@ -943,27 +944,33 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
 	unlock_buffer(bh);
 }
 
-static void gfs2_invalidatepage(struct page *page, unsigned long offset)
+static void gfs2_invalidatepage(struct page *page, unsigned int offset,
+				unsigned int length)
 {
 	struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+	unsigned int stop = offset + length;
+	int partial_page = (offset || length < PAGE_CACHE_SIZE);
 	struct buffer_head *bh, *head;
 	unsigned long pos = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (offset == 0)
+	if (!partial_page)
 		ClearPageChecked(page);
 	if (!page_has_buffers(page))
 		goto out;
 
 	bh = head = page_buffers(page);
 	do {
+		if (pos + bh->b_size > stop)
+			return;
+
 		if (offset <= pos)
 			gfs2_discard(sdp, bh);
 		pos += bh->b_size;
 		bh = bh->b_this_page;
 	} while (bh != head);
 out:
-	if (offset == 0)
+	if (!partial_page)
 		try_to_release_page(page, 0);
 }


@@ -2019,16 +2019,20 @@ zap_buffer_unlocked:
  * void journal_invalidatepage() - invalidate a journal page
  * @journal: journal to use for flush
  * @page:    page to flush
- * @offset:  length of page to invalidate.
+ * @offset:  offset of the range to invalidate
+ * @length:  length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page.
+ * Reap page buffers containing data in specified range in page.
  */
 void journal_invalidatepage(journal_t *journal,
 		      struct page *page,
-		      unsigned long offset)
+		      unsigned int offset,
+		      unsigned int length)
 {
 	struct buffer_head *head, *bh, *next;
+	unsigned int stop = offset + length;
 	unsigned int curr_off = 0;
+	int partial_page = (offset || length < PAGE_CACHE_SIZE);
 	int may_free = 1;
 
 	if (!PageLocked(page))
@@ -2036,6 +2040,8 @@ void journal_invalidatepage(journal_t *journal,
 	if (!page_has_buffers(page))
 		return;
 
+	BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
 	/* We will potentially be playing with lists other than just the
 	 * data lists (especially for journaled data mode), so be
 	 * cautious in our locking. */
@@ -2045,11 +2051,14 @@ void journal_invalidatepage(journal_t *journal,
 		unsigned int next_off = curr_off + bh->b_size;
 		next = bh->b_this_page;
 
+		if (next_off > stop)
+			return;
+
 		if (offset <= curr_off) {
 			/* This block is wholly outside the truncation point */
 			lock_buffer(bh);
 			may_free &= journal_unmap_buffer(journal, bh,
-							 offset > 0);
+							 partial_page);
 			unlock_buffer(bh);
 		}
 		curr_off = next_off;
@@ -2057,7 +2066,7 @@ void journal_invalidatepage(journal_t *journal,
 	} while (bh != head);
 
-	if (!offset) {
+	if (!partial_page) {
 		if (may_free && try_to_free_buffers(page))
 			J_ASSERT(!page_has_buffers(page));
 	}
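The gfs2 and jbd hunks above implement the same new invalidatepage contract: the callback now receives an (offset, length) range instead of assuming "from offset to end of page", a buffer is torn down only if it lies wholly inside the range, and only a full-page invalidation may release the page's buffers. A rough userspace model of that walk, with illustrative names and a fixed page size, is below; it is a sketch of the range logic only, not the kernel buffer handling.

    #include <stdio.h>

    #define PAGE_CACHE_SIZE 4096u

    /* Walk a page's buffers; act only on those fully inside
     * [offset, offset + length), mirroring the loops above. */
    static void invalidate_range(const unsigned int *bh_size, int nbufs,
                                 unsigned int offset, unsigned int length)
    {
        unsigned int stop = offset + length;
        unsigned int pos = 0;
        int partial_page = (offset || length < PAGE_CACHE_SIZE);
        int i;

        for (i = 0; i < nbufs; i++) {
            if (pos + bh_size[i] > stop)    /* buffer straddles range end */
                break;
            if (offset <= pos)              /* buffer starts inside range */
                printf("discard buffer %d [%u..%u)\n",
                       i, pos, pos + bh_size[i]);
            pos += bh_size[i];
        }
        if (!partial_page)  /* only a whole-page call may free the buffers */
            printf("release page buffers\n");
    }

    int main(void)
    {
        unsigned int bufs[4] = { 1024, 1024, 1024, 1024 };

        invalidate_range(bufs, 4, 0, PAGE_CACHE_SIZE);  /* whole page */
        invalidate_range(bufs, 4, 1024, 2048);          /* middle 2k only */
        return 0;
    }

This is what lets punch-hole style operations invalidate the middle of a page without tearing down buffers that are still live on either side.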


@@ -20,7 +20,7 @@ config JBD2
 
 config JBD2_DEBUG
 	bool "JBD2 (ext4) debugging support"
-	depends on JBD2 && DEBUG_FS
+	depends on JBD2
 	help
 	  If you are using the ext4 journaled file system (or
 	  potentially any other filesystem/device using JBD2), this option
@@ -29,7 +29,7 @@ config JBD2_DEBUG
 	  By default, the debugging output will be turned off.
 
 	  If you select Y here, then you will be able to turn on debugging
-	  with "echo N > /sys/kernel/debug/jbd2/jbd2-debug", where N is a
+	  with "echo N > /sys/module/jbd2/parameters/jbd2_debug", where N is a
 	  number between 1 and 5. The higher the number, the more debugging
 	  output is generated. To turn debugging off again, do
-	  "echo 0 > /sys/kernel/debug/jbd2/jbd2-debug".
+	  "echo 0 > /sys/module/jbd2/parameters/jbd2_debug".


@@ -120,8 +120,8 @@ void __jbd2_log_wait_for_space(journal_t *journal)
 	int nblocks, space_left;
 	/* assert_spin_locked(&journal->j_state_lock); */
 
-	nblocks = jbd_space_needed(journal);
-	while (__jbd2_log_space_left(journal) < nblocks) {
+	nblocks = jbd2_space_needed(journal);
+	while (jbd2_log_space_left(journal) < nblocks) {
 		if (journal->j_flags & JBD2_ABORT)
 			return;
 		write_unlock(&journal->j_state_lock);
@@ -140,8 +140,8 @@ void __jbd2_log_wait_for_space(journal_t *journal)
 		 */
 		write_lock(&journal->j_state_lock);
 		spin_lock(&journal->j_list_lock);
-		nblocks = jbd_space_needed(journal);
-		space_left = __jbd2_log_space_left(journal);
+		nblocks = jbd2_space_needed(journal);
+		space_left = jbd2_log_space_left(journal);
 		if (space_left < nblocks) {
 			int chkpt = journal->j_checkpoint_transactions != NULL;
 			tid_t tid = 0;
@@ -156,7 +156,15 @@ void __jbd2_log_wait_for_space(journal_t *journal)
 				/* We were able to recover space; yay! */
 				;
 			} else if (tid) {
+				/*
+				 * jbd2_journal_commit_transaction() may want
+				 * to take the checkpoint_mutex if JBD2_FLUSHED
+				 * is set. So we need to temporarily drop it.
+				 */
+				mutex_unlock(&journal->j_checkpoint_mutex);
 				jbd2_log_wait_commit(journal, tid);
+				write_lock(&journal->j_state_lock);
+				continue;
 			} else {
 				printk(KERN_ERR "%s: needed %d blocks and "
 				       "only had %d space available\n",
@@ -625,10 +633,6 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	__jbd2_journal_drop_transaction(journal, transaction);
 	jbd2_journal_free_transaction(transaction);
 
-	/* Just in case anybody was waiting for more transactions to be
-	   checkpointed... */
-	wake_up(&journal->j_wait_logspace);
 	ret = 1;
 out:
 	return ret;
@@ -690,9 +694,7 @@ void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transaction)
 	J_ASSERT(transaction->t_state == T_FINISHED);
 	J_ASSERT(transaction->t_buffers == NULL);
 	J_ASSERT(transaction->t_forget == NULL);
-	J_ASSERT(transaction->t_iobuf_list == NULL);
 	J_ASSERT(transaction->t_shadow_list == NULL);
-	J_ASSERT(transaction->t_log_list == NULL);
 	J_ASSERT(transaction->t_checkpoint_list == NULL);
 	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
 	J_ASSERT(atomic_read(&transaction->t_updates) == 0);
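The mutex_unlock added before jbd2_log_wait_commit() above is a classic lock-ordering fix: the committing thread may itself need j_checkpoint_mutex, so a waiter holding it could deadlock. The single-threaded pthread sketch below only demonstrates the drop-wait-retake shape of the fix; the lock names and the trivial wait_commit() stand-in are assumptions, not the jbd2 code.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t checkpoint_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for jbd2_log_wait_commit(): the commit work it waits on
     * may itself need checkpoint_mutex, so the caller must not hold it. */
    static void wait_commit(void)
    {
        pthread_mutex_lock(&checkpoint_mutex);   /* what "commit" would do */
        pthread_mutex_unlock(&checkpoint_mutex);
    }

    static void log_wait_for_space(void)
    {
        pthread_mutex_lock(&checkpoint_mutex);
        /* ... decide that a commit must finish before space appears ... */
        pthread_mutex_unlock(&checkpoint_mutex); /* the fix: drop it */
        wait_commit();                           /* safe: no deadlock */
        pthread_mutex_lock(&checkpoint_mutex);   /* re-take, re-check space */
        pthread_mutex_unlock(&checkpoint_mutex);
        puts("space recovered");
    }

    int main(void) { log_wait_for_space(); return 0; }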


@@ -30,15 +30,22 @@
 #include <trace/events/jbd2.h>
 
 /*
- * Default IO end handler for temporary BJ_IO buffer_heads.
+ * IO end handler for temporary buffer_heads handling writes to the journal.
  */
 static void journal_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
 {
+	struct buffer_head *orig_bh = bh->b_private;
+
 	BUFFER_TRACE(bh, "");
 	if (uptodate)
 		set_buffer_uptodate(bh);
 	else
 		clear_buffer_uptodate(bh);
+	if (orig_bh) {
+		clear_bit_unlock(BH_Shadow, &orig_bh->b_state);
+		smp_mb__after_clear_bit();
+		wake_up_bit(&orig_bh->b_state, BH_Shadow);
+	}
 	unlock_buffer(bh);
 }
 
@@ -85,8 +92,7 @@ nope:
 	__brelse(bh);
 }
 
-static void jbd2_commit_block_csum_set(journal_t *j,
-				       struct journal_head *descriptor)
+static void jbd2_commit_block_csum_set(journal_t *j, struct buffer_head *bh)
 {
 	struct commit_header *h;
 	__u32 csum;
@@ -94,12 +100,11 @@ static void jbd2_commit_block_csum_set(journal_t *j,
 	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		return;
 
-	h = (struct commit_header *)(jh2bh(descriptor)->b_data);
+	h = (struct commit_header *)(bh->b_data);
 	h->h_chksum_type = 0;
 	h->h_chksum_size = 0;
 	h->h_chksum[0] = 0;
-	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
-			   j->j_blocksize);
+	csum = jbd2_chksum(j, j->j_csum_seed, bh->b_data, j->j_blocksize);
 	h->h_chksum[0] = cpu_to_be32(csum);
 }
 
@@ -116,7 +121,6 @@ static int journal_submit_commit_record(journal_t *journal,
 					struct buffer_head **cbh,
 					__u32 crc32_sum)
 {
-	struct journal_head *descriptor;
 	struct commit_header *tmp;
 	struct buffer_head *bh;
 	int ret;
@@ -127,12 +131,10 @@ static int journal_submit_commit_record(journal_t *journal,
 	if (is_journal_aborted(journal))
 		return 0;
 
-	descriptor = jbd2_journal_get_descriptor_buffer(journal);
-	if (!descriptor)
+	bh = jbd2_journal_get_descriptor_buffer(journal);
+	if (!bh)
 		return 1;
 
-	bh = jh2bh(descriptor);
-
 	tmp = (struct commit_header *)bh->b_data;
 	tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
 	tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
@@ -146,9 +148,9 @@ static int journal_submit_commit_record(journal_t *journal,
 		tmp->h_chksum_size	= JBD2_CRC32_CHKSUM_SIZE;
 		tmp->h_chksum[0]	= cpu_to_be32(crc32_sum);
 	}
-	jbd2_commit_block_csum_set(journal, descriptor);
+	jbd2_commit_block_csum_set(journal, bh);
 
-	JBUFFER_TRACE(descriptor, "submit commit block");
+	BUFFER_TRACE(bh, "submit commit block");
 	lock_buffer(bh);
 	clear_buffer_dirty(bh);
 	set_buffer_uptodate(bh);
@@ -180,7 +182,6 @@ static int journal_wait_on_commit_record(journal_t *journal,
 	if (unlikely(!buffer_uptodate(bh)))
 		ret = -EIO;
 	put_bh(bh);            /* One for getblk() */
-	jbd2_journal_put_journal_head(bh2jh(bh));
 
 	return ret;
 }
@@ -321,7 +322,7 @@ static void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
 }
 
 static void jbd2_descr_block_csum_set(journal_t *j,
-				      struct journal_head *descriptor)
+				      struct buffer_head *bh)
 {
 	struct jbd2_journal_block_tail *tail;
 	__u32 csum;
@@ -329,12 +330,10 @@ static void jbd2_descr_block_csum_set(journal_t *j,
 	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		return;
 
-	tail = (struct jbd2_journal_block_tail *)
-			(jh2bh(descriptor)->b_data + j->j_blocksize -
-			sizeof(struct jbd2_journal_block_tail));
+	tail = (struct jbd2_journal_block_tail *)(bh->b_data + j->j_blocksize -
+			sizeof(struct jbd2_journal_block_tail));
 	tail->t_checksum = 0;
-	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
-			   j->j_blocksize);
+	csum = jbd2_chksum(j, j->j_csum_seed, bh->b_data, j->j_blocksize);
 	tail->t_checksum = cpu_to_be32(csum);
 }
 
@@ -343,20 +342,21 @@ static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag,
 {
 	struct page *page = bh->b_page;
 	__u8 *addr;
-	__u32 csum;
+	__u32 csum32;
 
 	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		return;
 
 	sequence = cpu_to_be32(sequence);
 	addr = kmap_atomic(page);
-	csum = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
-			  sizeof(sequence));
-	csum = jbd2_chksum(j, csum, addr + offset_in_page(bh->b_data),
-			  bh->b_size);
+	csum32 = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+			     sizeof(sequence));
+	csum32 = jbd2_chksum(j, csum32, addr + offset_in_page(bh->b_data),
+			     bh->b_size);
 	kunmap_atomic(addr);
 
-	tag->t_checksum = cpu_to_be32(csum);
+	/* We only have space to store the lower 16 bits of the crc32c. */
+	tag->t_checksum = cpu_to_be16(csum32);
 }
 
 /*
  * jbd2_journal_commit_transaction
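The csum-to-csum32 rename in the hunk above goes with a semantic fix: as the new comment says, the on-disk tag only has room for 16 bits of checksum, so the code now truncates the crc32c to its low half with cpu_to_be16() instead of stuffing a cpu_to_be32() value into the field. A hedged userspace illustration of the convention follows; the crc32c() stand-in below is a toy hash with the same call shape, not the real polynomial, and the seed is made up.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>  /* htonl/htons stand in for cpu_to_be32/16 */

    /* Toy stand-in for the checksum helper: same shape, NOT real crc32c. */
    static uint32_t crc32c(uint32_t seed, const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t c = seed;

        while (len--)
            c = (c << 5) + c + *p++;
        return c;
    }

    int main(void)
    {
        uint8_t block[4096];
        uint32_t seq = htonl(42);       /* sequence = cpu_to_be32(sequence) */
        uint32_t csum32;
        uint16_t t_checksum;

        memset(block, 0xab, sizeof(block));
        csum32 = crc32c(0x12345678, &seq, sizeof(seq)); /* fold in the tid */
        csum32 = crc32c(csum32, block, sizeof(block));  /* then the block */

        /* Only 16 bits of space in the on-disk tag: keep the low half. */
        t_checksum = htons((uint16_t)csum32);
        printf("tag->t_checksum = 0x%04x\n", ntohs(t_checksum));
        return 0;
    }

The matching verify side in recovery (further down) recomputes the 32-bit value and compares only the truncated 16 bits, so both ends must agree on this convention.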
@@ -368,7 +368,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 {
 	struct transaction_stats_s stats;
 	transaction_t *commit_transaction;
-	struct journal_head *jh, *new_jh, *descriptor;
+	struct journal_head *jh;
+	struct buffer_head *descriptor;
 	struct buffer_head **wbuf = journal->j_wbuf;
 	int bufs;
 	int flags;
@@ -392,6 +393,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	tid_t first_tid;
 	int update_tail;
 	int csum_size = 0;
+	LIST_HEAD(io_bufs);
+	LIST_HEAD(log_bufs);
 
 	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		csum_size = sizeof(struct jbd2_journal_block_tail);
@@ -424,13 +427,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	J_ASSERT(journal->j_committing_transaction == NULL);
 
 	commit_transaction = journal->j_running_transaction;
-	J_ASSERT(commit_transaction->t_state == T_RUNNING);
 
 	trace_jbd2_start_commit(journal, commit_transaction);
 	jbd_debug(1, "JBD2: starting commit of transaction %d\n",
 			commit_transaction->t_tid);
 
 	write_lock(&journal->j_state_lock);
+	J_ASSERT(commit_transaction->t_state == T_RUNNING);
 	commit_transaction->t_state = T_LOCKED;
 
 	trace_jbd2_commit_locking(journal, commit_transaction);
@@ -520,6 +523,12 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	 */
 	jbd2_journal_switch_revoke_table(journal);
 
+	/*
+	 * Reserved credits cannot be claimed anymore, free them
+	 */
+	atomic_sub(atomic_read(&journal->j_reserved_credits),
+		   &commit_transaction->t_outstanding_credits);
+
 	trace_jbd2_commit_flushing(journal, commit_transaction);
 	stats.run.rs_flushing = jiffies;
 	stats.run.rs_locked = jbd2_time_diff(stats.run.rs_locked,
@@ -533,7 +542,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	wake_up(&journal->j_wait_transaction_locked);
 	write_unlock(&journal->j_state_lock);
 
-	jbd_debug(3, "JBD2: commit phase 2\n");
+	jbd_debug(3, "JBD2: commit phase 2a\n");
 
 	/*
 	 * Now start flushing things to disk, in the order they appear
@@ -545,10 +554,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 
 	blk_start_plug(&plug);
 	jbd2_journal_write_revoke_records(journal, commit_transaction,
-					  WRITE_SYNC);
+					  &log_bufs, WRITE_SYNC);
 	blk_finish_plug(&plug);
 
-	jbd_debug(3, "JBD2: commit phase 2\n");
+	jbd_debug(3, "JBD2: commit phase 2b\n");
 
 	/*
 	 * Way to go: we have now written out all of the data for a
@@ -571,8 +580,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 			atomic_read(&commit_transaction->t_outstanding_credits));
 
 	err = 0;
-	descriptor = NULL;
 	bufs = 0;
+	descriptor = NULL;
 	blk_start_plug(&plug);
 	while (commit_transaction->t_buffers) {
 
@@ -604,8 +613,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		   record the metadata buffer. */
 
 		if (!descriptor) {
-			struct buffer_head *bh;
-
 			J_ASSERT (bufs == 0);
 
 			jbd_debug(4, "JBD2: get descriptor\n");
@@ -616,26 +623,26 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 				continue;
 			}
 
-			bh = jh2bh(descriptor);
 			jbd_debug(4, "JBD2: got buffer %llu (%p)\n",
-				(unsigned long long)bh->b_blocknr, bh->b_data);
-			header = (journal_header_t *)&bh->b_data[0];
+				(unsigned long long)descriptor->b_blocknr,
+				descriptor->b_data);
+			header = (journal_header_t *)descriptor->b_data;
 			header->h_magic     = cpu_to_be32(JBD2_MAGIC_NUMBER);
 			header->h_blocktype = cpu_to_be32(JBD2_DESCRIPTOR_BLOCK);
 			header->h_sequence  = cpu_to_be32(commit_transaction->t_tid);
 
-			tagp = &bh->b_data[sizeof(journal_header_t)];
-			space_left = bh->b_size - sizeof(journal_header_t);
+			tagp = &descriptor->b_data[sizeof(journal_header_t)];
+			space_left = descriptor->b_size -
+						sizeof(journal_header_t);
 			first_tag = 1;
-			set_buffer_jwrite(bh);
-			set_buffer_dirty(bh);
-			wbuf[bufs++] = bh;
+			set_buffer_jwrite(descriptor);
+			set_buffer_dirty(descriptor);
+			wbuf[bufs++] = descriptor;
 
 			/* Record it so that we can wait for IO
                            completion later */
-			BUFFER_TRACE(bh, "ph3: file as descriptor");
-			jbd2_journal_file_buffer(descriptor, commit_transaction,
-					BJ_LogCtl);
+			BUFFER_TRACE(descriptor, "ph3: file as descriptor");
+			jbd2_file_log_bh(&log_bufs, descriptor);
 		}
 
 		/* Where is the buffer to be written? */
@@ -658,29 +665,22 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 
 		/* Bump b_count to prevent truncate from stumbling over
                    the shadowed buffer!  @@@ This can go if we ever get
-                   rid of the BJ_IO/BJ_Shadow pairing of buffers. */
+                   rid of the shadow pairing of buffers. */
 		atomic_inc(&jh2bh(jh)->b_count);
 
-		/* Make a temporary IO buffer with which to write it out
-                   (this will requeue both the metadata buffer and the
-                   temporary IO buffer). new_bh goes on BJ_IO*/
-
-		set_bit(BH_JWrite, &jh2bh(jh)->b_state);
 		/*
-		 * akpm: jbd2_journal_write_metadata_buffer() sets
-		 * new_bh->b_transaction to commit_transaction.
-		 * We need to clean this up before we release new_bh
-		 * (which is of type BJ_IO)
+		 * Make a temporary IO buffer with which to write it out
+		 * (this will requeue the metadata buffer to BJ_Shadow).
 		 */
+		set_bit(BH_JWrite, &jh2bh(jh)->b_state);
 		JBUFFER_TRACE(jh, "ph3: write metadata");
 		flags = jbd2_journal_write_metadata_buffer(commit_transaction,
-						      jh, &new_jh, blocknr);
+						jh, &wbuf[bufs], blocknr);
 		if (flags < 0) {
 			jbd2_journal_abort(journal, flags);
 			continue;
 		}
-		set_bit(BH_JWrite, &jh2bh(new_jh)->b_state);
-		wbuf[bufs++] = jh2bh(new_jh);
+		jbd2_file_log_bh(&io_bufs, wbuf[bufs]);
 
 		/* Record the new block's tag in the current descriptor
                    buffer */
@@ -694,10 +694,11 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		tag = (journal_block_tag_t *) tagp;
 		write_tag_block(tag_bytes, tag, jh2bh(jh)->b_blocknr);
 		tag->t_flags = cpu_to_be16(tag_flag);
-		jbd2_block_tag_csum_set(journal, tag, jh2bh(new_jh),
+		jbd2_block_tag_csum_set(journal, tag, wbuf[bufs],
 					commit_transaction->t_tid);
 		tagp += tag_bytes;
 		space_left -= tag_bytes;
+		bufs++;
 
 		if (first_tag) {
 			memcpy (tagp, journal->j_uuid, 16);
@@ -809,7 +810,7 @@ start_journal_io:
            the log.  Before we can commit it, wait for the IO so far to
            complete.  Control buffers being written are on the
            transaction's t_log_list queue, and metadata buffers are on
-           the t_iobuf_list queue.
+           the io_bufs list.
 
 	   Wait for the buffers in reverse order.  That way we are
 	   less likely to be woken up until all IOs have completed, and
@@ -818,47 +819,33 @@ start_journal_io:
 
 	jbd_debug(3, "JBD2: commit phase 3\n");
 
-	/*
-	 * akpm: these are BJ_IO, and j_list_lock is not needed.
-	 * See __journal_try_to_free_buffer.
-	 */
-wait_for_iobuf:
-	while (commit_transaction->t_iobuf_list != NULL) {
-		struct buffer_head *bh;
+	while (!list_empty(&io_bufs)) {
+		struct buffer_head *bh = list_entry(io_bufs.prev,
+						    struct buffer_head,
+						    b_assoc_buffers);
 
-		jh = commit_transaction->t_iobuf_list->b_tprev;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			wait_on_buffer(bh);
-			goto wait_for_iobuf;
-		}
-		if (cond_resched())
-			goto wait_for_iobuf;
+		wait_on_buffer(bh);
+		cond_resched();
 
 		if (unlikely(!buffer_uptodate(bh)))
 			err = -EIO;
-
-		clear_buffer_jwrite(bh);
-
-		JBUFFER_TRACE(jh, "ph4: unfile after journal write");
-		jbd2_journal_unfile_buffer(journal, jh);
+		jbd2_unfile_log_bh(bh);
 
 		/*
-		 * ->t_iobuf_list should contain only dummy buffer_heads
-		 * which were created by jbd2_journal_write_metadata_buffer().
+		 * The list contains temporary buffer heads created by
+		 * jbd2_journal_write_metadata_buffer().
 		 */
 		BUFFER_TRACE(bh, "dumping temporary bh");
-		jbd2_journal_put_journal_head(jh);
 		__brelse(bh);
 		J_ASSERT_BH(bh, atomic_read(&bh->b_count) == 0);
 		free_buffer_head(bh);
 
-		/* We also have to unlock and free the corresponding
-		   shadowed buffer */
+		/* We also have to refile the corresponding shadowed buffer */
 		jh = commit_transaction->t_shadow_list->b_tprev;
 		bh = jh2bh(jh);
-		clear_bit(BH_JWrite, &bh->b_state);
+		clear_buffer_jwrite(bh);
 		J_ASSERT_BH(bh, buffer_jbddirty(bh));
+		J_ASSERT_BH(bh, !buffer_shadow(bh));
 
 		/* The metadata is now released for reuse, but we need
                    to remember it against this transaction so that when
@@ -866,14 +853,6 @@ wait_for_iobuf:
                    required. */
 		JBUFFER_TRACE(jh, "file as BJ_Forget");
 		jbd2_journal_file_buffer(jh, commit_transaction, BJ_Forget);
-		/*
-		 * Wake up any transactions which were waiting for this IO to
-		 * complete. The barrier must be here so that changes by
-		 * jbd2_journal_file_buffer() take effect before wake_up_bit()
-		 * does the waitqueue check.
-		 */
-		smp_mb();
-		wake_up_bit(&bh->b_state, BH_Unshadow);
 		JBUFFER_TRACE(jh, "brelse shadowed buffer");
 		__brelse(bh);
 	}
@@ -883,26 +862,19 @@ wait_for_iobuf:
 	jbd_debug(3, "JBD2: commit phase 4\n");
 
 	/* Here we wait for the revoke record and descriptor record buffers */
-wait_for_ctlbuf:
-	while (commit_transaction->t_log_list != NULL) {
+	while (!list_empty(&log_bufs)) {
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_log_list->b_tprev;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			wait_on_buffer(bh);
-			goto wait_for_ctlbuf;
-		}
-		if (cond_resched())
-			goto wait_for_ctlbuf;
+		bh = list_entry(log_bufs.prev, struct buffer_head, b_assoc_buffers);
+		wait_on_buffer(bh);
+		cond_resched();
 
 		if (unlikely(!buffer_uptodate(bh)))
 			err = -EIO;
 
 		BUFFER_TRACE(bh, "ph5: control buffer writeout done: unfile");
 		clear_buffer_jwrite(bh);
-		jbd2_journal_unfile_buffer(journal, jh);
-		jbd2_journal_put_journal_head(jh);
+		jbd2_unfile_log_bh(bh);
 		__brelse(bh);		/* One for getblk */
 		/* AKPM: bforget here */
 	}
@@ -952,9 +924,7 @@ wait_for_iobuf:
 	J_ASSERT(list_empty(&commit_transaction->t_inode_list));
 	J_ASSERT(commit_transaction->t_buffers == NULL);
 	J_ASSERT(commit_transaction->t_checkpoint_list == NULL);
-	J_ASSERT(commit_transaction->t_iobuf_list == NULL);
 	J_ASSERT(commit_transaction->t_shadow_list == NULL);
-	J_ASSERT(commit_transaction->t_log_list == NULL);
 
 restart_loop:
 	/*
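The commit-path rewrite above replaces the BJ_IO/BJ_LogCtl journal_head lists with plain buffer_head lists (io_bufs, log_bufs), and each temporary log buffer now carries a b_private back-pointer so that IO completion can clear BH_Shadow on the buffer it was copied from and wake anyone blocked on it. The toy model below shows just that handshake; the struct and field names are stand-ins, not kernel API.

    #include <stdio.h>

    struct buf {
        int shadow;             /* stands in for BH_Shadow on the original */
        struct buf *private;    /* temporary buffer's b_private back-pointer */
    };

    /* Stand-in for journal_end_buffer_io_sync(): once the temporary copy
     * has hit the log, clear the shadow bit on the buffer it was copied
     * from, so writers blocked on it can proceed. */
    static void end_io(struct buf *tmp)
    {
        struct buf *orig = tmp->private;

        if (orig) {
            orig->shadow = 0;   /* clear_bit_unlock() + wake_up_bit() */
            printf("shadow cleared, waiters runnable\n");
        }
    }

    int main(void)
    {
        struct buf orig = { .shadow = 1, .private = NULL };
        struct buf tmp  = { .shadow = 0, .private = &orig };

        end_io(&tmp);
        printf("orig.shadow = %d\n", orig.shadow);
        return 0;
    }

The payoff, visible in the deleted hunks, is that temporary buffers no longer need journal_heads at all, which removes allocation, refcounting, and the whole goto-based wait loops from the commit path.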


@@ -103,6 +103,24 @@ EXPORT_SYMBOL(jbd2_inode_cache);
 static void __journal_abort_soft (journal_t *journal, int errno);
 static int jbd2_journal_create_slab(size_t slab_size);
 
+#ifdef CONFIG_JBD2_DEBUG
+void __jbd2_debug(int level, const char *file, const char *func,
+		  unsigned int line, const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	if (level > jbd2_journal_enable_debug)
+		return;
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
+	va_end(args);
+}
+EXPORT_SYMBOL(__jbd2_debug);
+#endif
+
 /* Checksumming functions */
 int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
 {
@@ -310,14 +328,12 @@ static void journal_kill_thread(journal_t *journal)
 *
 * If the source buffer has already been modified by a new transaction
 * since we took the last commit snapshot, we use the frozen copy of
 * that data for IO. If we end up using the existing buffer_head's data
- * for the write, then we *have* to lock the buffer to prevent anyone
- * else from using and possibly modifying it while the IO is in
- * progress.
+ * for the write, then we have to make sure nobody modifies it while the
+ * IO is in progress. do_get_write_access() handles this.
 *
- * The function returns a pointer to the buffer_heads to be used for IO.
+ * The function returns a pointer to the buffer_head to be used for IO.
 *
- * We assume that the journal has already been locked in this function.
 *
 * Return value:
 *  <0: Error
@@ -330,15 +346,14 @@ static void journal_kill_thread(journal_t *journal)
 
 int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 				  struct journal_head  *jh_in,
-				  struct journal_head **jh_out,
-				  unsigned long long blocknr)
+				  struct buffer_head **bh_out,
+				  sector_t blocknr)
 {
 	int need_copy_out = 0;
 	int done_copy_out = 0;
 	int do_escape = 0;
 	char *mapped_data;
 	struct buffer_head *new_bh;
-	struct journal_head *new_jh;
 	struct page *new_page;
 	unsigned int new_offset;
 	struct buffer_head *bh_in = jh2bh(jh_in);
@@ -368,14 +383,13 @@ retry_alloc:
 	/* keep subsequent assertions sane */
 	atomic_set(&new_bh->b_count, 1);
 
-	new_jh = jbd2_journal_add_journal_head(new_bh);	/* This sleeps */
+	jbd_lock_bh_state(bh_in);
+repeat:
 	/*
 	 * If a new transaction has already done a buffer copy-out, then
 	 * we use that version of the data for the commit.
 	 */
-	jbd_lock_bh_state(bh_in);
-repeat:
 	if (jh_in->b_frozen_data) {
 		done_copy_out = 1;
 		new_page = virt_to_page(jh_in->b_frozen_data);
@@ -415,7 +429,7 @@ repeat:
 		jbd_unlock_bh_state(bh_in);
 		tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
 		if (!tmp) {
-			jbd2_journal_put_journal_head(new_jh);
+			brelse(new_bh);
 			return -ENOMEM;
 		}
 		jbd_lock_bh_state(bh_in);
@@ -426,7 +440,7 @@ repeat:
 		jh_in->b_frozen_data = tmp;
 		mapped_data = kmap_atomic(new_page);
-		memcpy(tmp, mapped_data + new_offset, jh2bh(jh_in)->b_size);
+		memcpy(tmp, mapped_data + new_offset, bh_in->b_size);
 		kunmap_atomic(mapped_data);
 
 		new_page = virt_to_page(tmp);
@@ -452,14 +466,14 @@ repeat:
 	}
 
 	set_bh_page(new_bh, new_page, new_offset);
-	new_jh->b_transaction = NULL;
-	new_bh->b_size = jh2bh(jh_in)->b_size;
-	new_bh->b_bdev = transaction->t_journal->j_dev;
+	new_bh->b_size = bh_in->b_size;
+	new_bh->b_bdev = journal->j_dev;
 	new_bh->b_blocknr = blocknr;
+	new_bh->b_private = bh_in;
 	set_buffer_mapped(new_bh);
 	set_buffer_dirty(new_bh);
 
-	*jh_out = new_jh;
+	*bh_out = new_bh;
 
 	/*
 	 * The to-be-written buffer needs to get moved to the io queue,
@@ -470,11 +484,9 @@ repeat:
 	spin_lock(&journal->j_list_lock);
 	__jbd2_journal_file_buffer(jh_in, transaction, BJ_Shadow);
 	spin_unlock(&journal->j_list_lock);
+	set_buffer_shadow(bh_in);
 	jbd_unlock_bh_state(bh_in);
 
-	JBUFFER_TRACE(new_jh, "file as BJ_IO");
-	jbd2_journal_file_buffer(new_jh, transaction, BJ_IO);
-
 	return do_escape | (done_copy_out << 1);
 }
 
@@ -483,35 +495,6 @@ repeat:
  * journal, so that we can begin checkpointing when appropriate.
  */
 
-/*
- * __jbd2_log_space_left: Return the number of free blocks left in the journal.
- *
- * Called with the journal already locked.
- *
- * Called under j_state_lock
- */
-
-int __jbd2_log_space_left(journal_t *journal)
-{
-	int left = journal->j_free;
-
-	/* assert_spin_locked(&journal->j_state_lock); */
-
-	/*
-	 * Be pessimistic here about the number of those free blocks which
-	 * might be required for log descriptor control blocks.
-	 */
-
-#define MIN_LOG_RESERVED_BLOCKS 32 /* Allow for rounding errors */
-
-	left -= MIN_LOG_RESERVED_BLOCKS;
-
-	if (left <= 0)
-		return 0;
-	left -= (left >> 3);
-	return left;
-}
-
 /*
  * Called with j_state_lock locked for writing.
  * Returns true if a transaction commit was started.
@@ -564,20 +547,17 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
 }
 
 /*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
+ * Force and wait any uncommitted transactions.  We can only force the running
+ * transaction if we don't have an active handle, otherwise, we will deadlock.
+ * Returns: <0 in case of error,
+ *   0 if nothing to commit,
+ *   1 if transaction was successfully committed.
 */
-int jbd2_journal_force_commit_nested(journal_t *journal)
+static int __jbd2_journal_force_commit(journal_t *journal)
 {
 	transaction_t *transaction = NULL;
 	tid_t tid;
-	int need_to_start = 0;
+	int need_to_start = 0, ret = 0;
 
 	read_lock(&journal->j_state_lock);
 	if (journal->j_running_transaction && !current->journal_info) {
@@ -588,16 +568,53 @@ int jbd2_journal_force_commit_nested(journal_t *journal)
 		transaction = journal->j_committing_transaction;
 
 	if (!transaction) {
+		/* Nothing to commit */
 		read_unlock(&journal->j_state_lock);
-		return 0;	/* Nothing to retry */
+		return 0;
 	}
 	tid = transaction->t_tid;
 	read_unlock(&journal->j_state_lock);
 	if (need_to_start)
 		jbd2_log_start_commit(journal, tid);
-	jbd2_log_wait_commit(journal, tid);
-	return 1;
+	ret = jbd2_log_wait_commit(journal, tid);
+	if (!ret)
+		ret = 1;
+
+	return ret;
+}
+
+/**
+ * Force and wait upon a commit if the calling process is not within
+ * transaction.  This is used for forcing out undo-protected data which contains
+ * bitmaps, when the fs is running out of space.
+ *
+ * @journal: journal to force
+ * Returns true if progress was made.
+ */
+int jbd2_journal_force_commit_nested(journal_t *journal)
+{
+	int ret;
+
+	ret = __jbd2_journal_force_commit(journal);
+	return ret > 0;
+}
+
+/**
+ * int journal_force_commit() - force any uncommitted transactions
+ * @journal: journal to force
+ *
+ * Caller want unconditional commit. We can only force the running transaction
+ * if we don't have an active handle, otherwise, we will deadlock.
+ */
+int jbd2_journal_force_commit(journal_t *journal)
+{
+	int ret;
+
+	J_ASSERT(!current->journal_info);
+	ret = __jbd2_journal_force_commit(journal);
+	if (ret > 0)
+		ret = 0;
+	return ret;
 }
 
 /*
@@ -798,7 +815,7 @@ int jbd2_journal_bmap(journal_t *journal, unsigned long blocknr,
 * But we don't bother doing that, so there will be coherency problems with
 * mmaps of blockdevs which hold live JBD-controlled filesystems.
 */
-struct journal_head *jbd2_journal_get_descriptor_buffer(journal_t *journal)
+struct buffer_head *jbd2_journal_get_descriptor_buffer(journal_t *journal)
 {
 	struct buffer_head *bh;
 	unsigned long long blocknr;
@@ -817,7 +834,7 @@ struct journal_head *jbd2_journal_get_descriptor_buffer(journal_t *journal)
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
 	BUFFER_TRACE(bh, "return this buffer");
-	return jbd2_journal_add_journal_head(bh);
+	return bh;
 }
 
 /*
@@ -1062,11 +1079,10 @@ static journal_t * journal_init_common (void)
 		return NULL;
 
 	init_waitqueue_head(&journal->j_wait_transaction_locked);
-	init_waitqueue_head(&journal->j_wait_logspace);
 	init_waitqueue_head(&journal->j_wait_done_commit);
-	init_waitqueue_head(&journal->j_wait_checkpoint);
 	init_waitqueue_head(&journal->j_wait_commit);
 	init_waitqueue_head(&journal->j_wait_updates);
+	init_waitqueue_head(&journal->j_wait_reserved);
 	mutex_init(&journal->j_barrier);
 	mutex_init(&journal->j_checkpoint_mutex);
 	spin_lock_init(&journal->j_revoke_lock);
@@ -1076,6 +1092,7 @@ static journal_t * journal_init_common (void)
 	journal->j_commit_interval = (HZ * JBD2_DEFAULT_MAX_COMMIT_AGE);
 	journal->j_min_batch_time = 0;
 	journal->j_max_batch_time = 15000; /* 15ms */
+	atomic_set(&journal->j_reserved_credits, 0);
 
 	/* The journal is marked for error until we succeed with recovery! */
 	journal->j_flags = JBD2_ABORT;
@@ -1318,6 +1335,7 @@ static int journal_reset(journal_t *journal)
 static void jbd2_write_superblock(journal_t *journal, int write_op)
 {
 	struct buffer_head *bh = journal->j_sb_buffer;
+	journal_superblock_t *sb = journal->j_superblock;
 	int ret;
 
 	trace_jbd2_write_superblock(journal, write_op);
@@ -1339,6 +1357,7 @@ static void jbd2_write_superblock(journal_t *journal, int write_op)
 		clear_buffer_write_io_error(bh);
 		set_buffer_uptodate(bh);
 	}
+	jbd2_superblock_csum_set(journal, sb);
 	get_bh(bh);
 	bh->b_end_io = end_buffer_write_sync;
 	ret = submit_bh(write_op, bh);
@@ -1435,7 +1454,6 @@ void jbd2_journal_update_sb_errno(journal_t *journal)
 	jbd_debug(1, "JBD2: updating superblock error (errno %d)\n",
 		  journal->j_errno);
 	sb->s_errno    = cpu_to_be32(journal->j_errno);
-	jbd2_superblock_csum_set(journal, sb);
 	read_unlock(&journal->j_state_lock);
 
 	jbd2_write_superblock(journal, WRITE_SYNC);
@@ -2325,13 +2343,13 @@ static struct journal_head *journal_alloc_journal_head(void)
 #ifdef CONFIG_JBD2_DEBUG
 	atomic_inc(&nr_journal_heads);
 #endif
-	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
+	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
 	if (!ret) {
 		jbd_debug(1, "out of memory for journal_head\n");
 		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
 		while (!ret) {
 			yield();
-			ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
+			ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
 		}
 	}
 	return ret;
@@ -2393,10 +2411,8 @@ struct journal_head *jbd2_journal_add_journal_head(struct buffer_head *bh)
 	struct journal_head *new_jh = NULL;
 
 repeat:
-	if (!buffer_jbd(bh)) {
+	if (!buffer_jbd(bh))
 		new_jh = journal_alloc_journal_head();
-		memset(new_jh, 0, sizeof(*new_jh));
-	}
 
 	jbd_lock_bh_journal_head(bh);
 	if (buffer_jbd(bh)) {
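The force-commit refactor above splits the old jbd2_journal_force_commit_nested() into a common core with a three-way return (<0 error, 0 nothing to commit, 1 committed) plus two wrappers with different calling conventions. A compact userspace model of just that return-value plumbing, with made-up inputs standing in for journal state:

    #include <stdio.h>

    #define EIO 5   /* stand-in errno value */

    /* Core: <0 error, 0 nothing to commit, 1 committed (new convention). */
    static int force_commit_core(int have_txn, int io_error)
    {
        if (!have_txn)
            return 0;
        return io_error ? -EIO : 1;
    }

    /* The _nested caller only cares whether progress was made. */
    static int force_commit_nested(int have_txn, int io_error)
    {
        return force_commit_core(have_txn, io_error) > 0;
    }

    /* The unconditional caller: success maps to 0, errors pass through. */
    static int force_commit(int have_txn, int io_error)
    {
        int ret = force_commit_core(have_txn, io_error);

        return ret > 0 ? 0 : ret;
    }

    int main(void)
    {
        printf("nested, idle journal: %d\n", force_commit_nested(0, 0));
        printf("full, committed:      %d\n", force_commit(1, 0));
        printf("full, IO error:       %d\n", force_commit(1, 1));
        return 0;
    }

The practical win is that jbd2_log_wait_commit() failures are no longer swallowed: callers that need an unconditional commit now see the error.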


@@ -399,18 +399,17 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
 static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
 				      void *buf, __u32 sequence)
 {
-	__u32 provided, calculated;
+	__u32 csum32;
 
 	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		return 1;
 
 	sequence = cpu_to_be32(sequence);
-	calculated = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
-				 sizeof(sequence));
-	calculated = jbd2_chksum(j, calculated, buf, j->j_blocksize);
-	provided = be32_to_cpu(tag->t_checksum);
+	csum32 = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+			     sizeof(sequence));
+	csum32 = jbd2_chksum(j, csum32, buf, j->j_blocksize);
 
-	return provided == cpu_to_be32(calculated);
+	return tag->t_checksum == cpu_to_be16(csum32);
 }
 
 static int do_one_pass(journal_t *journal,


@@ -122,9 +122,10 @@ struct jbd2_revoke_table_s
 
 #ifdef __KERNEL__
 static void write_one_revoke_record(journal_t *, transaction_t *,
-				    struct journal_head **, int *,
+				    struct list_head *,
+				    struct buffer_head **, int *,
 				    struct jbd2_revoke_record_s *, int);
-static void flush_descriptor(journal_t *, struct journal_head *, int, int);
+static void flush_descriptor(journal_t *, struct buffer_head *, int, int);
 #endif
 
 /* Utility functions to maintain the revoke table */
@@ -531,9 +532,10 @@ void jbd2_journal_switch_revoke_table(journal_t *journal)
 */
 void jbd2_journal_write_revoke_records(journal_t *journal,
 				       transaction_t *transaction,
+				       struct list_head *log_bufs,
 				       int write_op)
 {
-	struct journal_head *descriptor;
+	struct buffer_head *descriptor;
 	struct jbd2_revoke_record_s *record;
 	struct jbd2_revoke_table_s *revoke;
 	struct list_head *hash_list;
@@ -553,7 +555,7 @@ void jbd2_journal_write_revoke_records(journal_t *journal,
 		while (!list_empty(hash_list)) {
 			record = (struct jbd2_revoke_record_s *)
 				hash_list->next;
-			write_one_revoke_record(journal, transaction,
+			write_one_revoke_record(journal, transaction, log_bufs,
 						&descriptor, &offset,
 						record, write_op);
 			count++;
@@ -573,13 +575,14 @@ void jbd2_journal_write_revoke_records(journal_t *journal,
 
 static void write_one_revoke_record(journal_t *journal,
 				    transaction_t *transaction,
-				    struct journal_head **descriptorp,
+				    struct list_head *log_bufs,
+				    struct buffer_head **descriptorp,
 				    int *offsetp,
 				    struct jbd2_revoke_record_s *record,
 				    int write_op)
 {
 	int csum_size = 0;
-	struct journal_head *descriptor;
+	struct buffer_head *descriptor;
 	int offset;
 	journal_header_t *header;
@@ -609,26 +612,26 @@ static void write_one_revoke_record(journal_t *journal,
 		descriptor = jbd2_journal_get_descriptor_buffer(journal);
 		if (!descriptor)
 			return;
-		header = (journal_header_t *) &jh2bh(descriptor)->b_data[0];
+		header = (journal_header_t *)descriptor->b_data;
 		header->h_magic     = cpu_to_be32(JBD2_MAGIC_NUMBER);
 		header->h_blocktype = cpu_to_be32(JBD2_REVOKE_BLOCK);
 		header->h_sequence  = cpu_to_be32(transaction->t_tid);
 
 		/* Record it so that we can wait for IO completion later */
-		JBUFFER_TRACE(descriptor, "file as BJ_LogCtl");
-		jbd2_journal_file_buffer(descriptor, transaction, BJ_LogCtl);
+		BUFFER_TRACE(descriptor, "file in log_bufs");
+		jbd2_file_log_bh(log_bufs, descriptor);
 
 		offset = sizeof(jbd2_journal_revoke_header_t);
 		*descriptorp = descriptor;
 	}
 
 	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT)) {
-		* ((__be64 *)(&jh2bh(descriptor)->b_data[offset])) =
+		* ((__be64 *)(&descriptor->b_data[offset])) =
 			cpu_to_be64(record->blocknr);
 		offset += 8;
 
 	} else {
-		* ((__be32 *)(&jh2bh(descriptor)->b_data[offset])) =
+		* ((__be32 *)(&descriptor->b_data[offset])) =
 			cpu_to_be32(record->blocknr);
 		offset += 4;
 	}
@@ -636,8 +639,7 @@ static void write_one_revoke_record(journal_t *journal,
 	*offsetp = offset;
 }
 
-static void jbd2_revoke_csum_set(journal_t *j,
-				 struct journal_head *descriptor)
+static void jbd2_revoke_csum_set(journal_t *j, struct buffer_head *bh)
 {
 	struct jbd2_journal_revoke_tail *tail;
 	__u32 csum;
@@ -645,12 +647,10 @@ static void jbd2_revoke_csum_set(journal_t *j,
 	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
 		return;
 
-	tail = (struct jbd2_journal_revoke_tail *)
-			(jh2bh(descriptor)->b_data + j->j_blocksize -
-			sizeof(struct jbd2_journal_revoke_tail));
+	tail = (struct jbd2_journal_revoke_tail *)(bh->b_data + j->j_blocksize -
+			sizeof(struct jbd2_journal_revoke_tail));
 	tail->r_checksum = 0;
-	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
-			   j->j_blocksize);
+	csum = jbd2_chksum(j, j->j_csum_seed, bh->b_data, j->j_blocksize);
 	tail->r_checksum = cpu_to_be32(csum);
 }
 
@@ -662,25 +662,24 @@ static void jbd2_revoke_csum_set(journal_t *j,
 */
 static void flush_descriptor(journal_t *journal,
-			     struct journal_head *descriptor,
+			     struct buffer_head *descriptor,
 			     int offset, int write_op)
 {
 	jbd2_journal_revoke_header_t *header;
-	struct buffer_head *bh = jh2bh(descriptor);
 
 	if (is_journal_aborted(journal)) {
-		put_bh(bh);
+		put_bh(descriptor);
 		return;
 	}
 
-	header = (jbd2_journal_revoke_header_t *) jh2bh(descriptor)->b_data;
+	header = (jbd2_journal_revoke_header_t *)descriptor->b_data;
 	header->r_count = cpu_to_be32(offset);
 	jbd2_revoke_csum_set(journal, descriptor);
 
-	set_buffer_jwrite(bh);
-	BUFFER_TRACE(bh, "write");
-	set_buffer_dirty(bh);
-	write_dirty_buffer(bh, write_op);
+	set_buffer_jwrite(descriptor);
+	BUFFER_TRACE(descriptor, "write");
+	set_buffer_dirty(descriptor);
+	write_dirty_buffer(descriptor, write_op);
 }
 #endif


@ -89,7 +89,8 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
transaction->t_expires = jiffies + journal->j_commit_interval; transaction->t_expires = jiffies + journal->j_commit_interval;
spin_lock_init(&transaction->t_handle_lock); spin_lock_init(&transaction->t_handle_lock);
atomic_set(&transaction->t_updates, 0); atomic_set(&transaction->t_updates, 0);
atomic_set(&transaction->t_outstanding_credits, 0); atomic_set(&transaction->t_outstanding_credits,
atomic_read(&journal->j_reserved_credits));
atomic_set(&transaction->t_handle_count, 0); atomic_set(&transaction->t_handle_count, 0);
INIT_LIST_HEAD(&transaction->t_inode_list); INIT_LIST_HEAD(&transaction->t_inode_list);
INIT_LIST_HEAD(&transaction->t_private_list); INIT_LIST_HEAD(&transaction->t_private_list);
@@ -140,6 +141,112 @@ static inline void update_t_max_wait(transaction_t *transaction,
 #endif
 }
 
+/*
+ * Wait until running transaction passes T_LOCKED state. Also starts the commit
+ * if needed. The function expects running transaction to exist and releases
+ * j_state_lock.
+ */
+static void wait_transaction_locked(journal_t *journal)
+	__releases(journal->j_state_lock)
+{
+	DEFINE_WAIT(wait);
+	int need_to_start;
+	tid_t tid = journal->j_running_transaction->t_tid;
+
+	prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
+			TASK_UNINTERRUPTIBLE);
+	need_to_start = !tid_geq(journal->j_commit_request, tid);
+	read_unlock(&journal->j_state_lock);
+	if (need_to_start)
+		jbd2_log_start_commit(journal, tid);
+	schedule();
+	finish_wait(&journal->j_wait_transaction_locked, &wait);
+}
+
+static void sub_reserved_credits(journal_t *journal, int blocks)
+{
+	atomic_sub(blocks, &journal->j_reserved_credits);
+	wake_up(&journal->j_wait_reserved);
+}
+
+/*
+ * Wait until we can add credits for handle to the running transaction. Called
+ * with j_state_lock held for reading. Returns 0 if handle joined the running
+ * transaction. Returns 1 if we had to wait, j_state_lock is dropped, and
+ * caller must retry.
+ */
+static int add_transaction_credits(journal_t *journal, int blocks,
+				   int rsv_blocks)
+{
+	transaction_t *t = journal->j_running_transaction;
+	int needed;
+	int total = blocks + rsv_blocks;
+
+	/*
+	 * If the current transaction is locked down for commit, wait
+	 * for the lock to be released.
+	 */
+	if (t->t_state == T_LOCKED) {
+		wait_transaction_locked(journal);
+		return 1;
+	}
+
+	/*
+	 * If there is not enough space left in the log to write all
+	 * potential buffers requested by this operation, we need to
+	 * stall pending a log checkpoint to free some more log space.
+	 */
+	needed = atomic_add_return(total, &t->t_outstanding_credits);
+	if (needed > journal->j_max_transaction_buffers) {
+		/*
+		 * If the current transaction is already too large,
+		 * then start to commit it: we can then go back and
+		 * attach this handle to a new transaction.
+		 */
+		atomic_sub(total, &t->t_outstanding_credits);
+		wait_transaction_locked(journal);
+		return 1;
+	}
+
+	/*
+	 * The commit code assumes that it can get enough log space
+	 * without forcing a checkpoint.  This is *critical* for
+	 * correctness: a checkpoint of a buffer which is also
+	 * associated with a committing transaction creates a deadlock,
+	 * so commit simply cannot force through checkpoints.
+	 *
+	 * We must therefore ensure the necessary space in the journal
+	 * *before* starting to dirty potentially checkpointed buffers
+	 * in the new transaction.
+	 */
+	if (jbd2_log_space_left(journal) < jbd2_space_needed(journal)) {
+		atomic_sub(total, &t->t_outstanding_credits);
+		read_unlock(&journal->j_state_lock);
+		write_lock(&journal->j_state_lock);
+		if (jbd2_log_space_left(journal) < jbd2_space_needed(journal))
+			__jbd2_log_wait_for_space(journal);
+		write_unlock(&journal->j_state_lock);
+		return 1;
+	}
+
+	/* No reservation? We are done... */
+	if (!rsv_blocks)
+		return 0;
+
+	needed = atomic_add_return(rsv_blocks, &journal->j_reserved_credits);
+	/* We allow at most half of a transaction to be reserved */
+	if (needed > journal->j_max_transaction_buffers / 2) {
+		sub_reserved_credits(journal, rsv_blocks);
+		atomic_sub(total, &t->t_outstanding_credits);
+		read_unlock(&journal->j_state_lock);
+		wait_event(journal->j_wait_reserved,
+			   atomic_read(&journal->j_reserved_credits) + rsv_blocks
+			   <= journal->j_max_transaction_buffers / 2);
+		return 1;
+	}
+	return 0;
+}
 /*
  * start_this_handle: Given a handle, deal with any locking or stalling
  * needed to make sure that there is enough journal space for the handle
@@ -151,18 +258,24 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 			     gfp_t gfp_mask)
 {
 	transaction_t	*transaction, *new_transaction = NULL;
-	tid_t		tid;
-	int		needed, need_to_start;
-	int		nblocks = handle->h_buffer_credits;
+	int		blocks = handle->h_buffer_credits;
+	int		rsv_blocks = 0;
 	unsigned long ts = jiffies;
 
-	if (nblocks > journal->j_max_transaction_buffers) {
+	/*
+	 * 1/2 of transaction can be reserved so we can practically handle
+	 * only 1/2 of maximum transaction size per operation
+	 */
+	if (WARN_ON(blocks > journal->j_max_transaction_buffers / 2)) {
 		printk(KERN_ERR "JBD2: %s wants too many credits (%d > %d)\n",
-		       current->comm, nblocks,
-		       journal->j_max_transaction_buffers);
+		       current->comm, blocks,
+		       journal->j_max_transaction_buffers / 2);
 		return -ENOSPC;
 	}
 
+	if (handle->h_rsv_handle)
+		rsv_blocks = handle->h_rsv_handle->h_buffer_credits;
+
 alloc_transaction:
 	if (!journal->j_running_transaction) {
 		new_transaction = kmem_cache_zalloc(transaction_cache,
@@ -199,8 +312,12 @@ repeat:
 		return -EROFS;
 	}
 
-	/* Wait on the journal's transaction barrier if necessary */
-	if (journal->j_barrier_count) {
+	/*
+	 * Wait on the journal's transaction barrier if necessary. Specifically
+	 * we allow reserved handles to proceed because otherwise commit could
+	 * deadlock on page writeback not being able to complete.
+	 */
+	if (!handle->h_reserved && journal->j_barrier_count) {
 		read_unlock(&journal->j_state_lock);
 		wait_event(journal->j_wait_transaction_locked,
 				journal->j_barrier_count == 0);
@@ -213,7 +330,7 @@ repeat:
 		goto alloc_transaction;
 	write_lock(&journal->j_state_lock);
 	if (!journal->j_running_transaction &&
-	    !journal->j_barrier_count) {
+	    (handle->h_reserved || !journal->j_barrier_count)) {
 		jbd2_get_transaction(journal, new_transaction);
 		new_transaction = NULL;
 	}
@@ -223,85 +340,18 @@ repeat:
 	transaction = journal->j_running_transaction;
 
-	/*
-	 * If the current transaction is locked down for commit, wait for the
-	 * lock to be released.
-	 */
-	if (transaction->t_state == T_LOCKED) {
-		DEFINE_WAIT(wait);
-
-		prepare_to_wait(&journal->j_wait_transaction_locked,
-					&wait, TASK_UNINTERRUPTIBLE);
-		read_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_transaction_locked, &wait);
-		goto repeat;
-	}
-
-	/*
-	 * If there is not enough space left in the log to write all potential
-	 * buffers requested by this operation, we need to stall pending a log
-	 * checkpoint to free some more log space.
-	 */
-	needed = atomic_add_return(nblocks,
-				   &transaction->t_outstanding_credits);
-
-	if (needed > journal->j_max_transaction_buffers) {
+	if (!handle->h_reserved) {
+		/* We may have dropped j_state_lock - restart in that case */
+		if (add_transaction_credits(journal, blocks, rsv_blocks))
+			goto repeat;
+	} else {
 		/*
-		 * If the current transaction is already too large, then start
-		 * to commit it: we can then go back and attach this handle to
-		 * a new transaction.
+		 * We have handle reserved so we are allowed to join T_LOCKED
+		 * transaction and we don't have to check for transaction size
+		 * and journal space.
 		 */
-		DEFINE_WAIT(wait);
-
-		jbd_debug(2, "Handle %p starting new commit...\n", handle);
-		atomic_sub(nblocks, &transaction->t_outstanding_credits);
-		prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
-				TASK_UNINTERRUPTIBLE);
-		tid = transaction->t_tid;
-		need_to_start = !tid_geq(journal->j_commit_request, tid);
-		read_unlock(&journal->j_state_lock);
-		if (need_to_start)
-			jbd2_log_start_commit(journal, tid);
-		schedule();
-		finish_wait(&journal->j_wait_transaction_locked, &wait);
-		goto repeat;
-	}
-
-	/*
-	 * The commit code assumes that it can get enough log space
-	 * without forcing a checkpoint.  This is *critical* for
-	 * correctness: a checkpoint of a buffer which is also
-	 * associated with a committing transaction creates a deadlock,
-	 * so commit simply cannot force through checkpoints.
-	 *
-	 * We must therefore ensure the necessary space in the journal
-	 * *before* starting to dirty potentially checkpointed buffers
-	 * in the new transaction.
-	 *
-	 * The worst part is, any transaction currently committing can
-	 * reduce the free space arbitrarily.  Be careful to account for
-	 * those buffers when checkpointing.
-	 */
-
-	/*
-	 * @@@ AKPM: This seems rather over-defensive.  We're giving commit
-	 * a _lot_ of headroom: 1/4 of the journal plus the size of
-	 * the committing transaction.  Really, we only need to give it
-	 * committing_transaction->t_outstanding_credits plus "enough" for
-	 * the log control blocks.
-	 * Also, this test is inconsistent with the matching one in
-	 * jbd2_journal_extend().
-	 */
-	if (__jbd2_log_space_left(journal) < jbd_space_needed(journal)) {
-		jbd_debug(2, "Handle %p waiting for checkpoint...\n", handle);
-		atomic_sub(nblocks, &transaction->t_outstanding_credits);
-		read_unlock(&journal->j_state_lock);
-		write_lock(&journal->j_state_lock);
-		if (__jbd2_log_space_left(journal) < jbd_space_needed(journal))
-			__jbd2_log_wait_for_space(journal);
-		write_unlock(&journal->j_state_lock);
-		goto repeat;
+		sub_reserved_credits(journal, blocks);
+		handle->h_reserved = 0;
 	}
 
 	/* OK, account for the buffers that this operation expects to
@@ -309,15 +359,16 @@ repeat:
 	 */
 	update_t_max_wait(transaction, ts);
 	handle->h_transaction = transaction;
-	handle->h_requested_credits = nblocks;
+	handle->h_requested_credits = blocks;
 	handle->h_start_jiffies = jiffies;
 	atomic_inc(&transaction->t_updates);
 	atomic_inc(&transaction->t_handle_count);
-	jbd_debug(4, "Handle %p given %d credits (total %d, free %d)\n",
-		  handle, nblocks,
+	jbd_debug(4, "Handle %p given %d credits (total %d, free %lu)\n",
+		  handle, blocks,
 		  atomic_read(&transaction->t_outstanding_credits),
-		  __jbd2_log_space_left(journal));
+		  jbd2_log_space_left(journal));
 	read_unlock(&journal->j_state_lock);
+	current->journal_info = handle;
 
 	lock_map_acquire(&handle->h_lockdep_map);
 	jbd2_journal_free_transaction(new_transaction);
@@ -348,16 +399,21 @@ static handle_t *new_handle(int nblocks)
  *
  * We make sure that the transaction can guarantee at least nblocks of
  * modified buffers in the log.  We block until the log can guarantee
- * that much space.
- *
- * This function is visible to journal users (like ext3fs), so is not
- * called with the journal already locked.
+ * that much space. Additionally, if rsv_blocks > 0, we also create another
+ * handle with rsv_blocks reserved blocks in the journal. This handle is
+ * stored in h_rsv_handle. It is not attached to any particular transaction
+ * and thus doesn't block transaction commit. If the caller uses this reserved
+ * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
+ * on the parent handle will dispose the reserved one. Reserved handle has to
+ * be converted to a normal handle using jbd2_journal_start_reserved() before
+ * it can be used.
  *
  * Return a pointer to a newly allocated handle, or an ERR_PTR() value
  * on failure.
  */
-handle_t *jbd2__journal_start(journal_t *journal, int nblocks, gfp_t gfp_mask,
-			      unsigned int type, unsigned int line_no)
+handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
+			      gfp_t gfp_mask, unsigned int type,
+			      unsigned int line_no)
 {
 	handle_t *handle = journal_current_handle();
 	int err;
@@ -374,13 +430,24 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, gfp_t gfp_mask,
 	handle = new_handle(nblocks);
 	if (!handle)
 		return ERR_PTR(-ENOMEM);
+	if (rsv_blocks) {
+		handle_t *rsv_handle;
 
-	current->journal_info = handle;
+		rsv_handle = new_handle(rsv_blocks);
+		if (!rsv_handle) {
+			jbd2_free_handle(handle);
+			return ERR_PTR(-ENOMEM);
+		}
+		rsv_handle->h_reserved = 1;
+		rsv_handle->h_journal = journal;
+		handle->h_rsv_handle = rsv_handle;
+	}
 
 	err = start_this_handle(journal, handle, gfp_mask);
 	if (err < 0) {
+		if (handle->h_rsv_handle)
+			jbd2_free_handle(handle->h_rsv_handle);
 		jbd2_free_handle(handle);
-		current->journal_info = NULL;
 		return ERR_PTR(err);
 	}
 	handle->h_type = type;
@@ -395,10 +462,65 @@ EXPORT_SYMBOL(jbd2__journal_start);
 handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
 {
-	return jbd2__journal_start(journal, nblocks, GFP_NOFS, 0, 0);
+	return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0);
 }
 EXPORT_SYMBOL(jbd2_journal_start);
 
+void jbd2_journal_free_reserved(handle_t *handle)
+{
+	journal_t *journal = handle->h_journal;
+
+	WARN_ON(!handle->h_reserved);
+	sub_reserved_credits(journal, handle->h_buffer_credits);
+	jbd2_free_handle(handle);
+}
+EXPORT_SYMBOL(jbd2_journal_free_reserved);
+
+/**
+ * int jbd2_journal_start_reserved(handle_t *handle) - start reserved handle
+ * @handle: handle to start
+ *
+ * Start handle that has been previously reserved with jbd2_journal_reserve().
+ * This attaches @handle to the running transaction (or creates one if there's
+ * no transaction running). Unlike jbd2_journal_start() this function cannot
+ * block on journal commit, checkpointing, or similar stuff. It can block on
+ * memory allocation or frozen journal though.
+ *
+ * Return 0 on success, non-zero on error - handle is freed in that case.
+ */
+int jbd2_journal_start_reserved(handle_t *handle, unsigned int type,
+				unsigned int line_no)
+{
+	journal_t *journal = handle->h_journal;
+	int ret = -EIO;
+
+	if (WARN_ON(!handle->h_reserved)) {
+		/* Someone passed in normal handle? Just stop it. */
+		jbd2_journal_stop(handle);
+		return ret;
+	}
+	/*
+	 * Usefulness of mixing of reserved and unreserved handles is
+	 * questionable. So far nobody seems to need it so just error out.
+	 */
+	if (WARN_ON(current->journal_info)) {
+		jbd2_journal_free_reserved(handle);
+		return ret;
+	}
+
+	handle->h_journal = NULL;
+	/*
+	 * GFP_NOFS is here because callers are likely from writeback or
+	 * similarly constrained call sites
+	 */
+	ret = start_this_handle(journal, handle, GFP_NOFS);
+	if (ret < 0)
+		jbd2_journal_free_reserved(handle);
+	handle->h_type = type;
+	handle->h_line_no = line_no;
+	return ret;
+}
+EXPORT_SYMBOL(jbd2_journal_start_reserved);
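
The reserved-handle interfaces above are meant to be used in a fixed sequence: reserve credits up front, detach the reserved handle before stopping the parent, then convert it from a context that must not block on commit. The following sketch is hypothetical (the function name and credit counts are illustrative only, not part of this series) and uses just the calls introduced in this patch:

	static int example_write_and_finish(journal_t *journal)
	{
		handle_t *handle, *rsv;
		int err;

		/* 10 credits now, 3 more reserved for the final update */
		handle = jbd2__journal_start(journal, 10, 3, GFP_NOFS, 0, 0);
		if (IS_ERR(handle))
			return PTR_ERR(handle);

		/* ... modify metadata under 'handle' ... */

		/*
		 * Detach the reserved handle before stopping the parent;
		 * otherwise jbd2_journal_stop() disposes of it.
		 */
		rsv = handle->h_rsv_handle;
		handle->h_rsv_handle = NULL;
		err = jbd2_journal_stop(handle);
		if (err) {
			jbd2_journal_free_reserved(rsv);
			return err;
		}

		/*
		 * Later, e.g. from writeback: bind the reserved credits to
		 * the running transaction. This cannot block on commit.
		 */
		err = jbd2_journal_start_reserved(rsv, 0, 0);
		if (err)
			return err;	/* rsv was already freed */

		/* ... final metadata update under 'rsv' ... */
		return jbd2_journal_stop(rsv);
	}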
 /**
  * int jbd2_journal_extend() - extend buffer credits.
@@ -423,49 +545,53 @@ EXPORT_SYMBOL(jbd2_journal_start);
 int jbd2_journal_extend(handle_t *handle, int nblocks)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 	int result;
 	int wanted;
 
-	result = -EIO;
+	WARN_ON(!transaction);
 	if (is_handle_aborted(handle))
-		goto out;
+		return -EROFS;
+	journal = transaction->t_journal;
 
 	result = 1;
 
 	read_lock(&journal->j_state_lock);
 
 	/* Don't extend a locked-down transaction! */
-	if (handle->h_transaction->t_state != T_RUNNING) {
+	if (transaction->t_state != T_RUNNING) {
 		jbd_debug(3, "denied handle %p %d blocks: "
 			  "transaction not running\n", handle, nblocks);
 		goto error_out;
 	}
 
 	spin_lock(&transaction->t_handle_lock);
-	wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
+	wanted = atomic_add_return(nblocks,
+				   &transaction->t_outstanding_credits);
 
 	if (wanted > journal->j_max_transaction_buffers) {
 		jbd_debug(3, "denied handle %p %d blocks: "
 			  "transaction too large\n", handle, nblocks);
+		atomic_sub(nblocks, &transaction->t_outstanding_credits);
 		goto unlock;
 	}
 
-	if (wanted > __jbd2_log_space_left(journal)) {
+	if (wanted + (wanted >> JBD2_CONTROL_BLOCKS_SHIFT) >
+	    jbd2_log_space_left(journal)) {
 		jbd_debug(3, "denied handle %p %d blocks: "
 			  "insufficient log space\n", handle, nblocks);
+		atomic_sub(nblocks, &transaction->t_outstanding_credits);
 		goto unlock;
 	}
 
 	trace_jbd2_handle_extend(journal->j_fs_dev->bd_dev,
-				 handle->h_transaction->t_tid,
+				 transaction->t_tid,
 				 handle->h_type, handle->h_line_no,
 				 handle->h_buffer_credits,
 				 nblocks);
 
 	handle->h_buffer_credits += nblocks;
 	handle->h_requested_credits += nblocks;
-	atomic_add(nblocks, &transaction->t_outstanding_credits);
 	result = 0;
 
 	jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
@@ -473,7 +599,6 @@ unlock:
 	spin_unlock(&transaction->t_handle_lock);
 error_out:
 	read_unlock(&journal->j_state_lock);
-out:
 	return result;
 }
@@ -490,19 +615,22 @@ out:
  * to a running handle, a call to jbd2_journal_restart will commit the
  * handle's transaction so far and reattach the handle to a new
  * transaction capable of guaranteeing the requested number of
- * credits.
+ * credits. We preserve reserved handle if there's any attached to the
+ * passed in handle.
  */
 int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 	tid_t		tid;
 	int		need_to_start, ret;
 
+	WARN_ON(!transaction);
 	/* If we've had an abort of any type, don't even think about
 	 * actually doing the restart! */
 	if (is_handle_aborted(handle))
 		return 0;
+	journal = transaction->t_journal;
 
 	/*
 	 * First unlink the handle from its current transaction, and start the
@@ -515,12 +643,18 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
 	spin_lock(&transaction->t_handle_lock);
 	atomic_sub(handle->h_buffer_credits,
 		   &transaction->t_outstanding_credits);
+	if (handle->h_rsv_handle) {
+		sub_reserved_credits(journal,
+				     handle->h_rsv_handle->h_buffer_credits);
+	}
 	if (atomic_dec_and_test(&transaction->t_updates))
 		wake_up(&journal->j_wait_updates);
+	tid = transaction->t_tid;
 	spin_unlock(&transaction->t_handle_lock);
+	handle->h_transaction = NULL;
+	current->journal_info = NULL;
 
 	jbd_debug(2, "restarting handle %p\n", handle);
-	tid = transaction->t_tid;
 	need_to_start = !tid_geq(journal->j_commit_request, tid);
 	read_unlock(&journal->j_state_lock);
 	if (need_to_start)
@@ -557,6 +691,14 @@ void jbd2_journal_lock_updates(journal_t *journal)
 	write_lock(&journal->j_state_lock);
 	++journal->j_barrier_count;
 
+	/* Wait until there are no reserved handles */
+	if (atomic_read(&journal->j_reserved_credits)) {
+		write_unlock(&journal->j_state_lock);
+		wait_event(journal->j_wait_reserved,
+			   atomic_read(&journal->j_reserved_credits) == 0);
+		write_lock(&journal->j_state_lock);
+	}
+
 	/* Wait until there are no running updates */
 	while (1) {
 		transaction_t *transaction = journal->j_running_transaction;
@@ -619,6 +761,12 @@ static void warn_dirty_buffer(struct buffer_head *bh)
 	       bdevname(bh->b_bdev, b), (unsigned long long)bh->b_blocknr);
 }
 
+static int sleep_on_shadow_bh(void *word)
+{
+	io_schedule();
+	return 0;
+}
+
 /*
  * If the buffer is already part of the current transaction, then there
  * is nothing we need to do.  If it is already part of a prior
@@ -634,17 +782,16 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 			int force_copy)
 {
 	struct buffer_head *bh;
-	transaction_t *transaction;
+	transaction_t *transaction = handle->h_transaction;
 	journal_t *journal;
 	int error;
 	char *frozen_buffer = NULL;
 	int need_copy = 0;
 	unsigned long start_lock, time_lock;
 
+	WARN_ON(!transaction);
 	if (is_handle_aborted(handle))
 		return -EROFS;
-
-	transaction = handle->h_transaction;
 	journal = transaction->t_journal;
 
 	jbd_debug(5, "journal_head %p, force_copy %d\n", jh, force_copy);
@@ -754,41 +901,29 @@ repeat:
 		 * journaled.  If the primary copy is already going to
 		 * disk then we cannot do copy-out here. */
 
-		if (jh->b_jlist == BJ_Shadow) {
-			DEFINE_WAIT_BIT(wait, &bh->b_state, BH_Unshadow);
-			wait_queue_head_t *wqh;
-
-			wqh = bit_waitqueue(&bh->b_state, BH_Unshadow);
-
+		if (buffer_shadow(bh)) {
 			JBUFFER_TRACE(jh, "on shadow: sleep");
 			jbd_unlock_bh_state(bh);
-			/* commit wakes up all shadow buffers after IO */
-			for ( ; ; ) {
-				prepare_to_wait(wqh, &wait.wait,
-						TASK_UNINTERRUPTIBLE);
-				if (jh->b_jlist != BJ_Shadow)
-					break;
-				schedule();
-			}
-			finish_wait(wqh, &wait.wait);
+			wait_on_bit(&bh->b_state, BH_Shadow,
+				    sleep_on_shadow_bh, TASK_UNINTERRUPTIBLE);
 			goto repeat;
 		}
 
-		/* Only do the copy if the currently-owning transaction
-		 * still needs it.  If it is on the Forget list, the
-		 * committing transaction is past that stage.  The
-		 * buffer had better remain locked during the kmalloc,
-		 * but that should be true --- we hold the journal lock
-		 * still and the buffer is already on the BUF_JOURNAL
-		 * list so won't be flushed.
+		/*
+		 * Only do the copy if the currently-owning transaction still
+		 * needs it. If buffer isn't on BJ_Metadata list, the
+		 * committing transaction is past that stage (here we use the
+		 * fact that BH_Shadow is set under bh_state lock together with
+		 * refiling to BJ_Shadow list and at this point we know the
+		 * buffer doesn't have BH_Shadow set).
 		 *
 		 * Subtle point, though: if this is a get_undo_access,
 		 * then we will be relying on the frozen_data to contain
 		 * the new value of the committed_data record after the
 		 * transaction, so we HAVE to force the frozen_data copy
-		 * in that case. */
-
-		if (jh->b_jlist != BJ_Forget || force_copy) {
+		 * in that case.
+		 */
+		if (jh->b_jlist == BJ_Metadata || force_copy) {
 			JBUFFER_TRACE(jh, "generate frozen data");
 			if (!frozen_buffer) {
 				JBUFFER_TRACE(jh, "allocate memory for buffer");
@@ -915,14 +1050,16 @@ int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh)
 int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 	struct journal_head *jh = jbd2_journal_add_journal_head(bh);
 	int err;
 
 	jbd_debug(5, "journal_head %p\n", jh);
+	WARN_ON(!transaction);
 	err = -EROFS;
 	if (is_handle_aborted(handle))
 		goto out;
+	journal = transaction->t_journal;
 	err = 0;
 
 	JBUFFER_TRACE(jh, "entry");
@@ -1128,12 +1265,14 @@ void jbd2_buffer_abort_trigger(struct journal_head *jh,
 int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 	struct journal_head *jh;
 	int ret = 0;
 
+	WARN_ON(!transaction);
 	if (is_handle_aborted(handle))
-		goto out;
+		return -EROFS;
+	journal = transaction->t_journal;
 	jh = jbd2_journal_grab_journal_head(bh);
 	if (!jh) {
 		ret = -EUCLEAN;
@@ -1227,7 +1366,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
 	JBUFFER_TRACE(jh, "file as BJ_Metadata");
 	spin_lock(&journal->j_list_lock);
-	__jbd2_journal_file_buffer(jh, handle->h_transaction, BJ_Metadata);
+	__jbd2_journal_file_buffer(jh, transaction, BJ_Metadata);
 	spin_unlock(&journal->j_list_lock);
 out_unlock_bh:
 	jbd_unlock_bh_state(bh);
@@ -1258,12 +1397,17 @@ out:
 int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 	struct journal_head *jh;
 	int drop_reserve = 0;
 	int err = 0;
 	int was_modified = 0;
 
+	WARN_ON(!transaction);
+	if (is_handle_aborted(handle))
+		return -EROFS;
+	journal = transaction->t_journal;
+
 	BUFFER_TRACE(bh, "entry");
 
 	jbd_lock_bh_state(bh);
@@ -1290,7 +1434,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 	 */
 	jh->b_modified = 0;
 
-	if (jh->b_transaction == handle->h_transaction) {
+	if (jh->b_transaction == transaction) {
 		J_ASSERT_JH(jh, !jh->b_frozen_data);
 
 		/* If we are forgetting a buffer which is already part
@@ -1385,19 +1529,21 @@ drop:
 int jbd2_journal_stop(handle_t *handle)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	int err, wait_for_commit = 0;
+	journal_t *journal;
+	int err = 0, wait_for_commit = 0;
 	tid_t tid;
 	pid_t pid;
 
+	if (!transaction)
+		goto free_and_exit;
+	journal = transaction->t_journal;
+
 	J_ASSERT(journal_current_handle() == handle);
 
 	if (is_handle_aborted(handle))
 		err = -EIO;
-	else {
+	else
 		J_ASSERT(atomic_read(&transaction->t_updates) > 0);
-		err = 0;
-	}
 
 	if (--handle->h_ref > 0) {
 		jbd_debug(4, "h_ref %d -> %d\n", handle->h_ref + 1,
@@ -1407,7 +1553,7 @@ int jbd2_journal_stop(handle_t *handle)
 	jbd_debug(4, "Handle %p going down\n", handle);
 	trace_jbd2_handle_stats(journal->j_fs_dev->bd_dev,
-				handle->h_transaction->t_tid,
+				transaction->t_tid,
 				handle->h_type, handle->h_line_no,
 				jiffies - handle->h_start_jiffies,
 				handle->h_sync, handle->h_requested_credits,
@@ -1518,33 +1664,13 @@ int jbd2_journal_stop(handle_t *handle)
 
 	lock_map_release(&handle->h_lockdep_map);
 
+	if (handle->h_rsv_handle)
+		jbd2_journal_free_reserved(handle->h_rsv_handle);
+free_and_exit:
 	jbd2_free_handle(handle);
 	return err;
 }
 
-/**
- * int jbd2_journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int jbd2_journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = jbd2_journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = jbd2_journal_stop(handle);
-	}
-	return ret;
-}
 /*
  *
  * List management code snippets: various functions for manipulating the
@@ -1601,10 +1727,10 @@ __blist_del_buffer(struct journal_head **list, struct journal_head *jh)
  * Remove a buffer from the appropriate transaction list.
  *
  * Note that this function can *change* the value of
- * bh->b_transaction->t_buffers, t_forget, t_iobuf_list, t_shadow_list,
- * t_log_list or t_reserved_list.  If the caller is holding onto a copy of one
- * of these pointers, it could go bad.  Generally the caller needs to re-read
- * the pointer from the transaction_t.
+ * bh->b_transaction->t_buffers, t_forget, t_shadow_list, t_log_list or
+ * t_reserved_list. If the caller is holding onto a copy of one of these
+ * pointers, it could go bad. Generally the caller needs to re-read the
+ * pointer from the transaction_t.
  *
  * Called under j_list_lock.
  */
@@ -1634,15 +1760,9 @@ static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh)
 	case BJ_Forget:
 		list = &transaction->t_forget;
 		break;
-	case BJ_IO:
-		list = &transaction->t_iobuf_list;
-		break;
 	case BJ_Shadow:
 		list = &transaction->t_shadow_list;
 		break;
-	case BJ_LogCtl:
-		list = &transaction->t_log_list;
-		break;
 	case BJ_Reserved:
 		list = &transaction->t_reserved_list;
 		break;
@@ -2034,18 +2154,23 @@ zap_buffer_unlocked:
  * void jbd2_journal_invalidatepage()
  * @journal: journal to use for flush...
  * @page:    page to flush
- * @offset:  length of page to invalidate.
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
  *
- * Reap page buffers containing data after offset in page. Can return -EBUSY
- * if buffers are part of the committing transaction and the page is straddling
- * i_size. Caller then has to wait for current commit and try again.
+ * Reap page buffers containing data in the specified range of the page.
+ * Can return -EBUSY if buffers are part of the committing transaction and
+ * the page is straddling i_size. Caller then has to wait for current commit
+ * and try again.
  */
 int jbd2_journal_invalidatepage(journal_t *journal,
 				struct page *page,
-				unsigned long offset)
+				unsigned int offset,
+				unsigned int length)
 {
 	struct buffer_head *head, *bh, *next;
+	unsigned int stop = offset + length;
 	unsigned int curr_off = 0;
+	int partial_page = (offset || length < PAGE_CACHE_SIZE);
 	int may_free = 1;
 	int ret = 0;
@@ -2054,6 +2179,8 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 	if (!page_has_buffers(page))
 		return 0;
 
+	BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
+
 	/* We will potentially be playing with lists other than just the
 	 * data lists (especially for journaled data mode), so be
 	 * cautious in our locking. */
@@ -2063,10 +2190,13 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 		unsigned int next_off = curr_off + bh->b_size;
 		next = bh->b_this_page;
 
+		if (next_off > stop)
+			return 0;
+
 		if (offset <= curr_off) {
 			/* This block is wholly outside the truncation point */
 			lock_buffer(bh);
-			ret = journal_unmap_buffer(journal, bh, offset > 0);
+			ret = journal_unmap_buffer(journal, bh, partial_page);
 			unlock_buffer(bh);
 			if (ret < 0)
 				return ret;
@@ -2077,7 +2207,7 @@ int jbd2_journal_invalidatepage(journal_t *journal,
 	} while (bh != head);
 
-	if (!offset) {
+	if (!partial_page) {
 		if (may_free && try_to_free_buffers(page))
 			J_ASSERT(!page_has_buffers(page));
 	}
@@ -2138,15 +2268,9 @@ void __jbd2_journal_file_buffer(struct journal_head *jh,
 	case BJ_Forget:
 		list = &transaction->t_forget;
 		break;
-	case BJ_IO:
-		list = &transaction->t_iobuf_list;
-		break;
 	case BJ_Shadow:
 		list = &transaction->t_shadow_list;
 		break;
-	case BJ_LogCtl:
-		list = &transaction->t_log_list;
-		break;
 	case BJ_Reserved:
 		list = &transaction->t_reserved_list;
 		break;
@@ -2248,10 +2372,12 @@ void jbd2_journal_refile_buffer(journal_t *journal, struct journal_head *jh)
 int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode)
 {
 	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	journal_t *journal;
 
+	WARN_ON(!transaction);
 	if (is_handle_aborted(handle))
-		return -EIO;
+		return -EROFS;
+	journal = transaction->t_journal;
 
 	jbd_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
 			transaction->t_tid);


@@ -571,9 +571,10 @@ static int metapage_releasepage(struct page *page, gfp_t gfp_mask)
 	return ret;
 }
 
-static void metapage_invalidatepage(struct page *page, unsigned long offset)
+static void metapage_invalidatepage(struct page *page, unsigned int offset,
+				    unsigned int length)
 {
-	BUG_ON(offset);
+	BUG_ON(offset || length < PAGE_CACHE_SIZE);
 
 	BUG_ON(PageWriteback(page));


@@ -159,7 +159,8 @@ static int logfs_writepage(struct page *page, struct writeback_control *wbc)
 	return __logfs_writepage(page);
 }
 
-static void logfs_invalidatepage(struct page *page, unsigned long offset)
+static void logfs_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	struct logfs_block *block = logfs_block(page);


@@ -884,7 +884,8 @@ static struct logfs_area *alloc_area(struct super_block *sb)
 	return area;
 }
 
-static void map_invalidatepage(struct page *page, unsigned long l)
+static void map_invalidatepage(struct page *page, unsigned int o,
+			       unsigned int l)
 {
 	return;
 }


@@ -451,11 +451,13 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
  * - Called if either PG_private or PG_fscache is set on the page
  * - Caller holds page lock
  */
-static void nfs_invalidate_page(struct page *page, unsigned long offset)
+static void nfs_invalidate_page(struct page *page, unsigned int offset,
+				unsigned int length)
 {
-	dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %lu)\n", page, offset);
+	dfprintk(PAGECACHE, "NFS: invalidate_page(%p, %u, %u)\n",
+		 page, offset, length);
 
-	if (offset != 0)
+	if (offset != 0 || length < PAGE_CACHE_SIZE)
 		return;
 	/* Cancel any unstarted writes on this page */
 	nfs_wb_page_cancel(page_file_mapping(page)->host, page);


@@ -1372,7 +1372,7 @@ retry_writepage:
 		 * The page may have dirty, unmapped buffers.  Make them
 		 * freeable here, so the page does not leak.
 		 */
-		block_invalidatepage(page, 0);
+		block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
 		unlock_page(page);
 		ntfs_debug("Write outside i_size - truncated?");
 		return 0;


@@ -603,11 +603,12 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
  * from ext3.  PageChecked() bits have been removed as OCFS2 does not
  * do journalled data.
  */
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
 
-	jbd2_journal_invalidatepage(journal, page, offset);
+	jbd2_journal_invalidatepage(journal, page, offset, length);
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)


@@ -2975,16 +2975,19 @@ static int invalidatepage_can_drop(struct inode *inode, struct buffer_head *bh)
 }
 
 /* clm -- taken from fs/buffer.c:block_invalidate_page */
-static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
+static void reiserfs_invalidatepage(struct page *page, unsigned int offset,
+				    unsigned int length)
 {
 	struct buffer_head *head, *bh, *next;
 	struct inode *inode = page->mapping->host;
 	unsigned int curr_off = 0;
+	unsigned int stop = offset + length;
+	int partial_page = (offset || length < PAGE_CACHE_SIZE);
 	int ret = 1;
 
 	BUG_ON(!PageLocked(page));
 
-	if (offset == 0)
+	if (!partial_page)
 		ClearPageChecked(page);
 
 	if (!page_has_buffers(page))
@@ -2996,6 +2999,9 @@ static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
 			unsigned int next_off = curr_off + bh->b_size;
 			next = bh->b_this_page;
 
+			if (next_off > stop)
+				goto out;
+
 			/*
 			 * is this block fully invalidated?
 			 */
@@ -3014,7 +3020,7 @@ static void reiserfs_invalidatepage(struct page *page, unsigned long offset)
 	 * The get_block cached value has been unconditionally invalidated,
 	 * so real IO is not possible anymore.
 	 */
-	if (!offset && ret) {
+	if (!partial_page && ret) {
 		ret = try_to_release_page(page, 0);
 		/* maybe should BUG_ON(!ret); - neilb */
 	}


@@ -1277,13 +1277,14 @@ int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
 	return err;
 }
 
-static void ubifs_invalidatepage(struct page *page, unsigned long offset)
+static void ubifs_invalidatepage(struct page *page, unsigned int offset,
+				 unsigned int length)
 {
 	struct inode *inode = page->mapping->host;
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 
 	ubifs_assert(PagePrivate(page));
-	if (offset)
+	if (offset || length < PAGE_CACHE_SIZE)
 		/* Partial page remains dirty */
 		return;


@@ -843,10 +843,12 @@ xfs_cluster_write(
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
-	unsigned long		offset)
+	unsigned int		offset,
+	unsigned int		length)
 {
-	trace_xfs_invalidatepage(page->mapping->host, page, offset);
-	block_invalidatepage(page, offset);
+	trace_xfs_invalidatepage(page->mapping->host, page, offset,
+				 length);
+	block_invalidatepage(page, offset, length);
 }
 
 /*
@@ -910,7 +912,7 @@ next_buffer:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out_invalidate:
-	xfs_vm_invalidatepage(page, 0);
+	xfs_vm_invalidatepage(page, 0, PAGE_CACHE_SIZE);
 	return;
 }
 
@@ -940,7 +942,7 @@ xfs_vm_writepage(
 	int			count = 0;
 	int			nonblocking = 0;
 
-	trace_xfs_writepage(inode, page, 0);
+	trace_xfs_writepage(inode, page, 0, 0);
 
 	ASSERT(page_has_buffers(page));
 
@@ -1171,7 +1173,7 @@ xfs_vm_releasepage(
 {
 	int			delalloc, unwritten;
 
-	trace_xfs_releasepage(page->mapping->host, page, 0);
+	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
 	xfs_count_page_state(page, &delalloc, &unwritten);


@@ -974,14 +974,16 @@ DEFINE_RW_EVENT(xfs_file_splice_read);
 DEFINE_RW_EVENT(xfs_file_splice_write);
 
 DECLARE_EVENT_CLASS(xfs_page_class,
-	TP_PROTO(struct inode *inode, struct page *page, unsigned long off),
-	TP_ARGS(inode, page, off),
+	TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+		 unsigned int len),
+	TP_ARGS(inode, page, off, len),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(pgoff_t, pgoff)
 		__field(loff_t, size)
 		__field(unsigned long, offset)
+		__field(unsigned int, length)
 		__field(int, delalloc)
 		__field(int, unwritten)
 	),
@@ -995,24 +997,27 @@ DECLARE_EVENT_CLASS(xfs_page_class,
 		__entry->pgoff = page_offset(page);
 		__entry->size = i_size_read(inode);
 		__entry->offset = off;
+		__entry->length = len;
 		__entry->delalloc = delalloc;
 		__entry->unwritten = unwritten;
 	),
 	TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
-		  "delalloc %d unwritten %d",
+		  "length %x delalloc %d unwritten %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->pgoff,
 		  __entry->size,
 		  __entry->offset,
+		  __entry->length,
 		  __entry->delalloc,
 		  __entry->unwritten)
 )
 
 #define DEFINE_PAGE_EVENT(name) \
 DEFINE_EVENT(xfs_page_class, name, \
-	TP_PROTO(struct inode *inode, struct page *page, unsigned long off), \
-	TP_ARGS(inode, page, off))
+	TP_PROTO(struct inode *inode, struct page *page, unsigned long off, \
+		 unsigned int len), \
+	TP_ARGS(inode, page, off, len))
 DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
 DEFINE_PAGE_EVENT(xfs_invalidatepage);


@@ -198,7 +198,8 @@ extern int buffer_heads_over_limit;
  * Generic address_space_operations implementations for buffer_head-backed
  * address_spaces.
  */
-void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage(struct page *page, unsigned int offset,
+			  unsigned int length);
 int block_write_full_page(struct page *page, get_block_t *get_block,
 			  struct writeback_control *wbc);
 int block_write_full_page_endio(struct page *page, get_block_t *get_block,


@@ -364,7 +364,7 @@ struct address_space_operations {
 	/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
 	sector_t (*bmap)(struct address_space *, sector_t);
-	void (*invalidatepage) (struct page *, unsigned long);
+	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
 	int (*releasepage) (struct page *, gfp_t);
 	void (*freepage)(struct page *);
 	ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
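
With the new ->invalidatepage prototype, implementations receive the byte range being invalidated rather than a bare offset, and a full-page invalidation is offset == 0 with length == PAGE_CACHE_SIZE. A minimal sketch for a hypothetical buffer_head-backed filesystem (examplefs is illustrative, not part of this series) simply forwards the range, as the xfs conversion above does:

	static void examplefs_invalidatepage(struct page *page,
					     unsigned int offset,
					     unsigned int length)
	{
		/*
		 * block_invalidatepage() drops only the buffers lying
		 * inside [offset, offset + length), so partial ranges
		 * are handled for us.
		 */
		block_invalidatepage(page, offset, length);
	}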


@@ -27,7 +27,6 @@
 #include <linux/buffer_head.h>
 #include <linux/journal-head.h>
 #include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
 #include <linux/mutex.h>
 #include <linux/timer.h>
 #include <linux/lockdep.h>
@@ -244,6 +243,31 @@ typedef struct journal_superblock_s
 #include <linux/fs.h>
 #include <linux/sched.h>
 
+enum jbd_state_bits {
+	BH_JBD			/* Has an attached ext3 journal_head */
+	  = BH_PrivateStart,
+	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
+	BH_Freed,		/* Has been freed (truncated) */
+	BH_Revoked,		/* Has been revoked from the log */
+	BH_RevokeValid,		/* Revoked flag is valid */
+	BH_JBDDirty,		/* Is dirty but journaled */
+	BH_State,		/* Pins most journal_head state */
+	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
+	BH_Unshadow,		/* Dummy bit, for BJ_Shadow wakeup filtering */
+	BH_JBDPrivateStart,	/* First bit available for private use by FS */
+};
+
+BUFFER_FNS(JBD, jbd)
+BUFFER_FNS(JWrite, jwrite)
+BUFFER_FNS(JBDDirty, jbddirty)
+TAS_BUFFER_FNS(JBDDirty, jbddirty)
+BUFFER_FNS(Revoked, revoked)
+TAS_BUFFER_FNS(Revoked, revoked)
+BUFFER_FNS(RevokeValid, revokevalid)
+TAS_BUFFER_FNS(RevokeValid, revokevalid)
+BUFFER_FNS(Freed, freed)
+
 #include <linux/jbd_common.h>
 
 #define J_ASSERT(assert)	BUG_ON(!(assert))
@@ -840,7 +864,7 @@ extern void journal_release_buffer (handle_t *, struct buffer_head *);
 extern int	 journal_forget (handle_t *, struct buffer_head *);
 extern void	 journal_sync_buffer (struct buffer_head *);
 extern void	 journal_invalidatepage(journal_t *,
-				struct page *, unsigned long);
+				struct page *, unsigned int, unsigned int);
 extern int	 journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int	 journal_stop(handle_t *);
 extern int	 journal_flush (journal_t *);


@@ -26,7 +26,6 @@
 #include <linux/buffer_head.h>
 #include <linux/journal-head.h>
 #include <linux/stddef.h>
-#include <linux/bit_spinlock.h>
 #include <linux/mutex.h>
 #include <linux/timer.h>
 #include <linux/slab.h>
@@ -57,17 +56,13 @@
  */
 #define JBD2_EXPENSIVE_CHECKING
 extern ushort jbd2_journal_enable_debug;
+void __jbd2_debug(int level, const char *file, const char *func,
+		  unsigned int line, const char *fmt, ...);
 
-#define jbd_debug(n, f, a...)						\
-	do {								\
-		if ((n) <= jbd2_journal_enable_debug) {			\
-			printk (KERN_DEBUG "(%s, %d): %s: ",		\
-				__FILE__, __LINE__, __func__);		\
-			printk (f, ## a);				\
-		}							\
-	} while (0)
+#define jbd_debug(n, fmt, a...) \
+	__jbd2_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
 #else
-#define jbd_debug(f, a...)	/**/
+#define jbd_debug(n, fmt, a...)	/**/
 #endif
 
 extern void *jbd2_alloc(size_t size, gfp_t flags);
@@ -302,6 +297,34 @@ typedef struct journal_superblock_s
 #include <linux/fs.h>
 #include <linux/sched.h>
 
+enum jbd_state_bits {
+	BH_JBD			/* Has an attached ext3 journal_head */
+	  = BH_PrivateStart,
+	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
+	BH_Freed,		/* Has been freed (truncated) */
+	BH_Revoked,		/* Has been revoked from the log */
+	BH_RevokeValid,		/* Revoked flag is valid */
+	BH_JBDDirty,		/* Is dirty but journaled */
+	BH_State,		/* Pins most journal_head state */
+	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
+	BH_Shadow,		/* IO on shadow buffer is running */
+	BH_Verified,		/* Metadata block has been verified ok */
+	BH_JBDPrivateStart,	/* First bit available for private use by FS */
+};
+
+BUFFER_FNS(JBD, jbd)
+BUFFER_FNS(JWrite, jwrite)
+BUFFER_FNS(JBDDirty, jbddirty)
+TAS_BUFFER_FNS(JBDDirty, jbddirty)
+BUFFER_FNS(Revoked, revoked)
+TAS_BUFFER_FNS(Revoked, revoked)
+BUFFER_FNS(RevokeValid, revokevalid)
+TAS_BUFFER_FNS(RevokeValid, revokevalid)
+BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Shadow, shadow)
+BUFFER_FNS(Verified, verified)
+
 #include <linux/jbd_common.h>
 
 #define J_ASSERT(assert)	BUG_ON(!(assert))
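
For reference, BUFFER_FNS() is the stock helper generator from <linux/buffer_head.h>, so a line such as BUFFER_FNS(Shadow, shadow) above produces roughly the following inline accessors (this expansion is reproduced here for illustration and is not itself part of the patch):

	static inline void set_buffer_shadow(struct buffer_head *bh)
	{
		set_bit(BH_Shadow, &bh->b_state);
	}
	static inline void clear_buffer_shadow(struct buffer_head *bh)
	{
		clear_bit(BH_Shadow, &bh->b_state);
	}
	static inline int buffer_shadow(const struct buffer_head *bh)
	{
		return test_bit(BH_Shadow, &bh->b_state);
	}

This is what lets fs/jbd2/transaction.c test buffer_shadow(bh) and wait on BH_Shadow directly instead of scanning the BJ_Shadow list.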
@ -382,8 +405,15 @@ struct jbd2_revoke_table_s;
struct jbd2_journal_handle struct jbd2_journal_handle
{ {
/* Which compound transaction is this update a part of? */ union {
transaction_t *h_transaction; /* Which compound transaction is this update a part of? */
transaction_t *h_transaction;
/* Which journal handle belongs to - used iff h_reserved set */
journal_t *h_journal;
};
/* Handle reserved for finishing the logical operation */
handle_t *h_rsv_handle;
/* Number of remaining buffers we are allowed to dirty: */ /* Number of remaining buffers we are allowed to dirty: */
int h_buffer_credits; int h_buffer_credits;
@ -398,6 +428,7 @@ struct jbd2_journal_handle
/* Flags [no locking] */ /* Flags [no locking] */
unsigned int h_sync: 1; /* sync-on-close */ unsigned int h_sync: 1; /* sync-on-close */
unsigned int h_jdata: 1; /* force data journaling */ unsigned int h_jdata: 1; /* force data journaling */
unsigned int h_reserved: 1; /* handle with reserved credits */
unsigned int h_aborted: 1; /* fatal error on handle */ unsigned int h_aborted: 1; /* fatal error on handle */
unsigned int h_type: 8; /* for handle statistics */ unsigned int h_type: 8; /* for handle statistics */
unsigned int h_line_no: 16; /* for handle statistics */ unsigned int h_line_no: 16; /* for handle statistics */
@ -523,12 +554,6 @@ struct transaction_s
*/ */
struct journal_head *t_checkpoint_io_list; struct journal_head *t_checkpoint_io_list;
/*
* Doubly-linked circular list of temporary buffers currently undergoing
* IO in the log [j_list_lock]
*/
struct journal_head *t_iobuf_list;
/* /*
* Doubly-linked circular list of metadata buffers being shadowed by log * Doubly-linked circular list of metadata buffers being shadowed by log
* IO. The IO buffers on the iobuf list and the shadow buffers on this * IO. The IO buffers on the iobuf list and the shadow buffers on this
@ -536,12 +561,6 @@ struct transaction_s
*/ */
struct journal_head *t_shadow_list; struct journal_head *t_shadow_list;
/*
* Doubly-linked circular list of control buffers being written to the
* log. [j_list_lock]
*/
struct journal_head *t_log_list;
/* /*
* List of inodes whose data we've modified in data=ordered mode. * List of inodes whose data we've modified in data=ordered mode.
* [j_list_lock] * [j_list_lock]
@ -671,11 +690,10 @@ jbd2_time_diff(unsigned long start, unsigned long end)
* waiting for checkpointing * waiting for checkpointing
* @j_wait_transaction_locked: Wait queue for waiting for a locked transaction * @j_wait_transaction_locked: Wait queue for waiting for a locked transaction
* to start committing, or for a barrier lock to be released * to start committing, or for a barrier lock to be released
* @j_wait_logspace: Wait queue for waiting for checkpointing to complete
* @j_wait_done_commit: Wait queue for waiting for commit to complete * @j_wait_done_commit: Wait queue for waiting for commit to complete
* @j_wait_checkpoint: Wait queue to trigger checkpointing
* @j_wait_commit: Wait queue to trigger commit * @j_wait_commit: Wait queue to trigger commit
* @j_wait_updates: Wait queue to wait for updates to complete * @j_wait_updates: Wait queue to wait for updates to complete
* @j_wait_reserved: Wait queue to wait for reserved buffer credits to drop
* @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
* @j_head: Journal head - identifies the first unused block in the journal * @j_head: Journal head - identifies the first unused block in the journal
* @j_tail: Journal tail - identifies the oldest still-used block in the * @j_tail: Journal tail - identifies the oldest still-used block in the
@ -689,6 +707,7 @@ jbd2_time_diff(unsigned long start, unsigned long end)
* journal * journal
* @j_fs_dev: Device which holds the client fs. For internal journal this will * @j_fs_dev: Device which holds the client fs. For internal journal this will
* be equal to j_dev * be equal to j_dev
* @j_reserved_credits: Number of buffers reserved from the running transaction
* @j_maxlen: Total maximum capacity of the journal region on disk. * @j_maxlen: Total maximum capacity of the journal region on disk.
* @j_list_lock: Protects the buffer lists and internal buffer state. * @j_list_lock: Protects the buffer lists and internal buffer state.
* @j_inode: Optional inode where we store the journal. If present, all journal * @j_inode: Optional inode where we store the journal. If present, all journal
@@ -778,21 +797,18 @@ struct journal_s
	 */
	wait_queue_head_t	j_wait_transaction_locked;

-	/* Wait queue for waiting for checkpointing to complete */
-	wait_queue_head_t	j_wait_logspace;
-
	/* Wait queue for waiting for commit to complete */
	wait_queue_head_t	j_wait_done_commit;

-	/* Wait queue to trigger checkpointing */
-	wait_queue_head_t	j_wait_checkpoint;
-
	/* Wait queue to trigger commit */
	wait_queue_head_t	j_wait_commit;

	/* Wait queue to wait for updates to complete */
	wait_queue_head_t	j_wait_updates;

+	/* Wait queue to wait for reserved buffer credits to drop */
+	wait_queue_head_t	j_wait_reserved;
+
	/* Semaphore for locking against concurrent checkpoints */
	struct mutex		j_checkpoint_mutex;
@@ -847,6 +863,9 @@ struct journal_s
	/* Total maximum capacity of the journal region on disk. */
	unsigned int		j_maxlen;

+	/* Number of buffers reserved from the running transaction */
+	atomic_t		j_reserved_credits;
+
	/*
	 * Protects the buffer lists and internal buffer state.
	 */
@@ -991,9 +1010,17 @@ extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
 extern void __journal_free_buffer(struct journal_head *bh);
 extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
 extern void __journal_clean_data_list(transaction_t *transaction);
+static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh)
+{
+	list_add_tail(&bh->b_assoc_buffers, head);
+}
+
+static inline void jbd2_unfile_log_bh(struct buffer_head *bh)
+{
+	list_del_init(&bh->b_assoc_buffers);
+}

 /* Log buffer allocation */
-extern struct journal_head * jbd2_journal_get_descriptor_buffer(journal_t *);
+struct buffer_head *jbd2_journal_get_descriptor_buffer(journal_t *journal);
 int jbd2_journal_next_log_block(journal_t *, unsigned long long *);
 int jbd2_journal_get_log_tail(journal_t *journal, tid_t *tid,
			      unsigned long *block);
@@ -1039,11 +1066,10 @@ extern void jbd2_buffer_abort_trigger(struct journal_head *jh,
				      struct jbd2_buffer_trigger_type *triggers);

 /* Buffer IO */
-extern int
-jbd2_journal_write_metadata_buffer(transaction_t *transaction,
-				   struct journal_head *jh_in,
-				   struct journal_head **jh_out,
-				   unsigned long long blocknr);
+extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
+					      struct journal_head *jh_in,
+					      struct buffer_head **bh_out,
+					      sector_t blocknr);

 /* Transaction locking */
 extern void	__wait_on_journal (journal_t *);
@@ -1076,10 +1102,14 @@ static inline handle_t *journal_current_handle(void)
 */

 extern handle_t *jbd2_journal_start(journal_t *, int nblocks);
-extern handle_t *jbd2__journal_start(journal_t *, int nblocks, gfp_t gfp_mask,
-				     unsigned int type, unsigned int line_no);
+extern handle_t *jbd2__journal_start(journal_t *, int blocks, int rsv_blocks,
+				     gfp_t gfp_mask, unsigned int type,
+				     unsigned int line_no);
 extern int	jbd2_journal_restart(handle_t *, int nblocks);
 extern int	jbd2__journal_restart(handle_t *, int nblocks, gfp_t gfp_mask);
+extern int	jbd2_journal_start_reserved(handle_t *handle,
+				unsigned int type, unsigned int line_no);
+extern void	jbd2_journal_free_reserved(handle_t *handle);
 extern int	jbd2_journal_extend (handle_t *, int nblocks);
 extern int	jbd2_journal_get_write_access(handle_t *, struct buffer_head *);
 extern int	jbd2_journal_get_create_access (handle_t *, struct buffer_head *);
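
A minimal sketch of how a caller might use the reserved-handle API
declared above. The h_rsv_handle field and the shape of the error
handling are my assumptions drawn from these declarations, not a quote
from an in-tree caller:

	handle_t *handle, *rsv;
	int err;

	/* 8 regular credits now, 4 more credits reserved for later */
	handle = jbd2__journal_start(journal, 8, 4, GFP_NOFS, 0, 0);
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	rsv = handle->h_rsv_handle;	/* assumed: reserved sub-handle */

	/* ... do the normally-credited work, then stop the handle ... */
	err = jbd2_journal_stop(handle);

	/* later: bind the reserved credits to a running transaction */
	err = jbd2_journal_start_reserved(rsv, 0, 0);
	if (err)
		jbd2_journal_free_reserved(rsv);
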
@@ -1090,7 +1120,7 @@ extern int	jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
 extern int	jbd2_journal_forget (handle_t *, struct buffer_head *);
 extern void	journal_sync_buffer (struct buffer_head *);
 extern int	jbd2_journal_invalidatepage(journal_t *,
-				struct page *, unsigned long);
+				struct page *, unsigned int, unsigned int);
 extern int	jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
 extern int	jbd2_journal_stop(handle_t *);
 extern int	jbd2_journal_flush (journal_t *);
@@ -1125,6 +1155,7 @@ extern void	jbd2_journal_ack_err (journal_t *);
 extern int	jbd2_journal_clear_err (journal_t *);
 extern int	jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *);
 extern int	jbd2_journal_force_commit(journal_t *);
+extern int	jbd2_journal_force_commit_nested(journal_t *);
 extern int	jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode);
 extern int	jbd2_journal_begin_ordered_truncate(journal_t *journal,
				struct jbd2_inode *inode, loff_t new_size);
@@ -1178,8 +1209,10 @@ extern int	jbd2_journal_init_revoke_caches(void);
 extern void	jbd2_journal_destroy_revoke(journal_t *);
 extern int	jbd2_journal_revoke (handle_t *, unsigned long long, struct buffer_head *);
 extern int	jbd2_journal_cancel_revoke(handle_t *, struct journal_head *);
-extern void	jbd2_journal_write_revoke_records(journal_t *,
-						  transaction_t *, int);
+extern void	jbd2_journal_write_revoke_records(journal_t *journal,
+						  transaction_t *transaction,
+						  struct list_head *log_bufs,
+						  int write_op);

 /* Recovery revoke support */
 extern int	jbd2_journal_set_revoke(journal_t *, unsigned long long, tid_t);
@@ -1195,11 +1228,9 @@ extern void jbd2_clear_buffer_revoked_flags(journal_t *journal);
 * transitions on demand.
 */

-int __jbd2_log_space_left(journal_t *); /* Called with journal locked */
 int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int __jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
-int jbd2_journal_force_commit_nested(journal_t *journal);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
@@ -1235,7 +1266,7 @@ static inline int is_journal_aborted(journal_t *journal)

 static inline int is_handle_aborted(handle_t *handle)
 {
-	if (handle->h_aborted)
+	if (handle->h_aborted || !handle->h_transaction)
		return 1;
	return is_journal_aborted(handle->h_transaction->t_journal);
 }
@@ -1265,17 +1296,38 @@ static inline int tid_geq(tid_t x, tid_t y)
 extern int jbd2_journal_blocks_per_page(struct inode *inode);
 extern size_t journal_tag_bytes(journal_t *journal);

+/*
+ * We reserve t_outstanding_credits >> JBD2_CONTROL_BLOCKS_SHIFT for
+ * transaction control blocks.
+ */
+#define JBD2_CONTROL_BLOCKS_SHIFT 5
+
 /*
 * Return the minimum number of blocks which must be free in the journal
 * before a new transaction may be started. Must be called under j_state_lock.
 */
-static inline int jbd_space_needed(journal_t *journal)
+static inline int jbd2_space_needed(journal_t *journal)
 {
	int nblocks = journal->j_max_transaction_buffers;
-	if (journal->j_committing_transaction)
-		nblocks += atomic_read(&journal->j_committing_transaction->
-				       t_outstanding_credits);
-	return nblocks;
+
+	return nblocks + (nblocks >> JBD2_CONTROL_BLOCKS_SHIFT);
+}
+
+/*
+ * Return number of free blocks in the log. Must be called under j_state_lock.
+ */
+static inline unsigned long jbd2_log_space_left(journal_t *journal)
+{
+	/* Allow for rounding errors */
+	unsigned long free = journal->j_free - 32;
+
+	if (journal->j_committing_transaction) {
+		unsigned long committing = atomic_read(&journal->
+			j_committing_transaction->t_outstanding_credits);
+
+		/* Transaction + control blocks */
+		free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT);
+	}
+	return free;
 }
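
Taken together, the two inline helpers above give the admission check a
transaction start can make. A minimal sketch of that comparison, with
the surrounding locking and retry logic omitted and the caller assumed
to hold j_state_lock as the comments require:

	if (jbd2_log_space_left(journal) < jbd2_space_needed(journal)) {
		/* not enough room in the log: force a commit or a
		 * checkpoint before letting the transaction start */
	}
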
@@ -1286,11 +1338,9 @@ static inline int jbd_space_needed(journal_t *journal)
 /*
 #define BJ_None		0	/* Not journaled */
 #define BJ_Metadata	1	/* Normal journaled metadata */
 #define BJ_Forget	2	/* Buffer superseded by this transaction */
-#define BJ_IO		3	/* Buffer is for temporary IO use */
-#define BJ_Shadow	4	/* Buffer contents being shadowed to the log */
-#define BJ_LogCtl	5	/* Buffer contains log descriptors */
-#define BJ_Reserved	6	/* Buffer is reserved for access by journal */
-#define BJ_Types	7
+#define BJ_Shadow	3	/* Buffer contents being shadowed to the log */
+#define BJ_Reserved	4	/* Buffer is reserved for access by journal */
+#define BJ_Types	5

 extern int jbd_blocks_per_page(struct inode *inode);
@@ -1319,6 +1369,19 @@ static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
	return *(u32 *)desc.ctx;
 }

+/* Return most recent uncommitted transaction */
+static inline tid_t jbd2_get_latest_transaction(journal_t *journal)
+{
+	tid_t tid;
+
+	read_lock(&journal->j_state_lock);
+	tid = journal->j_commit_request;
+	if (journal->j_running_transaction)
+		tid = journal->j_running_transaction->t_tid;
+	read_unlock(&journal->j_state_lock);
+	return tid;
+}
+
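
A short sketch of the intended use of the new helper: grab the tid of
the most recent uncommitted transaction, then commit and wait on it.
The pairing with jbd2_log_start_commit()/jbd2_log_wait_commit(), both
declared earlier in this header, is my illustration rather than a
quote from a call site:

	tid_t tid = jbd2_get_latest_transaction(journal);

	jbd2_log_start_commit(journal, tid);
	jbd2_log_wait_commit(journal, tid);
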
 #ifdef __KERNEL__

 #define buffer_trace_init(bh)	do {} while (0)

----- (next file) -----

@@ -1,31 +1,7 @@
 #ifndef _LINUX_JBD_STATE_H
 #define _LINUX_JBD_STATE_H

-enum jbd_state_bits {
-	BH_JBD			/* Has an attached ext3 journal_head */
-	  = BH_PrivateStart,
-	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
-	BH_Freed,		/* Has been freed (truncated) */
-	BH_Revoked,		/* Has been revoked from the log */
-	BH_RevokeValid,		/* Revoked flag is valid */
-	BH_JBDDirty,		/* Is dirty but journaled */
-	BH_State,		/* Pins most journal_head state */
-	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
-	BH_Unshadow,		/* Dummy bit, for BJ_Shadow wakeup filtering */
-	BH_Verified,		/* Metadata block has been verified ok */
-	BH_JBDPrivateStart,	/* First bit available for private use by FS */
-};
-
-BUFFER_FNS(JBD, jbd)
-BUFFER_FNS(JWrite, jwrite)
-BUFFER_FNS(JBDDirty, jbddirty)
-TAS_BUFFER_FNS(JBDDirty, jbddirty)
-BUFFER_FNS(Revoked, revoked)
-TAS_BUFFER_FNS(Revoked, revoked)
-BUFFER_FNS(RevokeValid, revokevalid)
-TAS_BUFFER_FNS(RevokeValid, revokevalid)
-BUFFER_FNS(Freed, freed)
-BUFFER_FNS(Verified, verified)
+#include <linux/bit_spinlock.h>

 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {

----- (next file) -----

@@ -1041,7 +1041,8 @@ int get_kernel_page(unsigned long start, int write, struct page **pages);
 struct page *get_dump_page(unsigned long addr);

 extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
-extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage(struct page *page, unsigned int offset,
+			      unsigned int length);

 int __set_page_dirty_nobuffers(struct page *page);
 int __set_page_dirty_no_writeback(struct page *page);
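
A minimal sketch of what the widened interface expresses, assuming a
4096-byte PAGE_CACHE_SIZE; only the helper itself comes from this
series, the call values are mine:

	/* invalidate the whole page - the old single-offset behaviour */
	do_invalidatepage(page, 0, PAGE_CACHE_SIZE);

	/* invalidate only bytes 512..1023 within the page */
	do_invalidatepage(page, 512, 512);
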

----- (next file) -----

@@ -290,13 +290,14 @@ DEFINE_EVENT(ext3__page_op, ext3_releasepage,
 );

 TRACE_EVENT(ext3_invalidatepage,
-	TP_PROTO(struct page *page, unsigned long offset),
+	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

-	TP_ARGS(page, offset),
+	TP_ARGS(page, offset, length),

	TP_STRUCT__entry(
		__field(	pgoff_t, index			)
-		__field(	unsigned long, offset		)
+		__field(	unsigned int, offset		)
+		__field(	unsigned int, length		)
		__field(	ino_t,	ino			)
		__field(	dev_t,	dev			)

@@ -305,14 +306,15 @@ TRACE_EVENT(ext3_invalidatepage,
	TP_fast_assign(
		__entry->index	= page->index;
		__entry->offset	= offset;
+		__entry->length	= length;
		__entry->ino	= page->mapping->host->i_ino;
		__entry->dev	= page->mapping->host->i_sb->s_dev;
	),

-	TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+	TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
-		  __entry->index, __entry->offset)
+		  __entry->index, __entry->offset, __entry->length)
 );

 TRACE_EVENT(ext3_discard_blocks,

----- (next file) -----

@@ -19,6 +19,57 @@ struct extent_status;

 #define EXT4_I(inode) (container_of(inode, struct ext4_inode_info, vfs_inode))

+#define show_mballoc_flags(flags) __print_flags(flags, "|",	\
+	{ EXT4_MB_HINT_MERGE,		"HINT_MERGE" },		\
+	{ EXT4_MB_HINT_RESERVED,	"HINT_RESV" },		\
+	{ EXT4_MB_HINT_METADATA,	"HINT_MDATA" },		\
+	{ EXT4_MB_HINT_FIRST,		"HINT_FIRST" },		\
+	{ EXT4_MB_HINT_BEST,		"HINT_BEST" },		\
+	{ EXT4_MB_HINT_DATA,		"HINT_DATA" },		\
+	{ EXT4_MB_HINT_NOPREALLOC,	"HINT_NOPREALLOC" },	\
+	{ EXT4_MB_HINT_GROUP_ALLOC,	"HINT_GRP_ALLOC" },	\
+	{ EXT4_MB_HINT_GOAL_ONLY,	"HINT_GOAL_ONLY" },	\
+	{ EXT4_MB_HINT_TRY_GOAL,	"HINT_TRY_GOAL" },	\
+	{ EXT4_MB_DELALLOC_RESERVED,	"DELALLOC_RESV" },	\
+	{ EXT4_MB_STREAM_ALLOC,		"STREAM_ALLOC" },	\
+	{ EXT4_MB_USE_ROOT_BLOCKS,	"USE_ROOT_BLKS" },	\
+	{ EXT4_MB_USE_RESERVED,		"USE_RESV" })
+
+#define show_map_flags(flags) __print_flags(flags, "|",			\
+	{ EXT4_GET_BLOCKS_CREATE,		"CREATE" },		\
+	{ EXT4_GET_BLOCKS_UNINIT_EXT,		"UNINIT" },		\
+	{ EXT4_GET_BLOCKS_DELALLOC_RESERVE,	"DELALLOC" },		\
+	{ EXT4_GET_BLOCKS_PRE_IO,		"PRE_IO" },		\
+	{ EXT4_GET_BLOCKS_CONVERT,		"CONVERT" },		\
+	{ EXT4_GET_BLOCKS_METADATA_NOFAIL,	"METADATA_NOFAIL" },	\
+	{ EXT4_GET_BLOCKS_NO_NORMALIZE,		"NO_NORMALIZE" },	\
+	{ EXT4_GET_BLOCKS_KEEP_SIZE,		"KEEP_SIZE" },		\
+	{ EXT4_GET_BLOCKS_NO_LOCK,		"NO_LOCK" },		\
+	{ EXT4_GET_BLOCKS_NO_PUT_HOLE,		"NO_PUT_HOLE" })
+
+#define show_mflags(flags) __print_flags(flags, "",	\
+	{ EXT4_MAP_NEW,		"N" },			\
+	{ EXT4_MAP_MAPPED,	"M" },			\
+	{ EXT4_MAP_UNWRITTEN,	"U" },			\
+	{ EXT4_MAP_BOUNDARY,	"B" },			\
+	{ EXT4_MAP_UNINIT,	"u" },			\
+	{ EXT4_MAP_FROM_CLUSTER, "C" })
+
+#define show_free_flags(flags) __print_flags(flags, "|",	\
+	{ EXT4_FREE_BLOCKS_METADATA,		"METADATA" },	\
+	{ EXT4_FREE_BLOCKS_FORGET,		"FORGET" },	\
+	{ EXT4_FREE_BLOCKS_VALIDATED,		"VALIDATED" },	\
+	{ EXT4_FREE_BLOCKS_NO_QUOT_UPDATE,	"NO_QUOTA" },	\
+	{ EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER,"1ST_CLUSTER" },\
+	{ EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER,	"LAST_CLUSTER" })
+
+#define show_extent_status(status) __print_flags(status, "",	\
+	{ (1 << 3),	"W" },					\
+	{ (1 << 2),	"U" },					\
+	{ (1 << 1),	"D" },					\
+	{ (1 << 0),	"H" })
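
To make the new decoding concrete, here is a standalone rendition of
what show_extent_status() effectively prints for the four status bits.
The kernel does this inside the tracing machinery via __print_flags;
this userspace sketch, including the letter meanings, is only an
illustration:

	#include <stdio.h>

	/* print the status nibble the way show_extent_status() renders it */
	static void print_extent_status(unsigned int status)
	{
		if (status & (1 << 3)) putchar('W');	/* written */
		if (status & (1 << 2)) putchar('U');	/* unwritten */
		if (status & (1 << 1)) putchar('D');	/* delayed */
		if (status & (1 << 0)) putchar('H');	/* hole */
		putchar('\n');
	}
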
 TRACE_EVENT(ext4_free_inode,
	TP_PROTO(struct inode *inode),
@@ -281,7 +332,7 @@ DEFINE_EVENT(ext4__write_end, ext4_da_write_end,
	TP_ARGS(inode, pos, len, copied)
 );

-TRACE_EVENT(ext4_da_writepages,
+TRACE_EVENT(ext4_writepages,
	TP_PROTO(struct inode *inode, struct writeback_control *wbc),

	TP_ARGS(inode, wbc),
@@ -324,46 +375,62 @@ TRACE_EVENT(ext4_writepages,
 );

 TRACE_EVENT(ext4_da_write_pages,
-	TP_PROTO(struct inode *inode, struct mpage_da_data *mpd),
+	TP_PROTO(struct inode *inode, pgoff_t first_page,
+		 struct writeback_control *wbc),

-	TP_ARGS(inode, mpd),
+	TP_ARGS(inode, first_page, wbc),

	TP_STRUCT__entry(
		__field(	dev_t,	dev			)
		__field(	ino_t,	ino			)
-		__field(	__u64,	b_blocknr		)
-		__field(	__u32,	b_size			)
-		__field(	__u32,	b_state			)
-		__field(	unsigned long,	first_page	)
-		__field(	int,	io_done			)
-		__field(	int,	pages_written		)
-		__field(	int,	sync_mode		)
+		__field(      pgoff_t,	first_page		)
+		__field(	 long,	nr_to_write		)
+		__field(	  int,	sync_mode		)
	),

	TP_fast_assign(
		__entry->dev		= inode->i_sb->s_dev;
		__entry->ino		= inode->i_ino;
-		__entry->b_blocknr	= mpd->b_blocknr;
-		__entry->b_size		= mpd->b_size;
-		__entry->b_state	= mpd->b_state;
-		__entry->first_page	= mpd->first_page;
-		__entry->io_done	= mpd->io_done;
-		__entry->pages_written	= mpd->pages_written;
-		__entry->sync_mode	= mpd->wbc->sync_mode;
+		__entry->first_page	= first_page;
+		__entry->nr_to_write	= wbc->nr_to_write;
+		__entry->sync_mode	= wbc->sync_mode;
	),

-	TP_printk("dev %d,%d ino %lu b_blocknr %llu b_size %u b_state 0x%04x "
-		  "first_page %lu io_done %d pages_written %d sync_mode %d",
+	TP_printk("dev %d,%d ino %lu first_page %lu nr_to_write %ld "
+		  "sync_mode %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->b_blocknr, __entry->b_size,
-		  __entry->b_state, __entry->first_page,
-		  __entry->io_done, __entry->pages_written,
-		  __entry->sync_mode
-		  )
+		  (unsigned long) __entry->ino, __entry->first_page,
+		  __entry->nr_to_write, __entry->sync_mode)
 );

-TRACE_EVENT(ext4_da_writepages_result,
+TRACE_EVENT(ext4_da_write_pages_extent,
+	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map),
+
+	TP_ARGS(inode, map),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	__u64,	lblk			)
+		__field(	__u32,	len			)
+		__field(	__u32,	flags			)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= inode->i_sb->s_dev;
+		__entry->ino		= inode->i_ino;
+		__entry->lblk		= map->m_lblk;
+		__entry->len		= map->m_len;
+		__entry->flags		= map->m_flags;
+	),
+
+	TP_printk("dev %d,%d ino %lu lblk %llu len %u flags %s",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino, __entry->lblk, __entry->len,
+		  show_mflags(__entry->flags))
+);
+
+TRACE_EVENT(ext4_writepages_result,
	TP_PROTO(struct inode *inode, struct writeback_control *wbc,
		 int ret, int pages_written),
@@ -444,16 +511,16 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
 );

 DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
-	TP_PROTO(struct page *page, unsigned long offset),
+	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

-	TP_ARGS(page, offset),
+	TP_ARGS(page, offset, length),

	TP_STRUCT__entry(
		__field(	dev_t,	dev			)
		__field(	ino_t,	ino			)
		__field(	pgoff_t, index			)
-		__field(	unsigned long, offset		)
+		__field(	unsigned int, offset		)
+		__field(	unsigned int, length		)
	),

	TP_fast_assign(
@@ -461,24 +528,26 @@ DECLARE_EVENT_CLASS(ext4_invalidatepage_op,
		__entry->ino	= page->mapping->host->i_ino;
		__entry->index	= page->index;
		__entry->offset	= offset;
+		__entry->length	= length;
	),

-	TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+	TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
-		  (unsigned long) __entry->index, __entry->offset)
+		  (unsigned long) __entry->index,
+		  __entry->offset, __entry->length)
 );

 DEFINE_EVENT(ext4_invalidatepage_op, ext4_invalidatepage,
-	TP_PROTO(struct page *page, unsigned long offset),
+	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

-	TP_ARGS(page, offset)
+	TP_ARGS(page, offset, length)
 );

 DEFINE_EVENT(ext4_invalidatepage_op, ext4_journalled_invalidatepage,
-	TP_PROTO(struct page *page, unsigned long offset),
+	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

-	TP_ARGS(page, offset)
+	TP_ARGS(page, offset, length)
 );

 TRACE_EVENT(ext4_discard_blocks,
@@ -673,10 +742,10 @@ TRACE_EVENT(ext4_request_blocks,
		__entry->flags	= ar->flags;
	),

-	TP_printk("dev %d,%d ino %lu flags %u len %u lblk %u goal %llu "
+	TP_printk("dev %d,%d ino %lu flags %s len %u lblk %u goal %llu "
		  "lleft %u lright %u pleft %llu pright %llu ",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->flags,
+		  (unsigned long) __entry->ino, show_mballoc_flags(__entry->flags),
		  __entry->len, __entry->logical, __entry->goal,
		  __entry->lleft, __entry->lright, __entry->pleft,
		  __entry->pright)
@@ -715,10 +784,10 @@ TRACE_EVENT(ext4_allocate_blocks,
		__entry->flags	= ar->flags;
	),

-	TP_printk("dev %d,%d ino %lu flags %u len %u block %llu lblk %u "
+	TP_printk("dev %d,%d ino %lu flags %s len %u block %llu lblk %u "
		  "goal %llu lleft %u lright %u pleft %llu pright %llu",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->flags,
+		  (unsigned long) __entry->ino, show_mballoc_flags(__entry->flags),
		  __entry->len, __entry->block, __entry->logical,
		  __entry->goal, __entry->lleft, __entry->lright,
		  __entry->pleft, __entry->pright)
@@ -748,11 +817,11 @@ TRACE_EVENT(ext4_free_blocks,
		__entry->mode		= inode->i_mode;
	),

-	TP_printk("dev %d,%d ino %lu mode 0%o block %llu count %lu flags %d",
+	TP_printk("dev %d,%d ino %lu mode 0%o block %llu count %lu flags %s",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  __entry->mode, __entry->block, __entry->count,
-		  __entry->flags)
+		  show_free_flags(__entry->flags))
 );

 TRACE_EVENT(ext4_sync_file_enter,
@@ -903,7 +972,7 @@ TRACE_EVENT(ext4_mballoc_alloc,
	),

	TP_printk("dev %d,%d inode %lu orig %u/%d/%u@%u goal %u/%d/%u@%u "
-		  "result %u/%d/%u@%u blks %u grps %u cr %u flags 0x%04x "
+		  "result %u/%d/%u@%u blks %u grps %u cr %u flags %s "
		  "tail %u broken %u",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
@@ -914,7 +983,7 @@ TRACE_EVENT(ext4_mballoc_alloc,
		  __entry->result_group, __entry->result_start,
		  __entry->result_len, __entry->result_logical,
		  __entry->found, __entry->groups, __entry->cr,
-		  __entry->flags, __entry->tail,
+		  show_mballoc_flags(__entry->flags), __entry->tail,
		  __entry->buddy ? 1 << __entry->buddy : 0)
 );
@@ -1528,10 +1597,10 @@ DECLARE_EVENT_CLASS(ext4__map_blocks_enter,
		__entry->flags	= flags;
	),

-	TP_printk("dev %d,%d ino %lu lblk %u len %u flags %u",
+	TP_printk("dev %d,%d ino %lu lblk %u len %u flags %s",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
-		  __entry->lblk, __entry->len, __entry->flags)
+		  __entry->lblk, __entry->len, show_map_flags(__entry->flags))
 );

 DEFINE_EVENT(ext4__map_blocks_enter, ext4_ext_map_blocks_enter,
@@ -1549,47 +1618,53 @@ DEFINE_EVENT(ext4__map_blocks_enter, ext4_ind_map_blocks_enter,
 );

 DECLARE_EVENT_CLASS(ext4__map_blocks_exit,
-	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, int ret),
+	TP_PROTO(struct inode *inode, unsigned flags, struct ext4_map_blocks *map,
+		 int ret),

-	TP_ARGS(inode, map, ret),
+	TP_ARGS(inode, flags, map, ret),

	TP_STRUCT__entry(
		__field(	dev_t,	dev			)
		__field(	ino_t,	ino			)
+		__field(	unsigned int,	flags		)
		__field(	ext4_fsblk_t,	pblk		)
		__field(	ext4_lblk_t,	lblk		)
		__field(	unsigned int,	len		)
-		__field(	unsigned int,	flags		)
+		__field(	unsigned int,	mflags		)
		__field(	int,	ret			)
	),

	TP_fast_assign(
		__entry->dev	= inode->i_sb->s_dev;
		__entry->ino	= inode->i_ino;
+		__entry->flags	= flags;
		__entry->pblk	= map->m_pblk;
		__entry->lblk	= map->m_lblk;
		__entry->len	= map->m_len;
-		__entry->flags	= map->m_flags;
+		__entry->mflags	= map->m_flags;
		__entry->ret	= ret;
	),

-	TP_printk("dev %d,%d ino %lu lblk %u pblk %llu len %u flags %x ret %d",
+	TP_printk("dev %d,%d ino %lu flags %s lblk %u pblk %llu len %u "
+		  "mflags %s ret %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
-		  __entry->lblk, __entry->pblk,
-		  __entry->len, __entry->flags, __entry->ret)
+		  show_map_flags(__entry->flags), __entry->lblk, __entry->pblk,
+		  __entry->len, show_mflags(__entry->mflags), __entry->ret)
 );

 DEFINE_EVENT(ext4__map_blocks_exit, ext4_ext_map_blocks_exit,
-	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, int ret),
+	TP_PROTO(struct inode *inode, unsigned flags,
+		 struct ext4_map_blocks *map, int ret),

-	TP_ARGS(inode, map, ret)
+	TP_ARGS(inode, flags, map, ret)
 );

 DEFINE_EVENT(ext4__map_blocks_exit, ext4_ind_map_blocks_exit,
-	TP_PROTO(struct inode *inode, struct ext4_map_blocks *map, int ret),
+	TP_PROTO(struct inode *inode, unsigned flags,
+		 struct ext4_map_blocks *map, int ret),

-	TP_ARGS(inode, map, ret)
+	TP_ARGS(inode, flags, map, ret)
 );

 TRACE_EVENT(ext4_ext_load_extent,
@@ -1638,25 +1713,50 @@ TRACE_EVENT(ext4_load_inode,
 );

 TRACE_EVENT(ext4_journal_start,
-	TP_PROTO(struct super_block *sb, int nblocks, unsigned long IP),
+	TP_PROTO(struct super_block *sb, int blocks, int rsv_blocks,
+		 unsigned long IP),

-	TP_ARGS(sb, nblocks, IP),
+	TP_ARGS(sb, blocks, rsv_blocks, IP),

	TP_STRUCT__entry(
		__field(	dev_t,	dev			)
		__field(unsigned long,	ip			)
-		__field(	  int,	nblocks			)
+		__field(	  int,	blocks			)
+		__field(	  int,	rsv_blocks		)
	),

	TP_fast_assign(
		__entry->dev	 = sb->s_dev;
		__entry->ip	 = IP;
-		__entry->nblocks = nblocks;
+		__entry->blocks	 = blocks;
+		__entry->rsv_blocks = rsv_blocks;
	),

-	TP_printk("dev %d,%d nblocks %d caller %pF",
+	TP_printk("dev %d,%d blocks, %d rsv_blocks, %d caller %pF",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->nblocks, (void *)__entry->ip)
+		  __entry->blocks, __entry->rsv_blocks, (void *)__entry->ip)
+);
+
+TRACE_EVENT(ext4_journal_start_reserved,
+	TP_PROTO(struct super_block *sb, int blocks, unsigned long IP),
+
+	TP_ARGS(sb, blocks, IP),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(unsigned long,	ip			)
+		__field(	  int,	blocks			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	 = sb->s_dev;
+		__entry->ip	 = IP;
+		__entry->blocks	 = blocks;
+	),
+
+	TP_printk("dev %d,%d blocks, %d caller %pF",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->blocks, (void *)__entry->ip)
 );

 DECLARE_EVENT_CLASS(ext4__trim,
@@ -1736,12 +1836,12 @@ TRACE_EVENT(ext4_ext_handle_uninitialized_extents,
		__entry->newblk		= newblock;
	),

-	TP_printk("dev %d,%d ino %lu m_lblk %u m_pblk %llu m_len %u flags %x "
+	TP_printk("dev %d,%d ino %lu m_lblk %u m_pblk %llu m_len %u flags %s "
		  "allocated %d newblock %llu",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  (unsigned) __entry->lblk, (unsigned long long) __entry->pblk,
-		  __entry->len, __entry->flags,
+		  __entry->len, show_map_flags(__entry->flags),
		  (unsigned int) __entry->allocated,
		  (unsigned long long) __entry->newblk)
 );
@@ -1769,10 +1869,10 @@ TRACE_EVENT(ext4_get_implied_cluster_alloc_exit,
		__entry->ret	= ret;
	),

-	TP_printk("dev %d,%d m_lblk %u m_pblk %llu m_len %u m_flags %u ret %d",
+	TP_printk("dev %d,%d m_lblk %u m_pblk %llu m_len %u m_flags %s ret %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  __entry->lblk, (unsigned long long) __entry->pblk,
-		  __entry->len, __entry->flags, __entry->ret)
+		  __entry->len, show_mflags(__entry->flags), __entry->ret)
 );

 TRACE_EVENT(ext4_ext_put_in_cache,
@@ -1926,7 +2026,7 @@ TRACE_EVENT(ext4_ext_show_extent,
 TRACE_EVENT(ext4_remove_blocks,
	TP_PROTO(struct inode *inode, struct ext4_extent *ex,
		 ext4_lblk_t from, ext4_fsblk_t to,
-		 ext4_fsblk_t partial_cluster),
+		 long long partial_cluster),

	TP_ARGS(inode, ex, from, to, partial_cluster),

@@ -1935,7 +2035,7 @@ TRACE_EVENT(ext4_remove_blocks,
		__field(	ino_t,	ino	)
		__field(	ext4_lblk_t,	from	)
		__field(	ext4_lblk_t,	to	)
-		__field(	ext4_fsblk_t,	partial	)
+		__field(	long long,	partial	)
		__field(	ext4_fsblk_t,	ee_pblk	)
		__field(	ext4_lblk_t,	ee_lblk	)
		__field(	unsigned short,	ee_len	)
@@ -1953,7 +2053,7 @@ TRACE_EVENT(ext4_remove_blocks,
	),

	TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
-		  "from %u to %u partial_cluster %u",
+		  "from %u to %u partial_cluster %lld",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  (unsigned) __entry->ee_lblk,
@@ -1961,19 +2061,20 @@ TRACE_EVENT(ext4_remove_blocks,
		  (unsigned short) __entry->ee_len,
		  (unsigned) __entry->from,
		  (unsigned) __entry->to,
-		  (unsigned) __entry->partial)
+		  (long long) __entry->partial)
 );

 TRACE_EVENT(ext4_ext_rm_leaf,
	TP_PROTO(struct inode *inode, ext4_lblk_t start,
-		 struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+		 struct ext4_extent *ex,
+		 long long partial_cluster),

	TP_ARGS(inode, start, ex, partial_cluster),

	TP_STRUCT__entry(
		__field(	dev_t,	dev	)
		__field(	ino_t,	ino	)
-		__field(	ext4_fsblk_t,	partial	)
+		__field(	long long,	partial	)
		__field(	ext4_lblk_t,	start	)
		__field(	ext4_lblk_t,	ee_lblk	)
		__field(	ext4_fsblk_t,	ee_pblk	)
@@ -1991,14 +2092,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
	),

	TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
-		  "partial_cluster %u",
+		  "partial_cluster %lld",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  (unsigned) __entry->start,
		  (unsigned) __entry->ee_lblk,
		  (unsigned long long) __entry->ee_pblk,
		  (unsigned short) __entry->ee_len,
-		  (unsigned) __entry->partial)
+		  (long long) __entry->partial)
 );

 TRACE_EVENT(ext4_ext_rm_idx,
@@ -2025,14 +2126,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
 );

 TRACE_EVENT(ext4_ext_remove_space,
-	TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+	TP_PROTO(struct inode *inode, ext4_lblk_t start,
+		 ext4_lblk_t end, int depth),

-	TP_ARGS(inode, start, depth),
+	TP_ARGS(inode, start, end, depth),

	TP_STRUCT__entry(
		__field(	dev_t,	dev	)
		__field(	ino_t,	ino	)
		__field(	ext4_lblk_t,	start	)
+		__field(	ext4_lblk_t,	end	)
		__field(	int,	depth	)
	),

@@ -2040,28 +2143,31 @@ TRACE_EVENT(ext4_ext_remove_space,
		__entry->dev	= inode->i_sb->s_dev;
		__entry->ino	= inode->i_ino;
		__entry->start	= start;
+		__entry->end	= end;
		__entry->depth	= depth;
	),

-	TP_printk("dev %d,%d ino %lu since %u depth %d",
+	TP_printk("dev %d,%d ino %lu since %u end %u depth %d",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  (unsigned) __entry->start,
+		  (unsigned) __entry->end,
		  __entry->depth)
 );

 TRACE_EVENT(ext4_ext_remove_space_done,
-	TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
-		 ext4_lblk_t partial, __le16 eh_entries),
+	TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+		 int depth, long long partial, __le16 eh_entries),

-	TP_ARGS(inode, start, depth, partial, eh_entries),
+	TP_ARGS(inode, start, end, depth, partial, eh_entries),

	TP_STRUCT__entry(
		__field(	dev_t,	dev	)
		__field(	ino_t,	ino	)
		__field(	ext4_lblk_t,	start	)
+		__field(	ext4_lblk_t,	end	)
		__field(	int,	depth	)
-		__field(	ext4_lblk_t,	partial	)
+		__field(	long long,	partial	)
		__field(	unsigned short,	eh_entries	)
	),

@@ -2069,18 +2175,20 @@ TRACE_EVENT(ext4_ext_remove_space_done,
		__entry->dev		= inode->i_sb->s_dev;
		__entry->ino		= inode->i_ino;
		__entry->start		= start;
+		__entry->end		= end;
		__entry->depth		= depth;
		__entry->partial	= partial;
		__entry->eh_entries	= le16_to_cpu(eh_entries);
	),

-	TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+	TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %lld "
		  "remaining_entries %u",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  (unsigned) __entry->start,
+		  (unsigned) __entry->end,
		  __entry->depth,
-		  (unsigned) __entry->partial,
+		  (long long) __entry->partial,
		  (unsigned short) __entry->eh_entries)
 );
@@ -2095,7 +2203,7 @@ TRACE_EVENT(ext4_es_insert_extent,
		__field(	ext4_lblk_t,	lblk		)
		__field(	ext4_lblk_t,	len		)
		__field(	ext4_fsblk_t,	pblk		)
-		__field(	unsigned long long,	status	)
+		__field(	char,	status			)
	),

	TP_fast_assign(
@@ -2104,14 +2212,14 @@ TRACE_EVENT(ext4_es_insert_extent,
		__entry->lblk	= es->es_lblk;
		__entry->len	= es->es_len;
		__entry->pblk	= ext4_es_pblock(es);
-		__entry->status	= ext4_es_status(es);
+		__entry->status	= ext4_es_status(es) >> 60;
	),

-	TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx",
+	TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  __entry->lblk, __entry->len,
-		  __entry->pblk, __entry->status)
+		  __entry->pblk, show_extent_status(__entry->status))
 );

 TRACE_EVENT(ext4_es_remove_extent,
@@ -2172,7 +2280,7 @@ TRACE_EVENT(ext4_es_find_delayed_extent_range_exit,
		__field(	ext4_lblk_t,	lblk		)
		__field(	ext4_lblk_t,	len		)
		__field(	ext4_fsblk_t,	pblk		)
-		__field(	unsigned long long,	status	)
+		__field(	char,	status			)
	),

	TP_fast_assign(
@@ -2181,14 +2289,14 @@ TRACE_EVENT(ext4_es_find_delayed_extent_range_exit,
		__entry->lblk	= es->es_lblk;
		__entry->len	= es->es_len;
		__entry->pblk	= ext4_es_pblock(es);
-		__entry->status	= ext4_es_status(es);
+		__entry->status	= ext4_es_status(es) >> 60;
	),

-	TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx",
+	TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  __entry->lblk, __entry->len,
		  __entry->pblk, show_extent_status(__entry->status))
 );

 TRACE_EVENT(ext4_es_lookup_extent_enter,
@@ -2225,7 +2333,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit,
		__field(	ext4_lblk_t,	lblk		)
		__field(	ext4_lblk_t,	len		)
		__field(	ext4_fsblk_t,	pblk		)
-		__field(	unsigned long long,	status	)
+		__field(	char,	status			)
		__field(	int,	found			)
	),

@@ -2235,16 +2343,16 @@ TRACE_EVENT(ext4_es_lookup_extent_exit,
		__entry->lblk	= es->es_lblk;
		__entry->len	= es->es_len;
		__entry->pblk	= ext4_es_pblock(es);
-		__entry->status	= ext4_es_status(es);
+		__entry->status	= ext4_es_status(es) >> 60;
		__entry->found	= found;
	),

-	TP_printk("dev %d,%d ino %lu found %d [%u/%u) %llu %llx",
+	TP_printk("dev %d,%d ino %lu found %d [%u/%u) %llu %s",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino, __entry->found,
		  __entry->lblk, __entry->len,
		  __entry->found ? __entry->pblk : 0,
-		  __entry->found ? __entry->status : 0)
+		  show_extent_status(__entry->found ? __entry->status : 0))
 );
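
The ">> 60" in the three hunks above presumes that the extent-status
flags occupy the top four bits of the packed 64-bit status word, so the
shift drops them into the low nibble that show_extent_status() matches
against. A tiny self-contained check of that arithmetic, written as a
userspace sketch:

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		/* e.g. a "written" extent: its flag in the topmost bit */
		uint64_t status = UINT64_C(1) << 63;

		/* after the shift it equals (1 << 3), printed as "W" */
		assert((status >> 60) == (1 << 3));
		return 0;
	}
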
 TRACE_EVENT(ext4_es_shrink_enter,

----- (next file) -----

@@ -48,7 +48,7 @@ static void read_cache_pages_invalidate_page(struct address_space *mapping,
		if (!trylock_page(page))
			BUG();
		page->mapping = mapping;
-		do_invalidatepage(page, 0);
+		do_invalidatepage(page, 0, PAGE_CACHE_SIZE);
		page->mapping = NULL;
		unlock_page(page);
	}

----- (next file) -----

@@ -26,7 +26,8 @@
 /**
 * do_invalidatepage - invalidate part or all of a page
 * @page: the page which is affected
- * @offset: the index of the truncation point
+ * @offset: start of the range to invalidate
+ * @length: length of the range to invalidate
 *
 * do_invalidatepage() is called when all or part of the page has become
 * invalidated by a truncate operation.
@@ -37,24 +38,18 @@
 * point.  Because the caller is about to free (and possibly reuse) those
 * blocks on-disk.
 */
-void do_invalidatepage(struct page *page, unsigned long offset)
+void do_invalidatepage(struct page *page, unsigned int offset,
+		       unsigned int length)
 {
-	void (*invalidatepage)(struct page *, unsigned long);
+	void (*invalidatepage)(struct page *, unsigned int, unsigned int);
+
	invalidatepage = page->mapping->a_ops->invalidatepage;
 #ifdef CONFIG_BLOCK
	if (!invalidatepage)
		invalidatepage = block_invalidatepage;
 #endif
	if (invalidatepage)
-		(*invalidatepage)(page, offset);
-}
-
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
-	zero_user_segment(page, partial, PAGE_CACHE_SIZE);
-	cleancache_invalidate_page(page->mapping, page);
-	if (page_has_private(page))
-		do_invalidatepage(page, partial);
+		(*invalidatepage)(page, offset, length);
 }

 /*
@@ -103,7 +98,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
		return -EIO;

	if (page_has_private(page))
-		do_invalidatepage(page, 0);
+		do_invalidatepage(page, 0, PAGE_CACHE_SIZE);

	cancel_dirty_page(page, PAGE_CACHE_SIZE);
@@ -185,11 +180,11 @@ int invalidate_inode_page(struct page *page)
 * truncate_inode_pages_range - truncate range of pages specified by start & end byte offsets
 * @mapping: mapping to truncate
 * @lstart: offset from which to truncate
- * @lend: offset to which to truncate
+ * @lend: offset to which to truncate (inclusive)
 *
 * Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
 *
 * Truncate takes two passes - the first pass is nonblocking.  It will not
 * block on page locks and it will not block on writeback.  The second pass
@@ -200,35 +195,58 @@ int invalidate_inode_page(struct page *page)
 * We pass down the cache-hot hint to the page freeing code.  Even if the
 * mapping is large, it is probably the case that the final pages are the most
 * recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that since ->invalidatepage() accepts range to invalidate
+ * truncate_inode_pages_range is able to handle cases where lend + 1 is not
+ * page aligned properly.
 */
 void truncate_inode_pages_range(struct address_space *mapping,
				loff_t lstart, loff_t lend)
 {
-	const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-	const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-	struct pagevec pvec;
-	pgoff_t index;
-	pgoff_t end;
-	int i;
+	pgoff_t		start;		/* inclusive */
+	pgoff_t		end;		/* exclusive */
+	unsigned int	partial_start;	/* inclusive */
+	unsigned int	partial_end;	/* exclusive */
+	struct pagevec	pvec;
+	pgoff_t		index;
+	int		i;

	cleancache_invalidate_inode(mapping);
	if (mapping->nrpages == 0)
		return;

-	BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-	end = (lend >> PAGE_CACHE_SHIFT);
+	/* Offsets within partial pages */
+	partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+	partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
+
+	/*
+	 * 'start' and 'end' always covers the range of pages to be fully
+	 * truncated. Partial pages are covered with 'partial_start' at the
+	 * start of the range and 'partial_end' at the end of the range.
+	 * Note that 'end' is exclusive while 'lend' is inclusive.
+	 */
+	start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	if (lend == -1)
+		/*
+		 * lend == -1 indicates end-of-file so we have to set 'end'
+		 * to the highest possible pgoff_t and since the type is
+		 * unsigned we're using -1.
+		 */
+		end = -1;
+	else
+		end = (lend + 1) >> PAGE_CACHE_SHIFT;

	pagevec_init(&pvec, 0);
	index = start;
-	while (index <= end && pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+	while (index < end && pagevec_lookup(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
		mem_cgroup_uncharge_start();
		for (i = 0; i < pagevec_count(&pvec); i++) {
			struct page *page = pvec.pages[i];

			/* We rely upon deletion not changing page->index */
			index = page->index;
-			if (index > end)
+			if (index >= end)
				break;

			if (!trylock_page(page))
@@ -247,27 +265,56 @@ void truncate_inode_pages_range(struct address_space *mapping,
		index++;
	}

-	if (partial) {
+	if (partial_start) {
		struct page *page = find_lock_page(mapping, start - 1);
		if (page) {
+			unsigned int top = PAGE_CACHE_SIZE;
+			if (start > end) {
+				/* Truncation within a single page */
+				top = partial_end;
+				partial_end = 0;
+			}
			wait_on_page_writeback(page);
-			truncate_partial_page(page, partial);
+			zero_user_segment(page, partial_start, top);
+			cleancache_invalidate_page(mapping, page);
+			if (page_has_private(page))
+				do_invalidatepage(page, partial_start,
+						  top - partial_start);
			unlock_page(page);
			page_cache_release(page);
		}
	}
+	if (partial_end) {
+		struct page *page = find_lock_page(mapping, end);
+		if (page) {
+			wait_on_page_writeback(page);
+			zero_user_segment(page, 0, partial_end);
+			cleancache_invalidate_page(mapping, page);
+			if (page_has_private(page))
+				do_invalidatepage(page, 0,
+						  partial_end);
+			unlock_page(page);
+			page_cache_release(page);
+		}
+	}
+	/*
+	 * If the truncation happened within a single page no pages
+	 * will be released, just zeroed, so we can bail out now.
+	 */
+	if (start >= end)
+		return;

	index = start;
	for ( ; ; ) {
		cond_resched();
		if (!pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
			if (index == start)
				break;
			index = start;
			continue;
		}
-		if (index == start && pvec.pages[0]->index > end) {
+		if (index == start && pvec.pages[0]->index >= end) {
			pagevec_release(&pvec);
			break;
		}
@@ -277,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,

			/* We rely upon deletion not changing page->index */
			index = page->index;
-			if (index > end)
+			if (index >= end)
				break;

			lock_page(page);
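
To make the new range arithmetic concrete, a short worked example,
assuming 4096-byte pages (the byte values are mine, not from the
patch): truncating lstart = 1536, lend = 10239 gives

	partial_start = 1536 & 4095          = 1536
	partial_end   = (10239 + 1) & 4095   = 2048
	start         = (1536 + 4095) >> 12  = 1    (first full page)
	end           = (10239 + 1) >> 12    = 2    (exclusive)

so page 1 is dropped entirely, page 0 is zeroed and invalidated from
byte 1536 to the end of the page, and page 2 is zeroed and invalidated
from byte 0 to 2048 - exactly the two partial-page blocks added above.
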
@@ -598,10 +645,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
	 * This rounding is currently just for example: unmap_mapping_range
	 * expands its hole outwards, whereas we want it to contract the hole
	 * inwards.  However, existing callers of truncate_pagecache_range are
-	 * doing their own page rounding first; and truncate_inode_pages_range
-	 * currently BUGs if lend is not pagealigned-1 (it handles partial
-	 * page at start of hole, but not partial page at end of hole).  Note
-	 * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+	 * doing their own page rounding first.  Note that unmap_mapping_range
+	 * allows holelen 0 for all, and we allow lend -1 for end of file.
	 */

	/*