linux/mm
Zach Brown 65b8291c40 [PATCH] dio: invalidate clean pages before dio write
This patch fixes a user-triggerable oops that was reported by Leonid
Ananiev as archived at http://lkml.org/lkml/2007/2/8/337.

dio writes invalidate clean pages that intersect the written region so that
subsequent buffered reads go to disk to read the new data.  If this fails
the interface tries to tell the caller that the cache is inconsistent by
returning EIO.

Before this patch we had the problem where this invalidation failure would
clobber -EIOCBQUEUED as it made its way from fs/direct-io.c to fs/aio.c.
Both fs/aio.c and bio completion call aio_complete() and we reference freed
memory, usually oopsing.

This patch addresses this problem by invalidating before the write so that
we can cleanly return -EIO before ->direct_IO() has had a chance to return
-EIOCBQUEUED.

There is a compromise here.  During the dio write we can fault in mmap()ed
pages which intersect the written range with get_user_pages() if the user
provided them for the source buffer.  This is a crazy thing to do, but we
can make it mostly work in most cases by trying the invalidation again.
The compromise is that we won't return an error if this second invalidation
fails if it's an AIO write and we have -EIOCBQUEUED.

This was tested by having two processes race performing large O_DIRECT and
buffered ordered writes.  Within minutes ext3 would see a race between
ext3_releasepage() and jbd holding a reference on ordered data buffers and
would cause invalidation to fail, panicing the box.  The test can be found
in the 'aio_dio_bugs' test group in test.kernel.org/autotest.  After this
patch the test passes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:04 -07:00
..
allocpercpu.c [PATCH] Allow NULL pointers in percpu_free 2006-12-07 08:39:22 -08:00
backing-dev.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c [PATCH] blktrace: only add a bounce trace when we really bounce 2007-01-12 10:46:49 -08:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap_xip.c [PATCH] mm: mremap correct rmap accounting 2007-01-30 08:33:32 -08:00
filemap.c [PATCH] dio: invalidate clean pages before dio write 2007-03-16 19:25:04 -07:00
filemap.h Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
fremap.c [PATCH] mm: more rmap debugging 2006-12-22 08:55:49 -08:00
highmem.c [PATCH] Use ZVC for free_pages 2007-02-11 10:51:17 -08:00
hugetlb.c [PATCH] hugetlb: preserve hugetlb pte dirty state 2007-02-09 09:25:46 -08:00
internal.h [PATCH] mm: VM_BUG_ON 2006-09-26 08:48:44 -07:00
Kconfig [PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA 2007-02-11 10:51:19 -08:00
madvise.c [PATCH] mm: fix madvise infinine loop 2007-03-16 19:25:04 -07:00
Makefile [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
memory_hotplug.c [PATCH] Fix sparsemem on Cell 2007-01-11 18:18:20 -08:00
memory.c [PATCH] Add NOPFN_REFAULT result from vm_ops->nopfn() 2007-02-12 09:48:27 -08:00
mempolicy.c [PATCH] Page migration: Fix vma flag checking 2007-03-05 07:57:51 -08:00
mempool.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
migrate.c [PATCH] Page migration: Fix vma flag checking 2007-03-05 07:57:51 -08:00
mincore.c [PATCH] mincore: vma crossing fix 2007-02-15 09:57:03 -08:00
mlock.c [PATCH] mlock cleanup 2006-12-07 08:39:22 -08:00
mmap.c [PATCH] Bug in MM_RB debugging 2007-03-01 14:53:38 -08:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
mremap.c [PATCH] mm: mremap correct rmap accounting 2007-01-30 08:33:32 -08:00
msync.c [PATCH] mm: msync() cleanup 2006-09-26 08:48:45 -07:00
nommu.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
oom_kill.c [PATCH] fix OOM killing of swapoff 2007-01-05 23:55:29 -08:00
page_alloc.c [PATCH] Rename PG_checked to PG_owner_priv_1 2007-03-01 14:53:37 -08:00
page_io.c [PATCH] swsusp: use block device offsets to identify swap locations 2006-12-07 08:39:27 -08:00
page-writeback.c [PATCH] throttle_vm_writeout(): don't loop on GFP_NOFS and GFP_NOIO allocations 2007-03-01 14:53:38 -08:00
pdflush.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] Drop __get_zone_counts() 2007-02-11 10:51:18 -08:00
rmap.c [PATCH] adapt page_lock_anon_vma() to PREEMPT_RCU 2007-03-01 14:53:39 -08:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
shmem.c [PATCH] shmem and simple const super_operations 2007-03-05 07:57:51 -08:00
slab.c [PATCH] kernel-doc fixes for 2.6.20-git15 (non-drivers) 2007-03-01 14:53:37 -08:00
slob.c [PATCH] MM: SLOB is broken by recent cleanup of slab.h 2006-12-30 10:56:42 -08:00
sparse.c [PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int 2006-12-07 08:39:23 -08:00
swap_state.c [PATCH] lockdep: locking init debugging improvement 2006-07-03 15:27:02 -07:00
swap.c [PATCH] hotplug CPU: clean up hotcpu_notifier() use 2006-12-07 08:39:39 -08:00
swapfile.c [PATCH] swsusp: Do not fail if resume device is not set 2007-01-05 23:55:22 -08:00
thrash.c [PATCH] make mm/thrash.c:global_faults static 2006-12-07 08:39:22 -08:00
tiny-shmem.c [PATCH] mm/{,tiny-}shmem.c cleanups 2007-03-01 14:53:35 -08:00
truncate.c [PATCH] VM: invalidate_inode_pages2_range() should not exit early 2007-03-01 14:53:39 -08:00
util.c [PATCH] slab: clean up leak tracking ifdefs a little bit 2006-10-04 07:55:13 -07:00
vmalloc.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
vmscan.c [PATCH] throttle_vm_writeout(): don't loop on GFP_NOFS and GFP_NOIO allocations 2007-03-01 14:53:38 -08:00
vmstat.c [PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM 2007-02-11 10:51:18 -08:00