Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.
The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.
This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.
[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the location of the NTFS homepage in several files.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d_add_ci was lifted 1:1 from ntfs. Change ntfs to use the common
version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All calls to remove_suid() are made with a file pointer, because
(similarly to file_update_time) it is called when the file is written.
Clean up callers by passing in a file instead of a dentry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some drivers have duplicated unlikely() macros. IS_ERR() already has
unlikely() in itself.
This patch cleans up such pointless code.
Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Checking if an address is a vmalloc address is done in a couple of places.
Define a common version in mm.h and replace the other checks.
Again the include structures suck. The definition of VMALLOC_START and
VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
there.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify page cache zeroing of segments of pages through 3 functions
zero_user_segments(page, start1, end1, start2, end2)
Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.
zero_user_segment(page, start, end)
Same for a single segment.
zero_user(page, start, length)
Length variant for the case where we know the length.
We remove the zero_user_page macro. Issues:
1. Its a macro. Inline functions are preferable.
2. The KM_USER0 macro is only defined for HIGHMEM.
Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.
Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.
Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.
Also extract the flushing of the caches to be outside of the kmap.
[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The regression was caused by:
commit[a32ea1e1f9] Fix read/truncate race
This causes ntfs_readpage() to be called for a zero i_size inode, which
failed when the file was compressed and non-resident.
Thanks a lot to Mike Galbraith for reporting the issue and tracking down
the commit that caused the regression.
Looking into it I found three bugs which the patch fixes.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that nfsd has stopped writing to the find_exported_dentry member we an
mark the export_operations const
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trivial switch over to the new generic helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
NTFS's if-condition on dirty inodes is not complete. Fix it with
sb_has_dirty_inodes().
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The early LFS work that Linux uses favours EFBIG in various places. SuSv3
specifically uses EOVERFLOW for this as noted by Michael (Bug 7253)
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t. We should therefore
transition to the proper error return code
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Big thanks go to Mathias Kolehmainen for reporting the bug, providing
debug output and testing the patches I sent him to get it working.
The fix was to stop calling ntfs_attr_set() at mount time as that causes
balance_dirty_pages_ratelimited() to be called which on systems with
little memory actually tries to go and balance the dirty pages which tries
to take the s_umount semaphore but because we are still in fill_super()
across which the VFS holds s_umount for writing this results in a
deadlock.
We now do the dirty work by hand by submitting individual buffers. This
has the annoying "feature" that mounting can take a few seconds if the
journal is large as we have clear it all. One day someone should improve
on this by deferring the journal clearing to a helper kernel thread so it
can be done in the background but I don't have time for this at the moment
and the current solution works fine so I am leaving it like this for now.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.
[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Local variable `i' is a byte-counter. Don't use it as an index into an array
of le32's.
Reported-by: "young dave" <hidave.darkstar@gmail.com>
Cc: "Christoph Lameter" <clameter@sgi.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use zero_user_page() instead of open-coding it.
[akpm@linux-foundation.org: kmap-type fixes]
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.
I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd. All depending on whether the filler is async and/or can return
with a !uptodate page.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It isn't needed anymore, all of the users are gone, and all of the ctl_table
initializers have been converted to use explicit names of the fields they are
initializing.
[akpm@osdl.org: NTFS fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Putting ntfs-debug under FS_NRINODE was not a kosher thing to do so don't give
it any binary number.
[akpm@osdl.org: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".
Compile tested with gcc & sparse.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with
just "#ifdef DEBUG".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The KM_BIO_SRC_IRQ kmap slot requires local irq protection.
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix deadlock in fs/ntfs/inode.c::ntfs_put_inode(). Thanks to Sergey
Vlasov for the report and detailed analysis of the deadlock. The fix
involved getting rid of ntfs_put_inode() altogether and hence NTFS no
longer has a ->put_inode super operation.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the ntfs
filesystem code.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NOFS is an alias of GFP_NOFS.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch cleans up generic_file_*_read/write() interfaces. Christoph
Hellwig gave me the idea for this clean ups.
In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
to cleanup all variants of generic_file_* routines.
Final available interfaces:
generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler
__generic_file_aio_write_nolock() - internal worker routine
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conversion of booleans to: generic-boolean.patch (2006-08-23)
Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:
(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NTFS uses lots of type-opaque objects which acquire their true identity
runtime - so the lock validator needs to be helped in a couple of places to
figure out object types.
Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS
locking rules.
Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem is that when we write to a file, the copy from userspace to
pagecache is first done with preemption disabled, so if the source address is
not immediately available the copy fails *and* *zeros* *the* *destination*.
This is a problem because a concurrent read (which admittedly is an odd thing
to do) might see zeros rather that was there before the write, or what was
there after, or some mixture of the two (any of these being a reasonable thing
to see).
If the copy did fail, it will immediately be retried with preemption
re-enabled so any transient problem with accessing the source won't cause an
error.
The first copying does not need to zero any uncopied bytes, and doing so
causes the problem. It uses copy_from_user_atomic rather than copy_from_user
so the simple expedient is to change copy_from_user_atomic to *not* zero out
bytes on failure.
The first of these two patches prepares for the change by fixing two places
which assume copy_from_user_atomic does zero the tail. The two usages are
very similar pieces of code which copy from a userspace iovec into one or more
page-cache pages. These are changed to remove the assumption.
The second patch changes __copy_from_user_inatomic* to not zero the tail.
Once these are accepted, I will look at similar patches of other architectures
where this is important (ppc, mips and sparc being the ones I can find).
This patch:
There is a problem with __copy_from_user_inatomic zeroing the tail of the
buffer in the case of an error. As it is called in atomic context, the error
may be transient, so it results in zeros being written where maybe they
shouldn't be.
In the usage in filemap, this opens a window for a well timed read to see data
(zeros) which is not consistent with any ordering of reads and writes.
Most cases where __copy_from_user_inatomic is called, a failure results in
__copy_from_user being called immediately. As long as the latter zeros the
tail, the former doesn't need to. However in *copy_from_user_iovec
implementations (in both filemap and ntfs/file), it is assumed that
copy_from_user_inatomic will zero the tail.
This patch removes that assumption, so that after this patch it will
be safe for copy_from_user_inatomic to not zero the tail.
This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.
After this patch, all architectures that might disable preempt when
kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
This includes
- powerpc
- i386
- mips
- sparc
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add read_mapping_page() which is used for callers that pass
mapping->a_ops->readpage as the filler for read_cache_page. This removes
some duplication from filesystem code.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.
This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.
linux/mount.h has been added where necessary to make allyesconfig build
successfully.
Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many thanks to Pauline Ng for the detailed bug report and analysis!
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups
The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now the only user who are using generic_ffs() is ntfs filesystem. This patch
isolates generic_ffs() as ntfs_ffs() for ntfs.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.
If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.
The following inode and similar caches are marked SLAB_MEM_SPREAD:
file cache
==== =====
fs/adfs/super.c adfs_inode_cache
fs/affs/super.c affs_inode_cache
fs/befs/linuxvfs.c befs_inode_cache
fs/bfs/inode.c bfs_inode_cache
fs/block_dev.c bdev_cache
fs/cifs/cifsfs.c cifs_inode_cache
fs/coda/inode.c coda_inode_cache
fs/dquot.c dquot
fs/efs/super.c efs_inode_cache
fs/ext2/super.c ext2_inode_cache
fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr
fs/ext3/super.c ext3_inode_cache
fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr
fs/fat/cache.c fat_cache
fs/fat/inode.c fat_inode_cache
fs/freevxfs/vxfs_super.c vxfs_inode
fs/hpfs/super.c hpfs_inode_cache
fs/isofs/inode.c isofs_inode_cache
fs/jffs/inode-v23.c jffs_fm
fs/jffs2/super.c jffs2_i
fs/jfs/super.c jfs_ip
fs/minix/inode.c minix_inode_cache
fs/ncpfs/inode.c ncp_inode_cache
fs/nfs/direct.c nfs_direct_cache
fs/nfs/inode.c nfs_inode_cache
fs/ntfs/super.c ntfs_big_inode_cache_name
fs/ntfs/super.c ntfs_inode_cache
fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache
fs/ocfs2/super.c ocfs2_inode_cache
fs/proc/inode.c proc_inode_cache
fs/qnx4/inode.c qnx4_inode_cache
fs/reiserfs/super.c reiser_inode_cache
fs/romfs/inode.c romfs_inode_cache
fs/smbfs/inode.c smb_inode_cache
fs/sysv/inode.c sysv_inode_cache
fs/udf/super.c udf_inode_cache
fs/ufs/super.c ufs_inode_cache
net/socket.c sock_inode_cache
net/sunrpc/rpc_pipe.c rpc_inode_cache
The choice of which slab caches to so mark was quite simple. I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch. Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.
Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
forgot to update a temporary variable so loading index inodes which
have an index allocation attribute failed.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
allowed by NTFS, i.e. 255 Unicode characters, not including the
terminating NULL (which is not stored on disk).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Windows copes with this and even chkdsk does not detect or fix this
so we have to cope with it, too. Thanks to Pawel Kot for reporting
the problem.
- Miscellaneous updates to layout.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
MS_RDONLU implies not atime updates at all, no need for the MS_NOATIME and
MS_NODIRATIME flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To allow various options to work per-mount instead of per-sb we need a
struct vfsmount when updating ctime and mtime. This preparation patch
replaces the inode_update_time routine with a file_update_atime routine so
we can easily get at the vfsmount. (and the file makes more sense in this
context anyway). Also get rid of the unused second argument - we always
want to update the ctime when calling this routine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes all references to the bouncing address
rddunlap@osdl.org and one dead web page from the kernel.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
left shift using PAGE_CACHE_SHIFT in fs/ntfs/file.c. Thanks to Andrew
Morton pointing this out to.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Many thanks to Alberto Patino for testing and reporting the data
corruption. And many apologies for corrupting his partition.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
file operations ->write(), ->aio_write(), and ->writev() for regular
files. This replaces the old use of generic_file_write(), et al and
the address space operations ->prepare_write and ->commit_write.
This means that both sparse and non-sparse (unencrypted and
uncompressed) files can now be extended using the normal write(2)
code path. There are two limitations at present and these are that
we never create sparse files and that we only have limited support
for highly fragmented files, i.e. ones whose data attribute is split
across multiple extents. When such a case is encountered,
EOPNOTSUPP is returned.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and cond_resched() in the main loop as we could be dirtying a lot of
pages and this ensures we play nice with the VM and the system as a
whole.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the initial implementation of file truncation. Now both open(2)ing
a file with the O_TRUNC flag and the {,f}truncate(2) system calls
will resize a file appropriately. The limitations are that only
uncompressed and unencrypted files are supported. Also, there is
only very limited support for highly fragmented files (the ones whose
$DATA attribute is split into multiple attribute extents).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
extend the allocation of an attributes. Optionally, the data size,
but not the initialized size can be extended, too.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
which is zero for a resident attribute but should no longer be zero
once the attribute is non-resident as it then has real clusters
allocated.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
as an extra parameter. This is needed since we need to know the size
before we can map the mft record and our callers always know it. The
reason we cannot simply read the size from the vfs inode i_size is
that this is not necessarily uptodate. This happens when
ntfs_attr_make_non_resident() is called in the ->truncate call path.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
specifying whether the cluster are being allocated to extend an
attribute or to fill a hole.
- Change ntfs_attr_make_non_resident() to call ntfs_cluster_alloc()
with @is_extension set to TRUE and remove the runlist terminator
fixup code as this is now done by ntfs_cluster_alloc().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
search context as argument. This allows calling it with the mft
record mapped. Update all callers.
- Fix potential deadlock in ntfs_mft_data_extend_allocation_nolock()
error handling by passing in the active search context when calling
ntfs_cluster_free().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
search context as argument. This allows calling it with the mft
record mapped. Update all callers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
restart pages in the journal without multi sector transfer protection
fixups (i.e. the update sequence array is empty and in fact does not
exist).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
since we otherwise get into a lock reversal deadlock if a read locked
runlist is passed in. In the process also change it to take an ntfs
inode instead of a vfs inode as parameter.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
an octal number to conform to how chmod(1) works, too. Thanks to
Giuseppe Bilotta and Horst von Brand for pointing out the errors of
my ways.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
fs/ntfs/aops.c::ntfs_end_buffer_async_read() to a bit spin lock
in the first buffer head of a page.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
lock protection over the buffer submission for i/o which allows the
removal of the get_bh()/put_bh() pairs for each buffer.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Also, add BUG() checks to ntfs_attr_make_non_resident() and
ntfs_attr_set() to ensure that these functions are never called
for compressed or encrypted attributes.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- Fix a bug in ntfs_map_runlist_nolock() where we forgot to protect
access to the allocated size in the ntfs inode with the size lock.
- Fix ntfs_attr_vcn_to_lcn_nolock() and ntfs_attr_find_vcn_nolock() to
return LCN_ENOENT when there is no runlist and the allocated size is
zero.
- Fix load_attribute_list() to handle the case of a NULL runlist.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
index entry is in the index root, we forgot to set the @ir pointer in
the index context. Thanks for Yura Pakhuchiy for finding this bug.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
in the two critical regions. This means we no longer need to
panic() when the allocation fails as it now cannot fail.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- Modify fs/ntfs/malloc.h::ntfs_malloc_nofs() to do the kmalloc() based
allocations with __GFP_HIGHMEM, analogous to how the vmalloc() based
allocations are done.
- Add fs/ntfs/malloc.h::ntfs_malloc_nofs_nofail() which is analogous to
ntfs_malloc_nofs() but it performs allocations with __GFP_NOFAIL and
hence cannot fail.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- Support journals ($LogFile) which have been modified by chkdsk. This
means users can boot into Windows after we marked the volume dirty.
The Windows boot will run chkdsk and then reboot. The user can then
immediately boot into Linux rather than having to do a full Windows
boot first before rebooting into Linux and we will recognize such a
journal and empty it as it is clean by definition.
- Support journals ($LogFile) with only one restart page as well as
journals with two different restart pages. We sanity check both and
either use the only sane one or the more recent one of the two in the
case that both are valid.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
the buffers when mapping them after the VM had discarded them.
Thanks to Martin MOKREJŠ for the bug report.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The situation: VFS inode X on a mounted ntfs volume is dirty. For
same inode X, the ntfs_inode is dirty and thus corresponding on-disk
inode, i.e. mft record, which is in a dirty PAGE_CACHE_PAGE belonging
to the table of inodes, i.e. $MFT, inode 0.
What happens:
Process 1: sys_sync()/umount()/whatever... calls
__sync_single_inode() for $MFT -> do_writepages() -> write_page for
the dirty page containing the on-disk inode X, the page is now locked
-> ntfs_write_mst_block() which clears PageUptodate() on the page to
prevent anyone else getting hold of it whilst it does the write out.
This is necessary as the on-disk inode needs "fixups" applied before
the write to disk which are removed again after the write and
PageUptodate is then set again. It then analyses the page looking
for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this
on-disk inode. This then calls ilookup5() to check if the
corresponding VFS inode is in icache(). This in turn calls ifind()
which waits on the inode lock via wait_on_inode whilst holding the
global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the
same VFS inode X on the ntfs volume. This locks the inode (I_LOCK)
then calls write-inode -> ntfs_write_inode -> map_mft_record() ->
read_cache_page() for the page (in page cache of table of inodes
$MFT, inode 0) containing the on-disk inode. This page has
PageUptodate() clear because of Process 1 (see above) so
read_cache_page() blocks when it tries to take the page lock for the
page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the
on-disk inode X and it is waiting on the inode X to be unlocked in
ifind() so it can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for
the page to be unlocked so it can call ntfs_readpage() or discover
that Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The solution: The fix is to use the newly introduced
ilookup5_nowait() which does not wait on the inode's lock and hence
avoids the deadlock. This is safe as we do not care about the VFS
inode and only use the fact that it is in the VFS inode cache and the
fact that the vfs and ntfs inodes are one struct in memory to find
the ntfs inode in memory if present. Also, the ntfs inode has its
own locking so it does not matter if the vfs inode is locked.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
if the requested vcn is inside it. Otherwise we get into problems
when we try to map an out of bounds vcn because we then try to map
the already mapped runlist fragment which causes
ntfs_mapping_pairs_decompress() to fail and return error. Update
ntfs_attr_find_vcn_nolock() accordingly.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and ntfs_mapping_pairs_build() to allow the runlist encoding to be
partial which is desirable when filling holes in sparse attributes.
Update all callers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
with a 64-bit variable and a int, i.e. 32-bit, constant. This causes
the higher order 32-bits of the 64-bit variable to be zeroed. To fix
this cast the 'const' to the same 64-bit type as 'var'.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
to be mounted and if this is the case do not allow (re)mounting
read-write. This is done by parsing hiberfil.sys if present.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
if the runlist was not mapped at all and a mapping error occured we
would leave the runlist locked on exit to the function so that the
next access to the same file would try to take the lock and deadlock.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
is active on the volume and we are mounting read-write or remounting
from read-only to read-write.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Thus, relax the checking in fs/ntfs/super.c::is_boot_sector_ntfs() to
only emit a warning when the checksum is incorrect rather than
refusing the mount. Thanks to Bernd Casimir for pointing this
problem out.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
- Add ifdef NTFS_RW around write specific code if fs/ntfs/runlist.[hc] and
fs/ntfs/attrib.[hc].
- Minor bugfix to fs/ntfs/attrib.c::ntfs_attr_make_non_resident() where the
runlist was not freed in all error cases.
- Add fs/ntfs/runlist.[hc]::ntfs_rl_find_vcn_nolock().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and handle the case where an attribute is converted from resident
to non-resident by a concurrent file write.
- Reorder some operations when converting an attribute from resident
to non-resident (fs/ntfs/attrib.c) so it is safe wrt concurrent
->readpage and ->writepage.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
dropping the read lock and taking the write lock we were not checking
whether someone else did not already do the work we wanted to do.
- Rename ntfs_find_vcn_nolock() to ntfs_attr_find_vcn_nolock().
- Tidy up some comments in fs/ntfs/runlist.c.
- Add LCN_ENOMEM and LCN_EIO definitions to fs/ntfs/runlist.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
checked and set in the ntfs inode as done for compressed files
and the compressed size needs to be used for vfs inode->i_blocks
instead of the allocated size, again, as done for compressed files.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
definition of ntfs_export_ops from fs/ntfs/super.c to namei.c.
Also, declare ntfs_export_ops in fs/ntfs/ntfs.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
mft record for resident attributes (fs/ntfs/inode.c).
- Small readability cleanup to use "a" instead of "ctx->attr"
everywhere (fs/ntfs/inode.c).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
warning in the do_div() call on sparc32. Thanks to Meelis Roos for the
report and analysis of the warning.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
helper ntfs_map_runlist_nolock() which is used by ntfs_map_runlist().
This allows us to map runlist fragments with the runlist lock already
held without having to drop and reacquire it around the call. Adapt
all callers.
- Change ntfs_find_vcn() to ntfs_find_vcn_nolock() which takes a locked
runlist. This allows us to find runlist elements with the runlist
lock already held without having to drop and reacquire it around the
call. Adapt all callers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
enable bit which is set appropriately and a per inode sparse disable
bit which is preset on some system file inodes as appropriate.
- Enforce that sparse support is disabled on NTFS volumes pre 3.0.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
fs/ntfs/aops.c::ntfs_{prepare,commit}_write()() and re-enable it.
It should be safe now. (Famous last words...)
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
value afterwards. Cache the initialized_size in the same way and
protect access to the two sizes using the size_lock.
- Minor optimization to fs/ntfs/super.c::ntfs_statfs() and its helpers.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
cached value everywhere. Cache the initialized_size in the same way
and protect the critical region where the two sizes are read using the
new size_lock of the ntfs inode.
- Add the new size_lock to the ntfs_inode structure (fs/ntfs/inode.h)
and initialize it (fs/ntfs/inode.c).
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!