This removes unnecessary parsing for directory entries.
If short_only is set, we don't need to parse the longname. And if !both and
the longname was found, we don't need the shortname.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This uses the stack for shortname, and uses __getname() for longname in
fat_search_long() and __fat_readdir(). This removes the unneeded
__getname() for shortname.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No logic changes; this just cleans up fs/fat/dir.c.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct __fat_dirent is what was formerly the kernel struct dirent (that
was different from the userspace struct dirent).
Converting all fat users to struct __fat_dirent will allow us to get rid
of the conflicting struct dirent definition.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current parse_options() exits too early. We need to run the code at
the bottom of this function even if the user doesn't specify any options.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove the definitions of macros:
XATTR_SECURITY_PREFIX
XATTR_TRUSTED_PREFIX
XATTR_USER_PREFIX
since they are defined in linux/xattr.h
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
j_commit_lock is a semaphore but is used as if it were a mutex. This patch
converts it to a mutex.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
j_flush_sem is a semaphore but is used as if it were a mutex. This patch
converts it to a mutex.
[akpm@linux-foundation.org: fix mutex_trylock retval treatment]
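The return-value fix noted above matters because the two primitives use
opposite conventions: down_trylock() returns 0 when the semaphore was
acquired, while mutex_trylock() returns 1 on success. A minimal userspace
sketch of the pitfall follows; every toy_* name and both flush_busy_*
callers are hypothetical stand-ins, not the kernel or reiserfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock state; stands in for both primitives in this sketch. */
struct toy_lock { bool held; };

/* Mimics the down_trylock() convention: 0 if acquired, 1 if contended. */
static int toy_down_trylock(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return 1;
	l->held = true;
	return 0;
}

/* Mimics the mutex_trylock() convention: 1 if acquired, 0 if contended. */
static int toy_mutex_trylock(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return 0;
	l->held = true;
	return 1;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

/* Semaphore-era caller: a nonzero return means "busy". */
static bool flush_busy_sem(struct toy_lock *l)
{
	if (toy_down_trylock(l))
		return true;
	toy_unlock(l);
	return false;
}

/* Mutex-era caller: the test has to be inverted. */
static bool flush_busy_mutex(struct toy_lock *l)
{
	if (!toy_mutex_trylock(l))
		return true;
	toy_unlock(l);
	return false;
}
```

Both callers report the same answer for the same lock state, which is the
property a mechanical semaphore-to-mutex conversion has to preserve.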
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
j_lock is a semaphore but is used as if it were a mutex. This patch converts
it to a mutex.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should not allow user to change quota mount options when quota is just
suspended. It would make mount options and internal quota state inconsistent.
Also we should not allow user to change quota format when quota is turned on.
On the other hand we can just silently ignore when some option is set to the
value it already has (some mount versions do this on remount). Finally, we
should not discard current quota options if parsing of mount options fails.
Cc: <reiserfs-devel@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <reiserfs-devel@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In journal=data mode, it is not enough to do write_inode_now() as done in
vfs_quota_on() to write all data to their final location (which is needed for
quota_read to work correctly). Calling journal_end_sync() before calling
vfs_quota_on() does its job because transactions are committed to the journal
and data marked as dirty in memory so write_inode_now() writes them to their
final locations.
Cc: <reiserfs-devel@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apple Extended HFS file system: The semaphore extents lock is used as a
mutex. Convert it to the mutex API.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apple Macintosh file system: The semaphore extents_lock is used as a mutex.
Convert it to the mutex API.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apple Macintosh file system: The semaphore bitmap_lock is used as a mutex.
Convert it to the mutex API
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While fixing CONFIG_ leakages to the userspace kernel headers I ran into
CODA_FS_OLD_API.
After five years, are there still people using the old API left?
Especially considering that you have to choose at compile time which API
to support in the kernel (and distributions have tended to offer the new API for
some time).
Jan: "The old API can definitely go. Around the time the new
interface went in there were some non-Coda userspace file system
implementations that took a while longer to convert to the new API,
but by now they all switched to the new interface or in some cases
to a FUSE-based solution."
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some iso9660 images contain files with rockridge data that is either
incorrect or incompletely parsed. Prior to commit f2966632a1 ("[PATCH]
rock: handle directory overflows") (included with kernel 2.6.13) the
kernel ignored the rockridge
data for these files, while still allowing the files to be accessed under
their non-rockridge names. That commit inadvertently changed things so
that files with invalid rockridge data could not be accessed at all. (I
ran across the problem when comparing some old CDs with hard disk copies I
had made long ago under kernel 2.4: a few of the files on the hard disk
copies were no longer visible on the CDs.)
This change reverts to the pre-2.6.13 behavior.
Signed-off-by: Adam Greenblatt <adam.greenblatt@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext3_dx_find_entry uses ext3_next_entry without verifying that the entry
is valid. If its rec_len == 0 this causes an infinite loop. Refactor the
loop to check the validity of entries before checking whether they match
and moving on to the next one.
There are other uses of ext3_next_entry in this file which also look
problematic. They should be reviewed and fixed if/when we have a
test-case that triggers them.
This patch fixes the first case (image hdb.25.softlockup.gz) reported in
http://bugzilla.kernel.org/show_bug.cgi?id=10882.
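The loop shape described above (validate each entry before comparing it or
advancing past it) can be sketched in userspace as follows. The
toy_dirent layout, toy_entry_ok() and toy_find() are hypothetical
simplifications for illustration and do not reproduce the ext3 structures
or the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified entry header; the layout is illustrative only. */
struct toy_dirent {
	uint16_t rec_len;	/* distance to the next entry, in bytes */
	uint8_t  name_len;
	char     name[5];
};

/* Reject entries that would stall or overrun the walk. */
static int toy_entry_ok(const struct toy_dirent *de, size_t off, size_t size)
{
	if (de->rec_len < sizeof(*de))	/* also covers rec_len == 0 */
		return 0;
	if (off + de->rec_len > size)
		return 0;
	if (de->name_len > sizeof(de->name))
		return 0;
	return 1;
}

/*
 * Returns the offset of the matching entry, -1 if there is no match,
 * or -2 if a corrupt entry is seen. Each entry is validated before it
 * is compared or used to advance, so a zero rec_len cannot spin forever.
 */
static long toy_find(const unsigned char *blk, size_t size, const char *name)
{
	size_t off = 0, len;

	assert(blk != NULL && name != NULL);
	len = strlen(name);

	while (off + sizeof(struct toy_dirent) <= size) {
		struct toy_dirent de;

		memcpy(&de, blk + off, sizeof(de));
		if (!toy_entry_ok(&de, off, size))
			return -2;
		if (de.name_len == len && memcmp(de.name, name, len) == 0)
			return (long)off;
		off += de.rec_len;
	}
	return -1;
}
```

Returning a distinct error for a corrupt entry, rather than treating it as
"not found", is a choice made for this sketch only.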
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ordered mode, the current jbd aborts the journal if a file data buffer
has an error. But this behavior is unintended, and we found that it has
been adopted accidentally.
This patch undoes it and just calls printk() instead of aborting the
journal. Additionally, set AS_EIO into the address_space object of the
failed buffer which is submitted by journal_do_submit_data() so that
fsync() can get -EIO.
Missing error checks are also added so that errors on file data buffers
are reported to the user. The following buffers are targeted:
(a) the buffer which has already been written out by pdflush
(b) the buffer which has been unlocked before being scanned in the
t_locked_list loop
[akpm@linux-foundation.org: improve grammar in a printk]
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dx_root_limit() will never return 20, and I can't figure out what 20
stands for. This function has never changed since htree directory
indexing was merged.
Similar for dx_node_limit() and the magic 22.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After ext3-ordered files are truncated, some pages that can no longer be
accounted for may remain. These pages can be released when the system
runs very low on memory, so this is not a memory leak. But resource
management software and similar tools may not work correctly.
journal_unmap_buffer() may be unable to release the buffers, and the
pages to which they belong, because they are attached to a committing
transaction. To release such buffers and pages later,
journal_unmap_buffer() leaves the job to journal_commit_transaction().
(journal_unmap_buffer() marks the buffers 'BH_Freed' so that
journal_commit_transaction() can identify whether they can be released
or not.)
In the journalled mode and the writeback mode, jbd deals with only metadata
buffers. But in the ordered mode, jbd deals with metadata buffers and also
data buffers.
Actually, journal_commit_transaction() releases only the metadata buffers
whose release was requested by journal_unmap_buffer(), and also releases
the pages to which they belong if possible.
As a result, the data buffers whose release was requested by
journal_unmap_buffer() remain after a transaction commits, as do the
pages to which they belong.
These remaining pages no longer have a mapping. This is why pages that
cannot be accounted for may remain.
The metadata buffers marked 'BH_Freed' and the pages to which
they belong can be released at 'JBD: commit phase 7'.
Therefore, by applying the same code in 'JBD: commit phase 2' (where the
data buffers are handled), journal_commit_transaction() can also release
the data buffers marked 'BH_Freed' and the pages to which they belong.
As a result, all the buffers marked 'BH_Freed' can be released, and
all the pages to which these buffers belong can be released, in
journal_commit_transaction(). So the unaccounted pages no longer remain.
<<Excerpt of code at 'JBD: commit phase 7'>>
> spin_lock(&journal->j_list_lock);
> while (commit_transaction->t_forget) {
>     transaction_t *cp_transaction;
>     struct buffer_head *bh;
>
>     jh = commit_transaction->t_forget;
>...
>     if (buffer_freed(bh)) {
>     ^^^^^^^^^^^^^^^^^^^^^^^^
>         clear_buffer_freed(bh);
>         ^^^^^^^^^^^^^^^^^^^^^^^^
>         clear_buffer_jbddirty(bh);
>     }
>
>     if (buffer_jbddirty(bh)) {
>         JBUFFER_TRACE(jh, "add to new checkpointing trans");
>         __journal_insert_checkpoint(jh, commit_transaction);
>         JBUFFER_TRACE(jh, "refile for checkpoint writeback");
>         __journal_refile_buffer(jh);
>         jbd_unlock_bh_state(bh);
>     } else {
>         J_ASSERT_BH(bh, !buffer_dirty(bh));
>         ...
>         JBUFFER_TRACE(jh, "refile or unfile freed buffer");
>         __journal_refile_buffer(jh);
>         if (!jh->b_transaction) {
>             jbd_unlock_bh_state(bh);
>             /* needs a brelse */
>             journal_remove_journal_head(bh);
>             release_buffer_page(bh);
>             ^^^^^^^^^^^^^^^^^^^^^^^^
>         } else
>     }
****************************************************************
* Apply the code of "^^^^^^" lines into 'JBD: commit phase 2' *
****************************************************************
In the journal_commit_transaction() code, there is one extra message in the
series of jbd debug messages ("JBD: commit phase 2"). This patch fixes
that, too.
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While freeing indirect blocks we attach a journal head to the parent
buffer head, free the blocks, then journal the parent. If the indirect
block list is corrupted and points to the parent the journal head will be
detached when the block is cleared, causing an OOPS.
Check for that explicitly and handle it gracefully.
This patch fixes the third case (image hdb.20000057.nullderef.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Immediately above the change, in the ext3_free_data function, we call
ext3_clear_blocks to clear the indirect blocks in this parent block. If
one of those blocks happens to actually be the parent block it will clear
b_private / BH_JBD.
I did the check at the end rather than earlier as it seemed more elegant.
I don't think there should be much practical difference, although it is
possible the FS may not be quite so badly corrupted if we did it the other
way (and didn't clear the block at all). To be honest, I'm not convinced
there aren't other similar failure modes lurking in this code, although I
couldn't find any with a quick review.
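The failure mode and the "check at the end" ordering described above can
be modelled roughly as below. This is a sketch under stated assumptions:
toy_buf, toy_clear_blocks() and toy_free_data() are hypothetical names,
and the has_jh flag merely stands in for b_private / BH_JBD.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer: has_jh stands in for an attached journal head. */
struct toy_buf {
	unsigned long blocknr;
	int has_jh;
};

/* Clearing a block that is the parent itself detaches its journal head. */
static void toy_clear_blocks(struct toy_buf *parent,
			     const unsigned long *blocks, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (blocks[i] == parent->blocknr)
			parent->has_jh = 0;
}

/* Returns 0 on success, -1 if the block list pointed back at the parent. */
static int toy_free_data(struct toy_buf *parent,
			 const unsigned long *blocks, int n)
{
	assert(parent != NULL && blocks != NULL);

	parent->has_jh = 1;			/* attach a journal head */
	toy_clear_blocks(parent, blocks, n);	/* free the listed blocks */

	/* Check afterwards: journaling the parent now would be unsafe. */
	if (!parent->has_jh)
		return -1;
	return 0;				/* safe to journal the parent */
}
```

Checking before clearing would avoid touching the parent block at all,
which is the alternative the message above mentions.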
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A transient I/O error can corrupt inode data. Here is the scenario:
(1) update inode_A at the block_B
(2) pdflush writes out new inode_A to the filesystem, but this results
in a write I/O error; at this point, the BH_Uptodate flag of the buffer
for block_B is cleared and BH_Write_EIO is set
(3) create new inode_C which is located at block_B, and
__ext3_get_inode_loc() tries to read on-disk block_B because the
buffer is not uptodate
(4) if it can read on-disk block_B successfully, inode_A is
overwritten by old data
This patch makes __ext3_get_inode_loc() not read the inode block if the
buffer has the BH_Write_EIO flag set. In this case, the buffer should have
the latest information, so set the uptodate flag on the buffer (this
avoids the WARN_ON_ONCE() in mark_buffer_dirty()).
With this change, we would need to test the BH_Write_EIO flag for
error checking. Currently nobody checks write I/O errors on metadata
buffers, but this will be done in other patches I'm working on.
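The decision described above reduces to a small piece of flag logic. The
sketch below is a userspace model, not the ext3 code: toy_bh, its two
fields, and toy_get_inode_block() are hypothetical stand-ins for the
buffer head, BH_Uptodate / BH_Write_EIO, and the lookup path.

```c
#include <assert.h>
#include <stddef.h>

enum toy_action { TOY_USE_BUFFER, TOY_READ_DISK };

/* Toy buffer head carrying only the two flags that matter here. */
struct toy_bh {
	int uptodate;	/* stands in for BH_Uptodate */
	int write_eio;	/* stands in for BH_Write_EIO */
};

/*
 * After a failed write the in-memory copy is newer than the on-disk
 * one, so re-reading the block would bring back stale inode data.
 * Trust the buffer and mark it uptodate instead.
 */
static enum toy_action toy_get_inode_block(struct toy_bh *bh)
{
	assert(bh != NULL);

	if (bh->write_eio)
		bh->uptodate = 1;
	return bh->uptodate ? TOY_USE_BUFFER : TOY_READ_DISK;
}
```

A buffer that is simply not uptodate, with no recorded write error, still
takes the read path in this model.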
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext3_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext3_orphan_get function.
This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
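Why the cleanup loop could spin, and how rejecting such nodes at lookup
time ends it, can be illustrated with a toy list walk. The names
toy_inode, toy_orphan_get() and toy_orphan_cleanup() are hypothetical and
only loosely modelled on the functions named above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
	unsigned nlink;
	int truncatable;	/* nonzero if cleanup may truncate it */
	struct toy_inode *next_orphan;
};

/* Refuse entries that cleanup could neither delete nor truncate. */
static struct toy_inode *toy_orphan_get(struct toy_inode *ino)
{
	if (ino && ino->nlink > 0 && !ino->truncatable)
		return NULL;	/* bad orphan: report it instead of looping */
	return ino;
}

/* Returns the number of orphans processed, or -1 on a bad entry. */
static int toy_orphan_cleanup(struct toy_inode **head)
{
	int done = 0;

	assert(head != NULL);

	while (*head) {
		struct toy_inode *ino = toy_orphan_get(*head);

		if (!ino) {
			*head = NULL;	/* abandon the corrupt list */
			return -1;
		}
		/* Delete or truncate, then drop the entry from the list. */
		*head = ino->next_orphan;
		done++;
	}
	return done;
}
```

Without the check in toy_orphan_get(), an entry that is never removed
would leave the list head unchanged and the while loop would not end.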
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: printk warning fix]
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove the definitions of macros:
XATTR_TRUSTED_PREFIX
XATTR_USER_PREFIX
since they are defined in linux/xattr.h
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
journal_try_to_free_buffers() can race with a jbd transaction commit when
the latter is holding a buffer reference while waiting for the data
buffer to be flushed to disk. If the caller of journal_try_to_free_buffers()
tries hard to release the buffers, it will treat the failure as an
error and return it to its own caller. We have seen direct IO fail
due to this race. Some callers of releasepage() also expect the
buffer to be dropped when they pass the GFP_KERNEL mask to
releasepage()->journal_try_to_free_buffers().
With this patch, if the caller passes __GFP_WAIT and __GFP_FS to
indicate that this call may wait, then when try_to_free_buffers() fails
we wait for journal_commit_transaction() to finish committing the
current committing transaction and then try to free those buffers again.
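The retry policy described above can be sketched as follows. This is a
userspace model under stated assumptions: the TOY_GFP_* bits, toy_page and
every toy_* function are hypothetical stand-ins, and waiting for the
commit is modelled as simply dropping the pin.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_GFP_WAIT 0x1u	/* caller may sleep (stand-in for __GFP_WAIT) */
#define TOY_GFP_FS   0x2u	/* caller may enter the fs (stand-in for __GFP_FS) */

struct toy_page {
	int pinned_by_commit;	/* committing transaction holds a buffer ref */
};

/* Succeeds only when nothing else holds a reference. */
static int toy_try_free(struct toy_page *pg)
{
	return !pg->pinned_by_commit;
}

/* Stand-in for waiting until the running commit has finished. */
static void toy_wait_for_commit(struct toy_page *pg)
{
	pg->pinned_by_commit = 0;
}

/* Returns 1 if the buffers were freed, 0 otherwise. */
static int toy_release(struct toy_page *pg, unsigned gfp)
{
	int ret;

	assert(pg != NULL);

	ret = toy_try_free(pg);
	/* Retry only if the caller said it can both wait and enter the fs. */
	if (!ret && (gfp & TOY_GFP_WAIT) && (gfp & TOY_GFP_FS)) {
		toy_wait_for_commit(pg);
		ret = toy_try_free(pg);
	}
	return ret;
}
```

Callers that cannot sleep still get the old best-effort behaviour, since
the wait is skipped unless both bits are set.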
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- remove unnecessary code in free_rb_tree_fname
- rename create_dir_info to ext3_htree_create_dir_info
since it and ext3_htree_free_dir_info are a pair
- replace kmalloc with kzalloc in ext3_htree_create_dir_info
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make revocation cache destruction safe to call if initialisation fails
partially or entirely. This allows it to be used to clean up in the case
of initialisation failure, simplifying that code slightly.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The revocation table initialisation/destruction code is repeated for each
of the two revocation tables stored in the journal. Refactoring the
duplicated code into functions is tidier, simplifies the logic in
initialisation in particular, and slightly reduces the code size.
There should not be any functional change.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an error occurs during jbd cache initialisation it is possible for the
journal_head_cache to be NULL when journal_destroy_journal_head_cache is
called. Replace the J_ASSERT with an if block to handle the situation
correctly.
Note that even with this fix things will break badly if jbd is statically
compiled in and cache initialisation fails.
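Replacing an assertion with a NULL check makes the destroy path safe to
call on a failed initialisation. A minimal userspace sketch of that
pattern follows; toy_cache, toy_head_cache and the two functions are
hypothetical, with malloc()/free() standing in for the slab cache calls.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cache { int objsize; };

static struct toy_cache *toy_head_cache;	/* NULL until init succeeds */
static int toy_destroy_calls;			/* counts real teardowns */

/* Returns 0 on success, -1 on (simulated or real) failure. */
static int toy_cache_init(int simulate_failure)
{
	assert(toy_head_cache == NULL);		/* no double init */

	if (simulate_failure)
		return -1;
	toy_head_cache = malloc(sizeof(*toy_head_cache));
	return toy_head_cache ? 0 : -1;
}

/* Safe to call after a failed or partial init, and safe to call twice. */
static void toy_cache_destroy(void)
{
	if (toy_head_cache) {
		free(toy_head_cache);
		toy_head_cache = NULL;
		toy_destroy_calls++;
	}
}
```

With the pointer reset to NULL after teardown, an error path can call the
destroy function unconditionally.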
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should not allow user to change quota mount options when quota is just
suspended. It would make mount options and internal quota state inconsistent.
Also we should not allow user to change quota format when quota is turned on.
On the other hand we can just silently ignore when some option is set to the
value it already has (mount does this on remount).
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In journal=data mode, it is not enough to do write_inode_now as done in
vfs_quota_on() to write all data to their final location (which is needed for
quota_read to work correctly). Calling journal_flush() does its job.
Reported-by: Nick <gentuu@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove the definitions of macros:
XATTR_TRUSTED_PREFIX
XATTR_USER_PREFIX
since they are defined in linux/xattr.h
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the !NO_TRUNCATE code, which required manually
editing the source before it could be used anyway.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/exec.c used to need mman.h pagemap.h swap.h and rmap.h when it did
mm-ish stuff in install_arg_page(); but no need for them after 2.6.22.
[akpm@linux-foundation.org: unbreak arm]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the private BE16/BE32/BE64 macros with direct calls to
get_unaligned_be16/32/64.
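For readers unfamiliar with these helpers, the value they compute can be
sketched in portable userspace C: assemble a big-endian integer byte by
byte so that the pointer needs no particular alignment. The toy_get_be*
names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 16-bit value from a possibly unaligned address. */
static uint16_t toy_get_be16(const void *p)
{
	const uint8_t *b = p;

	assert(p);
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Read a big-endian 32-bit value from a possibly unaligned address. */
static uint32_t toy_get_be32(const void *p)
{
	const uint8_t *b = p;

	assert(p);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves. */
static uint64_t toy_get_be64(const void *p)
{
	const uint8_t *b = p;

	assert(p);
	return ((uint64_t)toy_get_be32(b) << 32) | toy_get_be32(b + 4);
}
```

Byte-wise assembly gives the same result on little-endian and big-endian
hosts, which is why no byte-swap macro is needed in this sketch.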
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the following compile error caused by commit f9247273cb
("UFS: add const to parser token table"):
CC fs/nfs/nfsroot.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/fs/nfs/nfsroot.c:130: error: tokens causes a section type conflict
make[3]: *** [fs/nfs/nfsroot.o] Error 1
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a "const" to the parser token table. I've done an
allmodconfig build to see if this produces any warnings/failures and the
patch includes a fix for the only warning that was produced.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Alexander Viro <aviro@redhat.com>
Acked-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ioctls AUTOFS_IOC_TOGGLEREGHOST and AUTOFS_IOC_ASKREGHOST were added
several years ago but what they were intended for has never been
implemented (as far as I'm aware no one uses them) so remove them.
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch reorganizes the checking for and waiting on active expires and
eliminates redundant checks.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apologies, somehow I seem to have sent an outdated version of this
patch. Here is an additional patch that brings the patch up to date.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For direct and offset type mounts that are covered by another mount we
cannot check the AUTOFS_INF_EXPIRING flag during a path walk which leads
to lookups walking into an expiring mount while it is being expired.
For example, for the direct multi-mount map entry with a couple of
offsets:
/race/mm1 / <server1>:/<path1>
/om1 <server2>:/<path2>
/om2 <server1>:/<path3>
an autofs trigger mount is mounted on /race/mm1 and when accessed it is
over mounted and trigger mounts made for /race/mm1/om1 and /race/mm1/om2.
So it isn't possible for path walks to see the expiring flag at all and
they happily walk into the file system while it is expiring.
When expiring these mounts follow_down() must stop at the autofs mount and
all processes must block in the ->follow_link() method (except the daemon)
until the expire is complete. This is done by decrementing the d_mounted
field of the autofs trigger mount root dentry until the expire is
completed. In ->follow_link() all processes wait on the expire and the
mount following is completed for the daemon until the expire is complete.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The selection of a dentry for expiration and the setting of the
AUTOFS_INF_EXPIRING flag isn't done atomically which can lead to lookups
walking into an expiring mount.
What happens is that an expire is initiated by the daemon and a dentry is
selected for expire but, since there is no lock held between the selection
and setting of the expiring flag, a process may find the flag clear and
continue walking into the mount tree at the same time the daemon attempts
to expire it.
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two cases for which a dentry that has a pending mount request
does not wait for completion. One is via autofs4_revalidate() and the
other via autofs4_follow_link().
In revalidate, after the mount point directory is created, but before the
mount is done, the check in try_to_fill_dentry() can fail to send the
dentry to the wait queue since the dentry is positive and the lookup flags
may contain only LOOKUP_FOLLOW. Although we don't trigger a mount for the
LOOKUP_FOLLOW flag, if there's one pending we might as well wait and use
the mounted dentry for the lookup.
In autofs4_follow_link() the dentry is not checked to see if it is pending
so it may fail to call try_to_fill_dentry() and not wait for mount
completion.
A dentry that is pending must always be sent to the wait queue.
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mount triggering functionality of readdir and related functions is no
longer used (and is quite broken as well). The unused portions have been
removed.
Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have been seeing mount requests coming to the automount daemon for
keys of the form "<map key>/<non key directory>" which are lookups for
invalid map keys. But we can check for this in the kernel module and
return a fail immediately, without having to send a request to the daemon.
It is possible to recognise these requests are invalid based on whether
the request dentry is negative and its relation to the autofs file system
root.
For example, given the indirect multi-mount map entry:
idm1 \
/mm1 <server>:/<path1>
/mm2 <server>:/<path2>
For a request to mount idm1, IS_ROOT((idm1)->d_parent) will always be
true and the dentry may be negative. But directories idm1/mm1 and
idm1/mm2 will always be created as part of the mount request for idm1. So
any mount request within idm1 itself must have a positive dentry otherwise
the map key is invalid.
In version 4 these multi-mount entries are all mounted and umounted as a
single request and in version 5 the directories idm1/mm1 and idm1/mm2 are
created and an autofs fs mounted on them to act as a mount trigger so the
above is also true.
This also holds true for the autofs version 4 pseudo direct mount feature.
When this feature is used without the "--ghost" option automount(8) will
create internal submounts as we go down the map key paths which are
essentially normal indirect mounts for which the above holds. If the
"--ghost" option is given the directories for map keys are created at
daemon startup so valid map entries correspond to positive dentries in the
autofs fs.
autofs version 5 direct mount maps are similar except that the IS_ROOT
check is not needed. This has been addressed in a previous patch titled
"autofs4 - detect invalid direct mount requests".
For example, given the direct multi-mount map entry:
/test/dm1 \
/mm1 <server>:/<path1>
/mm2 <server>:/<path2>
An autofs fs is mounted on /test/dm1 as a trigger mount and when a mount
is triggered for /test/dm1, the multi-mount offset directories
/test/dm1/mm1 and /test/dm1/mm2 are created and an autofs fs is mounted on
them to act as mount triggers. So valid direct mount requests must always
have a positive dentry if they correspond to a valid map entry.
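The indirect-mount rule above (a negative dentry is acceptable only
directly under the autofs root) can be written as a small predicate. The
sketch below uses a hypothetical toy_dentry with an explicit positive
flag; toy_is_root() imitates the idea of IS_ROOT but is not the kernel
macro.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dentry {
	struct toy_dentry *parent;	/* the root is its own parent */
	int positive;			/* nonzero once backed by an inode */
};

static int toy_is_root(const struct toy_dentry *d)
{
	return d->parent == d;
}

/*
 * A request is plausible if the dentry is positive, or if it is a
 * negative dentry whose parent is the file system root (a top-level
 * map key that has not been mounted yet). Anything else is a lookup
 * for an invalid key and can fail without asking the daemon.
 */
static int toy_request_valid(const struct toy_dentry *d)
{
	assert(d != NULL && d->parent != NULL);

	if (d->positive)
		return 1;
	return toy_is_root(d->parent);
}
```

For the direct mount case described above, the parent test would be
dropped and only the positive check kept.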
Signed-off-by: Ian Kent <raven@themaw.net>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
autofs v5 direct and offset mounts within an autofs filesystem are
triggered by existing autofs trigger mounts so the mount point dentry must
be positive. If the mount point dentry is negative then the trigger
doesn't exist so we can return fail immediately.
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an autofs mount becomes catatonic before autofs4_wait_release() is
called the wait queue counter will not be decremented down to zero and the
entry will never be freed. There are also races decrementing the wait
counter in the wait release function. To deal with this the counter needs
to be updated while holding the wait queue mutex and waiters need to be
woken up unconditionally when the wait is removed from the queue to ensure
we eventually free the wait.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>