Eliminate the poor, misunderstood "oflags" option from cifs_new_fileinfo.
The callers mostly pass in the filp->f_flags here.
That's not correct however since we're checking that value for
the presence of FMODE_READ. Luckily that only affects how the f_list is
ordered. What it really wants here is the file->f_mode. Just use that
field from the filp to determine it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The way flags are passed and converted for cifs_posix_open is rather
non-sensical. Some callers call cifs_posix_convert_flags on the flags
before they pass them to cifs_posix_open, whereas some don't. Two flag
conversion steps is just confusing though.
Change the function instead to clearly expect input in f_flags format,
and fix the callers to pass that in. Then, have cifs_posix_open call
cifs_convert_posix_flags to do the conversion. Move cifs_posix_open to
file.c as well so we can keep cifs_convert_posix_flags as a static
function.
Fix it also to not ignore O_CREAT, O_EXCL and O_TRUNC, and instead have
cifs_reopen_file mask those bits off before calling cifs_posix_open.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: eliminate cifs_posix_open_inode_helper
This function is redundant. The only thing it does is set the canCache
flags, but those get set in cifs_new_fileinfo anyway.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Filesystems aren't really supposed to do anything with a vfsmount. It's
considered a layering violation since vfsmounts are entirely managed at
the VFS layer.
CIFS currently keeps an active reference to a vfsmount in order to
prevent the superblock vanishing before an oplock break has completed.
What we really want to do instead is to keep sb->s_active high until the
oplock break has completed. This patch borrows the scheme that NFS uses
for handling sillyrenames.
An atomic_t is added to the cifs_sb_info. When it transitions from 0 to
1, an extra reference to the superblock is taken (by bumping the
s_active value). When it transitions from 1 to 0, that reference is
dropped and a the superblock teardown may proceed if there are no more
references to it.
Also, the vfsmount pointer is removed from cifsFileInfo and from
cifs_new_fileinfo, and some bogus forward declarations are removed from
cifsfs.h.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifsFileInfo is a bit problematic. It contains a reference back to the
struct file itself. This makes it difficult for a cifsFileInfo to exist
without a corresponding struct file.
It would be better instead of the cifsFileInfo just held info pertaining
to the open file on the server instead without any back refrences to the
struct file. This would allow it to exist after the filp to which it was
originally attached was closed.
Much of the use of the file pointer in this struct is to get at the
dentry. Begin divorcing the cifsFileInfo from the struct file by
keeping a reference to the dentry. Since the dentry will have a
reference to the inode, we can eliminate the "pInode" field too and
convert the igrab/iput to dget/dput.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we implement multiuser mounts, we'll need to filter filehandles
by fsuid. Add a flag for multiuser mounts and code to filter by
fsuid when it's set.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifsFileInfo needs a pointer to a tcon, but it doesn't currently hold a
reference to it. Change it to keep a pointer to a tcon_link instead and
hold a reference to it.
That will keep the tcon from being freed until the file is closed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Eventually, we'll need to track the use of tcons on a per-sb basis, so that
we know when it's ok to tear them down. Begin this conversion by adding a
new "tcon_link" struct and accessors that get it. For now, the core data
structures are untouched -- cifs_sb still just points to a single tcon and
the pointers are just cast to deal with the accessor functions. A later
patch will flesh this out.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Get a reference to the file early so we can eventually base the decision
about signing on the correct tcon. If that doesn't work for some reason,
then fall back to generic_writepages. That's just as likely to fail, but
it simplifies the error handling.
In truth, I'm not sure how that could occur anyway, so maybe a NULL
open_file here ought to be a BUG()?
After that, we drop the reference to the open_file and then we re-get
one prior to each WriteAndX call. This helps ensure that the filehandle
isn't held open any longer than necessary and that open files are
reclaimed prior to each write call.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
To minimize calls to cifs_sb_tcon and to allow for a clear error path if
a tcon can't be acquired.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we convert cifs to do multiple sessions per mount, we'll need more
than one tcon per superblock. At that point "cifs_sb->tcon" will make
no sense. Add a new accessor function that gets a tcon given a cifs_sb.
For now, it just returns cifs_sb->tcon. Later it'll do more.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead,
it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value
obtained from the Server to set the inode's clientCanCacheAll or
clientCanCacheRead flags. Fix this by passing oplock from the callers to
cifs_new_fileinfo().
This change dates back to commit a6ce4932 (2.6.30-rc3). So, all the affected
versions will need this fix. Please Cc stable once reviewed and accepted.
Cc: Stable <stable@kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
... and avoid implicit casting from a signed type. Also, pass oplock by value
instead by reference as we don't intend to change the value in
cifs_open_inode_helper().
Thanks to Jeff Layton for spotting this.
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs has a lot of complicated functions that have to clean up things on
error, but some of them don't have all of the cleanup code
well-consolidated. Clean up and consolidate error handling in several
functions.
This is in preparation of later patches that will need to put references
to the tcon link container.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
Read pages from a FS-Cache data storage object into a CIFS inode.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Store pages from an CIFS inode into the data storage object associated with
that inode.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Takes care of invalidation and release of FS-Cache marked pages and also
invalidation of the FsCache page flag when the inode is removed.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Define inode-level data storage objects (managed by cifsInodeInfo structs).
Each inode-level object is created in a super-block level object and is itself
a data storage object in to which pages from the inode are stored.
The inode object is keyed by UniqueId. The coherency data being used is
LastWriteTime, LastChangeTime and end of file reported by the server.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Workqueue can now handle high concurrency. Use system_nrt_wq
instead of slow-work.
* Updated is_valid_oplock_break() to not call cifs_oplock_break_put()
as advised by Steve French. It might cause deadlock. Instead,
reference is increased after queueing succeeded and
cifs_oplock_break() briefly grabs GlobalSMBSeslock before putting
the cfile to make sure it doesn't put before the matching get is
finished.
* Anton Blanchard reported that cifs conversion was using now gone
system_single_wq. Use system_nrt_wq which provides non-reentrance
guarantee which is enough and much better.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steve French <sfrench@samba.org>
Cc: Anton Blanchard <anton@samba.org>
It's currently possible for cifs_open to fail after it has already
called cifs_new_fileinfo. In that situation, the new fileinfo will be
leaked as the caller doesn't call fput. That in turn leads to a busy
inodes after umount problem since the fileinfo holds an extra inode
reference now. Shuffle cifs_open around a bit so that it only calls
cifs_new_fileinfo if it's going to succeed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
...and ensure that we propagate the error back to avoid any surprises.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
...which takes a ton of unneeded arguments and does a lot more pointer
dereferencing than is really needed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
The current scheme of sticking open files on a list and assuming that
cifs_open will scoop them off of it is broken and leads to "Busy
inodes after umount..." errors at unmount time.
The problem is that there is no guarantee that cifs_open will always
be called after a ->lookup or ->create operation. If there are
permissions or other problems, then it's quite likely that it *won't*
be called.
Fix this by fully instantiating the filp whenever the file is created
and pass that filp back to the VFS. If there is a problem, the VFS
can clean up the references.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Having cifs_posix_open call cifs_new_fileinfo is problematic and
inconsistent with how "regular" opens work. It's also buggy as
cifs_reopen_file calls this function on a reconnect, which creates a new
struct cifsFileInfo that just gets leaked.
Push it out into the callers. This also allows us to get rid of the
"mnt" arg to cifs_posix_open.
Finally, in the event that a cifsFileInfo isn't or can't be created, we
always want to close the filehandle out on the server as the client
won't have a record of the filehandle and can't actually use it. Make
sure that CIFSSMBClose is called in those cases.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Commit 315e995c63 is causing OOM kills
when stress-testing a CIFS filesystem. The VFS readpages operation takes
a page reference. The older code just handed this reference off to the
page cache, but the new code takes an extra one. The simplest fix is to
put the new reference after add_to_page_cache_lru.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
..a left over from the commit 3321b791b2.
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
On lease break we were breaking to readonly leases always
even if write requested. Also removed experimental
ifdef around setlease code
Signed-off-by: Steve French <sfrench@us.ibm.com>
While creating a file on a server which supports unix extensions
such as Samba, if a file is being created which does not supply
nameidata (i.e. nd is null), cifs client can oops when calling
cifs_posix_open.
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When process does fork() private_data of files with lock list stays the same
for file descriptors of the parent and of the child. While finishing the child closes
files and deletes locks from the list even if unlocking fails. When the child process
finishes the parent doesn't have lock in lock list and can't unlock previously before
fork() locked region after the child process finished.
This patch provides behaviour to save locks in lock list if unlocking fails.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Neaten cERROR and cFYI macros, reduce text space
~2.5K
Convert '__FILE__ ": " fmt' to '"%s: " fmt', __FILE__' to save text space
Surround macros with do {} while
Add parentheses to macros
Make statement expression macro from macro with assign
Remove now unnecessary parentheses from cFYI and cERROR uses
defconfig with CIFS support old
$ size fs/cifs/built-in.o
text data bss dec hex filename
156012 1760 148 157920 268e0 fs/cifs/built-in.o
defconfig with CIFS support old
$ size fs/cifs/built-in.o
text data bss dec hex filename
153508 1760 148 155416 25f18 fs/cifs/built-in.o
allyesconfig old:
$ size fs/cifs/built-in.o
text data bss dec hex filename
309138 3864 74824 387826 5eaf2 fs/cifs/built-in.o
allyesconfig new
$ size fs/cifs/built-in.o
text data bss dec hex filename
305655 3864 74824 384343 5dd57 fs/cifs/built-in.o
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
add_to_page_cache_lru is exported, so it should be used. Benefits over
using a private pagevec: neater code, 128 bytes fewer stack used, percpu
lru ordering is preserved, and finally don't need to flush pagevec
before returning so batching may be shared with other LRU insertions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
not overwriting file_lock structure after GET_LK
cifs: Fix a kernel BUG with remote OS/2 server (try #3)
[CIFS] initialize nbytes at the beginning of CIFSSMBWrite()
[CIFS] Add mmap for direct, nobrl cifs mount types
If we have preventing lock, cifs should overwrite file_lock structure
with info about preventing lock. If we haven't preventing lock, cifs
should leave it unchanged except for the lock type (change it to F_UNLCK).
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
...to allow updating inode attributes on an existing inode by
filehandle. Change mmap and llseek codepaths to use that
instead of cifs_revalidate_dentry since they have a filehandle
readily available.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_revalidate is renamed to cifs_revalidate_dentry as a later patch
will add a by-filehandle variant.
Add a new "invalid_mapping" flag to the cifsInodeInfo that indicates
that the pagecache is considered invalid. Add a new routine to check
inode attributes whenever they're updated and set that flag if the inode
has changed on the server.
cifs_revalidate_dentry is then changed to just update the attrcache if
needed and then to zap the pagecache if it's not valid.
There are some other behavior changes in here as well. Open files are
now allowed to have their caches invalidated. I see no reason why we'd
want to keep stale data around just because a file is open. Also,
cifs_revalidate_cache uses the server_eof for revalidating the file
size since that should more closely match the size of the file on the
server.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment. This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.
This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag. To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.
This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition. Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set. The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.
We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.
Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op. We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
The patch to remove cifs_init_private introduced a locking imbalance. It
didn't remove the leftover list addition code and the unlocking in that
function. cifs_new_fileinfo does the list addition now, so there should
be no need to do it outside of that function.
pCifsInode will never be NULL, so we don't need to check for that. This
patch also gets rid of the ugly locking and unlocking across function
calls.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...it does the same thing as cifs_fill_fileinfo, but doesn't handle the
flist ordering correctly. Also rename cifs_fill_fileinfo to a more
descriptive name and have it take an open flags arg instead of just a
write_only flag. That makes the logic in the callers a little simpler.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This is the fourth respin of the patch to convert oplock breaks to
use the slow_work facility.
A customer of ours was testing a backport of one of the earlier
patchsets, and hit a "Busy inodes after umount..." problem. An oplock
break job had raced with a umount, and the superblock got torn down and
its memory reused. When the oplock break job tried to dereference the
inode->i_sb, the kernel oopsed.
This patchset has the oplock break job hold an inode and vfsmount
reference until the oplock break completes. With this, there should be
no need to take a tcon reference (the vfsmount implicitly holds one
already).
Currently, when an oplock break comes in there's a chance that the
oplock break job won't occur if the allocation of the oplock_q_entry
fails. There are also some rather nasty races in the allocation and
handling these structs.
Rather than allocating oplock queue entries when an oplock break comes
in, add a few extra fields to the cifsFileInfo struct. Get rid of the
dedicated cifs_oplock_thread as well and queue the oplock break job to
the slow_work thread pool.
This approach also has the advantage that the oplock break jobs can
potentially run in parallel rather than be serialized like they are
today.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's possible that this struct will outlive the filp to which it is
attached. If it does and it needs to do some work on the inode, then
it'll need a reference.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_posix_open takes a "poplock" argument that's intended to be used in
the actual posix open call to set the "Flags" field. It ignores this
value however and declares an "oplock" parameter on the stack that it
passes uninitialized to the CIFSPOSIXOpen function. Not only does this
mean that the oplock request flags are bogus, but the result that's
expected to be in that variable is unchanged.
Fix this, and also clean up the type of the oplock parameter used. Since
it's expected to be __u32, we should use that everywhere and not
implicitly cast it from a signed type.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Currently, cifs_close() tries to wait until all I/O is complete and then
frees the file private data. If I/O does not completely in a reasonable
amount of time it frees the structure anyway, leaving a potential use-
after-free situation.
This patch changes the wrtPending counter to a complete reference count and
lets the last user free the structure.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: rename CIFSSMBUnixSetInfo to CIFSSMBUnixSetPathInfo
...in preparation of adding a SET_FILE_INFO variant.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes a regression caused by commit a6ce4932fb
When this lock was converted to a mutex, the locks were turned into
unlocks and vice-versa.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
FreeXid() along with freeing Xid does add a cifsFYI debug message that
prints rc (return code) as well. In some code paths where we set/return
error code after calling FreeXid(), incorrect error code is being
printed when cifsFYI is enabled.
This could be misleading in few cases. For eg.
In cifs_open() if cifs_fill_filedata() returns a valid pointer to
cifsFileInfo, FreeXid() prints rc=-13 whereas 0 is actually being
returned. Fix this by setting rc before calling FreeXid().
Basically convert
FreeXid(xid); rc = -ERR;
return -ERR; => FreeXid(xid);
return rc;
[Note that Christoph would like to replace the GetXid/FreeXid
calls, which are primarily used for debugging. This seems
like a good longer term goal, but although there is an
alternative tracing facility, there are no examples yet
available that I know of that we can use (yet) to
convert this cifs function entry/exit logging, and for
creating an identifier that we can use to correlate
all dmesg log entries for a particular vfs operation
(ie identify all log entries for a particular vfs
request to cifs: e.g. a particular close or read or write
or byte range lock call ... and just using the thread id
is harder). Eventually when a replacement
for this is available (e.g. when NFS switches over and various
samples to look at in other file systems) we can remove the
GetXid/FreeXid macro but in the meantime multiple people
use this run time configurable logging all the time
for debugging, and Suresh's patch fixes a problem
which made it harder to notice some low
memory problems in the log so it is worthwhile
to fix this problem until a better logging
approach is able to be used]
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...and just have the function call le64_to_cpu.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Posix open code was not properly adding the file to the
list of open files. Fix allocating cifsFileInfo
more than once, and adding twice to flist and tlist.
Also fix mode setting to be done in one place in these
paths.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luca Tettamanti <kronos.it@gmail.com>
Remove adding open file entry twice to lists in the file
Do not fill file info twice in case of posix opens and creates
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch by utilizing lookup intents, and thus removing a network
roundtrip in the open path, improves performance dramatically on
open (30% or more) to Samba and other servers which support the
cifs posix extensions
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This is the fourth version of this patch:
The first three generated a compiler warning asking for explicit curly
braces.
The first two didn't handle update the size correctly when writes that
didn't start at the eof were done.
The first patch also didn't update the size correctly when it explicitly
set via truncate().
This patch adds code to track the client's current understanding of the
size of the file on the server separate from the i_size, and then to use
this info to semi-intelligently set the timeout for writes past the EOF.
This helps prevent timeouts when trying to write large, sparse files on
windows servers.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Samba server (version 3.3.1 and earlier, and 3.2.8 and earlier) incorrectly
required the O_CREAT flag on posix open (even when a file was not being
created). This disables posix open (create is still ok) after the first
attempt returns EINVAL (and logs an error, once, recommending that they
update their server).
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If the network connection crashes, and we have to reopen files, preferentially
use the newer cifs posix open protocol operation if the server supports it.
Signed-off-by: Steve French <sfrench@us.ibm.com>
If this mount option is set, when an application does an
fsync call then the cifs client does not send an SMB Flush
to the server (to force the server to write all dirty data
for this file immediately to disk), although cifs still sends
all dirty (cached) file data to the server and waits for the
server to respond to the write write. Since SMB Flush can be
very slow, and some servers may be reliable enough (to risk
delaying slightly flushing the data to disk on the server),
turning on this option may be useful to improve performance for
applications that fsync too much, at a small risk of server
crash. If this mount option is not set, by default cifs will
send an SMB flush request (and wait for a response) on every
fsync call.
Signed-off-by: Steve French <sfrench@us.ibm.com>
In contrast to the now-obsolete smbfs, cifs does not send SMB_COM_FLUSH
in response to an explicit fsync(2) to guarantee that all volatile data
is written to stable storage on the server side, provided the server
honors the request (which, to my knowledge, is true for Windows and
Samba with 'strict sync' enabled).
This patch modifies the cifs_fsync implementation to restore the
fsync-behavior of smbfs by triggering SMB_COM_FLUSH after sending
outstanding data on the client side to the server.
Signed-off-by: Horst Reiterer <horst.reiterer@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some applications/subsystems require mandatory byte range locks
(as is used for Windows/DOS/OS2 etc). Sending advisory (posix style)
byte range lock requests (instead of mandatory byte range locks) can
lead to problems for these applications (which expect that other
clients be prevented from writing to portions of the file which
they have locked and are updating). This mount option allows
mounting cifs with the new mount option "forcemand" (or
"forcemandatorylock") in order to have the cifs client use mandatory
byte range locks (ie SMB/CIFS/Windows/NTFS style locks) rather than
posix byte range lock requests, even if the server would support
posix byte range lock requests. This has no effect if the server
does not support the CIFS Unix Extensions (since posix style locks
require support for the CIFS Unix Extensions), but for mounts
to Samba servers this can be helpful for Wine and applications
that require mandatory byte range locks.
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Alexander Bokovoy <ab@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.
It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.
CC: Nick Piggin <npiggin@suse.de>
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If a connection with open file handles has gone down
and come back up and reconnected without reopening
the file handle yet, do not attempt to send an SMB close
request for this handle in cifs_close. We were
checking for the connection being invalid in cifs_close
but since the connection may have been reconnected
we also need to check whether the file handle
was marked invalid (otherwise we could close the
wrong file handle by accident).
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes a data corruption under heavy stress in which pages could be left
dirty after all open instances of a inode have been closed.
In order to write contiguous pages whenever possible, cifs_writepages()
asks pagevec_lookup_tag() for more pages than it may write at one time.
Normally, it then resets index just past the last page written before calling
pagevec_lookup_tag() again.
If cifs_writepages() can't write the first page returned, it wasn't resetting
index, and the next call to pagevec_lookup_tag() resulted in skipping all of
the pages it previously returned, even though cifs_writepages() did nothing
with them. This can result in data loss when the file descriptor is about
to be closed.
This patch ensures that index gets set back to the next returned page so
that none get skipped.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Shirish S Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In preparation for Jeff's big umount/mount fixes to remove the possibility of
various races in cifs mount and linked list handling of sessions, sockets and
tree connections, this patch cleans up some repetitive code in cifs_mount,
and addresses a problem with ses->status and tcon->tidStatus in which we
were overloading the "need_reconnect" state with other status in that
field. So the "need_reconnect" flag has been broken out from those
two state fields (need reconnect was not mutually exclusive from some of the
other possible tid and ses states). In addition, a few exit cases in
cifs_mount were cleaned up, and a problem with a tcon flag (for lease support)
was not being set consistently for the 2nd mount of the same share
CC: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
smb_send2 exit logic was strange, and with the previous change
could cause us to fail large
smb writes when all of the smb was not sent as one chunk.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.
The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.
This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.
[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cifs: Convert cifs to new aops.
This patch is based on the one originally posted by Nick Piggin. His
patch was very close, but had a couple of small bugs. Nick's original
comments follow:
This is another relatively naive conversion. Always do the read upfront
when the page is not uptodate (unless we're in the writethrough path).
Fix an uninitialized data exposure where SetPageUptodate was called
before the page was uptodate.
SetPageUptodate and switch to writeback mode in the case that the full
page was dirtied.
Acked-by: Shaggy <shaggy@austin.ibm.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When the CIFS client goes to write out pages, it needs to pick a
filehandle to write to. find_writeable_file however just picks the
first filehandle that it finds. This can cause problems when a lock
is issued against a particular filehandle and we pick a different
filehandle to write to.
This patch tries to avert this situation by having find_writable_file
prefer filehandles that have a pid that matches the current task.
This seems to fix lock test 11 from the connectathon test suite when
run against a windows server.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The direct I/O write codepath for CIFS is done through
cifs_user_write(). That function does not currently call
generic_write_checks() so the file position isn't being properly set
when the file is opened with O_APPEND. It's also not doing the other
"normal" checks that should be done for a write call.
The problem is currently that when you open a file with O_APPEND on a
mount with the directio mount option, the file position is set to the
beginning of the file. This makes any subsequent writes clobber the data
in the file starting at the beginning.
This seems to fix the problem in cursory testing. It is, however
important to note that NFS disallows the combination of
(O_DIRECT|O_APPEND). If my understanding is correct, the concern is
races with multiple clients appending to a file clobbering each others'
data. Since the write model for CIFS and NFS is pretty similar in this
regard, CIFS is probably subject to the same sort of races. What's
unclear to me is why this is a particular problem with O_DIRECT and not
with buffered writes...
Regardless, disallowing O_APPEND on an entire mount is probably not
reasonable, so we'll probably just have to deal with it and reevaluate
this flag combination when we get proper support for O_DIRECT. In the
meantime this patch at least fixes the existing problem.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] list entry can not return null
turn cifs_setattr into a multiplexor that calls the correct function
move file time and dos attribute setting logic into new function
spin off cifs_setattr with unix extensions to its own function
[CIFS] Code cleanup in old sessionsetup code
[CIFS] cifs_mkdir and cifs_create should respect the setgid bit on parent dir
Rename CIFSSMBSetFileTimes to CIFSSMBSetFileInfo and add PID arg
change CIFSSMBSetTimes to CIFSSMBSetPathInfo
[CIFS] fix trailing whitespace
bundle up Unix SET_PATH_INFO args into a struct and change name
Fix missing braces in cifs_revalidate()
remove locking around tcpSesAllocCount atomic variable
[CIFS] properly account for new user= field in SPNEGO upcall string allocation
[CIFS] remove level of indentation from decode_negTokenInit
[CIFS] cifs send2 not retrying enough in some cases on full socket
[CIFS] oid should also be checked against class in cifs asn
We'd like to be able to use the unix SET_PATH_INFO_BASIC args to set
file times as well, but that makes the argument list rather long. Bundle
up the args for unix SET_PATH_INFO call into a struct. For now, we don't
actually use the times fields anywhere. That will be done in a follow-on
patch.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cifs_convert_flags returns 0x20197 in the default case. It's not
immediately evident where that number comes from, so change it
to be an or'ed set of flags. The compiler will boil it down anyway.
(Thanks to Guenter Kukkukk for clarifying the flags).
Signed-off-by: Steve French <sfrench@us.ibm.com>
Shirish Pargaonkar noted:
With cifsacl mount option, when a file is created on the Windows server,
exclusive oplock is broken right away because the get cifs acl code
again opens the file to obtain security descriptor.
The client does not have the newly created file handle or inode in any
of its lists yet so it does not respond to oplock break and server waits for
its duration and then responds to the second open. This slows down file
creation signficantly. The fix is to pass the file descriptor to the get
cifsacl code wherever available so that get cifs acl code does not send
second open (NT Create ANDX) and oplock is not broken.
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Christoph had noticed too many ifdefs in the CIFS code making it
hard to read. This patch removes about a quarter of them from
the C files in cifs by improving a few key ifdefs in the .h files.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fix RedHat bug 329431
The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.
Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.
The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.
CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.
This patch does the following:
1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.
2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.
3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.
4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.
Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.
Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
request
In SendReceive() function in transport.c - it memcpy's
message payload into a buffer passed via out_buf param. The function
assumes that all buffers are of size (CIFSMaxBufSize +
MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller
(MAX_CIFS_SMALL_BUFFER_SIZE) buffers. There are eight callers
(SMB worker functions) which are primarily affected by this change:
TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes,
Lock and PosixLock
CC: Dave Kleikamp <shaggy@austin.ibm.com>
CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When find_writable_file is racing with close and the session
to the server goes down, Shaggy noticed that there was a
chance that an open file in the list of files off the inode
could have been freed by close since cifs_reconnect can
block (the spinlock thus not held). This means that
we have to start over at the beginning of the list in some
cases.
There is a 2nd change that needs to be made later
(pointed out by Jeremy Allison and Shaggy) in order to
prevent cifs_close ever freeing the cifs per file info
when a write is pending. Although we delay close from
freeing this memory for sufficiently long for all known
cases, ultimately on a very, very slow write
overlapping a close pending we need to allow close to return
(without freeing the cifs file info) and defer freeing the
memory to be the responsibility of the (sloooow) write
thread (presumably have to look at every place wrtPending
is decremented - and add a flag for deferred free for
after wrtPending goes to zero).
Acked-by: Shaggy <shaggy@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Harmless since it only protected turning off caching for the
inode, but cleaner to lock around this in case we have a close
racing with open.
Signed-off-by: Shaggy <shaggy@us.ibm.com>
CC: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
There was a case in which find_writable_file was not waiting long enough
under heavy stress when writepages was racing with close of the file
handle being used by the write.
Signed-off-by: Steve French <sfrench@us.ibm.com>
On a mount without posix extensions enabled, when an unlock request is
made, the client can release more than is intended. To reproduce, on a
CIFS mount without posix extensions enabled:
1) open file
2) do fcntl lock: start=0 len=1
3) do fcntl lock: start=2 len=1
4) do fcntl unlock: start=0 len=1
...on the unlock call the client sends an unlock request to the server
for both locks. The problem is a bad test in cifs_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Caused by unneeded reopen during reconnect while spinlock held.
Fixes kernel bugzilla bug #7903
Thanks to Lin Feng Shen for testing this, and Amit Arora for
some nice problem determination to narrow this down.
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Previously the only way to do this was to umount all mounts to that server,
turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled).
Fixes Samba bugzilla bug number: 4582 (and also 2008)
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's common for file systems to need to zero data on either side of a
write, if a page is not Uptodate during prepare_write. It just so happens
that simple_prepare_write() in libfs.c does exactly that, so we can avoid
duplication and just call that function to zero page data.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This should be the last big batch of whitespace/formatting fixes.
checkpatch warnings for the cifs directory are down about 90% and
many of the remaining ones are harder to remove or make the code
harder to read.
Signed-off-by: Steve French <sfrench@us.ibm.com>
nfsd is passing null nameidata (probably the only one doing that)
on call to create - cifs was missing one check for this.
Note that running nfsd over a cifs mount requires specifying fsid on
the nfs exports entry and requires mounting cifs with serverino mount
option.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally at http://lkml.org/lkml/2006/9/2/86
The recent change to "allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call" introduced a new semaphore in struct cifsFileInfo,
lock_sem. However, semaphores used as mutexes are deprecated these days,
and there's no reason to add a new one to the kernel. Therefore, convert
lock_sem to a struct mutex (and also fix one indentation glitch on one of
the lines changed anyway).
Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Also expand debug entry to show which character on a failed Unicode
mapping.
Acked-by: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
file->f_path.dentry or file->f_path.dentry.d_inode can't be NULL since at
least ten years, similar for all but very few arguments passed in from the
VFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Could cause hangs on smp systems in i_size_read on a cifs inode
whose size has been previously simultaneously updated from
different processes.
Thanks to Brian Wang for some great testing/debugging on this
hard problem.
Fixes kernel bugzilla #7903
CC: Shirish Pargoankar <shirishp@us.ibm.com>
CC: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The two cifs functions that used the most stack according
to "make checkstack" have been changed to use less stack.
Thanks to jra and Shaggy for helpful ideas
Signed-off-by: Steve French <sfrench@us.ibm.com>
cc: jra@samba.org
cc: shaggy@us.ibm.com
This also adds he required page "writeback" flag handling, that cifs
hasn't been doing and that the page dirty flag changes made obvious.
Acked-by: Steve French <smfltc@us.ibm.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CIFS implements ->readpages and doesn't use read_cache_pages(). So wire the
read IO accounting up within CIFS.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the cifs
filesystem.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Informational/debug message was being logged too often. The error
case of logging having to send a close with (presumably stuck on buggy
server) pending writes is still logged.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This just ignore the remaining pages, and will fix a forgot put_pages_list().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove inclusions of linux/mpage.h that are no longer necessary due to the
transfer of generic_writepages().
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call. TODO - restrict this to servers
that support NT_STATUS codes (Win9x will probably
not support this call).
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 570d4d2d895569825d0d017d4e76b51138f68864 commit)
request and do not time out slow requests to a server that is still responding
well to other threads
Suggested by jra of Samba team
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 89b57148115479eef074b8d3f86c4c86c96ac969 commit)
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CIFS takes/releases f_owner.lock - why? It does not change anything in the
fowner state. Remove this locking.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request. Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".
To make all this sane, the patch changes range of writeback_control.
So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.
And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.
This patch does,
- Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
-1 is usually ok for range_end (type is long long). But, if someone did,
range_end += val; range_end is "val - 1"
u64val = range_end >> bits; u64val is "~(0ULL)"
or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
things, and uses LLONG_MAX for range_end.
- All callers of ->writepages() sets range_start/end or range_cyclic.
- Fix updates of ->writeback_index. It seems already bit strange.
If it starts at 0 and ended by check of nr_to_write, this last
index may reduce chance to scan end of file. So, this updates
->writeback_index only if range_cyclic is true or whole-file is
scanned.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes oops to OS/2 on ls and removes redundant NTCreateX calls to servers
which do not support NT SMBs. Key operations to OS/2 work.
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs should not be overwriting an element of the aops structure, since the
structure is shared by all cifs inodes. Instead define a separate aops
structure to suit each purpose.
I also took the liberty of replacing a hard-coded 4096 with PAGE_CACHE_SIZE
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Wasn't able to reproduce a hard hang, but was able to get an oops if
suspended the machine during a copy to the cifs mount. This led to some
things hanging, including a "sync". Also got I/O errors when trying to
access the mount afterwards (even when didn't see the oops), and had
to unmount and remount in order to access the filesystem.
This patch fixed the oops.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
in directory
Also includes first part of fix to compensate for servers which forget
to return . and .. as well as updates to changelog and cifs readme.
Signed-off-by: Steve French <sfrench@us.ibm.com>
The only user ignores the return value, and the only instanace
(block_sync_page) always returns 0...
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Samba (version 3) server support for this is also currently being
done. This client code is in an experimental path (requires enabling
/proc/fs/cifs/Experimental) while it is being tested.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch fixes an oops reported by Adrian Bunk in cifs_user_read when a null
read response is returned on a forcedirectio mount.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
from server when mount forcedirectio.
Allowing update of file size with non forcedirectio mounts should be
allowed in the fiture but requires carefully writing out the
last page in the local file if it is a partial page in order to
avoid corruption and careful serialization
Thanks to Maximiliano Curia who suggested similar changes and provided
a testcase.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.
See mm/filemap.c:
And changes the filemap_write_and_wait() and filemap_write_and_wait_range().
Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error. However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)
<quotation>
Andrew Morton writes,
If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc. Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.
</quotation>
So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.
Trond, could you please review the nfs part? Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
need to get in ahead of it that depend on that file handle. Fixes
occassional bad file handle errors on write with heavy use multiple process
cases.
Signed-off-by: Steve French <sfrench@us.ibm.com>
New cifs_writepages routine was not updated bytes written in cifs stats.
Also added ability to clear /proc/fs/cifs/Stats by writing (0 or 1) to it.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This allows cifs_writepages to send data in larger chunks from the page
cache, without requiring larger memory allocations in other cases.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
eliminate the double copy, and improve cifs write performance and
help the server by upping the typical write size from 4K to 16K
(or even larger if wsize set explicitly) for servers which support this.
Part 1 of 2
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
we do not request more than negotiated buffer size even if buffer
size is small (smaller than one page)
Signed-off-by: Steve French (sfrench@us.ibm.com)
This argument was added in a recent patch, but is unnecessary, since
the superblock is easily obtained from the dentry.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The recent change to locks_remove_flock code in fs/locks.c changes how
byte range locks are removed from closing files, which shows up a bug in
cifs.
The assumption in the cifs code was that the close call sent to the
server would remove any pending locks on the server on this file, but
that is no longer safe as the fs/locks.c code on the client wants unlock
of 0 to PATH_MAX to remove all locks (at least from this client, it is
not possible AFAIK to remove all locks from other clients made to the
server copy of the file).
Note that cifs locks are different from posix locks - and it is not
possible to map posix locks perfectly on the wire yet, due to
restrictions of the cifs network protocol, even to Samba without adding
a new request type to the network protocol (which we plan to do for
Samba 3.0.21 within a few months), but the local client will have the
correct, posix view, of the lock in most cases.
The correct fix for cifs for this would involve a bigger change than I
would like to do this late in the 2.6.13-rc cycle - and would involve
cifs keeping track of all unmerged (uncoalesced) byte range locks for
each remote inode and scanning that list to remove locks that intersect
or fall wholly within the range - locks that intersect may have to be
reaquired with the smaller, remaining range.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make cifs_stats code conditional in the header files to avoid ifdefs in the
main code.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
and add_to_page_cache fails.
Thanks to Shaggy for pointing out the fix.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Shaggy (shaggy@us.ibm.com)
if cifsd thread is no longer running to demultixplex responses.
Do not send FindClose request when FindFirst failed without reaching end
of search.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!