The radix tree based inode caches did away with the inode cluster hashes,
replacing them with a bunch of masking and gang lookups on the radix tree.
This masking got broken when moving the code to per-ag radix trees and
indexing by agino # rather than straight inode number. The result is
clustered inode writeback does not cluster and things can go extremely
slowly when there are lots of inodes to write.
Fix it up by comparing the agino # of the inode we just looked up to the
index of the cluster we are looking for.
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
SGI-PV: 972915
SGI-Modid: xfs-linux-melb:xfs-kern:30033a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
- calculation of 'page_count' was incorrect as it did not
consider the offset of 'mem' into the first page. The
logic to bump 'page_count' didn't work if 'len' was <=
PAGE_CACHE_SIZE (ie offset = 3k, len = 2k).
- setting b_buffer_length to 'len' is incorrect if 'offset'
is > 0. Set it to the total length of the buffer.
- I suspect that passing a non-aligned address into
mem_to_page() for the first page may have been causing
issues - don't know but just tidy up that code anyway.
SGI-PV: 971596
SGI-Modid: xfs-linux-melb:xfs-kern:30143a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
- sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
- remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
special case causes bulkstat to fail because the special case uses
xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
have different semantics. xfs_bulkstat() will return the next inode
after the one supplied while skipping internal inodes (ie quota inodes).
xfs_bulkstate_single() will only lookup the inode supplied and return
an error if it is an internal inode.
- in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied
so in cases were we return without examining any inodes the scan wont
restart back at zero.
- sanity check for valid *ubcountp values. Cannot sanity check for valid
ubuffer here because some users of xfs_bulkstat() don't supply a buffer.
- checks against 'ubleft' (the space left in the user's buffer) should be
against 'statstruct_size' which is the supplied minimum object size.
The mixture of checks against statstruct_size and 0 was one of the
reasons we were skipping inodes.
- if the formatter function returns BULKSTAT_RV_NOTHING and an error and
the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT
is for inodes that are no longer valid and we just skip them. EINVAL is
returned if we try to lookup an internal inode so we skip them too. For
a DMF scan if the inode and DMF attribute cannot fit into the space left
in the user's buffer it would return ERANGE. We didn't handle this error
and skipped the inode. We would continue to skip inodes until one fitted
into the user's buffer or we completed the scan.
- put back the recalculation of agino (that got removed with the last fix)
at the end of the while loop. This is because the code at the start of
the loop expects agino to be the last inode examined if it is non-zero.
- if we found some inodes but then encountered an error, return success
this time and the error next time. If the formatter aborted with ENOMEM
we will now return this error but only if we couldn't read any inodes.
Previously if we encountered ENOMEM without reading any inodes we
returned a zero count and no error which falsely indicated the scan was
complete.
SGI-PV: 973431
SGI-Modid: xfs-linux-melb:xfs-kern:30089a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
The recent behaviour layer removal dropped the check for quotas that have
been requested at mount time but have subsequently been turned off. This
results in a panic when accessing m_quotainfo which has been freed.
This patch adds the check originally made by xfs_qm_syncall() to
xfs_qm_sync().
SGI-PV: 969769
SGI-Modid: xfs-linux-melb:xfs-kern:29908a
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.
Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
endian fields), not struct bfs_inode_info * (in-core stuff, with host-
endian ones).
It's a macro and fields with the right names are present in
bfs_inode_info, so it compiles, but on big-endian host it gives bogus
results.
Introduced in commit f433dc5634 ("Fixes to
the BFS filesystem driver").
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
access_flags_to_mode() gets on-the-wire data (little-endian) and treats
it as host-endian.
Introduced in commit e01b640013 ("[CIFS]
enable get mode from ACL when cifsacl mount option specified")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes regression, introduced since 2.6.16. NextStep variant of
UFS as OpenStep uses directory block size equals to 1024. Without this
change, ufs_check_page fails in many cases.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Dave Bailey <dsbailey@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle). The UML code sits in io_getevents waiting for
an event to be submitted and completed.
Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix breakage caused by commit d5d8c5976d
"freezer: do not send signals to kernel threads" in
jffs2_garbage_collect_thread() that assumed it would be sent signals
by the freezer.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Pete MacKay <armlinux@architechnical.net>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
[INET]: Fix inet_diag dead-lock regression
[NETNS]: Fix /proc/net breakage
[TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
[NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
[NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
[DECNET]: dn_nl_deladdr() almost always returns no error
[IPV6]: Restore IPv6 when MTU is big enough
[RXRPC]: Add missing select on CRYPTO
mac80211: rate limit wep decrypt failed messages
rfkill: fix double-mutex-locking
mac80211: drop unencrypted frames if encryption is expected
mac80211: Fix behavior of ieee80211_open and ieee80211_close
ieee80211: fix unaligned access in ieee80211_copy_snap
mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
SCTP: Fix build issues with SCTP AUTH.
SCTP: Fix chunk acceptance when no authenticated chunks were listed.
SCTP: Fix the supported extensions paramter
SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
SCTP: Fix the number of HB transmissions.
[TCP] illinois: Incorrect beta usage
...
Well I clearly goofed when I added the initial network namespace support
for /proc/net. Currently things work but there are odd details visible to
user space, even when we have a single network namespace.
Since we do not cache proc_dir_entry dentries at the moment we can just
modify ->lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.
To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.
As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make them depend on TCGETS2. If that one is implemented the rest should be
there as well.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Invalidate attributes on rename, since some filesystems may update
st_ctime. Reported by Szabolcs Szakacsits
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I found problems accessing (executing) previously existing files, until
I did chmod on them (or setattr).
If the fi->attr_version is not initialized, then it could be
larger than fc->attr_version until a setattr is executed, and as a
result the inode attributes would never be set.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUSE_FILE_OPS is meant to signal that the kernel will send the open file to to
the userspace filesystem for operations on open files, so that sillyrenaming
unlinked files becomes unnecessary.
However this needs VFS changes, which won't make it into 2.6.24.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.
To make it possible to correcly handle changing flags, send the
current value to userspace in each read and write.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.
So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.
Thanks to mpb for the report.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit a686cd898b:
"Val's cross-port of the ext3 reservations code into ext2."
include/linux/ext2_fs.h got a new function whose return value is only
defined if __KERNEL__ is defined. Putting #ifdef __KERNEL__ around the
function seems to help, patch below.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg noticed that the call of task_pid_nr_ns() in proc_pid_readdir
is racy with respect to tasks exiting.
After a bit of examination it also appears that the call itself
is completely unnecessary.
So to fix the problem this patch modifies next_tgid() to return
both a tgid and the task struct in question.
A structure is introduced to return these values because it is
slightly cleaner and easier to optimize, and the resulting code
is a little shorter.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file->f_op->readdir(file, buf, filler)".
The solution is to remove proc_kill_inodes() completely:
a) we don't have tricky modules implementing their tricky readdir hooks which
could keeping this revoke from hell.
b) In a situation when module is gone but PDE still alive, standard
readdir will return only "." and "..", because pde->next was cleared by
remove_proc_entry().
c) the race proc_kill_inode() destined to prevent is not completely
fixed, just race window made smaller, because vfs_readdir() is run
without sb_lock held and without file_list_lock held. Effectively,
->i_fop is cleared at random moment, which can't fix properly anything.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
[<c1061040>] filldir64+0x0/0xc5
[<c1061295>] sys_getdents64+0x63/0xa5
[<c10026ba>] sysenter_past_esp+0x5f/0x85
=======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78
hch: "Nice, getting rid of this is a very good step formwards.
Unfortunately we have another copy of this junk in
security/selinux/selinuxfs.c:sel_remove_entries() which would need the
same treatment."
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
sysfs: fix off-by-one error in fill_read_buffer()
kobject: two typo fixes
UIO: add UIO documentation target to DocBook Makefile
UIO: fix up the UIO documentation
create /sys/.../power when CONFIG_PM is set
allow LEGACY_PTYS to be set to 0
I found that there is a off-by-one problem in the following code.
Version: 2.6.24-rc2
File: fs/sysfs/file.c:118-122
Function: fill_read_buffer
--------------------------------------------------------------------
count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);
sysfs_put_active_two(attr_sd);
BUG_ON(count > (ssize_t)PAGE_SIZE);
--------------------------------------------------------------------
Because according to the specification of the sysfs and the implement of
the show methods, the show methods return the number of bytes which would
be generated for the given input, excluding the trailing null.So if the
return value of the show methods equals PAGE_SIZE - 1, the buffer is full
in fact. And if the return value equals PAGE_SIZE, the resulting string
was already truncated,or buffer overflow occurred.
This patch fixes an off-by-one error in fill_read_buffer.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043
only allow coredumping to the same uid that the coredumping
task runs under.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set
i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate
sets i_size, and punching a hole ignores it.
This exposed a problem where punching a hole in an inline-data file wasn't
updating the page cache, so fix that too.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The existing bug statement didn't take into account unhashed dentries which
might not have a cluster lock on them. This could happen if a node exporting
the file system via NFS is rebooted, re-exported to nfs clients and then
unmounted. It's fine in this case to not have a dentry cluster lock.
Just remove the bug statement and replace it with an error print, which
does the proper checks. Though we want to know if something has happened
which might have prevented a cluster lock from being created, it's
definitely not necessary to panic the machine for this.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Enable expensive bitmap scanning only if DEBUG option is enabled.
The bitmap scanning quite loads the CPU and on my machine the write
throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
improves from 37 MB/s to 45.4 MB/s in local mode...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
If the inode block isn't valid then we don't want to print the value from
that, instead print the block number which was passed in (which should
always be correct). Also, turn this into a debug print for now - folks who
hit an actual problem always have other logs indicating what the source is.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
It's almost never worth printing in that situation and we keep forgetting to
manually filter it out.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Right now we're just setting them from the existing parameters, not the
new ones that a remount specified.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
NFS: Clean up new multi-segment direct I/O changes
NFS: Ensure we return zero if applications attempt to write zero bytes
NFS: Support multiple segment iovecs in the NFS direct I/O path
NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c
SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
NFS: fs/nfs/dir.c should #include "internal.h"
NFS: make nfs_wb_page_priority() static
NFS: mount failure causes bad page state
SUNRPC: remove NFS/RDMA client's binary sysctls
kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
sunrpc: rpc_pipe_poll may miss available data in some cases
sunrpc: return error if unsupported enctype or cksumtype is encountered
sunrpc: gss_pipe_downcall(), don't assume all errors are transient
NFS: Fix the ustat() regression
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A zero byte count direct write request should be a successful no-op, not an
error.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add helpers that iterate over multi-segment iovecs. These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_wb_page_priority() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While testing a kernel based upon ecd744eec3
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:
IP-Config: Complete:
device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
host=192.168.1.101, domain=, nis-domain=(none),
bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...
This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data(). Ensure that the
parsed mount data is always initialised.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Srivatsa Vaddagiri noticed occasionally incorrect CPU usage
values in top and tracked it down to stime going below 0 in
task_stime(). Negative values are possible there due to the
sampled nature of stime/utime.
Fix suggested by Balbir Singh.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Fix RedHat bug 329431
The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.
Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.
The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.
CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.
This patch does the following:
1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.
2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.
3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.
4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.
Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.
Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When retrying kernel_recvmsg() because of a short read, check returned
length against the remaining length, not against total length. This
avoids unneeded session reconnects which would otherwise occur when
kernel_recvmsg() finally returns zero when asked to read zero bytes.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Hi Trond,
I have discovered that the BUG_ON in nfs_follow_mountpoint:
BUG_ON(IS_ROOT(dentry));
can be triggered by a misbehaving server.
What happens is the client does a lookup and discoveres that the named
directory has a different fsid, so it initiates a mount.
It then performs a GETATTR on the mounted directory and gets a
different fsid again (due to a bug in the NFS server).
This causes nfs_follow_mountpoint to be called on the newly mounted
root, which triggers the BUG_ON.
To duplicate this, have a directory which contains some mountpoints,
and export that directory with the "crossmnt" flag using nfs-utils
1.1.1 (or 1.1.0 I think)
The GETATTR on the root of the mounted filesystem will return the
information for the top exportpoint, while a lookup will return the
correct information. This difference causes the NFS client to BUG.
I think the best way to fix this is to trap this possibility early, so
just before completing the mount in the NFS client, check that it isn't
going to use nfs_mountpoint_inode_operations.
As long as i_op will never change once set (is that true?), this
should be adequately safe.
The following patch shows a possible approach, and it works for me.
i.e. when the NFS server is misbehaving, I get ESTALE on those
mountpoints, while when the NFS server is working correctly, I get
correct behaviour on the client.
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a
dummy inode. This breaks ustat(), which actually uses sb->s_root in a
vfstat() call.
Fix this by making the s_root a dummy alias to the directory inode that was
used when creating the superblock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Have CIFS_SessSetup call cifs_get_spnego_key when Kerberos is
negotiated. Use the info in the key payload to build a session
setup request packet. Also clean up how the request buffer in
the function is freed on error.
With appropriate user space helper (in samba/source/client). Kerberos
support (secure session establishment can be done now via Kerberos,
previously users would have to use NTLMv2 instead for more secure
session setup).
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...and populate it with the hostname portion of the UNC string.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Move all of the kfree's sprinkled in the middle of the function to the
end, and have the code set rc and just goto there on error. Also zero
out the password string before freeing it. Looks like this should also
fix a potential memory leak of the prepath string if an error occurs
near the end of the function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
With 64KB blocksize, a directory entry can have size 64KB which does not
fit into 16 bits we have for entry lenght. So we store 0xffff instead and
convert value when read from / written to disk. The patch also converts
some places to use ext3_next_entry() when we are changing them anyway.
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that
builds with -Werror don't fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers. Since 2.0 or so this was deprecated behavior, but
several drivers still were using this. Since a few years we have a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time.... however with commit c2b1239a9f the
last user is now gone.
This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table.... which then can be tampered with from
other threads for example. For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.
The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we special case when we have only the initial pid namespace.
Unfortunately in doing so the copied case for the other namespaces was
broken so we don't properly flush the thread directories :(
So this patch removes the unnecessary special case (removing a usage of
proc_mnt) and corrects the flushing of the thread directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the
same way.
Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands. Only `umount' can do the trick.
The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
[<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
[<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
[<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
[<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
[<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
[<ffffffff802a187c>] __fput+0xcc/0x1b0
[<ffffffff802a1ba6>] fput+0x16/0x20
[<ffffffff8029e676>] filp_close+0x56/0x90
[<ffffffff8029fe0d>] sys_close+0xad/0x110
[<ffffffff8020c41e>] system_call+0x7e/0x83
Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.
=== for the patient ===
Here are more detailed demonstrations of the problem.
1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ___UD__Bd_ 2
[T2] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ___U___Bd_ 2
2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# echo hi > /tmp/hi
[T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
[T2] hi
[T3] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /proc/4397# cd /proc/`pidof cp`
[T1] root /proc/4713# cat io
rchar: 8396
wchar: 3
syscr: 20
syscw: 1
read_bytes: 0
write_bytes: 20480
cancelled_write_bytes: 4096
[T2] root /proc/4713# cat io
rchar: 8399
wchar: 6
syscr: 21
syscw: 2
read_bytes: 0
write_bytes: 24576
cancelled_write_bytes: 4096
//Question: the 'write_bytes' is a bit more than expected ;-)
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Reviewed-by: Chris Mason <chris.mason@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I found a few bugs in the BFS driver. Detailed description of the bugs as
well as the steps to reproduce the errors are given in the kernel bugzilla.
Please follow these links for more information:
http://bugzilla.kernel.org/show_bug.cgi?id=9363http://bugzilla.kernel.org/show_bug.cgi?id=9364http://bugzilla.kernel.org/show_bug.cgi?id=9365http://bugzilla.kernel.org/show_bug.cgi?id=9366
This patch fixes the bugs described above. Besides, the patch introduces
coding style changes to make the BFS driver conform to the requirements
specified for Linux kernel code. Finally, I made a few cosmetic changes
such as removal of trivial debug output.
Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
in-core superblock structure. These fields are initialized but never
actually used.
If you are wondering why I need BFS, here is the answer: I am using this
driver in the context of Linux kernel classes I am teaching in the Moscow
State University and in the International Institute of Information
Technology in Pune, India.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Cc: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
allow bulk updating of the sbinfo->free_blocks counter. This will be used by
the next patch in the series.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
mappings when that support was added. Currently, quota is debited at page
instantiation and credited at file truncation. This approach works correctly
for shared pages but is incomplete for private pages. In addition to
hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
this function does not respect quotas.
Private huge pages are treated very much like normal, anonymous pages. They
are not "backed" by the hugetlbfs file and are not stored in the mapping's
radix tree. This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.
This patch (based on a prototype provided by Ken Chen) moves quota crediting
for all pages into free_huge_page(). page->private is used to store a pointer
to the mapping to which this page belongs. This is used to credit quota on
the appropriate hugetlbfs instance.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks. Handle
that now.
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Forbid user from changing file flags on quota files. User has no bussiness
in playing with these flags when quota is on. Furthermore there is a
remote possibility of deadlock due to a lock inversion between quota file's
i_mutex and transaction's start (i_mutex for quota file is locked only when
trasaction is started in quota operations) in ext3 and ext4.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: LIOU Payphone <lioupayphone@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page->index should be cast to loff_t instead of off_t.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
request
In SendReceive() function in transport.c - it memcpy's
message payload into a buffer passed via out_buf param. The function
assumes that all buffers are of size (CIFSMaxBufSize +
MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller
(MAX_CIFS_SMALL_BUFFER_SIZE) buffers. There are eight callers
(SMB worker functions) which are primarily affected by this change:
TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes,
Lock and PosixLock
CC: Dave Kleikamp <shaggy@austin.ibm.com>
CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This reverts commit 7c9e69faa2, fixing up
conflicts in fs/ext4/balloc.c manually.
The cost of doing the bitmap validation on each lookup - even when the
bitmap is cached - is absolutely prohibitive. We could, and probably
should, do it only when adding the bitmap to the buffer cache. However,
right now we are better off just reverting it.
Peter Zijlstra measured the cost of this extra validation as a 85%
decrease in cached iozone, and while I had a patch that took it down to
just 17% by not being _quite_ so stupid in the validation, it was still
a big slowdown that could have been avoided by just doing it right.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch reverts Eric's commit 2b008b0a8e
It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
This is safe after list operations cleanup.
Signed-of-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers.
Looking at the sock->op->shutdown() handlers, it looks as if all of them
take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the
RCV_SHUTDOWN/SEND_SHUTDOWN arguments.
Add a helper, and then define the SHUT_* enum to ensure that kernel users
of shutdown() don't get confused.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As with commit 7fc90ec93a ("knfsd: nfsd:
call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem")
this is a case where we need to redo a security check in fh_verify()
even though the filehandle already has an associated dentry--if the
filehandle was created by fh_compose() in an earlier operation of the
nfsv4 compound, then we may not have done these checks yet.
Without this fix it is possible, for example, to traverse from an export
without the secure ports requirement to one with it in a single
compound, and bypass the secure port check on the new export.
While we're here, fix up some minor style problems and change a printk()
to a dprintk(), to make it harder for random unprivileged users to spam
the logs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval. This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.
Thanks to Roland <devzero@web.de> for bug report and data collection.
Cc: Roland <devzero@web.de>
Acked-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
[CIFS] fix oops on second mount to same server when null auth is used
[CIFS] Fix stale mode after readdir when cifsacl specified
[CIFS] add mode to acl conversion helper function
[CIFS] Fix incorrect mode when ACL had deny access control entries
[CIFS] Add uid to key description so krb can handle user mounts
[CIFS] Fix walking out end of cifs dacl
[CIFS] Add upcall files for cifs to use spnego/kerberos
[CIFS] add OIDs for KRB5 and MSKRB5 to ASN1 parsing routines
[CIFS] Register and unregister cifs_spnego_key_type on module init/exit
[CIFS] implement upcalls for SPNEGO blob via keyctl API
[CIFS] allow cifs_calc_signature2 to deal with a zero length iovec
[CIFS] If no Access Control Entries, set mode perm bits to zero
[CIFS] when mount helper missing fix slash wrong direction in share
[CIFS] Don't request too much permission when reading an ACL
[CIFS] enable get mode from ACL when cifsacl mount option specified
[CIFS] ACL support part 8
[CIFS] acl support part 7
[CIFS] acl support part 6
[CIFS] acl support part 6
[CIFS] remove unused funtion compile warning when experimental off
...
The coredump code always calls set_dumpable(0) when it starts (even
if RLIMIT_CORE prevents any core from being dumped). The effect of
this (via task_dumpable) is to make /proc/pid/* files owned by root
instead of the user, so the user can no longer examine his own
process--in a case where there was never any privileged data to
protect. This affects e.g. auxv, environ, fd; in Fedora (execshield)
kernels, also maps. In practice, you can only notice this when a
debugger has requested PTRACE_EVENT_EXIT tracing.
set_dumpable was only used in do_coredump for synchronization and not
intended for any security purpose. (It doesn't secure anything that wasn't
already unsecured when a process dies by SIGTERM instead of SIGQUIT.)
This changes do_coredump to check the core_waiters count as the means of
synchronization, which is sufficient. Now we leave the "dumpable" bits alone.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a share is mounted using no username, cifs_mount sets
volume_info.username as a NULL pointer, and the sesInfo userName as an
empty string. The volume_info.username is passed to a couple of other
functions to see if there is an existing unc or tcp connection that can
be used. These functions assume that the username will be a valid
string that can be passed to strncmp. If the pointer is NULL, then the
kernel will oops if there's an existing session to which the string
can be compared.
This patch changes cifs_mount to set volume_info.username to an empty
string in this situation, which prevents the oops and should make it
so that the comparison to other null auth sessions match.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When mounted with cifsacl mount option, readdir can not
instantiate the inode with the estimated mode based on the ACL
for each file since we have not queried for the ACL for
each of these files yet. So set the refresh time to zero
for these inodes so that the next stat will cause the client
to go to the server for the ACL info so we can build the estimated
mode (this means we also will issue an extra QueryPathInfo if
the stat happens within 1 second, but this is trivial compared to
the time required to open/getacl/close for each).
ls -l is slower when cifsacl mount option is specified, but
displays correct mode information.
Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When mounted with the cifsacl mount option, we were
treating any deny ACEs found like allow ACEs and it turns out for
SFU and SUA Windows set these type of access control entries often.
The order of ACEs is important too. The canonical order that most
ACL tools and Windows explorer consruct ACLs with is to begin with
DENY entries then follow with ALLOW, otherwise an allow entry
could be encountered first, making the subsequent deny entry like "dead
code which would be superflous since Windows stops when a match is
made for the operation you are trying to perform for your user
We start with no permissions in the mode and build up as we find
permissions (ie allow ACEs). This fixes deny ACEs so they affect
the mask used to set the subsequent allow ACEs.
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
CC: Alexander Bokovoy <ab@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Adds uid to key description fro supporting user mounts
and minor formating changes
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Normally io priorities follow the CPU nice, unless a specific scheduling
class has been set. Once that is set, there's no way to reset the
behaviour to 'none' so that it follows CPU nice again.
Currently passing in 0 as the ioprio class/value will return -1/EINVAL,
change that to allow resetting of a set scheduling class.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If another node unlinks the destination while ocfs2_rename() is waiting on a
cluster lock, ocfs2_rename() simply logs an error and continues. This causes
a crash because the renaming node is now trying to delete a non-existent
inode. The correct solution is to return -ENOENT.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We should subtract start of our IO from PAGE_CACHE_SIZE to get the right
length of the write we want to perform.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We're missing a meta data commit for extending sync writes. In thoery, write
could return with the meta data required to read the data uncommitted to
disk. Fix that by detecting an allocating write and forcing a journal commit
in the sync case.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Do this to avoid a theoretical (I haven't seen this in practice) race where
the downconvert thread might drop the dentry lock, allowing a remote unlink
to proceed before dropping the inode locks. This could bounce access to the
orphan dir between nodes.
There doesn't seem to be a need to do the same in ocfs2_dentry_iput() as
that's never called for the last ref drop from the downconvert thread.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
If we have not yet created a cluster lock, ocfs2_cluster_lock() will
first create it at NLMODE, and then convert the lock to either PRMODE or
EXMODE (whichever is requested).
Change ocfs2_cluster_lock() to just create the lock at the initially
requested level. ocfs2_locking_ast() handles this case fine, so the only
update required was in setup of locking state. This should reduce the number
of network messages required for a new lock by one, providing an incremental
performance enhancement.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
v9fs_parse_options function uses strsep which modifies the value of the
v9ses->options field. That modified value is later passed to the function
that creates the transport potentially making the transport creation
function to fail.
This patch creates a copy of v9ses->option field that v9fs_parse_options
function uses instead of the original value.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>