NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We need to break delegations on any operation that changes the set of
links pointing to an inode. Start with unlink.
Such operations also hold the i_mutex on a parent directory. Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client. To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:
acquire locks
...
test for delegation; if found:
take reference on inode
release locks
wait for delegation break
drop reference on inode
retry
It is possible this could never terminate. (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.) But this seems very unlikely.
The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack. We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For now FL_DELEG is just a synonym for FL_LEASE. So this patch doesn't
change behavior.
Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"
This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.
Additionally, a few fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...
Pull nfsd updates from Bruce Fields:
"This was a very quiet cycle! Just a few bugfixes and some cleanup"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
rpc: let xdr layer allocate gssproxy receieve pages
rpc: fix huge kmalloc's in gss-proxy
rpc: comment on linux_cred encoding, treat all as unsigned
rpc: clean up decoding of gssproxy linux creds
svcrpc: remove unused rq_resused
nfsd4: nfsd4_create_clid_dir prints uninitialized data
nfsd4: fix leak of inode reference on delegation failure
Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
sunrpc: prepare NFS for 2038
nfsd4: fix setlease error return
nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
Convert the filesystem shrinkers to use the new API, and standardise some
of the behaviours of the shrinkers at the same time. For example,
nr_to_scan means the number of objects to scan, not the number of objects
to free.
I refactored the CIFS idmap shrinker a little - it really needs to be
broken up into a shrinker per tree and keep an item count with the tree
root so that we don't need to walk the tree every time the shrinker needs
to count the number of objects in the tree (i.e. all the time under
memory pressure).
[glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
[assorted fixes folded in]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This fixes a regression from 68a3396178
"nfsd4: shut down more of delegation earlier".
After that commit, nfs4_set_delegation() failures result in
nfs4_put_delegation being called, but nfs4_put_delegation doesn't free
the nfs4_file that has already been set by alloc_init_deleg().
This can result in an oops on later unmounting the exported filesystem.
Note also delaying the fi_had_conflict check we're able to return a
better error (hence give 4.1 clients a better idea why the delegation
failed; though note CONFLICT isn't an exact match here, as that's
supposed to indicate a current conflict, but all we know here is that
there was one recently).
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit df66e75395.
nfsd4_lock can get a read-only or write-only reference when only a
read-write open is available. This is normal.
Cc: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- don't BUG_ON() when not SP4_NONE
- calculate recv and send reserve sizes correctly
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This actually makes a difference in the 4.1 case, since we use the
status to decide what reason to give the client for the delegation
refusal (see nfsd4_open_deleg_none_ext), and in theory a client might
choose suboptimal behavior if we give the wrong answer.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY
the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR]))
doesn't trigger when it should.
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
- dentry_open
- do_dentry_open
- __get_file_write_access
- get_write_access
- return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------
can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
fi_access = {{
counter = 0x1
}, {
counter = 0x0
}},
...
------------------------------------------------------------
1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.
2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.
3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
[exception RIP: fput+0x9]
RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282
RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002
RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6
RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58
R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
#10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
#11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
#12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
#13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
#14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
#15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
#16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
#17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
#18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
#19 [ffff88062e365ee8] kthread at ffffffff81090886
#20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------
Cc: stable@vger.kernel.org
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd bugfixes from Bruce Fields:
"Just three minor bugfixes"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
svcrdma: underflow issue in decode_write_list()
nfsd4: fix minorversion support interface
lockd: protect nlm_blocked access in nlmsvc_retry_blocked
You can turn on or off support for minorversions using e.g.
echo "-4.2" >/proc/fs/nfsd/versions
However, the current implementation is a little wonky. For example, the
above will turn off 4.2 support, but it will also turn *on* 4.1 support.
This didn't matter as long as we only had 2 minorversions, which was
true till very recently.
And do a little cleanup here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd changes from Bruce Fields:
"Changes this time include:
- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...
Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
add support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
+4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
R2mLTRhKu1DKguTzO13f
=wGjn
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."
* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...
We now have minimal minorversion 1 support; turn it on by default.
This can still be turned off with "echo -4.1 >/proc/fs/nfsd/versions".
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RFC 5661 allows a client to destroy a session using a compound
associated with the destroyed session, as long as the DESTROY_SESSION op
is the last op of the compound.
We attempt to allow this, but testing against a Solaris client (which
does destroy sessions in this way) showed that we were failing the
DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
count on the session (held by us) represented another rpc in progress
over this session.
Fix this by noting that in this case the expected reference count is 1,
not 0.
Also, note as long as the session holds a reference to the compound
we're destroying, we can't free it here--instead, delay the free till
the final put in nfs4svc_encode_compoundres.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This case shouldn't happen--the administrator shouldn't really allow
other applications access to the export until clients have had the
chance to reclaim their state--but if it does then we should set the
"return this lease immediately" bit on the reply. That still leaves
some small races, but it's the best the protocol allows us to do in the
case a lease is ripped out from under us....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit eb2099f31b "nfsd4:
release lockowners on last unlock in 4.1 case". Trond identified
language in rfc 5661 section 8.2.4 which forbids this behavior:
Stateids associated with byte-range locks are an exception.
They remain valid even if a LOCKU frees all remaining locks, so
long as the open file with which they are associated remains
open, unless the client frees the stateids via the FREE_STATEID
operation.
And bakeathon 2013 testing found a 4.1 freebsd client was getting an
incorrect BAD_STATEID return from a FREE_STATEID in the above situation
and then failing.
The spec language honestly was probably a mistake but at this point with
implementations already following it we're probably stuck with that.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We saw a v4.0 client's create fail as follows:
- open create succeeds and gets a read delegation
- client attempts to set mode on new file, gets DELAY while
server recalls delegation.
- client attempts a CLAIM_DELEGATE_CUR open using the
delegation, gets error because of new file mode.
This probably can't happen on a recent kernel since we're no longer
giving out delegations on create opens. Nevertheless, it's a
bug--reclaim opens should bypass permission checks.
Reported-by: Steve Dickson <steved@redhat.com>
Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
A network trace showed the server returning BAD_XDR on the final getattr
of a getattr+write+getattr compound. The final getattr started on a
page boundary.
I believe the Linux client ignores errors on the post-write getattr, and
that that's why we haven't seen this before.
Cc: stable@vger.kernel.org
Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfs4_open_delegation logic is unecessarily baroque.
Also stop pretending we support write delegations in several places.
Some day we will support write delegations, but when that happens adding
back in these flag parameters will be the easy part. For now they're
just confusing.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When an exclusive create is done with the mode bits
set (aka open(testfile, O_CREAT | O_EXCL, 0777)) this
causes a OPEN op followed by a SETATTR op. When a
read delegation is given in the OPEN, it causes
the SETATTR to delay with EAGAIN until the
delegation is recalled.
This patch caused exclusive creates to give out
a write delegation (which turn into no delegation)
which allows the SETATTR seamlessly succeed.
Signed-off-by: Steve Dickson <steved@redhat.com>
[bfields: do this for any CREATE, not just exclusive; comment]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In testing I notice that some of the pynfs tests forget to send any
cb_sec flavors, and that we haven't necessarily errored out in that case
before.
I'll fix pynfs, but am also inclined to default to trying AUTH_NONE in
that case in case this is something clients actually do.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't support gss on the backchannel. We should state that fact up
front rather than just letting things continue and later making the
client try to figure out why the backchannel isn't working.
Trond suggested instead returning NFS4ERR_NOENT. I think it would be
tricky for the client to distinguish between the case "I don't support
gss on the backchannel" and "I can't find that in my cache, please
create another context and try that instead", and I'd prefer something
that currently doesn't have any other meaning for this operation, hence
the (somewhat arbitrary) NFS4ERR_ENCR_ALG_UNSUPP.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
the client-provided spo_must_* arrays and just enforcing credential
checks for the minimum required operations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
This will make it easier to enforce SP4_MACH_CRED, which needs to
compare the mechanism used on the exchange_id with that used on
protected operations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.
->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.
Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.
For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.
On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.
With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.
Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New method - ->iterate(file, ctx). That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...). It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.
Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
iterate_dir(): new helper, replacing vfs_readdir().
struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In C, signed integer overflow results in undefined behavior, but unsigned
overflow wraps around. So do the subtraction first, then cast to signed.
Reported-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.
Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This enables NFSv4.2 support for the server. To enable this
code do the following:
echo "+4.2" >/proc/fs/nfsd/versions
after the nfsd kernel module is loaded.
On its own this does nothing except allow the server to respond to
compounds with minorversion set to 2. All the new NFSv4.2 features are
optional, so this is perfectly legal.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This code assumes that any client using exchange_id is using NFSv4.1,
but with the introduction of 4.2 that will no longer true.
This main effect of this is that client callbacks will use the same
minorversion as that used on the exchange_id.
Note that clients are forbidden from mixing 4.1 and 4.2 compounds. (See
rfc 5661, section 2.7, #13: "A client MUST NOT attempt to use a stateid,
filehandle, or similar returned object from the COMPOUND procedure with
minor version X for another COMPOUND procedure with minor version Y,
where X != Y.") However, we do not currently attempt to enforce this
except in the case of mixing zero minor version with non-zero minor
versions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The fh_lock_parent(), nfsd_truncate(), nfsd_notify_change() and
nfsd_sync_dir() fuctions are neither implemented nor used, just remove
them.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
The Linux client is using CLAIM_FH to implement regular opens, not just
recovery cases, so it depends on the server to check permissions
correctly.
Therefore the owner override, which may make sense in the delegation
recovery case, isn't right in the CLAIM_FH case.
Symptoms: on a client with 49f9a0fafd
"NFSv4.1: Enable open-by-filehandle", Bryan noticed this:
touch test.txt
chmod 000 test.txt
echo test > test.txt
succeeding.
Cc: stable@kernel.org
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>