Commit Graph

1175 Commits

Author SHA1 Message Date
Trond Myklebust
fb500a7cfe nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
d8a1a00055 nfsd: Fix another OPEN stateid race
If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.

Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
15ca08d329 nfsd: Fix stateid races between OPEN and CLOSE
Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
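
The recheck described in 15ca08d329 can be sketched in user space as follows. This is only an illustration under stated assumptions: struct open_stid, enum stid_type and lock_and_verify_open() are hypothetical stand-ins, and a pthread mutex plays the role of st_mutex. It is not the kernel code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's stateid types. */
enum stid_type { STID_OPEN, STID_CLOSED };

struct open_stid {
	pthread_mutex_t mutex;	/* plays the role of st_mutex */
	enum stid_type  type;	/* plays the role of the stid's type field */
};

/*
 * Lock a stateid that was found on the per-file list and confirm that it
 * is still open.  Returns true with the mutex held, or false with the
 * mutex released if the stateid was closed between lookup and locking.
 */
bool lock_and_verify_open(struct open_stid *stp)
{
	pthread_mutex_lock(&stp->mutex);
	if (stp->type == STID_CLOSED) {
		/* A lingering, already-closed stateid must not be reused. */
		pthread_mutex_unlock(&stp->mutex);
		return false;
	}
	return true;
}
```

A caller that gets false back would drop its reference and repeat the lookup, or create a fresh stateid.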
Andrew Elble
95da1b3a5a nfsd: deal with revoked delegations appropriately
If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the NFSv4.1+
case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply.  That can cause the Linux NFSv4.1 client to loop in its
state manager.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:44:02 -05:00
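
The error selection described in 95da1b3a5a amounts to a check on the minor version. A minimal sketch, assuming a hypothetical helper revoked_deleg_status() that takes the minor version as a plain integer; the status values are the ones assigned in RFC 7530 and RFC 5661.

```c
/* NFSv4 status codes as assigned by RFC 7530 and RFC 5661. */
#define NFS4ERR_BAD_STATEID	10025
#define NFS4ERR_DELEG_REVOKED	10087

/*
 * Hypothetical helper: choose the status for an operation that presents
 * a revoked delegation.  NFSv4.0 has no NFS4ERR_DELEG_REVOKED, so it
 * falls back to NFS4ERR_BAD_STATEID.
 */
int revoked_deleg_status(unsigned int minorversion)
{
	return minorversion >= 1 ? NFS4ERR_DELEG_REVOKED
				 : NFS4ERR_BAD_STATEID;
}
```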
Vasily Averin
7e981a8afa nfsd: use net->ns.inum as net ID
Publishing the net pointer is not safe;
let's use net->ns.inum instead.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:44:01 -05:00
Elena Reshetova
818a34eb26 fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_file.fi_ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:43:59 -05:00
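
The properties listed in the refcount_t conversion commits (no increment from zero, no overflow, no underflow) can be illustrated with C11 atomics. This is a user-space sketch only: struct refs and the refs_*() functions are hypothetical names, and the kernel's refcount_t implementation differs in detail.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space analogue of a saturating reference counter. */
struct refs {
	atomic_uint count;
};

void refs_init(struct refs *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/*
 * Take a reference unless the object is already dead (count == 0).
 * A saturated counter stays at UINT_MAX instead of wrapping to zero.
 */
bool refs_inc_not_zero(struct refs *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;
		if (old == UINT_MAX)
			return true;	/* saturated: leak rather than overflow */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller should free the object. */
bool refs_dec_and_test(struct refs *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* underflow or saturated: never free */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

The design choice is to prefer a memory leak over a use-after-free: once the counter saturates, the object is never freed.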
Elena Reshetova
cff7cb2ece fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:43:59 -05:00
Elena Reshetova
a15dfcd529 fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:43:58 -05:00
J. Bruce Fields
53da6a53e1 nfsd4: catch some false session retries
The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.

Catching every such case is difficult, but we may as well catch a few
easy ones.  This also handles the case described in the previous patch,
in a different way.

The spec does however require us to catch the case where the difference
is in the rpc credentials.  This prevents somebody from snooping another
user's replies by fabricating retries.

(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:43:57 -05:00
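
The decision described in 53da6a53e1 can be sketched as a small classifier. Everything here is an assumption made for illustration: struct slot_state, the integer cred_id standing in for RPC credentials, and the choice of opcount as the extra "easy" detail to compare. Only the credential check is described by the commit as required by the spec.

```c
#include <stdint.h>

enum slot_verdict {
	SLOT_NEW_REQUEST,	/* seqid is one past the last one seen */
	SLOT_REPLAY,		/* same seqid and the call matches */
	SLOT_FALSE_RETRY,	/* same seqid but the call differs */
	SLOT_MISORDERED		/* any other seqid */
};

/* Hypothetical record of the last request seen on a session slot. */
struct slot_state {
	uint32_t seqid;
	uint32_t cred_id;	/* stand-in for the RPC credential */
	uint32_t opcount;	/* number of ops in the cached compound */
};

enum slot_verdict classify_request(const struct slot_state *slot,
				   uint32_t seqid, uint32_t cred_id,
				   uint32_t opcount)
{
	if (seqid == slot->seqid + 1)	/* unsigned arithmetic wraps */
		return SLOT_NEW_REQUEST;
	if (seqid != slot->seqid)
		return SLOT_MISORDERED;
	/* Same (slot, seqid): only a true replay may see the cached reply. */
	if (cred_id != slot->cred_id || opcount != slot->opcount)
		return SLOT_FALSE_RETRY;
	return SLOT_REPLAY;
}
```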
J. Bruce Fields
085def3ade nfsd4: fix cached replies to solo SEQUENCE compounds
Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to.  And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFS4ERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops.  This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original request was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original request, it will increment the
sequence id.  If it did get the original request, it won't increment, and
nothing else about the reply really matters much.  But we can at
least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07 16:43:57 -05:00
J. Bruce Fields
de766e5704 nfsd: give out fewer session slots as limit approaches
Instead of granting clients' full requests until we hit our DRC size
limit and then failing CREATE_SESSIONs (and hence mounts) completely,
start granting clients smaller slot tables as we approach the limit.

The factor chosen here is pretty much arbitrary.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-10-04 16:25:01 -04:00
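
One way to picture the policy in de766e5704 is to cap each new session at a fraction of the DRC memory that is still free. The sketch below is an assumption-laden illustration: slots_to_grant() and the divisor of 3 are hypothetical (the commit itself calls its factor "pretty much arbitrary"), and slot_size is assumed to be nonzero.

```c
#include <stddef.h>

/* Illustrative value only; not taken from the commit text. */
#define REMAINING_SHARE_DIVISOR 3

/*
 * Hypothetical helper: how many slots to grant a new session, given the
 * number requested, the per-slot cache size (must be nonzero), the DRC
 * size limit and the amount already in use.
 */
size_t slots_to_grant(size_t requested, size_t slot_size,
		      size_t drc_limit, size_t drc_used)
{
	size_t remaining = drc_used < drc_limit ? drc_limit - drc_used : 0;
	size_t budget = remaining / REMAINING_SHARE_DIVISOR;
	size_t affordable;

	/* Still allow one slot if that is the only way to admit the client. */
	if (budget < slot_size)
		budget = remaining >= slot_size ? slot_size : 0;

	affordable = budget / slot_size;
	return requested < affordable ? requested : affordable;
}
```

With plenty of memory free the client gets what it asked for; as usage approaches the limit, the grant shrinks toward a single slot before CREATE_SESSION finally has to fail.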
Christoph Hellwig
eb69853da9 nfsd4: properly type op_func callbacks
Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.

It also adds two missing structures to struct nfsd4_op.u to facilitate
this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15 17:42:29 +02:00
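
The idea behind eb69853da9 is that every handler takes the same union argument, so the dispatch table needs no function pointer casts. A minimal sketch with hypothetical names (union op_u, close_args, commit_args, do_close, do_commit); the real union nfsd4_op_u has many more members.

```c
#include <stdint.h>

/* Hypothetical per-operation argument structures. */
struct close_args  { uint32_t seqid; };
struct commit_args { uint64_t offset; uint32_t count; };

/* One named union covering the arguments of every operation. */
union op_u {
	struct close_args  close;
	struct commit_args commit;
};

/* Every handler shares this exact signature, so no casts are needed. */
typedef int (*op_func_t)(union op_u *u);

int do_close(union op_u *u)
{
	return (int)u->close.seqid;	/* handler picks its own member */
}

int do_commit(union op_u *u)
{
	return (int)u->commit.count;
}

/* The table is type-checked by the compiler; a mismatched handler fails to build. */
const op_func_t op_table[] = { do_close, do_commit };
```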
Christoph Hellwig
57832e7bd8 nfsd4: properly type op_get_currentstateid callbacks
Pass union nfsd4_op_u to the op_get_currentstateid callbacks instead of
using unsafe function pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15 17:42:27 +02:00
Christoph Hellwig
b60e985980 nfsd4: properly type op_set_currentstateid callbacks
Give the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-15 17:42:27 +02:00
NeilBrown
2f10fdcb6a nfsd4: remove pointless strdup_if_nonnull
kstrdup() already checks for NULL.

(Brought to our attention by Jason Yan noticing (from sparse output)
that it should have been declared static.)

Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 17:25:54 -04:00
Rasmus Villemoes
4ab495bfe5 nfsd: remove superfluous KERN_INFO
dprintk already provides a KERN_* prefix; this KERN_INFO just shows up
as some odd characters in the output.

Simplify the message a bit while we're there.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-24 15:45:13 -05:00
Kinglong Mee
f7d1ddbe76 nfsd/callback: Cleanup callback cred on shutdown
The rpccred gotten from rpc_lookup_machine_cred() should be put when
state is shutdown.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-17 16:26:00 -05:00
Kinglong Mee
d19fb70dd6 NFSD: Fix a null reference case in find_or_create_lock_stateid()
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and puts the stateid at the end,
there is a NULL dereference of .sc_free when calling nfs4_put_stid(ns).

This patch moves the nfs4_stid.sc_free assignment into nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ece7 "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:29:24 -05:00
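
The fix in d19fb70dd6 follows a general pattern: set the destructor at allocation time, so that an object released before its later initialisation step still has a valid callback. A user-space sketch with hypothetical names (struct stid, alloc_stid, put_stid, free_lock_stid, freed_count):

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a reference-counted stateid. */
struct stid {
	void (*free_fn)(struct stid *);	/* plays the role of sc_free */
	int refs;
};

/* The destructor is supplied at allocation, so it can never be NULL later. */
struct stid *alloc_stid(void (*free_fn)(struct stid *))
{
	struct stid *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->free_fn = free_fn;
	s->refs = 1;
	return s;
}

/* Drop a reference and destroy the object when the last one goes away. */
void put_stid(struct stid *s)
{
	if (--s->refs == 0)
		s->free_fn(s);
}

int freed_count;	/* lets a caller observe that the destructor ran */

void free_lock_stid(struct stid *s)
{
	freed_count++;
	free(s);
}
```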
Chuck Lever
f46c445b79 nfsd: Fix general protection fault in release_lock_stateid()
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 15:24:43 -04:00
Jeff Layton
0cc11a61b8 nfsd: move blocked lock handling under a dedicated spinlock
Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling in a
rather complex situation involving four different spinlocks.

The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-24 16:51:21 -04:00
Linus Torvalds
2778556474 Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Some RDMA work and some good bugfixes, and two new features that could
  benefit from user testing:

   - Anna Schumaker contributed a simple NFSv4.2 COPY implementation.
     COPY is already supported on the client side, so a call to
     copy_file_range() on a recent client should now result in a
     server-side copy that doesn't require all the data to make a round
     trip to the client and back.

   - Jeff Layton implemented callbacks to notify clients when contended
     locks become available, which should reduce latency on workloads
     with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
  NFSD: Implement the COPY call
  nfsd: handle EUCLEAN
  nfsd: only WARN once on unmapped errors
  exportfs: be careful to only return expected errors.
  nfsd4: setclientid_confirm with unmatched verifier should fail
  nfsd: randomize SETCLIENTID reply to help distinguish servers
  nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
  nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
  nfsd: add a LRU list for blocked locks
  nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
  nfsd: plumb in a CB_NOTIFY_LOCK operation
  NFSD: fix corruption in notifier registration
  svcrdma: support Remote Invalidation
  svcrdma: Server-side support for rpcrdma_connect_private
  rpcrdma: RDMA/CM private message data structure
  svcrdma: Skip put_page() when send_reply() fails
  svcrdma: Tail iovec leaves an orphaned DMA mapping
  nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
  nfsd: eliminate cb_minorversion field
  nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-13 21:04:42 -07:00
Alexey Dobriyan
81243eacfa cred: simpler, 1D supplementary groups
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.

If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes).  But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.

2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).

All of the above is unnecessary.  Switch to the usual
trailing-zero-len-array scheme.  Memory is allocated with
kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).

Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
think kernel can handle such allocation.

On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!

Nice side effects:

 - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,

 - fix little mess in net/ipv4/ping.c
   should have been using GROUP_AT macro but this point becomes moot,

 - aux group allocation is persistent and should be accounted as such.

Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
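
The sizes quoted in 81243eacfa follow directly from the trailing-array layout. A sketch under stated assumptions: struct group_info_sketch and its helpers are hypothetical, gids are modelled as uint32_t, and int is assumed to be 4 bytes, giving an 8-byte header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 1D layout: a small header followed by the gids themselves. */
struct group_info_sketch {
	int	 usage;
	int	 ngroups;
	uint32_t gid[];		/* flexible array member, sized at allocation */
};

/* Bytes needed for a group list holding ngroups entries. */
size_t group_info_size(size_t ngroups)
{
	return sizeof(struct group_info_sketch) + ngroups * sizeof(uint32_t);
}

/* Allocate exactly as much as needed; entries are then simply gi->gid[i]. */
struct group_info_sketch *groups_alloc_sketch(size_t ngroups)
{
	struct group_info_sketch *gi = malloc(group_info_size(ngroups));

	if (!gi)
		return NULL;
	gi->usage = 1;
	gi->ngroups = (int)ngroups;
	return gi;
}
```

Under these assumptions, 9 groups need 8 + 9 * 4 = 44 bytes and 65536 groups need 8 + 65536 * 4 bytes, i.e. 256KB + 8, matching the figures in the commit message.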
J. Bruce Fields
7d22fc11c7 nfsd4: setclientid_confirm with unmatched verifier should fail
A setclientid_confirm with (clientid, verifier) both matching an
existing confirmed record is assumed to be a replay, but if the verifier
doesn't match, it shouldn't be.

This would be a very rare case, except that clients following
https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
failure.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 15:20:38 -04:00
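
The replay test described in 7d22fc11c7 requires both fields to match. A minimal sketch, assuming a hypothetical struct client_record and an 8-byte verifier; the real server keeps considerably more state per client.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VERIFIER_SIZE 8

/* Hypothetical, simplified record of a client known to the server. */
struct client_record {
	uint64_t clientid;
	uint8_t	 verifier[VERIFIER_SIZE];
	bool	 confirmed;
};

/*
 * A SETCLIENTID_CONFIRM is treated as a replay only if the record is
 * already confirmed and BOTH the clientid and the verifier match.
 * A matching clientid with a different verifier is not a replay.
 */
bool is_confirm_replay(const struct client_record *rec, uint64_t clientid,
		       const uint8_t verifier[VERIFIER_SIZE])
{
	return rec->confirmed &&
	       rec->clientid == clientid &&
	       memcmp(rec->verifier, verifier, VERIFIER_SIZE) == 0;
}
```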
Jeff Layton
19e4c3477f nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
If we are using v4.1+, then we can send notification when contended
locks become free. Inform the client of that fact.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 15:20:37 -04:00
Jeff Layton
7919d0a27f nfsd: add a LRU list for blocked locks
It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.

Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.

This also gives us a mechanism to clean up these objects at server
shutdown time as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 15:20:36 -04:00
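
The timestamp-and-reap scheme in 7919d0a27f can be sketched with a singly linked list kept in oldest-first order. The names here (struct blocked_lock, reap_expired) are hypothetical, and the kernel uses its own list and time primitives rather than time_t.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical blocked-lock entry; the list is kept oldest first. */
struct blocked_lock {
	struct blocked_lock *next;
	time_t queued_at;	/* set when the entry is added to the LRU */
};

/*
 * Detach every entry that has waited longer than lease_seconds.
 * Returns the detached entries as a NULL-terminated list (or NULL if
 * nothing expired) and leaves *lru pointing at the first survivor.
 */
struct blocked_lock *reap_expired(struct blocked_lock **lru, time_t now,
				  time_t lease_seconds)
{
	struct blocked_lock *expired = *lru;
	struct blocked_lock *last = NULL;

	/* Because the list is ordered, we can stop at the first fresh entry. */
	while (*lru && now - (*lru)->queued_at > lease_seconds) {
		last = *lru;
		*lru = (*lru)->next;
	}
	if (!last)
		return NULL;
	last->next = NULL;
	return expired;
}
```

Calling the same routine with a lease of zero at shutdown would drain the remaining entries that have a nonzero age, which mirrors the cleanup use mentioned in the commit.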
Jeff Layton
76d348fadf nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.

When the lock comes free (signaled by an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-09-26 15:20:36 -04:00
Jeff Layton
dd257933fa nfsd: don't return an unhashed lock stateid after taking mutex
nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (e.g. via FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-12 16:10:25 -04:00
Chuck Lever
42691398be nfsd: Fix race between FREE_STATEID and LOCK
When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324    R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.

Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-11 15:08:39 -04:00
Chuck Lever
885848186f nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.

During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.

Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.

Found by inspection, we don't currently have a reproducer.

Fixes: 2c41beb0e5 ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-15 15:31:31 -04:00
Christophe JAILLET
d28c442f5b nfsd: Fix some indent inconsistency
Silence a few smatch warnings about indentation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:53:41 -04:00
Andrew Elble
ed94164398 nfsd: implement machine credential support for some operations
This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Andrew Elble
dedeb13f9e nfsd: allow mach_creds_match to be used more broadly
Rename mach_creds_match() to nfsd4_mach_creds_match() and make it non-static.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-13 15:32:47 -04:00
Oleg Drokin
8c7245abda nfsd: Make init_open_stateid() a bit more whole
Move the state selection logic from the caller into init_open_stateid(),
so that it always returns the correct stp to use.

Signed-off-by: J . Bruce Fields <bfields@fieldses.org>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:53 -04:00
Oleg Drokin
5cc1fb2a09 nfsd: Extend the mutex holding region in nfsd4_process_open2()
To avoid racing entry into nfs4_get_vfs_file().
Make init_open_stateid() return with locked stateid to be unlocked
by the caller.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:41 -04:00
Oleg Drokin
feb9dad520 nfsd: Always lock state exclusively.
It used to be the case that state had an rwlock that was locked for write
by downgrades, but for read for upgrades (opens). Well, the problem is
if there are two competing opens for the same state, they step on
each other's toes, potentially leading to leaking file descriptors
from the state structure, since access mode is a bitmap only set once.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-15 22:03:31 -04:00
Jeff Layton
14b7f4a1ed nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid
Move the existing static function to an inline helper, and call it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-05-13 15:34:47 -04:00
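
Wraparound-safe comparison of 32-bit generation counters, the subject of 14b7f4a1ed, is usually done with a signed difference. A sketch, assuming a hypothetical helper name generation_after() that operates on bare integers rather than on stateid structures:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if generation a is newer than generation b, treating the
 * 32-bit counter as circular.  The unsigned subtraction wraps, and the
 * result is then interpreted as signed (two's complement assumed), so
 * "newer" means "ahead by less than half the counter range".
 */
bool generation_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

A plain a > b comparison would wrongly treat a counter that has just wrapped as older than one near UINT32_MAX.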
Chuck Lever
4500632f60 nfsd: Lower NFSv4.1 callback message size limit
The maximum size of a backchannel message on RPC-over-RDMA depends
on the connection's inline threshold. Today that threshold is
typically 1024 bytes, making the maximum message size 996 bytes.

The Linux server's CREATE_SESSION operation checks that the size
of callback Calls can be as large as 1044 bytes, to accommodate
RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the
true message size maximum of 996 bytes.

But the server's backchannel currently does not support RPCSEC_GSS.
The actual maximum size it needs is much smaller. It is safe to
reduce the limit to enable NFSv4.1 on RDMA backchannel operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:35 -08:00
Chuck Lever
4ce85c8cf8 nfsd: Update NFS server comments related to RDMA support
The server does indeed now support NFSv4.1 on RDMA transports. It
does not support shifting an RDMA-capable TCP transport (such as
iWARP) to RDMA mode.

Reported-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:32 -08:00
Kinglong Mee
8edf4b0288 nfsd: Fix a memory leak when meeting unsupported state_protect_how4
Remember to free the allocated client when meeting an unsupported state_protect_how4.

Fixes: 50c7b948ad ("nfsd: minor consolidation of mach_cred handling code")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01 13:06:31 -08:00
Linus Torvalds
cc80fe0eef Smaller bugfixes and cleanup, including a fix for a failures of
kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms
 that can affect some high-availability NFS setups.

Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Smaller bugfixes and cleanup, including a fix for a failures of
  kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
  storms that can affect some high-availability NFS setups"

* tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
  nfsd: add new io class tracepoint
  nfsd: give up on CB_LAYOUTRECALLs after two lease periods
  nfsd: Fix nfsd leaks sunrpc module references
  lockd: constify nlmsvc_binding structure
  lockd: use to_delayed_work
  nfsd: use to_delayed_work
  Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
  lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
  nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
  sunrpc: Add a function to close temporary transports immediately
  nfsd: don't base cl_cb_status on stale information
  nfsd4: fix gss-proxy 4.1 mounts for some AD principals
  nfsd: fix unlikely NULL deref in mach_creds_match
  nfsd: minor consolidation of mach_cred handling code
  nfsd: helper for dup of possibly NULL string
  svcrpc: move some initialization to common code
  nfsd: fix a warning message
  nfsd: constify nfsd4_callback_ops structure
  nfsd: recover: constify nfsd4_client_tracking_ops structures
  svcrdma: Do not send XDR roundup bytes for a write chunk
2016-01-15 12:49:44 -08:00
Geliang Tang
2e55f3ab45 nfsd: use to_delayed_work
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-01-07 10:10:49 -05:00
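What "open-coding" to_delayed_work() means in 2e55f3ab45 can be shown with a userspace sketch. The struct layouts below are simplified stand-ins, not the kernel definitions; the point is only the container_of pattern that the helper wraps.

```c
#include <assert.h>
#include <stddef.h>

struct work_struct { int pending; };

/* Simplified stand-in: a work item embedded in a larger structure. */
struct delayed_work {
	long timer_expires;        /* placeholder for the timer */
	struct work_struct work;   /* embedded member */
};

/* Recover the containing structure from a pointer to one member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The helper hides the container_of() call that callers would
 * otherwise repeat by hand. */
static struct delayed_work *to_delayed_work(struct work_struct *work)
{
	return container_of(work, struct delayed_work, work);
}
```

A work callback that receives `struct work_struct *` can then call `to_delayed_work(work)` instead of spelling out the container_of expression each time.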
Anna Schumaker
aa0d6aed45 nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-07 23:11:52 -05:00
J. Bruce Fields
414ca017a5 nfsd4: fix gss-proxy 4.1 mounts for some AD principals
The principal name on a gss cred is used to set up the NFSv4.0 callback,
which has to have a client principal name to authenticate to.

That code wants the name to be in the form servicetype@hostname.
rpc.svcgssd passes down such names (and passes down no principal name at
all in the case the principal isn't a service principal).

gss-proxy always passes down the principal name, and passes it down in
the form servicetype/hostname@REALM.  So we've been munging the name
gss-proxy passes down into the format the NFSv4.0 callback code expects,
or throwing away the name if we can't.

Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've
also been using the principal name to verify that certain operations are
done as the same principal as was used on the original EXCHANGE_ID call.

For that application, the original name passed down by gss-proxy is also
useful.

Lack of that name in some cases was causing some kerberized NFSv4.1
mount failures in an Active Directory environment.

This fix only works in the gss-proxy case.  The fix for legacy
rpc.svcgssd would be more involved, and rpc.svcgssd already has other
problems in the AD case.

Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 11:36:31 -07:00
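The name munging described in 414ca017a5 (servicetype/hostname@REALM rewritten as servicetype@hostname, or discarded if it does not fit) might look roughly like this. The function name, return convention, and buffer handling are assumptions made for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite "service/host@REALM" as "service@host".
 * Returns 0 on success, -1 if the input does not have that shape or
 * the output buffer is too small. */
static int principal_to_service_at_host(const char *in, char *out,
					size_t outlen)
{
	const char *slash = strchr(in, '/');
	const char *at;
	size_t svc_len, host_len;

	if (!slash || slash == in)
		return -1;              /* no service component */
	at = strchr(slash + 1, '@');
	if (!at || at == slash + 1)
		return -1;              /* no host, or no realm separator */

	svc_len = (size_t)(slash - in);
	host_len = (size_t)(at - (slash + 1));
	if (svc_len + 1 + host_len + 1 > outlen)
		return -1;              /* would not fit, including NUL */

	memcpy(out, in, svc_len);
	out[svc_len] = '@';
	memcpy(out + svc_len + 1, slash + 1, host_len);
	out[svc_len + 1 + host_len] = '\0';
	return 0;
}
```

A non-service principal such as `user@EXAMPLE.COM` has no `/` and is rejected, which corresponds to the "throwing away the name" case. The fix in this commit is about keeping the original name around as well, since the MACH_CRED check needs it.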
J. Bruce Fields
920dd9bb7d nfsd: fix unlikely NULL deref in mach_creds_match
We really shouldn't allow a client to be created with cl_mach_cred set
unless it also has a principal name.

This also allows us to fail such cases immediately on EXCHANGE_ID as
opposed to waiting and incorrectly returning WRONG_CRED on the following
CREATE_SESSION.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 10:39:18 -07:00
J. Bruce Fields
50c7b948ad nfsd: minor consolidation of mach_cred handling code
Minor cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 10:39:18 -07:00
J. Bruce Fields
5004385932 nfsd: helper for dup of possibly NULL string
Technically the initialization in the NULL case isn't even needed as the
only caller already has target zeroed out, but it seems safer to keep
copy_cred generic.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 10:39:17 -07:00
Dan Carpenter
d3f03403a8 nfsd: fix a warning message
The WARN() macro takes a condition and a format string.  The condition
was accidentally left out here so it just prints the function name
instead of the message.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23 12:15:31 -07:00
Julia Lawall
c4cb897462 nfsd: constify nfsd4_callback_ops structure
The nfsd4_callback_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23 12:15:31 -07:00
Andrew Elble
7fc0564e3a nfsd: fix race with open / open upgrade stateids
We observed multiple open stateids on the server for files that
seemingly should have been closed.

nfsd4_process_open2() tests for the existence of a preexisting
stateid. If one is not found, the locks are dropped and a new
one is created. The problem is that init_open_stateid(), which
is also responsible for hashing the newly initialized stateid,
doesn't check to see if another open has raced in and created
a matching stateid. This fix is to enable init_open_stateid() to
return the matching stateid and have nfsd4_process_open2()
swap to that stateid and switch to the open upgrade path.
In testing this patch, coverage to the newly created
path indicates that the race was indeed happening.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:29:45 -05:00
Andrew Elble
34ed9872e7 nfsd: eliminate sending duplicate and repeated delegations
We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.

List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC 5661 9.1.1 and 10.2.

So, let's:

1.) Not hand out duplicate delegations.
2.) Only send them to the client once.

RFC 5661:

9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."

10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client.  A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."

Reported-by: Eric Meddaugh <etmsys@rit.edu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:29:44 -05:00
Jeff Layton
9767feb2c6 nfsd: ensure that seqid morphing operations are atomic wrt to copies
Bruce points out that the increment of the seqid in stateids is not
serialized in any way, so it's possible for racing calls to bump it
twice and end up sending the same stateid. While we don't have any
reports of this problem it _is_ theoretically possible, and could lead
to spurious state recovery by the client.

In the current code, update_stateid is always followed by a memcpy of
that stateid, so we can combine the two operations. For better
atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
the seqid and copying the stateid.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-23 15:57:33 -04:00
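The idea in 9767feb2c6 of combining the seqid bump and the copy under one lock can be sketched in userspace C. The struct layout, the names, and the use of a C11 `atomic_flag` as a stand-in spinlock are all assumptions for illustration; the kernel uses its own spinlock type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

struct stateid {
	uint32_t generation;      /* the seqid */
	unsigned char other[12];  /* opaque identifier */
};

struct stid {
	atomic_flag lock;         /* minimal spinlock stand-in */
	struct stateid id;
};

static void stid_lock(struct stid *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		;  /* spin until the flag was previously clear */
}

static void stid_unlock(struct stid *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Bump the seqid and copy the result out in one critical section, so
 * two racing callers cannot both copy the same generation. */
static void inc_and_copy_stateid(struct stateid *dst, struct stid *s)
{
	stid_lock(s);
	s->id.generation++;
	memcpy(dst, &s->id, sizeof(*dst));
	stid_unlock(s);
}
```

Without the lock, two callers could each increment and then each copy the value left by the second increment, which is the duplicate-stateid case the message describes.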
J. Bruce Fields
4eaea13425 nfsd: improve client_has_state to check for unused openowners
At least in the v4.0 case openowners can hang around for a while after
last close, but they shouldn't really block (for example) a new mount
with a different principal.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-23 15:57:31 -04:00
J. Bruce Fields
2b63482185 nfsd: fix clid_inuse on mount with security change
In bakeathon testing, a Solaris client was getting a CLID_INUSE error when
doing a krb5 mount soon after an auth_sys mount, or vice versa.

That's not really necessary since in this case the old client doesn't
have any state any more:

	http://tools.ietf.org/html/rfc7530#page-103

	"when the server gets a SETCLIENTID for a client ID that
	currently has no state, or it has state but the lease has
	expired, rather than returning NFS4ERR_CLID_INUSE, the server
	MUST allow the SETCLIENTID and confirm the new client ID if
	followed by the appropriate SETCLIENTID_CONFIRM."

This doesn't fix the problem completely since our client_has_state()
check counts openowners left around to handle close replays, which we
should probably just remove in this case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-23 15:57:30 -04:00
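The rule quoted from RFC 7530 in 2b63482185 reduces to a small decision. The sketch below uses hypothetical type and function names and models only the two conditions named in the quote; it is not how the server's client_has_state() is implemented.

```c
#include <assert.h>
#include <stdbool.h>

enum setclientid_result { ALLOW_SETCLIENTID, REPLY_CLID_INUSE };

/* Hypothetical summary of an existing client record that was
 * established under a different principal. */
struct old_client {
	bool has_state;      /* opens, locks, delegations, ... */
	bool lease_expired;
};

/* Per the quoted RFC text: only a client with live state and an
 * unexpired lease blocks the new SETCLIENTID. */
static enum setclientid_result
check_principal_change(const struct old_client *old)
{
	if (!old->has_state || old->lease_expired)
		return ALLOW_SETCLIENTID;
	return REPLY_CLID_INUSE;
}
```

The caveat in the message maps onto `has_state`: as long as lingering openowners count as state, an otherwise idle client still yields CLID_INUSE.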
Jeff Layton
35a92fe877 nfsd: serialize state seqid morphing operations
Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
running in parallel. The server would receive the OPEN_DOWNGRADE first
and check its seqid, but then an OPEN would race in and bump it. The
OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
it should have been rejected since the seqid changed.

The only recourse we have here I think is to serialize operations that
bump the seqid in a stateid, particularly when we're given a seqid in
the call. To address this, we add a new rw_semaphore to the
nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
after looking up the stateid to ensure that nothing else is going to
bump it while we're operating on it.

In the case of OPEN, we do a down_read, as the call doesn't contain a
seqid. Those can run in parallel -- we just need to serialize them when
there is a concurrent OPEN_DOWNGRADE or CLOSE.

LOCK and LOCKU however always take the write lock as there is no
opportunity for parallelizing those.

Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-12 17:31:03 -04:00
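The locking policy described in 35a92fe877 can be summarised as a mapping from operation to lock mode. The enum and function names below are hypothetical; the mapping itself follows the commit text (read lock for OPEN, write lock for operations that carry a seqid).

```c
#include <assert.h>

enum nfs4_op { OP_OPEN, OP_OPEN_DOWNGRADE, OP_CLOSE, OP_LOCK, OP_LOCKU };
enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };

/* Which mode to take on the stateid's rw_semaphore for each op. */
static enum lock_mode stateid_lock_mode(enum nfs4_op op)
{
	switch (op) {
	case OP_OPEN:
		/* The call carries no seqid, so opens may run in parallel. */
		return LOCK_SHARED;
	case OP_OPEN_DOWNGRADE:
	case OP_CLOSE:
	case OP_LOCK:
	case OP_LOCKU:
		/* The seqid is checked and then bumped: serialize. */
		return LOCK_EXCLUSIVE;
	}
	return LOCK_EXCLUSIVE;  /* conservative fallback for unknown ops */
}
```

Note that feb9dad520, listed earlier in this log, later changes the OPEN case to exclusive as well, because parallel opens turned out to race on the access-mode bitmap.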
Andrew Elble
a457974f1b nfsd: deal with DELEGRETURN racing with CB_RECALL
We have observed the server sending recalls for delegation stateids
that have already been successfully returned. Change
nfsd4_cb_recall_done() to return success if the client has returned
the delegation. While this does not completely eliminate the sending
of recalls for delegations that have already been returned, this
does prevent unnecessarily declaring the callback path to be down.

Reported-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-02 10:05:28 -04:00
J. Bruce Fields
f984a7ce58 nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
Somebody with a Solaris client was hitting this case.  We haven't
figured out why yet, and don't have a reproducer.  Meanwhile Frank
noticed that RFC 7530 actually recommends CLID_INUSE for this case.
Unlikely to help the original reporter, but may as well fix it.

Reported-by: Frank Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-01 13:53:40 -04:00
Jeff Layton
3fcbbd244e nfsd: ensure that delegation stateid hash references are only put once
It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.

Have unhash_delegation_locked return a bool that indicates whether it
was already unhashed. In the case of destroy_delegation we only
conditionally put the hash reference if that returns true.

The other callers of unhash_delegation_locked call it while walking
list_heads that shouldn't yet be detached. If we find that it doesn't
return true in those cases, then throw a WARN_ON as that indicates that
we have a partially hashed delegation, and that something is likely very
wrong.

Tested-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:16 -04:00
Jeff Layton
e85687393f nfsd: ensure that the ol stateid hash reference is only put once
When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.

Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.

Reported-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Andrew W Elble <aweits@rit.edu>
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:15 -04:00
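The "put the hash reference only once" pattern from e85687393f and 3fcbbd244e can be sketched with a minimal list implementation. The list helpers mimic the kernel's list_del_init()/list_empty() semantics but are re-implemented here; the struct and function names are hypothetical, and locking is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink the entry and leave it pointing at itself, so a later
 * list_empty() on the entry reports that it is already detached. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

struct ol_stateid {
	struct list_head perfile;  /* link on the per-file list */
	int refcount;
};

/* Returns true only for the caller that actually unhashed the entry. */
static bool unhash_stateid(struct ol_stateid *s)
{
	if (list_empty(&s->perfile))
		return false;
	list_del_init(&s->perfile);
	return true;
}

/* Drop the hash reference at most once, however often this is called.
 * A real implementation would hold a lock around the whole sequence. */
static void release_stateid(struct ol_stateid *s)
{
	if (unhash_stateid(s))
		s->refcount--;
}
```

Calling `release_stateid()` twice on the same entry decrements the count once; the second call sees an empty list_head and does nothing.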
Jeff Layton
51a5456859 nfsd: allow more than one laundry job to run at a time
We can potentially have several nfs4_laundromat jobs running if there
are multiple namespaces running nfsd on the box. Those are effectively
separated from one another though, so I don't see any reason to
serialize them.

Also, create_singlethread_workqueue automatically adds the
WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really
involved in any reclaim paths. I see no need for a rescuer thread.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31 16:32:14 -04:00
J. Bruce Fields
c87fb4a378 lockd: NLM grace period shouldn't block NFSv4 opens
NLM locks don't conflict with NFSv4 share reservations, so we're not
going to learn anything new by waiting for them.

They do conflict with NFSv4 locks and with delegations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-13 10:22:06 -04:00
J. Bruce Fields
9056fff3d5 Merge branch 'for-4.2' into for-4.3 2015-08-10 16:16:03 -04:00
Kinglong Mee
c8623999ff nfsd: Remove unused clientid arguments from find_lockowner_str{_locked}
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:54 -04:00
Kinglong Mee
76f6c9e176 nfsd: Use lk_new_xxx instead of v.new.xxx for nfs4_lockowner
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:53 -04:00
Kinglong Mee
e7969315f4 nfsd: Remove macro LOFF_OVERFLOW
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:52 -04:00
Kinglong Mee
7a5e8d5b5c nfsd: Remove duplicate checking of nfsd_net in nfs4_laundromat()
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:51 -04:00
Kinglong Mee
efde6b4d4e nfsd: Remove unused values in nfs4_setlease()
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:51 -04:00
Kinglong Mee
871860225b nfsd: Remove nfs4_set_claim_prev()
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:50 -04:00
Kinglong Mee
f5e22bb6d9 nfsd: Drop duplicate checking of seqid in nfsd4_create_session()
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:49 -04:00
Kinglong Mee
41eb16702c nfsd: Add missing gen_confirm in nfsd4_setclientid()
Commit 294ac32e99 "nfsd: protect clid and verifier generation with
client_lock" moved gen_confirm() to gen_clid().

After that commit, setclientid will return a bad reply with all-zero
verifier after copy_clid().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:48 -04:00
Kinglong Mee
19311aa835 nfsd: New counter for generating client confirm verifier
If using clientid_counter, it seems possible that gen_confirm could
generate the same verifier for the same client in some situations.

Add a new counter for client confirm verifier to make sure gen_confirm
generates a different verifier on each call for the same clientid.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:47 -04:00
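The change in 19311aa835 amounts to giving the confirm verifier its own counter. The sketch below assumes a verifier made of a timestamp plus a counter value; the struct layout and field names are hypothetical and may not match the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

struct verifier { uint32_t data[2]; };

/* Hypothetical per-server state: one counter for client IDs and a
 * separate one used only for confirm verifiers. */
struct server_state {
	uint32_t clientid_counter;
	uint32_t confirm_counter;
};

/* Each call consumes a fresh counter value, so two verifiers generated
 * within the same second still differ. */
static struct verifier gen_confirm(struct server_state *st)
{
	struct verifier v;

	v.data[0] = (uint32_t)time(NULL);
	v.data[1] = st->confirm_counter++;
	return v;
}
```

Because `clientid_counter` is untouched here, generating a verifier no longer depends on whether a new client ID was allocated in between.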
Kinglong Mee
d50ffded79 nfsd: Fix memory leak of so_owner.data in nfs4_stateowner
v2, new helper nfs4_free_stateowner for freeing so_owner.data and sop

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:46 -04:00
Kinglong Mee
47e970bee7 nfsd: Add layouts checking in client_has_state()
Layout is a state resource, nfsd should check it too.

v2, drop unneeded updating in nfsd4_renew()
v3, fix compile error without CONFIG_NFSD_PNFS

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:46 -04:00
Kinglong Mee
af9dbaf48d nfsd: Fix a memory leak of struct file_lock
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10 16:05:45 -04:00
Jeff Layton
8fcd461db7 nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
Currently, preprocess_stateid_op calls nfs4_check_olstateid which
verifies that the open stateid corresponds to the current filehandle in the
call by calling nfs4_check_fh.

If the stateid is a NFS4_DELEG_STID however, then no such check is done.
This could cause incorrect enforcement of permissions, because the
nfsd_permission() call in nfs4_check_file uses the current
filehandle, but any subsequent IO operation will use the file descriptor
in the stateid.

Move the call to nfs4_check_fh into nfs4_check_file instead so that it
can be done for all stateid types.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
[bfields: moved fh check to avoid NULL deref in special stateid case]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-07-31 16:30:26 -04:00
Christoph Hellwig
af90f707fa nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that.  For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid.  This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters.  In the long run we should probably aim to
cache full file structures used with special stateids instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-22 14:15:03 -04:00
Christoph Hellwig
a0649b2d3f nfsd: refactor nfs4_preprocess_stateid_op
Split out two self contained helpers to make the function more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-06-19 15:39:52 -04:00
Arnd Bergmann
6ac75368e1 nfsd: work around a gcc-5.1 warning
gcc-5.0 warns about a potential uninitialized variable use in nfsd:

fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
   reset_union_bmap_deny(old_deny_bmap, stp);
   ^
fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
  unsigned char old_deny_bmap;
                ^

This is a false positive, the code path that is warned about cannot
actually be reached.

This adds an initialization for the variable to make the warning go
away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-29 11:04:03 -04:00
Christoph Hellwig
fd89145460 nfsd: remove nfsd_close
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04 12:02:43 -04:00
Christoph Hellwig
cba5f62b18 nfsd: fix callback restarts
Checking the rpc_client pointer is not a reliable way to detect
backchannel changes: cl_cb_client is changed only after shutting down
the rpc client, so the condition cl_cb_client == tk_client will always be
true.

Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
the buggy cl_callbacks list and fix the lifetime rules due to double
calls of the ->prepare callback operations method for this retry case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04 12:02:41 -04:00
Sachin Bhamare
8287f009bd nfsd: fix pNFS return on close semantics
For the sake of forgetful clients, the server should return the layouts
to the file system on 'last close' of a file (assuming that there are no
delegations outstanding to that particular client) or on delegreturn
(assuming that there are no opens on a file from that particular
client).

In theory the information is all there in current data structures, but
it's not efficiently available; nfs4_file->fi_ref includes references on
the file across all clients, but we need a per-(client, file) count.
Walking through lots of stateid's to calculate this on each close or
delegreturn would be painful.

This patch introduces infrastructure to maintain per-client opens and
delegation counters on a per-file basis.

[hch: ported to the mainline pNFS support, merged various fixes from Jeff]
Signed-off-by: Sachin Bhamare <sachin.bhamare@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04 12:02:39 -04:00
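The per-(client, file) bookkeeping that 8287f009bd introduces can be pictured with a small sketch. The struct and function names are hypothetical; the two return conditions follow the wording of the commit message (last close with no delegations outstanding, or delegreturn with no opens).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters kept once per (client, file) pair. */
struct clnt_file_counts {
	unsigned int opens;
	unsigned int delegations;
};

/* Returns true if this close should return the client's layouts:
 * it was the last open and no delegation is outstanding. */
static bool on_close(struct clnt_file_counts *c)
{
	assert(c->opens > 0);
	c->opens--;
	return c->opens == 0 && c->delegations == 0;
}

/* Returns true if this delegreturn should return the client's
 * layouts: the client has no opens left on the file. */
static bool on_delegreturn(struct clnt_file_counts *c)
{
	assert(c->delegations > 0);
	c->delegations--;
	return c->opens == 0;
}
```

With counters like these the decision is constant-time on each close or delegreturn, instead of walking every stateid for the file.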
Christoph Hellwig
ebe9cb3bb1 nfsd: fix the check for confirmed openowner in nfs4_preprocess_stateid_op
If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value.  Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04 12:02:38 -04:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Linus Torvalds
860448cf76 Merge branch 'for-4.1' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "A quiet cycle this time; this is basically entirely bugfixes.

  The few that aren't cc'd to stable are cleanup or seemed unlikely to
  affect anyone much"

* 'for-4.1' of git://linux-nfs.org/~bfields/linux:
  uapi: Remove kernel internal declaration
  nfsd: fix nsfd startup race triggering BUG_ON
  nfsd: eliminate NFSD_DEBUG
  nfsd4: fix READ permission checking
  nfsd4: disallow SEEK with special stateids
  nfsd4: disallow ALLOCATE with special stateids
  nfsd: add NFSEXP_PNFS to the exflags array
  nfsd: Remove duplicate macro define for max sec label length
  nfsd: allow setting acls with unenforceable DENYs
  nfsd: NFSD_FAULT_INJECTION depends on DEBUG_FS
  nfsd: remove unused status arg to nfsd4_cleanup_open_state
  nfsd: remove bogus setting of status in nfsd4_process_open2
  NFSD: Use correct reply size calculating function
  NFSD: Using path_equal() for checking two paths
2015-04-24 07:46:05 -07:00
Mark Salter
135dd002c2 nfsd: eliminate NFSD_DEBUG
Commit f895b252d4 ("sunrpc: eliminate RPC_DEBUG") introduced
use of IS_ENABLED() in a uapi header which leads to a build
failure for userspace apps trying to use <linux/nfsd/debug.h>:

   linux/nfsd/debug.h:18:15: error: missing binary operator before token "("
  #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
                ^

Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG
is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG.

Cc: stable@vger.kernel.org
Fixes: f895b252d4 "sunrpc: eliminate RPC_DEBUG"
Signed-off-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-21 16:16:02 -04:00
David Howells
2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Jeff Layton
cae80b305e locks: change lm_get_owner and lm_put_owner prototypes
The current prototypes for these operations are somewhat awkward as they
deal with fl_owners but take struct file_lock arguments. In the future,
we'll want to be able to take references without necessarily dealing
with a struct file_lock.

Change them to take fl_owner_t arguments instead and have the callers
deal with assigning the values to the file_lock structs.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2015-04-03 09:04:04 -04:00
Jeff Layton
4229789993 nfsd: remove unused status arg to nfsd4_cleanup_open_state
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-31 16:46:39 -04:00
Jeff Layton
fc26c3860a nfsd: remove bogus setting of status in nfsd4_process_open2
status is always reset after this (and it doesn't make much sense there
anyway).

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-31 16:46:39 -04:00
J. Bruce Fields
340f0ba1c6 nfsd: return correct lockowner when there is a race on hash insert
alloc_init_lock_stateowner can return an already freed entry if there is
a race to put lockowners in the hashtable.

Noticed by inspection after Jeff Layton fixed the same bug for open
owners.  Depending on client behavior, this one may be trickier to
trigger in practice.

Fixes: c58c6610ec "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:06:16 -04:00
Jeff Layton
c5952338bf nfsd: return correct openowner when there is a race to put one in the hash
alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

In commit 7ffb588086, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).

This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner.  That
has changed now with v4.0 kernels, and this race can now easily occur.

Fixes: 7ffb588086
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:06:06 -04:00
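
The bug described above is a find-or-insert routine that frees its own allocation when it loses the race but still returns that freed pointer. A minimal userspace sketch of the corrected pattern follows. The names `struct owner`, `owner_table`, `alloc_owner`, `find_owner` and `insert_or_reuse_owner` are hypothetical stand-ins rather than the nfsd code, and the locking that the real code needs around the lookup and insert is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an openowner; not the kernel structure. */
struct owner {
	char name[32];
	int refcount;
	struct owner *next;
};

/* A single hash bucket, for brevity. Locking is omitted in this sketch. */
static struct owner *owner_table;

static struct owner *alloc_owner(const char *name)
{
	struct owner *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	snprintf(o->name, sizeof(o->name), "%s", name);
	o->refcount = 1;
	return o;
}

static struct owner *find_owner(const char *name)
{
	struct owner *o;

	for (o = owner_table; o; o = o->next)
		if (strcmp(o->name, name) == 0)
			return o;
	return NULL;
}

/*
 * Insert a freshly allocated owner, unless a matching one was added in
 * the meantime. In that case free the new one, take a reference on the
 * existing one, and return the existing one (never the freed pointer).
 */
static struct owner *insert_or_reuse_owner(struct owner *new)
{
	struct owner *found;

	assert(new);
	found = find_owner(new->name);
	if (found) {
		found->refcount++;
		free(new);
		return found;
	}
	new->next = owner_table;
	owner_table = new;
	return new;
}
```

The caller must use only the returned pointer after the call, since the one it passed in may already have been freed.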
Andrew Elble
c876486be1 nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
commit 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is
protected by clp->cl_lock") removed the use of the reaplist to
clean out clp->cl_revoked. It failed to change list_entry() to
walk clp->cl_revoked.next instead of reaplist.next

Fixes: 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
Cc: stable@vger.kernel.org
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-26 15:32:24 -05:00
Christoph Hellwig
c5c707f96f nfsd: implement pNFS layout recalls
Add support to issue layout recalls to clients.  For now we only support
full-file recalls to get a simple and stable implementation.  This allows us
to embed a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall.  For normal
use cases that do not intend to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:43 +01:00
Christoph Hellwig
9cf514ccfa nfsd: implement pNFS operations
Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure.  It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang off the stateid.  The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused.  As we still need to perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place.  To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export.  Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:42 +01:00
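
The device ID workaround described above can be pictured as a 16-byte identifier (the size of an NFSv4.1 deviceid4) split into a 64-bit lookup index, a 32-bit generation and a 32-bit spare field, backed by a table whose entries are never removed. The sketch below is a userspace illustration under those assumptions. The names `struct device_id`, `map_fsid` and `lookup_fsid`, the fixed-size array and the string fsid are all hypothetical, not the nfsd implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout of the 16-byte device ID described in the commit. */
struct device_id {
	uint64_t fsid_idx;	/* index used to look up the fsid */
	uint32_t generation;	/* bumped when the device changes */
	uint32_t pad;		/* currently unused */
};

_Static_assert(sizeof(struct device_id) == 16, "device ID must be 16 bytes");

#define MAX_MAPPINGS 64

/* Append-only table: entries are never deleted, so indices are never reused. */
static char mappings[MAX_MAPPINGS][32];
static unsigned int nr_mappings;

/* Return the index for this fsid, allocating one if needed; 0 means "table full". */
static uint64_t map_fsid(const char *fsid)
{
	unsigned int i;

	assert(fsid);
	for (i = 0; i < nr_mappings; i++)
		if (strcmp(mappings[i], fsid) == 0)
			return (uint64_t)i + 1;
	if (nr_mappings == MAX_MAPPINGS)
		return 0;
	snprintf(mappings[nr_mappings], sizeof(mappings[0]), "%s", fsid);
	return (uint64_t)++nr_mappings;
}

/* Map an index taken from a device ID back to its fsid, or NULL if unknown. */
static const char *lookup_fsid(uint64_t idx)
{
	if (idx == 0 || idx > nr_mappings)
		return NULL;
	return mappings[idx - 1];
}
```

Because nothing is ever removed, an index handed out once always resolves to the same fsid, which is what lets a later GETDEVICEINFO find the export to authenticate against.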
Christoph Hellwig
4d227fca1b nfsd: make find_any_file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:41 +01:00
Christoph Hellwig
e6ba76e194 nfsd: make find/get/put file available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:41 +01:00
Christoph Hellwig
cd61c52231 nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:40 +01:00
Christoph Hellwig
4d94c2ef20 nfsd: move nfsd_fh_match to nfsfh.h
The pnfs code will need it too.  Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:39 +01:00
Christoph Hellwig
2ab99ee124 fs: track fl_owner for leases
Just like for other lock types we should allow different owners to have
a read lease on a file.  Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 18:09:38 +01:00
J. Bruce Fields
a584143b01 Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20
Christoph's block pnfs patches have some minor dependencies on these
lock patches.
2015-02-02 11:29:29 -05:00
J. Bruce Fields
bbc7f33ac6 nfsd: fix year-2038 nfs4 state problem
Someone with a weird time_t happened to notice this; it shouldn't really
manifest till 2038.  It may not be our only year-2038 problem.

Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 10:29:11 -05:00
Jeff Layton
7448cc37b1 locks: clean up the lm_change prototype
Now that we use standard list_heads for tracking leases, we can have
lm_change take a pointer to the lease to be modified instead of a
double pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:50 -05:00
Jeff Layton
6109c85037 locks: add a dedicated spinlock to protect i_flctx lists
We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:49 -05:00
Jeff Layton
bd61e0a9c8 locks: convert posix locks to file_lock_context
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2015-01-16 16:08:16 -05:00
Rickard Strandqvist
917937025a nfsd: nfs4state: Remove unused function
Remove the function renew_client() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:42 -05:00
Jeff Layton
67db103448 nfsd: fi_delegees doesn't need to be an atomic_t
fi_delegees is always handled under the fi_lock, so there's no need to
use an atomic_t for this field.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:05:35 -05:00
Jeff Layton
94ae1db226 nfsd: fix fi_delegees leak when fi_had_conflict returns true
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.

Change the code to take the reference after the check and only if there
is no conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 13:38:21 -05:00
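
The ordering fix above comes down to where the counter is bumped relative to the conflict check. A minimal sketch, assuming a hypothetical `struct file_state` with a plain counter and a conflict flag (not the nfs4_file structure, and with no locking shown):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-file delegation state. */
struct file_state {
	int delegees;		/* number of delegation references */
	bool had_conflict;
};

/* Buggy ordering: the reference is taken first and leaks on conflict. */
static int set_delegation_buggy(struct file_state *fp)
{
	assert(fp);
	fp->delegees++;
	if (fp->had_conflict)
		return -1;	/* reference is never dropped */
	return 0;
}

/* Fixed ordering: check first, take the reference only if there is no conflict. */
static int set_delegation_fixed(struct file_state *fp)
{
	assert(fp);
	if (fp->had_conflict)
		return -1;
	fp->delegees++;
	return 0;
}
```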
Linus Torvalds
0b233b7c79 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "A comparatively quieter cycle for nfsd this time, but still with two
  larger changes:

   - RPC server scalability improvements from Jeff Layton (using RCU
     instead of a spinlock to find idle threads).

   - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
     Schumaker, enabling fallocate on new clients"

* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
  nfsd4: fix xdr4 count of server in fs_location4
  nfsd4: fix xdr4 inclusion of escaped char
  sunrpc/cache: convert to use string_escape_str()
  sunrpc: only call test_bit once in svc_xprt_received
  fs: nfsd: Fix signedness bug in compare_blob
  sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
  sunrpc: convert to lockless lookup of queued server threads
  sunrpc: fix potential races in pool_stats collection
  sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
  sunrpc: require svc_create callers to pass in meaningful shutdown routine
  sunrpc: have svc_wake_up only deal with pool 0
  sunrpc: convert sp_task_pending flag to use atomic bitops
  sunrpc: move rq_cachetype field to better optimize space
  sunrpc: move rq_splice_ok flag into rq_flags
  sunrpc: move rq_dropme flag into rq_flags
  sunrpc: move rq_usedeferral flag to rq_flags
  sunrpc: move rq_local field to rq_flags
  sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
  nfsd: minor off by one checks in __write_versions()
  sunrpc: release svc_pool_map reference when serv allocation fails
  ...
2014-12-16 15:25:31 -08:00
Daniel Borkmann
87545899b5 net: replace remaining users of arch_fast_hash with jhash
This patch effectively reverts commit 500f808726 ("net: ovs: use CRC32
accelerated flow hash if available"), and other remaining arch_fast_hash()
users such as from nfsd via commit 6282cd5655 ("NFSD: Don't hand out
delegations for 30 seconds after recalling them.") where it has been used
as a hash function for bloom filtering.

While we think that these users are actually not much of concern, it has
been requested to remove the arch_fast_hash() library bits that arose
from [1] entirely as per recent discussion [2]. The main argument is that
using it as a hash may introduce bias due to its linearity (see avalanche
criterion) and thus makes it less clear (though we tried to document that)
when this security/performance trade-off is actually acceptable for a
general purpose library function.

Let's therefore avoid any further confusion on this matter and remove it to
prevent any future accidental misuse of it. For the time being, this is
going to make hashing of flow keys a bit more expensive in the ovs case,
but future work could reevaluate a different hashing discipline.

  [1] https://patchwork.ozlabs.org/patch/299369/
  [2] https://patchwork.ozlabs.org/patch/418756/

Cc: Neil Brown <neilb@suse.de>
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:45 -05:00
Rasmus Villemoes
ef17af2a81 fs: nfsd: Fix signedness bug in compare_blob
Bugs similar to the one in acbbe6fbb2 (kcmp: fix standard comparison
bug) are in rich supply.

In this variant, the problem is that struct xdr_netobj::len has type
unsigned int, so the expression o1->len - o2->len _also_ has type
unsigned int; it has completely well-defined semantics, and the result
is some non-negative integer, which is always representable in a long
long. But this means that if the conditional triggers, we are
guaranteed to return a positive value from compare_blob.

In this case it could be fixed by

-       res = o1->len - o2->len;
+       res = (long long)o1->len - (long long)o2->len;

but I'd rather eliminate the usually broken 'return a - b;' idiom.

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-12-09 11:29:14 -05:00
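
The signedness problem above can be reproduced in userspace. The sketch below assumes a 32-bit unsigned int and a 64-bit long long, and uses a hypothetical `struct blob` in place of struct xdr_netobj. The "fixed" variant shows one way to avoid the `a - b` idiom with explicit comparisons; it is not necessarily the exact form used in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct xdr_netobj: a length plus data. */
struct blob {
	unsigned int len;
	const unsigned char *data;
};

/* Buggy variant: the unsigned difference wraps, so it is never negative. */
static int compare_blob_buggy(const struct blob *o1, const struct blob *o2)
{
	long long res;

	assert(o1 && o2);
	res = o1->len - o2->len;	/* computed in unsigned int, then widened */
	if (res)
		return res > 0 ? 1 : -1;	/* the -1 branch is unreachable */
	return memcmp(o1->data, o2->data, o1->len);
}

/* Variant without the subtraction: compare the lengths directly. */
static int compare_blob_fixed(const struct blob *o1, const struct blob *o2)
{
	assert(o1 && o2);
	if (o1->len < o2->len)
		return -1;
	if (o1->len > o2->len)
		return 1;
	return memcmp(o1->data, o2->data, o1->len);
}
```

With a shorter first blob, the buggy variant reports "greater" while the fixed one reports "less", which matters for any caller that relies on a consistent ordering.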
Jeff Layton
5b095e9992 nfsd: convert nfs4_file searches to use RCU
The global state_lock protects the file_hashtbl, and that has the
potential to be a scalability bottleneck.

Address this by making the file_hashtbl use RCU. Add a rcu_head to the
nfs4_file and use that when freeing ones that have been hashed. In order
to conserve space, we union the fi_rcu field with the fi_delegations
list_head which must be clear by the time the last reference to the file
is dropped.

Convert find_file_locked to use RCU lookup primitives and not to require
that the state_lock be held, and convert find_file to do a lockless
lookup. Convert find_or_add_file to attempt a lockless lookup first, and
then fall back to doing a locked search and insert if that fails to find
anything.

Also, minimize the number of times we need to calculate the hash value
by passing it in as an argument to the search and insert functions, and
optimize the order of arguments in nfsd4_init_file.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-07 16:56:11 -05:00
Chuck Lever
b0d2e42cce NFSD: Always initialize cl_cb_addr
A client may not want to use the back channel on a transport it sent
CREATE_SESSION on, in which case it clears SESSION4_BACK_CHAN.

However, cl_cb_addr should be populated anyway, to be used if the
client binds other connections to this session. If cl_cb_addr is
not initialized, rpc_create() fails when the server attempts to
set up a back channel on such secondary transports.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-10-23 14:05:11 -04:00
Linus Torvalds
ef4a48c513 File locking related changes for v3.18 (pile #1)

Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux

Pull file locking related changes from Jeff Layton:
 "This release is a little more busy for file locking changes than the
  last:

   - a set of patches from Kinglong Mee to fix the lockowner handling in
     knfsd
   - a pile of cleanups to the internal file lease API.  This should get
     us a bit closer to allowing for setlease methods that can block.

  There are some dependencies between mine and Bruce's trees this cycle,
  and I based my tree on top of the requisite patches in Bruce's tree"

* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
  locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
  locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
  locks: set fl_owner for leases to filp instead of current->files
  locks: give lm_break a return value
  locks: __break_lease cleanup in preparation of allowing direct removal of leases
  locks: remove i_have_this_lease check from __break_lease
  locks: move freeing of leases outside of i_lock
  locks: move i_lock acquisition into generic_*_lease handlers
  locks: define a lm_setup handler for leases
  locks: plumb a "priv" pointer into the setlease routines
  nfsd: don't keep a pointer to the lease in nfs4_file
  locks: clean up vfs_setlease kerneldoc comments
  locks: generic_delete_lease doesn't need a file_lock at all
  nfsd: fix potential lease memory leak in nfs4_setlease
  locks: close potential race in lease_get_mtime
  security: make security_file_set_fowner, f_setown and __f_setown void return
  locks: consolidate "nolease" routines
  locks: remove lock_may_read and lock_may_write
  lockd: rip out deferred lock handling from testlock codepath
  NFSD: Get reference of lockowner when coping file_lock
  ...
2014-10-11 13:21:34 -04:00
Jeff Layton
4d01b7f5e7 locks: give lm_break a return value
Christoph suggests:

   "Add a return value to lm_break so that the lock manager can tell the
    core code "you can delete this lease right now".  That gets rid of
    the games with the timeout which require all kinds of race avoidance
    code in the users."

Do that here and have the nfsd lease break routine use it when it detects
that there was a race between setting up the lease and it being broken.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:13 -04:00
Jeff Layton
c45198eda2 locks: move freeing of leases outside of i_lock
There was only one place where we still could free a file_lock while
holding the i_lock -- lease_modify. Add a new list_head argument to the
lm_change operation, pass in a private list when calling it, and fix
those callers to dispose of the list once the lock has been dropped.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:13 -04:00
Jeff Layton
1c7dd2ff43 locks: define a lm_setup handler for leases
...and move the fasync setup into it for fcntl lease calls. At the same
time, change the semantics of how the file_lock double-pointer is
handled. Up until now, on a successful lease return you got a pointer to
the lock on the list. This is bad, since that pointer can no longer be
relied on as valid once the inode->i_lock has been released.

Change the code to instead just zero out the pointer if the lease we
passed in ended up being used. Then the callers can just check to see
if it's NULL after the call and free it if it isn't.

The priv argument has the same semantics. The lm_setup function can
zero the pointer out to signal to the caller that it should not be
freed after the function returns.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
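
The ownership convention described above (the callee zeroes the caller's pointer when it keeps the object, and the caller frees whatever is left) can be sketched in userspace as follows. `struct lease`, the single `installed` slot, `set_lease` and `request_lease` are hypothetical names for illustration only. The real code works on the inode's lease list under a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lease; not struct file_lock. */
struct lease {
	int type;
};

/* One slot standing in for the per-inode lease list. */
static struct lease *installed;

/*
 * If the passed-in lease ends up being used, take ownership of it and
 * zero the caller's pointer. If an existing lease is updated instead,
 * leave the caller's pointer alone so the caller can free it.
 */
static int set_lease(struct lease **lp)
{
	assert(lp && *lp);
	if (installed) {
		installed->type = (*lp)->type;
		return 0;
	}
	installed = *lp;
	*lp = NULL;
	return 0;
}

static int request_lease(int type)
{
	struct lease *fl = malloc(sizeof(*fl));
	int err;

	if (!fl)
		return -1;
	fl->type = type;
	err = set_lease(&fl);
	if (fl)			/* not consumed: still ours to free */
		free(fl);
	return err;
}
```

The caller never holds a pointer to an object on the shared list after the call returns, which is the property the commit is after.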
Jeff Layton
e6f5c78930 locks: plumb a "priv" pointer into the setlease routines
In later patches, we're going to add a new lock_manager_operation to
finish setting up the lease while still holding the i_lock.  To do
this, we'll need to pass a little bit of info in the fcntl setlease
case (primarily an fasync structure). Plumb the extra pointer into
there in advance of that.

We declare this pointer as a void ** to make it clear that this is
private info, and that the caller isn't required to set this unless
the lm_setup specifically requires it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
0c637be884 nfsd: don't keep a pointer to the lease in nfs4_file
Now that we don't need to pass in an actual lease pointer to
vfs_setlease on unlock, we can stop tracking a pointer to the lease in
the nfs4_file.

Switch all of the places that check the fi_lease to check fi_deleg_file
instead. We always set that at the same time so it will have the same
semantics.

Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
0efaa7e82f locks: generic_delete_lease doesn't need a file_lock at all
Ensure that it's OK to pass in a NULL file_lock double pointer on
a F_UNLCK request and convert the vfs_setlease F_UNLCK callers to
do just that.

Finally, turn the BUG_ON in generic_setlease into a WARN_ON_ONCE
with an error return. That's a problem we can handle without
crashing the box if it occurs.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
415b96c5a1 nfsd: fix potential lease memory leak in nfs4_setlease
It's unlikely to ever occur, but if there were already a lease set on
the file then we could end up getting back a different pointer on a
successful setlease attempt than the one we allocated. If that happens,
the one we allocated could leak.

In practice, I don't think this will happen due to the fact that we only
try to set up the lease once per nfs4_file, but this error handling is a
bit more correct given the current lease API.

Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
34549ab09e nfsd: eliminate "to_delegation" define
We now have cb_to_delegation and to_delegation, which do the same thing
and are defined separately in different .c files. Move the
cb_to_delegation definition into a header file and eliminate the
redundant to_delegation definition.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-10-01 12:28:01 -04:00
Christoph Hellwig
0162ac2b97 nfsd: introduce nfsd4_callback_ops
Add a higher level abstraction than the rpc_ops for callback operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-26 16:29:29 -04:00
Christoph Hellwig
f0b5de1b6b nfsd: split nfsd4_callback initialization and use
Split out initializing the nfsd4_callback structure from using it.  For
the NULL callback this gets rid of tons of pointless re-initializations.

Note that I don't quite understand what protects us from running multiple
NULL callbacks at the same time, but at least this change doesn't make
it worse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-26 16:29:28 -04:00
Christoph Hellwig
326129d02a nfsd: introduce a generic nfsd4_cb
Add a helper to queue up a callback.  CB_NULL has a bit of special casing
because it is special in the specification, but all other new callback
operations will be able to share code with this and a few more changes
to refactor the callback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-26 16:29:27 -04:00
J. Bruce Fields
70b2823535 nfsd4: clarify how grace period ends
The grace period is ended in two steps--first userland is notified that
the grace period is now long enough that any clients who have not yet
reclaimed can be safely forgotten, then we flip the switch that forbids
reclaims and allows new opens.  I had to think a bit to convince myself
that the ordering was right here.  Document it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-17 16:33:19 -04:00
J. Bruce Fields
bea57fe45b nfsd4: stop grace_time update at end of grace period
The attempt to automatically set a new grace period time at the end of
the grace period isn't really helpful.  We'll probably shut down and
reboot before we actually make use of the new grace period time anyway.
So may as well leave it up to the init system to get this right.

This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime
change from what they set it to.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-09-17 16:33:18 -04:00
Jeff Layton
d4318acd5d nfsd: pass extra info in env vars to upcalls to allow for early grace period end
In order to support lifting the grace period early, we must tell
nfsdcltrack what sort of client the "create" upcall is for. We can't
reliably tell if a v4.0 client has completed reclaiming, so we can only
lift the grace period once all the v4.1+ clients have issued a
RECLAIM_COMPLETE and if there are no v4.0 clients.

Also, in order to lift the grace period, we have to tell userland when
the grace period started so that it can tell whether a RECLAIM_COMPLETE
has been issued for each client since then.

Since this is all optional info, we pass it along in environment
variables to the "init" and "create" upcalls. By doing this, we don't
need to revise the upcall format. The UMH upcall can simply make use of
this info if it happens to be present. If it's not then it can just
avoid lifting the grace period early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-17 16:33:15 -04:00
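
The rule stated above for lifting the grace period early can be written as a small predicate: no v4.0 clients, and every v4.1+ client has sent RECLAIM_COMPLETE. The sketch below is a userspace illustration. `struct client_record` and `can_end_grace_early` are hypothetical names, and treating an empty client list as "safe to lift" is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record that a tracking daemon might keep per client. */
struct client_record {
	unsigned int minorversion;	/* 0 for NFSv4.0, 1 for NFSv4.1, ... */
	bool reclaim_complete;		/* RECLAIM_COMPLETE seen since grace start */
};

/*
 * Return true only if there are no v4.0 clients and every v4.1+ client
 * has completed reclaim. An empty list returns true (assumption).
 */
static bool can_end_grace_early(const struct client_record *clients, size_t n)
{
	size_t i;

	assert(clients || n == 0);
	for (i = 0; i < n; i++) {
		if (clients[i].minorversion == 0)
			return false;	/* cannot tell when a v4.0 client is done */
		if (!clients[i].reclaim_complete)
			return false;
	}
	return true;
}
```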
Jeff Layton
7f5ef2e900 nfsd: add a v4_end_grace file to /proc/fs/nfsd
Allow a privileged userland process to end the v4 grace period early.
Writing "Y", "y", or "1" to the file will cause the v4 grace period to
be lifted.  The basic idea with this will be to allow the userland
client tracking program to lift the grace period once it knows that no
more clients will be reclaiming state.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-17 16:33:14 -04:00
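
A privileged userland program could lift the grace period by writing one of the accepted values to the new file. The helper below is a minimal sketch: `write_end_grace` is a hypothetical name, and the path is passed in by the caller, who would presumably use /proc/fs/nfsd/v4_end_grace on a kernel that has this patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "Y" to the given file. Returns 0 on success and -1 if the file
 * cannot be opened or the write fails.
 */
static int write_end_grace(const char *path)
{
	FILE *f;
	int ok;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs("Y", f) != EOF;
	if (fclose(f) != 0)	/* buffered write errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here because the data is buffered, so a rejected write may only be reported when the stream is flushed.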
Jeff Layton
3b3e7b7223 nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
As stated in RFC 5661, section 18.51.3:

    Once a RECLAIM_COMPLETE is done, there can be no further reclaim
    operations for locks whose scope is defined as having completed
    recovery.  Once the client sends RECLAIM_COMPLETE, the server will
    not allow the client to do subsequent reclaims of locking state for
    that scope and, if these are attempted, will return
    NFS4ERR_NO_GRACE.

Ensure that we enforce that requirement.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-17 16:33:13 -04:00
Jeff Layton
919b8049f0 nfsd: remove redundant boot_time parm from grace_done client tracking op
Since it's stored in nfsd_net, we don't need to pass it in separately.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-17 16:33:12 -04:00
Kinglong Mee
aef9583b23 NFSD: Get reference of lockowner when coping file_lock
v5: using nfs4_get_stateowner() instead of an inline function
v3: Update based on Jeff's comments
v2: Fix bad use of struct file_lock_operations for handling the owner

Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:09 -04:00
Kinglong Mee
b5971afa0b NFSD: New helper nfs4_get_stateowner() for atomic_inc sop reference
v5: same as the first version

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:09 -04:00
Kinglong Mee
6cd906627b NFSD: Remove duplicate initialization of file_lock
locks_alloc_lock() has initialized struct file_lock, no need to
re-initialize it here.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-28 15:58:35 -04:00
Jeff Layton
afbda402a0 nfsd: call nfs4_put_deleg_lease outside of state_lock
Currently, we hold the state_lock when releasing the lease. That's
potentially problematic in the future if we allow for setlease methods
that can sleep. Move the nfs4_put_deleg_lease call out of the delegation
unhashing routine (which was always a bit goofy anyway), and into the
unlocked sections of the callers of unhash_delegation_locked.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17 12:00:14 -04:00
Jeff Layton
6bcc034eac nfsd: protect lease-related nfs4_file fields with fi_lock
Currently these fields are protected with the state_lock, but that
doesn't really make a lot of sense. These fields are "private" to the
nfs4_file, and can be protected with the more granular fi_lock.

The fi_lock is already held when setting these fields. Make the code
hold the fp->fi_lock when clearing the lease-related fields in the
nfs4_file, and no longer require that the state_lock be held when
calling into this function.

To prevent lock inversion with the i_lock, we also move the vfs_setlease
and fput calls outside of the fi_lock. This also sets us up for allowing
vfs_setlease calls to block in the future.

Finally, remove a redundant NULL pointer check. unhash_delegation_locked
locks the fp->fi_lock prior to that check, so fp in that function must
never be NULL.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-17 12:00:13 -04:00
Jeff Layton
b687f6863e nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 15:00:54 -04:00
Jeff Layton
74cf76df0f nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:20 -04:00
Jeff Layton
dab6ef2415 nfsd: remove nfs4_lock_state: nfs4_laundromat
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:20 -04:00
Trond Myklebust
05149dd4dc nfsd: Remove nfs4_lock_state(): reclaim_complete()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:19 -04:00
Trond Myklebust
cb86fb1428 nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:18 -04:00
Trond Myklebust
3974552dce nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session()
Also destroy_clientid and bind_conn_to_session.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:17 -04:00
Trond Myklebust
3234975f47 nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:16 -04:00
Trond Myklebust
084d4d4549 nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:15 -04:00
Trond Myklebust
36626a2ecf nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:14 -04:00
Trond Myklebust
2dd7f2ad4e nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:13 -04:00
Trond Myklebust
51f5e78355 nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:12 -04:00
Trond Myklebust
e7d5dc19ce nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:12 -04:00
Trond Myklebust
c2d1d6a8f0 nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:11 -04:00
Jeff Layton
285abdee53 nfsd: remove old fault injection infrastructure
Remove the old nfsd_for_n_state function and move nfsd_find_client
higher up into the file to get rid of forward declaration. Remove
the struct nfsd_fault_inject_op arguments from the operations as
they are no longer needed by any of them.

Finally, remove the old "standard" get and set routines, which
also eliminates the client_mutex from this code.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:10 -04:00
Jeff Layton
98d5c7c5bd nfsd: add more granular locking to *_delegations fault injectors
...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:09 -04:00
Jeff Layton
82e05efaec nfsd: add more granular locking to forget_openowners fault injector
...instead of relying on the client_mutex.

Also, fix up the printk output that is generated when the file is read.
It currently says that it's reporting the number of open files, but
it's actually reporting the number of openowners.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:08 -04:00
Jeff Layton
016200c373 nfsd: add more granular locking to forget_locks fault injector
...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:07 -04:00
Jeff Layton
3738d50e7f nfsd: add a list_head arg to nfsd_foreach_client_lock
In a later patch, we'll want to collect the locks onto a list for later
destruction. If "func" is defined and "collect" is defined, then we'll
add the lock stateid to the list.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:06 -04:00
Jeff Layton
69fc9edf98 nfsd: add nfsd_inject_forget_clients
...which uses the client_lock for protection instead of client_mutex.
Also remove nfsd_forget_client as there are no more callers.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:05 -04:00
Jeff Layton
a0926d1527 nfsd: add a forget_client set_clnt routine
...that relies on the client_lock instead of client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:04 -04:00
Jeff Layton
7ec0e36f1a nfsd: add a forget_clients "get" routine with proper locking
Add a new "get" routine for forget_clients that relies on the
client_lock instead of the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:04 -04:00
Jeff Layton
294ac32e99 nfsd: protect clid and verifier generation with client_lock
The clid counter is a global counter currently. Move it to be a per-net
property so that it can be properly protected by the nn->client_lock
instead of relying on the client_mutex.

The verifier generator is also potentially racy if there are two
simultaneous callers. Generate the verifier when we generate the clid
value, so it's also created under the client_lock. With this, there's
no need to keep two counters as they'd always be in sync anyway, so
just use the clientid_counter for both.

As Trond points out, what would be best is to eventually move this
code to use IDR instead of the hash tables. That would also help ensure
uniqueness, but that's probably best done as a separate project.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:02 -04:00
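The single-counter idea in 294ac32e99 can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: `struct net_state`, `struct client_ids` and `gen_client_ids` are hypothetical names, and a pthread mutex stands in for the nn->client_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for per-net nfsd state (not the kernel struct). */
struct net_state {
    pthread_mutex_t client_lock;  /* stands in for the nn->client_lock spinlock */
    uint32_t boot_time;           /* illustrative: distinguishes server instances */
    uint32_t clientid_counter;    /* one counter feeds both clid and verifier */
};

/* Hypothetical output record holding both generated values. */
struct client_ids {
    uint32_t clid_boot;
    uint32_t clid_id;
    uint8_t verifier[8];
};

/* Generate the clientid and the verifier inside one critical section. */
static void gen_client_ids(struct net_state *nn, struct client_ids *out)
{
    uint32_t id;

    assert(nn != NULL && out != NULL);

    pthread_mutex_lock(&nn->client_lock);
    id = nn->clientid_counter++;
    out->clid_boot = nn->boot_time;
    out->clid_id = id;
    /* The verifier is built from the same counter snapshot as the clid. */
    memcpy(out->verifier, &nn->boot_time, sizeof(nn->boot_time));
    memcpy(out->verifier + 4, &id, sizeof(id));
    pthread_mutex_unlock(&nn->client_lock);
}
```

Because both values come from one increment taken under the lock, two simultaneous callers in this sketch cannot end up with the same clid or verifier.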
Jeff Layton
fd699b8a48 nfsd: don't destroy clients that are busy
It's possible that we'll have an in-progress call on some of the clients
while a rogue EXCHANGE_ID or DESTROY_CLIENTID call comes in. Be sure to
try and mark the client expired first, so that the refcount is
respected.

This will only be a problem once the client_mutex is removed.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:01 -04:00
Kinglong Mee
fb94d766af NFSD: Put the reference of nfs4_file when freeing stid
After testing NFSv4 locking, I restarted the nfsd service and got the
following messages:

[ 5677.403419] nfsd: last server has exited, flushing export cache
[ 5677.463728] =============================================================================
[ 5677.463942] BUG nfsd4_files (Tainted: G    B      OE): Objects remaining in nfsd4_files on kmem_cache_close()
[ 5677.464055] -----------------------------------------------------------------------------

[ 5677.464203] INFO: Slab 0xffffea0000233400 objects=28 used=1 fp=0xffff880008cd3d98 flags=0x3ffc0000004080
[ 5677.464318] CPU: 0 PID: 3772 Comm: rmmod Tainted: G    B      OE 3.16.0-rc2+ #29
[ 5677.464420] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 5677.464538]  0000000000000000 0000000036af2c9f ffff88000ce97d68 ffffffff816eacfa
[ 5677.464643]  ffffea0000233400 ffff88000ce97e40 ffffffff811cda44 ffffffff00000020
[ 5677.464774]  ffff88000ce97e50 ffff88000ce97e00 656a624f00000008 616d657220737463
[ 5677.464875] Call Trace:
[ 5677.464925]  [<ffffffff816eacfa>] dump_stack+0x45/0x56
[ 5677.464983]  [<ffffffff811cda44>] slab_err+0xb4/0xe0
[ 5677.465040]  [<ffffffff811d0457>] ? __kmalloc+0x117/0x290
[ 5677.465099]  [<ffffffff81100eec>] ? on_each_cpu_cond+0xac/0xf0
[ 5677.465158]  [<ffffffff811d1bc0>] ? kmem_cache_close+0x110/0x2e0
[ 5677.465218]  [<ffffffff811d1be0>] kmem_cache_close+0x130/0x2e0
[ 5677.465279]  [<ffffffff8135a0c1>] ? kobject_cleanup+0x91/0x1b0
[ 5677.465338]  [<ffffffff811d22be>] __kmem_cache_shutdown+0xe/0x10
[ 5677.465399]  [<ffffffff8119bd28>] kmem_cache_destroy+0x48/0x100
[ 5677.465466]  [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd]
[ 5677.465530]  [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd]
[ 5677.465589]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 5677.465649]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 5677.465759]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b
[ 5677.465822] INFO: Object 0xffff880008cd0000 @offset=0
[ 5677.465882] INFO: Allocated in nfsd4_process_open1+0x61/0x350 [nfsd] age=7599 cpu=0 pid=3253
[ 5677.466115]  __slab_alloc+0x3b0/0x4b1
[ 5677.466166]  kmem_cache_alloc+0x1e4/0x240
[ 5677.466220]  nfsd4_process_open1+0x61/0x350 [nfsd]
[ 5677.466276]  nfsd4_open+0xee/0x860 [nfsd]
[ 5677.466329]  nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
[ 5677.466384]  nfsd_dispatch+0xbb/0x200 [nfsd]
[ 5677.466447]  svc_process_common+0x453/0x6f0 [sunrpc]
[ 5677.466506]  svc_process+0x103/0x170 [sunrpc]
[ 5677.466559]  nfsd+0x117/0x190 [nfsd]
[ 5677.466609]  kthread+0xd8/0xf0
[ 5677.466656]  ret_from_fork+0x7c/0xb0
[ 5677.466775] kmem_cache_destroy nfsd4_files: Slab cache still has objects
[ 5677.466839] CPU: 0 PID: 3772 Comm: rmmod Tainted: G    B      OE 3.16.0-rc2+ #29
[ 5677.466937] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 5677.467049]  0000000000000000 0000000036af2c9f ffff88000ce97eb0 ffffffff816eacfa
[ 5677.467150]  ffff880020bb2d00 ffff88000ce97ed0 ffffffff8119bdd9 0000000000000000
[ 5677.467250]  ffffffffa06065c0 ffff88000ce97ee0 ffffffffa05ef78d ffff88000ce97ef0
[ 5677.467351] Call Trace:
[ 5677.467397]  [<ffffffff816eacfa>] dump_stack+0x45/0x56
[ 5677.467454]  [<ffffffff8119bdd9>] kmem_cache_destroy+0xf9/0x100
[ 5677.467516]  [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd]
[ 5677.467579]  [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd]
[ 5677.467639]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 5677.467765]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 5677.467826]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Fixes: 11b9164ada "nfsd: Add a struct nfs4_file field to struct nfs4_stid"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:53:36 -04:00
Jeff Layton
7abea1e8e8 nfsd: don't destroy client if mark_client_expired_locked fails
If it fails, the client is in use, so destroying it would be bad.
Currently, the client_mutex prevents this from happening, but once we
remove it, we can no longer destroy the client unconditionally.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:26 -04:00
Jeff Layton
97403d95e1 nfsd: move unhash_client_locked call into mark_client_expired_locked
All the callers except for the fault injection code call it directly
afterward, and in the fault injection case it won't hurt to do so
anyway.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:25 -04:00
Jeff Layton
217526e7ec nfsd: protect the close_lru list and oo_last_closed_stid with client_lock
Currently, it's protected by the client_mutex. Move it so that the list
and the fields in the openowner are protected by the client_lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:24 -04:00
Trond Myklebust
0a880a28f8 nfsd: Add lockdep assertions to document the nfs4_client/session locking
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:23 -04:00
Trond Myklebust
3e339f964b nfsd: Ensure lookup_clientid() takes client_lock
Ensure that the client lookup is done safely under the client_lock, so
we're not relying on the client_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:23 -04:00
Trond Myklebust
6b10ad193d nfsd: Protect nfsd4_destroy_clientid using client_lock
...instead of relying on the client_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:22 -04:00
Jeff Layton
d20c11d86d nfsd: Protect session creation and client confirm using client_lock
In particular, we want to ensure that the move_to_confirmed() is
protected by the nn->client_lock spin lock, so that we can use that when
looking up the clientid etc. instead of relying on the client_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:21 -04:00
Trond Myklebust
3dbacee6e1 nfsd: Protect unconfirmed client creation using client_lock
...instead of relying on the client_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:20 -04:00
Trond Myklebust
5cc40fd7b6 nfsd: Move create_client() call outside the lock
For efficiency reasons, and because we want to use spin locks instead
of relying on the client_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:20 -04:00
Trond Myklebust
425510f5c8 nfsd: Don't require client_lock in free_client
The struct nfs4_client is supposed to be invisible and unreferenced
before it gets here.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:19 -04:00
Trond Myklebust
4864af97e0 nfsd: Ensure that the laundromat unhashes the client before releasing locks
If we leave the client on the confirmed/unconfirmed tables, and leave
the sessions visible on the sessionid_hashtbl, then someone might
find them before we've had a chance to destroy them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:18 -04:00
Trond Myklebust
4beb345b37 nfsd: Ensure struct nfs4_client is unhashed before we try to destroy it
When we remove the client_mutex protection, we will need to ensure
that it can't be found by other threads while we're destroying it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-01 16:28:17 -04:00
Jeff Layton
4ae098d327 nfsd: rename unhash_generic_stateid to unhash_ol_stateid
...to better match other functions that deal with open/lock stateids.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:31 -04:00
Jeff Layton
d83017f94c nfsd: don't thrash the cl_lock while freeing an open stateid
When we remove the client_mutex, we'll have a potential race between
FREE_STATEID and CLOSE.

The root of the problem is that we are walking the st_locks list,
dropping the spinlock and then trying to release the persistent
reference to the lockstateid. In between, a FREE_STATEID call can come
along and take the lock, find the stateid and then try to put the
reference. That leads to a double put.

Fix this by not releasing the cl_lock in order to release each lock
stateid. Use put_generic_stateid_locked to unhash them and gather them
onto a list, and free_ol_stateid_reaplist to free any that end up on the
list.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:31 -04:00
Jeff Layton
2c41beb0e5 nfsd: reduce cl_lock thrashing in release_openowner
Releasing an openowner is a bit inefficient as it can potentially thrash
the cl_lock if you have a lot of stateids attached to it. Once we remove
the client_mutex, it'll also potentially be dangerous to do this.

Add some functions that split the work of putting a generic stateid
reference: the parts that must be done under the cl_lock are done under
a single lock hold, and the part that must be done outside it is
deferred until the lock is dropped.

First we unhash each open stateid. Then we call
put_generic_stateid_locked which will put the reference to an
nfs4_ol_stateid. If it turns out to be the last reference, it'll go
ahead and remove the stid from the IDR tree and put it onto the reaplist
using the st_locks list_head.

Then, after dropping the lock we'll call free_ol_stateid_reaplist to
walk the list of stateids that are fully unhashed and ready to be freed,
and free each of them. This function can sleep, so it must be done
outside any spinlocks.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:30 -04:00
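The two-phase "reaplist" pattern described in 2c41beb0e5 can be sketched in user space roughly as below. It is a simplified model under assumptions, not the kernel implementation: `struct stid`, `struct owner`, `release_owner_collect` and `free_reaplist` are hypothetical names, a pthread mutex stands in for the cl_lock spinlock, and the IDR removal step is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical miniature of a stateid; field names are illustrative only. */
struct stid {
    int refcount;        /* protected by cl_lock in this sketch */
    int hashed;          /* nonzero while lookups can still find it */
    struct stid *next;   /* owner list linkage, reused for the reaplist */
};

struct owner {
    pthread_mutex_t cl_lock;  /* stands in for the clp->cl_lock spinlock */
    struct stid *stateids;
};

/* Phase 1: one lock hold; unhash each stateid and collect last references. */
static struct stid *release_owner_collect(struct owner *o)
{
    struct stid *reaplist = NULL, *s, *next;

    assert(o != NULL);

    pthread_mutex_lock(&o->cl_lock);
    for (s = o->stateids; s != NULL; s = next) {
        next = s->next;
        s->hashed = 0;
        if (--s->refcount == 0) {
            s->next = reaplist;   /* last reference: defer the free */
            reaplist = s;
        } else {
            s->next = NULL;       /* an in-flight user still holds it */
        }
    }
    o->stateids = NULL;
    pthread_mutex_unlock(&o->cl_lock);
    return reaplist;
}

/* Phase 2: runs with no lock held, so work that may sleep is safe here. */
static int free_reaplist(struct stid *reaplist)
{
    int freed = 0;

    while (reaplist != NULL) {
        struct stid *s = reaplist;

        reaplist = s->next;
        free(s);
        freed++;
    }
    return freed;
}
```

The lock is taken once for the whole walk instead of once per stateid, and nothing is freed until it has been dropped.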
Jeff Layton
fc5a96c3b7 nfsd: close potential race in nfsd4_free_stateid
Once we remove the client_mutex, it'll be possible for the sc_type of a
lock stateid to change after it's found and checked, but before we can
go to destroy it. If that happens, we can end up putting the persistent
reference to the stateid more than once, and unhash it more than once.

Fix this by unhashing the lock stateid prior to dropping the cl_lock but
after finding it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:29 -04:00
Jeff Layton
3c1c995cc2 nfsd: optimize destroy_lockowner cl_lock thrashing
Reduce the cl_lock thrashing in destroy_lockowner. Unhash all of the
lockstateids on the lockowner's list. Put the reference under the lock
and see if it was the last one. If so, then add it to a private list
to be destroyed after we drop the lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:28 -04:00
Jeff Layton
a819ecc1bb nfsd: add locking to stateowner release
Once we remove the client_mutex, we'll need to properly protect
the stateowner reference counts using the cl_lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:27 -04:00
Jeff Layton
882e9d25e1 nfsd: clean up and reorganize release_lockowner
Do more within the main loop, and simplify the function a bit. Also,
there's no need to take a stateowner reference unless we're going to call
release_lockowner.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:27 -04:00
Trond Myklebust
d4f0489f38 nfsd: Move the open owner hash table into struct nfs4_client
Preparation for removing the client_mutex.

Convert the open owner hash table into a per-client table and protect it
using the nfs4_client->cl_lock spin lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:26 -04:00
Trond Myklebust
c58c6610ec nfsd: Protect adding/removing lock owners using client_lock
Once we remove client mutex protection, we'll need to ensure that
stateowner lookup and creation are atomic between concurrent compounds.
Ensure that alloc_init_lock_stateowner checks the hashtable under the
client_lock before adding a new element.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:25 -04:00
Trond Myklebust
7ffb588086 nfsd: Protect adding/removing open state owners using client_lock
Once we remove client mutex protection, we'll need to ensure that
stateowner lookup and creation are atomic between concurrent compounds.
Ensure that alloc_init_open_stateowner checks the hashtable under the
client_lock before adding a new element.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:24 -04:00
Jeff Layton
b401be22b5 nfsd: don't allow CLOSE to proceed until refcount on stateid drops
Once we remove client_mutex protection, it'll be possible to have an
in-flight operation using an openstateid when a CLOSE call comes in.
If that happens, we can't just put the sc_file reference and clear its
pointer without risking an oops.

Fix this by ensuring that v4.0 CLOSE operations wait for the refcount
to drop before proceeding to do so.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:23 -04:00
Jeff Layton
d3134b1049 nfsd: make openstateids hold references to their openowners
Change it so that only openstateids hold persistent references to
openowners. References can still be held by compounds in progress.

With this, we can get rid of NFS4_OO_NEW. It's possible that we
will create a new openowner in the process of doing the open, but
something later fails. In the meantime, another task could find
that openowner and start using it on a successful open. If that
occurs we don't necessarily want to tear it down, just put the
reference that the failing compound holds.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:23 -04:00
Jeff Layton
5adfd8850b nfsd: clean up refcounting for lockowners
Ensure that lockowner references are only held by lockstateids and
operations that are in-progress. With this, we can get rid of
release_lockowner_if_empty, which will be racy once we remove
client_mutex protection.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:22 -04:00
Trond Myklebust
e4f1dd7fc2 nfsd: Make lock stateid take a reference to the lockowner
A necessary step toward client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:21 -04:00
Jeff Layton
8f4b54c53f nfsd: add an operation for unhashing a stateowner
Allow stateowners to be unhashed and destroyed when the last reference
is put. The unhashing must be idempotent. In a future patch, we'll add
some locking around it, but for now it's only protected by the
client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:20 -04:00
Jeff Layton
5db1c03feb nfsd: clean up lockowner refcounting when finding them
Ensure that when finding or creating a lockowner, we get a
reference to it. For now, we also take an extra reference when a
lockowner is created that can be put when release_lockowner is called,
but we'll remove that in a later patch once we change how references are
held.

Since we no longer destroy lockowners in the event of an error in
nfsd4_lock, we must change how the seqid gets bumped in the lk_is_new
case. Instead of doing so on creation, do it manually in nfsd4_lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:20 -04:00
Jeff Layton
58fb12e6a4 nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache
We don't want to rely on the client_mutex for protection in the case of
NFSv4 open owners. Instead, we add a mutex that will only be taken for
NFSv4.0 state mutating operations, and that will be released once the
entire compound is done.

Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay
take a reference to the stateowner when they are using it for NFSv4.0
open and lock replay caching.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:19 -04:00
Jeff Layton
6b180f0b57 nfsd: Add reference counting to state owners
The way stateowners are managed today is somewhat awkward. They need to
be explicitly destroyed, even though the stateids reference them. This
will be particularly problematic when we remove the client_mutex.

We may create a new stateowner and attempt to open a file or set a lock,
and have that fail. In the meantime, another RPC may come in that uses
that same stateowner and succeed. We can't have the first task tearing
down the stateowner in that situation.

To fix this, we need to change how stateowners are tracked altogether.
Refcount them and only destroy them once all stateids that reference
them have been destroyed. This patch starts by adding the refcounting
necessary to do that.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:18 -04:00
Trond Myklebust
2d3f96689f nfsd: Migrate the stateid reference into nfs4_find_stateid_by_type()
Allow nfs4_find_stateid_by_type to take the stateid reference, while
still holding the &cl->cl_lock. Necessary step toward client_mutex
removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:17 -04:00
Trond Myklebust
fd9110113c nfsd: Migrate the stateid reference into nfs4_lookup_stateid()
Allow nfs4_lookup_stateid to take the stateid reference, instead
of having all the callers do so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:16 -04:00
Trond Myklebust
4cbfc9f704 nfsd: Migrate the stateid reference into nfs4_preprocess_seqid_op
Allow nfs4_preprocess_seqid_op to take the stateid reference, instead
of having all the callers do so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:15 -04:00
Trond Myklebust
0667b1e9d8 nfsd: Add reference counting to nfs4_preprocess_confirmed_seqid_op
Ensure that all the callers put the open stateid after use.
Necessary step toward client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:15 -04:00
Trond Myklebust
2585fc7958 nfsd: nfsd4_open_confirm() must reference the open stateid
Ensure that nfsd4_open_confirm() keeps a reference to the open
stateid until it is done working with it.

Necessary step toward client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:14 -04:00
Trond Myklebust
8a0b589d8f nfsd: Prepare nfsd4_close() for open stateid referencing
Prepare nfsd4_close for a future where nfs4_preprocess_seqid_op()
hands it a fully referenced open stateid. Necessary step toward
client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:13 -04:00
Trond Myklebust
d6f2bc5dcf nfsd: nfsd4_process_open2() must reference the open stateid
Ensure that nfsd4_process_open2() keeps a reference to the open
stateid until it is done working with it. Necessary step toward
client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:12 -04:00
Trond Myklebust
dcd94cc2e7 nfsd: nfsd4_process_open2() must reference the delegation stateid
Ensure that nfsd4_process_open2() keeps a reference to the delegation
stateid until it is done working with it. Necessary step toward
client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:11 -04:00
Trond Myklebust
67cb1279be nfsd: Ensure that nfs4_open_delegation() references the delegation stateid
Ensure that nfs4_open_delegation() keeps a reference to the delegation
stateid until it is done working with it. Necessary step toward
client_mutex removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:11 -04:00
Trond Myklebust
858cc57336 nfsd: nfsd4_locku() must reference the lock stateid
Ensure that nfsd4_locku() keeps a reference to the lock stateid
until it is done working with it. Necessary step toward client_mutex
removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:10 -04:00
Trond Myklebust
3d0fabd5a4 nfsd: Add reference counting to lock stateids
Ensure that nfsd4_lock() references the lock stateid while it is
manipulating it. Not currently necessary, but will be once the
client_mutex is removed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:09 -04:00
Jeff Layton
1af71cc801 nfsd: ensure atomicity in nfsd4_free_stateid and nfsd4_validate_stateid
Hold the cl_lock over the bulk of these functions. In addition to
ensuring that the stateids aren't freed prematurely, this will also help
prevent a potential race that could be introduced later. Once we remove the
client_mutex, it'll be possible for FREE_STATEID and CLOSE to race and
for both to try to put the "persistent" reference to the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:08 -04:00
Jeff Layton
356a95ece7 nfsd: clean up races in lock stateid searching and creation
Preparation for removal of the client_mutex.

Currently, no lock aside from the client_mutex is held when calling
find_lock_state. Ensure that the cl_lock is held by adding a lockdep
assertion.

Once we remove the client_mutex, it'll be possible for another thread to
race in and insert a lock state for the same file after we search but
before we insert a new one. Ensure that doesn't happen by redoing the
search after allocating a new stid that we plan to insert. If one is
found just put the one that was allocated.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:07 -04:00
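The "search, allocate, then search again before inserting" approach from 356a95ece7 might look like the following user-space sketch. All names here (`struct lock_stid`, `struct client`, `find_or_create_lock_state`) are hypothetical, a pthread mutex stands in for the cl_lock spinlock, and the lockdep assertion mentioned in the commit has no equivalent in this model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical miniature lock stateid, keyed by an illustrative file id. */
struct lock_stid {
    int file_id;
    struct lock_stid *next;
};

struct client {
    pthread_mutex_t cl_lock;   /* stands in for the clp->cl_lock spinlock */
    struct lock_stid *locks;
};

/* Caller must hold cl_lock. */
static struct lock_stid *find_lock_state(struct client *clp, int file_id)
{
    struct lock_stid *s;

    for (s = clp->locks; s != NULL; s = s->next)
        if (s->file_id == file_id)
            return s;
    return NULL;
}

static struct lock_stid *find_or_create_lock_state(struct client *clp,
                                                   int file_id, int *created)
{
    struct lock_stid *found, *new_stid;

    assert(created != NULL);
    *created = 0;

    pthread_mutex_lock(&clp->cl_lock);
    found = find_lock_state(clp, file_id);
    pthread_mutex_unlock(&clp->cl_lock);
    if (found != NULL)
        return found;

    /* Allocate with the lock dropped, since allocation may block. */
    new_stid = malloc(sizeof(*new_stid));
    if (new_stid == NULL)
        return NULL;
    new_stid->file_id = file_id;

    pthread_mutex_lock(&clp->cl_lock);
    found = find_lock_state(clp, file_id);   /* redo the search */
    if (found == NULL) {
        new_stid->next = clp->locks;
        clp->locks = new_stid;
        found = new_stid;
        new_stid = NULL;
        *created = 1;
    }
    pthread_mutex_unlock(&clp->cl_lock);

    free(new_stid);   /* lost the race: drop our copy (free(NULL) is a no-op) */
    return found;
}
```

The second search is what closes the window between the first lookup and the insert; the losing thread simply discards its allocation.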
Jeff Layton
1c755dc1ad nfsd: Add locking to protect the state owner lists
Change to using the clp->cl_lock for this. For now, there's a lot of
cl_lock thrashing, but in later patches we'll eliminate that and close
the potential races that can occur when releasing the cl_lock while
walking the lists. For now, the client_mutex prevents those races.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:20:07 -04:00
Jeff Layton
b49e084d8c nfsd: do filp_close in sc_free callback for lock stateids
Releasing locks when we unhash the stateid instead of doing so only when
the stateid is actually released will be problematic in later patches
when we need to protect the unhashing with spinlocks. Move it into the
sc_free operation instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:19:50 -04:00
Jeff Layton
4770d72201 nfsd4: use cl_lock to synchronize all stateid idr calls
Currently, this is serialized by the client_mutex, which is slated for
removal. Add finer-grained locking here. Also, do some cleanup around
find_stateid to prepare for taking references.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 14:19:25 -04:00
Trond Myklebust
11b9164ada nfsd: Add a struct nfs4_file field to struct nfs4_stid
All stateids are associated with a nfs4_file. Let's consolidate.
Replace delegation->dl_file with the dl_stid.sc_file, and
nfs4_ol_stateid->st_file with st_stid.sc_file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 12:51:34 -04:00
Trond Myklebust
6011695da2 nfsd: Add reference counting to the lock and open stateids
When we remove the client_mutex, we'll need to be able to ensure that
these objects aren't destroyed while we're not holding locks.

Add a ->free() callback to the struct nfs4_stid, so that we can
release a reference to the stid without caring about the contents.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-31 12:43:53 -04:00
Jeff Layton
650ecc8f8f nfsd: remove dl_fh field from struct nfs4_delegation
Now that the nfs4_file has a filehandle in it, we no longer need to
keep a per-delegation copy of it. Switch to using the one in the
nfs4_file instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:58 -04:00
Jeff Layton
f54fe962b8 nfsd: give block_delegation and delegation_blocked its own spinlock
The state lock can be fairly heavily contended, and there's no reason
that nfs4_file lookups and delegation_blocked should be mutually
exclusive.  Let's give the new block_delegation code its own spinlock.
It does mean that we'll need to take a different lock in the delegation
break code, but that's not generally as critical to performance.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:57 -04:00
Jeff Layton
0b26693c56 nfsd: clean up nfs4_set_delegation
Move the alloc_init_deleg call into nfs4_set_delegation and change the
function to return a pointer to the delegation or an IS_ERR return. This
allows us to skip allocating a delegation if the file has already
experienced a lease conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:56 -04:00
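The "pointer or IS_ERR return" convention that 0b26693c56 adopts can be modelled in user space as below. This is a sketch under assumptions: `err_ptr`, `ptr_err` and `is_err` are lowercase stand-ins written for this example (they imitate, but are not, the kernel's ERR_PTR helpers), and `struct file_state`, `had_conflict` and `set_delegation` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095   /* illustrative bound on encoded errno values */

/* Encode a negative errno in a pointer, and decode it again. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical per-file state and delegation objects. */
struct file_state { int had_conflict; };
struct delegation { struct file_state *fp; };

static int allocations;   /* counts allocations, for demonstration only */

/* Return a new delegation, or an encoded error without allocating. */
static struct delegation *set_delegation(struct file_state *fp)
{
    struct delegation *dp;

    assert(fp != NULL);

    if (fp->had_conflict)
        return err_ptr(-EAGAIN);   /* bail out before any allocation */

    dp = malloc(sizeof(*dp));
    if (dp == NULL)
        return err_ptr(-ENOMEM);
    allocations++;
    dp->fp = fp;
    return dp;
}
```

Doing the allocation inside the function is what lets the conflict check run first, so a file that has already seen a lease conflict costs no allocation at all.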
Jeff Layton
4cf59221c7 nfsd: clean up arguments to nfs4_open_delegation
No need to pass in a net pointer since we can derive that.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:55 -04:00
Jeff Layton
f9416e281e nfsd: drop unused stp arg to alloc_init_deleg
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:54 -04:00
Trond Myklebust
02a3508dba nfsd: Convert delegation counter to an atomic_long_t type
We want to convert to an atomic type so that we don't need to lock
across the call to alloc_init_deleg(). Then convert to a long type so
that we match the size of 'max_delegations'.

None of this is a problem today, but it will be once we remove
client_mutex protection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:54 -04:00
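The idea in 02a3508dba, counting delegations atomically so that no lock is needed across the allocation, might look roughly like this in portable C11. The names `num_delegations`, `reserve_delegation_slot` and `release_delegation_slot` are illustrative assumptions, and `max_delegations` is given an arbitrary small value here; the kernel uses its own `atomic_long_t` API rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for nfsd's delegation counter and its limit.
 * Both are long-sized so the comparison below is between equal widths. */
static atomic_long num_delegations;
static long max_delegations = 2;

/* Reserve a slot without holding any lock: bump the counter first, then
 * back out if that pushed us over the limit. */
static bool reserve_delegation_slot(void)
{
	long n = atomic_fetch_add(&num_delegations, 1) + 1;

	if (n > max_delegations) {
		atomic_fetch_sub(&num_delegations, 1);
		return false;
	}
	return true;
}

/* Give a slot back when the delegation is freed. */
static void release_delegation_slot(void)
{
	long old = atomic_fetch_sub(&num_delegations, 1);

	assert(old > 0);
	(void)old;
}
```

The counter may briefly overshoot the limit while a failing caller backs out, which is harmless for a soft cap like this one.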
Jeff Layton
2d4a532d38 nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock
Currently, both destroy_revoked_delegation and revoke_delegation
manipulate the cl_revoked list without any locking aside from the
client_mutex. Ensure that the clp->cl_lock is held when manipulating it,
except for the list walking in destroy_client. At that point, the client
should no longer be in use, and so it should be safe to walk the list
without any locking. That also means that we don't need to do the
list_splice_init there either.

Also, the fact that revoke_delegation deletes dl_recall_lru list_head
without any locking makes it difficult to know whether it's doing so
safely in all cases. Move the list_del_init calls into the callers, and
add a WARN_ON in the event that it's passed a delegation that has a
non-empty list_head.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:53 -04:00
Jeff Layton
4269067696 nfsd: fully unhash delegations when revoking them
Ensure that the delegations cannot be found by the laundromat etc once
we add them to the various 'revoke' lists.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:52 -04:00
Trond Myklebust
f83388341b nfsd: simplify stateid allocation and file handling
Don't allow stateids to clear the open file pointer until they are
being destroyed. In later patches we'll want to rely on the fact that
we have a valid file pointer when dealing with the stateid and this
will save us from having to do a lot of NULL pointer checks before
doing so.

Also, move to allocating stateids with kzalloc and get rid of the
explicit zeroing of fields.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-29 14:49:51 -04:00
Jeff Layton
f9c00c3ab4 nfsd: Do not let nfs4_file pin the struct inode
Remove the fi_inode field in struct nfs4_file in order to remove the
possibility of struct nfs4_file pinning the inode when it does not have
any open state.

The only place we still need to get to an inode is in check_for_locks,
so change it to use find_any_file and use the inode from any that it
finds. If it doesn't find one, then just assume there aren't any.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23 16:35:24 -04:00
Trond Myklebust
b07c54a4a3 nfsd: nfs4_check_fh - make it actually check the filehandle
...instead of just checking the inode that corresponds to it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23 16:35:24 -04:00
Trond Myklebust
ca94321783 nfsd: Use the filehandle to look up the struct nfs4_file instead of inode
This makes more sense anyway since an inode pointer value can change
even when the filehandle doesn't.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23 16:35:24 -04:00
Trond Myklebust
e2cf80d73f nfsd: Store the filehandle with the struct nfs4_file
For use when we may not have a struct inode.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23 16:35:23 -04:00
Jeff Layton
2f6ce8e73c nfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0
Open stateids must be initialized with the st_access_bmap and
st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record
their state in old_access_bmap and old_deny_bmap.

This bug was introduced in commit baeb4ff0e5 (nfsd: make deny mode
enforcement more efficient and close races in it) and was causing the
refcounts to end up incorrect when nfs4_get_vfs_file returned an error
after bumping the refcounts. This made it impossible to unmount the
underlying filesystem after running pynfs tests that involve deny modes.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-23 14:20:47 -04:00
Jeff Layton
d55a166c96 nfsd: bump dl_time when unhashing delegation
There's a potential race between a lease break and DELEGRETURN call.

Suppose a lease break comes in and queues the workqueue job for a
delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in,
finds the delegation and calls destroy_delegation on it to unhash it and
put its primary reference.

Next, the workqueue job runs and queues the delegation back onto the
del_recall_lru list, issues the CB_RECALL and puts the final reference.
With that, the final reference to the delegation is put, but it's still
on the LRU list.

When we go to unhash a delegation, it's because we intend to get rid of
it soon afterward, so we don't want lease breaks to mess with it once
that occurs. Fix this by bumping the dl_time whenever we unhash a
delegation, to ensure that lease breaks don't monkey with it.

I believe this is a regression due to commit 02e1215f9f (nfsd: Avoid
taking state_lock while holding inode lock in nfsd_break_one_deleg).
Prior to that, the state_lock was held in the lm_break callback itself,
and that would have prevented this race.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-22 15:34:47 -04:00
Trond Myklebust
72c0b0fb9f nfsd: Move the delegation reference counter into the struct nfs4_stid
We will want to add reference counting to the lock stateid and open
stateids too in later patches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-21 17:03:00 -04:00
Jeff Layton
417c6629b2 nfsd: fix race that grants unrecallable delegation
If nfs4_setlease successfully acquires a new delegation, and another
task breaks the delegation before we reach hash_delegation_locked, then
the breaking task will see an empty fi_delegations list and do nothing.
The client will receive an open reply incorrectly granting a delegation
and will never receive a recall.

Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.

Attempt to acquire a delegation. If that succeeds, take the spinlocks
and then check to see if the file has had a conflict show up since then.
If it has, then we assume that the lease is no longer valid and that
we shouldn't hand out a delegation.

There's also one more potential (but very unlikely) problem. If the
lease is broken before the delegation is hashed, then it could leak.
In the event that the fi_delegations list is empty, reset the
fl_break_time to jiffies so that it's cleaned up ASAP by
the normal lease handling code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-21 16:31:17 -04:00
J. Bruce Fields
57a3714421 nfsd4: CREATE_SESSION should update backchannel immediately
nfsd4_probe_callback kicks off some work that will eventually run
nfsd4_process_cb_update and update the session flags.  In theory we
could process a following SEQUENCE call before that update happens
resulting in flags that don't accurately represent, for example, the
lack of a backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-21 12:30:50 -04:00
Trond Myklebust
b0fc29d6fc nfsd: Ensure stateids remain unique until they are freed
Add an extra delegation state to allow the stateid to remain in the idr
tree until the last reference has been released. This will be necessary
to ensure uniqueness once the client_mutex is removed.

[jlayton: reset the sc_type under the state_lock in unhash_delegation]

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:39:51 -04:00
Jeff Layton
d564fbec7a nfsd: nfs4_alloc_init_lease should take a nfs4_file arg
No need to pass the delegation pointer in here as it's only used to get
the nfs4_file pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:35:25 -04:00
Jeff Layton
02e1215f9f nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg
state_lock is a heavily contended global lock. We don't want to grab
that while simultaneously holding the inode->i_lock.

Add a new per-nfs4_file lock that we can use to protect the
per-nfs4_file delegation list. Hold that while walking the list in the
break_deleg callback and queue the workqueue job for each one.

The workqueue job can then take the state_lock and do the list
manipulations without the i_lock being held prior to starting the
rpc call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:06:12 -04:00
Jeff Layton
e8051c837b nfsd: eliminate nfsd4_init_callback
It's just an obfuscated INIT_WORK call. Just make the work_func_t a
non-static symbol and use a normal INIT_WORK call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 14:18:58 -04:00
Jeff Layton
a46cb7f287 nfsd: cleanup and rename nfs4_check_open
Rename it to better describe what it does, and have it just return the
stateid instead of a __be32 (which is now always nfs_ok). Also, do the
search for an existing stateid after the delegation check, to reduce
cleanup if the delegation check returns error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:38 -04:00
Jeff Layton
baeb4ff0e5 nfsd: make deny mode enforcement more efficient and close races in it
The current enforcement of deny modes is both inefficient and scattered
across several places, which makes it hard to guarantee atomicity. The
inefficiency is a problem now, and the lack of atomicity will mean races
once the client_mutex is removed.

First, we address the inefficiency. We have to track deny modes on a
per-stateid basis to ensure that open downgrades are sane, but when the
server goes to enforce them it has to walk the entire list of stateids
and check against each one.

Instead of doing that, maintain a per-nfs4_file deny mode. When a file
is opened, we simply set any deny bits in that mode that were specified
in the OPEN call. We can then use that unified deny mode to do a simple
check to see whether there are any conflicts without needing to walk the
entire stateid list.

The only time we'll need to walk the entire list of stateids is when a
stateid that has a deny mode on it is being released, or one is having
its deny mode downgraded. In that case, we must walk the entire list and
recalculate the fi_share_deny field. Since deny modes are pretty rare
today, this should be very rare under normal workloads.

To address the potential for races once the client_mutex is removed,
protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to
make sure that any deny mode we want to apply won't conflict with
existing access. If that's ok, then have nfs4_file_get_access check that
new access to the file won't conflict with existing deny modes.

If that also passes, then get file access references, set the correct
access and deny bits in the stateid, and update the fi_share_deny field.
If opening the file or truncating it fails, then unwind the whole mess
and return the appropriate error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:32 -04:00
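The per-file deny mode described in baeb4ff0e5 can be illustrated with a small single-threaded sketch. All names here (`struct file_state`, `struct stateid`, `file_open`, `file_release`, the `SHARE_*` constants) are hypothetical, and several things are simplified: there is no `fi_lock`, stateids live in a fixed array, and existing access is kept as a plain union of bits rather than the refcounts the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHARE_READ   1u
#define SHARE_WRITE  2u
#define MAX_STATEIDS 8

struct stateid {
	unsigned char access, deny; /* per-stateid modes, needed for downgrades */
	bool in_use;
};

struct file_state {
	unsigned char share_access; /* union of access bits of all stateids */
	unsigned char share_deny;   /* union of deny bits of all stateids */
	struct stateid stids[MAX_STATEIDS];
};

/* Open: conflicts are detected with two mask tests, no list walk. */
static struct stateid *file_open(struct file_state *fp, unsigned access,
				 unsigned deny)
{
	if ((deny & fp->share_access) || (access & fp->share_deny))
		return NULL; /* share conflict */
	for (size_t i = 0; i < MAX_STATEIDS; i++) {
		struct stateid *st = &fp->stids[i];

		if (st->in_use)
			continue;
		st->in_use = true;
		st->access = (unsigned char)access;
		st->deny = (unsigned char)deny;
		fp->share_access |= st->access;
		fp->share_deny |= st->deny;
		return st;
	}
	return NULL; /* table full */
}

/* Release: the deny union is recomputed only if this stateid carried a deny
 * mode. The access union is always recomputed, purely to keep the sketch short. */
static void file_release(struct file_state *fp, struct stateid *st)
{
	bool had_deny = st->deny != 0;
	unsigned char access = 0, deny = 0;

	assert(st->in_use);
	st->in_use = false;
	for (size_t i = 0; i < MAX_STATEIDS; i++) {
		if (!fp->stids[i].in_use)
			continue;
		access |= fp->stids[i].access;
		deny |= fp->stids[i].deny;
	}
	fp->share_access = access;
	if (had_deny)
		fp->share_deny = deny;
}
```

Since deny modes are rare, the recomputation in the release path is the uncommon case and the open path stays a constant-time check.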
Jeff Layton
7214e8600e nfsd: always hold the fi_lock when bumping fi_access refcounts
Once we remove the client_mutex, there's an unlikely but possible race
that could occur. It will be possible for nfs4_file_put_access to race
with nfs4_file_get_access. The refcount will go to zero (briefly) and
then bumped back to one. If that happens we set ourselves up for a
use-after-free and the potential for a lock to race onto the i_flock
list as a filp is being torn down.

Ensure that we can safely bump the refcount on the file by holding the
fi_lock whenever that's done. The only place it currently isn't is in
get_lock_access.

In order to ensure atomicity with finding the file, use the
find_*_file_locked variants and then call get_lock_access to get new
access references on the nfs4_file under the same lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:17 -04:00
Jeff Layton
3b84240a7b nfsd: clean up reset_union_bmap_deny
Fix the "deny" argument type, and start the loop at 1. The 0 iteration
is always a noop.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:11 -04:00
Jeff Layton
6eb3a1d096 nfsd: set stateid access and deny bits in nfs4_get_vfs_file
Cleanup -- ensure that the stateid bits are set at the same time that
the file access refcounts are incremented. Keeping them coherent like
this makes it easier to ensure that we account for all of the
references.

Since the initialization of the st_*_bmap fields is done when it's
hashed, we go ahead and hash the stateid before getting access to the
file and unhash it if that function returns error. This will be
necessary anyway in a follow-on patch that will overhaul deny mode
handling.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:05 -04:00
Jeff Layton
c11c591fe6 nfsd: shrink st_access_bmap and st_deny_bmap
We never use anything above bit #3, so an unsigned long for each is
wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if
we try to set or clear bits that would go outside those sizes.

Note too that because atomic bitops work on unsigned longs, we have to
abandon their use here. That shouldn't be a problem though since we
don't really care about the atomicity in this code anyway. Using them
was just a convenient way to flip bits.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:04 -04:00
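As a rough sketch of what c11c591fe6 describes, the following helpers flip bits in an `unsigned char` bitmap with plain, non-atomic operations and reject out-of-range bit numbers. The names `bmap_set`, `bmap_clear`, `bmap_test` and `SHARE_MODE_MAX` are assumptions made for this example; where the kernel would issue a `WARN_ON_ONCE`, the sketch simply returns `false`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Highest share mode value; nothing above bit #3 is ever used. */
#define SHARE_MODE_MAX 3u

static_assert(SHARE_MODE_MAX < CHAR_BIT,
	      "share mode bits must fit in an unsigned char");

/* Hypothetical cut-down open stateid: one byte per bitmap. */
struct open_stateid {
	unsigned char access_bmap;
	unsigned char deny_bmap;
};

/* Plain read-modify-write; callers are assumed to serialize access. */
static bool bmap_set(unsigned char *bmap, unsigned mode)
{
	if (mode > SHARE_MODE_MAX)
		return false; /* the kernel would WARN_ON_ONCE here */
	*bmap |= (unsigned char)(1u << mode);
	return true;
}

static bool bmap_clear(unsigned char *bmap, unsigned mode)
{
	if (mode > SHARE_MODE_MAX)
		return false;
	*bmap &= (unsigned char)~(1u << mode);
	return true;
}

static bool bmap_test(unsigned char bmap, unsigned mode)
{
	return mode <= SHARE_MODE_MAX && (bmap & (1u << mode)) != 0;
}
```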
Jeff Layton
6d338b51eb nfsd: remove nfs4_file_put_fd
...and replace it with a simple swap call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:05:57 -04:00
Jeff Layton
1265965172 nfsd: refactor nfs4_file_get_access and nfs4_file_put_access
Have them take NFS4_SHARE_ACCESS_* flags instead of an open mode. This
spares the callers from having to convert it themselves.

This also allows us to simplify these functions as we no longer need
to do the access_to_omode conversion in either one.

Note too that this patch eliminates the WARN_ON in
__nfs4_file_get_access. It's valid for now, but in a later patch we'll
be bumping the refcounts prior to opening the file in order to close
some races, at which point we'll need to remove it anyway.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:03:23 -04:00
Trond Myklebust
e20fcf1e65 nfsd: clean up helper __release_lock_stateid
Use filp_close instead of open coding. filp_close does a bit more than
just release the locks and put the filp. It also calls ->flush and
dnotify_flush, both of which should be done here anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:26 -04:00
Trond Myklebust
de18643dce nfsd: Add locking to the nfs4_file->fi_fds[] array
Preparation for removal of the client_mutex, which currently protects
this array. While we don't actually need the find_*_file_locked variants
just yet, a later patch will. So go ahead and add them now to reduce
future churn in this code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:26 -04:00
Trond Myklebust
1d31a2531a nfsd: Add fine grained protection for the nfs4_file->fi_stateids list
Access to this list is currently serialized by the client_mutex. Add
finer grained locking around this list in preparation for its removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:25 -04:00
Jeff Layton
d6c249b4d4 nfsd: reduce some spinlocking in put_client_renew
No need to take the lock unless the count goes to 0.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 13:41:00 -04:00
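"No need to take the lock unless the count goes to 0" in d6c249b4d4 is the decrement-and-lock idiom. The sketch below imitates it in userspace C11; it is an approximation under stated assumptions, not the kernel's `atomic_dec_and_lock()`. The toy `struct spinlock`, `dec_and_lock`, `struct client` and its `renewals` field are all hypothetical, and the lock counts its acquisitions only so the fast path can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock that counts acquisitions. */
struct spinlock {
	atomic_flag held;
	int acquisitions;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set(&l->held))
		; /* spin until the holder clears the flag */
	l->acquisitions++;
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true, with the lock held, only if the count dropped to zero. */
static bool dec_and_lock(atomic_int *cnt, struct spinlock *l)
{
	int old = atomic_load(cnt);

	/* Fast path: decrement locklessly while the result stays above zero.
	 * A failed compare-exchange reloads 'old', so the loop retries. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}
	/* Slow path: the count may hit zero, so do the decrement under the lock. */
	spin_lock(l);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	spin_unlock(l);
	return false;
}

struct client {
	atomic_int refcount;
	struct spinlock lock;
	int renewals; /* stands in for the renew work done on the last put */
};

static void put_client_renew(struct client *clp)
{
	assert(atomic_load(&clp->refcount) > 0);
	if (!dec_and_lock(&clp->refcount, &clp->lock))
		return; /* other references remain; lock never touched */
	clp->renewals++;
	spin_unlock(&clp->lock);
}
```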
Jeff Layton
dff1399f8a nfsd: close potential race between delegation break and laundromat
Bruce says:

    There's also a preexisting expire_client/laundromat vs break race:

    - expire_client/laundromat adds a delegation to its local
      reaplist using the same dl_recall_lru field that a delegation
      uses to track its position on the recall lru and drops the
      state lock.

    - a concurrent break_lease adds the delegation to the lru.

    - expire_client/laundromat then walks its reaplist and sees the
      lru head as just another delegation on the list....

Fix this race by checking the dl_time under the state_lock. If we find
that it's not 0, then we know that it has already been queued to the LRU
list and that we shouldn't queue it again.

In the case of destroy_client, we must also ensure that we don't hit
similar races by ensuring that we don't move any delegations to the
reaplist with a dl_time of 0. Just bump the dl_time by one before we
drop the state_lock. We're destroying the delegations anyway, so a 1s
difference there won't matter.

The fault injection code also requires a bit of surgery here:

First, in the case of nfsd_forget_client_delegations, we must prevent
the same sort of race vs. the delegation break callback. For that, we
just increment the dl_time to ensure that a delegation callback can't
race in while we're working on it.

We can't do that for nfsd_recall_client_delegations, as we need to have
it actually queue the delegation, and that won't happen if we increment
the dl_time. The state lock is held over that function, so we don't need
to worry about these sorts of races there.

There is one other potential bug in nfsd_recall_client_delegations, though.
Entries on the victims list are not dequeued before calling
nfsd_break_one_deleg. That's a potential list corruptor, so ensure that
we do that there.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 13:40:51 -04:00
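The `dl_time` guard from dff1399f8a can be reduced to a small sketch: a zero timestamp means "not yet queued", and anything that claims the delegation first makes the timestamp non-zero so a later lease break leaves it alone. The struct and the functions `queue_for_recall` and `claim_for_reaping` are hypothetical, the `times_queued` counter replaces the real LRU list, and both functions are assumed to run under the state lock, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down delegation. */
struct deleg {
	long dl_time;     /* 0 means "never queued for recall" */
	int times_queued; /* stands in for membership on the recall LRU */
};

/* Lease-break path: queue the delegation only if nobody has claimed it. */
static bool queue_for_recall(struct deleg *dp, long now)
{
	assert(now != 0);
	if (dp->dl_time != 0)
		return false; /* already queued, or being torn down */
	dp->dl_time = now;
	dp->times_queued++;
	return true;
}

/* Teardown path: bump dl_time before moving the delegation to a private
 * reaplist, so a concurrent break cannot queue it again. */
static void claim_for_reaping(struct deleg *dp)
{
	dp->dl_time++;
}
```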
Trond Myklebust
0fe492db60 nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()
lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:07 -04:00
Trond Myklebust
2d91e8953c nfsd: Always use lookup_clientid() in nfsd4_process_open1
In later patches, we'll be moving the stateowner table into the
nfs4_client, and by doing this we ensure that we have a cached
nfs4_client pointer.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:06 -04:00
Trond Myklebust
13d6f66b08 nfsd: Convert nfsd4_process_open1() to work with lookup_clientid()
...and have alloc_init_open_stateowner just use the cstate->clp pointer
instead of passing in a clp separately. This allows us to use the
cached nfs4_client pointer in the cstate instead of having to look it
up again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:05 -04:00
Jeff Layton
4b24ca7d30 nfsd: Allow struct nfsd4_compound_state to cache the nfs4_client
We want to use the nfsd4_compound_state to cache the nfs4_client in
order to optimise away extra lookups of the clid.

In the v4.0 case, we use this to ensure that we only have to look up the
client at most once per compound for each call into lookup_clientid. For
v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
should never need to do a search for it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:04 -04:00
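A minimal sketch of the caching described in 4b24ca7d30: the compound state remembers the client found by the first lookup, so later operations in the same compound skip the table search. Everything here is an assumption for illustration, including the linear `client_table`, the integer client ids, the `table_searches` counter, and the choice to return `NULL` when a compound presents a different client id than the cached one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down client and client table. */
struct client { unsigned long id; };

#define NCLIENTS 3
static struct client client_table[NCLIENTS] = { {10}, {20}, {30} };
static int table_searches; /* counts the expensive lookups */

/* Per-compound state: caches the client for the rest of the compound. */
struct compound_state { struct client *clp; };

static struct client *search_table(unsigned long id)
{
	table_searches++;
	for (size_t i = 0; i < NCLIENTS; i++)
		if (client_table[i].id == id)
			return &client_table[i];
	return NULL;
}

/* Use the cached client when present; otherwise search once and cache. */
static struct client *lookup_clientid(struct compound_state *cstate,
				      unsigned long id)
{
	assert(cstate != NULL);
	if (cstate->clp)
		return cstate->clp->id == id ? cstate->clp : NULL;
	cstate->clp = search_table(id);
	return cstate->clp;
}
```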
Trond Myklebust
2dd6e458c3 nfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argument
The cstate already holds information about the session, and hence
the client id, so it makes more sense to pass that information
rather than the current practice of passing a 'minor version' number.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:01 -04:00
Trond Myklebust
d4e19e7027 nfsd: Don't get a session reference without a client reference
If the client were to disappear from underneath us while we're holding
a session reference, things would be bad. This cleanup helps ensure
that it cannot, which will be a possibility when the client_mutex is
removed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:00 -04:00
Jeff Layton
fd44907c2d nfsd: clean up nfsd4_release_lockowner
Now that we know that we won't have several lockowners with the same
owner->data, we can simplify nfsd4_release_lockowner and get rid of
the lo_list in the process.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:59 -04:00
Trond Myklebust
b3c32bcd9c nfsd: NFSv4 lock-owners are not associated to a specific file
Just like open-owners, lock-owners are associated with a name, a clientid
and, in the case of minor version 0, a sequence id. There is no association
to a file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:58 -04:00
Jeff Layton
c53530da4d nfsd: Allow lockowners to hold several stateids
A lockowner can have more than one lock stateid. For instance, if a
process has more than one file open and has locks on both, then the same
lockowner has more than one stateid associated with it. Change it so
that this reality is better reflected by the objects that nfsd uses.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:57 -04:00