linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-23 20:53:53 +08:00

Author	SHA1	Message	Date
Trond Myklebust	da2e812751	NFSv4.2: Fix up a decoding error in layoutstats According to the spec, the server is only returning the status, which we decode in the op header. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-27 11:30:57 -04:00
Trond Myklebust	d620876990	pNFS/flexfiles: Fix the reset of struct pgio_header when resending hdr->good_bytes needs to be set to the length of the request, not zero. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-26 15:39:50 -04:00
Trond Myklebust	c0f5f5059f	pNFS/flexfiles: Turn off layoutcommit for servers that don't need it This patch ensures that we record the value of 'ffl_flags' from the layout, and then checks for the presence of the FF_FLAGS_NO_LAYOUTCOMMIT flag before deciding whether or not to call pnfs_set_layoutcommit(). The effect is that servers now can decide whether or not they want the client to call layoutcommit before returning a writeable layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-26 14:51:32 -04:00
Trond Myklebust	122d328d30	Merge branch 'layoutstats' * layoutstats: pnfs/flexfiles: protect ktime manipulation with mirror lock nfs: provide pnfs_report_layoutstat when NFS42 is disabled pnfs/flexfiles: report layoutstat regularly nfs42: serialize LAYOUTSTATS calls of the same file pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data pnfs/flexfiles: add ff_layout_prepare_layoutstats pNFS/flexfiles: track when layout is first used pNFS/flexfiles: add layoutstats tracking pNFS/flexfiles: Remove unused struct members user_name, group_name pnfs: add pnfs_report_layoutstat helper function pNFS: fill in nfs42_layoutstat_ops NFSv.2/pnfs Add a LAYOUTSTATS rpc function	2015-06-26 14:01:59 -04:00
Peng Tao	9bbd9bb40c	pnfs/flexfiles: protect ktime manipulation with mirror lock It looks as if xchg() and cmpxchg() are not available for 64-bit integers on sparc32: > New breakage seen in linux-next today: > > ERROR: "__xchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined! > ERROR: "__cmpxchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined! > make[2]: * [__modpost] Error 1 > make[1]: * [modules] Error 2 Given that mirror ktime manipulation is already under mirror->lock, let's make use of the fact. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-26 14:01:37 -04:00
Peng Tao	865a7ecb21	nfs: provide pnfs_report_layoutstat when NFS42 is disabled kbuild test robot reported: fs/built-in.o: In function `pnfs_report_layoutstat': >> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-26 14:01:37 -04:00
Benjamin Coddington	18a6008972	nfs: verify open flags before allowing open Commit `9597c13b` forbade opens with O_APPEND\|O_DIRECT for NFSv4: nfs: verify open flags before allowing an atomic open Currently, you can open a NFSv4 file with O_APPEND\|O_DIRECT, but cannot fcntl(F_SETFL,...) with those flags. This flag combination is explicitly forbidden on NFSv3 opens, and it seems like it should also be on NFSv4. However, you can still open a file with O_DIRECT\|O_APPEND if there exists a cached dentry for the file because nfs4_file_open() is used instead of nfs_atomic_open() and the check is bypassed. Add the check in nfs4_file_open() as well. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-25 19:38:00 -04:00
Jeff Layton	0c8315dd56	nfs: always update creds in mirror, even when we have an already connected ds A ds can be associated with more than one mirror, but we currently skip setting a mirror's credentials if we find that it's already set up with a connected client. The upshot is that we can end up sending DS writes with MDS credentials instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to always verify that the mirror's credentials are set up, even when we have a DS that's already connected. Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-25 19:35:21 -04:00
Jeff Layton	a24221dca1	nfs: fix potential credential leak in ff_layout_update_mirror_cred If we have two tasks racing to update a mirror's credentials, then they can end up leaking one (or more) sets of credentials. The first task will set mirror->cred and then the second task will just overwrite it. Use a cmpxchg to ensure that the creds are only set once. If we get to the point where we would set mirror->cred and find that they're already set, then we just release the creds that were just found. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-25 19:34:40 -04:00
Peng Tao	97ba375b5d	pnfs/flexfiles: report layoutstat regularly As a simple scheme, report every minute if IO is still going on. Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:54:23 -04:00
Peng Tao	1bfe3b259f	nfs42: serialize LAYOUTSTATS calls of the same file There is no need to report concurrently. Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:53:11 -04:00
Peng Tao	27c4306443	pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:53:11 -04:00
Peng Tao	ad4dc53e64	pnfs/flexfiles: add ff_layout_prepare_layoutstats It fills in the generic part of LAYOUTSTATS call. One thing to note is that we don't really track if IO is continuous or not. So just fake to use the completed bytes for it. Still missing flexfiles specific part, which will be included in the next patch. Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:53:11 -04:00
Peng Tao	d983803d38	pNFS/flexfiles: track when layout is first used So that we can report cumulative time since the beginning of statistics collection of the layout. Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:53:10 -04:00
Trond Myklebust	abcb7bfc9f	pNFS/flexfiles: add layoutstats tracking Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:53:10 -04:00
Trond Myklebust	27797d1bb3	pNFS/flexfiles: Remove unused struct members user_name, group_name Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Peng Tao	8733408d6e	pnfs: add pnfs_report_layoutstat helper function Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Peng Tao	1b4a4bd82c	pNFS: fill in nfs42_layoutstat_ops Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Trond Myklebust	be3a5d2339	NFSv.2/pnfs Add a LAYOUTSTATS rpc function Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Trond Myklebust	1372a3130a	Merge branch 'bugfixes' * bugfixes: NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes pNFS: Fix a memory leak when attempted pnfs fails NFS: Ensure that we update the sequence id under the slot table lock nfs: Initialize cb_sequenceres information before validate_seqid() nfs: Only update callback sequnce id when CB_SEQUENCE success NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN	2015-06-22 09:55:08 -04:00
Trond Myklebust	775f06ab49	SUNRPC: Set the TCP user timeout option on client sockets Use the TCP_USER_TIMEOUT socket option to advertise to the server how long we will keep the connection open if there is unacknowledged data. See RFC5482. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-20 15:31:54 -04:00
Trond Myklebust	4876cc779f	SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit `caf4ccd4e8` ("SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release"). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-19 19:20:12 -04:00
Trond Myklebust	3832591e6f	SUNRPC: Handle connection issues correctly on the back channel If the back channel is disconnected, we can and should just fail the transmission. The expectation is that the NFSv4.1 server will always retransmit any outstanding callbacks once the connection is re-established. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-19 13:04:13 -04:00
Yijing Wang	dfad7000f3	nfs: Fix comment for nfs_pageio_init() and nfs_pageio_complete_mirror() Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-18 08:59:13 -04:00
Fabian Frederick	d1381929de	sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key() Don't opencode sg_init_one() Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-18 08:58:38 -04:00
Trond Myklebust	c70701131f	NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes If a write attempt fails, and the write is queued up for resending to the server, as opposed to being dropped, then we need to set the appropriate flag so that nfs_file_fsync() does the right thing. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-17 20:00:42 -04:00
Trond Myklebust	1ca018d28d	pNFS: Fix a memory leak when attempted pnfs fails pnfs_do_write() expects the call to pnfs_write_through_mds() to free the pgio header and to release the layout segment before exiting. The problem is that nfs_pgio_data_destroy() doesn't actually do this; it only frees the memory allocated by nfs_generic_pgio(). Ditto for pnfs_do_read()... Fix in both cases is to add a call to hdr->release(hdr). Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-17 20:00:26 -04:00
Trond Myklebust	3438995bc4	NFS: NFSoRDMA Client Changes These patches continue to build up for improving the rsize and wsize that the NFS client uses when talking over RDMA. In addition, these patches also add in scalability enhancements and other bugfixes. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVey8qAAoJENfLVL+wpUDroT4P/3lwspXwdxS6VZWsW1VpNtdV V1KKd5D+TkpBpz/ih9GdOVaBZijaHpb6XtMReh8xuh0KI893iYmsmLoyhMTPJMsU 6sjUDEv8IFrXwlRKldX1KEfBvNgR0czCNiha6O+YsV5Az08+zr57ahyGKmLUzMxo 4XzPZbwnb5fxvgmBgENUU33g+xXGsXDbsdzLvKW3UGcPU2x6PGOTLr5vP7lQkwxE 20d9ak8xQeRUk0hsmRM4fAebzcluD1o3PLIFQBEhh0Gqm1VGtSCkr9o493gT5TgM /+XrU7B8OnbdJ1B4f/y4Bz4RucfKzyRuXMpulrnK1hL7QIiZLqiph7UrTel/ajcD 0us9PImNwXPo8tMz7Wjw2XMQplndHB3FG3M3lXlJGHlXvCI7F0yjm21AP4SeetOm kxL24Qiyi7l/S7JJxHqNlOc0b8kpVLohBZm6yee9w4r/JUPnynUqfnXCHLjIp/5W F1hzbCUATyfKrSs7VKO0hCQHfntigPEhRmyfoyXRAXzl5LnR1XqD6Wah3a3pwXn+ mEquUd6fKRHIIvJ8cKU6KtykkhRHg1sR/z1mw2ZEW/2PCd0cb+8+WN7X/fQqEN+u +VQSo7oPp38SHdsyozuUUyukN5qHptTMSrNZL+LI7J8/0+BuRuIvW0nojViapc51 LOUlcgqRdUlIvmn754Yo =N1tO -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: NFSoRDMA Client Changes These patches continue to build up for improving the rsize and wsize that the NFS client uses when talking over RDMA. In addition, these patches also add in scalability enhancements and other bugfixes. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (142 commits) xprtrdma: Reduce per-transport MR allocation xprtrdma: Stack relief in fmr_op_map() xprtrdma: Split rb_lock xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy xprtrdma: Remove ->ro_reset xprtrdma: Remove unused LOCAL_INV recovery logic xprtrdma: Acquire MRs in rpcrdma_register_external() xprtrdma: Introduce an FRMR recovery workqueue xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external() xprtrdma: Introduce helpers for allocating MWs xprtrdma: Use ib_device pointer safely xprtrdma: Remove rr_func xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt xprtrdma: Warn when there are orphaned IB objects ...	2015-06-16 11:37:50 -04:00
Trond Myklebust	5ba12443a1	NFSv4: Fix stateid recovery on revoked delegations Ensure that we fix the non-NULL stateid case as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:29:51 -04:00
Olga Kornievskaia	ae2ffef383	Recover from stateid-type error on SETATTR Client can receives stateid-type error (eg., BAD_STATEID) on SETATTR when delegation stateid was used. When no open state exists, in case of application calling truncate() on the file, client has no state to recover and fails with EIO. Instead, upon such error, return the bad delegation and then resend the SETATTR with a zero stateid. Signed-off: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:29:46 -04:00
Kinglong Mee	df05a49f72	nfs: Fix showing truncated fsid/dev in, /proc/net/nfsfs/volumes A truncated fsid showing from /proc/fs/nfsfs/volumes as, NV SERVER PORT DEV FSID FSC v4 c0a80881 801 0:43 34931f044c2a439b no It should be as, NV SERVER PORT DEV FSID FSC v4 c0a80881 801 0:43 34931f044c2a439b:954c5d830fa4be8c no The max buffer length for storing "%llx:%llx" format should be 16 + 1 + 16 + 1 = 34 (16 for %llx, 1 for ':', 1 for '\0'). Also, for storing "%u:%u" of MAJOR() and MINOR() should be 8 + 1 + 3 + 1 = 13 (8 for 2^24, 1 for ':', 3 for 2^8, 1 for '\0'). v2, add comments for dev/fsid buffer and use sizeof in snprintf. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:17:37 -04:00
Jeff Layton	873e385116	nfs: make nfs4_init_uniform_client_string use a dynamically allocated buffer Change the uniform client string generator to dynamically allocate the NFSv4 client name string buffer. With this patch, we can eliminate the buffers that are embedded within the "args" structs and simply use the name string that is hanging off the client. This uniform string case is a little simpler than the nonuniform since we don't need to deal with RCU, but we do have two different cases, depending on whether there is a uniquifier or not. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:51 -04:00
Jeff Layton	a319268891	nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer The way the *_client_string functions work is a little goofy. They build the string in an on-stack buffer and then use kstrdup to copy it. This is not only stack-heavy but artificially limits the size of the client name string. Change it so that we determine the length of the string, allocate it and then scnprintf into it. Since the contents of the nonuniform string depend on rcu-managed data structures, it's possible that they'll change between when we allocate the string and when we go to fill it. If that happens, free the string, recalculate the length and try again. If it the mismatch isn't resolved on the second try then just give up and return -EINVAL. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:45 -04:00
Jeff Layton	b8fb2f595e	nfs: update maxsz values for SETCLIENTID and EXCHANGE_ID The spec allows for up to NFS4_OPAQUE_LIMIT (1k). While we'll almost certainly never use that much, these ops are generally the only ones in the compound so we might as well allow for them to be that large. Also, the existing code didn't add in a word for the opaque length field for either name string. Fix that while we're in there. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:40 -04:00
Jeff Layton	3a6bb73879	nfs: convert setclientid and exchange_id encoders to use clp->cl_owner_id ...instead of buffers that are part of their arg structs. We already hold a reference to the client, so we might as well use the allocated buffer. In the event that we can't allocate the clp->cl_owner_id, then just return -ENOMEM. Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS. It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID in order to reclaim some memory, and the GFP_KERNEL allocations in the existing code could cause recursion back into NFS reclaim. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:31 -04:00
Jeff Layton	764ad8ba8c	nfs: increase size of EXCHANGE_ID name string buffer The current buffer is much too small if you have a relatively long hostname. Bring it up to the size of the one that SETCLIENTID has. Cc: <stable@vger.kernel.org> Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:21 -04:00
Fabian Frederick	455b6ee645	pnfs/flexfiles: use swap() in ff_layout_sort_mirrors() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:14:03 -04:00
Neil Brown	2980731811	SUNRPC: never enqueue a ->rq_cong request on ->sending If the sending queue has a task without ->rq_cong set at the front, and then a number of tasks with ->rq_cong set such that they use the entire congestion window, then the queue deadlocks. The first entry cannot be processed until later entries complete. This scenario has been seen with a client using UDP to access a server, and the network connection breaking for a period of time - it doesn't recover. It never really makes sense for an ->rq_cong request to be on the ->sending queue, but it can happen when a request is being retried, and finds the transport if locked (XPRT_LOCKED). In this case we simple call __xprt_put_cong() and the deadlock goes away. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:13:21 -04:00
Chuck Lever	40c6ed0c8a	xprtrdma: Reduce per-transport MR allocation Reduce resource consumption per-transport to make way for increasing the credit limit and maximum r/wsize. Pre-allocate fewer MRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	acb9da7a57	xprtrdma: Stack relief in fmr_op_map() fmr_op_map() declares a 64 element array of u64 in automatic storage. This is 512 bytes (8 * 64) on the stack. Instead, when FMR memory registration is in use, pre-allocate a physaddr array for each rpcrdma_mw. This is a pre-requisite for increasing the r/wsize maximum for FMR on platforms with 4KB pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	58d1dcf5a8	xprtrdma: Split rb_lock /proc/lock_stat showed contention between rpcrdma_buffer_get/put and the MR allocation functions during I/O intensive workloads. Now that MRs are no longer allocated in rpcrdma_buffer_get(), there's no reason the rb_mws list has to be managed using the same lock as the send/receive buffers. Split that lock. The new lock does not need to disable interrupts because buffer get/put is never called in an interrupt context. struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws are always in a different cacheline than rb_lock and the buffer pointers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	7e53df111b	xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy Clean up: This field is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	3269a94b62	xprtrdma: Remove ->ro_reset An RPC can exit at any time. When it does so, xprt_rdma_free() is called, and it calls ->op_unmap(). If ->ro_reset() is running due to a transport disconnect, the two methods can race while processing the same rpcrdma_mw. The results are unpredictable. Because of this, in previous patches I've altered ->ro_map() to handle MR reset. ->ro_reset() is no longer needed and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	06b00880b0	xprtrdma: Remove unused LOCAL_INV recovery logic Clean up: Remove functions no longer used to recover broken FRMRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	c14d86e591	xprtrdma: Acquire MRs in rpcrdma_register_external() Acquiring 64 MRs in rpcrdma_buffer_get() while holding the buffer pool lock is expensive, and unnecessary because most modern adapters can transfer 100s of KBs of payload using just a single MR. Instead, acquire MRs one-at-a-time as chunks are registered, and return them to rb_mws immediately during deregistration. Note: commit `539431a437` ("xprtrdma: Don't invalidate FRMRs if registration fails") is reverted: There is now a valid case where registration can fail (with -ENOMEM) but the QP is still in RTS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:37 -04:00
Chuck Lever	951e721ca0	xprtrdma: Introduce an FRMR recovery workqueue After a transport disconnect, FRMRs can be left in an undetermined state. In particular, the MR's rkey is no good. Currently, FRMRs are fixed up by the transport connect worker, but that can race with ->ro_unmap if an RPC happens to exit while the transport connect worker is running. A better way of dealing with broken FRMRs is to detect them before they are re-used by ->ro_map. Such FRMRs are either already invalid or are owned by the sending RPC, and thus no race with ->ro_unmap is possible. Introduce a mechanism for handing broken FRMRs to a workqueue to be reset in a context that is appropriate for allocating resources (ie. an ib_alloc_fast_reg_mr() API call). This mechanism is not yet used, but will be in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:36 -04:00
Chuck Lever	fc7fbb59e7	xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external() Acquiring 64 FMRs in rpcrdma_buffer_get() while holding the buffer pool lock is expensive, and unnecessary because FMR mode can transfer up to a 1MB payload using just a single ib_fmr. Instead, acquire ib_fmrs one-at-a-time as chunks are registered, and return them to rb_mws immediately during deregistration. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:36 -04:00
Chuck Lever	346aa66b2a	xprtrdma: Introduce helpers for allocating MWs We eventually want to handle allocating MWs one at a time, as needed, instead of grabbing 64 and throwing them at each RPC in the pipeline. Add a helper for grabbing an MW off rb_mws, and a helper for returning an MW to rb_mws. These will be used in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:36 -04:00
Chuck Lever	89e0d11258	xprtrdma: Use ib_device pointer safely The connect worker can replace ri_id, but prevents ri_id->device from changing during the lifetime of a transport instance. The old ID is kept around until a new ID is created and the ->device is confirmed to be the same. Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep. The cached copy can be used safely in code that does not serialize with the connect worker. Other code can use it to save an extra address generation (one pointer dereference instead of two). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:36 -04:00
Chuck Lever	494ae30d2a	xprtrdma: Remove rr_func A posted rpcrdma_rep never has rr_func set to anything but rpcrdma_reply_handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-06-12 13:10:36 -04:00

1 2 3 4 5 ...

520176 Commits