korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-02 08:34:20 +08:00

Author	SHA1	Message	Date
Trond Myklebust	0445f92c5d	SUNRPC: Fix disconnection races When the socket is closed, we need to call xprt_disconnect_done() in order to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks. However, we also want to ensure that we don't wake them up before the socket is closed, since that would cause thundering herd issues with everyone piling up to retransmit before the TCP shutdown dance has completed. Only the task that holds XPRT_LOCKED needs to wake up early in order to allow the close to complete. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Scott Mayhew <smayhew@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>	2018-12-18 11:03:57 -05:00
Trond Myklebust	79462857eb	SUNRPC: Don't force a redundant disconnection in xs_read_stream() If the connection is broken, then xs_tcp_state_change() will take care of scheduling the socket close as soon as appropriate. xs_read_stream() just needs to report the error. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-12-05 07:11:12 -05:00
Trond Myklebust	dfcf038085	SUNRPC: Fix up socket polling Ensure that we do not exit the socket read callback without clearing XPRT_SOCK_DATA_READY. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-12-05 07:11:12 -05:00
Trond Myklebust	b76a5afdce	SUNRPC: Use the discard iterator rather than MSG_TRUNC When discarding message data from the stream, we're better off using the discard iterator, since that will work with non-TCP streams. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-12-05 07:11:12 -05:00
Trond Myklebust	26781eab48	SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-12-05 07:11:12 -05:00
Trond Myklebust	16e5e90f0e	SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag If the allocator fails before it has reached the target number of pages, then we need to recheck that we're not seeking past the page buffer. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-12-05 07:11:12 -05:00
Trond Myklebust	c443305529	SUNRPC: Fix RPC receive hangs The RPC code is occasionally hanging when the receive code fails to empty the socket buffer due to a partial read of the data. When we convert that to an EAGAIN, it appears we occasionally leave data in the socket. The fix is to just keep reading until the socket returns EAGAIN/EWOULDBLOCK. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Cristian Marussi <cristian.marussi@arm.com> Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com>	2018-12-05 07:10:06 -05:00
Al Viro	0e9b4a8271	missing bits of "iov_iter: Separate type from direction and use accessor functions" sunrpc patches from nfs tree conflict with calling conventions change done in iov_iter work. Trivial fixup... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-11-01 18:19:03 -04:00
Trond Myklebust	93bdcf9fdb	Merge tag 'nfs-rdma-for-4.20-1' of git://git.linux-nfs.org/projects/anna/linux-nfs NFS RDMA client updates for Linux 4.20 Stable bugfixes: - Reset credit grant properly after a disconnect Other bugfixes and cleanups: - xprt_release_rqst_cong is called outside of transport_lock - Create more MRs at a time and toss out old ones during recovery - Various improvements to the RDMA connection and disconnection code: - Improve naming of trace events, functions, and variables - Add documenting comments - Fix metrics and stats reporting - Fix a tracepoint sparse warning Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-10-18 17:29:00 -04:00
J. Bruce Fields	826799e66e	sunrpc: safely reallow resvport min/max inversion Commits `ffb6ca33b0` and `e08ea3a96f` prevent setting xprt_min_resvport greater than xprt_max_resvport, but may also break simple code that sets one parameter then the other, if the new range does not overlap the old. Also it looks racy to me, unless there's some serialization I'm not seeing. Granted it would probably require malicious privileged processes (unless there's a chance these might eventually be settable in unprivileged containers), but still it seems better not to let userspace panic the kernel. Simpler seems to be to allow setting the parameters to whatever you want but interpret xprt_min_resvport > xprt_max_resvport as the empty range. Fixes: `ffb6ca33b0` "sunrpc: Prevent resvport min/max inversion..." Fixes: `e08ea3a96f` "sunrpc: Prevent rexvport min/max inversion..." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-10-18 17:20:57 -04:00
Chuck Lever	8440a88611	sunrpc: Report connect_time in seconds The way connection-oriented transports report connect_time is wrong: it's supposed to be in seconds, not in jiffies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-10-02 16:11:00 -04:00
Chuck Lever	3968a8a531	sunrpc: Fix connect metrics For TCP, the logic in xprt_connect_status is currently never invoked to record a successful connection. Commit `2a4919919a` ("SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending") changed the way TCP xprt's are awoken after a connect succeeds. Instead, change connection-oriented transports to bump connect_count and compute connect_time the moment that XPRT_CONNECTED is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-10-02 16:08:12 -04:00
Trond Myklebust	4f54614975	SUNRPC: Clean up xs_udp_data_receive() Simplify the retry logic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:16 -04:00
Trond Myklebust	550aebfe1c	SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:16 -04:00
Trond Myklebust	c50b8ee02f	SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive() In preparation for sharing with AF_LOCAL. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:16 -04:00
Trond Myklebust	277e4ab7d5	SUNRPC: Simplify TCP receive code by switching to using iterators Most of this code should also be reusable with other socket types. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:16 -04:00
Trond Myklebust	adfa71446d	SUNRPC: Cleanup: remove the unused 'task' argument from the request_send() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:16 -04:00
Trond Myklebust	c544577dad	SUNRPC: Clean up transport write space handling Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:15 -04:00
Trond Myklebust	36bd7de949	SUNRPC: Turn off throttling of RPC slots for TCP sockets The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:15 -04:00
Trond Myklebust	75891f502f	SUNRPC: Support for congestion control when queuing is enabled Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:15 -04:00
Trond Myklebust	50f484e298	SUNRPC: Treat the task and request as separate in the xprt_ops->send_request() When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:15 -04:00
Trond Myklebust	75c84151a9	SUNRPC: Rename xprt->recv_lock to xprt->queue_lock We will use the same lock to protect both the transmit and receive queues. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:14 -04:00
Trond Myklebust	4cd34e7c2e	SUNRPC: Simplify dealing with aborted partially transmitted messages If the previous message was only partially transmitted, we need to close the socket in order to avoid corruption of the message stream. To do so, we currently hijack the unlocking of the socket in order to schedule the close. Now that we track the message offset in the socket state, we can move that kind of checking out of the socket lock code, which is needed to allow messages to remain queued after dropping the socket lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:14 -04:00
Trond Myklebust	6c7a64e5a4	SUNRPC: Add socket transmit queue offset tracking Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:14 -04:00
Trond Myklebust	e1806c7bfb	SUNRPC: Move reset of TCP state variables into the reconnect code Rather than resetting state variables in socket state_change() callback, do it in the sunrpc TCP connect function itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:14 -04:00
Trond Myklebust	d1109aa56c	SUNRPC: Rename TCP receive-specific state variables Since we will want to introduce similar TCP state variables for the transmission of requests, let's rename the existing ones to label that they are for the receive side. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-09-30 15:35:14 -04:00
Stephen Hemminger	8fdee4cc95	sunrpc: whitespace fixes Remove trailing whitespace and blank line at EOF Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-31 12:53:40 -04:00
Chuck Lever	a9cde23ab7	SUNRPC: Add a ->free_slot transport callout Refactor: xprtrdma needs to have better control over when RPCs are awoken from the backlog queue, so replace xprt_free_slot with a transport op callout. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-05-07 09:20:03 -04:00
Linus Torvalds	a1bf4c7da6	NFS client updates for Linux 4.17 Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails" * tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits) NFS: advance nfs_entry cookie only after decoding completes successfully NFSv3/acl: forget acl cache after setattr NFSv4.1: Fix exclusive create NFSv4: Declare the size up to date after it was set. nfs: Use ida_simple API NFSv4: Fix the nfs_inode_set_delegation() arguments NFSv4: Clean up CB_GETATTR encoding NFSv4: Don't ask for attributes when ACCESS is protected by a delegation NFSv4: Add a helper to encode/decode struct timespec NFSv4: Clean up encode_attrs NFSv4; Clean up XDR encoding of type bitmap4 NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group SUNRPC: Add a helper for encoding opaque data inline SUNRPC: Add helpers for decoding opaque and string types NFSv4: Ignore change attribute invalidations if we hold a delegation NFS: More fine grained attribute tracking NFS: Don't force unnecessary cache invalidation in nfs_update_inode() NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() NFS: Don't force a revalidation of all attributes if change is missing NFS: Convert NFS_INO_INVALID flags to unsigned long ...	2018-04-12 12:55:50 -07:00
Chuck Lever	78215759e2	SUNRPC: Make RTT measurement more precise (Send) Some RPC transports have more overhead in their send_request callouts than others. For example, for RPC-over-RDMA: - Marshaling an RPC often has to DMA map the RPC arguments - Registration methods perform memory registration as part of marshaling To capture just server and network latencies more precisely: when sending a Call, capture the rq_xtime timestamp _after_ the transport header has been marshaled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Chuck Lever	ecd465ee88	SUNRPC: Move xprt_update_rtt callsite Since commit `33849792cb` ("xprtrdma: Detect unreachable NFS/RDMA servers more reliably"), the xprtrdma transport now has a ->timer callout. But xprtrdma does not need to compute RTT data, only UDP needs that. Move the xprt_update_rtt call into the UDP transport implementation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Denys Vlasenko	9b2c45d479	net: make getname() functions return length rather than use int* parameter Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int socklen" parameter and return zero on success. "int socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2018-02-12 14:15:04 -05:00
Trond Myklebust	0afa6b4412	SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context Calling __UDPX_INC_STATS() from a preemptible context leads to a warning of the form: BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31 caller is xs_udp_data_receive_workfn+0x194/0x270 CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1 #2 Workqueue: xprtiod xs_udp_data_receive_workfn Call Trace: dump_stack+0x85/0xc1 check_preemption_disabled+0xce/0xe0 xs_udp_data_receive_workfn+0x194/0x270 process_one_work+0x318/0x620 worker_thread+0x20a/0x390 ? process_one_work+0x620/0x620 kthread+0x120/0x130 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Since we're taking a spinlock in those functions anyway, let's fix the issue by moving the call so that it occurs under the spinlock. Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-02-09 09:39:42 -05:00
Trond Myklebust	9b30889c54	SUNRPC: Ensure we always close the socket after a connection shuts down Ensure that we release the TCP socket once it is in the TCP_CLOSE or TCP_TIME_WAIT state (and only then) so that we don't confuse rkhunter and its ilk. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-02-05 19:23:28 -05:00
Trond Myklebust	0af3442af7	SUNRPC: Add explicit rescheduling points in the receive path When reading the reply from the server, insert an explicit cond_resched() to avoid starving higher priority tasks. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Trond Myklebust	3d188805f8	SUNRPC: Chunk reading of replies from the server Read the TCP data in chunks of max 2MB so that we do not hog the socket lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-01-14 23:06:30 -05:00
Linus Torvalds	2db767d988	NFS client fixes for Linux 4.15-rc2 Bugfixes: - NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid" - SUNRPC: Allow connect to return EHOSTUNREACH - SUNRPC: Handle ENETDOWN errors Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: "These patches fix a problem with compiling using an old version of gcc, and also fix up error handling in the SUNRPC layer. - NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid" - SUNRPC: Allow connect to return EHOSTUNREACH - SUNRPC: Handle ENETDOWN errors" * tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Handle ENETDOWN errors SUNRPC: Allow connect to return EHOSTUNREACH NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"	2017-12-01 20:04:20 -05:00
Trond Myklebust	eb5b46faa6	SUNRPC: Handle ENETDOWN errors Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-30 11:52:52 -05:00
Trond Myklebust	4ba161a793	SUNRPC: Allow connect to return EHOSTUNREACH Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-29 14:02:01 -05:00
Linus Torvalds	c3e9c04b89	NFS client updates for Linux 4.15 Stable bugfixes: - Revalidate "." and ".." correctly on open - Avoid RCU usage in tracepoints - Fix ugly referral attributes - Fix a typo in nomigration mount option - Revert "NFS: Move the flock open mode check into nfs_flock()" Features: - Implement a stronger send queue accounting system for NFS over RDMA - Switch some atomics to the new refcount_t type Other bugfixes and cleanups: - Clean up access mode bits - Remove special-case revalidations in nfs_opendir() - Improve invalidating NFS over RDMA memory for async operations that time out - Handle NFS over RDMA replies with a worqueue - Handle NFS over RDMA sends with a workqueue - Fix up replaying interrupted requests - Remove dead NFS over RDMA definitions - Update NFS over RDMA copyright information - Be more consistent with bool initialization and comparisons - Mark expected switch fall throughs - Various sunrpc tracepoint cleanups - Fix various OPEN races - Fix a typo in nfs_rename() - Use common error handling code in nfs_lock_and_join_request() - Check that some structures are properly cleaned up during net_exit() - Remove net pointer from dprintk()s Merge tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Revalidate "." and ".." correctly on open - Avoid RCU usage in tracepoints - Fix ugly referral attributes - Fix a typo in nomigration mount option - Revert "NFS: Move the flock open mode check into nfs_flock()" Features: - Implement a stronger send queue accounting system for NFS over RDMA - Switch some atomics to the new refcount_t type Other bugfixes and cleanups: - Clean up access mode bits - Remove special-case revalidations in nfs_opendir() - Improve invalidating NFS over RDMA memory for async operations that time out - Handle NFS over RDMA replies with a worqueue - Handle NFS over RDMA sends with a workqueue - Fix up replaying interrupted requests - Remove dead NFS over RDMA definitions - Update NFS over RDMA copyright information - Be more consistent with bool initialization and comparisons - Mark expected switch fall throughs - Various sunrpc tracepoint cleanups - Fix various OPEN races - Fix a typo in nfs_rename() - Use common error handling code in nfs_lock_and_join_request() - Check that some structures are properly cleaned up during net_exit() - Remove net pointer from dprintk()s" * tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (62 commits) NFS: Revert "NFS: Move the flock open mode check into nfs_flock()" NFS: Fix typo in nomigration mount option nfs: Fix ugly referral attributes NFS: super: mark expected switch fall-throughs sunrpc: remove net pointer from messages nfs: remove net pointer from messages sunrpc: exit_net cleanup check added nfs client: exit_net cleanup check added nfs/write: Use common error handling code in nfs_lock_and_join_requests() NFSv4: Replace closed stateids with the "invalid special stateid" NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state NFSv4: Check the open stateid when searching for expired state NFSv4: Clean up nfs4_delegreturn_done NFSv4: cleanup nfs4_close_done NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close NFSv4: Don't try to CLOSE if the stateid 'other' field has changed NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID. NFS: Fix a typo in nfs_rename() NFSv4: Fix open create exclusive when the server reboots ...	2017-11-17 14:18:00 -08:00
Gustavo A. R. Silva	e9d4763935	net: sunrpc: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-11-17 16:43:44 -05:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. 
- Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). 
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. 
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
Colin Ian King	d099b8af46	sunrpc: remove redundant initialization of sock sock is being initialized and then being almost immediately updated hence the initialized value is not being used and is redundant. Remove the initialization. Cleans up clang warning: warning: Value stored to 'sock' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-10-01 18:51:30 -04:00
Trond Myklebust	f9773b22a2	NFS-over-RDMA client updates for Linux 4.14 Bugfixes and cleanups: - Constify rpc_xprt_ops - Harden RPC call encoding and decoding - Clean up rpc call decoding to use xdr_streams - Remove unused variables from various structures - Refactor code to remove imul instructions - Rearrange rx_stats structure for better cacheline sharing -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlmgfA4ACgkQ18tUv7Cl QOsbXBAAnNaCWwerMGi7IbPcvA8aIQLcaruVUVuI2HIUdwb0At3EBakLJr5vFong IbUPEegi2F7Dm8gwwQ8Ntb0gqGER1mHr0Bd4tcls+cNxwKNpRad/cv8ZjN4AMVpz Kf1ZQOSDoRyJxwnAaRTYsU302tkWQFHrBjpCXpvgI3uoQ7kJwC1sZpXH6qN+r9E3 hFlkzZJ6gkZE3Rx3XsQqjl+TFZ3amd9Yl1AjzND622oLItmcJiRoptCVz8jYEFBJ uYvg22jbZWIrI66pPXnX+TuDfkbA6nFuSqJma0VLZAyTGKtRzJpaExvSJuuMqLm1 ZuWgWXIO3Kvvyx4gTvRFq06TAlunjOHlxb+39Yr41w2LLcDitvTmv2t/o8+BcVCp fkaziwZIqkfXoE4+3SGRC0s+R5obtgjAiTlAPTwno9p8T7jC+x43fdPF9l5jgAs+ 0jtl1d+whQK0yGITq7zwbLimLxxz12f8S9JH6U4umkL/A458ApRVuUQfoCHzl4wk ZPG1DGZjPBClM3R//XfUargfs/uM2FO6u0Z4+mxxdyJAHrdExczDC6OE9lLG9hnR KQEa7PVDjQZssNHOY0Nu3QaTpBoVxmN6xiDMTtXdf+ltd2m/ja18lER3tB9IwpXD +RqIJ8aFat3oP76tZ8CNJ7LiRORzmqDTcfjWkpCDPK259OK7FFU= =fdZG -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-4.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs into linux-next NFS-over-RDMA client updates for Linux 4.14 Bugfixes and cleanups: - Constify rpc_xprt_ops - Harden RPC call encoding and decoding - Clean up rpc call decoding to use xdr_streams - Remove unused variables from various structures - Refactor code to remove imul instructions - Rearrange rx_stats structure for better cacheline sharing	2017-09-05 15:16:04 -04:00
Trond Myklebust	ce7c252a8c	SUNRPC: Add a separate spinlock to protect the RPC request receive list This further reduces contention with the transport_lock, and allows us to convert to using a non-bh-safe spinlock, since the list is now never accessed from a bh context. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-08-18 14:45:04 -04:00
Trond Myklebust	040249dfbe	SUNRPC: Cleanup xs_tcp_read_common() Simplify the code to avoid a full copy of the struct xdr_skb_reader. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-08-16 15:10:17 -04:00
Trond Myklebust	8d6f97d698	SUNRPC: Don't loop forever in xs_tcp_data_receive() Ensure that we don't hog the workqueue thread by requeuing the job every 64 loops. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-08-16 15:10:16 -04:00
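The "requeue every 64 loops" idea in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct fake_queue`, `drain_with_budget()` and `RECV_LOOP_BUDGET` are hypothetical names, and the real xs_tcp_data_receive() reads from a socket and requeues a work item rather than decrementing a counter. Only the figure 64 comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Loop budget; the value 64 is the one quoted in the commit message. */
#define RECV_LOOP_BUDGET 64

/* Hypothetical stand-in for a socket with data waiting to be read. */
struct fake_queue {
	size_t pending;
};

/*
 * Process at most RECV_LOOP_BUDGET items in one pass.
 * Returns true if work remains, meaning the caller should requeue
 * the job instead of continuing to occupy the worker thread.
 */
static bool drain_with_budget(struct fake_queue *q)
{
	unsigned int loops = 0;

	while (q->pending > 0) {
		q->pending--;	/* stand-in for receiving one chunk */
		if (++loops >= RECV_LOOP_BUDGET)
			return q->pending > 0;
	}
	return false;
}
```

With 100 pending items, the first call would handle 64 and ask to be requeued, and the second call would finish the remaining 36.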
Trond Myklebust	c89091c88d	SUNRPC: Don't hold the transport lock when receiving backchannel data The backchannel request has no associated task, so it is going nowhere until we call xprt_complete_bc_request(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-08-16 15:10:16 -04:00
Trond Myklebust	729749bb8d	SUNRPC: Don't hold the transport lock across socket copy operations Instead add a mechanism to ensure that the request doesn't disappear from underneath us while copying from the socket. We do this by preventing xprt_release() from freeing the XDR buffers until the flag RPC_TASK_MSG_RECV has been cleared from the request. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>	2017-08-16 15:10:15 -04:00
Chuck Lever	d31ae25481	sunrpc: Const-ify all instances of struct rpc_xprt_ops After transport instance creation, these function pointers never change. Mark them as constant to prevent their use as an attack vector for code injections. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-08-01 16:10:35 -04:00
NeilBrown	3ffbc1d655	net/sunrpc/xprt_sock: fix regression in connection error reporting. Commit `3d4762639d` ("tcp: remove poll() flakes when receiving RST") in v4.12 changed the order in which ->sk_state_change() and ->sk_error_report() are called when a socket is shut down - sk_state_change() is now called first. This causes xs_tcp_state_change() -> xs_sock_mark_closed() -> xprt_disconnect_done() to wake all pending tasks with -EAGAIN. When the ->sk_error_report() callback arrives, it is too late to pass the error on, and it is lost. An easy way to demonstrate the problem is to try to start rpc.nfsd while rpcbind isn't running. nfsd will attempt a tcp connection to rpcbind. An ECONNREFUSED error is returned, but sunrpc code loses the error and keeps retrying. If it saw the ECONNREFUSED, it would abort. To fix this, handle the sk->sk_err in the TCP_CLOSE branch of xs_tcp_state_change(). Fixes: `3d4762639d` ("tcp: remove poll() flakes when receiving RST") Cc: stable@vger.kernel.org (v4.12) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-07-21 08:49:58 -04:00
NeilBrown	6ea44adce9	SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() If you attempt a TCP mount from a host that is unreachable in a way that triggers an immediate error from kernel_connect(), that error does not propagate up, instead EAGAIN is reported. This results in call_connect_status receiving the wrong error. A case that is easy to demonstrate is to attempt to mount from an address that results in ENETUNREACH, after first deleting any default route. Without this patch, the mount.nfs process is persistently runnable and is hard to kill. With this patch it exits as it should. The problem is caused by the fact that xs_tcp_force_close() eventually calls xprt_wake_pending_tasks(xprt, -EAGAIN); which causes an error return of -EAGAIN, so when xs_tcp_setup_socket() calls xprt_wake_pending_tasks(xprt, status); the status is ignored. Fixes: `4efdd92c92` ("SUNRPC: Remove TCP client connection reset hack") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-05-31 12:26:44 -04:00
Linus Torvalds	8f03cf50bc	NFS client updates for Linux 4.11 Stable bugfixes: - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state - xprtrdma: Fix Read chunk padding - xprtrdma: Per-connection pad optimization - xprtrdma: Disable pad optimization by default - xprtrdma: Reduce required number of send SGEs - nlm: Ensure callback code also checks that the files match - pNFS/flexfiles: If the layout is invalid, it must be updated before retrying - NFSv4: Fix reboot recovery in copy offload - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" - NFSv4: fix getacl head length estimation - NFSv4: fix getacl ERANGE for sum ACL buffer sizes Features: - Add and use dprintk_cont macros - Various cleanups to NFS v4.x to reduce code duplication and complexity - Remove unused cr_magic related code - Improvements to sunrpc "read from buffer" code - Clean up sunrpc timeout code and allow changing TCP timeout parameters - Remove duplicate mw_list management code in xprtrdma - Add generic functions for encoding and decoding xdr streams Bugfixes: - Clean up nfs_show_mountd_netid - Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings - Properly handle -ERESTARTSYS in nfs_rename() - Check if register_shrinker() failed during rpcauth_init() - Properly clean up procfs/pipefs entries - Various NFS over RDMA related fixes - Silence unititialized variable warning in sunrpc -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAli3F7YACgkQ18tUv7Cl QOvzrQ//dL+nnBaqsm9bA2wwuVJSQ2R1zdkwHOCWghEWROZrQHzpi0VHu0ZKBLzr YsYFhHvIPax9Q8USY4B/QFQ3eUuZILEVn+xDruRxZaJPnsA4Zmr16VJwGF2F68Lh CGekA5qybqy8lAG6v96Gyjbi+JqjHNCmelYWRv7SX9IZcDjNJpsEbrSI4LkabTWh 70WtCl3LBzVMRYRxe8+f0mcx4g4XCQ8pDaQRgRnfKtNeQk/+PgWz66xSNinDakVb A8AkaiUadPRgUTpap6HfBSicpRvtLQeLhARC0E4YE5pXp2H/kUt2MFe5szblfSCv zf2nrPUbNEHjBypFhERzCZZk6EonY6FeOojyW0g2C+rmPdK7WLlKbwTQFxdRGvsx 78fIiPRdlDHDp9CXzD8V4xxRBJX/KkicA1Vp8CoyQtmpzpu2fjwT0kr9HeD+aEe6 
293+72QUfk05re2HYWF9MCGGVVLdnLLjrKCgwwRQ0HX5WF6GNQxX/yVgBVlqFeV3 xc8m7ltKco5N9JxIqwlIpySq2e114EQOqsmHYz3gxd7ID9J1NJz+9H2z2EvgAKZ7 wIPSLoZrdBdnoXG8ZDDTAvPKeB8l6egi6wjrvGKxewVlMbjzogdARsMKWoifnCfG HMkH+IEvLGvFc1pPeLbscJGEdVWXVn0thO+8fkS9F9sE/zMX9PA= =01DU -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state - xprtrdma: Fix Read chunk padding - xprtrdma: Per-connection pad optimization - xprtrdma: Disable pad optimization by default - xprtrdma: Reduce required number of send SGEs - nlm: Ensure callback code also checks that the files match - pNFS/flexfiles: If the layout is invalid, it must be updated before retrying - NFSv4: Fix reboot recovery in copy offload - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" - NFSv4: fix getacl head length estimation - NFSv4: fix getacl ERANGE for sum ACL buffer sizes Features: - Add and use dprintk_cont macros - Various cleanups to NFS v4.x to reduce code duplication and complexity - Remove unused cr_magic related code - Improvements to sunrpc "read from buffer" code - Clean up sunrpc timeout code and allow changing TCP timeout parameters - Remove duplicate mw_list management code in xprtrdma - Add generic functions for encoding and decoding xdr streams Bugfixes: - Clean up nfs_show_mountd_netid - Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings - Properly handle -ERESTARTSYS in nfs_rename() - Check if register_shrinker() failed during rpcauth_init() - Properly clean up procfs/pipefs entries - Various NFS over RDMA related fixes - Silence unititialized variable warning in sunrpc" * tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits) NFSv4: fix getacl ERANGE for some ACL buffer sizes NFSv4: fix getacl head length estimation Revert 
"NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" NFSv4: Fix reboot recovery in copy offload pNFS/flexfiles: If the layout is invalid, it must be updated before retrying NFSv4: Clean up owner/group attribute decode SUNRPC: Add a helper function xdr_stream_decode_string_dup() NFSv4: Remove bogus "struct nfs_client" argument from decode_ace() NFSv4: Fix the underestimation of delegation XDR space reservation NFSv4: Replace callback string decode function with a generic NFSv4: Replace the open coded decode_opaque_inline() with the new generic NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics SUNRPC: Add generic helpers for xdr_stream encode/decode sunrpc: silence uninitialized variable warning nlm: Ensure callback code also checks that the files match sunrpc: Allow xprt->ops->timer method to sleep xprtrdma: Refactor management of mw_list field xprtrdma: Handle stale connection rejection xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs xprtrdma: Shrink send SGEs array ...	2017-03-01 16:10:30 -08:00
Alexey Dobriyan	5b5e0928f7	lib/vsprintf.c: remove %Z support Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:47 -08:00
Dan Carpenter	9761a2469d	sunrpc: silence uninitialized variable warning kstrtouint() can return a couple different error codes so the check for "ret == -EINVAL" is wrong and static analysis tools correctly complain that we can use "num" without initializing it. It's not super harmful because we check the bounds. But it's also easy enough to fix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-02-21 10:53:36 -05:00
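The bug class described in the commit above (testing for one specific error code when the parser can return several) can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `parse_uint()` is a hypothetical stand-in for kstrtouint() built on `strtoul()`, and `set_port_param()` is a hypothetical caller, not the sunrpc code. The point is the `if (ret)` check, which rejects every failure so that `num` is never read uninitialized.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for kstrtouint(): returns 0 on success,
 * -EINVAL for malformed input and -ERANGE when the value does not fit.
 */
static int parse_uint(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (s[0] < '0' || s[0] > '9')	/* reject empty, signed or non-numeric input */
		return -EINVAL;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || v > UINT_MAX)
		return -ERANGE;
	*out = (unsigned int)v;
	return 0;
}

/* Hypothetical parameter setter: stores the value only if it parsed and is in bounds. */
static int set_port_param(const char *s, unsigned int min, unsigned int max,
			  unsigned int *param)
{
	unsigned int num;
	int ret = parse_uint(s, &num);

	if (ret)		/* any failure, not only -EINVAL */
		return ret;
	if (num < min || num > max)
		return -EINVAL;
	*param = num;
	return 0;
}
```

A check written as `if (ret == -EINVAL)` would let the -ERANGE case fall through to the bounds test with `num` unset, which is the pattern static analysis tools flag.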
Chuck Lever	b977b644cc	sunrpc: Allow xprt->ops->timer method to sleep The transport lock is needed to protect the xprt_adjust_cwnd() call in xs_udp_timer, but it is not necessary for accessing the rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate the lock into UDP's xs_udp_timer method, where it is required. The ->timer method has to take the transport lock if needed, but it can now sleep safely, or even call back into the RPC scheduler. This is more a clean-up than a fix, but the "issue" was introduced by my transport switch patches back in 2005. Fixes: `46c0ee8bc4` ("RPC: separate xprt_timer implementations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-02-10 14:02:37 -05:00
Trond Myklebust	7196dbb02e	SUNRPC: Allow changing of the TCP timeout parameters on the fly When the NFSv4 server tells us the lease period, we usually want to adjust down the timeout parameters on the TCP connection to ensure that we don't miss lease renewals due to a faulty connection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-02-09 14:02:10 -05:00
Trond Myklebust	8d1b8c62e0	SUNRPC: Refactor TCP socket timeout code into a helper function Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-02-09 13:49:04 -05:00
David S. Miller	bb598c1b8c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Several cases of bug fixes in 'net' overlapping other changes in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-15 10:54:36 -05:00
Paolo Abeni	7c13f97ffd	udp: do fwd memory scheduling on dequeue A new argument is added to __skb_recv_datagram to provide an explicit skb destructor, invoked under the receive queue lock. The UDP protocol uses such argument to perform memory reclaiming on dequeue, so that the UDP protocol no longer sets skb->destructor. Instead explicit memory reclaiming is performed at close() time and when skbs are removed from the receive queue. The in kernel UDP protocol users now need to call a skb_recv_udp() variant instead of skb_recv_datagram() to properly perform memory accounting on dequeue. Overall, this allows acquiring only once the receive queue lock on dequeue. Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport. nr sinks vanilla patched 1 440 560 3 2150 2300 6 3650 3800 9 4450 4600 12 6250 6450 v1 -> v2: - do rmem and allocated memory scheduling under the receive lock - do bulk scheduling in first_packet_length() and in udp_destruct_sock() - avoid the typedef for the dequeue callback Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 13:24:41 -05:00
Jeff Layton	18e601d6ad	sunrpc: fix some missing rq_rbuffer assignments We've been seeing some crashes in testing that look like this: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110 PGD 212ca2067 PUD 212ca3067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 task: ffff88020d7ed200 task.stack: ffff880211838000 RIP: 0010:[<ffffffff8135ce99>] [<ffffffff8135ce99>] memcpy_orig+0x29/0x110 RSP: 0018:ffff88021183bdd0 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000 RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000 RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8 R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800 FS: 0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0 Stack: ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00 ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00 ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30 Call Trace: [<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc] [<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc] [<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd] [<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd] [<ffffffff810a9418>] kthread+0xd8/0xf0 [<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40 [<ffffffff810a9340>] ? 
kthread_park+0x60/0x60 Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4 RIP [<ffffffff8135ce99>] memcpy_orig+0x29/0x110 RSP <ffff88021183bdd0> CR2: 0000000000000000 Both Bruce and Eryu ran a bisect here and found that the problematic patch was `68778945e4` (SUNRPC: Separate buffer pointers for RPC Call and Reply messages). That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to set up the receive buffer, but didn't change all of the necessary codepaths to set it properly. In particular the backchannel setup was missing. We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Fixes: `68778945e4` "SUNRPC: Separate buffer pointers..." Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-10-28 16:57:33 -04:00
Paolo Abeni	850cbaddb5	udp: use it's own memory accounting schema Completely avoid default sock memory accounting and replace it with udp-specific accounting. Since the new memory accounting model encapsulates completely the required locking, remove the socket lock on both enqueue and dequeue, and avoid using the backlog on enqueue. Be sure to clean-up rx queue memory on socket destruction, using UDP's own sk_destruct. Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport. nr readers Kpps (vanilla) Kpps (patched) 1 170 440 3 1250 2150 6 3000 3650 9 4200 4450 12 5700 6250 v4 -> v5: - avoid unneeded test in first_packet_length v3 -> v4: - remove useless sk_rcvqueues_full() call v2 -> v3: - do not set the now unused backlog_rcv callback v1 -> v2: - add memory pressure support - fixed dropwatch accounting for ipv6 Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-10-22 17:05:05 -04:00
David Vrabel	d48f9ce73c	sunrpc: fix write space race causing stalls Write space becoming available may race with putting the task to sleep in xprt_wait_for_buffer_space(). The existing mechanism to avoid the race does not work. This (edited) partial trace illustrates the problem: [1] rpc_task_run_action: task:43546@5 ... action=call_transmit [2] xs_write_space <-xs_tcp_write_space [3] xprt_write_space <-xs_write_space [4] rpc_task_sleep: task:43546@5 ... [5] xs_write_space <-xs_tcp_write_space [1] Task 43546 runs but is out of write space. [2] Space becomes available, xs_write_space() clears the SOCKWQ_ASYNC_NOSPACE bit. [3] xprt_write_space() attempts to wake xprt->snd_task (== 43546), but this has not yet been queued and the wake up is lost. [4] xs_nospace() is called which calls xprt_wait_for_buffer_space() which queues task 43546. [5] The call to sk->sk_write_space() at the end of xs_nospace() (which is supposed to handle the above race) does not call xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and thus the task is not woken. Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace() so the second call to sk->sk_write_space() calls xprt_write_space(). Suggested-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> cc: stable@vger.kernel.org # 4.4 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:21:36 -04:00
Chuck Lever	3435c74aed	SUNRPC: Generalize the RPC buffer release API xprtrdma needs to allocate the Call and Reply buffers separately. TBH, the reliance on using a single buffer for the pair of XDR buffers is transport implementation-specific. Instead of passing just the rq_buffer into the buf_free method, pass the task structure and let buf_free take care of freeing both XDR buffers at once. There's a micro-optimization here. In the common case, both xprt_release and the transport's buf_free method were checking if rq_buffer was NULL. Now the check is done only once per RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:37 -04:00
Chuck Lever	5fe6eaa1f9	SUNRPC: Generalize the RPC buffer allocation API xprtrdma needs to allocate the Call and Reply buffers separately. TBH, the reliance on using a single buffer for the pair of XDR buffers is transport implementation-specific. Transports that want to allocate separate Call and Reply buffers will ignore the "size" argument anyway. Don't bother passing it. The buf_alloc method can't return two pointers. Instead, make the method's return value an error code, and set the rq_buffer pointer in the method itself. This gives call_allocate an opportunity to terminate an RPC instead of looping forever when a permanent problem occurs. If a request is just bogus, or the transport is in a state where it can't allocate resources for any request, there needs to be a way to kill the RPC right there and not loop. This immediately fixes a rare problem in the backchannel send path, which loops if the server happens to send a CB request whose call+reply size is larger than a page (which it shouldn't do yet). One more issue: looks like xprt_inject_disconnect was incorrectly placed in the failure path in call_allocate. It needs to be in the success path, as it is for other call-sites. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:37 -04:00
Paolo Abeni	a41bd25ae6	sunrpc: fix UDP memory accounting The commit `f9b2ee714c` ("SUNRPC: Move UDP receive data path into a workqueue context"), as a side effect, moved the skb_free_datagram() call outside the scope of the related socket lock, but UDP sockets require such lock to be held for proper memory accounting. Fix it by replacing skb_free_datagram() with skb_free_datagram_locked(). Fixes: `f9b2ee714c` ("SUNRPC: Move UDP receive data path into a workqueue context") Reported-and-tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 10:00:49 -04:00
Trond Myklebust	3851f1cdb2	SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout ...and ensure that we propagate it to new transports on the same client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 14:12:09 -04:00
Trond Myklebust	02910177ae	SUNRPC: Fix reconnection timeouts When the connect attempt fails and backs off, we should start the clock at the last connection attempt, not time at which we queue up the reconnect job. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 12:18:10 -04:00
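The timing rule in the commit above (measure the back-off from the last connection attempt, not from the moment the reconnect job is queued) can be sketched as a small helper. The function name `reconnect_delay()` and its parameters are hypothetical and all values are in the same arbitrary tick unit; the kernel code works with jiffies and its own transport fields.

```c
/*
 * Hypothetical helper: how long to wait before the next connect attempt.
 * The back-off interval is counted from last_attempt, so time that has
 * already elapsed since that attempt is subtracted from the wait.
 * Unsigned subtraction keeps the elapsed time correct across counter wrap.
 */
static unsigned long reconnect_delay(unsigned long now,
				     unsigned long last_attempt,
				     unsigned long backoff)
{
	unsigned long elapsed = now - last_attempt;

	if (elapsed >= backoff)
		return 0;	/* back-off already served: reconnect at once */
	return backoff - elapsed;
}
```

For example, with a back-off of 30 ticks and a last attempt 10 ticks ago, the helper yields 20 rather than a fresh 30.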
NeilBrown	d88e4d82ef	SUNRPC: disable the use of IPv6 temporary addresses. If the net.ipv6.conf.*.use_tempaddr sysctl is set to '2', then TCP connections over IPv6 will prefer a 'private' source address. These eventually expire and become invalid, typically after a week, but the time is configurable. When the local address becomes invalid the client will not be able to receive replies from the server. Eventually the connection will time out or break and a new connection will be established, but this can take half an hour (typically TCP connection break time). RFC 4941, which describes private IPv6 addresses, acknowledges that some applications might not work well with them and that the application may explicitly request a non-temporary (i.e. "public") address. I believe this is correct for SUNRPC clients. Without this change, a client will occasionally experience a long delay if private addresses have been enabled. The privacy offered by private addresses is of little value for an NFS server which requires client authentication. For NFSv3 this will often not be a problem because idle connections are closed after 5 minutes. For NFSv4 connections never go idle due to the periodic RENEW (or equivalent) request. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 11:29:59 -04:00
Trond Myklebust	1f4c17a03b	SUNRPC: Handle EADDRNOTAVAIL on connection failures If the connect attempt immediately fails with an EADDRNOTAVAIL error, then that means our choice of source port number was bad. This error is expected when we set the SO_REUSEPORT socket option and we have 2 sockets sharing the same source and destination address and port combinations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: `402e23b4ed` ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport") Cc: stable@vger.kernel.org # v4.0+	2016-08-01 15:03:02 -04:00
Trond Myklebust	7f94ed2495	Merge branch 'sunrpc'	2016-07-24 17:08:31 -04:00
Frank Sorenson	ffb6ca33b0	sunrpc: Prevent resvport min/max inversion via sysfs and module parameter The current min/max resvport settings are independently limited by the entire range of allowed ports, so max_resvport can be set to a port lower than min_resvport. Prevent inversion of min/max values when set through sysfs and module parameter by setting the limits dependent on each other. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:27 -04:00
Frank Sorenson	e08ea3a96f	sunrpc: Prevent resvport min/max inversion via sysctl The current min/max resvport settings are independently limited by the entire range of allowed ports, so max_resvport can be set to a port lower than min_resvport. Prevent inversion of min/max values when set through sysctl by setting the limits dependent on each other. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:27 -04:00
Frank Sorenson	5d71899a26	sunrpc: Fix reserved port range calculation The range calculation for choosing the random reserved port will panic with divide-by-zero when min_resvport == max_resvport, a range of one port, not zero. Fix the reserved port range calculation by adding one to the difference. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:26 -04:00
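The off-by-one in the commit above is easy to see in isolation: for an inclusive range the number of candidate ports is `max - min + 1`, so a one-port range gives a modulus of 1 instead of 0. The sketch below assumes a hypothetical helper `pick_resv_port()` that takes the random value as a parameter; it is not the sunrpc function, only an instance of the same arithmetic.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a random value onto the inclusive port range
 * [min_port, max_port]. The "+ 1" makes the range size at least 1, so the
 * modulo below can never divide by zero when min_port == max_port.
 */
static unsigned short pick_resv_port(unsigned short min_port,
				     unsigned short max_port,
				     unsigned int rnd)
{
	unsigned int range;

	assert(min_port <= max_port);	/* inverted limits are assumed rejected earlier */
	range = (unsigned int)max_port - min_port + 1;
	return (unsigned short)(min_port + rnd % range);
}
```

With `min_port == max_port == 665` the helper always returns 665, whereas computing the range without the added one would make the modulus zero.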
J. Bruce Fields	39a9beab5a	rpc: share one xps between all backchannels The spec allows backchannels for multiple clients to share the same tcp connection. When that happens, we need to use the same xprt for all of them. Similarly, we need the same xps. This fixes list corruption introduced by the multipath code. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>	2016-06-15 10:32:25 -04:00
Trond Myklebust	9ffadfbc09	SUNRPC: Fix suspicious enobufs issues. The current test is racy when dealing with fast NICs. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-13 12:35:51 -04:00
Trond Myklebust	40a5f1b19b	SUNRPC: RPC transport queue must be low latency rpciod can easily get congested due to the long list of queued rpc_tasks. Having the receive queue wait in turn for those tasks to complete can therefore be a bottleneck. Address the problem by separating the workqueues into: - rpciod: manages rpc_tasks - xprtiod: manages transport related work. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-13 12:35:51 -04:00
Trond Myklebust	5157b95696	SUNRPC: Consolidate xs_tcp_data_ready and xs_data_ready The only difference between the two at this point is the reset of the connection timeout, and since everyone except TCP ignores that value, we can just throw it into the generic function. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-13 12:35:51 -04:00
Trond Myklebust	42d42a5b0c	SUNRPC: Small optimisation of client receive Do not queue the client receive work if we're still processing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-13 12:35:51 -04:00
Linus Torvalds	ea8ea737c4	NFS client updates for Linux 4.7 Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Highlights include: Features: - Add support for the NFS v4.2 COPY operation - Add support for NFS/RDMA over IPv6 Bugfixes and cleanups: - Avoid race that crashes nfs_init_commit() - Fix oops in callback path - Fix LOCK/OPEN race when unlinking an open file - Choose correct stateids when using delegations in setattr, read and write - Don't send empty SETATTR after OPEN_CREATE - xprtrdma: Prevent server from writing a reply into memory client has released - xprtrdma: Support using Read list and Reply chunk in one RPC call" * tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits) pnfs: pnfs_update_layout needs to consider if strict iomode checking is on nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO nfs: avoid race that crashes nfs_init_commit NFS: checking for NULL instead of IS_ERR() in nfs_commit_file() pnfs: make pnfs_layout_process more robust pnfs: rework LAYOUTGET retry handling pnfs: lift retry logic from send_layoutget to pnfs_update_layout pnfs: fix bad error handling in send_layoutget flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args pnfs: keep track of the return sequence number in pnfs_layout_hdr pnfs: record sequence in pnfs_layout_segment when it's created pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io pNFS/flexfile: Fix erroneous fall back to read/write through the MDS NFS: Reclaim writes via writepage are opportunistic NFSv4: Use the right stateid for delegations in setattr, read and write ...	2016-05-26 10:33:33 -07:00
Chuck Lever	6b26cc8c8e	sunrpc: Advertise maximum backchannel payload size RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-05-17 15:47:57 -04:00
Eric Dumazet	b4411457d5	sunrpc: set SOCK_FASYNC sunrpc is using SOCKWQ_ASYNC_NOSPACE without setting SOCK_FASYNC, so the recent optimizations done in sk_set_bit() and sk_clear_bit() broke it. There is still the risk that a subsequent sock_fasync() call would clear SOCK_FASYNC, but sunrpc does not use this yet. Fixes: `9317bb6982` ("net: SOCKWQ_ASYNC_NOSPACE optimizations") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jiri Pirko <jiri@resnulli.us> Reported-by: Huang, Ying <ying.huang@intel.com> Tested-by: Jiri Pirko <jiri@resnulli.us> Tested-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-13 01:43:52 -04:00
Eric Dumazet	02c223470c	net: udp: rename UDP_INC_STATS_BH() Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(), and UDP6_INC_STATS_BH() to __UDP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Hannes Frederic Sowa	fafc4e1ea1	sock: tigthen lockdep checks for sock_owned_by_user sock_owned_by_user should not be used without socket lock held. It seems to be a common practice to check .owned before lock reclassification, so provide a little help to abstract this check away. Cc: linux-cifs@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:37:20 -04:00
Willem de Bruijn	1da8c681d5	sunrpc: do not pull udp headers on receive Commit `e6afc8ace6` modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Sunrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: `e6afc8ace6` ("udp: remove headers from UDP packets before queueing") Reported-by: Franklin S Cooper Jr. <fcooper@ti.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:31:33 -04:00
Trond Myklebust	fb43d17210	SUNRPC: Use the multipath iterator to assign a transport to each task Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-02-05 18:48:55 -05:00
Trond Myklebust	daaadd2283	Merge branch 'bugfixes' * bugfixes: SUNRPC: Fixup socket wait for memory SUNRPC: Fix a missing break in rpc_anyaddr() pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() NFS: Fix attribute cache revalidation NFS: Ensure we revalidate attributes before using execute_ok() NFS: Flush reclaim writes using FLUSH_COND_STABLE NFS: Background flush should not be low priority NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn NFSv4: Don't perform cached access checks before we've OPENed the file NFS: Allow the combination pNFS and labeled NFS NFS42: handle layoutstats stateid error nfs: Fix race in __update_open_stateid() nfs: fix missing assignment in nfs4_sequence_done tracepoint	2016-01-07 18:45:36 -05:00
Trond Myklebust	13331a551a	SUNRPC: Fixup socket wait for memory We're seeing hangs in the NFS client code, with loops of the form: RPC: 30317 xmit incomplete (267368 left of 524448) RPC: 30317 call_status (status -11) RPC: 30317 call_transmit (status 0) RPC: 30317 xprt_prepare_transmit RPC: 30317 xprt_transmit(524448) RPC: xs_tcp_send_request(267368) = -11 RPC: 30317 xmit incomplete (267368 left of 524448) RPC: 30317 call_status (status -11) RPC: 30317 call_transmit (status 0) RPC: 30317 xprt_prepare_transmit RPC: 30317 xprt_transmit(524448) Turns out commit `ceb5d58b21` ("net: fix sock_wake_async() rcu protection") moved SOCKWQ_ASYNC_NOSPACE out of sock->flags and into sk->sk_wq->flags, however it never tried to fix up the code in net/sunrpc. The new idiom is to use the flags in the RCU protected struct socket_wq. While we're at it, clear out the now redundant places where we set/clear SOCKWQ_ASYNC_NOSPACE and SOCK_NOSPACE. In principle, sk_stream_wait_memory() is supposed to set these for us, so we only need to clear them in the particular case of our ->write_space() callback. Fixes: `ceb5d58b21` ("net: fix sock_wake_async() rcu protection") Cc: Eric Dumazet <edumazet@google.com> Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-01-06 12:22:25 -05:00
Stefan Hajnoczi	d1358917f2	SUNRPC: drop unused xs_reclassify_socketX() helpers xs_reclassify_socket4() and friends used to be called directly. xs_reclassify_socket() is called instead nowadays. The xs_reclassify_socketX() helper functions are empty when CONFIG_DEBUG_LOCK_ALLOC is not defined. Drop them since they have no callers. Note that AF_LOCAL still calls xs_reclassify_socketu() directly but is easily converted to generic xs_reclassify_socket(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-12-28 09:57:15 -05:00
Eric Dumazet	9cd3e072b0	net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-01 15:45:05 -05:00
Andrzej Hajda	7fc561362d	SUNRPC: fix variable type Due to incorrect len type bc_send_request returned always zero. The problem has been detected using proposed semantic patch scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-11-03 12:31:31 -05:00
Trond Myklebust	ac3c860c75	NFS: NFSoRDMA Client Side Changes In addition to a variety of bugfixes, these patches are mostly geared at enabling both swap and backchannel support to the NFS over RDMA client. Signed-off-by: Anna Schumake <Anna.Schumaker@Netapp.com> Merge tag 'nfs-rdma-4.4-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma	2015-11-02 17:09:24 -05:00
Chuck Lever	76566773a1	NFS: Enable client side NFSv4.1 backchannel to use other transports Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Anna Schumaker: Add forward declaration of struct net to xprt.h] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-11-02 16:29:13 -05:00
Chuck Lever	42e5c3e272	SUNRPC: Abstract backchannel operations xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-11-02 13:45:15 -05:00
Trond Myklebust	31303d6cbb	SUNRPC: Use MSG_SENDPAGE_NOTLAST in xs_send_pagedata() If we're sending more than one page via kernel_sendpage(), then set MSG_SENDPAGE_NOTLAST between the pages so that we don't send suboptimal frames (see commit `2f53384424` and commit `35f9c09fe9`). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 09:12:35 -04:00
Trond Myklebust	a26480942c	SUNRPC: Move AF_LOCAL receive data path into a workqueue context Now that we've done it for TCP and UDP, let's convert AF_LOCAL as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:05 -04:00
Trond Myklebust	f9b2ee714c	SUNRPC: Move UDP receive data path into a workqueue context Now that we've done it for TCP, let's convert UDP as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Trond Myklebust	edc1b01cd3	SUNRPC: Move TCP receive data path into a workqueue context Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads such as NFS read()-intensive workloads need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation. The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time. With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Trond Myklebust	66d7a56a62	SUNRPC: Refactor TCP receive Move the TCP data receive loop out of xs_tcp_data_ready(). Doing so will allow us to move the data receive out of the softirq context in a set of followup patches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-10-08 08:27:04 -04:00
Trond Myklebust	4b0ab51db3	SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose Under all conditions, it should be quite sufficient just to mark the socket as disconnected. It will then be closed by the transport shutdown or reconnect code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-09-19 16:38:35 -04:00
Trond Myklebust	0fdea1e8a2	SUNRPC: Ensure that we wait for connections to complete before retrying Commit `718ba5b873`, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: `718ba5b873` ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-09-17 18:01:28 -04:00
Trond Myklebust	03c78827db	SUNRPC: Fix races between socket connection and destroy code When we're destroying the socket transport, we need to ensure that we cancel any existing delayed connection attempts, and order them w.r.t. the call to xs_close(). Reported-by:"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-09-17 15:48:23 -04:00
Trond Myklebust	099392048c	SUNRPC: Prevent SYN+SYNACK+RST storms Add a shutdown() call before we release the socket in order to ensure the reset is sent before we try to reconnect. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-08-29 19:11:21 -07:00
Trond Myklebust	0c78789e3a	SUNRPC: xs_reset_transport must mark the connection as disconnected In case the reconnection attempt fails. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-08-29 13:40:32 -07:00
Trond Myklebust	c2126157ea	SUNRPC: Allow sockets to do GFP_NOIO allocations Follow up to commit `c4a7ca7749` ("SUNRPC: Allow waiting on memory allocation"). Allows the RPC socket code to do non-IO blocking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-08-19 21:46:15 -05:00
Trond Myklebust	99b1a4c32a	SUNRPC: Fix a thinko in xs_connect() It is rather pointless to test the value of transport->inet after calling xs_reset_transport(), since it will always be zero, and so we will never see any exponential back off behaviour. Also don't force early connections for SOFTCONN tasks. If the server disconnects us, we should respect the exponential backoff. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-08-17 13:05:49 -05:00
Linus Torvalds	d8132e08d2	NFS client bugfixes for Linux 4.2 Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...	2015-07-28 09:37:44 -07:00
Trond Myklebust	f580dd0428	SUNRPC: Report TCP errors to the caller Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-27 17:56:57 -04:00
NeilBrown	743c69e7c0	sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. The networking layer does not reliably report the distinction between a non-block write failing because: 1/ the queue is too full already and 2/ a memory allocation attempt failed. The distinction is important because in the first case it is appropriate to retry as soon as the socket reports that it is writable, and in the second case a small delay is required as the socket will most likely report as writable but kmalloc could still fail. sk_stream_wait_memory() exhibits this distinction nicely, setting 'vm_wait' if a small wait is needed. However in the non-blocking case it always returns -EAGAIN no matter the cause of the failure. This -EAGAIN call get all the way to sunrpc. The sunrpc layer expects EAGAIN to indicate the first cause, and ENOBUFS to indicate the second. Various documentation suggests that this is not unreasonable, but does not guarantee the desired error codes. The result of getting -EAGAIN when -ENOBUFS is expected is that the send is tried again in a tight loop and soft lockups are reported. so: add tests after calls to xs_sendpages() to translate -EAGAIN into -ENOBUFS if the socket is writable. This cannot happen inside xs_sendpages() as the test for "is socket writable" is different between TCP and UDP. With this change, the tight loop retrying xs_sendpages() becomes a loop which only retries every 250ms, and so will not trigger a soft-lockup warning. It is possible that the write did fail because the queue was too full and by the time xs_sendpages() completed, the queue was writable again. In this case an extra 250ms delay is inserted that isn't really needed. This circumstance suggests a degree of congestion so a delay is not necessarily a bad thing, and it can only cause a single 250ms delay, not a series of them. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-27 11:16:56 -04:00
Trond Myklebust	b5872f0c67	SUNRPC: Don't confuse ENOBUFS with a write_space issue ENOBUFS means that memory allocations are failing due to an actual low memory situation. It should not be confused with being out of socket buffer space. Handle the problem by just punting to the delay in call_status. Reported-by: Neil Brown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-03 09:42:54 -04:00
Linus Torvalds	8688d9540c	NFS client updates for Linux 4.2 Merge tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix a crash in the NFSv4 file locking code. - Fix an fsync() regression, where we were failing to retry I/O in some circumstances. - Fix an infinite loop in NFSv4.0 OPEN stateid recovery - Fix a memory leak when an attempted pnfs fails. - Fix a memory leak in the backchannel code - Large hostnames were not supported correctly in NFSv4.1 - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O. - Fix a couple of credential issues in pNFS/flexfiles Bugfixes + cleanups: - Open flag sanity checks in the NFSv4 atomic open codepath - More NFSv4 delegation related bugfixes - Various NFSv4.1 backchannel bugfixes and cleanups - Fix the NFS swap socket code - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code - Fix a UDP transport deadlock issue Features: - More RDMA client transport improvements - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles" * tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits) nfs: Remove invalid tk_pid from debug message nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh nfs: Drop bad comment in nfs41_walk_client_list() nfs: Remove unneeded micro checking of CONFIG_PROC_FS nfs: Don't setting FILE_CREATED flags always nfs: Use remove_proc_subtree() instead remove_proc_entry() nfs: Remove unused argument in nfs_server_set_fsinfo() nfs: Fix a memory leak when meeting an unsupported state protect nfs: take extra reference to fl->fl_file when running a LOCKU operation NFSv4: When returning a delegation, don't reclaim an incompatible open mode. NFSv4.2: LAYOUTSTATS is optional to implement NFSv4.2: Fix up a decoding error in layoutstats pNFS/flexfiles: Fix the reset of struct pgio_header when resending pNFS/flexfiles: Turn off layoutcommit for servers that don't need it pnfs/flexfiles: protect ktime manipulation with mirror lock nfs: provide pnfs_report_layoutstat when NFS42 is disabled nfs: verify open flags before allowing open nfs: always update creds in mirror, even when we have an already connected ds nfs: fix potential credential leak in ff_layout_update_mirror_cred pnfs/flexfiles: report layoutstat regularly ...	2015-07-02 11:32:23 -07:00
Trond Myklebust	775f06ab49	SUNRPC: Set the TCP user timeout option on client sockets Use the TCP_USER_TIMEOUT socket option to advertise to the server how long we will keep the connection open if there is unacknowledged data. See RFC5482. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-20 15:31:54 -04:00
Trond Myklebust	4876cc779f	SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit `caf4ccd4e8` ("SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release"). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-19 19:20:12 -04:00
Chuck Lever	4a06825839	SUNRPC: Transport fault injection It has been exceptionally useful to exercise the logic that handles local immediate errors and RDMA connection loss. To enable developers to test this regularly and repeatably, add logic to simulate connection loss every so often. Fault injection is disabled by default. It is enabled with $ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect where "xxx" is a large positive number of transport method calls before a disconnect. A value of several thousand is usually a good number that allows reasonable forward progress while still causing a lot of connection drops. These hooks are disabled when SUNRPC_DEBUG is turned off. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:37:26 -04:00
Jeff Layton	d67fa4d85a	sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the xs_swapper_enable/disable functions will likely oops when fed an RDMA xprt. Turn these functions into rpc_xprt_ops so that that doesn't occur. For now the RDMA versions are no-ops that just return -EINVAL on an attempt to swapon. Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:26 -04:00
Jeff Layton	d6e971d8ec	sunrpc: lock xprt before trying to set memalloc on the sockets It's possible that we could race with a call to xs_reset_transport, in which case the xprt->inet pointer could be zeroed out while we're accessing it. Lock the xprt before we try to set memalloc on it. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:24 -04:00
Jeff Layton	264d1df3b3	sunrpc: if we're closing down a socket, clear memalloc on it first We currently increment the memalloc_socks counter if we have a xprt that is associated with a swapfile. That socket can be replaced however during a reconnect event, and the memalloc_socks counter is never decremented if that occurs. When tearing down a xprt socket, check to see if the xprt is set up for swapping and sk_clear_memalloc before releasing the socket if so. Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:22 -04:00
Jeff Layton	8e2281330f	sunrpc: make xprt->swapper an atomic_t Split xs_swapper into enable/disable functions and eliminate the "enable" flag. Currently, it's racy if you have multiple swapon/swapoff operations running in parallel over the same xprt. Also fix it so that we only set it to a memalloc socket on a 0->1 transition and only clear it on a 1->0 transition. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:18 -04:00
Jeff Layton	3c87ef6efb	sunrpc: keep a count of swapfiles associated with the rpc_clnt Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:14 -04:00
Stefan Hajnoczi	9300fdba25	SUNRPC: drop stale doc comments in xprtsock.c Several functions have outdated arguments listed in the doc comments. Drop documentation for arguments that no longer exist. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-02 08:55:28 -04:00
Luis R. Rodriguez	9c27847dda	kernel/params: constify struct kernel_param_ops uses Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-05-28 11:32:10 +09:30
Trond Myklebust	c627d31ba0	SUNRPC: Cleanup to remove xs_tcp_close() xs_tcp_close() is now just a call to xs_tcp_shutdown(), so remove it, and replace the entry in xs_tcp_ops. Suggested-by: Anna Schumaker <anna.schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-10 11:06:04 -05:00
Trond Myklebust	402e23b4ed	SUNRPC: Fix stupid typo in xs_sock_set_reuseport Yes, kernel_setsockopt() hates you for using a char argument. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 17:31:02 -05:00
Trond Myklebust	54c0987492	SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG Now that the linger code is gone, the xs_tcp_fin_timeout variable has no real function. Keep it for now, since it is part of the /proc interface, but only define it if that /proc interface is enabled. Suggested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:27:45 -05:00
Trond Myklebust	b70ae915e4	SUNRPC: Handle connection reset more efficiently. If the connection reset is due to an active call on our side, then the state change is sometimes not reported. Catch those instances using xs_error_report() instead. Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as the change in behaviour makes it redundant. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:27:42 -05:00
Trond Myklebust	9e2b9f3776	SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:26:06 -05:00
Trond Myklebust	caf4ccd4e8	SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release Use of socket shutdown() means that we monitor the shutdown process through the xs_tcp_state_change() callback, so it is preferable to a full close in all cases unless we're destroying the transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:20:44 -05:00
Trond Myklebust	0efeac261c	SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection The previous behaviour left the connection half-open in order to try to scrape the last replies from the socket. Now that we have more reliable reconnection, change the behaviour to close down the socket faster. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:31:11 -05:00
Trond Myklebust	505936f59f	SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:20:40 -05:00
Trond Myklebust	9cbc94fb06	SUNRPC: Remove TCP socket linger code Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:20:40 -05:00
Trond Myklebust	4efdd92c92	SUNRPC: Remove TCP client connection reset hack Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:30 -05:00
Trond Myklebust	de84d89030	SUNRPC: TCP/UDP always close the old socket before reconnecting It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:30 -05:00
Trond Myklebust	718ba5b873	SUNRPC: Add helpers to prevent socket create from racing The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:29 -05:00
Trond Myklebust	6cc7e90836	SUNRPC: Ensure xs_reset_transport() resets the close connection flags Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:28 -05:00
Trond Myklebust	76698b2358	SUNRPC: Do not clear the source port in xs_reset_transport Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:28 -05:00
Trond Myklebust	3913c78c3a	SUNRPC: Handle EADDRINUSE on connect Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:27 -05:00
Trond Myklebust	4dda9c8a5e	SUNRPC: Set SO_REUSEPORT socket option for TCP connections When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 18:52:11 -05:00
Jeff Layton	f895b252d4	sunrpc: eliminate RPC_DEBUG It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-11-24 17:31:46 -05:00
Jeff Layton	1a867a0898	sunrpc: add tracepoints in xs_tcp_data_recv Add tracepoints inside the main loop on xs_tcp_data_recv that allow us to keep an eye on what's happening during each phase of it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-11-24 12:53:35 -05:00
Jeff Layton	3705ad64f1	sunrpc: add new tracepoints in xprt handling code ...so we can keep track of when calls are sent and replies received. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-11-24 12:53:35 -05:00
NeilBrown	1aff525629	NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() Now that nfs_release_page() doesn't block indefinitely, other deadlock avoidance mechanisms aren't needed. - it doesn't hurt for kswapd to block occasionally. If it doesn't want to block it would clear __GFP_WAIT. The current_is_kswapd() was only added to avoid deadlocks and we have a new approach for that. - memory allocation in the SUNRPC layer can very rarely try to ->releasepage() a page it is trying to handle. The deadlock is removed as nfs_release_page() doesn't block indefinitely. So we don't need to set PF_FSTRANS for sunrpc network operations any more. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 08:25:47 -04:00
Jason Baron	3dedbb5ca1	rpc: Add -EPERM processing for xs_udp_send_request() If an iptables drop rule is added for an nfs server, the client can end up in a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM is ignored since the prior bits of the packet may have been successfully queued and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request() thinks that because some bits were queued it should return -EAGAIN. We then try the request again and again, resulting in cpu spinning. Reproducer: 1) open a file on the nfs server '/nfs/foo' (mounted using udp) 2) iptables -A OUTPUT -d <nfs server ip> -j DROP 3) write to /nfs/foo 4) close /nfs/foo 5) iptables -D OUTPUT -d <nfs server ip> -j DROP The softlockup occurs in step 4 above. The previous patch allows xs_sendpages() to return both a sent count and any error values that may have occurred. Thus, if we get an -EPERM, return that to the higher level code. With this patch in place we can successfully abort the above sequence and avoid the softlockup. I also tried the above test case on an nfs mount on tcp and although the system does not softlockup, I still ended up with the 'hung_task' firing after 120 seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix, since -EPERM appears to get ignored much lower down in the stack and does not propagate up to xs_sendpages(). This case is not quite as insidious as the softlockup and it is not addressed here. Reported-by: Yigong Lou <ylou@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:13:46 -04:00
Jason Baron	f279cd008f	rpc: return sent and err from xs_sendpages() If an error is returned after the first bits of a packet have already been successfully queued, xs_sendpages() will return a positive 'int' value indicating success. Callers seem to treat this as -EAGAIN. However, there are cases where it's not a question of waiting for the write queue to drain. For example, when there is an iptables rule dropping packets to the destination, the lower level code can return -EPERM only after parts of the packet have been successfully queued. In this case, we can end up continuously retrying resulting in a kernel softlockup. This patch is intended to make no changes in behavior but is in preparation for subsequent patches that can make decisions based on both the number of bytes sent by xs_sendpages() and any errors that may have been returned. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:13:37 -04:00
Benjamin Coddington	a743419f42	SUNRPC: Don't wake tasks during connection abort When aborting a connection to preserve source ports, don't wake the task in xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the connection needs to be re-established since it preserves the task's status instead of setting it to the status of the aborting kernel_connect(). This may also avoid a potential conflict on the socket's lock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:06:56 -04:00
Chris Perl	0f7a622ca6	rpc: xs_bind - do not bind when requesting a random ephemeral port When attempting to establish a local ephemeral endpoint for a TCP or UDP socket, do not explicitly call bind, instead let it happen implicitly when the socket is first used. The main motivating factor for this change is when TCP runs out of unique ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of any TCP connection). In this situation if you explicitly call bind, then the call will fail with EADDRINUSE. However, if you allow the allocation of an ephemeral port to happen implicitly as part of connect (or other functions), then ephemeral ports can be reused, so long as the combination of (local_ip, local_port, remote_ip, remote_port) is unique for TCP sockets on the system. This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP sockets the same. This can allow mount.nfs(8) to continue to function successfully, even in the face of misbehaving applications which are creating a large number of TCP connections. Signed-off-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-10 12:47:00 -07:00
Daniel Walter	00cfaa943e	replace strict_strto calls Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter <dwalter@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-07-12 18:45:49 -04:00
Trond Myklebust	3601c4a91e	SUNRPC: Ensure that we handle ENOBUFS errors correctly. Currently, an ENOBUFS error will result in a fatal error for the RPC call. Normally, we will just want to wait and then retry. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-06-30 13:42:19 -04:00
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
Tom Herbert	0f8066bd48	sunrpc: Remove sk_no_check setting Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:28:53 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00

536 Commits