2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-15 09:03:59 +08:00
Mainline Linux tree for various devices, only for fun :)
Go to file
Chuck Lever 8d4fb8ff42 xprtrdma: Fix disconnect regression
I found that injecting disconnects with v4.18-rc resulted in
random failures of the multi-threaded git regression test.

The root cause appears to be that, after a reconnect, the
RPC/RDMA transport is waking pending RPCs before the transport has
posted enough Receive buffers to receive the Replies. If a Reply
arrives before enough Receive buffers are posted, the connection
is dropped. A few connection drops happen in quick succession as
the client and server struggle to regain credit synchronization.

This regression was introduced with commit 7c8d9e7c88 ("xprtrdma:
Move Receive posting to Receive handler"). The client is supposed to
post a single Receive when a connection is established because
it's not supposed to send more than one RPC Call before it gets
a fresh credit grant in the first RPC Reply [RFC 8166, Section
3.3.3].

Unfortunately there appears to be a longstanding bug in the Linux
client's credit accounting mechanism. On connect, it simply dumps
all pending RPC Calls onto the new connection. It's possible it has
done this ever since the RPC/RDMA transport was added to the kernel
ten years ago.

Servers have so far been tolerant of this bad behavior. Currently no
server implementation ever changes its credit grant over reconnects,
and servers always repost enough Receives before connections are
fully established.

The Linux client implementation used to post a Receive before each
of these Calls. This has covered up the flooding send behavior.

I could try to correct this old bug so that the client sends exactly
one RPC Call and waits for a Reply. Since we are so close to the
next merge window, I'm going to instead provide a simple patch to
post enough Receives before a reconnect completes (based on the
number of credits granted to the previous connection).

The spurious disconnects will be gone, but the client will still
send multiple RPC Calls immediately after a reconnect.

Addressing the latter problem will wait for a merge window because
a) I expect it to be a large change requiring lots of testing, and
b) obviously the Linux client has interoperated successfully since
day zero while still being broken.

Fixes: 7c8d9e7c88 ("xprtrdma: Move Receive posting to ... ")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-08-08 16:49:47 -04:00
arch A couple more MIPS fixes for 4.18: 2018-07-24 18:11:15 -07:00
block for-linus-20180713 2018-07-14 12:28:00 -07:00
certs certs/blacklist: fix const confusion 2018-06-26 09:43:03 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-07-19 07:32:44 -07:00
Documentation USB fixes for 4.18-rc7 2018-07-26 09:29:29 -07:00
drivers USB fixes for 4.18-rc7 2018-07-26 09:29:29 -07:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs NFSv4 client live hangs after live data migration recovery 2018-07-31 12:53:40 -04:00
include NFSv4 client live hangs after live data migration recovery 2018-07-31 12:53:40 -04:00
init Kbuild fixes for v4.18 2018-06-30 13:05:30 -07:00
ipc ipc: use new return type vm_fault_t 2018-06-15 07:55:25 +09:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-07-24 17:31:47 -07:00
lib Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-07-21 16:52:08 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
mm mm: make vm_area_alloc() initialize core fields 2018-07-21 15:24:03 -07:00
net xprtrdma: Fix disconnect regression 2018-08-08 16:49:47 -04:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-07-18 19:32:54 -07:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-07-18 19:32:54 -07:00
security selinux/stable-4.18 PR 20180629 2018-06-30 11:15:12 -07:00
sound ALSA: hda/realtek - Yet another Clevo P950 quirk entry 2018-07-18 12:17:46 +02:00
tools USB fixes for 4.18-rc7 2018-07-26 09:29:29 -07:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt Miscellaneous bugfixes, plus a small patchlet related to Spectre v2. 2018-07-18 11:08:44 -07:00
.clang-format clang-format: add configuration file 2018-04-11 10:28:35 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap Merge branch 'asoc-4.17' into asoc-4.18 for compress dependencies 2018-04-26 12:24:28 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS/CREDITS: Drop METAG ARCHITECTURE 2018-03-05 16:34:24 +00:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: add basic helper macros to scripts/Kconfig.include 2018-05-29 03:31:19 +09:00
MAINTAINERS MAINTAINERS: Peter has moved 2018-07-21 12:50:46 -07:00
Makefile Linux 4.18-rc6 2018-07-22 14:12:20 -07:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.