Go to file
Bob Peterson 03678a99d1 gfs2: Ignore dlm recovery requests if gfs2 is withdrawn
When a node fails, user space informs dlm of the node failure,
and dlm instructs gfs2 on the surviving nodes to perform journal
recovery. It does this by calling various callback functions in
lock_dlm.c. To mark its progress, it keeps generation numbers
and recover bits in a dlm "control" lock lvb, which is seen by
all nodes to determine which journals need to be replayed.

The gfs2 on all nodes get the same recovery requests from dlm,
so they all try to do the recovery, but only one will be
granted the exclusive lock on the journal. The others fail
with a "Busy" message on their "try lock."

However, when a node is withdrawn, it cannot safely do any
recovery or replay any journals. To make matters worse,
gfs2 might withdraw as a result of attempting recovery. For
example, this might happen if the device goes offline, or if
an hba fails. But in today's gfs2 code, it doesn't check for
being withdrawn at any step in the recovery process. What's
worse is that these callbacks from dlm have no return code,
so there is no way to indicate failure back to dlm. We can
send a "Recovery failed" uevent eventually, but that tells
user space what happened, not dlm's kernel code.

Before this patch, lock_dlm would perform its recovery steps but
ignore the result, and eventually it would still update its
generation number in the lvb, despite the fact that it may have
withdrawn or encountered an error. The other nodes would then
see the newer generation number in the lvb and conclude that
they don't need to do recovery because the generation number
is newer than the last one they saw. They think a different
node has already recovered the journal.

This patch adds checks to several of the callbacks used by dlm
in its recovery state machine so that the functions are ignored
and skipped if an io error has occurred or if the file system
is withdrawn. That prevents the lvb bits from being updated, and
therefore dlm and user space still see the need for recovery to
take place.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10 07:39:50 -06:00
arch Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
block SCSI misc on 20200129 2020-01-29 18:16:16 -08:00
certs certs: Add wrapper function to check blacklisted binary hash 2019-11-12 12:25:50 +11:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-01-28 15:38:56 -08:00
Documentation Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
drivers Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
fs gfs2: Ignore dlm recovery requests if gfs2 is withdrawn 2020-02-10 07:39:50 -06:00
include New code for 5.6: 2020-01-31 12:58:12 -08:00
init init/main.c: fix misleading "This architecture does not have kernel memory protection" message 2020-01-31 10:30:41 -08:00
ipc treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kernel Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
lib kcov: ignore fault-inject and stacktrace 2020-01-31 10:30:41 -08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
net Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
samples drm pull for 5.6-rc1 2020-01-30 08:04:01 -08:00
scripts Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
security linux-kselftest-5.6-rc1-kunit 2020-01-29 15:25:34 -08:00
sound sound updates for 5.6-rc1 2020-01-28 16:26:57 -08:00
tools Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
usr gen_initramfs_list.sh: fix 'bad variable name' error 2020-01-04 00:00:48 +09:00
virt Merge branch 'cve-2019-3016' into kvm-next-5.6 2020-01-30 18:47:59 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore modpost: dump missing namespaces into a single modules.nsdeps file 2019-11-11 20:10:01 +09:00
.mailmap Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-01-28 15:38:56 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS open: introduce openat2(2) syscall 2020-01-18 09:19:18 -05:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS The main MIPS changes for 5.6: 2020-01-31 11:28:31 -08:00
Makefile Linux 5.5 2020-01-26 16:23:03 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.