Go to file
Darrick J. Wong ea22fcd032 xfs: only run COW extent recovery when there are no live extents
[ Upstream commit 7993f1a431 ]

As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose.  Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:

mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done

When (2) happens, notice that (1) is still running.  xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).

When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata.  If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.

The results are catastrophic -- file (B) and the refcount btree are now
corrupt.  In the first patch, we fixed the race condition in (2) so that
(A) will always flush the COW fork.  In this second patch, we move the
_recover_cow call to the initial mount call in (0) for safety.

As mentioned previously, xfs_reflink_recover_cow walks the refcount
btree looking for COW staging extents, and frees them.  This was
intended to be run at mount time (when we know there are no live inodes)
to clean up any leftover staging events that may have been left behind
during an unclean shutdown.  As a time "optimization" for readonly
mounts, we deferred this to the ro->rw transition, not realizing that
any failure to clean all COW forks during a rw->ro transition would
result in catastrophic corruption.

Therefore, remove this optimization and only run the recovery routine
when we're guaranteed not to have any COW staging extents anywhere,
which means we always run this at mount time.  While we're at it, move
the callsite to xfs_log_mount_finish because any refcount btree
expansion (however unlikely given that we're removing records from the
right side of the index) must be fed by a per-AG reservation, which
doesn't exist in its current location.

Fixes: 174edb0e46 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:24:14 +02:00
arch sh: convert nommu io{re,un}map() to static inline functions 2022-07-21 21:24:14 +02:00
block block: fix rq-qos breakage from skipping rq_qos_done_bio() 2022-07-12 16:34:57 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:22:01 +02:00
crypto crypto: memneq - move into lib/ 2022-06-22 14:22:03 +02:00
Documentation dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo 2022-07-12 16:35:17 +02:00
drivers drm/panfrost: Fix shrinker list corruption by madvise IOCTL 2022-07-21 21:24:13 +02:00
fs xfs: only run COW extent recovery when there are no live extents 2022-07-21 21:24:14 +02:00
include cgroup: Use separate src/dst nodes when preloading css_sets for migration 2022-07-21 21:24:13 +02:00
init Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug 2022-06-09 10:23:26 +02:00
ipc ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() 2022-06-09 10:23:10 +02:00
kernel cgroup: Use separate src/dst nodes when preloading css_sets for migration 2022-07-21 21:24:13 +02:00
lib ida: don't use BUG_ON() for debugging 2022-07-12 16:35:18 +02:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: split huge PUD on wp_huge_pud fallback 2022-07-21 21:24:11 +02:00
net wifi: mac80211: fix queue selection for mesh/OCB interfaces 2022-07-21 21:24:13 +02:00
samples samples/landlock: Format with clang-format 2022-06-09 10:23:23 +02:00
scripts stddef: Introduce DECLARE_FLEX_ARRAY() helper 2022-07-12 16:35:03 +02:00
security Revert "evm: Fix memleak in init_desc" 2022-07-21 21:24:14 +02:00
sound ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop 2022-07-21 21:24:10 +02:00
tools selftests/net: fix section name when using xdp_dummy.o 2022-07-12 16:35:19 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:27:15 +01:00
virt KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref 2022-07-12 16:35:05 +02:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Input: goodix - add a goodix.h header file 2022-07-12 16:34:51 +02:00
Makefile Linux 5.15.55 2022-07-15 10:13:00 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.