linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 00:04:15 +08:00

Qu Wenruo 7a31507230 btrfs: raid56: do data csum verification during RMW cycle

[BUG]
For the following small script, btrfs will be unable to recover the
content of file1:

  mkfs.btrfs -f -m raid1 -d raid5 -b 1G $dev1 $dev2 $dev3

  mount $dev1 $mnt
  xfs_io -f -c "pwrite -S 0xff 0 64k" -c sync $mnt/file1
  md5sum $mnt/file1
  umount $mnt

  # Corrupt the above 64K data stripe.
  xfs_io -f -c "pwrite -S 0x00 323026944 64K" -c sync $dev3
  mount $dev1 $mnt

  # Write a new 64K, which should be in the other data stripe
  # And this is a sub-stripe write, which will cause RMW
  xfs_io -f -c "pwrite 0 64k" -c sync $mnt/file2
  md5sum $mnt/file1
  umount $mnt

The second md5sum of file1 would fail.

[CAUSE]
There is a long-standing problem with RAID56 (not limited to btrfs
RAID56): if we already have some corrupted on-disk data and then
trigger a sub-stripe write (which needs an RMW cycle), the write can
spread the damage into the P/Q stripe.

  Disk 1: data 1 |0x000000000000| <- Corrupted
  Disk 2: data 2 |0x000000000000|
  Disk 3: parity |0xffffffffffff|

In the above case, data 1 is already corrupted; the original data should
be 64KiB of 0xff.

At this stage, if we read data 1 and it has a data checksum, we can
still recover it via the regular RAID56 recovery path.

But if we now decide to write some data into data 2, we need to go
through RMW. Say we write 64KiB of 0x00 into data 2: we read the on-disk
data of data 1 and calculate the new parity, resulting in the following
layout:

  Disk 1: data 1 |0x000000000000| <- Corrupted
  Disk 2: data 2 |0x000000000000| <- New 0x00 writes
  Disk 3: parity |0x000000000000| <- New parity

Because the new parity is calculated from the corrupted data 1, we can
no longer recover the correct content of data 1, and the corruption
becomes permanent.

[FIX]
To solve the above problem, this patch does a full stripe data checksum
verification at RMW time.

This involves the following changes:

- Always read the full stripe (including data/P/Q) when doing RMW.
  Previously we only read the missing data sectors, but since we may
  now do data csum verification and recovery, we need to read
  everything.

  Note that if we have a cached rbio, we don't need to read anything
  and can treat it the same as a full stripe write, because only a
  stripe whose csums all match can be cached.

- Verify the data csum during read.
  This only covers the rbio stripe sectors, and only if the rbio
  already has csum_buf/csum_bitmap filled.

  Sectors that fail csum verification get their bit set in
  error_bitmap.

- Always call recover_sectors() after reading out all the sectors.
  Since error_bitmap is updated during read, recover_sectors() can
  easily find all the bad sectors and try to recover them (if still
  within tolerance).

  And since recover_sectors() already uses error_bitmap, it can skip
  vertical stripes which don't have any error.

- Verify the repaired sectors against their csums in recover_vertical().

- Rename rmw_read_and_wait() to rmw_read_wait_recover().
  Since we now always recover the sectors, the old name is no longer
  accurate.

  Furthermore, since recovery is already done in
  rmw_read_wait_recover(), we no longer need to call recover_sectors()
  inside rmw_rbio().

This has a performance impact, as we do more work during the RMW cycle:

- Fetch the data checksums
- Do checksum verification for all data stripes
- Do checksum verification again after repair

But full stripe writes and cached rbios do not incur this overhead at
all, so a fully optimized RAID56 workload (always full stripe writes)
should see no extra overhead.

To me, the extra overhead looks reasonable, as data consistency is way
more important than performance.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>		2022-12-05 18:00:57 +01:00
arch	powerpc fixes for 6.1 #6	2022-12-04 12:24:58 -08:00
block	block-6.1-2022-11-25	2022-11-25 17:50:57 -08:00
certs	certs: make system keyring depend on built-in x509 parser	2022-09-24 04:31:18 +09:00
crypto	treewide: use get_random_bytes() when possible	2022-10-11 17:42:58 -06:00
Documentation	A set of clk driver fixes that resolve issues for various SoCs. Most of	2022-11-30 15:46:46 -08:00
drivers	char: tpm: Protect tpm_pm_suspend with locks	2022-12-04 12:49:13 -08:00
fs	btrfs: raid56: do data csum verification during RMW cycle	2022-12-05 18:00:57 +01:00
include	btrfs: switch extent_io_tree::private_data to btrfs_inode and rename	2022-12-05 18:00:54 +01:00
init	init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash	2022-11-22 22:42:38 +09:00
io_uring	io_uring: clear TIF_NOTIFY_SIGNAL if set and task_work not available	2022-11-25 10:55:08 -07:00
ipc	ipc/shm: call underlying open/close vm_ops	2022-11-22 18:50:42 -08:00
kernel	- Fix a use-after-free case where the perf pending task callback would	2022-12-04 12:36:23 -08:00
lib	15 hotfixes. 11 marked cc:stable. Only three or four of the latter	2022-12-02 13:39:38 -08:00
LICENSES	LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers	2021-12-16 14:33:10 +01:00
mm	Revert "mm: align larger anonymous mappings on THP boundaries"	2022-12-04 12:51:59 -08:00
net	Including fixes from bpf, can and wifi.	2022-11-29 09:52:10 -08:00
rust	Kbuild: add Rust support	2022-09-28 09:02:20 +02:00
samples	VFIO updates for v6.1-rc1	2022-10-12 14:46:48 -07:00
scripts	- Handle different output of readelf on different distros running	2022-11-27 12:08:17 -08:00
security	lsm/stable-6.1 PR 20221031	2022-10-31 12:09:42 -07:00
sound	ASoC: Fixes for v6.1	2022-11-30 17:26:55 +01:00
tools	15 hotfixes. 11 marked cc:stable. Only three or four of the latter	2022-12-02 13:39:38 -08:00
usr	usr/gen_init_cpio.c: remove unnecessary -1 values from int file	2022-10-03 14:21:44 -07:00
virt	Merge branch 'kvm-dwmw2-fixes' into HEAD	2022-11-23 18:59:45 -05:00
.clang-format	PCI/DOE: Add DOE mailbox support functions	2022-07-19 15:38:04 -07:00
.cocciconfig
.get_maintainer.ignore	get_maintainer: add Alan to .get_maintainer.ignore	2022-08-20 15:17:44 -07:00
.gitattributes	.gitattributes: use 'dts' diff driver for dts files	2019-12-04 19:44:11 -08:00
.gitignore	Kbuild: add Rust support	2022-09-28 09:02:20 +02:00
.mailmap	Including fixes from bpf, can and wifi.	2022-11-29 09:52:10 -08:00
.rustfmt.toml	rust: add `.rustfmt.toml`	2022-09-28 09:02:20 +02:00
COPYING	COPYING: state that all contributions really are covered by this file	2020-02-10 13:32:20 -08:00
CREDITS	MAINTAINERS: Remove Michal Marek from Kbuild maintainers	2022-11-16 14:53:00 +09:00
Kbuild	Kbuild updates for v6.1	2022-10-10 12:00:45 -07:00
Kconfig	kbuild: ensure full rebuild when the compiler is updated	2020-05-12 13:28:33 +09:00
MAINTAINERS	Including fixes from bpf, can and wifi.	2022-11-29 09:52:10 -08:00
Makefile	Linux 6.1-rc8	2022-12-04 14:48:12 -08:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.