We pass the csum root from way high in the call chain in check down to
where we actually need it. However we can just get it from the fs_info
in these places, so clean up the functions to skip passing around the
csum root needlessly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use this pattern in a few places, and will use it more with different
roots in the future. Extract out this helper to read the root nodes.
There is a behavior change here in that we're now checking the root
levels, whereas before we were not.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Running with ASAN we won't pass the self tests because we leak the whole
fs_info with btrfs filesystem show. Fix this by making sure we close
out the fs_info and clean up all of the memory and such.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will lookup implied refs for every root id for every block that we
find when verifying qgroups, but we don't need to worry about non
fstrees, so skip them here.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is doing the same work as insert_block_group_item, rework it to
call the helper instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I screwed up a fix where we're setting the bytenr range as dirty when
marking all tree blocks used, I was looking at btrfs_pin_extent and put
->nodesize for end instead of the actual end, which is bytenr +
->nodesize - 1. Fix this up so it's correct.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that a corrupted key type (expected
UUID_KEY_SUBVOL, has EXTENT_ITEM) causing newer kernel to reject a
mount.
Although the root cause is not determined yet, with roll out of v5.11
kernel to various distros, such problem should be prevented by
tree-checker, no matter if it's hardware problem or not.
And older kernel with "-o uuid_rescan" mount option won't help, as
uuid_rescan will only delete items with
UUID_KEY_SUBVOL/UUID_KEY_RECEIVED_SUBVOL key types, not deleting such
corrupted key.
[FIX]
To fix such problem we have to rely on offline tool, thus there we
introduce a new rescue tool, clear-uuid-tree, to empty and then remove
uuid tree.
Kernel will re-generate the correct uuid tree at next mount.
Reported-by: S. <sb56637@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a bug in raid56_recov() which doesn't properly repair data and
P case corruption:
/* Data and P*/
if (dest2 == nr_devs - 1)
return raid6_recov_datap(nr_devs, stripe_len, dest1, data);
Note that, dest1/2 is to indicate which slot has corruption.
For RAID6 cases:
[0, nr_devs - 2) is for data stripes,
@data_devs - 2 is for P,
@data_devs - 1 is for Q.
For above code, the comment is correct, but the check condition is
wrong, and leads to the only project, btrfs-fuse, to report raid6
recovery error for 2 devices missing case.
Fix it by using correct condition.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current formula calculates the stripe size, however that's not what we
want in the case of RAID1/DUP profiles. In those cases since chunk are
mirrored across devices we want the full size of the chunk. Without this
patch the 'btrfs fi usage' output from an fs which is using RAID1 is:
Data,RAID1: Size:2.00GiB, Used:1.00GiB (50.03%)
/dev/vdc 1.00GiB
/dev/vdf 1.00GiB
Metadata,RAID1: Size:256.00MiB, Used:1.34MiB (0.52%)
/dev/vdc 128.00MiB
/dev/vdf 128.00MiB
System,RAID1: Size:8.00MiB, Used:16.00KiB (0.20%)
/dev/vdc 4.00MiB
/dev/vdf 4.00MiB
Unallocated:
/dev/vdc 8.87GiB
/dev/vdf 8.87GiB
So a 2 gigabyte RAID1 chunk actually will take up 4 gigabytes on the
actual disks 2 each. In this case this is being miscalculated as taking
up 1GiB on each device.
This also leads to erroneously calculated unallocated space. The correct
output in this case is:
Data,RAID1: Size:2.00GiB, Used:1.00GiB (50.03%)
/dev/vdc 2.00GiB
/dev/vdf 2.00GiB
Metadata,RAID1: Size:256.00MiB, Used:1.34MiB (0.52%)
/dev/vdc 256.00MiB
/dev/vdf 256.00MiB
System,RAID1: Size:8.00MiB, Used:16.00KiB (0.20%)
/dev/vdc 8.00MiB
/dev/vdf 8.00MiB
Unallocated:
/dev/vdc 7.74GiB
/dev/vdf 7.74GiB
Fix it by only utilising the chunk formula for profiles which are not
RAID1/DUP.
Issue: #422
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 80714610f3 ("btrfs-progs: use raid table for ncopies")
slightly broke how raid ratio are being calculated since the resulting
code would always reset ratio to be 1 in case we didn't have RAID56
profile. The correct behavior is to simply set it to 0 if we have RAID56
as the calculation is different in this case and leave it intact
otherwise.
This bug manifests by doing all size-related calculation for 'btrfs
filesystem usage' command as if all block groups are of type SINGLE. Fix
this by only resetting ratio 0 in case of RAID56.
Issue: #422
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
mkfs.btrfs v5.15 outputs a message even if the disk is a HDD without
TRIM/DISCARD support:
Performing full device TRIM /dev/sdc2 (326.03GiB) ...
[CAUSE]
mkfs.btrfs check TRIM/DISCARD support through the content of
queue/discard_granularity, but compare it against a wrong value.
When HDD without TRIM/DISCARD support, the content of
queue/discard_granularity is '0' '\n' '\0', rather than '0' '\0'.
[FIX]
- compare the value based on atoi() to provide more robustness
- delete unnecessary '\n' in pr_verbose()
Fixes: c50c448518 ("btrfs-progs: do sysfs detection of device discard capability")
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
My tool ntfs2btrfs has been creating btrfs volumes in a way that the
kernel doesn't like, but which isn't picked up by btrfs check - see
maharmstone/ntfs2btrfs#23 for the details, including a backtrace. This
patch adds a check for when a csum item contains too many entries -
effectively it ensures that there's always at least sizeof(struct
btrfs_item) bytes free in the tree, otherwise btrfs_del_csums can throw
an error.
max_entries is the value of the __MAX_CSUM_ITEMS macro in
fs/btrfs/file-item.c.
Pull-request: #401
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make it possible to select sphinx doc generator instead of asciidoc so
we don't have two makefiles for that. It's still a bit crude and does
not support installing the files.
The required package is python-Sphinx (or similar name), built by
'sphinx-build'.
Configure:
$ ASCIIDOC_TOOL=sphinx ./configure
...
doc generator: sphinx
...
Generate:
$ cd Documentation/
$ make man
$ make html
$ make info # yes we can have info pages too
There are several more targets provided by sphinx, run 'make' to list
them.
Signed-off-by: David Sterba <dsterba@suse.com>
All long command line options in the html target render as a single
character which is confusing and wrong as we expect that to be '--'.
Disable it.
Signed-off-by: David Sterba <dsterba@suse.com>
Cell spanning is not supported for manual page target, so add separate
columns for the redundancy numbers.
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel commit 22b6331d9617 ("btrfs: store precalculated
csum_size in fs_info"), we can cache csum_size and csum_type in
btrfs_fs_info.
Furthermore, there is already a 32 bits hole in btrfs_fs_info, and we
can fit csum_type and csum_size into the hole without increase the size
of btrfs_fs_info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a lot of call sites where we use the following code snippet:
u8 super_block_data[BTRFS_SUPER_INFO_SIZE];
struct btrfs_super_block *sb;
u64 ret;
sb = (struct btrfs_super_block *)super_block_data;
The reason for this is, structure btrfs_super_block was smaller than
BTRFS_SUPER_INFO_SIZE.
Thus for anything with csum involved, we have to use a proper 4K buffer.
Since the recent unification of sizeof(struct btrfs_super_block), we no
longer need such workaround, and can use struct btrfs_super_block
directly to do any operation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel change, pad struct btrfs_super_block to 4096 bytes. As
ctree.h is part of public headers, use raw number for the superblock
offset.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In commit a138daac17 ("btrfs-progs: mkfs: set super_cache_generation
to 0 if we're using free space tree") the space cache (v1) generation
was reset to 0 to let kernel know it's not used when the free-space-tree
is enabled.
This got broken again in 5.14 when the free space tree code got
refactored in 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree
at make_btrfs() time").
Reset the space cache generation to 0.
Issue: #414
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
We noticed 'btrfs check' outputs something like
leaf 30408704 flags 0x0(P1逅?) backref revision 1
but we expected:
leaf 30408704 flags 0x0() backref revision 1
[CAUSE]
Some XX_flags_to_str() failed to make sure the result string always ends
with '\0' in some case.
[FIX]
Reset the buffer at the beginnig.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Wang Yugui (wangyugui@e16-tech.com)
Signed-off-by: David Sterba <dsterba@suse.com>
For consistency with older versions switch the case of 'single' to be
lower case again even if it's inconsistent. This could be revisited in
the future.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str
to use raid table"), fstests/btrfs/023 and btrfs/151 will always fail.
The failure of btrfs/151 explains the reason pretty well:
btrfs/151 1s ... - output mismatch
--- tests/btrfs/151.out 2019-10-22 15:18:14.068965341 +0800
+++ ~/xfstests-dev/results//btrfs/151.out.bad 2021-11-02 17:13:43.879999994 +0800
@@ -1,2 +1,2 @@
QA output created by 151
-Data, RAID1
+Data, raid1
...
(Run 'diff -u ~/xfstests-dev/tests/btrfs/151.out ~/xfstests-dev/results//btrfs/151.out.bad' to see the entire diff)
[CAUSE]
Commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str to use
raid table") will use btrfs_raid_array[index].raid_name, which is all
lower case.
[FIX]
There is no need to bring such output format change.
So here we split the btrfs_raid_attr::raid_name[] into upper_name[] and
lower_name[], and make upper and lower case helpers for callers to use.
Now there are several types of callers referring to lower_name and
upper_name:
- parse_bg_profile()
It uses strcasecmp(), either case would be fine.
- btrfs_group_profile_str()
Originally it uses upper case for all profiles except "single".
Now unified to upper case.
- sprint_profiles()
It uses lower case.
- bg_flags_to_str()
It uses upper case.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling btrfs-progs on 32bit x86 using GCC 11.1.0, there are
several warnings:
In file included from ./common/utils.h:30,
from check/main.c:36:
check/main.c: In function 'run_next_block':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
check/main.c:6496:33: note: in expansion of macro 'error'
6496 | error(
| ^~~~~
In file included from ./common/utils.h:30,
from kernel-shared/volumes.c:32:
kernel-shared/volumes.c: In function 'btrfs_check_chunk_valid':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
kernel-shared/volumes.c:2052:17: note: in expansion of macro 'error'
2052 | error("invalid chunk item size, have %u expect [%zu, %lu)",
| ^~~~~
image/main.c: In function 'search_for_chunk_blocks':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
image/main.c:2122:33: note: in expansion of macro 'error'
2122 | error(
| ^~~~~
There are two types of problems:
- __BTRFS_LEAF_DATA_SIZE()
This macro has no type definition, making it behaves differently on
different arches.
Fix this by following kernel to use inline function to make its return
value fixed to u32.
- size_t related output
For x86_64 %lu is OK but not for x86.
Fix this by using %zu.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit ("btrfs-progs: switch btrfs_group_profile_str to use raid table")
introduced a regression that raid profile of GlobalReserve will be
printed as 'unknown'.
$ btrfs filesystem df /mnt/test
Data, single: total=5.02TiB, used=4.98TiB
System, single: total=4.00MiB, used=624.00KiB
Metadata, single: total=11.01GiB, used=6.94GiB
GlobalReserve, unknown: total=512.00MiB, used=0.00B
Fix it by:
- take BTRFS_BLOCK_GROUP_RESERVED into account when masking the block
group flags
- update the define of BTRFS_BLOCK_GROUP_RESERVED too so it's same as in
kernel
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that a read-only subvolume with a received_uuid set
emits the warning in command 'btrfs subvolume show', which is obviously
wrong.
The reason is that there are different types of root item flags,
depending on how we read them. The check in cmd_subvol_show uses the
ioctl GET_SUBVOL_INFO and the appropriate flag is raw
BTRFS_ROOT_SUBVOL_RDONLY (0x1), while there's another SUBVOL_GETFLAGS that
maps the flags and the raw value is different (BTRFS_SUBVOL_RDONLY, 0x2).
Due to this the warning was issued. Fix that by using the right flag
constant. The test has been extended to check for all combinations of
read-write and received_uuid.
Issue: #419
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When a special image (diverted from fsck/012) has its unused slots (slot
number >= nritems) with garbage, lowmem mode btrfs check can crash:
(gdb) run check --mode=lowmem ~/downloads/good.img.restored
Starting program: /home/adam/btrfs/btrfs-progs/btrfs check --mode=lowmem ~/downloads/good.img.restored
...
ERROR: root 5 INODE[5044031582654955520] nlink(257228800) not equal to inode_refs(0)
ERROR: root 5 INODE[5044031582654955520] nbytes 474624 not equal to extent_size 0
Program received signal SIGSEGV, Segmentation fault.
0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
1703 BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64);
(gdb) bt
#0 0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
#1 0x0000555555641544 in check_inode_item (root=0x5555556c2290, path=0x7fffffffd960) at check/mode-lowmem.c:2628
[CAUSE]
At check_inode_item() we have path->slot[0] at 29, while the tree block
only has 26 items.
This happens because two reasons:
- btrfs_next_item() never reverts its slots
Even if we failed to read next leaf.
- check_inode_item() doesn't inform the caller that a fatal error
happened
In check_inode_item(), if btrfs_next_item() failed, it goes to out
label, which doesn't really set @err properly.
This means, when check_inode_item() fails at btrfs_next_item(), it will
increase path->slots[0], while it's already beyond current tree block
nritems.
When the slot increases furthermore, and if the unused item slots have
some garbage, we will get invalid btrfs_item_ptr() result, and causing
above segfault.
[FIX]
Fix the problems by two ways:
- Make btrfs_next_item() to revert its path->slots[0] on failure
- Properly detect fatal error from check_inode_item()
By this, we will no longer crash on the crafted image.
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Issue: #412
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>