Commit "btrfs-progs: convert: prevent 32bit overflow for
cctx->total_bytes" added an assert to ensure that cctxx.total_bytes did
not overflow, but this ASSERT calls assert_trace, which expects a long
value.
By converting the u64 to long overflows in a 32bit machine, leading the
assert_trace to be triggered since cctx.total_bytes turns to zero.
Fix this problem by comparing the cctx.total_bytes with zero when
calling ASSERT.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When convert is called on a 64GiB ext4 fs, it fails like this:
$ btrfs-convert /dev/loop0p1
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
checksum: crc32c
creating ext2 image file
ERROR: missing data block for bytenr 1048576
ERROR: failed to create ext2_saved/image: -2
WARNING: an error occurred during conversion, filesystem is partially created but not finalized and not mountable
Btrfs-convert also corrupts the source fs:
$ LANG=C e2fsck /dev/loop0p1 -f
e2fsck 1.45.6 (20-Mar-2020)
Resize inode not valid. Recreate<y>? yes
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 3681 has zero dtime. Fix<y>? yes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 3744 was part of the orphaned inode list. FIXED.
Deleted inode 3745 has zero dtime. Fix<y>? yes
Inode 3747 has INLINE_DATA_FL flag on filesystem without inline data support.
Clear<y>? yes
...
[CAUSE]
After some debugging, the first strange behavior is, the value of
cctx->total_bytes is 0 in ext2_open_fs().
It turns out that, the value assign for cctx->total_bytes could lead to
bit overflow for the unsigned int value.
And that 0 cctx->total_bytes leads to various problems for later free
space calculation.
For example, in calculate_available_space(), we use cctx->total_bytes to
ensure we won't create a data chunk beyond device end:
cue_len = min(cctx->total_bytes - cur_off, cur_len);
If that cur_offset is also 0, we will create a cache_extent with 0 size,
which could cause a lot of problems for cache tree search.
[FIX]
Do manual casting for the multiply operation, so we could got a real u64
result. The fix will be applied to all supported fses (ext* and
reiserfs).
Reported-by: Christian Zangl <coralllama@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following script could lead to corrupted btrfs fs after
btrfs-convert:
fallocate -l 1G test.img
mkfs.ext4 test.img
mount test.img $mnt
fallocate -l 200m $mnt/file1
fallocate -l 200m $mnt/file2
fallocate -l 200m $mnt/file3
fallocate -l 200m $mnt/file4
fallocate -l 205m $mnt/file1
fallocate -l 205m $mnt/file2
fallocate -l 205m $mnt/file3
fallocate -l 205m $mnt/file4
umount $mnt
btrfs-convert test.img
The result btrfs will have a device extent beyond its boundary:
pening filesystem to check...
Checking filesystem on test.img
UUID: bbcd7399-fd5b-41a7-81ae-d48bc6935e43
[1/7] checking root items
[2/7] checking extents
ERROR: dev extent devid 1 physical offset 993198080 len 85786624 is beyond device boundary 1073741824
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 913960960 bytes used, error(s) found
total csum bytes: 891500
total tree bytes: 1064960
total fs tree bytes: 49152
total extent tree bytes: 16384
btree space waste bytes: 144885
file data blocks allocated: 2129063936
referenced 1772728320
[CAUSE]
Btrfs-convert first collect all used blocks in the original fs, then
slightly enlarge the used blocks range as new btrfs data chunks.
However the enlarge part has a problem, that it doesn't take the device
boundary into consideration.
Thus it caused device extents and data chunks to go beyond device
boundary.
[FIX]
Just to extra check before inserting data chunks into
btrfs_convert_context::data_chunk.
Reported-by: Jiachen YANG <farseerfc@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[WARNING]
When compiling btrfs-progs, there is one warning from convert ext2 code:
convert/source-ext2.c: In function 'ext2_open_fs':
convert/source-ext2.c:91:44: warning: pointer targets in passing argument 1 of 'strndup' differ in signedness [-Wpointer-sign]
91 | cctx->volume_name = strndup(ext2_fs->super->s_volume_name, 16);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
| |
| __u8 * {aka unsigned char *}
In file included from ./kerncompat.h:25,
from convert/source-ext2.c:19:
/usr/include/string.h:175:35: note: expected 'const char *' but argument is of type '__u8 *' {aka 'unsigned char *'}
175 | extern char *strndup (const char *__string, size_t __n)
| ~~~~~~~~~~~~^~~~~~~~
The toolchain involved is:
- GCC 10.1.0
- e2fsprogs 1.45.6
[CAUSE]
Obviously, in the offending e2fsprogs, the volume label is using u8,
which is unsigned char, not char.
/*078*/ __u8 s_volume_name[EXT2_LABEL_LEN]; /* volume name, no NUL? */
[FIX]
Just do a forced conversion to suppress the warning is enough.
I don't think we need to apply -Wnopointer-sign yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make the features structures more generic to allow mkfs-time and
mount-time sets to be defined.
This provides base for later mkfs support of mount-time features like
quotas.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On aarch64 with pagesize 64k, btrfs-convert of ext4 is successful,
but it won't mount because we don't yet support subpage blocksize, ie.
when page size and sectorsize don't match.
BTRFS error (device vda): sectorsize 4096 not supported yet, only support 65536
So in this case during convert provide a warning but let the conversion
proceed.
Example:
WARNING: Blocksize 4096 is not equal to the pagesize 65536,
converted filesystem won't mount on this system.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Convert is always set to true so there's no point in having it as a
function parameter or using it as a predicate inside
btrfs_alloc_data_chunk. Remove it and all relevant code which would
have never been executed. No semantics changes.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's always set to BTRFS_BLOCK_GROUP_DATA so sink it into the function.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding this table will make extending btrfs-progs with new checksum types
easier.
Also add accessor functions to access the table fields.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an option to mkfs to specify which checksum algorithm will be used
for the filesystem. Currently only crc32c is supported.
The option name is -c, presumably one of the comonly used options so it
gets the lowercase option.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Update the checksumming API to be able to cope with more checksum types
than just CRC32C. The finalization call is merged into btrfs_csum_data.
There are some fixme's and asserts added that need to be resolved.
Co-developed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
In preparation to supporting new checksum algorithm pass the checksum type
to btrfs_csum_data/btrfs_csum_final, this allows us to encapsulate any
differences in processing into the respective functions
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass pointer to a generic buffer instead of fixed size that crc32c
currently uses.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the checksum type to csum_tree_block_size(), __csum_tree_block_size()
and verify_tree_block_csum_silent().
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add checksum type to the definition structure for a new filesystem, this
will be used in following patches.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass in a btrfs_mkfs_config to write_temp_extent_buffer(), this is
needed so we can grab the checksum type for checksum buffer verification
in later patches.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Build several standalone tools into one binary and switch the function
by name (symlink or hardlink).
* btrfs
* mkfs.btrfs
* btrfs-image
* btrfs-convert
* btrfstune
The static target is also supported. The name of resulting boxed
binaries is btrfs.box and btrfs.box.static . All the binaries can be
built at the same time without prior configuration.
text data bss dec hex filename
822454 27000 19724 869178 d433a btrfs
927314 28816 20812 976942 ee82e btrfs.box
2067745 58004 44736 2170485 211e75 btrfs.static
2627198 61724 83800 2772722 2a4ef2 btrfs.box.static
File sizes:
857496 btrfs
968536 btrfs.box
2141400 btrfs.static
2704472 btrfs.box.static
Standalone utilities:
512504 btrfs-convert
495960 btrfs-image
471224 btrfstune
491864 mkfs.btrfs
1747720 btrfs-convert.static
1411416 btrfs-image.static
1304256 btrfstune.static
1361696 mkfs.btrfs.static
So the shared 900K binary saves ~2M, or ~5.7M for static build.
Signed-off-by: David Sterba <dsterba@suse.cz>
Create directory for all sources that can be used by anything that's not
rellated to a relevant kernel part, all common functions, helpers,
utilities that do not fit any other specific category.
The traditional location would be probably lib/ with all things that are
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
Although moderm hardware is fast enough and crc32c calculation is not a
hotspot, doing such optimization won't hurt anyway.
Issue: #175
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In convert we use trans->block_reserved >= 4096 as a threshold to commit
transaction, where block_reserved is the number of new tree blocks
allocated inside a transaction.
The problem is, we still have a hidden bug in delayed ref implementation
in btrfs-progs, when we have a large enough transaction, delayed ref may
failed to find certain tree blocks in extent tree and cause transaction
abort.
This fix will workaround it by committing transaction at a much lower
threshold.
The old 4096 means 4096 new tree blocks, when using default (16K)
nodesize, it's 64M, which can contain over 12k inlined data extent or
csum for around 60G, or over 800K file extents.
The new threshold will limit the size of new tree blocks to 2M, aligning
with the chunk preallocator threshold, and reducing the possibility to
hit that delayed ref bug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for a new metadata_uuid field. This is just a preparatory
commit which switches all users of the fsid field for metdata comparison
purposes to utilize the new field. This more or less mirrors the
kernel patch, additionally:
* Update 'btrfs inspect-internal dump-super' to account for the new
field. This involes introducing the 'metadata_uuid' line to the
output and updating the logic for comparing the fs uuid to the
dev_item uuid.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the argument was stored in another
variable.
The savings in object size are about 4500 bytes:
$ size btrfs.old btrfs.new
text data bss dec hex filename
805055 24248 19748 849051 cf49b btrfs.old
804527 24248 19748 848523 cf28b btrfs.new
Signed-off-by: David Sterba <dsterba@suse.com>
When convert failed, the error messsage would look like:
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
creating ext2 image file
ERROR: failed to create ext2_saved/image: -1
WARNING: an error occurred during conversion, filesystem is partially
created but not finalized and not mountable
We can only know something wrong happened during "ext2_saved/image" file
creation, but unable to know what exactly went wrong.
This patch will add the following error messages for create_image() and
its callee:
1) Sanity test error
2) Csum calculation error
3) Free ino number allocation error
4) Inode creation error
5) Inode mode change error
6) Inode link error
With all these error messages, we should be pretty easy to locate the
error without extra debugging.
Reported-by: Serhat Sevki Dincer <jfcgauss@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When pread64() returns value smaller than expected, it normally means
EIO, so just return -EIO to replace the intermediate number. So when IO
fails, we should be able to get more meaningful error number of than
EPERM.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We reuse the task_position enum and task_ctx struct of the original progress
indicator, adding more values and fields for our needs.
Then add hooks in all steps of the check to properly record progress.
Here's how the output looks like on a 22 Tb 5-disk RAID1 FS:
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-ST10000VN0004-XXXXXXXX
UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[1/7] checking extents (0:20:21 elapsed, 950958 items checked)
[2/7] checking root items (0:01:29 elapsed, 15121 items checked)
[3/7] checking free space cache (0:00:11 elapsed, 4928 items checked)
[4/7] checking fs roots (0:51:31 elapsed, 600892 items checked)
[5/7] checking csums (0:14:35 elapsed, 754522 items checked)
[6/7] checking root refs (0:00:00 elapsed, 232 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 5286458060800 bytes used, no error found
Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
Signed-off-by: David Sterba <dsterba@suse.com>
It's always set to extent_root and the function already takes a
transaction handle where fs_info could be referenced and in turn
the extent_tree.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 324d4c1857 (btrfs-progs: convert: Add larger device support)
introduced new dependencies on the 64-bit API provided by e2fsprogs.
That API was introduced in v1.42 (along with bigalloc).
This patch maps the following to their equivalents in e2fsprogs < 1.42.
- ext2fs_get_block_bitmap_range2
- ext2fs_inode_data_blocks2
- ext2fs_read_ext_attr2
Since we need to detect and define EXT2_FLAG_64BITS for compatibilty
anyway, it makes sense to use that to detect the older e2fsprogs instead
of defining a new flag ourselves.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The old flag OPEN_CTREE_FS_PARTIAL is in fact quite easy to be confused
with OPEN_CTREE_PARTIAL, which allow btrfs-progs to open damaged
filesystem (like corrupted extent/csum tree).
However OPEN_CTREE_FS_PARTIAL, unlike its name, is just allowing
btrfs-progs to open fs with temporary superblocks (which only has 6
basic trees on SINGLE meta/sys chunks).
The usage of FS_PARTIAL is really confusing here.
So rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER, and add
extra comment for its behavior.
Also rename BTRFS_MAGIC_PARTIAL to BTRFS_MAGIC_TEMPORARY to keep the
naming consistent.
And with above comment, the usage of FS_PARTIAL in dump-tree is
obviously incorrect, fix it.
Fixes: 8698a2b9ba ("btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of using the internal struct extent_io_tree, use struct fs_info.
This does not only unify the interface between kernel and btrfs-progs,
but also makes later btrfs_print_tree() use fewer parameters.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel code no longer has BTRFS_CRC32_SIZE and only uses
btrfs_csum_sizes[]. So, update the progs code as well.
Suggested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Bug]
On btrfs converted from ext*, one user reported the following kernel
warning:
------------[ cut here ]------------
BTRFS: Transaction aborted (error -95)
WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
...
Call Trace:
normal_work_helper+0x39/0x370 [btrfs]
process_one_work+0x1ce/0x410
worker_thread+0x2b/0x3d0
? process_one_work+0x410/0x410
kthread+0x113/0x130
? kthread_create_on_node+0x70/0x70
? do_syscall_64+0x74/0x190
? SyS_exit_group+0x10/0x10
ret_from_fork+0x35/0x40
---[ end trace c8ed62ff6a525901 ]---
BTRFS: error (device dm-2) in
btrfs_finish_ordered_io:3042: errno=-95 unknown
BTRFS info (device dm-2): forced readonly
BTRFS error (device dm-2): pending csums is 6447104
[Cause]
The call trace and the unique return value points to
__btrfs_drop_extents(), when we tries to drop pages of an inline extent,
we will trigger such -EOPNOTSUPP.
However kernel has limitation on the size of inline file extent
(sector size for ram size and sector size - 1 for on-disk size),
btrfs-convert doesn't have the same limitation, resulting much larger
file extent.
The lack of correct inline extent size check dates back to 2008 when
btrfs-convert is added into btrfs-progs.
[Fix]
Fix the inline extent creation condition, not only using
BTRFS_MAX_INLINE_DATA_SIZE(), which is only the maximum size of inline
data according to nodesize, but also limit it against sector size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In latest e2fsprogs (1.44.0) definition of ext2_ext_attr_entry has
removed member e_value_block, as currently ext* doesn't support it set
anyway.
So remove such check so that we can pass compile.
Issue: #110
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199071
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Exposed by convert-test with D=asan.
Unlike btrfs, ext2fs_close() still leaves its ext2_filsys parameter
filled with allocated pointers.
It needs ext2fs_free() to free those pointers.
Issue: #92
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Do a cleanup. Also make it consistent with kernel. Use fs_info instead
of root for BTRFS_MAX_INLINE_DATA_SIZE, since maybe in some situation we
do not know root, but just know fs_info.
Change macro to inline function to be consistent with kernel. And
change the function body to match kernel.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>