Commit Graph

211 Commits

Author SHA1 Message Date
Mark Harmstone
ec8a6b1536 btrfs-progs: mkfs: add ro flag to --subvol
Adds a flag to mkfs.btrfs --subvol to allow subvolumes to be created
readonly.

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
2024-09-17 17:00:03 +02:00
Qu Wenruo
18ecbfd3dd btrfs-progs: open the devices exclusively for writes
There is an internal report that, during btrfs-convert to block-group
tree, by accident some systemd events triggered the mount of the target
fs.

This leads to double mount (one by kernel and one by the btrfs-progs),
which seems to cause quite some problems.

To avoid such accidents, open all devices exclusively when btrfs-progs is
doing write operations.
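
A minimal sketch of the idea, assuming Linux semantics where opening a block device with O_EXCL (and without O_CREAT) fails with EBUSY if the device is already in use, for example mounted. The helper name device_open_flags() is hypothetical and is not taken from btrfs-progs:

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: pick open(2) flags for a device.
 * Writers request O_EXCL so that the open fails (EBUSY on Linux block
 * devices) when the kernel already holds the device, e.g. it is mounted.
 */
static int device_open_flags(bool for_write)
{
	if (for_write)
		return O_RDWR | O_EXCL;
	return O_RDONLY;
}
```

A caller would pass the result to open() and treat EBUSY as "the device is in use, refuse to write".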

Pull-request: #888
Reported-by: pandada8 <pandada8@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-09-17 14:33:22 +02:00
Qu Wenruo
bc0995297f btrfs-progs: convert: fix inline extent size for symlink
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that the symlink inode size is 4095, the maximum (PATH_MAX minus
the terminating NUL).
But nbytes is 4096, exactly matching the sector size of the btrfs.

This results in the creation of a regular extent, but btrfs does not
accept a symlink with a regular/preallocated extent, so the kernel
rejects the read and the readlink call fails.

The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.

I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fails the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).
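
The size rules above can be sketched as follows. This is only an illustration: symlink_target_fits() and its max_inline parameter are hypothetical, and the real inline limit depends on the node size and is computed by btrfs-progs, not by this code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* Linux value, used as a fallback for this sketch */
#endif

/*
 * Hypothetical check: can a symlink target be stored as an inline extent?
 * The stored length excludes the terminating NUL. max_inline is assumed
 * to be supplied by the caller (it depends on the node size).
 */
static bool symlink_target_fits(const char *target, size_t max_inline)
{
	size_t len = strlen(target);	/* no "+ 1" for the NUL */

	if (len > (size_t)PATH_MAX - 1)
		return false;
	return len <= max_inline;
}
```

With a limit of about 4000 bytes, a 4095-byte target is rejected even though it is a valid symlink on the source filesystem.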

Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-09-17 14:33:22 +02:00
Qu Wenruo
d3cf350e21 btrfs-progs: introduce btrfs_rebuild_uuid_tree() for mkfs and btrfs-convert
Currently mkfs uses its own create_uuid_tree(), but that function is
only handling FS_TREE.  This means for btrfs-convert we do not generate
the uuid tree, nor add the UUID of the image subvolume.  This can be a
problem if we're going to support multiple subvolumes during mkfs time.

To address this, introduce a new helper, btrfs_rebuild_uuid_tree():

- Create a new uuid tree if there is not one

- Remove all the existing items from uuid tree

- Iterate through all subvolumes
  * If the subvolume has no valid UUID, regenerate one
  * Add the uuid entry for the subvolume UUID
  * If the subvolume has received UUID, also add it to UUID tree

With this, the new helper can handle all the uuid tree generation needs for:

- Current mkfs
  Only one uuid entry for FS_TREE

- Current btrfs-convert
  Only FS_TREE and the image subvolume

- Future multi-subvolume mkfs
  As we do the scan for all subvolumes.

- Future "btrfs rescue rebuild-uuid-tree"

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 20:02:56 +02:00
Qu Wenruo
a927cb1b0a btrfs-progs: convert: fix the filename output when rolling back
[BUG]
When rolling back a converted btrfs, the filename output is corrupted:

  $ btrfs-convert -r  ~/test.img
  btrfs-convert from btrfs-progs v6.9.2

  Open filesystem for rollback:
    Label:
    UUID:            df54baf3-c91e-4956-96f9-99413a857576
    Restoring from:  ext2_saved0ƨy/image
                               ^^^ Corruption
  Rollback succeeded

[CAUSE]
The error is in how we handle the filename.  In btrfs, strings are not
'\0' terminated; they carry an explicit length.

But in C, most strings are '\0' terminated, so after reading a filename
from btrfs, we need to manually terminate the string.

However the code adding the terminating '\0' looks like this:

	/* Get the filename length. */
	name_len = btrfs_root_ref_name_len(path.nodes[0], root_ref_item);

	/*
	 * This should not happen, but as an extra handling for possible
	 * corrupted btrfs.
	 */
	if (name_len > sizeof(dir_name))
		name_len = sizeof(dir_name) - 1;
	/* Got the real filename into our buffer. */
 	read_extent_buffer(path.nodes[0], dir_name, (unsigned long)(root_ref_item + 1), name_len);

	/* Terminate the string. */
	dir_name[sizeof(dir_name) - 1] = 0;

The problem is that the final termination is wrong: it always makes the
last buffer char '\0' instead of using the @name_len we read before.

[FIX]
Use @name_len to terminate the string, as we have already updated it to
handle buffer overflow, it can handle both the regular and corrupted
case.
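
A stand-alone sketch of the fixed logic, not the actual patch: copy_name() is a hypothetical helper, memcpy() stands in for read_extent_buffer(), and the clamp uses ">=" so that the terminator always fits, which is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the fixed logic: copy a
 * length-prefixed (not NUL-terminated) name into buf and terminate it
 * right after the copied bytes, clamping over-long names.
 */
static size_t copy_name(char *buf, size_t buf_size,
			const char *name, size_t name_len)
{
	if (buf_size == 0)
		return 0;
	if (name_len >= buf_size)
		name_len = buf_size - 1;
	memcpy(buf, name, name_len);
	buf[name_len] = '\0';	/* terminate at name_len, not at the buffer end */
	return name_len;
}
```

Terminating at name_len means stale bytes between the name and the end of the buffer can no longer leak into the printed string.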

Fixes: dc29a5c51d ("btrfs-progs: convert: update default output")
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 19:59:07 +02:00
Yaroslav Halchenko
5b3bb3973a btrfs-progs: do interactive fixing of some ambiguous typos
Typos fixed manually using the following:

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Author: Yaroslav Halchenko <debian@onerussian.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-30 19:56:08 +02:00
Qu Wenruo
9ad15a7301 btrfs-progs: use btrfs_link_subvolume() to replace btrfs_mksubvol()
The function btrfs_mksubvol() is very different between btrfs-progs and
kernel, the former version is really just linking a subvolume to another
directory inode, but the kernel version is really to make a completely
new subvolume.

Instead of a same-named function, introduce btrfs_link_subvolume() and
use it to replace the old btrfs_mksubvol().

This is done by:

- Introduce btrfs_link_subvolume()
  Which does extra checks before doing any modification:
  * Make sure the target inode is a directory
  * Make sure no filename conflict

  Then do the linkage:
  * Add the dir_item/dir_index into the parent inode
  * Add the forward and backward root refs into tree root

- Introduce link_image_subvolume() helper
  Currently btrfs_mksubvol() has a dedicated convert filename retry
  behavior, which is unnecessary and should be done by the convert code.

  Now move the filename retry behavior into the helper.

- Remove btrfs_mksubvol()
  Since there is only one caller utilizing btrfs_mksubvol(), and it's
  now gone, we can remove the old btrfs_mksubvol().

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 19:54:50 +02:00
Qu Wenruo
3c555beabf btrfs-progs: introduce btrfs_make_subvolume()
There are two different subvolume/data reloc tree creation routines:

- create_subvol() from convert/main.c
  * calls btrfs_copy_root() to create an empty root
    This is not safe, as it relies on the source root to be empty.
  * calls btrfs_read_fs_root() to add it to the cache and trace it
    properly
  * calls btrfs_make_root_dir() to initialize the empty new root

- create_data_reloc_tree() from mkfs/main.c
  * calls btrfs_create_tree() to create an empty root
  * Manually add the root to fs_root cache
    This is only safe for data reloc tree as it's never updated
    inside btrfs-progs.
    But not safe for other subvolume trees.
  * manually setup the root dir

Both have their good and bad aspects, so here we introduce a new helper,
btrfs_make_subvolume():

- Calls btrfs_create_tree() to create an empty root
- Calls btrfs_read_fs_root() to setup the cache and tracking properly
- Calls btrfs_make_root_dir() to initialize the root dir
- Calls btrfs_update_root() to reflect the rootdir change

So this new helper can replace both create_subvol() and
create_data_reloc_tree().

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 19:54:04 +02:00
David Sterba
ed1c6b8c00 btrfs-progs: convert: copy whole xattr name buffer
Commit 4db925911c ("btrfs-progs: use strncpy_null everywhere")
replaced strncpy with strncpy_null.  The maximum xattr name length is 255
(current limit) and the target buffer is large enough for the whole size,
so make sure the last character is also copied.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-25 17:42:14 +02:00
David Sterba
4db925911c btrfs-progs: use strncpy_null everywhere
Use the safe version of strncpy that makes sure the string is
terminated.
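
For illustration only, a helper in this spirit could look like the sketch below. The name copy_terminated() is hypothetical, and the sketch is not claimed to match the strncpy_null implementation in btrfs-progs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper in the spirit of strncpy_null: copy at most
 * n - 1 bytes of src into dest and always terminate dest.
 */
static char *copy_terminated(char *dest, const char *src, size_t n)
{
	if (n == 0)
		return dest;
	strncpy(dest, src, n - 1);
	dest[n - 1] = '\0';
	return dest;
}
```

Here n is the full size of the destination buffer, so a source string of n or more bytes is truncated to n - 1 bytes plus the terminator.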

To be noted:

- the conversion in scrub path handling was skipped
- sizes of device paths in some ioctl related structures is
  BTRFS_DEVICE_PATH_NAME_MAX + 1

Recently gcc 13.3 started to detect problems with our use of strncpy
potentially lacking the null terminator, warnings like:

cmds/inspect.c: In function ‘cmd_inspect_logical_resolve’:
cmds/inspect.c:294:33: warning: ‘__builtin_strncpy’ specified bound 4096 equals destination size [-Wstringop-truncation]
  294 |                                 strncpy(mount_path, mounted, PATH_MAX);
      |                                 ^

Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:18:48 +02:00
David Sterba
e673aa22bc btrfs-progs: rename and move __strncpy_null to string-utils
Now that there's only __strncpy_null we can drop the underscore and move
it to string-utils as it's a generic string function rather than
something for paths.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:18:46 +02:00
David Sterba
eb7c5eb1bd btrfs-progs: copy entire label buffer to target buffers
The label is of a fixed size 256 bytes and expects the zero terminator.
Using __strncpy_null is correct as it makes sure there's always the zero
termination but the argument passed in skips the last character.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:18:43 +02:00
Qu Wenruo
c23e068aaf btrfs-progs: convert: rework file extent iteration to handle unwritten extents
[BUG]
There is a bug report that btrfs-convert cannot handle unwritten extents
(EXT2_EXTENT_FLAGS_UNINIT set, which is pretty much the same as
BTRFS_FILE_EXTENT_PREALLOC), which can cause the converted image to have
incorrect contents.

[CAUSE]
Currently we use ext2fs_block_iterate2() to go through all data extents
of an ext2 inode, but it doesn't provide the info on if the range is
unwritten or not.

Thus btrfs-convert treats unwritten extents as regular extents and reads
their contents from disk instead of setting the contents to zero.

[FIX]
Instead of the ext2fs_block_iterate2(), here we follow the debugfs'
"dump_extents" command, to use ext2fs_extent_*() helpers to go through
every data extent of the inode, that's if the inode supports the
EXT4_EXTENTS_FL flag.

Now we can properly get the info of which extents are unwritten, and use
holes to replace those unwritten extents.
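
The decision can be sketched with a toy model. Everything prefixed with toy_ or TOY_ is hypothetical; the real code walks the extents with the ext2fs_extent_*() helpers and uses the flag from the ext2fs headers.

```c
#include <stdint.h>

/* Illustrative flag bit; the real flag comes from the ext2fs headers. */
#define TOY_EXTENT_FLAGS_UNINIT 0x0002

enum toy_extent_kind {
	TOY_EXTENT_DATA,	/* copy the on-disk contents */
	TOY_EXTENT_HOLE,	/* reads as zeros, nothing to copy */
};

/* Decide how a source extent is represented after conversion. */
static enum toy_extent_kind toy_classify_extent(uint32_t flags)
{
	if (flags & TOY_EXTENT_FLAGS_UNINIT)
		return TOY_EXTENT_HOLE;
	return TOY_EXTENT_DATA;
}
```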

Reported-by: Yordan <y16267966@gmail.com>
Link: https://lore.kernel.org/all/d34c7d77a7f00c93bea6a4d6e83c7caf.mailbg@mail.bg/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-03 22:10:22 +02:00
David Sterba
bcb887a4de btrfs-progs: convert: handle rename of inode_includes() from e2fsprogs 1.47.1
There is a new release candidate for e2fsprogs https://github.com/tytso/e2fsprogs/releases/tag/v1.47.1-rc2

Linking btrfs-progs v6.8 against this version of e2fsprogs leads to the following compile error:

convert/source-ext2.c: In function 'ext4_copy_inode_timespec_extra':
convert/source-ext2.c:733:13: warning: implicit declaration of function 'inode_includes' [-Wimplicit-function-declaration]
  733 |         if (inode_includes(inode_size, i_ ## xtime ## _extra)) {                        \
      |             ^~~~~~~~~~~~~~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
  769 |         EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
      |         ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_atime_extra' undeclared (first use in this function)
  733 |         if (inode_includes(inode_size, i_ ## xtime ## _extra)) {                        \
      |                                        ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
  769 |         EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
      |         ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: note: each undeclared identifier is reported only once for each function it appears in
  733 |         if (inode_includes(inode_size, i_ ## xtime ## _extra)) {                        \
      |                                        ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
  769 |         EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
      |         ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_mtime_extra' undeclared (first use in this function)
  733 |         if (inode_includes(inode_size, i_ ## xtime ## _extra)) {                        \
      |                                        ^~
convert/source-ext2.c:770:9: note: in expansion of macro 'EXT4_COPY_XTIME'
  770 |         EXT4_COPY_XTIME(mtime, dst, tv_sec, tv_nsec);
      |         ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_ctime_extra' undeclared (first use in this function)
  733 |         if (inode_includes(inode_size, i_ ## xtime ## _extra)) {                        \
      |                                        ^~
convert/source-ext2.c:771:9: note: in expansion of macro 'EXT4_COPY_XTIME'
  771 |         EXT4_COPY_XTIME(ctime, dst, tv_sec, tv_nsec);
      |         ^~~~~~~~~~~~~~~
convert/source-ext2.c:774:40: error: 'i_crtime_extra' undeclared (first use in this function)
  774 |         if (inode_includes(inode_size, i_crtime_extra)) {
      |                                        ^~~~~~~~~~~~~~

from tytso/e2fsprogs@ca8bc92

    Fix inode_includes() macro to properly wrap "inode" parameter,
    and rename to ext2fs_inode_includes() to avoid potential name
    clashes. Use this to check inode field inclusion in debugfs
    instead of bare constants for inode field offsets.

To fix that use the new prefixed macro and add backward compatibility that
would still use inode_includes().
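
A sketch of how such a compatibility shim can work. The struct toy_inode_large and the local inode_includes() definition are stand-ins so the example is self-contained; in a real build both macros would come from the e2fsprogs headers, and the exact form of the shim in btrfs-progs may differ.

```c
#include <stddef.h>

/* Toy stand-in for the large ext2 inode; field names are illustrative only. */
struct toy_inode_large {
	unsigned int i_atime;
	unsigned int i_atime_extra;
};

/* Pretend the library header only provides the old, unprefixed macro. */
#define inode_includes(size, field)					\
	((size) >= offsetof(struct toy_inode_large, field) +		\
		   sizeof(((struct toy_inode_large *)0)->field))

/* Compatibility shim: define the new name only if the header lacks it. */
#ifndef ext2fs_inode_includes
#define ext2fs_inode_includes(size, field) inode_includes(size, field)
#endif
```

Callers then use only ext2fs_inode_includes(), which resolves to the library macro on new e2fsprogs and to the shim on older versions.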

Issue: #785
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-03 14:53:52 +02:00
David Sterba
7f396f5ced btrfs-progs: reorder key initializations
Use the objectid, type, offset natural order as it's more readable and
we're used to reading keys like that.
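
As an illustration, with a local stand-in for the key structure (struct toy_key and TOY_INODE_ITEM_KEY are hypothetical names for this sketch):

```c
#include <stdint.h>

/* Local stand-in for a btrfs key; the real definition lives in btrfs-progs. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

#define TOY_INODE_ITEM_KEY 1	/* illustrative value for this sketch */

/* Fill the fields in (objectid, type, offset) order, as keys are read. */
static struct toy_key make_inode_key(uint64_t ino)
{
	struct toy_key key = {
		.objectid = ino,
		.type = TOY_INODE_ITEM_KEY,
		.offset = 0,
	};

	return key;
}
```

The initializer then reads the same way as a key printed in a tree dump, for example (133081 INODE_ITEM 0).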

Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-30 21:49:15 +02:00
David Sterba
1601cd0d4f btrfs-progs: convert defined constants to enums
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-08 08:30:37 +01:00
Qu Wenruo
389c959d6d btrfs-progs: implement arg_strtou64_with_suffix() with a new helper
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has better error handling, following all the parse_*()
helpers to return non-zero value for errors.

This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.

Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.

For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that.  The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
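
The split between a parser that reports errors and a wrapper that exits can be sketched as follows. The toy_ functions are hypothetical and only mirror the structure described above; the accepted suffixes, return codes and error text are assumptions, not the btrfs-progs implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical parser: "<number>[kmgtpe]" -> bytes.
 * Returns 0 on success and a negative errno value on failure; it never
 * calls exit() itself.
 */
static int toy_parse_u64_with_suffix(const char *str, uint64_t *value)
{
	char *end;
	unsigned long long num;
	int shift = 0;

	if (!isdigit((unsigned char)str[0]))
		return -EINVAL;
	errno = 0;
	num = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	switch (tolower((unsigned char)*end)) {
	case 'e': shift = 60; break;
	case 'p': shift = 50; break;
	case 't': shift = 40; break;
	case 'g': shift = 30; break;
	case 'm': shift = 20; break;
	case 'k': shift = 10; break;
	case '\0': break;
	default: return -EINVAL;
	}
	if (*end != '\0' && end[1] != '\0')
		return -EINVAL;		/* trailing garbage after the suffix */
	if (shift && num > (UINT64_MAX >> shift))
		return -ERANGE;		/* would overflow 64 bits */

	*value = (uint64_t)num << shift;
	return 0;
}

/* Wrapper for callers that prefer to exit on bad input. */
static uint64_t toy_arg_strtou64_with_suffix(const char *str)
{
	uint64_t value;

	if (toy_parse_u64_with_suffix(str, &value) < 0) {
		fprintf(stderr, "ERROR: cannot parse size '%s'\n", str);
		exit(1);
	}
	return value;
}
```

Library-style code calls the parser and handles the return value, while option parsing in main() can use the wrapper and skip error handling.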

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-18 02:14:23 +01:00
Qu Wenruo
517ba2d9e5 btrfs-progs: convert: insert a dummy inode item before inode ref for ext2/4
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:

  Create btrfs metadata
  corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
  leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
  leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
  fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
  chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
          item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
                  index 171 namelen 51 name: [FILENAME1]
          item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
                  index 103 namelen 51 name: [FILENAME2]

[CAUSE]
When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
along with the INODE_REF of that inode.

This leads to the stray INODE_REFs above and triggers the tree-checker.

This can only happen for large fs, as for most cases we have all these
modified tree blocks cached, thus tree-checker won't be triggered.
But when the tree block cache is not hit, and we have to read from disk,
then such behavior can lead to above tree-checker error.

[FIX]
Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
be updated when iterating the child inode of the directory.

Issue: #731
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-17 01:47:39 +01:00
Qu Wenruo
136c58617c btrfs-progs: convert: make sure the length of data chunks are also stripe aligned
Although kernel scrub code has been updated to handle the unaligned
chunk length, there is also no harm if we can allocate data chunk with
both start and length aligned.

This patch handles this by rounding up the end bytenr when allocating
data chunks for the conversion.
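
A minimal sketch of the rounding, assuming a 64 KiB stripe length and a start that is already aligned. The names toy_round_up(), aligned_chunk_len() and TOY_STRIPE_LEN are hypothetical:

```c
#include <stdint.h>

#define TOY_STRIPE_LEN (64 * 1024ULL)	/* assumed 64 KiB stripe length */

/* Round value up to the next multiple of align (align must be non-zero). */
static uint64_t toy_round_up(uint64_t value, uint64_t align)
{
	return (value + align - 1) / align * align;
}

/* Length of a chunk covering [start, end) with the end rounded up. */
static uint64_t aligned_chunk_len(uint64_t start, uint64_t end)
{
	return toy_round_up(end, TOY_STRIPE_LEN) - start;
}
```

With an aligned start and a rounded-up end, the resulting length is a multiple of the stripe length as well.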

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-16 19:11:54 +01:00
Qu Wenruo
631ee66639 btrfs-progs: convert: for ext2, fix possible tree-checker error when converting a large fs
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:

  corrupt leaf: root=5 block=5001928998912 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
  ERROR: failed to copy ext2 inode 89911320: -5
  ERROR: error during copy_inodes -5
  WARNING: error during conversion, the original filesystem is not modified

[CAUSE]
Above error is triggered when checking the following items inside a
subvolume:

- inode ref
- dir item/index
- file extent
- xattr

This is to make sure these items have correct previous key.

However btrfs-convert is not following this requirement, it always
inserts those items first, then creates a btrfs_inode for it.

Thus it can lead to the error.

This can only happen for large fs, as for most cases we have all these
modified tree blocks cached, thus tree-checker won't be triggered.
But when the tree block cache is not hit, and we have to read from disk,
then such behavior can lead to above tree-checker error.

[FIX]
Make sure we insert the inode item first, then the file extents/dir
items/xattrs.  And after the file extents/dir items/xattrs inserted, we
update the existing inode (to update its size and bytes).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-12 16:36:36 +01:00
Qu Wenruo
146cca7e16 btrfs-progs: move clear-cache.[ch] from check/ to common/ directory
The clear-cache functionality is shared by several commands:

- btrfs check
  For --clear-cache and --clear-ino-cache.

- btrfstune
  Mostly for block-group-tree feature conversion.

- btrfs-convert
  To enable the now default v2 space cache.

Thus it's no longer proper to keep clear-cache.[ch] under check/
directory, move them to common/ directory.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-13 18:13:12 +02:00
David Sterba
21aa6777b2 btrfs-progs: clean up includes, using include-what-you-use
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik
3808db2b3e btrfs-progs: move btrfs_record_file_extent and code into a new file
This function and its related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel.  Move this function and the related functions into their own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik
8069b8b8cd btrfs-progs: drop btrfs_init_path
This simply zeroes out the path, and it is used everywhere we use a
stack path.  Drop this usage and simply initialize the paths to empty
instead of using a function to do the memset.
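
The two styles side by side, using a toy structure (struct toy_path and both helpers are hypothetical; the real path structure has different fields):

```c
#include <string.h>

/* Toy stand-in for a path structure; the real btrfs_path has other fields. */
struct toy_path {
	void *nodes[8];
	int slots[8];
};

/* Old style: a helper that only wraps memset(). */
static void toy_init_path(struct toy_path *p)
{
	memset(p, 0, sizeof(*p));
}

/* New style: zero-initialize the stack variable directly. */
static struct toy_path toy_empty_path(void)
{
	struct toy_path path = { 0 };

	return path;
}
```

Both leave every member zeroed; the initializer form just removes the separate call after the declaration.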

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Qu Wenruo
99ed2a64d8 btrfs-progs: convert: follow the default free space tree setting
[BUG]
We got some test failures related to btrfs-convert with subpage, e.g.
btrfs/012, the failure would cause the following dmesg:

  BTRFS warning (device nvme0n1p7): v1 space cache is not supported for page size 16384 with sectorsize 4096
  BTRFS error (device nvme0n1p7): open_ctree failed

[CAUSE]
v1 space cache has tons of hard coded PAGE_SIZE usage, and considering
v2 space cache is going to replace it (which is already the new default
since v5.15 btrfs-progs), thus for btrfs subpage support, we just simply
reject the v1 space cache, and utilize v2 space cache when possible.

But there is a special catch in btrfs-convert: although we're specifying
v2 space cache as the new default for btrfs-convert, it doesn't really
follow the specification at all.

Thus the converted filesystem will still go v1 space cache.

[FIX]
It can be a huge change to btrfs-convert to make the initial btrfs image
to support v2 cache.

Thus this patch would change the fs at the final stage, just before we
finalize the btrfs.

This patch would drop all the v1 cache created, then call
btrfs_create_free_space_tree() to populate the free space tree and
commit the superblock with needed compat_ro flags.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-01 01:42:03 +02:00
Qu Wenruo
08a3bd7694 btrfs-progs: tune: add the ability to generate new data checksums
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.

The new data checksum would use a different objectid (-13) to
distinguish it from the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.

After this stage, the resulting csum tree would look like this:

	item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
		range start 13631488 end 22020096 length 8388608
	item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
		range start 13631488 end 14680064 length 1048576

Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
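
The factor of 8 follows from the per-sector checksum sizes. A small sketch of the arithmetic, assuming a 4 KiB sector size (the commit message does not state it) and the standard digest sizes of 4 bytes for CRC32 and 32 bytes for SHA256; csum_item_size() is a hypothetical helper:

```c
#include <stdint.h>

/* Bytes of checksum needed for len bytes of data, one csum per sector. */
static uint64_t csum_item_size(uint64_t len, uint32_t sectorsize,
			       uint32_t csum_size)
{
	return len / sectorsize * csum_size;
}
```

For 1048576 bytes of data this gives 256 sectors, so 256 * 4 = 1024 bytes with CRC32 and 256 * 32 = 8192 bytes with SHA256, matching the two item sizes in the dump.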

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
416507904c btrfs-progs: convert: fix bad csum for migrated range.
[BUG]
There is a report that btrfs-convert leads to bad csum for the image
file.

The reproducer looks like this:
(note the 64K block size, it's used to force a certain chunk layout)

  # touch test.img
  # truncate -s 10G test.img
  # mkfs.ext4 -b 64K test.img
  # btrfs-convert -N 64K test.img
  # btrfs check --check-data-csum test.img
  Opening filesystem to check...
  Checking filesystem on /home/adam/test.img
  UUID: 39d49537-a9f5-47f1-b6ab-7857707b9133
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space cache
  [4/7] checking fs roots
  [5/7] checking csums against data
  mirror 1 bytenr 4563140608 csum 0x3f1fa0ef expected csum 0xa4c4c072
  mirror 1 bytenr 4563206144 csum 0x55dcf0d3 expected csum 0xa4c4c072
  mirror 1 bytenr 4563271680 csum 0x4491b00a expected csum 0xa4c4c072
  mirror 1 bytenr 4563337216 csum 0x655d1f61 expected csum 0xa4c4c072
  mirror 1 bytenr 4563402752 csum 0xd37114d3 expected csum 0xa4c4c072
  mirror 1 bytenr 4563468288 csum 0x4c2dab30 expected csum 0xa4c4c072
  mirror 1 bytenr 4563533824 csum 0xa80fceed expected csum 0xa4c4c072
  mirror 1 bytenr 4563599360 csum 0xaf610db8 expected csum 0xa4c4c072
  mirror 1 bytenr 4563795968 csum 0x67b3c8a0 expected csum 0xa4c4c072
  ERROR: errors found in csum tree
  [6/7] checking root refs
  ...

[CAUSE]
Above initial failure is for logical bytenr of 4563140608, which is
inside the relocated range of the image file offset [0, 1M).

During convert, we migrate the original image file ranges which would
later be covered by super and other reserved ranges.

The migration happens as:

- Read out the original data
- Reserve a new file extent
- Write the data back to the file extent
  Note that, the new file extent can be inside some new data chunks,
  thus it's no longer 1:1 mapped.
- Generate the new csum for the new file extent

The problem happens at the last stage. We should read out the data from
the new file extent, but we call read_disk_extent() using the logical
bytenr, however read_disk_extent() is not doing logical -> physical
mapping.

Thus we read some garbage instead of the newly written data and use
that garbage to generate the csum, which causes the above problem.

[FIX]
Instead of read_disk_extent(), call read_data_from_disk(), which would
do the proper logical -> physical mapping, thus would fix the bug.
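
To illustrate why the mapping step matters, here is a toy logical -> physical lookup. struct toy_chunk and toy_map_logical() are hypothetical and far simpler than the chunk mapping in btrfs-progs (single device, no striping or mirrors):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy chunk: maps [logical, logical + length) to a physical offset. */
struct toy_chunk {
	uint64_t logical;
	uint64_t physical;
	uint64_t length;
};

/* Translate a logical bytenr; returns false if no chunk covers it. */
static bool toy_map_logical(const struct toy_chunk *chunks, size_t nr,
			    uint64_t logical, uint64_t *physical)
{
	for (size_t i = 0; i < nr; i++) {
		const struct toy_chunk *c = &chunks[i];

		if (logical >= c->logical && logical - c->logical < c->length) {
			*physical = c->physical + (logical - c->logical);
			return true;
		}
	}
	return false;
}
```

Reading at the logical bytenr directly is only correct for chunks where logical equals physical; once a range is relocated into a new data chunk, the translated offset has to be used.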

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:31 +02:00
Qu Wenruo
46364d3766 btrfs-progs: replace write_and_map_eb() by write_data_to_disk()
The function write_and_map_eb() is quite abused as a way to write any
generic buffer back to disk.

But we have a more suitable function already, write_data_to_disk().

This patch would remove the abused write_and_map_eb() calls, and
convert the only three valid call sites to write_data_to_disk() instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:31 +02:00
Josef Bacik
f8efe9f724 btrfs-progs: sync file-item.h into progs
This patch syncs file-item.h into btrfs-progs.  This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.

I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.

This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik
a754fe29d9 btrfs-progs: sync uapi/btrfs.h into btrfs-progs
We want to keep this file locally as we want to be uptodate with
upstream, so we can build btrfs-progs regardless of which kernel is
currently installed.  Sync this with the upstream version and put it in
kernel-shared/uapi to maintain some semblance of where this file comes
from.

There are some changes that need to be synced back to kernel. A local
definition of static_assert is used to avoid compilation problems on gcc
(< 9) due to mandatory 2nd parameter.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Josef Bacik
bf0f3db765 btrfs-progs: introduce UASSERT() for purely userspace code
While syncing messages.[ch] I had to back out the ASSERT() code in
kerncompat.h, which means we now rely on the kernel code for ASSERT().
In order to maintain some semblance of separation introduce UASSERT()
and use that in all the purely userspace code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Qu Wenruo
f3353a5e85 btrfs-progs: convert: handle ext4 orphan file feature properly
[BUG]
Since e2fsprogs 1.47, even with a newly created empty ext4 filesystem,
btrfs-convert results in an fs that btrfs-check complains about:

  # mkfs.ext4 -F test.img
  # btrfs-convert test.img
  # btrfs-check test.img
  Opening filesystem to check...
  Checking filesystem on test.img
  UUID: e45da158-8967-4e4d-9c9f-66b0d127dbce
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space cache
  [4/7] checking fs roots
  root 5 inode 266 errors 2000, link count wrong
  ERROR: errors found in fs roots
  found 26333184 bytes used, error(s) found <<<
  total csum bytes: 25540
  total tree bytes: 180224
  total fs tree bytes: 49152
  total extent tree bytes: 16384
  btree space waste bytes: 145423
  file data blocks allocated: 33947648
   referenced 26284032

[CAUSE]
Ext4 has a new compat feature, COMPAT_ORPHAN_FILE, as a better way to
track all the orphan inodes.

This new feature would create a new special inode for this purpose, and
such orphan file inode would not be reachable from any other inode, but
only from super block.

Unfortunately btrfs-convert only skips the known ext2 special inodes, not
the newer one.

[FIX]
According to the kernel document, we can locate the orphan file inode
using ext2 super block s_orphan_file_inum, and skip it for
btrfs-convert.

And such skip would only happen if we have the definition of
EXT4_FEATURE_COMPAT_ORPHAN_FILE, to be compatible with older e2fsprogs.
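
A toy sketch of the conditional skip. The structure, field names and TOY_FEATURE_COMPAT_ORPHAN_FILE are stand-ins for this example; real code would use the ext2 super block from the e2fsprogs headers and guard on EXT4_FEATURE_COMPAT_ORPHAN_FILE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy superblock carrying only the fields this sketch needs. */
struct toy_super {
	uint32_t feature_compat;
	uint32_t orphan_file_inum;
};

/* Stand-in for the feature bit; a build against an older library would lack it. */
#define TOY_FEATURE_COMPAT_ORPHAN_FILE 0x1000

/* Should the converter skip this inode number? */
static bool toy_is_orphan_file(const struct toy_super *sb, uint32_t ino)
{
#ifdef TOY_FEATURE_COMPAT_ORPHAN_FILE
	if ((sb->feature_compat & TOY_FEATURE_COMPAT_ORPHAN_FILE) &&
	    ino == sb->orphan_file_inum)
		return true;
#else
	(void)sb;
	(void)ino;
#endif
	return false;
}
```

When the feature macro is not defined at build time, the check compiles down to "never skip", which keeps the code building against older headers.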

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-23 19:48:26 +01:00
David Sterba
c811ccc6c9 btrfs-progs: convert: convert help text to option formatter
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:23 +01:00
David Sterba
24ec095295 btrfs-progs: crypto: add common function for accelerated initialization
Prepare a single location that will detect or set accelerated versions
of hash algorithms. Right now blake2 and sha256 do an if-else switch
while crc32c sets a function pointer.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 19:49:31 +01:00
Qu Wenruo
d8f1bd519f btrfs-progs: mkfs: fix a stack over-flow when features string are too long
[BUG]
Even with the chunk_objectid bug fixed, mkfs.btrfs can still cause a
stack overflow when enabling the extent-tree-v2 feature (needs
experimental features enabled):

  # ./mkfs.btrfs  -f -O extent-tree-v2 ~/test.img
  btrfs-progs v5.19.1
  See http://btrfs.wiki.kernel.org for more information.

  ERROR: superblock magic doesn't match
  NOTE: several default settings have changed in version 5.15, please make sure
        this does not affect your deployments:
        - DUP for metadata (-m dup)
        - enabled no-holes (-O no-holes)
        - enabled free-space-tree (-R free-space-tree)

  Label:              (null)
  UUID:               205c61e7-f58e-4e8f-9dc2-38724f5c554b
  Node size:          16384
  Sector size:        4096
  Filesystem size:    512.00MiB
  Block group profiles:
    Data:             single            8.00MiB
    Metadata:         DUP              32.00MiB
    System:           DUP               8.00MiB
  SSD detected:       no
  Zoned device:       no
  =================================================================
  [... Skip full ASAN output ...]
  ==65655==ABORTING

[CAUSE]
For experimental build, we have unified feature output, but the old
buffer size is only 64 bytes, which is too small to cover the new full
feature string:

  extref, skinny-metadata, no-holes, free-space-tree, block-group-tree, extent-tree-v2

The above feature string is already 84 bytes, more than the 64-byte
on-stack buffer.

The ASAN output confirms this:

  ==65655==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc4e03b1d0 at pc 0x7ff0fc05fafe bp 0x7ffc4e03ac60 sp 0x7ffc4e03a408
  WRITE of size 17 at 0x7ffc4e03b1d0 thread T0
      #0 0x7ff0fc05fafd in __interceptor_strcat /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:377
      #1 0x55cdb7b06ca5 in parse_features_to_string common/fsfeatures.c:316
      #2 0x55cdb7b06ce1 in btrfs_parse_fs_features_to_string common/fsfeatures.c:324
      #3 0x55cdb7a37226 in main mkfs/main.c:1783
      #4 0x7ff0fbe3c28f  (/usr/lib/libc.so.6+0x2328f)
      #5 0x7ff0fbe3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
      #6 0x55cdb7a2cb34 in _start ../sysdeps/x86_64/start.S:115

[FIX]
Introduce a new macro, BTRFS_FEATURE_STRING_BUF_SIZE, along with a new
sanity check helper, btrfs_assert_feature_buf_size().

The problem is that I cannot find a build-time method to verify that
BTRFS_FEATURE_STRING_BUF_SIZE is large enough to contain all feature
names, so a runtime function has to verify the macro size with a
BUG_ON().

Now the minimal buffer size for experimental builds is 138 bytes, so
just bump it to 160 for future expansion.

And if further features go beyond that number, mkfs.btrfs/btrfs-convert
will immediately crash at that BUG_ON(), so we can definitely detect it.
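
The runtime check described above can be sketched roughly as follows.
This is only an illustration, not the btrfs-progs code: FEATURE_BUF_SIZE,
the feature_names table and all helper names are hypothetical stand-ins
for BTRFS_FEATURE_STRING_BUF_SIZE and btrfs_assert_feature_buf_size(),
and the table only holds the six names quoted in the [CAUSE] section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BTRFS_FEATURE_STRING_BUF_SIZE. */
#define FEATURE_BUF_SIZE 160

/* Illustrative subset: the feature names quoted in the [CAUSE] section. */
static const char *const feature_names[] = {
	"extref", "skinny-metadata", "no-holes",
	"free-space-tree", "block-group-tree", "extent-tree-v2",
};
#define NR_FEATURES (sizeof(feature_names) / sizeof(feature_names[0]))

/* Bytes needed to join all names with ", ", including the trailing NUL. */
static size_t feature_buf_needed(void)
{
	size_t total = 1;	/* trailing NUL */

	for (size_t i = 0; i < NR_FEATURES; i++) {
		if (i)
			total += 2;	/* ", " separator */
		total += strlen(feature_names[i]);
	}
	return total;
}

/* Runtime sanity check: abort if the fixed buffer size is too small. */
static void assert_feature_buf_size(void)
{
	assert(feature_buf_needed() <= FEATURE_BUF_SIZE);
}

/* Bounded join: returns 0 on success, -1 if buf cannot hold the result. */
static int features_to_string(char *buf, size_t size)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < NR_FEATURES; i++) {
		const char *sep = i ? ", " : "";
		size_t need = strlen(sep) + strlen(feature_names[i]);

		if (used + need + 1 > size)
			return -1;
		strcpy(buf + used, sep);
		strcpy(buf + used + strlen(sep), feature_names[i]);
		used += need;
	}
	return 0;
}
```

With these six names the joined string needs 84 bytes plus the NUL, so
a 64-byte buffer is rejected by features_to_string() while a
FEATURE_BUF_SIZE one is accepted.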

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:12 +02:00
Qu Wenruo
bed70b939f btrfs-progs: fsfeatures: properly merge -O and -R options
[BUG]
Commit "btrfs-progs: prepare merging compat feature lists" tries to
merge the "-O" and "-R" options, as they don't correctly represent
btrfs features.

But that commit caused the following bug during mkfs for experimental
build:

  $ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
  btrfs-progs v5.19.1
  See http://btrfs.wiki.kernel.org for more information.

  ERROR: superblock magic doesn't match
  ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)

[CAUSE]
Currently btrfs_parse_fs_features() will return a u64, and reuse the
same u64 for both incompat and compat RO flags for experimental branch.

This can easily lead to conflicts, as
BTRFS_FEATURE_INCOMPAT_MIXED_BLOCK_GROUP and
BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE both share the same bit
(1 << 2).

Thus for the above case, mkfs.btrfs believes it has set the
MIXED_BLOCK_GROUP feature, but what we really want is BLOCK_GROUP_TREE.

[FIX]
Instead of incorrectly re-using the same bits in btrfs_feature, split
the old flags into 3 flags:

- incompat_flag
- compat_ro_flag
- runtime_flag

The first two flags are easy to understand: each holds the
corresponding on-disk flag of a feature.
The last one, runtime_flag, covers features which don't have any
on-disk flag set, like QUOTA and LIST_ALL.

And since we're no longer using a single u64 as features, we have to
introduce a new structure, btrfs_mkfs_features, to contain above 3
flags.
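
The bit conflict and the split can be illustrated with a small sketch.
Everything here is hypothetical and only mirrors the description above:
the DEMO_ macros reuse the (1 << 2) value given in the [CAUSE] section,
and struct demo_mkfs_features stands in for btrfs_mkfs_features with the
three fields listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only: both flags sit on bit 2, as stated above. */
#define DEMO_INCOMPAT_MIXED_BLOCK_GROUP	(1ULL << 2)
#define DEMO_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 2)

/* Hypothetical stand-in for btrfs_mkfs_features. */
struct demo_mkfs_features {
	uint64_t incompat_flag;
	uint64_t compat_ro_flag;
	uint64_t runtime_flag;
};

/* Old scheme: one u64 carries both namespaces, so the bits alias. */
static bool merged_wants_mixed_bg(uint64_t merged)
{
	return (merged & DEMO_INCOMPAT_MIXED_BLOCK_GROUP) != 0;
}

/* New scheme: each namespace is checked in its own field. */
static bool split_wants_mixed_bg(const struct demo_mkfs_features *f)
{
	return (f->incompat_flag & DEMO_INCOMPAT_MIXED_BLOCK_GROUP) != 0;
}

static bool split_wants_bg_tree(const struct demo_mkfs_features *f)
{
	return (f->compat_ro_flag & DEMO_COMPAT_RO_BLOCK_GROUP_TREE) != 0;
}
```

Setting only the block group tree bit in a merged u64 makes
merged_wants_mixed_bg() return true, which matches the nodesize error
shown in the [BUG] section. With the split structure the same request
only touches compat_ro_flag.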

This also means that things like the default mkfs features must be
converted to use the new structure, thus those old macros are all
converted to static const structures:

- BTRFS_MKFS_DEFAULT_FEATURES + BTRFS_MKFS_DEFAULT_RUNTIME_FEATURES
  -> btrfs_mkfs_default_features

- BTRFS_CONVERT_ALLOWED_FEATURES -> btrfs_convert_allowed_features

And since we're using a structure, it's no longer as easy to implement
a disallowed mask.

Thus functions with @mask_disallowed are all changed to use
an @allowed structure pointer (which can be NULL).

Finally if we have experimental features enabled, all features can be
specified by -O options, and we can output a unified feature list,
instead of the old split ones.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:11 +02:00
David Sterba
2b909aa5db btrfs-progs: convert: do proper error handling
There are cases where the BUG_ON should be replaced by error
handling, as it validates data from the source filesystem or the
possibility to convert. The unconverted cases are asserts and will be
replaced later.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
a827bb2db8 btrfs-progs: use template for transaction commit error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
8fcafae04a btrfs-progs: use template for transaction start error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
b73a29936a btrfs-progs: remove unnecessary casts for u64
The (unsigned long long) type casts can be dropped, printf understands
%llu and u64 and does not warn. In cases where the type is not u64 keep
the cast.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
David Sterba
2267708bfe btrfs-progs: move repair.c from common/ to check/
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
Thomas Hebb
86bd027377 btrfs-progs: convert: simplify create_image_file_range()
The logic at the beginning of this function to handle reserved ranges
was pretty complex and hard to follow. By refactoring it to use the
existing intersect_with_reserved() function, we can remove most of the
comparisons and boolean operators while preserving the exact same logic.

This change is only for readability. It does not change the logic itself
at all.

Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Thomas Hebb
9509b85449 btrfs-progs: convert: expose intersect_with_reserved() to main.c
We currently open code a similar operation in create_image_file_range().
By exposing intersect_with_reserved() outside of source-fs.c and
slightly changing its semantics to return the entire range instead of
just the end address, we can reuse it in create_image_file_range().

Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Thomas Hebb
cfae74f6bb btrfs-progs: convert: fix buggy logic in create_image_file_range()
When checking if the requested range starts in a valid region but later
hits a reserved range, we require the reserved range to end before the
requested one does.

This is incorrect. Since we're going to truncate the requested range
anyway, we want this check to pass even if the requested range ends
partway through a reserved range.

Fix the issue by checking against the reserved range's start address
instead of its end.

Luckily, I don't believe this bug makes a difference in the current code
path, since the range we pass to this function never ends before the end
of the filesystem.

Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Thomas Hebb
407a8721b6 btrfs-progs: convert: fix off-by-one error in overlap test
intersect_with_reserved() currently succeeds if (bytenr + num_bytes) is
greater than or equal to the first address in the range, assuming that
bytenr is also not past the end of the range.

This is wrong. (bytenr + num_bytes) is one byte past the last address in
the range we're checking, meaning that our range only overlaps the
reserved range if it's strictly greater than the reserved range's start
address.

For example, imagine a range at 0x3000 with length 0x1000 that we're
checking against a reserved range that starts at 0x4000. The addresses
in our range are 0x3000-0x3fff: it doesn't overlap. But the current
check, (0x3000 + 0x1000 >= 0x4000), will erroneously pass.

Fix the issue by changing >= to >.
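
The difference between the two comparisons can be sketched as below.
This is not the intersect_with_reserved() code itself: struct demo_range
and both helpers are hypothetical, and ranges are treated as half-open
intervals [start, start + len).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [start, start + len); hypothetical type for this sketch. */
struct demo_range {
	uint64_t start;
	uint64_t len;
};

/* Old check: ">=" also matches a range that merely touches the reserved one. */
static bool overlaps_buggy(struct demo_range r, struct demo_range reserved)
{
	return r.start < reserved.start + reserved.len &&
	       r.start + r.len >= reserved.start;
}

/* Fixed check: the end is exclusive, so it must be strictly greater. */
static bool overlaps_fixed(struct demo_range r, struct demo_range reserved)
{
	return r.start < reserved.start + reserved.len &&
	       r.start + r.len > reserved.start;
}
```

For the example above (0x3000 with length 0x1000 against a reserved
range starting at 0x4000), overlaps_buggy() reports an overlap while
overlaps_fixed() does not. The length of the reserved range is not
given above, so any non-zero value can be used when trying this out.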

Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Thomas Hebb
a5f6622d32 btrfs-progs: convert: make comment formatting consistent
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Thomas Hebb
2f43f2dc7b btrfs-progs: convert: move simple_range into common.h
This is currently defined in source-fs.h, but main.c uses it far more
than source-fs.c does. Put it in common.h instead, since it's a useful
standalone type.

Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
David Sterba
b610fc1612 btrfs-progs: convert: use message helpers for error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:07 +02:00
David Sterba
ec55de0ac5 btrfs-progs: convert: update include lists
The tool IWYU (include what you use) suggests removing and adding some
includes.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:06:12 +02:00
David Sterba
be6043ecd2 btrfs-progs: convert: reorder includes
The preferred order:
- system headers
- standard headers
- libraries
- kernel library
- kernel shared
- common headers
- other tools
- own headers

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:06:11 +02:00