We all know there are some dark and scary corners with RAID5/6, but
users may not. Add a warning message to mkfs so that anybody trying to
use these profiles knows things can go very wrong.
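A minimal sketch of such a check, assuming the selected profiles are available as block group flag bitmasks. The RAID5/RAID6 bit values follow the on-disk BTRFS_BLOCK_GROUP_* definitions; the helper name and the message text are hypothetical, not the wording mkfs actually prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* On-disk block group profile bits (as in btrfs ctree.h). */
#define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
#define BTRFS_BLOCK_GROUP_RAID56_MASK \
	(BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)

/*
 * Hypothetical helper: warn if the data or metadata profile uses RAID5/6.
 * Returns true if the warning was printed.
 */
static bool warn_if_raid56(uint64_t data_profile, uint64_t metadata_profile)
{
	if (!((data_profile | metadata_profile) & BTRFS_BLOCK_GROUP_RAID56_MASK))
		return false;
	/* Illustrative wording only. */
	fprintf(stderr,
		"WARNING: RAID5/6 support has known problems, use it only for testing or evaluation\n");
	return true;
}
```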
Issue: #265
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ reword message ]
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency was added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and that broke the
static build. Some functions in btrfs-progs and libmount do basically
the same thing and also share the name, which makes the link fail:
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
Where the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). Two symbols from selinux are
substituted by weak aliases during the static build.
There's one new warning due to the use of getgrnam_r in libmount, which
depends on dynamic linking and may not work properly in a static build.
We don't use the related functions directly or indirectly, so it
should be safe to ignore the warning.
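The weak-alias substitution can be sketched as below for GCC/Clang on ELF targets. A weak definition satisfies references at link time and yields to a strong definition of the same symbol if one is linked in. is_selinux_enabled() is only an illustrative choice: the message does not say which two selinux symbols are substituted, and the stub name btrfs_stub_selinux_disabled is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical stub reporting SELinux as disabled. It needs its own
 * (non-weak) name so it can be the target of the alias below.
 */
int btrfs_stub_selinux_disabled(void)
{
	return 0;
}

/*
 * Weak alias: resolves references to is_selinux_enabled() when no other
 * definition is linked, without clashing with a strong definition.
 */
int is_selinux_enabled(void) __attribute__((weak, alias("btrfs_stub_selinux_disabled")));
```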
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
There are several problems with the current sectorsize check:
- No check at all for sectorsize
This means you can even specify "-s 62k".
- No way to specify a sectorsize smaller than the page size
Fix all these problems by:
- Introducing btrfs_check_sectorsize(), which does:
* a power of 2 check for sectorsize
* lower and upper boundary checks for sectorsize
* a warning about sectorsize mismatch with the page size
- Removing the max() between page size and sectorsize
This allows us to override the sectorsize on 64K page systems.
- Basing the nodesize calculation on sectorsize
There's no need to use the page size any more.
Users who specify sectorsize manually really know what they are doing,
and we have warned them already.
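A sketch of the three checks listed above plus a sectorsize-based nodesize default. The [4K, 64K] bounds and the 16K nodesize floor are assumptions for illustration (the message does not state the limits), and check_sectorsize()/default_nodesize() are hypothetical names, not the btrfs_check_sectorsize() signature. The page size is passed in rather than queried with sysconf(_SC_PAGESIZE) so the logic is easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_4K  (4U * 1024)
#define SZ_16K (16U * 1024)
#define SZ_64K (64U * 1024)

static bool is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 0 if sectorsize is acceptable, -1 otherwise. */
static int check_sectorsize(uint32_t sectorsize, uint32_t page_size)
{
	if (!is_power_of_2(sectorsize)) {
		fprintf(stderr, "ERROR: sectorsize %u is not a power of 2\n", sectorsize);
		return -1;
	}
	if (sectorsize < SZ_4K || sectorsize > SZ_64K) {
		fprintf(stderr, "ERROR: sectorsize %u is outside [%u, %u]\n",
			sectorsize, SZ_4K, SZ_64K);
		return -1;
	}
	/* A mismatch is allowed, but the user is warned. */
	if (sectorsize != page_size)
		fprintf(stderr, "WARNING: sectorsize %u does not match page size %u\n",
			sectorsize, page_size);
	return 0;
}

/* Default nodesize derived from sectorsize instead of the page size. */
static uint32_t default_nodesize(uint32_t sectorsize)
{
	return sectorsize > SZ_16K ? sectorsize : SZ_16K;
}
```

With these rules "-s 62k" is rejected because 63488 is not a power of 2, while a 4K sectorsize on a 64K page system is accepted with a warning.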
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a runtime feature (-R) flag for the free space tree. A filesystem
that is mkfs'd with -R free-space-tree then mounted with no options has
the same contents as one mkfs'd without the option, then mounted with
'-o space_cache=v2'.
The only tricky part is how exactly to call the tree creation code.
Using btrfs_create_free_space_tree as is did not quite work, because it
leaks an extra reference to the eb (root->commit_root), which mkfs
complains about with a warning. I opted to follow how the uuid tree is
created by adding it to the dirty roots list for cleanup by
commit_tree_roots in commit_transaction. As a result,
btrfs_create_free_space_tree no longer exactly matches the version in
the kernel sources.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Extract the defaults for data and metadata profiles to a header and
use the symbolic names instead of hardcoding the profiles.
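A sketch of what such a header and its users could look like. The macro and function names are hypothetical, and the defaults shown (SINGLE data; DUP metadata on one device, RAID1 metadata on several) are assumptions for illustration, since the actual defaults depend on the btrfs-progs version. SINGLE is encoded as "no profile bit set", matching the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk profile bits; SINGLE is represented by no profile bit. */
#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP   (1ULL << 5)

/* Hypothetical symbolic defaults, kept in one header. */
#define MKFS_DEFAULT_DATA_ONE_DEVICE   0ULL /* SINGLE */
#define MKFS_DEFAULT_META_ONE_DEVICE   BTRFS_BLOCK_GROUP_DUP
#define MKFS_DEFAULT_DATA_MULTI_DEVICE 0ULL /* SINGLE */
#define MKFS_DEFAULT_META_MULTI_DEVICE BTRFS_BLOCK_GROUP_RAID1

static uint64_t default_data_profile(int num_devices)
{
	return num_devices > 1 ? MKFS_DEFAULT_DATA_MULTI_DEVICE
			       : MKFS_DEFAULT_DATA_ONE_DEVICE;
}

static uint64_t default_metadata_profile(int num_devices)
{
	return num_devices > 1 ? MKFS_DEFAULT_META_MULTI_DEVICE
			       : MKFS_DEFAULT_META_ONE_DEVICE;
}
```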
Signed-off-by: David Sterba <dsterba@suse.com>
The option -A was used a long time ago for debugging and has been marked
obsolete since 4.14.1. Remove the option and set the alloc start to the
default value of 1MiB.
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for enabling quotas at mkfs time. The qgroup accounting will
be consistent, i.e. it also works with --rootdir.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like -O|--features, introduce -R|--runtime-features to enable
features that are normally enabled on a mounted filesystem.
Currently only mkfs is supported; convert is not supported yet.
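Parsing the -R argument can be sketched as a name table plus a comma-separated list walk. The feature bits, the table and parse_runtime_features() are hypothetical; "free-space-tree" is the name used with -R elsewhere in this log, while "quota" is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical runtime feature bits, not the btrfs-progs values. */
#define RUNTIME_FEATURE_QUOTA           (1ULL << 0)
#define RUNTIME_FEATURE_FREE_SPACE_TREE (1ULL << 1)

struct feature_desc {
	const char *name;
	uint64_t flag;
};

static const struct feature_desc runtime_features[] = {
	{ "quota", RUNTIME_FEATURE_QUOTA },
	{ "free-space-tree", RUNTIME_FEATURE_FREE_SPACE_TREE },
};

/*
 * Parse a list such as "quota,free-space-tree" into a bitmask.
 * Returns 0 on success, -1 on an unknown (or empty) feature name.
 */
static int parse_runtime_features(const char *arg, uint64_t *flags)
{
	const char *p = arg;

	*flags = 0;
	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		size_t i;

		for (i = 0; i < ARRAY_SIZE(runtime_features); i++) {
			const char *name = runtime_features[i].name;

			if (strlen(name) == len && strncmp(p, name, len) == 0)
				break;
		}
		if (i == ARRAY_SIZE(runtime_features)) {
			fprintf(stderr, "ERROR: unknown runtime feature '%.*s'\n", (int)len, p);
			return -1;
		}
		*flags |= runtime_features[i].flag;
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```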
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make the features structures more generic to allow mkfs-time and
mount-time sets to be defined.
This provides a base for later mkfs support of mount-time features like
quotas.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, setup_quota_root(), which creates the quota
root and does an offline rescan to ensure all quota accounting numbers
are correct.
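The core rule of the offline rescan for level-0 qgroups can be sketched with hypothetical in-memory structures instead of the extent tree: an extent counts toward rfer of every subvolume that references it, and toward excl only when exactly one subvolume does. Compressed sizes and higher-level qgroups are ignored in this simplified model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBVOLS 8

/* Hypothetical extent record: size and the subvolumes referencing it. */
struct extent_ref {
	uint64_t bytes;
	size_t nr_roots;
	int roots[MAX_SUBVOLS]; /* unique indexes into the qgroup array */
};

struct qgroup_counts {
	uint64_t rfer; /* referenced bytes */
	uint64_t excl; /* exclusively referenced bytes */
};

static void rescan(const struct extent_ref *extents, size_t nr_extents,
		   struct qgroup_counts *qgroups, size_t nr_qgroups)
{
	for (size_t i = 0; i < nr_qgroups; i++)
		qgroups[i].rfer = qgroups[i].excl = 0;

	for (size_t i = 0; i < nr_extents; i++) {
		const struct extent_ref *e = &extents[i];

		/* Every referencing subvolume sees the whole extent. */
		for (size_t j = 0; j < e->nr_roots; j++) {
			assert(e->roots[j] >= 0 && (size_t)e->roots[j] < nr_qgroups);
			qgroups[e->roots[j]].rfer += e->bytes;
		}
		/* Only an unshared extent is exclusive. */
		if (e->nr_roots == 1)
			qgroups[e->roots[0]].excl += e->bytes;
	}
}
```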
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ minor improvement in the fail path ]
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, insert_qgroup_items(), to insert the qgroup
info item and qgroup limit item for later mkfs qgroup support.
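For orientation, a sketch of how the keys of the two items are formed. The key type values and the level/id split of the qgroup id follow the btrfs on-disk format as documented in ctree.h (worth re-checking there); struct key and the helper names are simplified, hypothetical stand-ins for struct btrfs_key, not the btrfs-progs API.

```c
#include <assert.h>
#include <stdint.h>

/* Item key types from the btrfs on-disk format. */
#define BTRFS_QGROUP_INFO_KEY  242
#define BTRFS_QGROUP_LIMIT_KEY 244

/* Simplified, host-endian stand-in for struct btrfs_key. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Qgroup id: level in the top 16 bits, id in the low 48 bits. */
static uint64_t make_qgroupid(uint16_t level, uint64_t id)
{
	assert(id < (1ULL << 48));
	return ((uint64_t)level << 48) | id;
}

/* Both items use objectid 0 and carry the qgroup id in the offset. */
static struct key qgroup_info_key(uint64_t qgroupid)
{
	return (struct key){ .objectid = 0, .type = BTRFS_QGROUP_INFO_KEY, .offset = qgroupid };
}

static struct key qgroup_limit_key(uint64_t qgroupid)
{
	return (struct key){ .objectid = 0, .type = BTRFS_QGROUP_LIMIT_KEY, .offset = qgroupid };
}
```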
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sync with the refactored kernel code. While at it, also sync the
function parameters with the kernel.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This syncs the code between the kernel and btrfs-progs and saves at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition, crypto wrappers and mkfs support for blake2
checksumming. Two aliases are accepted: blake2 and blake2b.
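The alias handling for the -c option can be sketched as a name table in which several spellings map to one checksum type. The enum values follow the on-disk BTRFS_CSUM_TYPE_* numbering; the table, its exact spellings (apart from blake2/blake2b and crc32c) and parse_csum_type() are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Checksum type values as stored in the superblock. */
enum csum_type {
	CSUM_TYPE_CRC32 = 0,
	CSUM_TYPE_XXHASH = 1,
	CSUM_TYPE_SHA256 = 2,
	CSUM_TYPE_BLAKE2 = 3,
};

struct csum_name {
	const char *name;
	enum csum_type type;
};

/* Several names may map to the same type, e.g. blake2 and blake2b. */
static const struct csum_name csum_names[] = {
	{ "crc32c", CSUM_TYPE_CRC32 },
	{ "xxhash64", CSUM_TYPE_XXHASH },
	{ "sha256", CSUM_TYPE_SHA256 },
	{ "blake2", CSUM_TYPE_BLAKE2 },
	{ "blake2b", CSUM_TYPE_BLAKE2 },
};

/* Returns the checksum type for a -c argument, or -1 if unknown. */
static int parse_csum_type(const char *arg)
{
	for (size_t i = 0; i < sizeof(csum_names) / sizeof(csum_names[0]); i++)
		if (strcmp(arg, csum_names[i].name) == 0)
			return csum_names[i].type;
	return -1;
}
```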
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
As mkfs will grow new checksums, print the checksum in use in its
verbose output.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an option to mkfs to specify which checksum algorithm will be used
for the filesystem. Currently only crc32c is supported.
The option name is -c; it's presumably going to be one of the commonly
used options, so it gets the lowercase letter.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the checksum type to the definition structure for a new filesystem;
it will be used in the following patches.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs_add_to_fsid fails in mkfs, we try to close the ctree, which
complains that we already have a transaction open. We should take the
error path and exit cleanly without writing.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When creating a filesystem with mixed block groups, we are creating two
space info objects to track used/reserved/pinned space, one only for data
and another one only for metadata.
This makes the fstests test case generic/416 fail, with btrfs check
reporting over a hundred errors about bad extents:
(...)
bad extent [17186816, 17190912), type mismatch with chunk
bad extent [17195008, 17199104), type mismatch with chunk
bad extent [17203200, 17207296), type mismatch with chunk
(...)
This happens because, surprisingly, it results in block groups that do
not have the BTRFS_BLOCK_GROUP_DATA flag set but have data extents
allocated in them.
This is a regression introduced in btrfs-progs v5.2.
So fix this by making sure we only create one space info object, for both
metadata and data, when mixed block groups are enabled.
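The rule the fix enforces can be sketched as a mapping from a block group's type to the space info it is accounted in: with mixed block groups, data and metadata must resolve to the same object. The helper is hypothetical; the flag bits are the on-disk BTRFS_BLOCK_GROUP_* values, and keeping SYSTEM separate is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)

/* Flags of the space info a block group of type bg_type belongs to. */
static uint64_t space_info_flags(uint64_t bg_type, bool mixed)
{
	const uint64_t data_meta = BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA;

	/* Mixed block groups: one space info for both data and metadata. */
	if (mixed && (bg_type & data_meta))
		return data_meta;
	/* Otherwise each type gets its own; profile bits are ignored. */
	return bg_type & (data_meta | BTRFS_BLOCK_GROUP_SYSTEM);
}
```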
Fixes: c31edf610c ("btrfs-progs: Fix false ENOSPC alert by tracking used space correctly")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Build several standalone tools into one binary and select the tool by
the name it is invoked as (symlink or hardlink):
* btrfs
* mkfs.btrfs
* btrfs-image
* btrfs-convert
* btrfstune
The static target is also supported. The resulting boxed binaries are
named btrfs.box and btrfs.box.static. All the binaries can be built at
the same time without prior configuration.
   text    data     bss     dec     hex filename
 822454   27000   19724  869178   d433a btrfs
 927314   28816   20812  976942   ee82e btrfs.box
2067745   58004   44736 2170485  211e75 btrfs.static
2627198   61724   83800 2772722  2a4ef2 btrfs.box.static
File sizes:
857496 btrfs
968536 btrfs.box
2141400 btrfs.static
2704472 btrfs.box.static
Standalone utilities:
512504 btrfs-convert
495960 btrfs-image
471224 btrfstune
491864 mkfs.btrfs
1747720 btrfs-convert.static
1411416 btrfs-image.static
1304256 btrfstune.static
1361696 mkfs.btrfs.static
So the shared 900K binary saves ~2M, or ~5.3M for the static build.
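Selecting the tool by invocation name can be sketched as a lookup on the basename of argv[0]. The enum and tool_from_argv0() are hypothetical; a real box binary would call the matching tool's entry function instead of returning an id.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum tool { TOOL_BTRFS, TOOL_MKFS, TOOL_IMAGE, TOOL_CONVERT, TOOL_TUNE, TOOL_UNKNOWN };

static const struct {
	const char *name;
	enum tool tool;
} tools[] = {
	{ "btrfs", TOOL_BTRFS },
	{ "mkfs.btrfs", TOOL_MKFS },
	{ "btrfs-image", TOOL_IMAGE },
	{ "btrfs-convert", TOOL_CONVERT },
	{ "btrfstune", TOOL_TUNE },
};

/* Pick the tool from the basename of argv[0], e.g. "/sbin/mkfs.btrfs". */
static enum tool tool_from_argv0(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = base ? base + 1 : argv0;
	for (size_t i = 0; i < sizeof(tools) / sizeof(tools[0]); i++)
		if (strcmp(base, tools[i].name) == 0)
			return tools[i].tool;
	return TOOL_UNKNOWN;
}
```

strrchr() is used instead of basename() because it never modifies its argument. What btrfs.box does when invoked under its own name is not described here, so the sketch simply reports TOOL_UNKNOWN.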
Signed-off-by: David Sterba <dsterba@suse.cz>
[BUG]
There is a bug report of unexpected ENOSPC from btrfs-convert, issue #123.
After some debugging, it turned out that even when we have enough
unallocated space, we still hit ENOSPC at btrfs_reserve_extent().
[CAUSE]
Btrfs-progs relies on the chunk preallocator to make enough space for
data/metadata.
However, after the introduction of delayed refs, it's no longer
reliable to use btrfs_space_info::bytes_used and
btrfs_space_info::bytes_pinned to calculate used metadata space.
For a running transaction with a lot of allocated tree blocks,
btrfs_space_info::bytes_used stays at its original value and is only
updated when running delayed refs.
This makes the btrfs-progs chunk preallocator completely useless. For
btrfs-convert and mkfs.btrfs --rootdir, if we're going to allocate
enough metadata to fill a metadata block group in one transaction, we
will hit ENOSPC no matter whether we have enough unallocated space.
[FIX]
This patch introduces btrfs_space_info::bytes_reserved to track how
much space we have reserved but not yet committed to the extent tree.
To support this change, this commit also makes the following
modifications:
- More comments on btrfs_space_info::bytes_*
To make the code a little easier to read
- Export update_space_info() to preallocate empty data/metadata space
info for mkfs.
For mkfs, we only have a temporary fs image with only a SYSTEM chunk.
Export update_space_info() so that we can preallocate empty
data/metadata space info before we start a transaction.
- Proper btrfs_space_info::bytes_reserved update
The timing is the same as in the kernel (except we don't need to update
bytes_reserved for data extents)
* Increase bytes_reserved when calling alloc_reserved_tree_block()
* Decrease bytes_reserved when running delayed refs
Use head->must_insert_reserved to determine whether a decrease is
needed.
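A simplified model of the accounting described above; only the field names come from the message, while the struct layout and helpers are hypothetical. It shows why the preallocator needs bytes_reserved: until the delayed refs run, bytes_used does not move, so only the reserved counter reveals that the block group is already full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct space_info {
	uint64_t total_bytes;    /* size of all block groups of this type */
	uint64_t bytes_used;     /* committed to the extent tree */
	uint64_t bytes_pinned;   /* freed in this transaction */
	uint64_t bytes_reserved; /* allocated, delayed ref not yet run */
};

static uint64_t space_in_use(const struct space_info *si)
{
	return si->bytes_used + si->bytes_pinned + si->bytes_reserved;
}

/* Tree block allocation: counted as reserved until delayed refs run. */
static void reserve_tree_block(struct space_info *si, uint64_t size)
{
	si->bytes_reserved += size;
}

/*
 * Running a delayed ref head. Only the insertion of a freshly allocated
 * extent (must_insert_reserved) is modeled: reserved bytes become used.
 */
static void run_delayed_ref_head(struct space_info *si, uint64_t size,
				 bool must_insert_reserved)
{
	if (!must_insert_reserved)
		return;
	assert(si->bytes_reserved >= size);
	si->bytes_reserved -= size;
	si->bytes_used += size;
}

/* Preallocator check: is a new chunk needed before reserving size? */
static bool need_new_chunk(const struct space_info *si, uint64_t size)
{
	return space_in_use(si) + size > si->total_bytes;
}
```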
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although modern hardware is fast enough and crc32c calculation is not a
hotspot, such an optimization won't hurt anyway.
Issue: #175
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For data reloc tree creation, we copy its contents from the fs tree just
for its INODE_ITEM, INODE_REF and dirid. This hides the details and
makes it unclear why we're copying from the fs root.
This patch creates the data reloc tree from scratch:
- Create root, including root item and new tree root
- Change dirid to BTRFS_FIRST_FREE_OBJECTID
- Insert root INODE_ITEM and INODE_REF
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the error value was stored in another
variable.
The savings in object size are about 500 bytes:
$ size btrfs.old btrfs.new
   text    data     bss     dec     hex filename
 805055   24248   19748  849051   cf49b btrfs.old
 804527   24248   19748  848523   cf28b btrfs.new
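The conversion pattern can be sketched as follows, assuming the saved value is a negative errno, as is common for btrfs-progs return codes. The function names and message are hypothetical; %m is a glibc printf extension that expands to strerror(errno).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Before: strerror() of the saved error code is passed as an argument. */
static int format_before(char *buf, size_t len, const char *path, int ret)
{
	return snprintf(buf, len, "cannot open %s: %s", path, strerror(-ret));
}

/* After: put the saved code back into errno and let %m expand it. */
static int format_after(char *buf, size_t len, const char *path, int ret)
{
	errno = -ret;
	return snprintf(buf, len, "cannot open %s: %m", path);
}
```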
Signed-off-by: David Sterba <dsterba@suse.com>
The old flag OPEN_CTREE_FS_PARTIAL is in fact quite easy to confuse
with OPEN_CTREE_PARTIAL, which allows btrfs-progs to open a damaged
filesystem (like a corrupted extent/csum tree).
However, despite its name, OPEN_CTREE_FS_PARTIAL only allows
btrfs-progs to open a fs with a temporary superblock (which only has 6
basic trees on SINGLE meta/sys chunks).
The usage of FS_PARTIAL is really confusing here.
So rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER, and add
an extra comment describing its behavior.
Also rename BTRFS_MAGIC_PARTIAL to BTRFS_MAGIC_TEMPORARY to keep the
naming consistent.
With the above comment, the usage of FS_PARTIAL in dump-tree is
obviously incorrect; fix it.
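A sketch of how the renamed flag and magic pair up: a superblock carrying the temporary magic is accepted only when the caller explicitly asks for it, so a tool opening finished filesystems has no reason to pass the flag. BTRFS_MAGIC is the real on-disk magic; the BTRFS_MAGIC_TEMPORARY value, the flag bit and superblock_magic_ok() are assumptions to be checked against the headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "_BHRfS_M" read as a little-endian 64-bit value. */
#define BTRFS_MAGIC           0x4D5F53665248425FULL
/* Assumed value: same string with the first byte '_' replaced by 'P'. */
#define BTRFS_MAGIC_TEMPORARY 0x4D5F536652484250ULL

/* Hypothetical bit standing in for OPEN_CTREE_TEMPORARY_SUPER. */
#define OPEN_TEMPORARY_SUPER (1U << 0)

/* Accept the temporary magic only when explicitly requested. */
static bool superblock_magic_ok(uint64_t magic, unsigned int open_flags)
{
	if (magic == BTRFS_MAGIC)
		return true;
	if (magic == BTRFS_MAGIC_TEMPORARY)
		return (open_flags & OPEN_TEMPORARY_SUPER) != 0;
	return false;
}
```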
Fixes: 8698a2b9ba ("btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With mkfs.btrfs on a thin provisioned device with a very small backing
size and a big virtual size, all code in mkfs.btrfs works well until
close_ctree() is called.
close_ctree() fails to sync the device due to the small backing size
while closing devices. However, mkfs returns 0 in this situation, which
causes fstests generic/405 to fail.
So, let mkfs return a nonzero value if the previous steps succeeded but
close_ctree() failed. fstests generic/405 now passes.
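The exit-status rule can be sketched as a small combinator; final_status() is hypothetical and would be used along the lines of ret = final_status(ret, close_ctree(root)). An earlier error takes precedence; otherwise a failure while closing (such as the device sync failing) still becomes the result.

```c
#include <assert.h>

/*
 * Combine the result of the main mkfs steps with the result of the
 * final close: keep an earlier error, else report the close failure.
 */
static int final_status(int ret, int close_ret)
{
	if (ret)
		return ret;
	return close_ret;
}
```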
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily create the uuid tree, which is usually created after the
first mount. The kernel will still check the tree on first mount, so we
don't try to fake the uuid tree generation to make it appear
consistent, even if it's empty.
Signed-off-by: David Sterba <dsterba@suse.com>