Recently we had a scrub use-after-free caused by an unaligned chunk
length. Although the fix has been submitted, we may want to do extra
checks for a chunk's alignment.
This patch adds such a check for the starting bytenr and length of a
chunk, to make sure they are properly aligned to the 64K stripe boundary.
By default, the check only leads to a warning and is not treated as an
error, as we expect the kernel to handle such misalignment without any
problem.
But if the new debug environment variable,
BTRFS_PROGS_DEBUG_STRICT_CHUNK_ALIGNMENT, is set, then we treat it as
an error, so that we can detect unexpected chunks from btrfs-progs and
fix them before they reach end users.
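A minimal sketch of the check described above, assuming a 64K stripe
length. The names example_check_chunk_alignment() and EXAMPLE_STRIPE_LEN
are hypothetical, and the -EINVAL return value is an assumption, not
necessarily what btrfs-progs returns:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* 64K stripe length, as stated in the commit message. */
#define EXAMPLE_STRIPE_LEN	(64 * 1024ULL)

/*
 * Hypothetical helper, not the btrfs-progs function: return 0 when the
 * chunk is aligned or only a warning was printed, -EINVAL (assumed error
 * code) when it is unaligned and strict checking was requested.
 */
static int example_check_chunk_alignment(uint64_t start, uint64_t length)
{
	if (start % EXAMPLE_STRIPE_LEN == 0 && length % EXAMPLE_STRIPE_LEN == 0)
		return 0;

	fprintf(stderr,
		"WARNING: chunk start %llu length %llu not aligned to %llu\n",
		(unsigned long long)start, (unsigned long long)length,
		EXAMPLE_STRIPE_LEN);

	/* Only escalate to an error when the debug variable is set. */
	if (getenv("BTRFS_PROGS_DEBUG_STRICT_CHUNK_ALIGNMENT"))
		return -EINVAL;
	return 0;
}
```

In this sketch the variable only needs to be present in the environment,
its value is not interpreted.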
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To be consistent with the rest of the code the sysfs helper should
return the -errno instead of passing -1 from various syscalls. Update
callers that relied on -1 as the invalid file descriptor.
Signed-off-by: David Sterba <dsterba@suse.com>
The enqueue option should let the user know that the expected operation
hasn't started yet and that it's waiting for another one. Although
exclusive operations can take a long time, the two reasons for the delay
should be distinguished.
Signed-off-by: David Sterba <dsterba@suse.com>
strtoull() may return the boundary values as valid results. If callers
expect that and want to verify it, errno must be reset before the call.
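For illustration, a sketch of the pattern with a hypothetical wrapper
example_parse_u64() (not a btrfs-progs function). Without the reset, a
stale ERANGE from an earlier call would make a legitimate ULLONG_MAX
look like an overflow:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: parse a decimal number, return 0 on success,
 * -EINVAL if no digits were found, -ERANGE on overflow.
 */
static int example_parse_u64(const char *str, unsigned long long *result)
{
	char *end;
	unsigned long long val;

	/* Reset errno so that a stale value is not mistaken for a failure. */
	errno = 0;
	val = strtoull(str, &end, 10);
	if (end == str)
		return -EINVAL;
	/* ULLONG_MAX is an error only if strtoull() also set ERANGE. */
	if (val == ULLONG_MAX && errno == ERANGE)
		return -ERANGE;
	*result = val;
	return 0;
}
```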
Signed-off-by: David Sterba <dsterba@suse.com>
GCC 14 introduces a new -Walloc-size included in -Wextra which gives:
```
common/utils.c:983:15: warning: allocation of insufficient size ‘1’ for type ‘struct config_param’ with size ‘32’ [-Walloc-size]
cmds/qgroup.c:1644:13: warning: allocation of insufficient size ‘1’ for type ‘struct btrfs_qgroup_inherit’ with size ‘72’ [-Walloc-size]
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 struct of size `sizeof(struct ...)`. GCC then sees we're not
doing anything wrong.
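A short sketch of the corrected argument order, using a hypothetical
struct example_param as a stand-in for the real structures:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a structure such as struct config_param. */
struct example_param {
	const char *key;
	const char *value;
};

static struct example_param *example_param_alloc(void)
{
	/* First argument: number of members, second: size of each member. */
	return calloc(1, sizeof(struct example_param));
}
```

The amount of memory allocated is the same either way, only the order
that the compiler checks against the prototype differs.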
Pull-request: #707
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that when reading properties from a sound device, the
system gets stuck and is then rebooted by the watchdog. Reading from fifo
files gets stuck as well, although this would not trigger the watchdog.
The reason is that open() on fifo files is blocking until the other end
of the pipe is opened. For device nodes it's driver specific, most
device nodes fail right away:
$ btrfs prop get /dev/tty
ERROR: object is not a btrfs object: /dev/tty
In case of the sound device the consequences were fatal. We can fix that
by opening the path in non-blocking mode. This is only for reading the
fsid, the fd is closed right after the ioctl so the non-blocking mode
does not affect any other operation.
The blocking mode must be used for block devices as e.g. loop devices
may not be finalized when the open() call returns and get_fsid fails.
The known problematic file types are character devices and fifos.
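The idea can be sketched as follows. The helper name
example_open_for_fsid() is hypothetical and the actual change may be
structured differently:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: open a path read-only, using O_NONBLOCK only for
 * character devices and fifos. Returns the fd or -errno.
 */
static int example_open_for_fsid(const char *path)
{
	struct stat st;
	int flags = O_RDONLY;
	int fd;

	if (stat(path, &st) < 0)
		return -errno;
	/* Block devices and regular files keep the blocking mode. */
	if (S_ISCHR(st.st_mode) || S_ISFIFO(st.st_mode))
		flags |= O_NONBLOCK;
	fd = open(path, flags);
	if (fd < 0)
		return -errno;
	return fd;
}
```

The caller is expected to close the returned fd once the fsid has been
read.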
Issue: #699
Signed-off-by: David Sterba <dsterba@suse.com>
Some commands could be run in a dry-run mode, i.e. not doing any
write/change actions, only printing the steps and ignoring errors.
There are two possibilities where to put the option:
- as a global one: btrfs --dry-run subvolume delete /path
- local option: btrfs subvolume delete --dry-run /path
As we have several global options already, let's put it there, dry-run
should not be very common so the slight inconvenience of writing the
option out of order of command arguments should be acceptable.
Issue: #629
Signed-off-by: David Sterba <dsterba@suse.com>
./btrfs --param key=value command ...
./btrfs --param key command ...
This is meant for passing various tuning data for testing and debugging
and is undocumented for regular users.
To add support for a new key, read the parameter value after option
parsing with bconf_param_value("key") and convert it to what you need.
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel patches for RST and squota are queued for 6.7, we need to be
able to test the features so it's not necessary to hide the mkfs support
under the experimental build. The kernel may still need a debug build to
enable mount.
Signed-off-by: David Sterba <dsterba@suse.com>
While we like to have the descriptive names, also add the short aliases
that we use for reference in changelogs and documentation.
$ mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg - mixed data and metadata block groups (compat=2.6.37, safe=2.6.37)
quota - quota support (qgroups) (compat=3.4)
extref - increased hardlink limit per file to 65536 (compat=3.7, safe=3.12, default=3.12)
raid56 - raid56 extended format (compat=3.9)
skinny-metadata - reduced-size metadata extent refs (compat=3.10, safe=3.18, default=3.18)
no-holes - no explicit hole extents for files (compat=3.14, safe=4.0, default=5.15)
fst - free-space-tree alias
free-space-tree - free space tree (space_cache=v2) (compat=4.5, safe=4.9, default=5.15)
raid1c34 - RAID1 with 3 or 4 copies (compat=5.5)
zoned - support zoned devices (compat=5.12)
extent-tree-v2 - new extent tree format (compat=5.15)
bgt - block-group-tree alias
block-group-tree - block group tree to reduce mount time (compat=6.1)
rst - raid-stripe-tree alias
raid-stripe-tree - raid stripe tree (compat=6.7)
squota - squota support (simple accounting qgroups) (compat=6.7)
Signed-off-by: David Sterba <dsterba@suse.com>
The list 'mkfs -O list-all' should be sorted by version of kernel
support (compat) so it's clear what's new.
Signed-off-by: David Sterba <dsterba@suse.com>
The clear-cache functionality is shared by several commands:
- btrfs check
  For --clear-cache and --clear-ino-cache.
- btrfstune
  Mostly for block-group-tree feature conversion.
- btrfs-convert
  To enable the now default v2 space cache.
Thus it's no longer proper to keep clear-cache.[ch] under the check/
directory, so move them to the common/ directory.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current implementation would introduce variable shadowing because
both max() and min() use the same __x and __y.
This may not be a big deal, but since the kernel already handles it
properly using the __UNIQUE_ID() macro, and has more checks, we can
cross-port the kernel version to btrfs-progs.
There are some dependencies needed, all of them small enough to be
put into the helper:
- __PASTE()
- __UNIQUE_ID()
- BUILD_BUG_ON_ZERO()
- __is_constexpr()
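A simplified sketch of the unique-identifier technique, not the kernel
code: it leaves out the type and constant-expression checks, relies on
the GNU C extensions __typeof__, statement expressions and __COUNTER__,
and all EXAMPLE_*/example_* names are hypothetical:

```c
/* Two-level paste so that the arguments are expanded before joining. */
#define EXAMPLE_PASTE_(a, b)	a##b
#define EXAMPLE_PASTE(a, b)	EXAMPLE_PASTE_(a, b)

/* __COUNTER__ yields a new number on every expansion. */
#define EXAMPLE_UNIQUE_ID(prefix) \
	EXAMPLE_PASTE(EXAMPLE_PASTE(example_uniq_, prefix), __COUNTER__)

/* Evaluate x and y exactly once, into uniquely named temporaries. */
#define EXAMPLE_CMP_ONCE(x, y, ux, uy, op) ({	\
	__typeof__(x) ux = (x);			\
	__typeof__(y) uy = (y);			\
	ux op uy ? ux : uy; })

#define example_min(x, y) \
	EXAMPLE_CMP_ONCE(x, y, EXAMPLE_UNIQUE_ID(x_), EXAMPLE_UNIQUE_ID(y_), <)
#define example_max(x, y) \
	EXAMPLE_CMP_ONCE(x, y, EXAMPLE_UNIQUE_ID(x_), EXAMPLE_UNIQUE_ID(y_), >)
```

Because each expansion gets its own temporaries, a nested use such as
example_max(a, example_min(b, c)) no longer declares the same variable
name in an inner scope.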
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and its related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related functions into their own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the ability to enable simple quotas from mkfs with '-O squota'
There is some complication around handling enable_gen while still
counting the root node of an fs. To handle this, employ a hack of doing
a no-op write on the root node to bump its generation up above that of
the qgroup enable generation, which results in counting it properly.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Allow for RAID levels 0, 1 and 10 on zoned devices if the RAID stripe tree
is used.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're getting more features and the string size limit will not be
sufficient, so extend enough that we won't have to care for some time.
Signed-off-by: David Sterba <dsterba@suse.com>
The accelerated crc32c needs to check for two CPU features: the crc32c
instruction is in SSE 4.2 and 'pclmulqdq' is a separate feature. There's
still old hardware in use that does not have the PCLMUL instructions.
Detect it and make it the condition.
The pclmul is not supported on old compilers so also add a
configure-time detection and leave the SSE 4.2 only implementation as
the accelerated one if possible.
Issue: #676
Signed-off-by: David Sterba <dsterba@suse.com>
Aligning with the kernel's struct btrfs_fs_devices:fs_list, rename
btrfs_fs_devices::list to btrfs_fs_devices::fs_list.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy faster implementation of crc32c from linux kernel as of 6.5-rc7
(x86_64, arch/x86/crypto/crc32c-pcl-intel-asm_64.S). This needs
assembler build support, so detect target architecture so
cross-compilation still works.
Add a special CPU flag so the old and new implementations can be
benchmarked and verified separately.
Sample benchmark:
CPU flags: 0x1ff
CPU features: SSE2 SSSE3 SSE41 SSE42 SHA AVX AVX2 CRC32C_PCL
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 77177218, cycles/i 77
NULL-MEMCPY: cycles: 226313072, cycles/i 226, 62133.395 MiB/s
CRC32C-ref: cycles: 24418596066, cycles/i 24418, 575.859 MiB/s
CRC32C-NI: cycles: 1188335920, cycles/i 1188, 11833.073 MiB/s
CRC32C-PCL: cycles: 463193456, cycles/i 463, 30358.037 MiB/s
XXHASH: cycles: 851606646, cycles/i 851, 16511.916 MiB/s
SHA256-ref: cycles: 74476234956, cycles/i 74476, 188.808 MiB/s
SHA256-NI: cycles: 34198637428, cycles/i 34198, 411.177 MiB/s
BLAKE2-ref: cycles: 14761411664, cycles/i 14761, 952.597 MiB/s
BLAKE2-SSE2: cycles: 18101896796, cycles/i 18101, 776.807 MiB/s
BLAKE2-SSE41: cycles: 12599091062, cycles/i 12599, 1116.087 MiB/s
BLAKE2-AVX2: cycles: 9668247506, cycles/i 9668, 1454.418 MiB/s
The new implementation is about 2.5x faster.
Note: the new version does not work on musl because of linkage
problems (relocations in .rodata), so it's still using the old
implementation.
Signed-off-by: David Sterba <dsterba@suse.com>
In some places we want to read a single u64 value from a sysfs path, or
from fsid directory. Add helpers that do that in one go.
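Such a helper could look roughly like the following sketch. The name
example_read_u64() and its signature are hypothetical and do not reflect
the actual btrfs-progs helpers:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a decimal u64 from a file such as a sysfs
 * attribute. Returns 0 on success or a negative errno value.
 */
static int example_read_u64(const char *path, uint64_t *value)
{
	char buf[32] = { 0 };
	char *end;
	unsigned long long tmp;
	ssize_t len;
	int ret = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Leave room for the terminating NUL already present in buf. */
	len = read(fd, buf, sizeof(buf) - 1);
	if (len < 0)
		ret = -errno;
	close(fd);
	if (ret < 0)
		return ret;

	errno = 0;
	tmp = strtoull(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL;
	*value = tmp;
	return 0;
}
```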
Signed-off-by: David Sterba <dsterba@suse.com>
The sysfs code could use more convenience helpers, so move the current
code to its own file before adding more helpers.
Signed-off-by: David Sterba <dsterba@suse.com>
Add an API for an extensible array of pointers for convenience. It's a
simple wrapper around a (void *) array with length.
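A sketch of what such a wrapper might look like. The structure and
function names are hypothetical, as are the growth policy and the
initial capacity:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical growable array of untyped pointers. */
struct example_ptr_array {
	void **data;
	size_t length;		/* Number of used slots */
	size_t capacity;	/* Number of allocated slots */
};

static void example_ptr_array_init(struct example_ptr_array *arr)
{
	arr->data = NULL;
	arr->length = 0;
	arr->capacity = 0;
}

/* Append one element, doubling the storage when it's full. */
static int example_ptr_array_append(struct example_ptr_array *arr, void *element)
{
	if (arr->length == arr->capacity) {
		size_t new_capacity = arr->capacity ? arr->capacity * 2 : 8;
		void **tmp;

		tmp = realloc(arr->data, new_capacity * sizeof(*tmp));
		if (!tmp)
			return -ENOMEM;
		arr->data = tmp;
		arr->capacity = new_capacity;
	}
	arr->data[arr->length++] = element;
	return 0;
}

/* Free the array itself, the elements are owned by the caller. */
static void example_ptr_array_free(struct example_ptr_array *arr)
{
	free(arr->data);
	example_ptr_array_init(arr);
}
```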
Signed-off-by: David Sterba <dsterba@suse.com>
This is a potentially breaking change to json output. An all zeros uuid
was printed as "-" but we can utilize native json type null for that.
Note the va_copy must be used as va_arg advances the pointer.
{
"nulluuid": null
}
Signed-off-by: David Sterba <dsterba@suse.com>
Make the timestamp format more descriptive of what is actually printed.
We may need separate date or time in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
The json spec allows numeric values and it's recommended to use them
instead of the stringified numbers. This is a potentially breaking change
if some tools relied on the string value.
As most formats we now have are '%llu' and it's convenient to just pass
it to vprintf, don't add a special type for ints. Any new int type must
be added to the list.
{
"number": 1234
}
Signed-off-by: David Sterba <dsterba@suse.com>
The 'str' type was added in ecbb6a7fcd ("btrfs-progs: add json
formatter for escaped string") but not documented. It should be used
e.g. for paths or strings from unknown origin.
Signed-off-by: David Sterba <dsterba@suse.com>
For null or boolean values the "..." quoting must not be done, add
support for that. This is detected internally for each printed value.
Signed-off-by: David Sterba <dsterba@suse.com>
A newline character in option description text will break the line and
then indent the text properly. This can be used for lists or paragraphs.
Signed-off-by: David Sterba <dsterba@suse.com>
To be able to test errors at specific locations, add a simple way to
check for a condition in code, controlled from user space by the
environment variable INJECT. For now a single value is accepted.
Use like:
  if (inject_error(0x1234)) {
          do_something();
          return -ERROR;
  }
This is enabled in debugging build by default (make D=1) and can be
enabled on demand too (make EXTRA_CFLAGS=-DINJECT).
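One way the check itself might work is sketched below. This is an
assumption about the mechanism, not the actual btrfs-progs
implementation, and it is named example_inject_error() to make that
clear:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical check: return true if the environment variable INJECT
 * holds exactly the given cookie. Base 0 accepts decimal and 0x-prefixed
 * hexadecimal values.
 */
static bool example_inject_error(unsigned long cookie)
{
	const char *str = getenv("INJECT");
	char *end;
	unsigned long val;

	if (!str || !*str)
		return false;
	val = strtoul(str, &end, 0);
	/* Reject values with no digits or with trailing garbage. */
	if (end == str || *end != '\0')
		return false;
	return val == cookie;
}
```

With this sketch, running a command as INJECT=0x1234 would trigger only
the location guarded by the cookie 0x1234.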
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that btrfs-find-root does not work as a built-in tool in
btrfs.box, while it's advertised in the help:
$ ./btrfs.box help --box
Standalone tools built-in in the busybox style:
- mkfs.btrfs
- btrfs-image
- btrfs-convert
- btrfstune
- btrfs-find-root
Add the support as it might be a useful tool sometimes. In the future the
command should be moved to e.g. inspect-internal or rescue.
Issue: #648
Signed-off-by: David Sterba <dsterba@suse.com>
The function check_where_mounted() scans the system for all other btrfs
devices, which is necessary for its operation. However, in certain
cases, leaving the devices in the scanned state is undesirable. Introduce
the 'noscan' argument to make devices unscanned before return.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To prepare for handling devices given on the command line, factor out
btrfs_scan_argv_devices().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The variable 'is_btrfs' is declared as an integer but should be a boolean
instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The send.h for libbtrfs was separated some time ago so we're now
free to keep up with the kernel, 6.4-rc1.
Signed-off-by: David Sterba <dsterba@suse.com>
The following functions accept a buffer for write, which can be marked
as const:
- btrfs_pwrite()
- write_data_to_disk()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not a common practice to use the same io function for both read and
write (we have pread() and pwrite(), not pio()).
Furthermore the original function has the following problems:
- Not returning a proper error number
  If we had ioctl/stat errors we just returned 0 with errno set.
  Thus the caller would treat it as a short read, not a proper error.
- Unnecessary @ret_rw
  This is not that obvious if we have different handling for read and
  write, but if we split them it's obvious we can reuse @ret.
- No proper copy back for short read
- Unable to constify the @buf pointer for write operation
All those problems are addressed in this patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fixes involve the following changes:
- Unexport functions which are not used outside of their file
* print_path_column()
* parse_reflink_range()
* btrfs_list_setup_print_column()
* device_get_partition_size_sysfs()
* max_zone_append_size()
- Include related headers before implementing the function
* change-uuid.c
* convert-bgt.c
* seed.h
- Add missing headers caused by the above header changes
* include <uuid/uuid.h> for tune/tune.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>