If the final fsync() on the Btrfs device fails, we just swallow the
error and don't alert the user in any way. This was uncovered by xfstest
generic/405, which checks that mkfs fails when it encounters EIO.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) may cost reader/reviewer to
waste one second to convert to 1G.
Introduce kernel include/linux/sizes.h to replace any intermediate
number larger than 4096 (not including 4096) to SZ_*.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This solves an ENOSPC issue with nearly full filesystems.
The only things missing from the function is contains_pending_extent()
which should not be required in this case.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use thew raid5_gen_result() function to calculate raid5 parity.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Ebs and pointers are allocated, but if any of the allocation failed, we
should free the allocated memory.
Resolves-Coverity-CID: 1374101
Resolves-Coverity-CID: 1374100
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The key.offset member is not well-aligned as the key is packed, use a
temporary variable to pass the argument. Reported by ASAN in misc test
002.
Signed-off-by: David Sterba <dsterba@suse.com>
Remove various BUG_ON in raid56 write routine, including:
1) Memory allocation error
Old codes allocates memory when code needs new memory in a loop, and
catch the error using BUG_ON().
New codes allocates memory in a allocation loop first, if any failure
is caught, freeing already allocated memories then throw -ENOMEM.
2) Write error
Change BUG_ON() to correct return value.
3) Validation check
Same as write error.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current we only do chunk validation check at mount time.
It's good for most case, but for fuzzed or manually crafted images, we
can insert a CHUNK_ITEM key into root tree.
Since mount time check will only check chunk tree, it will not check
CHUNK_ITEM in root tree.
Even with previous key type check against leaf owner, it is still
possible to modify the leaf owner to by-pass it.
So we still need to check chunk validation before processing it.
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In read_one_chunk(), we may add an empty entry for a missing device.
However, this entry wasn't being added to the dev_list, and so it never
got freed.
Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The extent_buffer::data might be unaligned wrt unsigned long, depends on
acutal layout of the structure and width of the int types. Use explicit
unaligned access helpers.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When open in btrfs_open_devices failed, only the following message is
displayed. Therefore the user doesn't understand the reason why open
failed.
# btrfs check /dev/sdb8
Couldn't open file system
This patch adds the error message when open fails.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce new function, make_convert_data_chunks(), to build up data
chunks for convert.
It will call a modified version of btrfs_alloc_data_chunk() to force
data chunks to covert all known ext* data.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To survive fuzz filesystem images, we need various validation checks to
make btrfsck detect any invalid value inside chunks including those in
sys_array.
Note that these checks may not be sufficient to cover all corner cases,
we may need to add more later.
This also refractor previous various checks into a helper function so
that we can add more checks into it in the future.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The message
"warning devid %llu not found already\n",
does not seem to be too useful, it appears during several commands and
sometimes repeatedly.
Signed-off-by: David Sterba <dsterba@suse.com>
Since open_ctree_fs_info() now may return a fs_info even without any
roots, modify functions like read_tree_block() to operate with such
fs_info.
This provides the basis for btrfs-find-root to operate on chunk tree
with corrupted fs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ coding style adjustments, unified declarations ]
Signed-off-by: David Sterba <dsterba@suse.com>
There is a small bug from 2011, where btrfs_next_bg (formally
btrfs_next_metadata) function will always skip the first chunk.
That's OK for that time, as there is always 3 empty temporary chunks.
But now, we may ended up with only one metadata or system chunk, with
empty chunk auto-remove from kernel or new mkfs.btrfs.
So fix it by checking the initial value so btrfs_next_bg() will return
the first chunk if its *logical parameter is 0.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance chunk validation:
1) Num_stripes
We already have such check but it's only in super block sys chunk
array.
Now check all on-disk chunks.
2) Chunk logical
It should be aligned to sector size.
This behavior should be *DOUBLE CHECKED* for 64K sector size like
PPC64 or AArch64.
Maybe we can found some hidden bugs.
3) Chunk length
Same as chunk logical, should be aligned to sector size.
4) Stripe length
It should be power of 2.
5) Chunk type
Any bit out of TYPE_MAS | PROFILE_MASK is invalid.
With all these much restrict rules, several fuzzed image reported in
mail list should no longer cause btrfsck error.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can handle the special case of num_stripes == 0 directly inside
btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to
catch other unhandled cases where we fail to validate external data,
like in btrfs-show-super.
Signed-off-by: David Sterba <dsterba@suse.com>
Port of kernel commit e3540eab29e1b2260bc4b9b3979a49a00e3e3af8
Verify that the sys_array has enough bytes to read the next item.
Signed-off-by: David Sterba <dsterba@suse.com>
Port of kernel commit 1ffb22cf8c322bbfea6b35fe23d025841b49fede
There's a pointer to buffer, integer offset and offset passed as
pointer, try to find matching names for them.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some sanity checks missing on both sides, kernel/userspace.
Preparation to port the missing changes.
Sync code with parent of kernel commit
1ffb22cf8c322bbfea6b35fe23d025841b49fede ("btrfs: cleanup, rename a few
variables in btrfs_read_sys_array")
This effectively reverts progs commit
be96777126 ("btrfs-progs: Cleanup unneeded
extra variant in btrfs_read_sys_array") so we can apply more of the
kernel patches.
Signed-off-by: David Sterba <dsterba@suse.com>
Add support to search chunk root, as we only need to search tree roots
in system chunk, which should be very easy to add, just iterate in
system chunks.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ renamed to btrfs_next_bg_* ]
Signed-off-by: David Sterba <dsterba@suse.com>
This patch is generated from a coccinelle semantic patch:
identifier t;
expression e;
statement s;
@@
-t = malloc(e);
+t = calloc(1, e);
(
if (!t) s
|
if (t == NULL) s
|
)
-memset(t, 0, e);
Signed-off-by: Silvio Fricke <silvio.fricke@gmail.com>
[squashed patches into one]
Signed-off-by: David Sterba <dsterba@suse.com>
If there is more than one fs_devices in fs_uuids list (like mkfs.btrfs
does), we need close them all before exit. Add a helper for that.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This bug is found by making break point after change_fsid_prepare() and
then kill the unfinished change, then try to restore the unfinished fsid
change.
If fsid change is canceled, open_ctree will still fail even with
IGNORE_FSID_MIMATCH open ctree flag, since it can't find device with
mismatched fsid, making it unable to restoring.
Now add ignore_fsid_mismatch judgment in btrfs_find_device() to fix the
bug and allow later restore to work as expected.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
For DUP profile, the num_stripes should be 2 not 1.
This causes btrfs offline tool fails on valid image.
Also, num_stripes check is too restrict for btrfsck self test,
as there is some image in degraded mode, so modify it to allow degraded
chunk.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Adds extra check when reading a chunk item:
1) Check chunk type.
Don't allow any unsupported type/profile bit.
2) Check num_stripes
Any chunk item should contain at least one stripe.
For system chunk, the chunk item size(calculated by btrfs_stripe size *
(num_stripes - 1) + btrfs_chunk size) should not exceed
BTRFS_SYSTEM_CHUNK_SIZE(2048).
For normal chunk, the chunk item size(calculated) should match the chunk
item size.
3) Check num_stripes/sub_stripes against chunk profile.
Num_stripes/sub_stripes must meet its lower limit for its chunk profile.
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
glibc 2.10+ (5+ years old) enables all the desired features:
_XOPEN_SOURCE 700, __XOPEN2K8, POSIX_C_SOURCE, DEFAULT_SOURCE; with a
single _GNU_SOURCE define in the makefile alone. For portability to
other libc implementations (e.g. dietlibc) _XOPEN_SOURCE=700 is also
defined.
This also resolves Debian bug report filed by Michael Tautschnig -
"Inconsistent use of _XOPEN_SOURCE results in conflicting
declarations". Whilst I was not able to reproduce the results, the
reported fact is that _XOPEN_SOURCE set to 500 in one set of files
(e.g. cmds-filesystem.c) generates/defines different struct stat from
other files (cmds-replace.c).
This patch thus cleans up all feature defines, and sets them at a
consistent level.
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=747969
Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Run fstests: btrfs/012 will fail with message:
unable to do rollback
It is because the rollback function checks sequentially each piece of space
to map to a certain block group. If some piece doesn't, rollback refuses to continue.
After kernel commit:
commit 47ab2a6c689913db23ccae38349714edf8365e0a
Btrfs: remove empty block groups automatically
Empty block groups are removed, so there are possible gaps:
|--block group 1--| |--block group 2--|
^
|
gap
So the piece of space of the gap belongs to a removed empty block group,
and rollback should detect this case, and feel free to continue.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The @search_cache_extent() only returns the next cache_extent or NULL,
it will never return the previous cache_extent.
So just remove the dead condition for previous cache_extent handle.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
*Note*
this handles the problem under umounted state, the similar problem
under mounted state is already fixed by Anand.
Steps to reproduce:
# mkfs.btrfs -f /dev/sda1
# btrfstune -S 1 /dev/sda1
# mount /dev/sda1 /mnt
# btrfs dev add /dev/sda2 /mnt
# umount /mnt <== (umounted)
# btrfs fi show /dev/sda2
result:
Label: none uuid: XXXXXXXXXXXXXXXXXX
Total devices 2 FS bytes used 368.00KiB
devid 2 size 9.31GiB used 1.25GiB path /dev/sda2
*** Some devices missing
Btrfs v3.16-67-g69f54ea-dirty
It is because @btrfs_scan_lblkid() won't establish mappinig
between the seed and sprout devices. So seeding devices are missing.
We could use @open_ctree_* to detect all seed/sprout mappings
for each fs scanned after @btrfs_scan_lblkid().
sth worthes mention:
o If there are multi-level of seeds, all devices in them will be shown
in the ascending order of @devid
o If device replace is execed on a sprout fs with a device in a seed fs,
the replaced device still exist in the seed fs together with
the replacing device in the sprout fs, so we only keep the latest device
with the newest generation
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda[1-2]
# btrfstune -S 1 /dev/sda1
# mount /dev/sda /mnt
# btrfs dev add /dev/sda3 /mnt
# umount /mnt
# mkfs.ext4 /dev/sda1 // kill seed dev
# mkfs.ext4 /dev/sda2 // kill seed dev
# btrfs-debug-tree /dev/sda3 <== BUG_ON
Output msg:
volumes.c:1824: btrfs_read_chunk_tree: Assertion `ret` failed.
btrfs-debug-tree[0x41cb36]
btrfs-debug-tree(btrfs_read_chunk_tree+0x3ca)
btrfs-debug-tree(btrfs_setup_chunk_tree_and_device_map
btrfs-debug-tree[0x40f695]
btrfs-debug-tree(open_ctree_fs_info+0x86)
btrfs-debug-tree(main+0x12d)
/lib64/libc.so.6(__libc_start_main+0xf5)
btrfs-debug-tree[0x4062e9]
This BUG_ON complains about a failed @read_one_dev() call when
@open_seed_devices() failed to find the seed @fs_devices object
for a dev_item in chunk tree.
In this case, just insert a "shadow" @fs_devices with the fsid in
dev_item shall make no harm since no other tools will try to
make use of the stuff that the "shadow" @fs_devices possesses
after its creation.
After apply this commit, btrfs-debug-tree will report unable
to open the device.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Asserting is no fun, we may be able to recover from this error in certain cases
(like btrfs-image and btrfsck). Just do what the kernel does and spit out an
error and return that there is only 1 copy. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Recently we merge a memory leak fix, which fails xfstests/btrfs/012,
the cause is that it only frees @fs_devices but leaves it on the global
fs_uuid list, which cause a 'Segmentation fault' over running command
btrfs-convert. This fixes the problem.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Btrfs-progs superblock checksum check is somewhat too restricted for
super-recover, since current btrfs-progs will only read the 1st
superblock and if you need super-recover the 1st superblock is
possibly already damaged.
The fix is introducing super_recover parameter for
btrfs_read_dev_super() and callers to allow scan backup superblocks if
needed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.
This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.
This change makes the current tests TSan-clean.
WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
Read of size 8 at 0x7d180000eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745
Previous write of size 8 at 0x7d180000eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745
Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x00000002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
For RAID0,5,6,10,
For system chunk, there shouldn't be too many stripes to
make a btrfs_chunk that exceeds BTRFS_SYSTEM_CHUNK_ARRAY_SIZE
For data/meta chunk, there shouldn't be too many stripes to
make a btrfs_chunk that exceeds a leaf.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
For system chunk array,
We copy a "disk_key" and an chunk item each time,
so there should be enough space to hold both of them,
not only the chunk item.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Valgrind reports memleak in btrfs_scan_one_device() about allocating
btrfs_device but on btrfs_close_devices() they are not reclaimed.
Although not a bug since after btrfs_close_devices() btrfs will exit so
memory will be reclaimed by system anyway, it's better to fix it anyway.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
this patch will make btrfsck operations to open disk in exclusive mode,
so that mount will fail when btrfsck is running
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>