linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 21:38:32 +08:00

Author	SHA1	Message	Date
David Sterba	8f881e8c18	btrfs: get fs_info from eb in leaf_data_end We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:30 +02:00
David Sterba	0ab0206328	btrfs: get fs_info from eb in write_one_eb We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:29 +02:00
David Sterba	20a1fbf97e	btrfs: get fs_info from eb in repair_eb_io_failure We can read fs_info from extent buffer and can drop it from the parameters. As all callsites are updated, add the btrfs_ prefix as the function is exported. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:29 +02:00
David Sterba	9df76fb544	btrfs: get fs_info from eb in lock_extent_buffer_for_io We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:29 +02:00
Phillip Potter	7d157c3d48	btrfs: use common file type conversion Deduplicate the btrfs file type conversion implementation - file systems that use the same file types as defined by POSIX do not need to define their own versions and can use the common helper functions decared in fs_types.h and implemented in fs_types.c Common implementation can be found via commit: `bbe7449e25` "fs: common implementation of file type" Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:29 +02:00
Goldwyn Rodrigues	7984ae52bb	btrfs: Perform locking/unlocking in btrfs_remap_file_range() Move code to make it more readable, so as locking and unlocking is done in the same function. The generic checks that are now performed in the locked section are unaffected. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:29 +02:00
Arnd Bergmann	290342f661	btrfs: use BUG() instead of BUG_ON(1) BUG_ON(1) leads to bogus warnings from clang when CONFIG_PROFILE_ANNOTATED_BRANCHES is set: fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] BUG_ON(1); ^~~~~~~~~ include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:48:23: note: expanded from macro 'unlikely' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here max_chunk_size); ^~~~~~~~~~~~~~ include/linux/kernel.h:860:36: note: expanded from macro 'min' #define min(x, y) __careful_cmp(x, y, <) ^ include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp' __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op)) ^ include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once' typeof(y) unique_y = (y); \ ^ fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true BUG_ON(1); ^ include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^ fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning u64 max_chunk_size; ^ = 0 Change it to BUG() so clang can see that this code path can never continue. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	247462a5ac	btrfs: move tree block wait and write helpers to tree-log The wrapper names better describe what's happening so they're not deleted though they're trivial, but at least moved closer to their place of use. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	d4eb671a08	btrfs: remove stale definition of BUFFER_LRU_MAX Long time ago (2008), the extent buffers were organized in a LRU list and switched to rb-tree in `6af118ce51` ("Btrfs: Index extent buffers in an rbtree"). There was one stale macro definition left. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	e4fa7469eb	btrfs: tests: unify messages when tests start - make the messages more visually consistent and use same format "running ... test", any error or other warning can be easily spotted - move some message to the test entry function - add message to the inode tests Example output: [ 8.187391] Btrfs loaded, crc32c=crc32c-generic, assert=on, integrity-checker=on, ref-verify=on [ 8.189476] BTRFS: selftest: sectorsize: 4096 nodesize: 4096 [ 8.190761] BTRFS: selftest: running btrfs free space cache tests [ 8.192245] BTRFS: selftest: running extent only tests [ 8.193573] BTRFS: selftest: running bitmap only tests [ 8.194876] BTRFS: selftest: running bitmap and extent tests [ 8.196166] BTRFS: selftest: running space stealing from bitmap to extent tests [ 8.198026] BTRFS: selftest: running extent buffer operation tests [ 8.199328] BTRFS: selftest: running btrfs_split_item tests [ 8.200653] BTRFS: selftest: running extent I/O tests [ 8.201808] BTRFS: selftest: running find delalloc tests [ 8.320733] BTRFS: selftest: running extent buffer bitmap tests [ 8.340795] BTRFS: selftest: running inode tests [ 8.341766] BTRFS: selftest: running btrfs_get_extent tests [ 8.342981] BTRFS: selftest: running hole first btrfs_get_extent test [ 8.344342] BTRFS: selftest: running outstanding_extents tests [ 8.345575] BTRFS: selftest: running qgroup tests [ 8.346537] BTRFS: selftest: running qgroup add/remove tests [ 8.347725] BTRFS: selftest: running qgroup multiple refs test [ 8.354982] BTRFS: selftest: running free space tree tests [ 8.372175] BTRFS: selftest: sectorsize: 4096 nodesize: 8192 [ 8.373539] BTRFS: selftest: running btrfs free space cache tests [ 8.374989] BTRFS: selftest: running extent only tests [ 8.376236] BTRFS: selftest: running bitmap only tests [ 8.377483] BTRFS: selftest: running bitmap and extent tests [ 8.378854] BTRFS: selftest: running space stealing from bitmap to extent tests ... Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	752dbe48e2	btrfs: tests: drop messages when some tests finish The messages like 'extent I/O tests finished' are redundant, if the test fails it's quite obvious in the log and hang is also noticeable. No other then extent_io and free space tree tests print that so make it consistent. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	3173fd926c	btrfs: tests: fix comments about tested extent map ranges Comments about ranges did not match the code, the correct calculation is to use start and start+len as the interval boundaries. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:28 +02:00
David Sterba	43f7cddc6e	btrfs: tests: use SZ_ constants everywhere There are a few unconverted constants that are not powers of two and haven't been converted. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	6c30474680	btrfs: tests: use standard error message after extent map allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	ccfada1f65	btrfs: tests: return error from all extent map test cases The way the extent map tests handle errors does not conform to the rest of the suite, where the first failure is reported and then it stops. Do the same now that we have the errors returned from all the functions. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	7c6f670052	btrfs: tests: return errors from extent map test case 4 Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	992dce7494	btrfs: tests: return errors from extent map test case 3 Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	e71f2e17e8	btrfs: tests: return errors from extent map test case 2 Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:27 +02:00
David Sterba	d7de4b0864	btrfs: tests: return errors from extent map test case 1 Replace asserts with error messages and return errors. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	488f673023	btrfs: tests: return errors from extent map tests The individual testcases for extent maps do not return an error on allocation failures. This is not a big problem as the allocation don't fail in general but there are functional tests handled with ASSERTS. This makes tests dependent on them and it's not reliable. This patch adds the allocation failure handling and allows for the conversion of the asserts to proper error handling and reporting. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	7b9586bc2b	btrfs: tests: properly initialize fs_info of extent buffer The fs_info is supposed to be valid, even though it's not used right now and the test does not crash. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	3199366da7	btrfs: tests: use standard error message after block group allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	6a060db85d	btrfs: tests: use standard error message after inode allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	770e0cc040	btrfs: tests: use standard error message after path allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:26 +02:00
David Sterba	9e3d9f8462	btrfs: tests: use standard error message after extent buffer allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	52ab7bca35	btrfs: tests: use standard error message after root allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	37b2a7bc1e	btrfs: tests: use standard error message after fs_info allocation failure Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	703de4266f	btrfs: tests: add table of most common errors Allocation of main objects like fs_info or extent buffers is in each test so let's simplify and unify the error messages to a table and add a convenience helper. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	efd31fce54	btrfs: tests: print file:line for error messages For better diagnostics print the file name and line to locate the errors. Sample output: [ 9.052924] BTRFS: selftest: fs/btrfs/tests/extent-io-tests.c:283 offset bits do not match Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	d33d105b85	btrfs: tests: don't leak fs_info in extent_io bitmap tests The fs_info is not freed at the end of the function and leaks. The function is called twice so there can be up to 2x sizeof(struct btrfs_fs_info) of leaked memory. Fortunatelly this affects only testing builds, the size could be 16k with several debugging features enabled. Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:25 +02:00
David Sterba	d46a05edac	btrfs: tests: handle fs_info allocation failure in extent_io tests Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Qu Wenruo	75391f0d41	btrfs: disk-io: Show the timing of corrupted tree block explicitly Just add one extra line to show when the corruption is detected. Currently only read time detection is possible. The planned distinguish line would be: read time: <detailed report> block=XXXXX read time tree block corruption detected write time: <detailed report> block=XXXXX write time tree block corruption detected Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Josef Bacik	ff612ba784	btrfs: fix panic during relocation after ENOSPC before writeback happens We've been seeing the following sporadically throughout our fleet panic: kernel BUG at fs/btrfs/relocation.c:4584! netversion: 5.0-0 Backtrace: #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a #4 [ffffc90003adb9c0] do_trap at ffffffff81019114 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0 #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b [exception RIP: btrfs_reloc_cow_block+692] RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246 RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690 RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0 RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000 R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0 R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3 #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c The way relocation moves data extents is by creating a reloc inode and preallocating extents in this inode and then copying the data into these preallocated extents. Once we've done this for all of our extents, we'll write out these dirty pages, which marks the extent written, and goes into btrfs_reloc_cow_block(). From here we get our current reloc_control, which _should_ match the reloc_control for the current block group we're relocating. However if we get an ENOSPC in this path at some point we'll bail out, never initiating writeback on this inode. Not a huge deal, unless we happen to be doing relocation on a different block group, and this block group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in btrfs_reloc_cow_block(), because we expect to be done modifying the data inode. We are in fact done modifying the metadata for the data inode we're currently using, but not the one from the failed block group, and thus we BUG_ON(). (This happens when writeback finishes for extents from the previous group, when we are at btrfs_finish_ordered_io() which updates the data reloc tree (inode item, drops/adds extent items, etc).) Fix this by writing out the reloc data inode always, and then breaking out of the loop after that point to keep from tripping this BUG_ON() later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Nikolay Borisov	6a8d2136ca	btrfs: Use less confusing condition for uptodate parameter to btrfs_writepage_endio_finish_ordered The uptodate parameter of btrfs_writepage_endio_finish_ordered is used to signal whether an error has occured while writing the given page. 0 signals an error, which is propagated to callees and 1 signifies success. In end_compressed_bio_write the ->bi_status is checked and based on it either BLK_STS_OK (0) or BLK_STS_NOTSUPP (1) are used. While from functional point of view this is ok it's a for the poor reader of the code, since the block layer values are conflated with the semantics of the parameter. Just use plain 0 or 1. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Qu Wenruo	a2a72fbd11	btrfs: extent_io: Handle errors better in extent_writepages() We can only get <=0 from extent_write_cache_pages, add an ASSERT() for it just in case. Then instead of submitting the write bio even if we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Qu Wenruo	2e3c25136a	btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io() This function needs some extra checks on locked pages and eb. For error handling we need to unlock locked pages and the eb. There is a rare >0 return value branch, where all pages get locked while write bio is not flushed. Thankfully it's handled by the only caller, btree_write_cache_pages(), as later write_one_eb() call will trigger submit_one_bio(). So there shouldn't be any problem. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:24 +02:00
Qu Wenruo	02c6db4f73	btrfs: extent_io: Handle errors better in extent_write_locked_range() We can only get @ret <= 0. Add an ASSERT() for it just in case. Then, instead of submitting the write bio even we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	e06808be8a	btrfs: extent_io: Kill dead condition in extent_write_cache_pages() Since __extent_writepage() will no longer return >0 value, (ret == AOP_WRITEPAGE_ACTIVATE) will never be true. Kill that dead branch. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	2b952eea81	btrfs: extent_io: Handle errors better in btree_write_cache_pages() In btree_write_cache_pages(), we can only get @ret <= 0. Add an ASSERT() for it just in case. Then instead of submitting the write bio even we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	3065976b04	btrfs: extent_io: Handle errors better in extent_write_full_page() Since now flush_write_bio() could return error, kill the BUG_ON() first. Then don't call flush_write_bio() unconditionally, instead we check the return value from __extent_writepage() first. If __extent_writepage() fails, we do cleanup, and return error without submitting the possible corrupted or half-baked bio. If __extent_writepage() successes, then we call flush_write_bio() and return the result. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	f4340622e0	btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up We have a BUG_ON() in flush_write_bio() to handle the return value of submit_one_bio(). Move the BUG_ON() one level up to all its callers. This patch will introduce temporary variable, @flush_ret to keep code change minimal in this patch. That variable will be cleaned up when enhancing the error handling later. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	63489055e4	btrfs: Always output error message when key/level verification fails We have internal report of strange transaction abort due to EUCLEAN without any error message. Since error message inside verify_level_key() is only enabled for CONFIG_BTRFS_DEBUG, the error message won't be printed on most builds. This patch will make the error message mandatory, so when problem happens we know what's causing the problem. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:23 +02:00
Qu Wenruo	448de471cd	btrfs: Check the first key and level for cached extent buffer [BUG] When reading a file from a fuzzed image, kernel can panic like: BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1 assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3500! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs] Call Trace: btrfs_lookup_csum+0x52/0x150 [btrfs] __btrfs_lookup_bio_sums+0x209/0x640 [btrfs] btrfs_submit_bio_hook+0x103/0x170 [btrfs] submit_one_bio+0x59/0x80 [btrfs] extent_read_full_page+0x58/0x80 [btrfs] generic_file_read_iter+0x2f6/0x9d0 __vfs_read+0x14d/0x1a0 vfs_read+0x8d/0x140 ksys_read+0x52/0xc0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] The fuzzed image has a corrupted leaf whose first key doesn't match its parent: checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c ... key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19 leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE leaf 29761536 flags 0x1(WRITTEN) backref revision 1 fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244 range start 8798638964736 end 8798641262592 length 2297856 When reading the above tree block, we have extent_buffer->refs = 2 in the context: - initial one from __alloc_extent_buffer() alloc_extent_buffer() \|- __alloc_extent_buffer() \|- atomic_set(&eb->refs, 1) - one being added to fs_info->buffer_radix alloc_extent_buffer() \|- check_buffer_tree_ref() \|- atomic_inc(&eb->refs) So if even we call free_extent_buffer() in read_tree_block or other similar situation, we only decrease the refs by 1, it doesn't reach 0 and won't be freed right now. The staled eb and its corrupted content will still be kept cached. Furthermore, we have several extra cases where we either don't do first key check or the check is not proper for all callers: - scrub We just don't have first key in this context. - shared tree block One tree block can be shared by several snapshot/subvolume trees. In that case, the first key check for one subvolume doesn't apply to another. So for the above reasons, a corrupted extent buffer can sneak into the buffer cache. [FIX] Call verify_level_key in read_block_for_search to do another verification. For that purpose the function is exported. Due to above reasons, although we can free corrupted extent buffer from cache, we still need the check in read_block_for_search(), for scrub and shared tree blocks. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769 Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
Nikolay Borisov	537f38f019	btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails If a an eb fails to be read for whatever reason - it's corrupted on disk and parent transid/key validations fail or IO for eb pages fail then this buffer must be removed from the buffer cache. Currently the code calls free_extent_buffer if an error occurs. Unfortunately this doesn't achieve the desired behavior since btrfs_find_create_tree_block returns with eb->refs == 2. On the other hand free_extent_buffer will only decrement the refs once leaving it added to the buffer cache radix tree. This enables later code to look up the buffer from the cache and utilize it potentially leading to a crash. The correct way to free the buffer is call free_extent_buffer_stale. This function will correctly call atomic_dec explicitly for the buffer and subsequently call release_extent_buffer which will decrement the final reference thus correctly remove the invalid buffer from buffer cache. This change affects only newly allocated buffers since they have eb->refs == 2. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Reported-by: Jungyeon <jungyeon@gatech.edu> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
Qu Wenruo	80fbc341dc	btrfs: Make btrfs_(set\|clear)_header_flag return void From the introduction of btrfs_(set\|clear)_header_flag, there is no usage of its return value. So just make it return void. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
Qu Wenruo	10995c0491	btrfs: reloc: Fix NULL pointer dereference due to expanded reloc_root lifespan Commit `d2311e6985` ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()") expands the life span of root->reloc_root. This breaks certain checs of fs_info->reloc_ctl. Before that commit, if we have a root with valid reloc_root, then it's ensured to have fs_info->reloc_ctl. But now since reloc_root doesn't always mean a valid fs_info->reloc_ctl, such check is unreliable and can cause the following NULL pointer dereference: BUG: unable to handle kernel NULL pointer dereference at 00000000000005c1 IP: btrfs_reloc_pre_snapshot+0x20/0x50 [btrfs] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 10379 Comm: snapperd Not tainted Call Trace: create_pending_snapshot+0xd7/0xfc0 [btrfs] create_pending_snapshots+0x8e/0xb0 [btrfs] btrfs_commit_transaction+0x2ac/0x8f0 [btrfs] btrfs_mksubvol+0x561/0x570 [btrfs] btrfs_ioctl_snap_create_transid+0x189/0x190 [btrfs] btrfs_ioctl_snap_create_v2+0x102/0x150 [btrfs] btrfs_ioctl+0x5c9/0x1e60 [btrfs] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x7b/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fd7cdab8467 Fix it by explicitly checking fs_info->reloc_ctl other than using the implied root->reloc_root. Fixes: `d2311e6985` ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
Nikolay Borisov	d51f51bb6f	btrfs: Remove unused -EIO assignment in end_bio_extent_readpage In case we hit the error case for a metadata buffer in end_bio_extent_readpage then 'ret' won't really be checked before it's written again to. This means the -EIO in this case will never be checked, just remove it. Fixes-coverity-id: 1442513 Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
Nikolay Borisov	e65ef21ed8	btrfs: Exploit the fact that pages passed to extent_readpages are always contiguous Currently extent_readpages (called from btrfs_readpages) will always call __extent_readpages which tries to create contiguous range of pages and call __do_contiguous_readpages when such contiguous range is created. It turns out this is unnecessary due to the fact that generic MM code always calls filesystem's ->readpages callback (btrfs_readpages in this case) with already contiguous pages. Armed with this knowledge it's possible to simplify extent_readpages by eliminating the call to __extent_readpages and directly calling contiguous_readpages. The only edge case that needs to be handled is when add_to_page_cache_lru fails. This is easy as all that is needed is to submit whatever is the number of pages successfully added to the lru. This can happen when the page is already in the range, so it does not need to be read again, and we can't do anything else in case of other errors. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:22 +02:00
David Sterba	ed1b4ed79d	btrfs: switch extent_buffer::lock_nested to bool The member is tracking simple status of the lock, we can use bool for that and make some room for further space reduction in the structure. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:21 +02:00
David Sterba	c79adfc085	btrfs: use assertion helpers for extent buffer write lock counters Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::write_locks become unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-29 19:02:21 +02:00

1 2 3 4 5 ...

58052 Commits