Structure tree_entry provides a very simple rb_tree which only uses
bytenr as search index.
That tree_entry is used in 3 structures: backref_node, mapping_node and
tree_block.
Since we're going to make backref_node independnt from relocation, it's
a good time to extract the tree_entry into rb_simple_node, and export it
into misc.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These 3 structures are the main part of btrfs backref cache, move them
to backref.h to build the basis for later reuse.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Those three structures are the main elements of backref cache. Add the
"btrfs_" prefix for later export.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will also add some comment for the cleanup.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After handle_one_tree_backref(), all newly added (not cached) edges and
nodes have the following features:
- Only backref_edge::list[LOWER] is linked.
This means, we can only iterate from botton to top, not the other
direction.
- Newly added nodes are not added to cache rb_tree yet
So to finish the backref cache, we still need to finish the links and
add all nodes into backref cache rb_tree.
This patch will refactor the existing code into finish_upper_links(),
add more comments of each branch, and why we need to do all the work.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
build_backref_tree() uses "goto again;" to implement a breadth-first
search to build backref cache.
This patch will extract most of its work into a wrapper,
handle_one_tree_block(), and use a do {} while() loop to implement the
same thing.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Bytenr and level are essential parameters for backref_node, thus it
makes sense to initialize them at allocation time.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since backref_edge is used to connect upper and lower backref nodes, and
needs to access both nodes, some code can look pretty nasty:
list_add_tail(&edge->list[LOWER], &cur->upper);
The above code will link @cur to the LOWER side of the edge, while both
"LOWER" and "upper" words show up. This can sometimes be very confusing
for reader to grasp.
This patch introduces a new wrapper, link_backref_edge(), to handle the
linking behavior. Which also has extra ASSERT() to ensure caller won't
pass wrong nodes.
Also, this updates the comment of related lists of backref_node and
backref_edge, to make it more clear that each list points to what.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The processing of indirect tree backref (TREE_BLOCK_REF) is the most
complex work.
We need to grab the fs root, do a tree search to locate all its parent
nodes, link all needed edges, and put all uncached edges to pending edge
list.
This is definitely worth a helper function.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For BTRFS_SHARED_BLOCK_REF_KEY, its processing is straightforward, as we
now the parent node bytenr directly.
If the parent is already cached, or a root, call it a day.
If the parent is not cached, add it pending list.
This patch will just refactor this part into its own function,
handle_direct_tree_backref() and add some comment explaining the
@ref_key parameter.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
find_reloc_root() searches reloc_control::reloc_root_tree to find the
reloc root. This behavior is only useful for relocation backref cache.
For the incoming more generic purpose backref cache, we don't care
about who owns the reloc root, but only care if it's a reloc root.
So this patch makes the following modifications to make the reloc root
search more specific to relocation backref:
- Add backref_node::is_reloc_root
This will be an extra indicator for generic purposed backref cache.
User doesn't need to read root key from backref_node::root to
determine if it's a reloc root.
Also for reloc tree root, it's useless and will be queued to useless
list.
- Add backref_cache::is_reloc
This will allow backref cache code to do different behavior for
generic purpose backref cache and relocation backref cache.
- Pass fs_info to find_reloc_root()
- Export find_reloc_root()
So backref.c can utilize this function.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add this member so that we can grab fs_info without the help from
reloc_control.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These two new members will act the same as the existing local lists,
@useless and @list in build_backref_tree().
Currently build_backref_tree() is only executed serially, thus moving
such local list into backref_cache is still safe.
Also since we're here, use list_first_entry() to replace a lot of
list_entry() calls after !list_empty().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These two functions are weirdly named, mark_block_processed() in fact
just marks a range dirty unconditionally, while __mark_block_processed()
does extra check before doing the marking.
This patch will open code old mark_block_processed, and rename
__mark_block_processed() to remove the "__" prefix.
Since we're here, also kill the forward declaration, which could also
kill in_block_group() with in_range() macro.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the core function of relocation, build_backref_tree, it needs to
iterate all backref items of one tree block.
Use btrfs_backref_iter infrastructure to do the loop and make the code
more readable.
The backref items look would be much more easier to read:
ret = btrfs_backref_iter_start(iter, cur->bytenr);
for (; ret == 0; ret = btrfs_backref_iter_next(iter)) {
/* The really important work */
}
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_recover_relocation() invokes btrfs_join_transaction(), which joins
a btrfs_trans_handle object into transactions and returns a reference of
it with increased refcount to "trans".
When btrfs_recover_relocation() returns, "trans" becomes invalid, so the
refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
btrfs_recover_relocation(). When read_fs_root() failed, the refcnt
increased by btrfs_join_transaction() is not decreased, causing a refcnt
leak.
Fix this issue by calling btrfs_end_transaction() on this error path
when read_fs_root() failed.
Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I made a mistake with my previous fix, I assumed that we didn't need to
mess with the reloc roots once we were out of the part of relocation where
we are actually moving the extents.
The subtle thing that I missed is that btrfs_init_reloc_root() also
updates the last_trans for the reloc root when we do
btrfs_record_root_in_trans() for the corresponding fs_root. I've added a
comment to make sure future me doesn't make this mistake again.
This showed up as a WARN_ON() in btrfs_copy_root() because our
last_trans didn't == the current transid. This could happen if we
snapshotted a fs root with a reloc root after we set
rc->create_reloc_tree = 0, but before we actually merge the reloc root.
Worth mentioning that the regression produced the following warning
when running snapshot creation and balance in parallel:
BTRFS info (device sdc): relocating block group 30408704 flags metadata|dup
------------[ cut here ]------------
WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191 btrfs_copy_root+0x26f/0x430 [btrfs]
CPU: 0 PID: 12823 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs]
RSP: 0018:ffffb96e044279b8 EFLAGS: 00010202
RAX: 0000000000000009 RBX: ffff9da70bf61000 RCX: ffffb96e04427a48
RDX: ffff9da733a770c8 RSI: ffff9da70bf61000 RDI: ffff9da694163818
RBP: ffff9da733a770c8 R08: fffffffffffffff8 R09: 0000000000000002
R10: ffffb96e044279a0 R11: 0000000000000000 R12: ffff9da694163818
R13: fffffffffffffff8 R14: ffff9da6d2512000 R15: ffff9da714cdac00
FS: 00007fdeacf328c0(0000) GS:ffff9da735e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a2a5b8a118 CR3: 00000001eed78002 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? create_reloc_root+0x49/0x2b0 [btrfs]
? kmem_cache_alloc_trace+0xe5/0x200
create_reloc_root+0x8b/0x2b0 [btrfs]
btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs]
create_pending_snapshot+0x610/0x1010 [btrfs]
create_pending_snapshots+0xa8/0xd0 [btrfs]
btrfs_commit_transaction+0x4c7/0xc50 [btrfs]
? btrfs_mksubvol+0x3cd/0x560 [btrfs]
btrfs_mksubvol+0x455/0x560 [btrfs]
__btrfs_ioctl_snap_create+0x15f/0x190 [btrfs]
btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs]
? mem_cgroup_commit_charge+0x6e/0x540
btrfs_ioctl+0x12d8/0x3760 [btrfs]
? do_raw_spin_unlock+0x49/0xc0
? _raw_spin_unlock+0x29/0x40
? __handle_mm_fault+0x11b3/0x14b0
? ksys_ioctl+0x92/0xb0
ksys_ioctl+0x92/0xb0
? trace_hardirqs_off_thunk+0x1a/0x1c
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5c/0x280
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fdeabd3bdd7
Fixes: 2abc726ab4 ("btrfs: do not init a reloc root if we aren't relocating")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Previously we would set the reloc root's last snapshot to transid - 1.
However there was a problem with doing this, and we changed it to
setting the last snapshot to the generation of the commit node of the fs
root.
This however broke should_ignore_root(). The assumption is that if we
are in a generation newer than when the reloc root was created, then we
would find the reloc root through normal backref lookups, and thus can
ignore any fs roots we find with an old enough reloc root.
Now that the last snapshot could be considerably further in the past
than before, we'd end up incorrectly ignoring an fs root. Thus we'd
find no nodes for the bytenr we were searching for, and we'd fail to
relocate anything. We'd loop through the relocate code again and see
that there were still used space in that block group, attempt to
relocate those bytenr's again, fail in the same way, and just loop like
this forever. This is tricky in that we have to not modify the fs root
at all during this time, so we need to have a block group that has data
in this fs root that is not shared by any other root, which is why this
has been difficult to reproduce.
Fixes: 054570a1dc ("Btrfs: fix relocation incorrectly dropping data references")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We always search the commit root of the extent tree for looking up back
references, however we track the reloc roots based on their current
bytenr.
This is wrong, if we commit the transaction between relocating tree
blocks we could end up in this code in build_backref_tree
if (key.objectid == key.offset) {
/*
* Only root blocks of reloc trees use backref
* pointing to itself.
*/
root = find_reloc_root(rc, cur->bytenr);
ASSERT(root);
cur->root = root;
break;
}
find_reloc_root() is looking based on the bytenr we had in the commit
root, but if we've COWed this reloc root we will not find that bytenr,
and we will trip over the ASSERT(root).
Fix this by using the commit_root->start bytenr for indexing the commit
root. Then we change the __update_reloc_root() caller to be used when
we switch the commit root for the reloc root during commit.
This fixes the panic I was seeing when we started throttling relocation
for delayed refs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two bugs here, but fixing them independently would just result
in pain if you happened to bisect between the two patches.
First is how we handle the -EAGAIN from relocate_tree_block(). We don't
set error, unless we happen to be the first node, which makes no sense,
I have no idea what the code was trying to accomplish here.
We in fact _do_ want err set here so that we know we need to restart in
relocate_block_group(). Also we need finish_pending_nodes() to not
actually call link_to_upper(), because we didn't actually relocate the
block.
And then if we do get -EAGAIN we do not want to set our backref cache
last_trans to the one before ours. This would force us to update our
backref cache if we didn't cross transaction ids, which would mean we'd
have some nodes updated to their new_bytenr, but still able to find
their old bytenr because we're searching the same commit root as the
last time we went through relocate_tree_blocks.
Fixing these two things keeps us from panicing when we start breaking
out of relocate_tree_blocks() either for delayed ref flushing or enospc.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we're not only checking for metadata reservations but also if we
need to throttle our delayed ref generation, reorder
reserve_metadata_space() above the select_one_root() call in
relocate_tree_block().
The reason we want this is because select_reloc_root() will mess with
the backref cache, and if we're going to bail we want to be able to
cleanly remove this node from the backref cache and come back along to
regenerate it. Move it up so this is the first thing we do to make
restarting cleaner.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Here we are just searching down to the bytenr we're building the backref
tree for, and all of it's paths to the roots. These bytenrs are not
guaranteed to be anywhere near each other, so readahead just generates
extra latency.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a few different ways to free roots, either you allocated them
yourself and you just do
free_extent_buffer(root->node);
free_extent_buffer(root->commit_node);
btrfs_put_root(root);
Which is the pattern for log roots. Or for snapshots/subvolumes that
are being dropped you simply call btrfs_free_fs_root() which does all
the cleanup for you.
Unify this all into btrfs_put_root(), so that we don't free up things
associated with the root until the last reference is dropped. This
makes the root freeing code much more significant.
The only caveat is at close_ctree() time we have to free the extent
buffers for all of our main roots (extent_root, chunk_root, etc) because
we have to drop the btree_inode and we'll run into issues if we hold
onto those nodes until ->kill_sb() time. This will be addressed in the
future when we kill the btree_inode.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This was pretty subtle, we default to reloc roots having 0 root refs, so
if we crash in the middle of the relocation they can just be deleted.
If we successfully complete the relocation operations we'll set our root
refs to 1 in prepare_to_merge() and then go on to merge_reloc_roots().
At prepare_to_merge() time if any of the reloc roots have a 0 reference
still, we will remove that reloc root from our reloc root rb tree, and
then clean it up later.
However this only happens if we successfully start a transaction. If
we've aborted previously we will skip this step completely, and only
have reloc roots with a reference count of 0, but were never properly
removed from the reloc control's rb tree.
This isn't a problem per-se, our references are held by the list the
reloc roots are on, and by the original root the reloc root belongs to.
If we end up in this situation all the reloc roots will be added to the
dirty_reloc_list, and then properly dropped at that point. The reloc
control will be free'd and the rb tree is no longer used.
There were two options when fixing this, one was to remove the BUG_ON(),
the other was to make prepare_to_merge() handle the case where we
couldn't start a trans handle.
IMO this is the cleaner solution. I started with handling the error in
prepare_to_merge(), but it turned out super ugly. And in the end this
BUG_ON() simply doesn't matter, the cleanup was happening properly, we
were just panicing because this BUG_ON() only matters in the success
case. So I've opted to just remove it and add a comment where it was.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We previously were relying on root->reloc_root to be cleaned up by the
drop snapshot, or the error handling. However if btrfs_drop_snapshot()
failed it wouldn't drop the ref for the root. Also we sort of depend on
the right thing to happen with moving reloc roots between lists and the
fs root they belong to, which makes it hard to figure out who owns the
reference.
Fix this by explicitly holding a reference on the reloc root for
roo->reloc_root. This means that we hold two references on reloc roots,
one for whichever reloc_roots list it's attached to, and the
root->reloc_root we're on.
This makes it easier to reason out who owns a reference on the root, and
when it needs to be dropped.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The DEAD_RELOC_TREE flag is in place in order to avoid a use after free
in init_reloc_root, tracking the presence of reloc_root. However adding
the explicit tree references in previous patches makes the use after
free impossible because at this point we no longer have a reloc_control
set on the fs_info and thus cannot enter the function.
So move this to be coupled with clearing the root->reloc_root so we're
consistent with all other operations of the reloc root.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
If we have an error while processing the reloc roots we could leak roots
that were added to rc->reloc_roots before we hit the error. We could
have also not removed the reloc tree mapping from our rb_tree, so clean
up any remaining nodes in the reloc root rb_tree.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ use rbtree_postorder_for_each_entry_safe ]
Signed-off-by: David Sterba <dsterba@suse.com>
We previously were checking if the root had a dead root before accessing
root->reloc_root in order to avoid a use-after-free type bug. However
this scenario happens after we've unset the reloc control, so we would
have been saved if we'd simply checked for fs_info->reloc_control. At
this point during relocation we no longer need to be creating new reloc
roots, so simply move this check above the reloc_root checks to avoid
any future races and confusion.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we do merge_reloc_roots() we could insert a few roots onto the dirty
subvol roots list, where we hold a ref on them. If we fail to start the
transaction we need to run clean_dirty_subvols() in order to cleanup the
refs.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we fail to load an fs root, or fail to start a transaction we can
bail without unsetting the reloc control, which leads to problems later
when we free the reloc control but still have it attached to the file
system.
In the normal path we'll end up calling unset_reloc_control() twice, but
all it does is set fs_info->reloc_control = NULL, and we can only have
one balance at a time so it's not racey.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we have an error while building the backref tree in relocation we'll
process all the pending edges and then free the node. However if we
integrated some edges into the cache we'll lose our link to those edges
by simply freeing this node, which means we'll leak memory and
references to any roots that we've found.
Instead we need to use remove_backref_node(), which walks through all of
the edges that are still linked to this node and free's them up and
drops any root references we may be holding.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In relocation, we need to locate all parent tree leaves referring to one
data extent, thus we have a complex mechanism to iterate throught extent
tree and subvolume trees to locate the related leaves.
However this is already done in backref.c, we have
btrfs_find_all_leafs(), which can return a ulist containing all leaves
referring to that data extent.
Use btrfs_find_all_leafs() to replace find_data_references().
There is a special handling for v1 space cache data extents, where we
need to delete the v1 space cache data extents, to avoid those data
extents to hang the data relocation.
In this patch, the special handling is done by re-iterating the root
tree leaf. Although it's a little less efficient than the old handling,
considering we can reuse a lot of code, it should be acceptable.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's no longer used following 30d40577e3 ("btrfs: reloc: Also queue
orphan reloc tree for cleanup to avoid BUG_ON()"), so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently the non-prefixed version is a simple wrapper used to hide
the 4th argument of the prefixed version. This doesn't bring much value
in practice and only makes the code harder to follow by adding another
level of indirection. Rectify this by removing the __ prefix and
have only one public function to release bytes from a block reservation.
No semantic changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When relocating data block groups with tons of small extents, or large
metadata block groups, there can be over 200,000 extents.
We will iterate all extents of such block group in relocate_block_group(),
where iteration itself can be kinda time-consuming.
So when user want to cancel the balance, the extent iteration loop can
be another target.
This patch will add the cancelling check in the extent iteration loop of
relocate_block_group() to make balance cancelling faster.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When relocating a data extents with large large data extents, we spend
most of our time in relocate_file_extent_cluster() at stage "moving data
extents":
1) | btrfs_relocate_block_group [btrfs]() {
1) | relocate_file_extent_cluster [btrfs]() {
1) $ 6586769 us | }
1) + 18.260 us | relocate_file_extent_cluster [btrfs]();
1) + 15.770 us | relocate_file_extent_cluster [btrfs]();
1) $ 8916340 us | }
1) | btrfs_relocate_block_group [btrfs]() {
1) | relocate_file_extent_cluster [btrfs]() {
1) $ 11611586 us | }
1) + 16.930 us | relocate_file_extent_cluster [btrfs]();
1) + 15.870 us | relocate_file_extent_cluster [btrfs]();
1) $ 14986130 us | }
To make data relocation cancelling quicker, add extra balance cancelling
check after each page read in relocate_file_extent_cluster().
Cleanup and error handling uses the same mechanism as if the whole
process finished
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new error injection point, should_cancel_balance().
It's just a wrapper of atomic_read(&fs_info->balance_cancel_req), but
allows us to override the return value.
Currently there are only one locations using this function:
- btrfs_balance()
It checks cancel before each block group.
There are other locations checking fs_info->balance_cancel_req, but they
are not used as an indicator to exit, so there is no need to use the
wrapper.
But there will be more locations coming, and some locations can cause
kernel panic if not handled properly. So introduce this error injection
to provide better test interface.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
relocate_tree_blocks calls get_tree_block_key for a block iff that block
has its ->key_ready equal false. Thus the BUG_ON in the latter function
cannot ever be triggered so remove it.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is only used in read_fs_root(), which is just a wrapper of
btrfs_get_fs_root().
For all the mentioned essential roots except log root tree,
btrfs_get_fs_root() has its own quick path to grab them from fs_info
directly, thus no need for key.offset modification.
For subvolume trees, btrfs_get_fs_root() with key.offset == -1 is
completely fine.
For log trees and log root tree, it's impossible to hit them, as for
relocation all backrefs are fetched from commit root, which never
records log tree blocks.
Log tree blocks either get freed in regular transaction commit, or
replayed at mount time. At runtime we should never hit an backref for
log tree in extent tree.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are now using these for all roots, rename them to btrfs_put_root()
and btrfs_grab_root();
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers of btrfs_get_fs_root are subsequently calling
btrfs_grab_fs_root and handling dropping the ref when they are done
appropriately, go ahead and push btrfs_grab_fs_root up into
btrfs_get_fs_root.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All of relocation uses read_fs_root to lookup fs roots, so push the
btrfs_grab_fs_root() up into that helper and remove the individual
calls.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We look up the fs root in various places in here when recovering from a
crashed relcoation. Make sure we hold a ref on the root whenever we
look them up.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're creating a reloc inode in the data reloc tree, we need to hold a
ref on the root while we're doing that.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're looking up the data references for the bytenr in a root, we need
to hold a ref on that root while we're doing that.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>