bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global state and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction at once, and if that fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
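As a rough illustration of the commit message above, the following is a minimal sketch of a flat, fixed-size write buffer whose insert can fail when full, with the caller flushing and retrying. Every name here (`toy_write_buffer`, `toy_wb_insert`, `toy_wb_flush`, `toy_commit`, `WB_CAPACITY`) is hypothetical and not part of bcachefs; the real code also deals with journal pins, sorting and btree locking, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WB_CAPACITY 4	/* hypothetical; the commit makes the real size an option */

struct toy_key {
	uint64_t pos;
	uint64_t val;
};

struct toy_write_buffer {
	struct toy_key	keys[WB_CAPACITY];	/* flat fixed-size array */
	size_t		nr;			/* entries currently buffered */
	size_t		flushed;		/* total entries handed to the "btree" */
};

/* Append a key without reading the target btree; fails if there is no room. */
static int toy_wb_insert(struct toy_write_buffer *wb, struct toy_key k)
{
	if (wb->nr == WB_CAPACITY)
		return -1;
	wb->keys[wb->nr++] = k;
	return 0;
}

/* Stand-in for applying the buffered keys to the btree. */
static void toy_wb_flush(struct toy_write_buffer *wb)
{
	wb->flushed += wb->nr;
	wb->nr = 0;
}

/* Commit path: on a full buffer, flush in the error path and retry. */
static void toy_commit(struct toy_write_buffer *wb, struct toy_key k)
{
	while (toy_wb_insert(wb, k))
		toy_wb_flush(wb);
}
```

Committing six keys through `toy_commit()` with a capacity of four triggers exactly one flush and leaves two keys buffered.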
// SPDX-License-Identifier: GPL-2.0

#include "bcachefs.h"
#include "btree_locking.h"
#include "btree_update.h"
#include "btree_update_interior.h"
#include "btree_write_buffer.h"
#include "error.h"
#include "journal.h"
#include "journal_io.h"
#include "journal_reclaim.h"

#include <linux/prefetch.h>
#include <linux/sort.h>

static int bch2_btree_write_buffer_journal_flush(struct journal *,
				struct journal_entry_pin *, u64);

static int bch2_journal_keys_to_write_buffer(struct bch_fs *, struct journal_buf *);

static inline bool __wb_key_ref_cmp(const struct wb_key_ref *l, const struct wb_key_ref *r)
{
	return (cmp_int(l->hi, r->hi) ?:
		cmp_int(l->mi, r->mi) ?:
		cmp_int(l->lo, r->lo)) >= 0;
}

static inline bool wb_key_ref_cmp(const struct wb_key_ref *l, const struct wb_key_ref *r)
{
#ifdef CONFIG_X86_64
	int cmp;

	asm("mov (%[l]), %%rax;"
	    "sub (%[r]), %%rax;"
	    "mov 8(%[l]), %%rax;"
	    "sbb 8(%[r]), %%rax;"
	    "mov 16(%[l]), %%rax;"
	    "sbb 16(%[r]), %%rax;"
	    : "=@ccae" (cmp)
	    : [l] "r" (l), [r] "r" (r)
	    : "rax", "cc");

	EBUG_ON(cmp != __wb_key_ref_cmp(l, r));
	return cmp;
#else
	return __wb_key_ref_cmp(l, r);
#endif
}
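
The x86-64 path above appears to treat the three 64-bit words as one 192-bit unsigned integer: `sub` followed by two `sbb` instructions propagates the borrow from the lowest word to the highest, and the `ae` condition (carry clear) then means `l >= r`. The portable sketch below emulates that borrow chain and checks it against a plain lexicographic compare. The `toy_key_ref` layout (`lo` at offset 0, `mi` at 8, `hi` at 16) is an assumption inferred from the asm offsets, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, inferred from the asm offsets 0, 8 and 16. */
struct toy_key_ref {
	uint64_t lo, mi, hi;
};

/* Reference: compare most significant word first, return l >= r. */
static bool toy_ge_lexicographic(const struct toy_key_ref *l,
				 const struct toy_key_ref *r)
{
	if (l->hi != r->hi)
		return l->hi > r->hi;
	if (l->mi != r->mi)
		return l->mi > r->mi;
	return l->lo >= r->lo;
}

/* Emulate sub/sbb/sbb: only the borrow (carry flag) is tracked. */
static bool toy_ge_borrow_chain(const struct toy_key_ref *l,
				const struct toy_key_ref *r)
{
	bool borrow = l->lo < r->lo;				/* sub */

	borrow = l->mi < r->mi || (l->mi == r->mi && borrow);	/* sbb */
	borrow = l->hi < r->hi || (l->hi == r->hi && borrow);	/* sbb */

	return !borrow;						/* "ae": carry clear */
}
```

Only the final carry matters, which is why the asm can keep overwriting `rax` with each word's difference.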

static int wb_key_seq_cmp(const void *_l, const void *_r)
{
	const struct btree_write_buffered_key *l = _l;
	const struct btree_write_buffered_key *r = _r;

	return cmp_int(l->journal_seq, r->journal_seq);
}

/* Compare excluding idx, the low 24 bits: */
static inline bool wb_key_eq(const void *_l, const void *_r)
{
	const struct wb_key_ref *l = _l;
	const struct wb_key_ref *r = _r;

	return !((l->hi ^ r->hi)|
		 (l->mi ^ r->mi)|
		 ((l->lo >> 24) ^ (r->lo >> 24)));
}
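
To make the "excluding idx" comparison concrete, here is a small sketch that packs a 24-bit index into the low bits of the lowest word and compares two references while ignoring it. The packing helper, the `toy_` names and the field layout are assumptions for illustration; the actual `struct wb_key_ref` definition is not shown in this file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IDX_BITS	24
#define TOY_IDX_MASK	((UINT64_C(1) << TOY_IDX_BITS) - 1)

/* Assumed layout: three 64-bit words, idx in the low 24 bits of lo. */
struct toy_key_ref {
	uint64_t lo, mi, hi;
};

/* Hypothetical packing helper; the top 24 bits of lo_key_bits are dropped. */
static struct toy_key_ref toy_make_ref(uint64_t hi, uint64_t mi,
				       uint64_t lo_key_bits, uint32_t idx)
{
	struct toy_key_ref k = {
		.lo = (lo_key_bits << TOY_IDX_BITS) | (idx & TOY_IDX_MASK),
		.mi = mi,
		.hi = hi,
	};
	return k;
}

static uint32_t toy_ref_idx(const struct toy_key_ref *k)
{
	return (uint32_t)(k->lo & TOY_IDX_MASK);
}

/* Equal if every bit except the idx field matches. */
static bool toy_key_eq(const struct toy_key_ref *l, const struct toy_key_ref *r)
{
	return !((l->hi ^ r->hi) |
		 (l->mi ^ r->mi) |
		 ((l->lo >> TOY_IDX_BITS) ^ (r->lo >> TOY_IDX_BITS)));
}
```

Two references to the same key at different buffer positions compare equal here, which is what lets a sorted pass detect duplicates of the same key.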

static noinline void wb_sort(struct wb_key_ref *base, size_t num)
{
	size_t n = num, a = num / 2;

	if (!a)		/* num < 2 */
		return;

	for (;;) {
		size_t b, c, d;

		if (a)			/* Building heap: sift down --a */
			--a;
		else if (--n)		/* Sorting: Extract root to --n */
			swap(base[0], base[n]);
		else			/* Sort complete */
			break;

		/*
		 * Sift element at "a" down into heap. This is the
		 * "bottom-up" variant, which significantly reduces
		 * calls to wb_key_ref_cmp(): we find the sift-down path all
		 * the way to the leaves (one compare per level), then
		 * backtrack to find where to insert the target element.
		 *
		 * Because elements tend to sift down close to the leaves,
		 * this uses fewer compares than doing two per level
		 * on the way down. (A bit more than half as many on
		 * average, 3/4 worst-case.)
		 */
		for (b = a; c = 2*b + 1, (d = c + 1) < n;)
			b = wb_key_ref_cmp(base + c, base + d) ? c : d;
		if (d == n)		/* Special case last leaf with no sibling */
			b = c;

		/* Now backtrack from "b" to the correct location for "a" */
		while (b != a && wb_key_ref_cmp(base + a, base + b))
			b = (b - 1) / 2;
		c = b;			/* Where "a" belongs */
		while (b != a) {	/* Shift it into place */
			b = (b - 1) / 2;
			swap(base[b], base[c]);
		}
	}
}
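
The same bottom-up heapsort can be tried outside the kernel. The sketch below keeps the control flow of `wb_sort()` but sorts plain `uint64_t` values, with `>=` standing in for `wb_key_ref_cmp()`. `toy_heapsort` and `toy_swap` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void toy_swap(uint64_t *x, uint64_t *y)
{
	uint64_t t = *x;

	*x = *y;
	*y = t;
}

/* Bottom-up heapsort, ascending order; ">=" plays the role of the compare. */
static void toy_heapsort(uint64_t *base, size_t num)
{
	size_t n = num, a = num / 2;

	if (!a)				/* fewer than two elements */
		return;

	for (;;) {
		size_t b, c, d;

		if (a)			/* building the max-heap */
			--a;
		else if (--n)		/* move current maximum to the end */
			toy_swap(&base[0], &base[n]);
		else			/* sort complete */
			break;

		/* Follow the larger child all the way down to a leaf. */
		for (b = a; c = 2*b + 1, (d = c + 1) < n;)
			b = base[c] >= base[d] ? c : d;
		if (d == n)		/* last leaf has no sibling */
			b = c;

		/* Backtrack to where the element at "a" belongs. */
		while (b != a && base[a] >= base[b])
			b = (b - 1) / 2;
		c = b;
		while (b != a) {	/* rotate it into place */
			b = (b - 1) / 2;
			toy_swap(&base[b], &base[c]);
		}
	}
}
```

Duplicates are handled, but like any heapsort this is not a stable sort, so equal elements may change relative order.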

static noinline int wb_flush_one_slowpath(struct btree_trans *trans,
					  struct btree_iter *iter,
					  struct btree_write_buffered_key *wb)
{
	struct btree_path *path = btree_iter_path(trans, iter);

	bch2_btree_node_unlock_write(trans, path, path->l[0].b);

	trans->journal_res.seq = wb->journal_seq;

	return bch2_trans_update(trans, iter, &wb->k,
				 BTREE_UPDATE_internal_snapshot_node) ?:
		bch2_trans_commit(trans, NULL, NULL,
				  BCH_TRANS_COMMIT_no_enospc|
				  BCH_TRANS_COMMIT_no_check_rw|
				  BCH_TRANS_COMMIT_no_journal_res|
				  BCH_TRANS_COMMIT_journal_reclaim);
}

static inline int wb_flush_one(struct btree_trans *trans, struct btree_iter *iter,
			       struct btree_write_buffered_key *wb,
			       bool *write_locked, size_t *fast)
{
	struct btree_path *path;
	int ret;

	EBUG_ON(!wb->journal_seq);
	EBUG_ON(!trans->c->btree_write_buffer.flushing.pin.seq);
	EBUG_ON(trans->c->btree_write_buffer.flushing.pin.seq > wb->journal_seq);

	ret = bch2_btree_iter_traverse(iter);
	if (ret)
		return ret;

	/*
	 * We can't clone a path that has write locks: unshare it now, before
	 * set_pos and traverse():
	 */
	if (btree_iter_path(trans, iter)->ref > 1)
		iter->path = __bch2_btree_path_make_mut(trans, iter->path, true, _THIS_IP_);

	path = btree_iter_path(trans, iter);

	if (!*write_locked) {
		ret = bch2_btree_node_lock_write(trans, path, &path->l[0].b->c);
		if (ret)
			return ret;

		bch2_btree_node_prep_for_write(trans, path, path->l[0].b);
		*write_locked = true;
	}

	if (unlikely(!bch2_btree_node_insert_fits(path->l[0].b, wb->k.k.u64s))) {
		*write_locked = false;
		return wb_flush_one_slowpath(trans, iter, wb);
	}

	bch2_btree_insert_key_leaf(trans, path, &wb->k, wb->journal_seq);
	(*fast)++;
	return 0;
}

bcachefs: use prejournaled key updates for write buffer flushes
The write buffer mechanism journals keys twice in certain
situations. A key is always journaled on write buffer insertion, and
is potentially journaled again if a write buffer flush falls into
either of the slow btree insert paths. This has been shown to cause
journal recovery ordering problems in the event of an untimely
crash.
For example, consider if a key is inserted into index 0 of a write
buffer, the active write buffer switches to index 1, the key is
deleted in index 1, and then index 0 is flushed. If the original key
is rejournaled in the btree update from the index 0 flush, the (now
deleted) key is journaled in a seq buffer ahead of the latest
version of the key (which was journaled when the key was deleted in
index 1). If the fs crashes while this is still observable in the
log, recovery sees the key from the btree update after the delete
key from the write buffer insert, which is the incorrect order. This
problem is occasionally reproduced by generic/388 and generally
manifests as one or more backpointer entry inconsistencies.
To avoid this problem, never rejournal write buffered key updates to
the associated btree. Instead, use prejournaled key updates to pass
the journal seq of the write buffer insert down to the btree insert,
which updates the btree leaf pin to reflect the seq of the key.
Note that tracking the seq is required instead of just using
NOJOURNAL here because otherwise we lose protection of the write
buffer pin when the buffer is flushed, which means the key can fall
off the tail of the on-disk journal before the btree leaf is flushed
and lead to similar recovery inconsistencies.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-07-19 20:53:06 +08:00
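The ordering problem described above can be shown with a toy replay model, assuming recovery simply lets the entry with the highest journal seq win for a given key. `toy_journal_entry` and `toy_replay` are hypothetical and ignore everything else recovery does. The point is only that rejournaling the flushed key at a newer seq resurrects it after a delete, while never rejournaling it keeps the delete as the latest update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One journaled update to a single key. */
struct toy_journal_entry {
	uint64_t	seq;
	bool		deleted;	/* true if this update deletes the key */
};

/*
 * Toy recovery: the update with the highest seq wins.
 * Returns whether the key exists after replay.
 */
static bool toy_replay(const struct toy_journal_entry *e, size_t nr)
{
	uint64_t newest = 0;
	bool present = false;

	for (size_t i = 0; i < nr; i++)
		if (e[i].seq >= newest) {
			newest	= e[i].seq;
			present	= !e[i].deleted;
		}

	return present;
}
```

With an insert at seq 1 and a delete at seq 2, replay ends with the key absent. If the flush of the older buffer rejournals the insert at seq 3, replay ends with the key present, which is the inconsistency the commit avoids.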

/*
 * Update a btree with a write buffered key using the journal seq of the
 * original write buffer insert.
 *
 * It is not safe to rejournal the key once it has been inserted into the write
 * buffer because that may break recovery ordering. For example, the key may
 * have already been modified in the active write buffer in a seq that comes
 * before the current transaction. If we were to journal this key again and
 * crash, recovery would process updates in the wrong order.
 */
static int
btree_write_buffered_insert(struct btree_trans *trans,
			    struct btree_write_buffered_key *wb)
{
	struct btree_iter iter;
	int ret;

	bch2_trans_iter_init(trans, &iter, wb->btree, bkey_start_pos(&wb->k.k),
			     BTREE_ITER_cached|BTREE_ITER_intent);

	trans->journal_res.seq = wb->journal_seq;

	ret = bch2_btree_iter_traverse(&iter) ?:
		bch2_trans_update(trans, &iter, &wb->k,
				  BTREE_UPDATE_internal_snapshot_node);
	bch2_trans_iter_exit(trans, &iter);
	return ret;
}

static void move_keys_from_inc_to_flushing(struct btree_write_buffer *wb)
{
	struct bch_fs *c = container_of(wb, struct bch_fs, btree_write_buffer);
	struct journal *j = &c->journal;

	if (!wb->inc.keys.nr)
		return;

	bch2_journal_pin_add(j, wb->inc.keys.data[0].journal_seq, &wb->flushing.pin,
			     bch2_btree_write_buffer_journal_flush);

	darray_resize(&wb->flushing.keys, min_t(size_t, 1U << 20, wb->flushing.keys.nr + wb->inc.keys.nr));
	darray_resize(&wb->sorted, wb->flushing.keys.size);

	if (!wb->flushing.keys.nr && wb->sorted.size >= wb->inc.keys.nr) {
		swap(wb->flushing.keys, wb->inc.keys);
		goto out;
	}

	size_t nr = min(darray_room(wb->flushing.keys),
			wb->sorted.size - wb->flushing.keys.nr);
	nr = min(nr, wb->inc.keys.nr);

	memcpy(&darray_top(wb->flushing.keys),
	       wb->inc.keys.data,
	       sizeof(wb->inc.keys.data[0]) * nr);

	memmove(wb->inc.keys.data,
		wb->inc.keys.data + nr,
		sizeof(wb->inc.keys.data[0]) * (wb->inc.keys.nr - nr));

	wb->flushing.keys.nr += nr;
	wb->inc.keys.nr -= nr;
out:
	if (!wb->inc.keys.nr)
		bch2_journal_pin_drop(j, &wb->inc.pin);
	else
		bch2_journal_pin_update(j, wb->inc.keys.data[0].journal_seq, &wb->inc.pin,
					bch2_btree_write_buffer_journal_flush);

	if (j->watermark) {
		spin_lock(&j->lock);
		bch2_journal_set_watermark(j);
		spin_unlock(&j->lock);
	}

	BUG_ON(wb->sorted.size < wb->flushing.keys.nr);
}
|
|
|
|
|
|
|
|
static int bch2_btree_write_buffer_flush_locked(struct btree_trans *trans)
{
	struct bch_fs *c = trans->c;
	struct journal *j = &c->journal;
	struct btree_write_buffer *wb = &c->btree_write_buffer;
	struct btree_iter iter = { NULL };
	size_t skipped = 0, fast = 0, slowpath = 0;
	bool write_locked = false;
	int ret = 0;

	bch2_trans_unlock(trans);
	bch2_trans_begin(trans);

	mutex_lock(&wb->inc.lock);
	move_keys_from_inc_to_flushing(wb);
	mutex_unlock(&wb->inc.lock);

	for (size_t i = 0; i < wb->flushing.keys.nr; i++) {
		wb->sorted.data[i].idx = i;
		wb->sorted.data[i].btree = wb->flushing.keys.data[i].btree;
		memcpy(&wb->sorted.data[i].pos, &wb->flushing.keys.data[i].k.k.p, sizeof(struct bpos));
	}
	wb->sorted.nr = wb->flushing.keys.nr;

	/*
	 * We first sort so that we can detect and skip redundant updates, and
	 * then we attempt to flush in sorted btree order, as this is most
	 * efficient.
	 *
	 * However, since we're not flushing in the order they appear in the
	 * journal we won't be able to drop our journal pin until everything is
bcachefs: more aggressive fast path write buffer key flushing
The btree write buffer flush code is prone to causing journal
deadlock due to inefficient use and release of reservation space.
Reservation is not pre-reserved for write buffered keys (as is done
for key cache keys, for example), because the write buffer flush
side uses a fast path that attempts insertion without need for any
reservation at all.
The write buffer flush attempts to deal with this by inserting keys
using the BTREE_INSERT_JOURNAL_RECLAIM flag to return an error on
journal reservations that require blocking. Upon first error, it
falls back to a slow path that inserts in journal order and supports
moving the associated journal pin forward.
The problem is that under pathological conditions (i.e. smaller log,
larger write buffer and journal reservation pressure), we've seen
instances where the fast path fails fairly quickly without having
completed many insertions, and then the slow path is unable to push
the journal pin forward enough to free up the space it needs to
completely flush the buffer. This problem is occasionally reproduced
by fstest generic/333.
To avoid this problem, update the fast path algorithm to skip key
inserts that fail due to inability to acquire needed journal
reservation without immediately breaking out of the loop. Instead,
insert as many keys as possible, zap the sequence numbers to mark
them as processed, and then fall back to the slow path to process
the remaining set in journal order. This reduces the amount of
journal reservation that might be required to flush the entire
buffer and increases the odds that the slow path is able to move the
journal pin forward and free up space as keys are processed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-03-17 20:54:01 +08:00
	 * flushed - which means this could deadlock the journal if we weren't
	 * passing BCH_TRANS_COMMIT_journal_reclaim. This causes the update to fail
	 * if it would block taking a journal reservation.
	 *
	 * If that happens, simply skip the key so we can optimistically insert
	 * as many keys as possible in the fast path.
	 */
	wb_sort(wb->sorted.data, wb->sorted.nr);

	darray_for_each(wb->sorted, i) {
		struct btree_write_buffered_key *k = &wb->flushing.keys.data[i->idx];

		for (struct wb_key_ref *n = i + 1; n < min(i + 4, &darray_top(wb->sorted)); n++)
			prefetch(&wb->flushing.keys.data[n->idx]);

		BUG_ON(!k->journal_seq);

		if (i + 1 < &darray_top(wb->sorted) &&
		    wb_key_eq(i, i + 1)) {
			struct btree_write_buffered_key *n = &wb->flushing.keys.data[i[1].idx];

			skipped++;
			n->journal_seq = min_t(u64, n->journal_seq, k->journal_seq);
			k->journal_seq = 0;
			continue;
		}

		if (write_locked) {
			struct btree_path *path = btree_iter_path(trans, &iter);

			if (path->btree_id != i->btree ||
			    bpos_gt(k->k.k.p, path->l[0].b->key.k.p)) {
				bch2_btree_node_unlock_write(trans, path, path->l[0].b);
				write_locked = false;

				ret = lockrestart_do(trans,
					bch2_btree_iter_traverse(&iter) ?:
					bch2_foreground_maybe_merge(trans, iter.path, 0,
							BCH_WATERMARK_reclaim|
							BCH_TRANS_COMMIT_journal_reclaim|
							BCH_TRANS_COMMIT_no_check_rw|
							BCH_TRANS_COMMIT_no_enospc));
				if (ret)
					goto err;
			}
		}

		if (!iter.path || iter.btree_id != k->btree) {
			bch2_trans_iter_exit(trans, &iter);
			bch2_trans_iter_init(trans, &iter, k->btree, k->k.k.p,
					     BTREE_ITER_intent|BTREE_ITER_all_snapshots);
		}

		bch2_btree_iter_set_pos(&iter, k->k.k.p);
		btree_iter_path(trans, &iter)->preserve = false;

		do {
			if (race_fault()) {
				ret = -BCH_ERR_journal_reclaim_would_deadlock;
				break;
			}

			ret = wb_flush_one(trans, &iter, k, &write_locked, &fast);
			if (!write_locked)
				bch2_trans_begin(trans);
		} while (bch2_err_matches(ret, BCH_ERR_transaction_restart));

		if (!ret) {
			k->journal_seq = 0;
		} else if (ret == -BCH_ERR_journal_reclaim_would_deadlock) {
			slowpath++;
			ret = 0;
		} else
			break;
	}

	if (write_locked) {
		struct btree_path *path = btree_iter_path(trans, &iter);
		bch2_btree_node_unlock_write(trans, path, path->l[0].b);
	}
	bch2_trans_iter_exit(trans, &iter);

	if (ret)
		goto err;

	if (slowpath) {
		/*
		 * Flush in the order they were present in the journal, so that
		 * we can release journal pins:
		 * The fastpath zapped the seq of keys that were successfully flushed so
		 * we can skip those here.
		 */
		trace_and_count(c, write_buffer_flush_slowpath, trans, slowpath, wb->flushing.keys.nr);

		sort(wb->flushing.keys.data,
		     wb->flushing.keys.nr,
		     sizeof(wb->flushing.keys.data[0]),
		     wb_key_seq_cmp, NULL);

		darray_for_each(wb->flushing.keys, i) {
			if (!i->journal_seq)
				continue;

			bch2_journal_pin_update(j, i->journal_seq, &wb->flushing.pin,
						bch2_btree_write_buffer_journal_flush);

			bch2_trans_begin(trans);

			ret = commit_do(trans, NULL, NULL,
					BCH_WATERMARK_reclaim|
					BCH_TRANS_COMMIT_journal_reclaim|
					BCH_TRANS_COMMIT_no_check_rw|
					BCH_TRANS_COMMIT_no_enospc|
					BCH_TRANS_COMMIT_no_journal_res ,
					btree_write_buffered_insert(trans, i));
			if (ret)
				goto err;
		}
	}
err:
	bch2_fs_fatal_err_on(ret, c, "%s", bch2_err_str(ret));
	trace_write_buffer_flush(trans, wb->flushing.keys.nr, skipped, fast, 0);
	bch2_journal_pin_drop(j, &wb->flushing.pin);
	wb->flushing.keys.nr = 0;
	return ret;
}

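The slowpath ordering used at the end of the flush function above (sort by journal sequence number, then skip entries whose sequence was zeroed by the fastpath) can be illustrated in userspace. This is a minimal sketch, not the kernel code: `struct sketch_key`, `sketch_seq_cmp()` and `sketch_slowpath_order()` are hypothetical names, and the standard `qsort()` stands in for the kernel's `sort()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffered key: only the fields this sketch needs. */
struct sketch_key {
	uint64_t journal_seq;	/* 0 means "already flushed by the fastpath" */
	int	 id;		/* payload placeholder */
};

/* Order keys by ascending journal sequence number. */
static int sketch_seq_cmp(const void *l, const void *r)
{
	const struct sketch_key *a = l, *b = r;

	return (a->journal_seq > b->journal_seq) - (a->journal_seq < b->journal_seq);
}

/*
 * Sort keys into journal order, then record the ids of the keys that still
 * need flushing (nonzero journal_seq) in out[]. Returns how many were recorded.
 */
static size_t sketch_slowpath_order(struct sketch_key *keys, size_t nr, int *out)
{
	size_t n = 0;

	qsort(keys, nr, sizeof(keys[0]), sketch_seq_cmp);

	for (size_t i = 0; i < nr; i++) {
		if (!keys[i].journal_seq)
			continue;	/* zapped by the fastpath: nothing left to do */
		out[n++] = keys[i].id;
	}

	assert(n <= nr);
	return n;
}
```

Processing the remaining keys in ascending sequence order is what allows a journal pin to be moved forward one key at a time, as the comment in the slowpath describes.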
static int fetch_wb_keys_from_journal(struct bch_fs *c, u64 seq)
{
	struct journal *j = &c->journal;
	struct journal_buf *buf;
	int ret = 0;

	while (!ret && (buf = bch2_next_write_buffer_flush_journal_buf(j, seq))) {
		ret = bch2_journal_keys_to_write_buffer(c, buf);
		mutex_unlock(&j->buf_lock);
	}

	return ret;
}

static int btree_write_buffer_flush_seq(struct btree_trans *trans, u64 seq)
{
	struct bch_fs *c = trans->c;
	struct btree_write_buffer *wb = &c->btree_write_buffer;
	int ret = 0, fetch_from_journal_err;

	do {
		bch2_trans_unlock(trans);

		fetch_from_journal_err = fetch_wb_keys_from_journal(c, seq);

		/*
		 * On memory allocation failure, bch2_btree_write_buffer_flush_locked()
		 * is not guaranteed to empty wb->inc:
		 */
		mutex_lock(&wb->flushing.lock);
		ret = bch2_btree_write_buffer_flush_locked(trans);
		mutex_unlock(&wb->flushing.lock);
	} while (!ret &&
		 (fetch_from_journal_err ||
		  (wb->inc.pin.seq && wb->inc.pin.seq <= seq) ||
		  (wb->flushing.pin.seq && wb->flushing.pin.seq <= seq)));

	return ret;
}

static int bch2_btree_write_buffer_journal_flush(struct journal *j,
				struct journal_entry_pin *_pin, u64 seq)
{
	struct bch_fs *c = container_of(j, struct bch_fs, journal);

	return bch2_trans_run(c, btree_write_buffer_flush_seq(trans, seq));
}

int bch2_btree_write_buffer_flush_sync(struct btree_trans *trans)
{
	struct bch_fs *c = trans->c;

	trace_and_count(c, write_buffer_flush_sync, trans, _RET_IP_);

	return btree_write_buffer_flush_seq(trans, journal_cur_seq(&c->journal));
}

int bch2_btree_write_buffer_flush_nocheck_rw(struct btree_trans *trans)
{
	struct bch_fs *c = trans->c;
	struct btree_write_buffer *wb = &c->btree_write_buffer;
	int ret = 0;

	if (mutex_trylock(&wb->flushing.lock)) {
		ret = bch2_btree_write_buffer_flush_locked(trans);
		mutex_unlock(&wb->flushing.lock);
	}

	return ret;
}

int bch2_btree_write_buffer_tryflush(struct btree_trans *trans)
{
	struct bch_fs *c = trans->c;

	if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_btree_write_buffer))
		return -BCH_ERR_erofs_no_writes;

	int ret = bch2_btree_write_buffer_flush_nocheck_rw(trans);
	bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);
	return ret;
}

static void bch2_btree_write_buffer_flush_work(struct work_struct *work)
{
	struct bch_fs *c = container_of(work, struct bch_fs, btree_write_buffer.flush_work);
	struct btree_write_buffer *wb = &c->btree_write_buffer;
	int ret;

	mutex_lock(&wb->flushing.lock);
	do {
		ret = bch2_trans_run(c, bch2_btree_write_buffer_flush_locked(trans));
	} while (!ret && bch2_btree_write_buffer_should_flush(c));
	mutex_unlock(&wb->flushing.lock);

	bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);
}

int bch2_journal_key_to_wb_slowpath(struct bch_fs *c,
			     struct journal_keys_to_wb *dst,
			     enum btree_id btree, struct bkey_i *k)
{
	struct btree_write_buffer *wb = &c->btree_write_buffer;
	int ret;
retry:
	ret = darray_make_room_gfp(&dst->wb->keys, 1, GFP_KERNEL);
	if (!ret && dst->wb == &wb->flushing)
		ret = darray_resize(&wb->sorted, wb->flushing.keys.size);

	if (unlikely(ret)) {
		if (dst->wb == &c->btree_write_buffer.flushing) {
			mutex_unlock(&dst->wb->lock);
			dst->wb = &c->btree_write_buffer.inc;
			bch2_journal_pin_add(&c->journal, dst->seq, &dst->wb->pin,
					     bch2_btree_write_buffer_journal_flush);
			goto retry;
		}

		return ret;
	}

	dst->room = darray_room(dst->wb->keys);
	if (dst->wb == &wb->flushing)
		dst->room = min(dst->room, wb->sorted.size - wb->flushing.keys.nr);
	BUG_ON(!dst->room);
	BUG_ON(!dst->seq);

	struct btree_write_buffered_key *wb_k = &darray_top(dst->wb->keys);
	wb_k->journal_seq = dst->seq;
	wb_k->btree = btree;
	bkey_copy(&wb_k->k, k);
	dst->wb->keys.nr++;
	dst->room--;
	return 0;
}

2023-11-03 06:57:19 +08:00
|
|
|
void bch2_journal_keys_to_write_buffer_start(struct bch_fs *c, struct journal_keys_to_wb *dst, u64 seq)
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
{
|
|
|
|
struct btree_write_buffer *wb = &c->btree_write_buffer;
|
|
|
|
|
2023-11-03 06:57:19 +08:00
|
|
|
if (mutex_trylock(&wb->flushing.lock)) {
|
|
|
|
mutex_lock(&wb->inc.lock);
|
|
|
|
move_keys_from_inc_to_flushing(wb);
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
|
2023-11-03 06:57:19 +08:00
|
|
|
/*
|
|
|
|
* Attempt to skip wb->inc, and add keys directly to
|
|
|
|
* wb->flushing, saving us a copy later:
|
|
|
|
*/
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
|
2023-11-03 06:57:19 +08:00
|
|
|
if (!wb->inc.keys.nr) {
|
|
|
|
dst->wb = &wb->flushing;
|
|
|
|
} else {
|
|
|
|
mutex_unlock(&wb->flushing.lock);
|
|
|
|
dst->wb = &wb->inc;
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
}
|
2023-11-03 06:57:19 +08:00
|
|
|
} else {
|
|
|
|
mutex_lock(&wb->inc.lock);
|
|
|
|
dst->wb = &wb->inc;
|
|
|
|
}
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
|
2023-11-03 06:57:19 +08:00
|
|
|
dst->room = darray_room(dst->wb->keys);
|
|
|
|
if (dst->wb == &wb->flushing)
|
|
|
|
dst->room = min(dst->room, wb->sorted.size - wb->flushing.keys.nr);
|
|
|
|
dst->seq = seq;
|
bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.
This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.
This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.
Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.
- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.
- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.
If there isn't room we flush the write buffer in the transaction
commit error path and retry.
- A new persistent option, for specifying the number of entries in the
write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-01-04 13:00:50 +08:00
|
|
|
|
2023-11-03 06:57:19 +08:00
|
|
|
	bch2_journal_pin_add(&c->journal, seq, &dst->wb->pin,
			     bch2_btree_write_buffer_journal_flush);
}

void bch2_journal_keys_to_write_buffer_end(struct bch_fs *c, struct journal_keys_to_wb *dst)
{
	struct btree_write_buffer *wb = &c->btree_write_buffer;

	if (!dst->wb->keys.nr)
		bch2_journal_pin_drop(&c->journal, &dst->wb->pin);

	if (bch2_btree_write_buffer_should_flush(c) &&
	    __bch2_write_ref_tryget(c, BCH_WRITE_REF_btree_write_buffer) &&
	    !queue_work(system_unbound_wq, &c->btree_write_buffer.flush_work))
		bch2_write_ref_put(c, BCH_WRITE_REF_btree_write_buffer);

	if (dst->wb == &wb->flushing)
		mutex_unlock(&wb->flushing.lock);
	mutex_unlock(&wb->inc.lock);
}
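
/*
 * Illustrative sketch, not part of bcachefs: the "tryget a ref, queue the
 * work, drop the ref if the work was already queued" chain used above, with
 * userspace stand-ins. Every demo_* name is hypothetical. It assumes, as with
 * the kernel's queue_work(), that queueing returns false when the work item
 * is already pending, and that a successfully queued work item owns the ref
 * and drops it when it finishes.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pieces of struct bch_fs used here. */
struct demo_fs {
	unsigned	refs;		/* outstanding write refs */
	bool		going_ro;	/* once set, tryget fails */
	bool		work_pending;	/* flush work already queued */
	unsigned	nr_keys;	/* keys currently buffered */
	unsigned	threshold;	/* flush once nr_keys exceeds this */
};

static bool demo_should_flush(struct demo_fs *c)
{
	return c->nr_keys > c->threshold;
}

static bool demo_ref_tryget(struct demo_fs *c)
{
	if (c->going_ro)
		return false;
	c->refs++;
	return true;
}

static void demo_ref_put(struct demo_fs *c)
{
	c->refs--;
}

/* Mirrors queue_work(): returns false if the work was already pending. */
static bool demo_queue_work(struct demo_fs *c)
{
	if (c->work_pending)
		return false;
	c->work_pending = true;
	return true;
}

static void demo_maybe_kick_flush(struct demo_fs *c)
{
	/*
	 * The && chain short-circuits: the ref is only taken when a flush
	 * is wanted, and only dropped here when queueing did not hand it
	 * over to a newly queued work item.
	 */
	if (demo_should_flush(c) &&
	    demo_ref_tryget(c) &&
	    !demo_queue_work(c))
		demo_ref_put(c);
}
```

/*
 * In the sketch, a second call while the work is still pending leaves the
 * ref count unchanged, so exactly one ref is held per queued work item.
 */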

static int bch2_journal_keys_to_write_buffer(struct bch_fs *c, struct journal_buf *buf)
{
	struct journal_keys_to_wb dst;
	int ret = 0;

	bch2_journal_keys_to_write_buffer_start(c, &dst, le64_to_cpu(buf->data->seq));

	for_each_jset_entry_type(entry, buf->data, BCH_JSET_ENTRY_write_buffer_keys) {
		jset_entry_for_each_key(entry, k) {
			ret = bch2_journal_key_to_wb(c, &dst, entry->btree_id, k);
			if (ret)
				goto out;
		}

		entry->type = BCH_JSET_ENTRY_btree_keys;
	}

	spin_lock(&c->journal.lock);
	buf->need_flush_to_write_buffer = false;
	spin_unlock(&c->journal.lock);
out:
	bch2_journal_keys_to_write_buffer_end(c, &dst);
	return ret;
}
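
/*
 * Illustrative sketch, not part of bcachefs: the control flow of the function
 * above reduced to plain C. Every demo_* name and the "negative key fails"
 * rule are hypothetical. It shows two properties of the loop: an entry is
 * only retagged after all of its keys were copied, and the end hook runs on
 * both the success and the error path via the single "out" label.
 */

```c
#include <assert.h>
#include <stddef.h>

enum demo_type { DEMO_WRITE_BUFFER_KEYS, DEMO_BTREE_KEYS };

/* Hypothetical stand-in for a journal set entry holding a run of keys. */
struct demo_entry {
	enum demo_type	type;
	const int	*keys;
	size_t		nr;
};

/* Hypothetical stand-in for struct journal_keys_to_wb. */
struct demo_dst {
	size_t	nr_copied;
	int	ended;		/* number of calls to demo_end() */
};

static int demo_copy_key(struct demo_dst *dst, int key)
{
	if (key < 0)
		return -1;	/* made-up failure condition */
	dst->nr_copied++;
	return 0;
}

static void demo_end(struct demo_dst *dst)
{
	dst->ended++;
}

static int demo_entries_to_wb(struct demo_entry *entries, size_t nr,
			      struct demo_dst *dst)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		if (entries[i].type != DEMO_WRITE_BUFFER_KEYS)
			continue;

		for (size_t j = 0; j < entries[i].nr; j++) {
			ret = demo_copy_key(dst, entries[i].keys[j]);
			if (ret)
				goto out;
		}

		/* Only reached when every key in this entry was copied. */
		entries[i].type = DEMO_BTREE_KEYS;
	}
out:
	demo_end(dst);
	return ret;
}
```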

static int wb_keys_resize(struct btree_write_buffer_keys *wb, size_t new_size)
{
	if (wb->keys.size >= new_size)
		return 0;

	if (!mutex_trylock(&wb->lock))
		return -EINTR;

	int ret = darray_resize(&wb->keys, new_size);
	mutex_unlock(&wb->lock);
	return ret;
}

int bch2_btree_write_buffer_resize(struct bch_fs *c, size_t new_size)
{
	struct btree_write_buffer *wb = &c->btree_write_buffer;

	return wb_keys_resize(&wb->flushing, new_size) ?:
		wb_keys_resize(&wb->inc, new_size);
}
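
/*
 * Illustrative sketch, not part of bcachefs: a userspace analogue of the
 * grow-only, trylock-guarded resize above. Every demo_* name is hypothetical;
 * a C11 atomic_flag stands in for the kernel mutex and realloc() for
 * darray_resize(). The portable "ret ? ret : next()" spelling is used in
 * place of the GNU "?:" shorthand; both stop at the first nonzero result.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btree_write_buffer_keys. */
struct demo_keys {
	atomic_flag	lock;	/* stand-in for the kernel mutex */
	int		*data;
	size_t		size;	/* allocated capacity, in elements */
};

static bool demo_trylock(struct demo_keys *k)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&k->lock);
}

static void demo_unlock(struct demo_keys *k)
{
	atomic_flag_clear(&k->lock);
}

/* No-op if already big enough; -EINTR if the lock is contended. */
static int demo_keys_resize(struct demo_keys *k, size_t new_size)
{
	if (k->size >= new_size)
		return 0;

	if (!demo_trylock(k))
		return -EINTR;

	int ret = 0;
	int *p = realloc(k->data, new_size * sizeof(*p));

	if (p) {
		k->data = p;
		k->size = new_size;
	} else {
		ret = -ENOMEM;
	}

	demo_unlock(k);
	return ret;
}

/* Resize both buffers, stopping at the first error. */
static int demo_resize_both(struct demo_keys *flushing, struct demo_keys *inc,
			    size_t new_size)
{
	int ret = demo_keys_resize(flushing, new_size);

	return ret ? ret : demo_keys_resize(inc, new_size);
}
```

/*
 * Note that in the sketch a contended second buffer leaves the first one
 * already grown; because the resize is grow-only, the caller can simply
 * retry later.
 */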

void bch2_fs_btree_write_buffer_exit(struct bch_fs *c)
{
	struct btree_write_buffer *wb = &c->btree_write_buffer;

	BUG_ON((wb->inc.keys.nr || wb->flushing.keys.nr) &&
	       !bch2_journal_error(&c->journal));

	darray_exit(&wb->sorted);
	darray_exit(&wb->flushing.keys);
	darray_exit(&wb->inc.keys);
}

int bch2_fs_btree_write_buffer_init(struct bch_fs *c)
{
	struct btree_write_buffer *wb = &c->btree_write_buffer;

	mutex_init(&wb->inc.lock);
	mutex_init(&wb->flushing.lock);
	INIT_WORK(&wb->flush_work, bch2_btree_write_buffer_flush_work);

	/* Will be resized by journal as needed: */
	unsigned initial_size = 1 << 16;

	return  darray_make_room(&wb->inc.keys, initial_size) ?:
		darray_make_room(&wb->flushing.keys, initial_size) ?:
		darray_make_room(&wb->sorted, initial_size);
}