We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback() and
the page lock semantics are completely insane so that deadlocks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The bkey_s_c returned by btree_iter_(peek|next) points into the btree
iter type, so advancing the iterator and then using the one previously
returned is a bug...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This make the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If an extent only contained cached or erasure coded pointers, there
won't be any devices in the normal dirty replicas list or an entry to
update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Running the filesystem under valgrind exposed some garbage data being
written to disk in bch2_journal_super_entries_add_common(), in the
portion which encodes bch_replica_entry objects.
Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.
Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_extent_atomic_end counts the number of iterators requried for
marking overwrites - but journal replay never marks overwrites, so that
part was incorrect. And counting iterators for the key being inserted
should be unnecessary because we did that prior to the key being
inserted before it was first journalled.
This should fix an iterator overflow bug - the iterators for walking
overwrites were totally unneeded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the refactoring that's coming to add fuse support, we want
bch2_hash_info_init() to be cheaper so we don't have to rely on anything
cached besides the inode in the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We currently don't have a way to propagate inode io opts to indirect
extents. This is a problem...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It's possible to get -EIO in __btree_iter_traverse_all() after looping,
with orig_iter NULL.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When grab_cache_page_write_begin() fails but we did pin some pages, we
shouldn't return -ENOMEM, we should do a partial write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is the start of some refactoring work to make less code depend on
the linux VFS - here the inode cache - to make e.g. the fuse port
easier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.
This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We shouldn't ever be writing past i_size - but, apparently there's still
a bug to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order to avoid trying to allocate too many btree iterators,
bch2_extent_atomic_end() needs to count how many iterators are going to
be needed for insertions and overwrites - but we weren't counting the
iterators for deleting a reflink_v when the refcount goes to 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the user buffer isn't aligned to the filesystem block size, on a
large enough IO - where it won't fit into a single bio -
bio_iov_iter_get_pages() won't necessarily return a bio with the proper
alignment.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When an extent is erasure coded, we need to record a replicas entry to
indicate that data is present on the devices that extent has pointers to
- but nr_required should be 0, because it's erasure coded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Last of the basic operations for iterating forwards and backwards over
the btree: we now have
- peek(), returns key >= iter->pos
- next(), returns key > iter->pos
- peek_prev(), returns key <= iter->pos
- prev(), returns key < iter->pos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs used to work mostly in terms of PAGE_SIZE, not block size at
the vfs level - but that has since been fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Call bch2_btree_iter_verify from bch2_btree_node_iter_fix(); also verify
in btree_iter_peek_uptodate() that iter->k matches what's in the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Any time we're modifying what's in the btree, iterators potentially have
to be updated - this one was exposed by the reflink code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The allocator needs to make sure there's buckets available on the
RESERVE_NONE freelist if at all possible - otherwise foreground IO will
get stuck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
.key_debugcheck no longer needs to take a pointer to the btree node
Also, try to make sure wherever we're inserting or modifying keys in the
btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With multiple iterators, if another iterator points to the key being
modified, we need to call bch2_btree_node_iter_fix() to re-unpack the
key into the iter->k
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>