linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 16:24:13 +08:00

Author	SHA1	Message	Date
Kent Overstreet	bcdb4b9732	bcachefs: Don't call into journal reclaim when we're not supposed to This was causing a deadlock when btree_update_nodes_writtes() invokes journal reclaim because of the btree cache being too dirty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	59a7405161	bcachefs: Create allocator threads when allocating filesystem We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	18a7b97239	bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM bch2_btree_node_get_noiter() isn't used from the btree iterator code, which retries with the btree node cache cannibalize lock held on -ENOMEM, so we should do it ourself if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	dab9ef0d27	bcachefs: Add error message for some allocation failures Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	8042b5b715	bcachefs: Extents may now cross btree node boundaries When snapshots arrive, we won't necessarily be able to arbitrarily split existis - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	7e1a3aa9df	bcachefs: iter->real_pos We need to differentiate between the search position of a btree iterator, vs. what it actually points at (what we found). This matters for extents, where iter->pos will typically be the start of the key we found and iter->real_pos will be the end of the key we found (which soon won't necessarily be in the same btree node!) and it will also matter for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	9f631dc143	bcachefs: Ensure btree iterators are traversed in bch2_trans_commit() The upcoming patch to allow extents to span btree nodes will require this... and this assertion seems to be popping, and it's not a very good assertion anyways. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	0507962f63	bcachefs: Drop invalid stripe ptrs in fsck More repair code, now that we can repair extents during initial gc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Robbie Litchfield	0ef837a0cc	bcachefs: Fix unnecessary read amplificaiton when allocating ec stripes When allocating an erasure coding stripe, bcachefs will always reuse any partial stripes before reserving a new stripe. This causes unnecessary read amplification when preparing a stripe for writing. This patch changes bcachefs to always reserve new stripes first, only relying on stripe reuse when copygc needs more time to empty buckets from existing stripes. Signed-off-by: Robbie Litchfield <blam.kiwi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	2bb748a695	bcachefs: Fsck fixes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	9d40326176	bcachefs: Fix a shift greater than type size Found by UBSAN Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	5ea037d03c	bcachefs: Assert that we're not trying to flush journal seq in the future Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	3d4955952f	bcachefs: Fix bch2_btree_iter_peek_prev() This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev() consistent with peek() and next(), w.r.t. iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	434094bec0	bcachefs: bch2_btree_iter_advance_pos() This adds a new common helper for advancing past the last key returned by peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	792e2c4c85	bcachefs: Kill bch2_btree_iter_set_pos_same_leaf() The only reason we were keeping this around was for BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances to the next leaf node, it'll drop the lock on the node that we just inserted to. But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents btree, just the inodes btree, and if we do need it for the extents btree in the future we can do it more cleanly by cloning the iterator - this lets us delete some special cases in the btree iterator code, which is complicated enough as it is. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	2b2c1a89ce	bcachefs: Simplify btree_iter_(next\|prev)_leaf() There's no good reason for these functions to not be using bch2_btree_iter_set_pos(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	eaf798317a	bcachefs: Fix for hash_redo_key() in fsck It's possible we're calling hash_redo_key() because of a duplicate key - easiest fix for that is to just not use BCH_HASH_SET_MUST_CREATE. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	6a16ad951a	bcachefs: Add flushed_seq_ondisk to journal_debug_to_text() Also, make the wait in bch2_journal_flush_seq() interruptible, not just killable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	fcb3431be8	bcachefs: Redo checks for sufficient devices When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	5d428c7c64	bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't set We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev usage - we don't have the code for reconstructing this from buckets anymore, so we need to run fsck if it's not set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	4b8f89afd4	bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	91f6ad6f94	bcachefs: Include device in btree IO error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	180fb49dea	bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	2abe542087	bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	7f4e1d5d0f	bcachefs: KEY_TYPE_alloc_v2 This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	26452d1dcd	bcachefs: Add missing call to bch2_replicas_entry_sort() This fixes a bug introduced by "bcachefs: Improve diagnostics when journal entries are missing" - devices in a replicas entry are supposed to be sorted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	a28bd48a7f	bcachefs: Add an assertion to check for journal writes to same location Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	d042b0402c	bcachefs: Add an option for metadata_target Also, make journal writes obey foreground_target and metadata_target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	5fc70d3a54	bcachefs: Repair bad data pointers Now that we can repair metadata during GC, we can handle bad pointers that would trigger errors being marked, when they need to just be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	a0b73c1c53	bcachefs: Add (partial) support for fixing btree topology When we walk the btrees during recovery, part of that is checking that btree topology is correct: for every interior btree node, its child nodes should exactly span the range the parent node covers. Previously, we had checks for this, but not repair code. Now that we have the ability to do btree updates during initial GC, this patch adds that repair code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	5b593ee172	bcachefs: Add support for doing btree updates prior to journal replay Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	51d2dfb82d	bcachefs: Add BTREE_PTR_RANGE_UPDATED This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	a66f798974	bcachefs: Refactor checking of btree topology Still a lot of work to be done here: we can't yet repair btree topology issues, but this patch refactors things so that we have better access to what we need in the topology checks. Next up will be figuring out a way to do btree updates during gc, before journal replay is done. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	e4c3f386b6	bcachefs: Improve diagnostics when journal entries are missing There's an outstanding bug with journal entries being missing in journal replay. This patch adds code to print out where the journal entries were physically located that were around the entry(ies) being missing, which should make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	522c25f068	bcachefs: Fix BCH_REPLICAS_MAX check Ideally, this limit will be going away in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	0093a50f27	bcachefs: Fix build in userspace The userspace bch_err() macro doesn't use the filesystem argument. Could also be fixed with a better macro. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	4529ae09ce	bcachefs: Fix an assertion If we're invalidating a bucket that has cached data in it, data_type won't be 0 - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	bfcf840ddf	bcachefs: Mark superblocks transactionally More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	9afc6652d1	bcachefs: Kill bch2_invalidate_bucket() This patch is working towards eventually getting rid of the in memory struct bucket, and relying only on the btree representation. Since bch2_invalidate_bucket() was only used for incrementing gens, not invalidating cached data, no other counters were being changed as a side effect - meaning it's safe for the allocator code to increment the bucket gen directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	72eab8da47	bcachefs: Refactor dev usage This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	079663d8ed	bcachefs: Kill metadata only gc This was useful before we had transactional updates to interior btree nodes - but now, it's just extra unneeded complexity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	b7cf4bd7fe	bcachefs: Ensure __bch2_trans_commit() always calls bch2_trans_reset() This was leading to a very strange bug in bch2_bucket_io_time_reset(), where we'd retry without clearing out the list of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	fdbb88ac01	bcachefs: Fix a faulty assertion If journal replay hasn't finished, the journal can't be empty - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	e46b855734	bcachefs: Switch replicas.c allocations to GFP_KERNEL We're transitioning to memalloc_nofs_save/restore instead of GFP flags with the rest of the kernel, and GFP_NOIO was excessively strict and causing unnnecessary allocation failures - these allocations are done with btree locks dropped. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	b4725cc1a4	bcachefs: Fix loopback in dio mode We had a deadlock on page_lock, because buffered reads signal completion by unlocking the page, but the dio read path normally dirties the pages it's reading to with set_page_dirty_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	ef470b4817	bcachefs: Clean up bch2_extent_can_insert It was using an internal btree node iterator interface, when bch2_btree_iter_peek_slot() sufficed. We were hitting a null ptr deref that looked like it was from the iterator not being uptodate - this will also fix that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	a5cd80ea99	bcachefs: Fix an assertion pop There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	33ccd7188e	bcachefs: Don't allocate stripes at POS_MIN In the future, stripe index 0 will be a sentinal value. This patch doesn't disallow stripes at POS_MIN yet, leaving that for when we do the on disk format changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	6c7585b098	bcachefs: Rework allocating buckets for stripes Allocating buckets for existing stripes was busted, in part because the data structures were too contorted. This reworks new stripes so that we have an array of open buckets that matches blocks in the stripe, and it's sparse if we're reusing an existing stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	f9ef45ad43	bcachefs: Verify transaction updates are sorted A user reported a bug that implies they might not be correctly sorted, this should help track that down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00

1 2 3 4 5 ...

1216111 Commits