linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-28 05:24:47 +08:00

Author	SHA1	Message	Date
Kent Overstreet	462f494bc5	bcachefs: Fix lockdep splat in bch2_readdir dir_emit() can fault (taking mmap_lock); thus we can't be holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	b6898917f2	bcachefs: Check for ERR_PTR() from filemap_lock_folio() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	1bb3c2a974	bcachefs: New error message helpers Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	a83e108fc1	bcachefs: fiemap: Fix a lockdep splat As with the previous patch, we generally can't hold btree locks while copying to userspace, as that may incur a page fault and require mmap_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	a5b696ee6e	bcachefs: seqmutex; fix a lockdep splat We can't be holding btree_trans_lock while copying to user space, which might incur a page fault. To fix this, convert it to a seqmutex so we can unlock/relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	6547ebabda	bcachefs: Don't call lock_graph_descend() with wait lock held This fixes a deadlock: 01305 WARNING: possible circular locking dependency detected 01305 6.3.0-ktest-gf4de9bee61af #5305 Tainted: G W 01305 ------------------------------------------------------ 01305 cat/14658 is trying to acquire lock: 01305 ffffffc00982f460 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x48/0x278 01305 01305 but task is already holding lock: 01305 ffffff8011aaf040 (&lock->wait_lock){+.+.}-{2:2}, at: bch2_check_for_deadlock+0x4b8/0xa58 01305 01305 which lock already depends on the new lock. 01305 01305 01305 the existing dependency chain (in reverse order) is: 01305 01305 -> #2 (&lock->wait_lock){+.+.}-{2:2}: 01305 _raw_spin_lock+0x54/0x70 01305 __six_lock_wakeup+0x40/0x1b0 01305 six_unlock_ip+0xe8/0x248 01305 bch2_btree_key_cache_scan+0x720/0x940 01305 shrink_slab.constprop.0+0x284/0x770 01305 shrink_node+0x390/0x828 01305 balance_pgdat+0x390/0x6d0 01305 kswapd+0x2e4/0x718 01305 kthread+0x184/0x1a8 01305 ret_from_fork+0x10/0x20 01305 01305 -> #1 (&c->lock#2){+.+.}-{3:3}: 01305 __mutex_lock+0x104/0x14a0 01305 mutex_lock_nested+0x30/0x40 01305 bch2_btree_key_cache_scan+0x5c/0x940 01305 shrink_slab.constprop.0+0x284/0x770 01305 shrink_node+0x390/0x828 01305 balance_pgdat+0x390/0x6d0 01305 kswapd+0x2e4/0x718 01305 kthread+0x184/0x1a8 01305 ret_from_fork+0x10/0x20 01305 01305 -> #0 (fs_reclaim){+.+.}-{0:0}: 01305 __lock_acquire+0x19d0/0x2930 01305 lock_acquire+0x1dc/0x458 01305 fs_reclaim_acquire+0x9c/0xe0 01305 __kmem_cache_alloc_node+0x48/0x278 01305 __kmalloc_node_track_caller+0x5c/0x278 01305 krealloc+0x94/0x180 01305 bch2_printbuf_make_room.part.0+0xac/0x118 01305 bch2_prt_printf+0x150/0x1e8 01305 bch2_btree_bkey_cached_common_to_text+0x170/0x298 01305 bch2_btree_trans_to_text+0x244/0x348 01305 print_cycle+0x7c/0xb0 01305 break_cycle+0x254/0x528 01305 bch2_check_for_deadlock+0x59c/0xa58 01305 bch2_btree_deadlock_read+0x174/0x200 01305 full_proxy_read+0x94/0xf0 01305 vfs_read+0x15c/0x3a8 01305 ksys_read+0xb8/0x148 01305 __arm64_sys_read+0x48/0x60 01305 invoke_syscall.constprop.0+0x64/0x138 01305 do_el0_svc+0x84/0x138 01305 el0_svc+0x34/0x80 01305 el0t_64_sync_handler+0xb0/0xb8 01305 el0t_64_sync+0x14c/0x150 01305 01305 other info that might help us debug this: 01305 01305 Chain exists of: 01305 fs_reclaim --> &c->lock#2 --> &lock->wait_lock 01305 01305 Possible unsafe locking scenario: 01305 01305 CPU0 CPU1 01305 ---- ---- 01305 lock(&lock->wait_lock); 01305 lock(&c->lock#2); 01305 lock(&lock->wait_lock); 01305 lock(fs_reclaim); 01305 01305 * DEADLOCK * Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	e96f5a61cb	bcachefs: Fix bch2_check_discard_freespace_key() We weren't correctly checking the freespace btree - it's an extents btree, which means we need to iterate over each bucket in a freespace extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	25aa8c2167	bcachefs: bch2_trans_unlock_noassert() This fixes a spurious assert in the btree node read path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	45a1ab57dd	bcachefs: Fix bch2_btree_update_start() The calculation for number of nodes to allocate in bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the small nodes test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	91ecd41b7f	bcachefs: bch2_extent_ptr_desired_durability() This adds a new helper for getting a pointer's durability irrespective of the device state, and uses it in the the data update path. This fixes a bug where we do a data update but request 0 replicas to be allocated, because the replica being rewritten is on a device marked as failed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	253748a26a	bcachefs: snapshot_to_text() includes snapshot tree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	995f9128e0	bcachefs: Fix try_decrease_writepoints() - We may need to drop btree locks before taking the writepoint_lock, as is done in other places. - We should be using open_bucket_free_unused(), so that we don't waste space. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	25c70097a6	bcachefs: Delete weird hacky transaction restart injection since we currently don't have a good fault injection library, bch2_btree_insert_node() was randomly injecting faults based on local_clock(). At the very least this should have been a debug mode only thing, but this is a brittle method so let's just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	8e5b1115f1	bcachefs: Write buffer flush needs BTREE_INSERT_NOCHECK_RW btree write buffer flush is only invoked from contexts that already hold a write ref, and checking if we're still RW could cause us to fail to completely flush the write buffer when shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	7724664f0e	bcachefs: New assertions when marking filesystem clean Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	99a3d39893	bcachefs: ec: Fix a lost wakeup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Mikulas Patocka	954ed17e02	bcachefs: fix NULL pointer dereference in try_alloc_bucket On Mon, 29 May 2023, Mikulas Patocka wrote: > The oops happens in set_btree_iter_dontneed and it is caused by the fact > that iter->path is NULL. The code in try_alloc_bucket is buggy because it > sets "struct btree_iter iter = { NULL };" and then jumps to the "err" > label that tries to dereference values in "iter". Here I'm sending a patch for it. From: Mikulas Patocka <mpatocka@redhat.com> The function try_alloc_bucket sets the variable "iter" to NULL and then (on various error conditions) jumps to the label "err". On the "err" label, it calls "set_btree_iter_dontneed" that tries to dereference "iter->trans" and "iter->path". So, we get an oops on error condition. This patch fixes the crash by testing that iter.trans and iter.path is non-zero before calling set_btree_iter_dontneed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	b0e8c75e40	bcachefs: Fix subvol deletion deadlock d_prune_aliases() may call bch2_evict_inode(), which needs c->vfs_inodes_list_lock. Fix this by always calling igrab() before putting the inodes onto our disposal list, and then calling d_prune_aliases() with c->vfs_inodes_lock dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Brian Foster	5bc740820e	bcachefs: don't spin in rebalance when background target is not usable If a bcachefs filesystem is configured with a background device (disk group), rebalance will relocate data to this device in the background by checking extent keys for whether they currently reside in the specified target. For keys that do not, rebalance performs a read/write cycle to allow the write path to properly relocate data. If the background target is not usable (read-only, for example), however, the write path doesn't actually move data to another device. Instead, rebalance spins indefinitely reading and rewriting the same data over and over to the same device. If the background target is made available again, the rebalance picks this up, relocates the data, and eventually terminates. To avoid this spinning behavior, update the rebalance background target logic to not only check whether the extent is not in the target, but whether the target is actually usable as well. If not, then don't mark the key for rewrite. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Brian Foster	a1dd428b8b	bcachefs: push rcu lock down into bch2_target_to_mask() We have one caller that cycles the rcu lock solely for this call (via target_rw_devs()), and we'd like to add another. Simplify things by pushing the rcu lock down into bch2_target_to_mask(), similar to how bch2_dev_in_target() works. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Brian Foster	fec4fc82b5	bcachefs: create internal disk_groups sysfs file We have bch2_sb_disk_groups_to_text() to dump disk group labels, but no good information on device group membership at runtime. Add bch2_disk_groups_to_text() and an associated 'disk_groups' sysfs file to print group and device relationships. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	28551613b7	bcachefs: Clean up tests code - delete redundant error messages - convert various code to bch2_trans_run Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	bc166d711d	bcachefs: Improve backpointers error message the error message here dated from when backpointers could be stored in alloc keys; now, we should always print the full key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	49c7cd9d8d	bcachefs: More drop_locks_do() conversions Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	bb125baf51	bcachefs: Delete warning from promote_alloc() It's possible to see a -BCH_ERR_ENOSPC_disk_reservation here, and that's fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	4f2c166ebe	bcachefs: Fix bch2_fsck_ask_yn() - getline() output includes a newline, without stripping that we were just looping - Make the prompt clearer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	21da6101bd	bcachefs: replicas_deltas_realloc() uses allocate_dropping_locks() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	5ff10c0a04	bcachefs: Convert acl.c to allocate_dropping_locks() More work to avoid allocating memory with btree locks held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	d95dd378c2	bcachefs: allocate_dropping_locks() Add two new helpers for allocating memory with btree locks held: The idea is to first try the allocation with GFP_NOWAIT\|__GFP_NOWARN, then if that fails - unlock, retry with GFP_KERNEL, and then call trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	3ebfc8fe95	bcachefs: Use unlikely() in bch2_err_matches() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	4c4a8f20d1	bcachefs: Fix error handling in promote path The promote path had a BUG_ON() for unknown error type, which we're now seeing: change it to a WARN_ON() - because we're curious what this is - and otherwise handle it in the normal error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	5718fda0b5	bcachefs: fs-io: Eliminate GFP_NOFS usage GFP_NOFS doesn't ever make sense. If we're allocatingc memory it should be GFP_NOWAIT if btree locks are held, GFP_KERNEL otherwise. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	78367aaa5a	bcachefs: bch2_trans_kmalloc no longer allocates memory with btree locks held When allocating memory, gfp flags should generally be - GFP_NOWAIT\|__GFP_NOWARN if btree locks are held - GFP_NOFS if in the IO path or otherwise holding resources needed for IO submission - GFP_KERNEL otherwise Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	b5fd75669a	bcachefs: drop_locks_do() Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	19c304bebd	bcachefs: GFP_NOIO -> GFP_NOFS GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	e1d29c5fa1	bcachefs: Ensure bch2_btree_node_get() calls relock() after unlock() Fix a bug where bch2_btree_node_get() might call bch2_trans_unlock() (in fill) without calling bch2_trans_relock(); this is a bug when it's done in the core btree code. Also, twea bch2_btree_node_mem_alloc() to drop btree locks before doing a blocking memory allocation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	70d41c9e27	bcachefs: Avoid __GFP_NOFAIL We've been using __GFP_NOFAIL for allocating struct bch_folio, our private per-folio state. However, that struct is variable size - it holds state for each sector in the folio, and folios can be quite large now, which means it's possible for bch_folio to be larger than PAGE_SIZE now. __GFP_NOFAIL allocations are undesirable in normal circumstances, but particularly so at >= PAGE_SIZE, and warnings are emitted for that. So, this patch adds proper error paths and eliminates most uses of __GFP_NOFAIL. Also, do some more cleanup of gfp flags w.r.t. btree node locks: we can use GFP_KERNEL, but only if we're not holding btree locks, and if we are holding btree locks we should be using GFP_NOWAIT. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	ad520141b1	bcachefs: Fix corruption with writeable snapshots When partially overwriting an extent in an older snapshot, the existing extent has to be split. If the existing extent was overwritten in a different (sibling) snapshot, we have to ensure that the split won't be visible in the sibling snapshot. data_update.c already has code for this, bch2_insert_snapshot_writeouts() - we just need to move it into btree_update_leaf.c and change bch2_trans_update_extent() to use it as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	e47a390aa5	bcachefs: Convert -ENOENT to private error codes As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	f154c3eb42	bcachefs: trans_for_each_path_safe() bch2_btree_trans_to_text() is used on btree_trans objects that are owned by different threads - when printing out deadlock cycles - so we need a safe version of trans_for_each_path(), else we race with seeing a btree_path that was just allocated and not fully initialized: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	e7ffda565a	bcachefs: Fix a quota read bug bch2_fs_quota_read() could see an inode that's been deleted (KEY_TYPE_inode_generation) - bch2_fs_quota_read_inode() needs to check for that instead of erroring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	c26463ce99	bcachefs: Fix move_extent_fail counter fail counters need to be events, not numbers of sectors - or the calculations the tests use for determining if we've had too many slowpath events don't work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	fc0ee376bb	bcachefs: Don't reuse reflink btree keyspace We've been seeing difficult to debug "missing indirect extent" bugs, that fsck doesn't seem to find. One possibility is that there was a missing indirect extent, but then a new indirect extent was created at the location of the previous indirect extent. This patch eliminates that possibility by always creating new indirect extents right after the last one, at the end of the reflink btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	db32bb9a5f	mean and variance: Add a missing include abs() is in math.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	65bc410907	mean and variance: More tests Add some more tests that test conventional and weighted mean simultaneously, and with a table of values that represents events that we'll be using this to look for so we can verify-by-eyeball that the output looks sane. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	aab5e0972a	six locks: Disable percpu read lock mode in userspace When running in userspace, we currently don't have a real percpu implementation available - at least in bcachefs-tools, which is where this code is currently used in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	2d9200cfe0	six locks: Use atomic_try_cmpxchg_acquire() This switches to a newer cmpxchg variant which updates @old for us on failure, simplifying the cmpxchg loops a bit and supposedly generating better code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	c4687a4a75	six locks: Fix an unitialized var In the conversion to atomic_t, six_lock_slowpath() ended up calling six_lock_wakeup() in the failure path with a state variable that was never initialized - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	96e53e909d	six locks: Delete redundant comment Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	2ab62310fd	six locks: Tiny bit more tidying Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00

1 2 3 4 5 ...

1217694 Commits