linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-13 14:24:11 +08:00

Author	SHA1	Message	Date
Pei Xiao	93d53f1caf	bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket bio_kmalloc may return NULL, will cause NULL pointer dereference. Add check NULL return for bio_kmalloc in journal_read_bucket. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: `ac10a9611d` ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-11-07 16:48:21 -05:00
Gaosheng Cui	ca959e328b	bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get() The function ec_new_stripe_head_alloc() returns nullptr if kzalloc() fails. It is crucial to verify its return value before dereferencing it to avoid a potential nullptr dereference. Fixes: `035d72f72c` ("bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-10-29 06:34:10 -04:00
Gianfranco Trad	2045fc4295	bcachefs: Fix invalid shift in validate_sb_layout() Add check on layout->sb_max_size_bits against BCH_SB_LAYOUT_SIZE_BITS_MAX to prevent UBSAN shift-out-of-bounds in validate_sb_layout(). Reported-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=089fad5a3a5e77825426 Fixes: `03ef80b469` ("bcachefs: Ignore unknown mount options") Tested-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-10-24 17:41:43 -04:00
Kent Overstreet	19773ec997	bcachefs: Disk accounting device validation fixes - Fix failure to validate that accounting replicas entries point to valid devices: this wasn't a real bug since they'd be cleaned up by GC, but is still something we should know about - Fix failure to validate that dev_data_type entries point to valid devices: this does fix a real bug, since bch2_accounting_read() would then try to copy the counters to that device and pop an inconsistent error when the device didn't exist - Remove accounting entries that are zeroed or invalid: if we're not validating them we need to get rid of them: they might not exist in the superblock, so we need the to trigger the superblock mark path when they're readded. This fixes the replication.ktest rereplicate test, which was failing with "superblock not marked for replicas..." Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-10-09 16:42:53 -04:00
Kent Overstreet	ad8d1f77fc	bcachefs: bch2_dev_remove_stripes() We can now correctly force-remove a device that has stripes on it; this uses the new BCH_SB_MEMBER_INVALID sentinal value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	54a12984a9	bcachefs: EIO errcode cleanup We want to be using private errcodes whenever possible, for better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	a977f3e162	bcachefs: BCH_WRITE_ALLOC_NOWAIT no longer applies to open bucket allocation rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't allocate from the full filesystem - but we don't want spurious allocation failures due to open buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:35:20 -04:00
Kent Overstreet	e3e6940940	bcachefs: Revert lockless buffered IO path We had a report of data corruption on nixos when building installer images. https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334 It seems that writes are being dropped, but only when issued by QEMU, and possibly only in snapshot mode. It's undetermined if it's write calls are being dropped or dirty folios. Further testing, via minimizing the original patch to just the change that skips the inode lock on non appends/truncates, reveals that it really is just not taking the inode lock that causes the corruption: it has nothing to do with the other logic changes for preserving write atomicity in corner cases. It's also kernel config dependent: it doesn't reproduce with the minimal kernel config that ktest uses, but it does reproduce with nixos's distro config. Bisection the kernel config initially pointer the finger at page migration or compaction, but it appears that was erroneous; we haven't yet determined what kernel config option actually triggers it. Sadly it appears this will have to be reverted since we're getting too close to release and my plate is full, but we'd _really_ like to fully debug it. My suspicion is that this patch is exposing a preexisting bug - the inode lock actually covers very little in IO paths, and we have a different lock (the pagecache add lock) that guards against races with truncate here. Fixes: `7e64c86cdc` ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-31 19:26:08 -04:00
Kent Overstreet	d97de0d017	bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-13 23:00:50 -04:00
Thomas Bertschinger	1c12d1caf8	bcachefs: Add error code to defer option parsing This introduces a new error code, option_needs_open_fs, which is used to indicate that an attempt was made to parse a mount option prior to opening a filesystem, when that mount option requires an open filesystem in order to be validated. Returning this error results in bch2_parse_one_mount_opt() saving that option for later parsing, after the filesystem is opened. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	504794067f	bcachefs: Replace bare EEXIST with private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	db42549d40	bcachefs: Add a better limit for maximum number of buckets The bucket_gens array is a single array allocation (one byte per bucket), and kernel allocations are still limited to INT_MAX. Check this limit to avoid failing the bucket_gens array allocation. Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-06 10:58:17 -04:00
Kent Overstreet	ec9cc18fc2	bcachefs: Add checks for invalid snapshot IDs Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Hongbo Li	2a68d611a1	bcachefs: intercept mountoption value for bool type For mount option with bool type, the value must be 0 or 1 (See bch2_opt_parse). But this seems does not well intercepted cause for other value(like 2...), it returns the unexpect return code with error message printed. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	7e64c86cdc	bcachefs: Buffered write path now can avoid the inode lock Non append, non extending buffered writes can now avoid taking the inode lock. To ensure atomicity of writes w.r.t. other writes, we lock every folio that we'll be writing to, and if this fails we fall back to taking the inode lock. Extensive comments are provided as to corner cases. Link: https://lore.kernel.org/linux-fsdevel/Zdkxfspq3urnrM6I@bombadil.infradead.org/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Hongbo Li	79162e829b	bcachefs: fix the error code when mounting with incorrect options. When mount with incorrect options such as: "mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/". It rebacks the error "mount: /mnt/bcachefs: permission denied." cause bch2_parse_mount_opts returns -1 and bch2_mount throws it up. This is unreasonable. The real error message should be like this: "mount: /mnt/bcachefs: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error." Adding three private error codes for mounting error. Here are: - BCH_ERR_mount_option as the parent class for option error. - BCH_ERR_option_name represents the invalid option name. - BCH_ERR_option_value represents the invalid option value. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	eb386617be	bcachefs: Errcode tracepoint, documentation Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	835cd3e147	bcachefs: Check for subvolume children when deleting subvolumes Recursively destroying subvolumes isn't allowed yet. Fixes: https://github.com/koverstreet/bcachefs/issues/634 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	52946d828a	bcachefs: Kill more -EIO error codes This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	ce3e9283de	bcachefs: Kill some -EINVALs Repurposing standard error codes in bcachefs code is banned in new code, and we need to get rid of the remaining ones - private error codes give us much better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	0d529663f0	bcachefs: Split brain detection Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	53b67d8dcf	bcachefs: better error message in btree_node_write_work() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	cee0a8ea6d	bcachefs: Improve the nopromote tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	447c1c0105	bcachefs: check for failure to downgrade With the upcoming member seq patch, it's now critical that we don't ever write to a superblock that hasn't been version downgraded - failure to update member seq fields will cause split brain detection to fire erroniously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	09caeabe1a	bcachefs: btree write buffer now slurps keys from journal Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	56ec287d30	bcachefs: BCH_ERR_opt_parse_error Continuing the project of replacing generic error codes with more specific ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	573224301c	bcachefs: Make journal replay more efficient Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	84f1638795	bcachefs: bch_sb_field_downgrade Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	8b16413cda	bcachefs: bch_sb.recovery_passes_required Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	d5bd37872a	bcachefs: Add missing validation for jset_entry_data_usage Validation was completely missing for replicas entries in the journal (not the superblock replicas section) - we can't have replicas entries pointing to invalid devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-28 17:18:24 -05:00
Kent Overstreet	7d9f8468ff	bcachefs: Data update path won't accidentaly grow replicas Previously, there was a bug where if an extent had greater durability than required (because we needed to move a durability=1 pointer and ended up putting it on a durability 2 device), we would submit a write for replicas=2 - the durability of the pointer being rewritten - instead of the number of replicas required to bring it back up to the data_replicas option. This, plus the allocation path sometimes allocating on a greater durability device than requested, meant that extents could continue having more and more replicas added as they were being rewritten. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-25 21:48:42 -05:00
Kent Overstreet	a973de85e3	bcachefs: Replace ERANGE with private error codes We avoid using standard error codes: private, per-callsite error codes make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:18 -05:00
Kent Overstreet	f5d26fa31e	bcachefs: bch_sb_field_errors Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	6ddedca218	bcachefs: Guard against unknown compression options Since compression options now include compression level, proper validation is a bit more involved. This adds bch2_compression_opt_valid(), and plumbs it around appropriately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-31 12:18:37 -04:00
Hunter Shaffer	3f7b9713da	bcachefs: New superblock section members_v2 members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:15 -04:00
Kent Overstreet	40a53b9215	bcachefs: More minor smatch fixes - fix a few uninitialized return values - return a proper error code in lookup_lostfound() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Kent Overstreet	feb5cc3981	bcachefs: trace_read_nopromote() Add a tracepoint to print the reason a read wasn't promoted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	1809b8cba7	bcachefs: Break up io.c More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	56046e3ecc	bcachefs: Convert btree_err_type to normal error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	922bc5a037	bcachefs: Make topology repair a normal recovery pass This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	ae2e13d780	bcachefs: bch2_run_explicit_recovery_pass() This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	43b81a4eac	bcachefs: overlapping_extents_found() This improves the repair path for overlapping extents - we now verify that we find in the btree the overlapping extents that the algorithm detected, and fail the fsck run with a more useful error if it doesn't match. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	7c50140fce	bcachefs: Convert more -EROFS to private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	e9d017234f	bcachefs: BCH_ERR_fsck -> EINVAL When we return errors outside of bcachefs, we need to return a standard error code - fix this for BCH_ERR_fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	9473cff989	bcachefs: Fix more lockdep splats in debug.c Similar to previous fixes, we can't incur page faults while holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	3ebfc8fe95	bcachefs: Use unlikely() in bch2_err_matches() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	e47a390aa5	bcachefs: Convert -ENOENT to private error codes As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	1c59b483a3	bcachefs: BTREE_ID_snapshot_tree This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	51e84d3bbf	bcachefs: bch2_bkey_get_empty_slot() Add a new helper for allocating a new slot in a btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	65d48e3525	bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00

1 2

73 Commits