linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-27 00:04:47 +08:00

Author	SHA1	Message	Date
Kent Overstreet	35a067b42d	bcachefs: Change when we allow overwrites Originally, we'd check for -ENOSPC when getting a disk reservation whenever the new extent took up more space on disk than the old extent. Erasure coding screwed this up, because with erasure coding writes are initially replicated, and then in the background the extra replicas are dropped when the stripe is created. This means that with erasure coding enabled, writes will always take up more space on disk than the data they're overwriting - but, according to posix, overwrites aren't supposed to return ENOSPC. So, in this patch we fudge things: if the new extent has more replicas than the _effective_ replicas of the old extent, or if the old extent is compressed and the new one isn't, we check for ENOSPC when getting the disk reservation - otherwise, we don't. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	f30dd86012	bcachefs: Don't write bucket IO time lazily With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expend the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	33c74e4119	bcachefs: Flag inodes that had btree update errors On write error, the vfs inode's i_size may be inconsistent with the btree inode's i_size - flag this so we don't have spurious assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	0fefe8d8ef	bcachefs: Improve some IO error messages it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	3eb26d0157	bcachefs: bch2_trans_get_iter() no longer returns errors Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	89931472c2	bcachefs: Fix for __readahead_batch getting partial batch We were incorrectly ignoring the return value of __readahead_batch, leading to a null ptr deref in __bch2_page_state_create(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	eb8e6e9ccb	bcachefs: Deadlock prevention for ei_pagecache_lock In the dio write path, when get_user_pages() invokes the fault handler we have a recursive locking situation - we have to handle the lock ordering ourselves or we have a deadlock: this patch addresses that by checking for locking ordering violations and doing the unlock/relock dance if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Matthew Wilcox (Oracle)	00276f9f34	bcachefs: Use attach_page_private and detach_page_private These recently added helpers simplify the code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Matthew Wilcox (Oracle)	96fee47e44	bcachefs: Remove page_state_init_for_read This is dead code; delete the function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	13dcd4abcd	bcachefs: Fix rare use after free in read path If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent() reallocates the buffer, k in bch2_read - which we pointed at the bkey_on_stack buffer - will now point to a stale buffer. Whoops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	9ba2eb25f0	bcachefs: Fix __bch2_truncate_page() __bch2_truncate_page() will mark some of the blocks in a page as unallocated. But, if the page is mmapped (and writable), every block in the page needs to be marked dirty, else those blocks won't be written by __bch2_writepage(). The solution is to change those userspace mappings to RO, so that we force bch2_page_mkwrite() to be called again. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	912bdf17a8	bcachefs: Fix short buffered writes In the buffered write path, we have to check for short writes that write to the full page, where the page wasn't UpToDate; when this happens, the page is partly garbage, so we have to zero it out and revert that part of the write. This check was wrong - we reverted total from copied, but didn't revert the iov_iter, probably also leading to corrupted writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	52fbb7c859	bcachefs: Don't cap ios in dio write path at 2 MB It appears this was erronious, a different bug was responsible Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	042a1f268e	bcachefs: Refactor dio write code to reinit bch_write_op This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	36b8372b59	bcachefs: Add an option to disable reflink support Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Yuxuan Shui	22d8a33d30	bcachefs: fix stack corruption When a bkey_on_stack is passed to bch_read_indirect_extent, there is no guarantee that it will be big enough to hold the bkey. And bch_read_indirect_extent is not aware of bkey_on_stack to call realloc on it. This cause a stack corruption. This commit makes bch_read_indirect_extent aware of bkey_on_stack so it can call realloc when appropriate. Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:39 -04:00
Kent Overstreet	f59b346477	bcachefs: Don't issue writes that are more than 1 MB the bcachefs io path in io.c can't bounce writes larger than that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:39 -04:00
Kent Overstreet	283eda5798	bcachefs: Fix fallocate FL_INSERT_RANGE This was another bug because of bch2_btree_iter_set_pos() invalidating iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	286d8ad040	bcachefs: Fix a use after free in dio write path Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	163e885a0a	bcachefs: Kill TRANS_RESET_MEM\|TRANS_RESET_ITERS All iterators should be released now with bch2_trans_iter_put(), so TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is always used. Also convert more code to __bch2_trans_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	24326cd12a	bcachefs: Sort & deduplicate updates in bch2_trans_update() Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	2d594dfb53	bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	58e2388f9e	bcachefs: Kill BTREE_INSERT_ATOMIC Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	b1fd23df1d	bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMIC BTREE_INSERT_ATOMIC should really be the default mode, and there's not that much code that doesn't need it - so this is prep work for getting rid of the flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	a8abd3a7f6	bcachefs: bch2_trans_reset() calls should be at the tops of loops It needs to be called when we get -EINTR due to e.g. lock restart - this fixes a transaction iterators overflow bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	c45d473df7	bcachefs: Fix for an assertion on filesystem error Normally the in memory i_size is always greater than or equal to i_size on disk; this doesn't hold on filesystem error. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	5934a0caf2	bcachefs: bkey_on_stack_reassemble() Small helper function. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	4de774952b	bcachefs: Reorganize extents.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	4be1a412ea	bcachefs: Inline data extents This implements extents that have their data inline, in the value, instead of the bkey value being pointers to the data - and the read and write paths are updated to read from these new extent types and write them out, when the write size is small enough. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	08c07fea7b	bcachefs: Split out extent_update.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	085ab69357	bcachefs: Rework of cut_front & cut_back This changes bch2_cut_front and bch2_cut_back so that they're able to shorten the size of the value, and it also changes the extent update path to update the accounting in the btree node when this happens. When the size of the value is shortened, they zero out the space that's no longer used, so it's interpreted as noops (as implemented in the last patch). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	35189e09ab	bcachefs: bkey_on_stack This implements code for storing small bkeys on the stack and allocating out of a mempool if they're too big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	50fe5bd69c	bcachefs: Use wbc_to_write_flags() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	677fc0562a	bcachefs: Some reflink fixes len might fit into a loff_t when aligned_len does not - make sure we use a u64 for aligned_len. Also, we weren't always extending the inode correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	a023127a28	bcachefs: Eliminate function calls in DIO fastpaths We can assume that usually buffered and O_DIRECT IO won't be mixed, and the calls to flush the page cache won't be needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	54847d253a	bcachefs: DIO write path only needs to shoot down pagecache once, not twice Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	1b783a690d	bcachefs: Add pagecache_add lock to buffered IO path, fault path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	7edcfbfefe	bcachefs: Don't hold inode lock longer than necessary in dio write path In theory we should be able to do (non appending/extending) dio writes without taking the inode lock at all - but this gets us most of the way there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	f8f3086338	bcachefs: Avoid atomics in write fast path This adds some horrible hacks, but the atomic ops for closures were getting to be a pretty expensive part of the write path. We don't want to rip out closures entirely from the write path, because they're used for e.g. waiting on the allocator, or waiting on the journal flush, and that stuff would get really ugly without closures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:31 -04:00
Kent Overstreet	406d6d5a07	bcachefs: Fix an error path race On IO error, bch2_writepages_io_done() will set the page state to indicate nothing's already reserved (since the write didn't happen, we don't know what's already reserved). This can race with the buffered IO path, in between getting a disk reservation and calling bch2_set_page_dirty(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	2a9101a989	bcachefs: Refactor bch2_trans_commit() path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	a94407434b	bcachefs: Limit bios in writepages path to 256M This works around a bug where bio_full() doesn't check for bio->bi_iter.bi_size overflowing - and, we don't really want to build bios that are that big anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	9a3df993e1	bcachefs: Kill bchfs_extent_update() The generic IO path now handles inode updates for i_size and i_sectors - this means we can drop a fair amount of code from fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	2e87eae1fb	bcachefs: Convert bch2_fpunch to bch2_extent_update() As before - we're moving non Linux specific code out of fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	2925fc49b3	bcachefs: Split out bchfs_extent_update() The next few patches are going to be more moving the logic around i_size/i_sectors updates to io.c, and better separating the Linux VFS specific code from core bcachefs code, to better support the fuse port. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	e0541a9346	bcachefs: Kill some dependencies on ei_inode Moving bch2_extent_update() to io.c will be greatly simplified if we no longer have to keep ei_inode.bi_size/bi_sectors up to date. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	daf3fe502a	bcachefs: Check if extending inode differently In bch2_extent_update(), we have to update the inode if i_size is changing (the file is being extend) or if i_sectors is changing, but we want to avoid touching the inode if it's not necessary. Change sum_sector_overwrites() to also check if there's already data above where we're writing to - this means we're definitely not extending the file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	3826ee0b17	bcachefs: Add a lock to bch_page_state We can't use the page lock to protect it, because on writeback IO error we need to access the page state before calling end_page_writeback() and the page lock semantics are completely insane so that deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	137b0ed907	bcachefs: bch2_extent_atomic_end() now traverses iter This fixes a bug in io.c bch2_write_index_default() - it was missing the traverse call, but bch2_extent_atomic_end returns an error now and can just call it itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:28 -04:00
Kent Overstreet	58677a1d40	bcachefs: bch2_inode_peek()/bch2_inode_write() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:28 -04:00

1 2 3

111 Commits