Commit Graph

125 Commits

Author SHA1 Message Date
Linus Torvalds
3f6984e730 vfs-6.8.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc
 osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1
 /wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc=
 =iFAQ
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs super updates from Christian Brauner:
 "This contains the super work for this cycle including the long-awaited
  series by Jan to make it possible to prevent writing to mounted block
  devices:

   - Writing to mounted devices is dangerous and can lead to filesystem
     corruption as well as crashes. Furthermore syzbot comes with more
     and more involved examples how to corrupt block device under a
     mounted filesystem leading to kernel crashes and reports we can do
     nothing about. Add tracking of writers to each block device and a
     kernel cmdline argument which controls whether other writeable
     opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
     allowed.

     Note that this effectively only prevents modification of the
     particular block device's page cache by other writers. The actual
     device content can still be modified by other means - e.g. by
     issuing direct scsi commands, by doing writes through devices lower
     in the storage stack (e.g. in case loop devices, DM, or MD are
     involved) etc. But blocking direct modifications of the block
     device page cache is enough to give filesystems a chance to perform
     data validation when loading data from the underlying storage and
     thus prevent kernel crashes.

     Syzbot can use this cmdline argument option to avoid uninteresting
     crashes. Also users whose userspace setup does not need writing to
     mounted block devices can set this option for hardening. We expect
     that this will be interesting to quite a few workloads.

     Btrfs is currently opted out of this because they still haven't
     merged patches we require for this to work from three kernel
     releases ago.

   - Reimplement block device freezing and thawing as holder operations
     on the block device.

     This allows us to extend block device freezing to all devices
     associated with a superblock and not just the main device. It also
     allows us to remove get_active_super() and thus another function
     that scans the global list of superblocks.

     Freezing via additional block devices only works if the filesystem
     chooses to use @fs_holder_ops for these additional devices as well.
     That currently only includes ext4 and xfs.

     Earlier releases switched get_tree_bdev() and mount_bdev() to use
     @fs_holder_ops. The remaining nilfs2 open-coded version of
     mount_bdev() has been converted to rely on @fs_holder_ops as well.
     So block device freezing for the main block device will continue to
     work as before.

     There should be no regressions in functionality. The only special
     case is btrfs where block device freezing for the main block device
     never worked because sb->s_bdev isn't set. Block device freezing
     for btrfs can be fixed once they can switch to @fs_holder_ops but
     that can happen whenever they're ready"

* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  block: Fix a memory leak in bdev_open_by_dev()
  super: don't bother with WARN_ON_ONCE()
  super: massage wait event mechanism
  ext4: Block writes to journal device
  xfs: Block writes to log device
  fs: Block writes to mounted block devices
  btrfs: Do not restrict writes to btrfs devices
  block: Add config option to not allow writing to mounted devices
  block: Remove blkdev_get_by_*() functions
  bcachefs: Convert to bdev_open_by_path()
  fs: handle freezing from multiple devices
  fs: remove dead check
  nilfs2: simplify device handling
  fs: streamline thaw_super_locked
  ext4: simplify device handling
  xfs: simplify device handling
  fs: simplify setup_bdev_super() calls
  blkdev: comment fs_holder_ops
  porting: document block device freeze and thaw changes
  fs: remove unused helper
  ...
2024-01-08 10:43:51 -08:00
Kent Overstreet
84f1638795 bcachefs: bch_sb_field_downgrade
Add a new superblock section that contains a list of
  { minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
8b16413cda bcachefs: bch_sb.recovery_passes_required
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.

 - recovery_passes_requried: recovery passes that must be run on the
   next mount
 - errors_silent: errors that will be silently fixed

These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Brian Foster
63807d9518 bcachefs: preserve device path as device name
Various userspace scripts/tools may expect mount entries in
/proc/mounts to reflect the device path names used to mount the
associated filesystem. bcachefs seems to normalize the device path
to the underlying device name based on the block device. This
confuses tools like fstests when the test devices might be lvm or
device-mapper based.

The default behavior for show_vfsmnt() appers to be to use the
string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path
at device superblock read time and to display it via
->show_devname().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24 02:42:07 -05:00
Jan Kara
1bfdc94b28
bcachefs: Convert to bdev_open_by_path()
Convert bcachefs to use bdev_open_by_path() and pass the handle around.

CC: Kent Overstreet <kent.overstreet@linux.dev>
CC: Brian Foster <bfoster@redhat.com>
CC: <linux-bcachefs@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231101174325.10596-1-jack@suse.cz
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-18 14:59:25 +01:00
Kent Overstreet
59154f2c66 bcachefs: bch2_prt_datetime()
Improved, better named version of pr_time().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05 13:12:08 -05:00
Kent Overstreet
01ccee225a bcachefs: Add missing printk newlines
This was causing error messages in -tools to not get printed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 22:19:13 -04:00
Kent Overstreet
f5d26fa31e bcachefs: bch_sb_field_errors
Add a new superblock section to keep counts of errors seen since
filesystem creation: we'll be addingcounters for every distinct fsck
error.

The new superblock section has entries of the for [ id, count,
time_of_last_error ]; this is intended to let us see what errors are
occuring - and getting fixed - via show-super output.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
94119eeb02 bcachefs: Add IO error counts to bch_member
We now track IO errors per device since filesystem creation.

IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
4637429e39 bcachefs: bch2_sb_field_get() refactoring
Instead of using token pasting to generate methods for each superblock
section, just make the type a parameter to bch2_sb_field_get().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Hunter Shaffer
9af26120f0 bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Hunter Shaffer
3f7b9713da bcachefs: New superblock section members_v2
members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing Instead members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively member_v2_get_mut retrieves a mutable point to a member.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Hunter Shaffer
1241df5872 bcachefs: Add new helper to retrieve bch_member from sb
Prep work for introducing bch_sb_field_members_v2 - introduce new
helpers that will check for members_v2 if it exists, otherwise using v1

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Kent Overstreet
793a06d984 bcachefs: Fixes for building in userspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:15 -04:00
Kent Overstreet
40a53b9215 bcachefs: More minor smatch fixes
- fix a few uninitialized return values
 - return a proper error code in lookup_lostfound()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:14 -04:00
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Colin Ian King
6bf3766b52 bcachefs: Fix a handful of spelling mistakes in various messages
There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
a37ad1a3ab bcachefs: sb-clean.c
Pull code for bch_sb_field_clean out into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
0ec3985694 bcachefs: Move bch_sb_field_crypt code to checksum.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
fb8e5b4cae bcachefs: sb-members.c
Split out a new file for bch_sb_field_members - we'll likely want to
move more code here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
1e81f89b02 bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
6fe893eade bcachefs: Fix for sb buffer being misaligned
On old kernels, kmalloc() may return an allocation that's not naturally
aligned - this resulted in a bug where we allocated a bio with not
enough biovecs. Fix this by using buf_pages().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
bf5a261c7a bcachefs: Assorted fixes for clang
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Kent Overstreet
813e0cecd1 bcachefs: Upgrade path fixes
Some minor fixes to not print errors that are actually due to a verson
upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet
f39d1aca4d bcachefs: Add buffered IO fallback for userspace
In userspace, we want to be able to switch to buffered IO when we're
dealing with an image on a filesystem/device that doesn't support the
blocksize the filesystem was formatted with.

This plumbs through !opts.direct_io -> FMODE_BUFFERED, which will be
supported by the shim version of blkdev_get_by_path() in -tools, and it
adds a fallback to disable direct IO and retry for userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet
b912913613 bcachefs: Fix build error on weird gcc
fixes
./include/linux/stddef.h:8:14: error: positional initialization of field in ‘struct’ declared with ‘designated_init’ attribute [-Werror=designated-init]

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet
065bd3356c bcachefs: Version table now lists required recovery passes
Now that we've got forward compatibility sorted out, we should be doing
more frequent version upgrades in the future.

To avoid having to run a full fsck for every version upgrade, this
improves the BCH_METADATA_VERSIONS() table to explicitly specify a
bitmask of recovery passes to run when upgrading to or past a given
version.

This means we can also delete PASS_UPGRADE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
6619d84626 bcachefs: bch2_sb_maybe_downgrade(), bch2_sb_upgrade()
Add some new helpers, and fix upgrade/downgrade in bch2_fs_initialize().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
ba8eeae8ee bcachefs: bcachefs_metadata_version_major_minor
This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
3045bb958a bcachefs: version_upgrade is now an enum
The version_upgrade parameter is now an enum, not a bool, and it's
persistent in the superblock:
 - compatible (default):	upgrade to the latest compatible version
 - incompatible:		upgrade to latest incompatible version
 - none

Currently all upgrades are incompatible upgrades, but the next release
will introduce major:minor versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
24964e1c5c bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()
Version upgrades are not atomic operations: when we do a version upgrade
we need to update the superblock before we start using new features, and
then when the upgrade completes we need to update the superblock again.
This adds a new superblock field so we can detect and handle incomplete
version upgrades.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
c8b4534d82 bcachefs: Delete redundant log messages
Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
236b68da50 bcachefs: Refactor bch_sb_field_ops handling
This changes bch_sb_field_ops lookup to match how bkey_ops now works;
for an unknown field type we return an empty ops struct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
a02a0121b3 bcachefs: bch2_version_compatible()
This adds a new helper for checking if an on-disk version is compatible
with the running version of bcachefs - prep work for introducing
major:minor version numbers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
e3804b55e4 bcachefs: bch2_version_to_text()
Add a new helper for printing out metadata versions in a standard
format.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
65d48e3525 bcachefs: Private error codes: ENOMEM
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
4bd4035e64 bcachefs: Handle sb buffer resizing in __copy_super()
This fixes a rare buffer overrun when one field is growing and another
field is shrinking - and is a nice simplification as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:53 -04:00
Kent Overstreet
858536c7ce bcachefs: Convert EROFS errors to private error codes
More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
78c0b75c34 bcachefs: More errcode cleanup
We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
e153821259 bcachefs: New magic number
Add a new bcachefs-specific magic number for the superblock, instead of
continuing to use the old bcache magic number3

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:47 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
c23a9e0882 bcachefs: Improve jset_validate()
Previously, jset_validate() was formatting the initial part of an error
string for every entry it validating - expensive.

This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
098ef98d5b bcachefs: Add private error codes for ENOSPC
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
5a82c7c7d1 bcachefs: Fix sb_field_counters formatting
We have counters with longer names now, so adjust the tabstop - also,
make sure there's always a space printed between the name and the
number.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
674cfc2624 bcachefs: Add persistent counters for all tracepoints
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Daniel Hill
a8dea22703 bcachefs: Rename group to label for remaining strings.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Daniel Hill
104c69745f bcachefs: Add persistent counters
This adds a new superblock field for persisting counters
and adds a sysfs interface in counters/ exposing these counters.

The superblock field is ignored by older versions letting us avoid
an on disk version bump.

Each sysfs file outputs a counter that tracks since filesystem
creation and a counter for the current mount session.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:32 -04:00
Kent Overstreet
822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00