Commit Graph

88097 Commits

Author SHA1 Message Date
Kent Overstreet
d05db12715 bcachefs: Print durability in member_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
7541787f58 bcachefs: Improve sysfs compression_stats
Break it out by compression type, and include average extent size.

Also, format into a nice table.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
9b34f02cdc bcachefs: Kill dev_usage->buckets_ec
This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
ed0cd515cd bcachefs: bch2_dev_usage_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
dafff7e575 bcachefs: New bucket sector count helpers
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
e6674decb2 bcachefs: BCH_IOCTL_DEV_USAGE_V2
BCH_IOCTL_DEV_USAGE mistakenly put the per-data-type array in struct
bch_ioctl_dev_usage; since ioctl numbers encode the size of the arg,
that means adding new data types breaks the ioctl.

This adds a new version that includes the number of data types as a
parameter: the old version is fixed at 10 so as to not break when adding
new types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
3b05b8e082 bcachefs: Simplify check_bucket_ref()
We only need the sector count being modified.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
011173321f bcachefs: six locks: Simplify optimistic spinning
osq lock maintainers don't want it to be used outside of kernel/locking/
- but, we can do better.

Since we have lock handoff signalled via waitlist entries, there's no
reason for optimistic spinning to have to look at the lock at all -
aside from checking lock-owner; we can just spin looking at our waitlist
entry.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
ba11c7d67a bcachefs: BCH_DATA_OP_drop_extra_replicas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
3c843a6759 bcachefs: Convert bch2_move_btree() to bbpos
Minor cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
01e9564540 bcachefs: x-macro-ify bch_data_ops enum
This will let us add an enum -> string table for a to_text() fn.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Yang Li
225879f403 bcachefs: clean up one inconsistent indenting
fs/bcachefs/journal_io.c:1843 bch2_journal_write_pick_flush() warn: inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7585
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Daniel Hill
2b161cc7cb bcachefs: add a quieter bch2_read_super
If we're looking for a bcachefs supers iteratively we don't want to see
this error.

This function replaces KERN_ERR with KERN_INFO for when we don't find a
bcachefs superblock but preserves other errors.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
25f64e997e bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()
bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
086a52f7fa bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1
Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
ad9c7992eb bcachefs: Kill btree_iter->journal_pos
For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to
avoid the binary search overhead.

Previously we stashed the pos of the last key returned from the journal,
in order to force the lookup to be redone when rewinding.

Now bch2_journal_keys_peek_upto() handles rewinding itself when
necessary - so we can slim down btree_iter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
1ae8a0904a bcachefs: Kill memset() in bch2_btree_iter_init()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
ae0e61175e bcachefs: Add a tracepoint for journal entry close
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
b27d7afb79 bcachefs: Don't flush journal after replay
The flush_all_pins() after journal replay was unecessary, and trying to
completely flush the journal while RW is not a great idea - it's not
guaranteed to terminate if other threads keep adding things to the
jorunal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
b4b79b0764 bcachefs: Don't rejournal keys in key cache flush
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
5fd24caf57 bcachefs: Fix userspace bch2_prt_datetime()
ctime_r() outputs a newline, which we don't want.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
e56978c80d bcachefs: Kill BTREE_ITER_ALL_LEVELS
As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be
racy with concurrent interior node updates - and perhaps it is fixable,
but it's tricky and unnecessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
cd404e5b05 bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELS
It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior
node btree updates; unfortunate but not terribly surprising it's a
difficult problem - that was the original reason for gc_lock.

BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch,
this changes backpointers fsck to instead walk keys at one level of the
btree at a time.

This fixes the tiering_drop_alloc test, which stopped working with the
patch to not flush the journal after journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
eb54e81f27 bcachefs: Improve btree_path_dowgrade tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
5927310dcf bcachefs: bch_str_hash_flags_t
Create a separate enum for str_hash flags - instead of abusing the
btree_insert_flags enum - and create a __bitwise typedef for sparse
typechecking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
aa62aabbc7 bcachefs: Kill dead BTREE_INSERT flags
BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used,
and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
cd5bd16282 bcachefs: Fix redundant variable initialization
path->level was being read, but never used.

Reported-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
e17b93eb36 bcachefs: Avoiding dropping/retaking write locks in bch2_btree_write_buffer_flush_one()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
573224301c bcachefs: Make journal replay more efficient
Journal replay now first attempts to replay keys in sorted order,
similar to how the btree write buffer flush path works.

Any keys that can not be replayed due to journal deadlock are then left
for later and replayed in journal order, unpinning journal entries as we
go.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
bdde9829de bcachefs: Go rw before journal replay
This gets us slightly nicer log messages.

Also, this slightly clarifies synchronization of c->journal_keys; after
we go RW it's in use by multiple threads (so that the btree iterator
code can overlay keys from the journal); so it has to be prepped before
that point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
43c7ede009 bcachefs: Kill BTREE_UPDATE_PREJOURNAL
With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can
now switch the btree write buffer to use it for flushing.

This has the advantage that transaction commits don't need to take a
journal reservation at all.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
9a71de675f bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"
This slightly changes how trans->journal_res works, in preparation for
changing the btree write buffer flush path to use it.

Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal
reservation; trans->journal_res.seq already refers to the journal
sequence number to pin".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
389c92b36e bcachefs: Clear k->needs_whitout earlier in commit path
The upcoming btree write buffer rework is going to use the journal
itself as the first stage of the write buffer; this is a cleanup to make
sure k->needs_whiteout is initialized before keys hit the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
066a26460b bcachefs: track_event_change()
This introduces a new helper for connecting time_stats to state changes,
i.e. when taking journal reservations is blocked for some reason.

We use this to track separately the different reasons the journal might
be blocked - i.e. space in the journal full, or the journal pin fifo
full.

Also do some cleanup and improvements on the time stats code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
3eedfe1af9 bcachefs: Journal pins must always have a flush_fn
flush_fn is how we identify journal pins in debugfs - this is a
debugging aid.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
df8e13ccf3 bcachefs: Add an assertion in bch2_journal_pin_set()
Previously, bch2_journal_pin_set() would silently ignore a request to
pin a journal sequence number that was no longer dirty, because it was
used internally by bch2_journal_pin_copy() which could race with the src
pin being flushed.

Split these apart so that we can properly assert that @seq is a
currently dirty journal sequence number - this is almost always a bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
fa5df9e7d5 bcachefs: Include average write size in sysfs journal_debug
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
09e0153b72 bcachefs: Fix warning when building in userspace
bch_err() doesn't reference the fs arg in userspace

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
fbf9270817 bcachefs: Print old version when scanning for old metadata
Also: we should be using bch2_fs_read_write_early()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
7d9ae04e39 bcachefs: Fix locking when checking freespace btree
On transaction restart, we weren't re-validating the hole we saw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
359d1bad1b bcachefs: Check for unlinked inodes not on deleted list
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
ecf8a74dab bcachefs: kill INODE_LOCK, use lock_two_nondirectories()
In an ideal world, we'd have a common helper that could be used for
sorting a list of inodes into the correct lock order, and then the same
lock ordering could be used for any type of inode lock, not just
i_rwsem.

But the lock ordering rules for i_rwsem are a bit complicated, so -
abandon that dream for now and do it the more standard way.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
8b58623f5b bcachefs: Improved backpointer messages in fsck
When we have a key to print, we should print it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
e7f7ddedd6 bcachefs: Add extra verbose logging for ro path
Also log time waiting for c->writes references to be dropped; this will
help in debugging why unmounts are taking longer than they should.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
30418de09e bcachefs: Flush fsck errors before running twice
It's confusing if we run fsck a second time (in debug mode, to verify
the second run is clean), but errors are still ratelimited from the
first run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:36 -05:00
Kent Overstreet
0d72ab35a9 bcachefs: make RO snapshots actually RO
Add checks to all the VFS paths for "are we in a RO snapshot?".

Note - we don't check this when setting inode options via our xattr
interface, since those generally only affect data placement, not
contents of data.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01 11:47:07 -05:00
Kent Overstreet
84f1638795 bcachefs: bch_sb_field_downgrade
Add a new superblock section that contains a list of
  { minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
8b16413cda bcachefs: bch_sb.recovery_passes_required
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.

 - recovery_passes_requried: recovery passes that must be run on the
   next mount
 - errors_silent: errors that will be silently fixed

These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
808c680f2a bcachefs: Add persistent identifiers for recovery passes
The next patch will start to refer to recovery passes from the
superblock; naturally, we now need identifiers that don't change, since
the existing enum is in the order in which they are run and is not
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
560661d4ae bcachefs: prt_bitflags_vector()
similar to prt_bitflags(), but for ulong arrays

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
6b49b0f7e7 bcachefs: move BCH_SB_ERRS() to sb-errors_types.h
we need BCH_SB_ERR_MAX in bcachefs.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:07 -05:00
Kent Overstreet
d9534cc9fc bcachefs: fix buffer overflow in nocow write path
BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an
extent, it's the maximum number of dirty pointers.

We don't have a real restriction on the number of cached pointers, and
we don't want a fixed size array here anyways - so switch to
DARRAY_PREALLOCATED().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
2024-01-01 11:46:52 -05:00
Kent Overstreet
099dc5c29d bcachefs: DARRAY_PREALLOCATED()
Add support to darray for preallocating some number of elements.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:46:52 -05:00
Kent Overstreet
a58a6a58f5 bcachefs: Switch darray to kvmalloc()
We sometimes use darrays for quite large buffers - the btree write
buffer in particular needs large buffers, since it must be sized to hold
all the write buffer keys outstanding in the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:46:52 -05:00
Kent Overstreet
73ab9e0386 bcachefs: Factor out darray resize slowpath
Move the slowpath (actually growing the darray) to an out-of-line
function; also, add some helpers for the upcoming btree write buffer
rewrite.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:46:52 -05:00
Kent Overstreet
f87bf892ea bcachefs: fix setting version_upgrade_complete
If a superblock write hasn't happened (i.e. we never had to go rw), then
c->sb.version will be out of date w.r.t. c->disk_sb.sb->version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:46:52 -05:00
Kent Overstreet
f2eb8434e4 bcachefs: fix invalid free in dio write path
turns out iterate_iovec() mutates __iov, we need to save our own copy

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Marcin Mirosław <marcin@mejor.pl>
2024-01-01 11:43:03 -05:00
Kent Overstreet
fa014953f9 bcachefs: Fix extents iteration + snapshots interaction
peek_upto() checks against the end position and bails out before
FILTER_SNAPSHOTS checks; this is because if we end up at a different
inode number than the original search key none of the keys we see might
be visibile in the current snapshot - we might be looking at inode in a
completely different subvolume.

But this is broken, because when we're iterating over extents we're
checking against the extent start position to decide when to bail out,
and the extent start position isn't monotonically increasing until after
we've run FILTER_SNAPSHOTS.

Fix this by adding a simple inode number check where the old bailout
check was, and moving the main check to the correct position.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01 11:42:39 -05:00
Linus Torvalds
453f5db061 tracing fixes for v6.7-rc7:
- Fix readers that are blocked on the ring buffer when buffer_percent is
   100%. They are supposed to wake up when the buffer is full, but
   because the sub-buffer that the writer is on is never considered
   "dirty" in the calculation, dirty pages will never equal nr_pages.
   Add +1 to the dirty count in order to count for the sub-buffer that
   the writer is on.
 
 - When a reader is blocked on the "snapshot_raw" file, it is to be
   woken up when a snapshot is done and be able to read the snapshot
   buffer. But because the snapshot swaps the buffers (the main one
   with the snapshot one), and the snapshot reader is waiting on the
   old snapshot buffer, it was not woken up (because it is now on
   the main buffer after the swap). Worse yet, when it reads the buffer
   after a snapshot, it's not reading the snapshot buffer, it's reading
   the live active main buffer.
 
   Fix this by forcing a wakeup of all readers on the snapshot buffer when
   a new snapshot happens, and then update the buffer that the reader
   is reading to be back on the snapshot buffer.
 
 - Fix the modification of the direct_function hash. There was a race
   when new functions were added to the direct_function hash as when
   it moved function entries from the old hash to the new one, a direct
   function trace could be hit and not see its entry.
 
   This is fixed by allocating the new hash, copy all the old entries
   onto it as well as the new entries, and then use rcu_assign_pointer()
   to update the new direct_function hash with it.
 
   This also fixes a memory leak in that code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZZAzTxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qs9IAP9e6wZ74aEjMED9nsbC49EpyCNTqa72
 y0uDS/p9ppv52gD7Be+l+kJQzYNh6bZU0+B19hNC2QVn38jb7sOadfO/1Q8=
 =NDkf
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix readers that are blocked on the ring buffer when buffer_percent
   is 100%. They are supposed to wake up when the buffer is full, but
   because the sub-buffer that the writer is on is never considered
   "dirty" in the calculation, dirty pages will never equal nr_pages.
   Add +1 to the dirty count in order to count for the sub-buffer that
   the writer is on.

 - When a reader is blocked on the "snapshot_raw" file, it is to be
   woken up when a snapshot is done and be able to read the snapshot
   buffer. But because the snapshot swaps the buffers (the main one with
   the snapshot one), and the snapshot reader is waiting on the old
   snapshot buffer, it was not woken up (because it is now on the main
   buffer after the swap). Worse yet, when it reads the buffer after a
   snapshot, it's not reading the snapshot buffer, it's reading the live
   active main buffer.

   Fix this by forcing a wakeup of all readers on the snapshot buffer
   when a new snapshot happens, and then update the buffer that the
   reader is reading to be back on the snapshot buffer.

 - Fix the modification of the direct_function hash. There was a race
   when new functions were added to the direct_function hash as when it
   moved function entries from the old hash to the new one, a direct
   function trace could be hit and not see its entry.

   This is fixed by allocating the new hash, copy all the old entries
   onto it as well as the new entries, and then use rcu_assign_pointer()
   to update the new direct_function hash with it.

   This also fixes a memory leak in that code.

 - Fix eventfs ownership

* tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Fix modification of direct_function hash while in use
  tracing: Fix blocked reader of snapshot buffer
  ring-buffer: Fix wake ups when buffer_percent is set to 100
  eventfs: Fix file and directory uid and gid ownership
2023-12-30 11:37:35 -08:00
Linus Torvalds
8735c7c84d ksmbd server fix, also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWM/+gACgkQiiy9cAdy
 T1H6NAv/a/hJO2dQB8/7Ts7Y3XTDP0hqvJEu2RRx0LuK5ERSTr7zEUgDP0W8HYpO
 IHvlOnOCa3n68EJGVa9OolYRWgAIX2qYXrGx4+iEWJaLIDsdPgyZdfU0YgwjdmgD
 xHjadidNfmrahfc/r6s8C9HBE9J/LVpFyHnCmasAw7V4p+7mBL/Ycwkm3BmbX0Sp
 /gSX5ITOPZdGJl97OS7elkA7t2ABLaJXeHPKCHgKYiwx9TbpH75WxM2Kpfsk2LaO
 yaEWW/JTgr+Vg+EYUTumKGONhKpgx+rgqIGiZYwWu42aEXp+53JM8f67FWihDvL5
 2t4uei8GaeuXRrYTSq/ylfy10V9BOcraf3SV/pJ6C9NvAWUoGiVKRFj62BjZSxJp
 E8Pdj4wy97hFa9gaEyMg81w02CvGe19R1ikLzSXA8GJLnlg+rBLzEt/1g0gQ6YI/
 vj8Ett6m5ezkW1Q7V3yCvjDN9IrYV0q138ZBtcjJm9P9EWHXzpDKXoUVcaXGNw31
 gM5eRPNE
 =/4UC
 -----END PGP SIGNATURE-----

Merge tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbd

Pull ksmbd server fix from Steve French:

 - address possible slab out of bounds in parsing of open requests

* tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbd:
  ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
2023-12-28 16:12:23 -08:00
Linus Torvalds
eeec259963 More bcachefs bugfixes for 6.7:
Just a few fixes: besides a few one liners, we have a fix for snapshots
 + compression where the extent update path didn't account for the fact
 that with snapshots, we might split an existing extent into three, not
 just two; and a small fixup for promotes which were broken by the recent
 changes in the data update path to correctly take into account device
 durability.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWMU08ACgkQE6szbY3K
 bnZ22Q//dTTPeGkHDIT0lk/8Jbn6zcIh2nvrstR2BUBLlrFzeKJEL1nhMBnhbcxZ
 99NcjKPirumINeOyH9c0ZzRn+Q/Mr799yOQ1FgmGyYcmP2DkxK/3eiTzb/lVD6PW
 MKnHantKCDkaBptRl9pTzHlSir7IbyzRejTwrLbO+zSZxlEuticfaGhfnDBGeVnK
 MZIwnYOFWs9mIzGaYMTBXjQD0/Q38Z+mPR+1kgcz/cAyoYSblRoWiZ3qfy1LMpRB
 FVfKK2xXXa0z1Z3NAbtoiNmIvZOlwWNqApY5jIDX5EVKmQahXxxnR+kcFp545bd+
 Lp78ULJhSDNxs1XZZM44lUV52ZPoVfqz6SXs2hs3kxFcdFdWtpMpfWLjqm4sZtvK
 9gnXckurLtHVhdQdSbQ1n1iGVdx1qgs9znAjlvSzQEKr+ba3RzdUABsRvGjnqtgy
 V5xBv02Xa/gF5Uk7YonDqm3/aY6QhhU4bDlGlRzveZOm8ktwElqrkeE54FveuEy1
 9IasPoUNVMu6urDO4xr7LpY3s3u06TnNHhj0TKllqY1mWZoj4COr+IwizUyff0Nz
 W1s0wKaSOX4eLcntJLmfzZ4429JTyR2AJbA1j3VnxGF3jaLhvyeSD1pstBqVicSJ
 wXg5PHN/XM1AQlEcpNv2pSrt2u//9+UOPnk9H18EY0tXvo/B91g=
 =JTkX
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Just a few fixes: besides a few one liners, we have a fix for
  snapshots + compression where the extent update path didn't account
  for the fact that with snapshots, we might split an existing extent
  into three, not just two; and a small fixup for promotes which were
  broken by the recent changes in the data update path to correctly take
  into account device durability"

* tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix promotes
  bcachefs: Fix leakage of internal error code
  bcachefs: Fix insufficient disk reservation with compression + snapshots
  bcachefs: fix BCH_FSCK_ERR enum
2023-12-28 11:55:20 -08:00
Namjae Jeon
d10c77873b ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
If ->NameOffset/Length is bigger than ->CreateContextsOffset/Length,
ksmbd_check_message doesn't validate request buffer it correctly.
So slab-out-of-bounds warning from calling smb_strndup_from_utf16()
in smb2_open() could happen. If ->NameLength is non-zero, Set the larger
of the two sums (Name and CreateContext size) as the offset and length of
the data area.

Reported-by: Yang Chaoming <lometsj@live.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-27 22:55:36 -06:00
Kent Overstreet
7b474c77da bcachefs: Fix promotes
The recent work to fix data moves w.r.t. durability broke promotes,
because the caused us to bail out when the extent minus pointers being
dropped still has enough pointers to satisfy the current number of
replicas.

Disable this check when we're adding cached replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-26 19:31:11 -05:00
Linus Torvalds
a0652eb205 Char/Misc driver fixes for 6.7-rc7
Here are a small number of various driver fixes for 6.7-rc7 that
 normally come through the char-misc tree, and one debugfs fix as well.
 Included in here are:
   - iio and hid sensor driver fixes for a number of small things
   - interconnect driver fixes
   - brcm_nvmem driver fixes
   - debugfs fix for previous fix
   - guard() definition in device.h so that many subsystems can start
     using it for 6.8-rc1 (requested by Dan Williams to make future
     merges easier.)
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZYapuQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yljzgCbBkgtY/CpJJLz2VWcibJ5QiYougsAoK7vQKcX
 7gJbm3CB3gWjHqx1eKAu
 =Wf96
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver fixes from Greg KH:
 "Here are a small number of various driver fixes for 6.7-rc7 that
  normally come through the char-misc tree, and one debugfs fix as well.

  Included in here are:

   - iio and hid sensor driver fixes for a number of small things

   - interconnect driver fixes

   - brcm_nvmem driver fixes

   - debugfs fix for previous fix

   - guard() definition in device.h so that many subsystems can start
     using it for 6.8-rc1 (requested by Dan Williams to make future
     merges easier)

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  debugfs: initialize cancellations earlier
  Revert "iio: hid-sensor-als: Add light color temperature support"
  Revert "iio: hid-sensor-als: Add light chromaticity support"
  nvmem: brcm_nvram: store a copy of NVRAM content
  dt-bindings: nvmem: mxs-ocotp: Document fsl,ocotp
  driver core: Add a guard() definition for the device_lock()
  interconnect: qcom: icc-rpm: Fix peak rate calculation
  iio: adc: MCP3564: fix hardware identification logic
  iio: adc: MCP3564: fix calib_bias and calib_scale range checks
  iio: adc: meson: add separate config for axg SoC family
  iio: adc: imx93: add four channels for imx93 adc
  iio: adc: ti_am335x_adc: Fix return value check of tiadc_request_dma()
  interconnect: qcom: sm8250: Enable sync_state
  iio: triggered-buffer: prevent possible freeing of wrong buffer
  iio: imu: inv_mpu6050: fix an error code problem in inv_mpu6050_read_raw
  iio: imu: adis16475: use bit numbers in assign_bit()
  iio: imu: adis16475: add spi_device_id table
  iio: tmag5273: fix temperature offset
  interconnect: Treat xlate() returning NULL node as an error
  iio: common: ms_sensors: ms_sensors_i2c: fix humidity conversion time table
  ...
2023-12-23 11:29:12 -08:00
Steven Rostedt (Google)
7e8358edf5 eventfs: Fix file and directory uid and gid ownership
It was reported that when mounting the tracefs file system with a gid
other than root, the ownership did not carry down to the eventfs directory
due to the dynamic nature of it.

A fix was done to solve this, but it had two issues.

(a) if the attr passed into update_inode_attr() was NULL, it didn't do
    anything. This is true for files that have not had a chown or chgrp
    done to itself or any of its sibling files, as the attr is allocated
    for all children when any one needs it.

 # umount /sys/kernel/tracing
 # mount -o rw,seclabel,relatime,gid=1000 -t tracefs nodev /mnt

 # ls -ld /mnt/events/sched
drwxr-xr-x 28 root rostedt 0 Dec 21 13:12 /mnt/events/sched/

 # ls -ld /mnt/events/sched/sched_switch
drwxr-xr-x 2 root rostedt 0 Dec 21 13:12 /mnt/events/sched/sched_switch/

But when checking the files:

 # ls -l /mnt/events/sched/sched_switch
total 0
-rw-r----- 1 root root 0 Dec 21 13:12 enable
-rw-r----- 1 root root 0 Dec 21 13:12 filter
-r--r----- 1 root root 0 Dec 21 13:12 format
-r--r----- 1 root root 0 Dec 21 13:12 hist
-r--r----- 1 root root 0 Dec 21 13:12 id
-rw-r----- 1 root root 0 Dec 21 13:12 trigger

(b) When the attr does not denote the UID or GID, it defaulted to using
    the parent uid or gid. This is incorrect as changing the parent
    uid or gid will automatically change all its children.

 # chgrp tracing /mnt/events/timer

 # ls -ld /mnt/events/timer
drwxr-xr-x 2 root tracing 0 Dec 21 14:34 /mnt/events/timer

 # ls -l /mnt/events/timer
total 0
-rw-r----- 1 root root    0 Dec 21 14:35 enable
-rw-r----- 1 root root    0 Dec 21 14:35 filter
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 hrtimer_start
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_expire
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 itimer_state
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 tick_stop
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_cancel
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_entry
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_expire_exit
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_init
drwxr-xr-x 2 root tracing 0 Dec 21 14:35 timer_start

At first it was thought that this could be easily fixed by just making the
default ownership of the superblock when it was mounted. But this does not
handle the case of:

 # chgrp tracing instances
 # mkdir instances/foo

If the superblock was used, then the group ownership would be that of what
it was when it was mounted, when it should instead be "tracing".

Instead, set a flag for the top level eventfs directory ("events") to flag
which eventfs_inode belongs to it.

Since the "events" directory's dentry and inode are never freed, it does
not need to use its attr field to restore its mode and ownership. Use the
this eventfs_inode's attr as the default ownership for all the files and
directories underneath it.

When the events eventfs_inode is created, it sets its ownership to its
parent uid and gid. As the events directory is created at boot up before
it gets mounted, this will always be uid=0 and gid=0. If it's created via
an instance, then it will take the ownership of the instance directory.

When the file system is mounted, it will update all the gids if one is
specified. This will have a callback to update the events evenfs_inode's
default entries.

When a file or directory is created under the events directory, it will
walk the ei->dentry parents until it finds the evenfs_inode that belongs
to the events directory to retrieve the default uid and gid values.

Link: https://lore.kernel.org/all/CAHk-=wiwQtUHvzwyZucDq8=Gtw+AnwScyLhpFswrQ84PjhoGsg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231221190757.7eddbca9@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dongliang Cui <cuidongliang390@gmail.com>
Cc: Hongyu Jin  <hongyu.jin@unisoc.com>
Fixes: 0dfc852b6f ("eventfs: Have event files and directories default to parent uid and gid")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-22 08:13:55 -05:00
Johannes Berg
159f5bdadc debugfs: initialize cancellations earlier
Tetsuo Handa pointed out that in the (now reverted)
lockdep commit I initialized the data too late. The
same is true for the cancellation data, it must be
initialized before the cmpxchg(), otherwise it may
be done twice and possibly even overwriting data in
there already when there's a race. Fix that, which
also requires destroying the mutex in case we lost
the race.

Fixes: 8c88a47435 ("debugfs: add API to allow debugfs operations cancellation")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20231221150444.1e47a0377f80.If7e8ba721ba2956f12c6e8405e7d61e154aa7ae7@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-22 07:33:02 +01:00
Kent Overstreet
c8296d730f bcachefs: Fix leakage of internal error code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-21 23:46:52 -05:00
Kent Overstreet
01db5e5f2f bcachefs: Fix insufficient disk reservation with compression + snapshots
When overwriting and splitting existing extents, we weren't correctly
accounting for a 3 way split of a compressed extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-21 23:46:51 -05:00
David Howells
9a6b294ab4 afs: Fix use-after-free due to get/remove race in volume tree
When an afs_volume struct is put, its refcount is reduced to 0 before
the cell->volume_lock is taken and the volume removed from the
cell->volumes tree.

Unfortunately, this means that the lookup code can race and see a volume
with a zero ref in the tree, resulting in a use-after-free:

    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 3 PID: 130782 at lib/refcount.c:25 refcount_warn_saturate+0x7a/0xda
    ...
    RIP: 0010:refcount_warn_saturate+0x7a/0xda
    ...
    Call Trace:
     afs_get_volume+0x3d/0x55
     afs_create_volume+0x126/0x1de
     afs_validate_fc+0xfe/0x130
     afs_get_tree+0x20/0x2e5
     vfs_get_tree+0x1d/0xc9
     do_new_mount+0x13b/0x22e
     do_mount+0x5d/0x8a
     __do_sys_mount+0x100/0x12a
     do_syscall_64+0x3a/0x94
     entry_SYSCALL_64_after_hwframe+0x62/0x6a

Fix this by:

 (1) When putting, use a flag to indicate if the volume has been removed
     from the tree and skip the rb_erase if it has.

 (2) When looking up, use a conditional ref increment and if it fails
     because the refcount is 0, replace the node in the tree and set the
     removal flag.

Fixes: 20325960f8 ("afs: Reorganise volume and server trees to be rooted on the cell")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21 10:16:07 -08:00
David Howells
a9e01ac8c5 afs: Fix overwriting of result of DNS query
In afs_update_cell(), ret is the result of the DNS lookup and the errors
are to be handled by a switch - however, the value gets clobbered in
between by setting it to -ENOMEM in case afs_alloc_vlserver_list()
fails.

Fix this by moving the setting of -ENOMEM into the error handling for
OOM failure.  Further, only do it if we don't have an alternative error
to return.

Found by Linux Verification Center (linuxtesting.org) with SVACE.  Based
on a patch from Anastasia Belova [1].

Fixes: d5c32c89b2 ("afs: Fix cell DNS lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Anastasia Belova <abelova@astralinux.ru>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: lvc-project@linuxtesting.org
Link: https://lore.kernel.org/r/20231221085849.1463-1-abelova@astralinux.ru/ [1]
Link: https://lore.kernel.org/r/1700862.1703168632@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21 09:57:43 -08:00
Linus Torvalds
937fd40338 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmWEVXEACgkQ+7dXa6fL
 C2sxbw/+IIgTpjfXDGQoGOpcvQyHW+gMGFqrrZjKJZGQiNZ0DfHPciYcMvOOyqZp
 rbt22V/WvQKOlcQ1IYQqjdB47DilFGRepRLZ/fuqq6JDmcHGx2Btj8uJTsV0He4o
 rCLXVrfm/JNYECY6dO5bGizrCYL6clVo0x/U2LPlU/2mbXltY1d1yXtzE++6kBZl
 w/MLJDmQxvONarhpdD0J9E/uAJ+kHX05HhlqnSxu8HEoGHMVka1N5EGAOq9cICvm
 y/8NwnGtflhpJEIso2Kx7XAE8kszXyKw0PJvOaO4GG1PWMs3rIrZbHn7wCbChyMi
 xOw+qZVC60BTang/vEOo5I4eFD+NIdBDoGdyuyNICXDIMQ9WvN2nF5qUdFAeR7Vi
 Dgxld1WWHm6RcOjl6y9t5Na0zJmgdOyONWx6Xli/AJw2RTx5JiVzDuKP6yu+DMvn
 DUPrjEQ1m+qPbTwclEzqu3grNabp7EX1vYRKDC4bf+Lg8iGNxlFp+2uyg14HsDUH
 N/yqnj8MK6ADcVMfZGGUalIzsgN06vHfHhE7Tj4xSnrR1dekxBveNFJM3r+eeaLV
 0VsjHW/IMKPWxO/vDzi6zr0nBeWYQgxAAg+w3LXl3qRGEXlihibmosofovhQFD6k
 GhkXojmc3BeSceVfOcEHZu0xXZIy/2y+hZy95BbNLT/eiCwIf/M=
 =GJNW
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20231221' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Improve the interaction of arbitrary lookups in the AFS dynamic root
  that hit DNS lookup failures [1] where kafs behaves differently from
  openafs and causes some applications to fail that aren't expecting
  that. Further, negative DNS results aren't getting removed and are
  causing failures to persist.

   - Always delete unused (particularly negative) dentries as soon as
     possible so that they don't prevent future lookups from retrying.

   - Fix the handling of new-style negative DNS lookups in ->lookup() to
     make them return ENOENT so that userspace doesn't get confused when
     stat succeeds but the following open on the looked up file then
     fails.

   - Fix key handling so that DNS lookup results are reclaimed almost as
     soon as they expire rather than sitting round either forever or for
     an additional 5 mins beyond a set expiry time returning
     EKEYEXPIRED. They persist for 1s as /bin/ls will do a second stat
     call if the first fails"

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 [1]
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>

* tag 'afs-fixes-20231221' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry
  afs: Fix dynamic root lookup DNS check
  afs: Fix the dynamic root's d_delete to always delete unused dentries
2023-12-21 09:53:25 -08:00
Linus Torvalds
13b734465a Tracing fixes for 6.7:
- Fix another kerneldoc warning
 
 - Fix eventfs files to inherit the ownership of its parent directory.
   The dynamic creating of dentries in eventfs did not take into
   account if the tracefs file system was mounted with a gid/uid,
   and would still default to the gid/uid of root. This is a regression.
 
 - Fix warning when synthetic event testing is enabled along with
   startup event tracing testing is enabled
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZYRYjhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qs0aAQCXWcBeDEWsi8VxAOBU5Q6isvXn2koM
 +xSX6LJPh6hFVAD+Pc3oLgvyE5IyqNUM9RYtpwPVMhpAsyE9FIz3TWarEww=
 =LY0i
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix another kerneldoc warning

 - Fix eventfs files to inherit the ownership of its parent directory.

   The dynamic creation of dentries in eventfs did not take into account
   if the tracefs file system was mounted with a gid/uid, and would
   still default to the gid/uid of root. This is a regression.

 - Fix warning when synthetic event testing is enabled along with
   startup event tracing testing is enabled

* tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing / synthetic: Disable events after testing in synth_event_gen_test_init()
  eventfs: Have event files and directories default to parent uid and gid
  tracing/synthetic: fix kernel-doc warnings
2023-12-21 09:31:45 -08:00
Steven Rostedt (Google)
0dfc852b6f eventfs: Have event files and directories default to parent uid and gid
Dongliang reported:

  I found that in the latest version, the nodes of tracefs have been
  changed to dynamically created.

  This has caused me to encounter a problem where the gid I specified in
  the mounting parameters cannot apply to all files, as in the following
  situation:

  /data/tmp/events # mount | grep tracefs
  tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012)

  gid 3012 = readtracefs

  /data/tmp # ls -lh
  total 0
  -r--r-----   1 root readtracefs 0 1970-01-01 08:00 README
  -r--r-----   1 root readtracefs 0 1970-01-01 08:00 available_events

  ums9621_1h10:/data/tmp/events # ls -lh
  total 0
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer
  drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc

  It will prevent certain applications from accessing tracefs properly, I
  try to avoid this issue by making the following modifications.

To fix this, have the files created default to taking the ownership of
the parent dentry unless the ownership was previously set by the user.

Link: https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang.cui@unisoc.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231220105017.1489d790@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Hongyu Jin  <hongyu.jin@unisoc.com>
Fixes: 28e12c09f5 ("eventfs: Save ownership and mode")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Dongliang Cui <cuidongliang390@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-21 09:58:02 -05:00
Linus Torvalds
eee7f5b48e eight import client fixes addressing, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWDm5MACgkQiiy9cAdy
 T1G34gv+N6wKK2jsQp6zPTGkMNDCcTb2JluSFyTw9IXJDXRICSaVAmG1aBY6X/GT
 pe4NRDd9fD2KZu9wf9Sw4oVpmEoXU4uiQwSGYYkUBMRBj1jqpWYe+Vs7m3ShQyJM
 CAReHCV/TAbLNgjC8ZzrkuyHOh9jSAr3lWYPX9caTWC1n1KkFc1gGBi9A8PhVUJn
 MMKwbugc7bCzXhiAmLy1X7EhLtDvjLsby7r0lveK8OR+iSr9Nf59inX+cwmcePe1
 8pCNrQUX72O5jQ8y7eXIloZaFUwEvXx4TlYR6Ty3TL3h+f2tvQzbKhzKQQmCnB17
 gbInx4rgn5irI8RYgRca3pyXyB0Xv0H3lG0hy7qjvjTdlTYLYhLL18fTy1r9L7f2
 VJ/5aa6fT/WFNezQmyvQ5VSg9n5lQ+Pg5aX6QtjmDWxlOu/VrBddpC1ScjUYBdPa
 0Ep7xUqbDxgUBONa4gQMVHpiiZDfUCibl6pTH+kl6N5EMzLtj0uQhdM37P5B0znS
 s3dQNeWD
 =+3SX
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - two multichannel reconnect fixes, one fixing an important refcounting
   problem that can lead to umount problems

 - atime fix

 - five fixes for various potential OOB accesses, including a CVE fix,
   and two additional fixes for problems pointed out by Robert Morris's
   fuzzing investigation

* tag '6.7-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not let cifs_chan_update_iface deallocate channels
  cifs: fix a pending undercount of srv_count
  fs: cifs: Fix atime update check
  smb: client: fix potential OOB in smb2_dump_detail()
  smb: client: fix potential OOB in cifs_dump_detail()
  smb: client: fix OOB in smbCalcSize()
  smb: client: fix OOB in SMB2_query_info_init()
  smb: client: fix OOB in cifsd when receiving compounded resps
2023-12-20 21:09:47 -08:00
Linus Torvalds
1a44b0073b overlayfs fixes for 6.7-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWCXnIACgkQEVvVyTe/
 1WpafRAAiEDCJW3nczULYVUhqWqROFwPULpyBs/DNAkc+eZ3i4WC0KlPfSrgpKsw
 V1JBncqxRRU5D5NKDPx2FXUjE+wZKDXqNQizdFKgvEgTmMZ24Avoirq2qfUo1MP+
 pWzRzDYY27geBovLRalmHJt76Jl3OOQ6j4RTCvnroaf5gfHj2eKv9cI/BSCYQ5Lf
 sjvsXEsL07eXZFzY1MaBz4kcn/feJuvQyBxyauLiZ/hsJI2a+W70DNsZUV4y+swE
 xtWxWAJXvtcoJ5aKcfpDcrHhKs6ZOq4iV/F9KgAhM5SkoyWqpGJ1/aZxYv9LfSG3
 5oqKci9qEMdDur6RsE9BLwDr1GZ9sD1N+NjY2gFM+6S+e0Vcg6RHE6Nt+TEDu88n
 FjyzqHF5dOzmV65a+OUMssNjvXsrMOwkonF0Io31njP/xMk1R6HTmGR0EjMbikI6
 2wcIqGFGJwFCt4EtST69jjnLr9NEbtOxu9A2uLnZ02Nn6yokn2jOABzGjwtfLIwF
 rJFqsM6QQDpHq7RVvuzQYPXgVxH87YjLMgBOswH6MlKKGcDTa+RgxU5Jglm3TblZ
 aMTgAgd2vQfP078y2Hvi6ywFD/tN7ROJ44ibKKdFlapn8zVV4M/qqI2vJ39Q2UMx
 BTmTT3ZKK7rFf9BaJGtg1oztXbfv6rgPwFwhNKDaHsMIABgT2Ho=
 =c+C2
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fix from Amir Goldstein:
 "Fix a regression from this merge window"

* tag 'ovl-fixes-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: fix dentry reference leak after changes to underlying layers
2023-12-20 12:04:03 -08:00
Linus Torvalds
74d8fc2b86 More bcachefs bugfixes for 6.7:
- Fix a deadlock in the data move path with nocow locks (vs. update in
    place writes); when trylock failed we were incorrectly waiting for in
    flight ios to flush.
  - Fix reporting of NFS file handle length
  - Fix early error path in bch2_fs_alloc() - list head wasn't being
    initialized early enough
  - Make sure correct (hardware accelerated) crc modules get loaded
  - Fix a rare overflow in the btree split path, when the packed bkey
    format grows and all the keys have no value (LRU btree).
  - Fix error handling in the sector allocator
    This was causing writes to spuriously fail in multidevice setups, and
    another bug meant that the errors weren't being logged, only reported
    via fsync.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWCXs8ACgkQE6szbY3K
 bnaNcQ//bECexJp/gNYJg1kZDEeLrYoe5rAjVEE46hWM+PuXV8NqzJqw+lqhmnzL
 x/gX6vT/daGU2TMvxHo6Utk2dcmR18iYK3O1DK+No0m/U4riuB8sYNU2jw3QLYGc
 onZKg5fFqmuM5riuKDEsLnkTiDE/PRQ++YuFKcp9ejG57sSXszvPymAZVPC3pT17
 tGD0ejpVqjmp0ztKuCnFoknLhf8YxOIBwF2nHgtsVOKqBAmutmQcNPnuTdpYu5TO
 Bdc24DIWJqfjyGqO9SxlpcOYBp5dDK2PeP9FJ7UL6aDj3swP2DChKGZr0OW7h3jH
 wDFt7rR392Hcc5PEBJMU0CDdVj4Y5B6M88PUlUlNGKXgaX/epMKZTqSzepx2pqT6
 zjn+wN8Y/092NuEPeIDbAXny+LmxHGf4BPRvraruertLD/sPtW+p1qA0OB5+9leK
 SrfR/RMqSlPLEAgxbQ+AOtCaODgPODpLV3zpOdZU+NtWfXcs7sAxNcrEGk7pjnjn
 1YQn+LSHXGovURhJsDMT/ht8UOm9ryx1aCzRA344wPtyvswis9xy+GnGhsrwhjcu
 5TKvu9B3Lg3IONoMSegtInWJK2FiO9oOAuf75vjSaID+lAEiRNfxofU8dQ9DV3hx
 GHWz8D1tHobPVRtUJg6geC+3td06dFyKSD9cBPoNO5T23dgxQNQ=
 =y4Cl
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs fixes from Kent Overstreet:

 - Fix a deadlock in the data move path with nocow locks (vs. update in
   place writes); when trylock failed we were incorrectly waiting for in
   flight ios to flush.

 - Fix reporting of NFS file handle length

 - Fix early error path in bch2_fs_alloc() - list head wasn't being
   initialized early enough

 - Make sure correct (hardware accelerated) crc modules get loaded

 - Fix a rare overflow in the btree split path, when the packed bkey
   format grows and all the keys have no value (LRU btree).

 - Fix error handling in the sector allocator

   This was causing writes to spuriously fail in multidevice setups, and
   another bug meant that the errors weren't being logged, only reported
   via fsync.

* tag 'bcachefs-2023-12-19' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix bch2_alloc_sectors_start_trans() error handling
  bcachefs; guard against overflow in btree node split
  bcachefs: btree_node_u64s_with_format() takes nr keys
  bcachefs: print explicit recovery pass message only once
  bcachefs: improve modprobe support by providing softdeps
  bcachefs: fix invalid memory access in bch2_fs_alloc() error path
  bcachefs: Fix determining required file handle length
  bcachefs: Fix nocow locks deadlock
2023-12-20 11:24:28 -08:00
Linus Torvalds
ac1c13e257 nfsd-6.7 fixes:
- Address a few recently-introduced issues
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmWB+F4ACgkQM2qzM29m
 f5dZKBAAoAxQA9FTIcb+uMIviGvMFBylKv9mhQxKSkhWhhP9XX2ESXAl+6pcBQJP
 ikjWw09WO/6ttvxg4hY7exk6FgkDuCALnx1xB6azEq5Ndf4T2aauuFUTEERfQ8th
 mtGSKEX1a98bGlkBIVQgfr3VFbQ0MRwcr0iGKfbHC3X5GiuFZOHn5fFE+0LjXWyJ
 7MjI0Du3lXDlrG5mJ88T5ySMcbaOWBTzqlMY3kbwcf1dWy3TZ6uh6faXqbemrDPg
 Zixj6JGi+oLlrbYYjQV1Sm5QLW3e882QCQ0U9g2sOEjmLRN/hbCE90i6l1rVEe9a
 E1ZkAY5qtl+iFK6cbAYUt6lOcZaF8lNeEtBW5JhcFm7f7CJAUSmMb05klZ6rwpXR
 l6UDwQtnAbmghqAwuKaWdfxys/yqFTvKNJRida7+qK1fs8H8q0y/acKsIv/5PoWz
 cJvWXova4wIT7Q3xneaYXhHk1aWTQTooErQ0VrgnsE0Ch7Vc4sONhQ3SLajllChD
 x3JL0DgDJ6mO/JaFZCfIy3ihEaprV0uL3bqgXaad1SszEWJt4HCCE1fQpI0cSbXH
 Vyq5H0R74EoLh8TGc6dbJ5wZwU81P6Rc0nHz/VU+gUPrEeNYKv8P2bahnXfR7Nsc
 F93sl2DqQOEv/Q/h/9UolwL7AGl6TkzdPKEoYKNafagUWT031y0=
 =F1lB
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Address a few recently-introduced issues

* tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Revert 5f7fc5d69f
  NFSD: Revert 738401a9bd
  NFSD: Revert 6c41d9a9bd
  nfsd: hold nfsd_mutex across entire netlink operation
  nfsd: call nfsd_last_thread() before final nfsd_put()
2023-12-20 11:16:50 -08:00
David Howells
74cef6872c afs: Fix dynamic root lookup DNS check
In the afs dynamic root directory, the ->lookup() function does a DNS check
on the cell being asked for and if the DNS upcall reports an error it will
report an error back to userspace (typically ENOENT).

However, if a failed DNS upcall returns a new-style result, it will return
a valid result, with the status field set appropriately to indicate the
type of failure - and in that case, dns_query() doesn't return an error and
we let stat() complete with no error - which can cause confusion in
userspace as subsequent calls that trigger d_automount then fail with
ENOENT.

Fix this by checking the status result from a valid dns_query() and
returning an error if it indicates a failure.

Fixes: bbb4c4323a ("dns: Allow the dns resolver to retrieve a server set")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-20 11:57:47 +00:00
David Howells
71f8b55bc3 afs: Fix the dynamic root's d_delete to always delete unused dentries
Fix the afs dynamic root's d_delete function to always delete unused
dentries rather than only deleting them if they're positive.  With things
as they stand upstream, negative dentries stemming from failed DNS lookups
stick around preventing retries.

Fixes: 66c7e1d319 ("afs: Split the dynroot stuff out and give it its own ops tables")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-20 11:57:35 +00:00
Kent Overstreet
b0c279ff6c bcachefs: fix BCH_FSCK_ERR enum
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19 19:01:52 -05:00
Kent Overstreet
247ce5f1bb bcachefs: Fix bch2_alloc_sectors_start_trans() error handling
When we fail to allocate because of insufficient open buckets, we don't
want to retry from the full set of devices - we just want to retry in
blocking mode.

But if the retry in blocking mode fails with a different error code, we
end up squashing the -BCH_ERR_open_buckets_empty error with an error
that makes us thing we won't be able to allocate (insufficient_devices)
- which is incorrect when we didn't try to allocate from the full set of
devices, and causes the write to fail.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19 19:01:52 -05:00
Kent Overstreet
7ba1f6ec97 bcachefs; guard against overflow in btree node split
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19 16:18:16 -05:00
Kent Overstreet
0fa3b97767 bcachefs: btree_node_u64s_with_format() takes nr keys
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-19 16:18:13 -05:00
Shyam Prasad N
12d1e301bd cifs: do not let cifs_chan_update_iface deallocate channels
cifs_chan_update_iface is meant to check and update the server
interface used for a channel when the existing server interface
is no longer available.

So far, this handler had the code to remove an interface entry
even if a new candidate interface is not available. Allowing
this leads to several corner cases to handle.

This change makes the logic much simpler by not deallocating
the current channel interface entry if a new interface is not
found to replace it with.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-19 11:04:04 -06:00
Shyam Prasad N
f30bbc3870 cifs: fix a pending undercount of srv_count
The following commit reverted the changes to ref count
the server struct while scheduling a reconnect work:
8233425248 Revert "cifs: reconnect work should have reference on server struct"

However, a following change also introduced scheduling
of reconnect work, and assumed ref counting. This change
fixes that as well.

Fixes umount problems like:

[73496.157838] CPU: 5 PID: 1321389 Comm: umount Tainted: G        W  OE      6.7.0-060700rc6-generic #202312172332
[73496.157841] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET67W (1.50 ) 12/15/2022
[73496.157843] RIP: 0010:cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.157906] Code: 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc e8 4a 6e 14 e6 e9 f6 fe ff ff be 03 00 00 00 48 89 d7 e8 78 26 b3 e5 e9 e4 fe ff ff <0f> 0b e9 b1 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90
[73496.157908] RSP: 0018:ffffc90003bcbcb8 EFLAGS: 00010286
[73496.157911] RAX: 00000000ffffffff RBX: ffff8885830fa800 RCX: 0000000000000000
[73496.157913] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[73496.157915] RBP: ffffc90003bcbcc8 R08: 0000000000000000 R09: 0000000000000000
[73496.157917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[73496.157918] R13: ffff8887d56ba800 R14: 00000000ffffffff R15: ffff8885830fa800
[73496.157920] FS:  00007f1ff0e33800(0000) GS:ffff88887ba80000(0000) knlGS:0000000000000000
[73496.157922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73496.157924] CR2: 0000115f002e2010 CR3: 00000003d1e24005 CR4: 00000000003706f0
[73496.157926] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[73496.157928] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[73496.157929] Call Trace:
[73496.157931]  <TASK>
[73496.157933]  ? show_regs+0x6d/0x80
[73496.157936]  ? __warn+0x89/0x160
[73496.157939]  ? cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.157976]  ? report_bug+0x17e/0x1b0
[73496.157980]  ? handle_bug+0x51/0xa0
[73496.157983]  ? exc_invalid_op+0x18/0x80
[73496.157985]  ? asm_exc_invalid_op+0x1b/0x20
[73496.157989]  ? cifs_put_tcp_session+0x17d/0x190 [cifs]
[73496.158023]  ? cifs_put_tcp_session+0x1e/0x190 [cifs]
[73496.158057]  __cifs_put_smb_ses+0x2b5/0x540 [cifs]
[73496.158090]  ? tconInfoFree+0xc2/0x120 [cifs]
[73496.158130]  cifs_put_tcon.part.0+0x108/0x2b0 [cifs]
[73496.158173]  cifs_put_tlink+0x49/0x90 [cifs]
[73496.158220]  cifs_umount+0x56/0xb0 [cifs]
[73496.158258]  cifs_kill_sb+0x52/0x60 [cifs]
[73496.158306]  deactivate_locked_super+0x32/0xc0
[73496.158309]  deactivate_super+0x46/0x60
[73496.158311]  cleanup_mnt+0xc3/0x170
[73496.158314]  __cleanup_mnt+0x12/0x20
[73496.158330]  task_work_run+0x5e/0xa0
[73496.158333]  exit_to_user_mode_loop+0x105/0x130
[73496.158336]  exit_to_user_mode_prepare+0xa5/0xb0
[73496.158338]  syscall_exit_to_user_mode+0x29/0x60
[73496.158341]  do_syscall_64+0x6c/0xf0
[73496.158344]  ? syscall_exit_to_user_mode+0x37/0x60
[73496.158346]  ? do_syscall_64+0x6c/0xf0
[73496.158349]  ? exit_to_user_mode_prepare+0x30/0xb0
[73496.158353]  ? syscall_exit_to_user_mode+0x37/0x60
[73496.158355]  ? do_syscall_64+0x6c/0xf0

Reported-by: Robert Morris <rtm@csail.mit.edu>
Fixes: 705fc522fe ("cifs: handle when server starts supporting multichannel")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-19 10:59:11 -06:00
Zizhi Wo
01fe654f78 fs: cifs: Fix atime update check
Commit 9b9c5bea0b ("cifs: do not return atime less than mtime") indicates
that in cifs, if atime is less than mtime, some apps will break.
Therefore, it introduce a function to compare this two variables in two
places where atime is updated. If atime is less than mtime, update it to
mtime.

However, the patch was handled incorrectly, resulting in atime and mtime
being exactly equal. A previous commit 69738cfdfa ("fs: cifs: Fix atime
update check vs mtime") fixed one place and forgot to fix another. Fix it.

Fixes: 9b9c5bea0b ("cifs: do not return atime less than mtime")
Cc: stable@vger.kernel.org
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-19 10:33:57 -06:00
Paulo Alcantara
567320c46a smb: client: fix potential OOB in smb2_dump_detail()
Validate SMB message with ->check_message() before calling
->calc_smb_size().

This fixes CVE-2023-6610.

Reported-by: j51569436@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218219
Cc; stable@vger.kernel.org
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-19 10:33:39 -06:00
Chuck Lever
1227561c2f NFSD: Revert 738401a9bd
There's nothing wrong with this commit, but this is dead code now
that nothing triggers a CB_GETATTR callback. It can be re-introduced
once the issues with handling conflicting GETATTRs are resolved.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18 11:22:19 -05:00
Chuck Lever
862bee84d7 NFSD: Revert 6c41d9a9bd
For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict()
is waiting forever, preventing a clean server shutdown. The
requesting client might also hang waiting for a reply to the
conflicting GETATTR.

Invoking wait_on_bit() in an nfsd thread context is a hazard. The
correct fix is to replace this wait_on_bit() call site with a
mechanism that defers the conflicting GETATTR until the CB_GETATTR
completes or is known to have failed.

That will require some surgery and extended testing and it's late
in the v6.7-rc cycle, so I'm reverting now in favor of trying again
in a subsequent kernel release.

This is my fault: I should have recognized the ramifications of
calling wait_on_bit() in here before accepting this patch.

Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue.

Reported-by: Wolfgang Walter <linux-nfs@stwm.de>
Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18 11:22:16 -05:00
Kent Overstreet
e8c7692718 bcachefs: print explicit recovery pass message only once
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-17 23:05:43 -05:00
Paulo Alcantara
b50492b05f smb: client: fix potential OOB in cifs_dump_detail()
Validate SMB message with ->check_message() before calling
->calc_smb_size().

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-17 19:03:08 -06:00
Paulo Alcantara
b35858b378 smb: client: fix OOB in smbCalcSize()
Validate @smb->WordCount to avoid reading off the end of @smb and thus
causing the following KASAN splat:

  BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs]
  Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328

  CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 #9
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4a/0x80
   print_report+0xcf/0x650
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __phys_addr+0x46/0x90
   kasan_report+0xd8/0x110
   ? smbCalcSize+0x32/0x40 [cifs]
   ? smbCalcSize+0x32/0x40 [cifs]
   kasan_check_range+0x105/0x1b0
   smbCalcSize+0x32/0x40 [cifs]
   checkSMB+0x162/0x370 [cifs]
   ? __pfx_checkSMB+0x10/0x10 [cifs]
   cifs_handle_standard+0xbc/0x2f0 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_demultiplex_thread+0xed1/0x1360 [cifs]
   ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lockdep_hardirqs_on_prepare+0x136/0x210
   ? __pfx_lock_release+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? mark_held_locks+0x1a/0x90
   ? lockdep_hardirqs_on_prepare+0x136/0x210
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __kthread_parkme+0xce/0xf0
   ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
   kthread+0x18d/0x1d0
   ? kthread+0xdb/0x1d0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x34/0x60
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
   </TASK>

This fixes CVE-2023-6606.

Reported-by: j51569436@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-17 19:02:59 -06:00
Paulo Alcantara
33eae65c6f smb: client: fix OOB in SMB2_query_info_init()
A small CIFS buffer (448 bytes) isn't big enough to hold
SMB2_QUERY_INFO request along with user's input data from
CIFS_QUERY_INFO ioctl.  That is, if the user passed an input buffer >
344 bytes, the client will memcpy() off the end of @req->Buffer in
SMB2_query_info_init() thus causing the following KASAN splat:

  BUG: KASAN: slab-out-of-bounds in SMB2_query_info_init+0x242/0x250 [cifs]
  Write of size 1023 at addr ffff88801308c5a8 by task a.out/1240

  CPU: 1 PID: 1240 Comm: a.out Not tainted 6.7.0-rc4 #5
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4a/0x80
   print_report+0xcf/0x650
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __phys_addr+0x46/0x90
   kasan_report+0xd8/0x110
   ? SMB2_query_info_init+0x242/0x250 [cifs]
   ? SMB2_query_info_init+0x242/0x250 [cifs]
   kasan_check_range+0x105/0x1b0
   __asan_memcpy+0x3c/0x60
   SMB2_query_info_init+0x242/0x250 [cifs]
   ? __pfx_SMB2_query_info_init+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? smb_rqst_len+0xa6/0xc0 [cifs]
   smb2_ioctl_query_info+0x4f4/0x9a0 [cifs]
   ? __pfx_smb2_ioctl_query_info+0x10/0x10 [cifs]
   ? __pfx_cifsConvertToUTF16+0x10/0x10 [cifs]
   ? kasan_set_track+0x25/0x30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __kasan_kmalloc+0x8f/0xa0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? cifs_strndup_to_utf16+0x12d/0x1a0 [cifs]
   ? __build_path_from_dentry_optional_prefix+0x19d/0x2d0 [cifs]
   ? __pfx_smb2_ioctl_query_info+0x10/0x10 [cifs]
   cifs_ioctl+0x11c7/0x1de0 [cifs]
   ? __pfx_cifs_ioctl+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? rcu_is_watching+0x23/0x50
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __rseq_handle_notify_resume+0x6cd/0x850
   ? __pfx___schedule+0x10/0x10
   ? blkcg_iostat_update+0x250/0x290
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? ksys_write+0xe9/0x170
   __x64_sys_ioctl+0xc9/0x100
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f893dde49cf
  Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48
  89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89>
  c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
  RSP: 002b:00007ffc03ff4160 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffc03ff4378 RCX: 00007f893dde49cf
  RDX: 00007ffc03ff41d0 RSI: 00000000c018cf07 RDI: 0000000000000003
  RBP: 00007ffc03ff4260 R08: 0000000000000410 R09: 0000000000000001
  R10: 00007f893dce7300 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007ffc03ff4388 R14: 00007f893df15000 R15: 0000000000406de0
   </TASK>

Fix this by increasing size of SMB2_QUERY_INFO request buffers and
validating input length to prevent other callers from overflowing @req
in SMB2_query_info_init() as well.

Fixes: f5b05d622a ("cifs: add IOCTL for QUERY_INFO passthrough to userspace")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-17 19:00:14 -06:00
Paulo Alcantara
a8f68b1115 smb: client: fix OOB in cifsd when receiving compounded resps
Validate next header's offset in ->next_header() so that it isn't
smaller than MID_HEADER_SIZE(server) and then standard_receive3() or
->receive() ends up writing off the end of the buffer because
'pdu_length - MID_HEADER_SIZE(server)' wraps up to a huge length:

  BUG: KASAN: slab-out-of-bounds in _copy_to_iter+0x4fc/0x840
  Write of size 701 at addr ffff88800caf407f by task cifsd/1090

  CPU: 0 PID: 1090 Comm: cifsd Not tainted 6.7.0-rc4 #5
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4a/0x80
   print_report+0xcf/0x650
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __phys_addr+0x46/0x90
   kasan_report+0xd8/0x110
   ? _copy_to_iter+0x4fc/0x840
   ? _copy_to_iter+0x4fc/0x840
   kasan_check_range+0x105/0x1b0
   __asan_memcpy+0x3c/0x60
   _copy_to_iter+0x4fc/0x840
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? hlock_class+0x32/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __pfx__copy_to_iter+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_is_held_type+0x90/0x100
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_resched+0x278/0x360
   ? __pfx___might_resched+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   __skb_datagram_iter+0x2c2/0x460
   ? __pfx_simple_copy_to_iter+0x10/0x10
   skb_copy_datagram_iter+0x6c/0x110
   tcp_recvmsg_locked+0x9be/0xf40
   ? __pfx_tcp_recvmsg_locked+0x10/0x10
   ? mark_held_locks+0x5d/0x90
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_recvmsg+0xe2/0x310
   ? __pfx_tcp_recvmsg+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_acquire+0x14a/0x3a0
   ? srso_alias_return_thunk+0x5/0xfbef5
   inet_recvmsg+0xd0/0x370
   ? __pfx_inet_recvmsg+0x10/0x10
   ? __pfx_lock_release+0x10/0x10
   ? do_raw_spin_trylock+0xd1/0x120
   sock_recvmsg+0x10d/0x150
   cifs_readv_from_socket+0x25a/0x490 [cifs]
   ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_read_from_socket+0xb5/0x100 [cifs]
   ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs]
   ? __pfx_lock_release+0x10/0x10
   ? do_raw_spin_trylock+0xd1/0x120
   ? _raw_spin_unlock+0x23/0x40
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __smb2_find_mid+0x126/0x230 [cifs]
   cifs_demultiplex_thread+0xd39/0x1270 [cifs]
   ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
   ? __pfx_lock_release+0x10/0x10
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? mark_held_locks+0x1a/0x90
   ? lockdep_hardirqs_on_prepare+0x136/0x210
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __kthread_parkme+0xce/0xf0
   ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
   kthread+0x18d/0x1d0
   ? kthread+0xdb/0x1d0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x34/0x60
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
   </TASK>

Fixes: 8ce79ec359 ("cifs: update multiplex loop to handle compounded responses")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-17 19:00:12 -06:00
Linus Torvalds
0e38983467 for-6.7-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmV/Kr0ACgkQxWXV+ddt
 WDveXA/+N3y74uafOZI8Bh4PtHuArgjdHsbQVO0Oev5j4dFyDbrz0D84YqGxfB1X
 GFQzbv01xuyvuJfXQ5Pyfnqt/N/K4ZDGg6kkYR2MC9T3LOGZFv5kyTSFbj2q0Qy7
 3K+xolPmk34DBjipCKi5kV7wo2xLxqpnzs5oYZzwfaSRig+GuG30u/levADc7uG/
 fcnVbvf2Vz8YgIe/62RkZc7jWQrhjGPyrTVN5pj75+o2Up7iKM63F2eOTcTj/Fqk
 RMWBuDNSEiYBm6SPUwpBJ7r6NHbKuXbtbceelsOD36wL4i+lZGOhM/8Tlw/6U2Ks
 JxRkezDn62NiwZKd9d7po1AKPziFOdXjqhc3tZIFjR0xSgsjFFFrI6Qig/BURlbx
 L70c+dqojYpQvGndr9+wPxdEyUigAiCP7y7eym4yegY+93W/UXSjMGAUxCPKkgpL
 FUUB5HBIn2P3KeJGidu2NRWW85163ISEASUcyhcLA1hd5LThWbdyXxWO19lG6foH
 lLg0U0LJ+2HSB6FjW9+GKFTzT8/90nmz5ap7N/Vl3xENz0KXgFuDXx76bvW8Yj1E
 t8hrtXEMD+RaTZI7OFYpSEtmD5zeoJx48FLalwlEblHHbMcgPsLTfiBLA4GR3VHa
 vMn3mRrCowyOYoUljZm1aS1sWPwk+VT3gBpxDSQermYjT7x40Tc=
 =HN3b
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix that verifies that the snapshot source is a root, same
  check is also done in user space but should be done by the ioctl as
  well"

* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not allow non subvolume root targets for snapshot
2023-12-17 09:27:36 -08:00
Amir Goldstein
413ba91089 ovl: fix dentry reference leak after changes to underlying layers
syzbot excercised the forbidden practice of moving the workdir under
lowerdir while overlayfs is mounted and tripped a dentry reference leak.

Fixes: c63e56a4a6 ("ovl: do not open/llseek lower file with upper sb_writers held")
Reported-and-tested-by: syzbot+8608bb4553edb8c78f41@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-12-17 13:33:46 +02:00
Linus Torvalds
3b8a9b2e68 Tracing fixes for v6.7-rc5:
- Fix eventfs to check creating new files for events with names greater than
   NAME_MAX. The eventfs lookup needs to check the return result of
   simple_lookup().
 
 - Fix the ring buffer to check the proper max data size. Events must be able to
   fit on the ring buffer sub-buffer, if it cannot, then it fails to be written
   and the logic to add the event is avoided. The code to check if an event can
   fit failed to add the possible absolute timestamp which may make the event
   not be able to fit. This causes the ring buffer to go into an infinite loop
   trying to find a sub-buffer that would fit the event. Luckily, there's a check
   that will bail out if it looped over a 1000 times and it also warns.
 
   The real fix is not to add the absolute timestamp to an event that is
   starting at the beginning of a sub-buffer because it uses the sub-buffer
   timestamp. By avoiding the timestamp at the start of the sub-buffer allows
   events that pass the first check to always find a sub-buffer that it can fit
   on.
 
 - Have large events that do not fit on a trace_seq to print "LINE TOO BIG" like
   it does for the trace_pipe instead of what it does now which is to silently
   drop the output.
 
 - Fix a memory leak of forgetting to free the spare page that is saved by a
   trace instance.
 
 - Update the size of the snapshot buffer when the main buffer is updated if the
   snapshot buffer is allocated.
 
 - Fix ring buffer timestamp logic by removing all the places that tried to put
   the before_stamp back to the write stamp so that the next event doesn't add
   an absolute timestamp. But each of these updates added a race where by making
   the two timestamp equal, it was validating the write_stamp so that it can be
   incorrectly used for calculating the delta of an event.
 
 - There's a temp buffer used for printing the event that was using the event
   data size for allocation when it needed to use the size of the entire event
   (meta-data and payload data)
 
 - For hardening, use "%.*s" for printing the trace_marker output, to limit the
   amount that is printed by the size of the event. This was discovered by
   development that added a bug that truncated the '\0' and caused a crash.
 
 - Fix a use-after-free bug in the use of the histogram files when an instance
   is being removed.
 
 - Remove a useless update in the rb_try_to_discard of the write_stamp. The
   before_stamp was already changed to force the next event to add an absolute
   timestamp that the write_stamp is not used. But the write_stamp is modified
   again using an unneeded 64-bit cmpxchg.
 
 - Fix several races in the 32-bit implementation of the rb_time_cmpxchg() that
   does a 64-bit cmpxchg.
 
 - While looking at fixing the 64-bit cmpxchg, I noticed that because the ring
   buffer uses normal cmpxchg, and this can be done in NMI context, there's some
   architectures that do not have a working cmpxchg in NMI context. For these
   architectures, fail recording events that happen in NMI context.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZX0nChQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlOMAQD3iegTcceQl9lAsroa3tb3xdweC1GP
 51MsX5athxSyoQEAutI/2pBCtLFXgTLMHAMd5F23EM1U9rha7W0myrnvKQY=
 =d3bS
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix eventfs to check creating new files for events with names greater
   than NAME_MAX. The eventfs lookup needs to check the return result of
   simple_lookup().

 - Fix the ring buffer to check the proper max data size. Events must be
   able to fit on the ring buffer sub-buffer, if it cannot, then it
   fails to be written and the logic to add the event is avoided. The
   code to check if an event can fit failed to add the possible absolute
   timestamp which may make the event not be able to fit. This causes
   the ring buffer to go into an infinite loop trying to find a
   sub-buffer that would fit the event. Luckily, there's a check that
   will bail out if it looped over a 1000 times and it also warns.

   The real fix is not to add the absolute timestamp to an event that is
   starting at the beginning of a sub-buffer because it uses the
   sub-buffer timestamp.

   By avoiding the timestamp at the start of the sub-buffer allows
   events that pass the first check to always find a sub-buffer that it
   can fit on.

 - Have large events that do not fit on a trace_seq to print "LINE TOO
   BIG" like it does for the trace_pipe instead of what it does now
   which is to silently drop the output.

 - Fix a memory leak of forgetting to free the spare page that is saved
   by a trace instance.

 - Update the size of the snapshot buffer when the main buffer is
   updated if the snapshot buffer is allocated.

 - Fix ring buffer timestamp logic by removing all the places that tried
   to put the before_stamp back to the write stamp so that the next
   event doesn't add an absolute timestamp. But each of these updates
   added a race where by making the two timestamp equal, it was
   validating the write_stamp so that it can be incorrectly used for
   calculating the delta of an event.

 - There's a temp buffer used for printing the event that was using the
   event data size for allocation when it needed to use the size of the
   entire event (meta-data and payload data)

 - For hardening, use "%.*s" for printing the trace_marker output, to
   limit the amount that is printed by the size of the event. This was
   discovered by development that added a bug that truncated the '\0'
   and caused a crash.

 - Fix a use-after-free bug in the use of the histogram files when an
   instance is being removed.

 - Remove a useless update in the rb_try_to_discard of the write_stamp.
   The before_stamp was already changed to force the next event to add
   an absolute timestamp that the write_stamp is not used. But the
   write_stamp is modified again using an unneeded 64-bit cmpxchg.

 - Fix several races in the 32-bit implementation of the
   rb_time_cmpxchg() that does a 64-bit cmpxchg.

 - While looking at fixing the 64-bit cmpxchg, I noticed that because
   the ring buffer uses normal cmpxchg, and this can be done in NMI
   context, there's some architectures that do not have a working
   cmpxchg in NMI context. For these architectures, fail recording
   events that happen in NMI context.

* tag 'trace-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
  ring-buffer: Have rb_time_cmpxchg() set the msb counter too
  ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
  ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
  ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
  ring-buffer: Do not try to put back write_stamp
  tracing: Fix uaf issue when open the hist or hist_debug file
  tracing: Add size check when printing trace_marker output
  ring-buffer: Have saved event hold the entire event
  ring-buffer: Do not update before stamp when switching sub-buffers
  tracing: Update snapshot buffer on resize if it is allocated
  ring-buffer: Fix memory leak of free page
  eventfs: Fix events beyond NAME_MAX blocking tasks
  tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
  ring-buffer: Fix writing to the buffer with max_data_size
2023-12-16 10:40:51 -08:00
Josef Bacik
a8892fd719 btrfs: do not allow non subvolume root targets for snapshot
Our btrfs subvolume snapshot <source> <destination> utility enforces
that <source> is the root of the subvolume, however this isn't enforced
in the kernel.  Update the kernel to also enforce this limitation to
avoid problems with other users of this ioctl that don't have the
appropriate checks in place.

Reported-by: Martin Michaelis <code@mgjm.de>
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15 23:46:51 +01:00
Jens Axboe
ae1914174a cred: get rid of CONFIG_DEBUG_CREDENTIALS
This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-15 14:19:48 -08:00
NeilBrown
1bd773b4f0 nfsd: hold nfsd_mutex across entire netlink operation
Rather than using svc_get() and svc_put() to hold a stable reference to
the nfsd_svc for netlink lookups, simply hold the mutex for the entire
time.

The "entire" time isn't very long, and the mutex is not often contented.

This makes way for us to remove the refcounts of svc, which is more
confusing than useful.

Reported-by: Jeff Layton <jlayton@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/5d9bbb599569ce29f16e4e0eef6b291eda0f375b.camel@kernel.org/T/#u
Fixes: bd9d6a3efa ("NFSD: add rpc_status netlink support")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-15 10:23:52 -05:00
NeilBrown
2a501f55cd nfsd: call nfsd_last_thread() before final nfsd_put()
If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put()
without calling nfsd_last_thread().  This leaves nn->nfsd_serv pointing
to a structure that has been freed.

So remove 'static' from nfsd_last_thread() and call it when the
nfsd_serv is about to be destroyed.

Fixes: ec52361df9 ("SUNRPC: stop using ->sv_nrthreads as a refcount")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-15 10:23:46 -05:00
Linus Torvalds
3f7168591e four import client fixes addressing potential overflows, all marked for stable as well
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmV7q7QACgkQiiy9cAdy
 T1G9EQv/fpdrMMDcivh3h8vzZTxR9kIDa971C/wEPgQb4CNtRp2LTfybg/OOeyPD
 qtdRVXyUs3fA/1/tCxfdo2Jan1E4iEFOkzGXv+EmolCpQ5Ye3tEsAwF6s5eP9pUc
 wR5/swzNFdVfW5BwoES7/RonMezc43OXWZY0Y/9NiaPZKV7i8NTz2ZlfDMjPkplL
 Pxlmiht62L11O3Ui4h8udVGaLagfbmbPt4MLfpuMupDFg071XA8Sz8AF0Wfqh2zu
 WxkTCGHD6Oj8GPp1gJcVUkLgugvSzeSmarTOgygZVF5/fIeFJKB8VrfqCxDZcxhe
 e4E4QEv6tfetutwuCFJejTHeNgrzvMOoR+tuw5/oci/W8msq0l91varSXf0TwUBc
 7ZSnFIw92Oa4pG0zYV9SbTAxEwuoMbrUAXDvraT9AccBYFBZm66TVooR2rnTwRwc
 art398CiTdRcllP9g4ZI4ogxzkHHsVJnQ5w0h/R6/7Y1qLEqRcps84LwmSMYaK4y
 5jad3mh9
 =i6Gk
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Address OOBs and NULL dereference found by Dr. Morris's recent
  analysis and fuzzing.

  All marked for stable as well"

* tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix OOB in smb2_query_reparse_point()
  smb: client: fix NULL deref in asn1_ber_decoder()
  smb: client: fix potential OOBs in smb2_parse_contexts()
  smb: client: fix OOB in receive_encrypted_standard()
2023-12-14 19:57:42 -08:00
Daniel Hill
85c6db9809 bcachefs: improve modprobe support by providing softdeps
We need to help modprobe load architecture specific modules so we don't
fall back to generic software implementations, this should help
performance when building as a module.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14 15:24:14 -05:00
Thomas Bertschinger
50a8a732d2 bcachefs: fix invalid memory access in bch2_fs_alloc() error path
When bch2_fs_alloc() gets an error before calling
bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid
memory access because btree_trans_list is uninitialized.

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Fixes: 6bd68ec266 ("bcachefs: Heap allocate btree_trans")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14 15:24:14 -05:00
Linus Torvalds
bdb2701f0b for-6.7-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmV5rTIACgkQxWXV+ddt
 WDuLUg/+Ix/CeA+JY6VZMA2kBHMzmRexSjYONWfQwIL7LPBy4sOuSEaTZt+QQMs+
 AEKau1YfTgo7e9S2DlbZhIWp6P87VFui7Q1E99uJEmKelakvf94DbMrufPTTKjaD
 JG2KB6LsD59yWwfbGHEAVVNGSMRk2LDXzcUWMK6/uzu/7Bcr4ataOymWd86/blUV
 cw5g87uAHpBn+R1ARTf1CkqyYiI9UldNUJmW1q7dwxOyYG+weUtJImosw2Uda76y
 wQXAFQAH3vsFzTC+qjC9Vz7cnyAX9qAw48ODRH7rIT1BQ3yAFQbfXE20jJ/fSE+C
 lz3p05tA9373KAOtLUHmANBwe3NafCnlut6ZYRfpTcEzUslAO5PnajPaHh5Al7uC
 Iwdpy49byoyVFeNf0yECBsuDP8s86HlUALF8mdJabPI1Kl66MUea6KgS1oyO3pCB
 hfqLbpofV4JTywtIRLGQTQvzSwkjPHTbSwtZ9nftTw520a5f7memDu5vi4XzFd+B
 NrJxmz2DrMRlwrLgWg9OXXgx1riWPvHnIoqzjG5W6A9N74Ud1/oz7t3VzjGSQ5S2
 UikRB6iofPE0deD8IF6H6DvFfvQxU9d9BJ6IS9V2zRt5vdgJ2w08FlqbLZewSY4x
 iaQ+L7UYKDjC9hdosXVNu/6fAspyBVdSp2NbKk14fraZtNAoPNs=
 =uF/Q
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
  "Some fixes to quota accounting code, mostly around error handling and
   correctness:

   - free reserves on various error paths, after IO errors or
     transaction abort

   - don't clear reserved range at the folio release time, it'll be
     properly cleared after final write

   - fix integer overflow due to int used when passing around size of
     freed reservations

   - fix a regression in squota accounting that missed some cases with
     delayed refs"

* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ensure releasing squota reserve on head refs
  btrfs: don't clear qgroup reserved bit in release_folio
  btrfs: free qgroup pertrans reserve on transaction abort
  btrfs: fix qgroup_free_reserved_data int overflow
  btrfs: free qgroup reserve when ORDERED_IOERR is set
2023-12-14 11:53:00 -08:00
Linus Torvalds
5bd7ef53ff ufs got broken this merge window on folio conversion - calling
conventions for filemap_lock_folio() are not the same as for
 find_lock_page()
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZXnZHAAKCRBZ7Krx/gZQ
 6y3QAQCazzMsqWYmqfkbR5yGjolKBPS6ILFWBHWoFySs9/WptAEA3c/960nhFuh1
 aQE9Qp5zUlbWmSZ5zjz3Q2lX8N/jugU=
 =kyMm
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ufs fix from Al Viro:
 "ufs got broken this merge window on folio conversion - calling
  conventions for filemap_lock_folio() are not the same as for
  find_lock_page()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix ufs_get_locked_folio() breakage
2023-12-13 11:09:58 -08:00
Jan Kara
8bf771972b bcachefs: Fix determining required file handle length
The ->encode_fh method is responsible for setting amount of space
required for storing the file handle if not enough space was provided.
bch2_encode_fh() was not setting required length in that case which
breaks e.g. fanotify. Fix it.

Reported-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-13 13:09:36 -05:00
Al Viro
485053bb81 fix ufs_get_locked_folio() breakage
filemap_lock_folio() returns ERR_PTR(-ENOENT) if the thing is not
in cache - not NULL like find_lock_page() used to.

Fixes: 5fb7bd50b3 "ufs: add ufs_get_locked_folio and ufs_put_locked_folio"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-13 11:14:09 -05:00
Beau Belgrave
5eaf7f0589 eventfs: Fix events beyond NAME_MAX blocking tasks
Eventfs uses simple_lookup(), however, it will fail if the name of the
entry is beyond NAME_MAX length. When this error is encountered, eventfs
still tries to create dentries instead of skipping the dentry creation.
When the dentry is attempted to be created in this state d_wait_lookup()
will loop forever, waiting for the lookup to be removed.

Fix eventfs to return the error in simple_lookup() back to the caller
instead of continuing to try to create the dentry.

Link: https://lore.kernel.org/linux-trace-kernel/20231210213534.497-1-beaub@linux.microsoft.com

Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Link: https://lore.kernel.org/linux-trace-kernel/20231208183601.GA46-beaub@linux.microsoft.com/
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12 19:00:37 -05:00
Linus Torvalds
cf52eed70e Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmV4tMwACgkQ8vlZVpUN
 gaNZRAf/ejQZne9iZck8SSV62mkR9E7EwN9J2+gkWFrlsyurErZlVsBA5yRB+i9A
 V1v6DRGDnYFwKFNHJhR/RW9NhEwpYkX9Vo3miksSCq8rsAB1kjSs3xVrTBIYi/8c
 ztw4ncyxW7RRFRmruzFfUEKriiyJzxJYx+EqbNsQHcl5ET6Y2/5zM0bChV9MwuN3
 iS1Rm98RbHVrylzKbGG562MaGdJyUYvQ+mnRCgma1mTu6K9SWLJg211icLTsDhHg
 XEB/QGWji2O7xOudcry8wLIpoR6rYPAhWfbkLekW1K9hjV3iXuJoVjj7eB9LctMf
 FAXr8u0FKJI0iIQyrQrEEqIuh+jKBA==
 =4zQL
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix various bugs / regressions for ext4, including a soft lockup, a
  WARN_ON, and a BUG"

* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix soft lockup in journal_finish_inode_data_buffers()
  ext4: fix warning in ext4_dio_write_end_io()
  jbd2: increase the journal IO's priority
  jbd2: correct the printing of write_flags in jbd2_write_superblock()
  ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12 11:37:04 -08:00
Linus Torvalds
eaadbbaaff fuse fixes for 6.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZXhQgQAKCRDh3BK/laaZ
 PHL9AQC0y7A+HLH6oXM8uI8rqC8e78qGdoGGl+Ppapae+BhO8gD+Or5B4yR0MZKR
 Z/j0zMe57mmRMplxcwz/LXXCqeE9+w0=
 =cn6g
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

 - Fix a couple of potential crashes, one introduced in 6.6 and one
   in 5.10

 - Fix misbehavior of virtiofs submounts on memory pressure

 - Clarify naming in the uAPI for a recent feature

* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
  fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
  fuse: share lookup state between submount and its parent
  docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
  fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
2023-12-12 11:06:41 -08:00
Linus Torvalds
8b8cd4beea nine smb3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmV3pUoACgkQiiy9cAdy
 T1GQCgv/YURd8zz5k+GSOvUF2tCl6zW6h0NJQbWRIjgl4i7eGHZwIslgCI6kZIN1
 AFrSyUj4tZQmFvh0aVWLZeWsoKETbSkOYkz2dC4X/lC8LJD3VAy3vzAhu4oSAWva
 +pItQVlOTG0CcmMSTANSfw0sSsCwC2BHAUJnnu7ypgERI3wllOPtxE1xN9mT/8Bf
 NxJDZa3jtZd2hC4Cda1NTYYEfaSGufEOzPZIW9/h5ftpRo0qtEZkKh9TPddBMGm4
 yMnt1sSp4DHoW6xOyGOt+7kJAGA5NtP3/voLSjirG558Bb4HjWhBT+Dkxe6dUiXn
 i9gi1bFJ/8gRulv1cTdOxTFGE+i9Wr4PzpG2g82qugYRTl3LqLoJBa8NH+WzKz+q
 AX8EySFdlJtE++wTMNZB5hgFuJNGkzRi3YbjrQjvHFDQvaSVHvtayyhuEN+UcqAe
 gWuj1PTDKy6cfkxFYPDEBtMgp1u4+72nWOxoYUE5LyvzkLCLjfgMKCDX03RlAvfZ
 zB76cU/3
 =yMkH
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Memory leak fix (in lock error path)

 - Two fixes for create with allocation size

 - FIx for potential UAF in lease break error path

 - Five directory lease (caching) fixes found during additional recent
   testing

* tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
  ksmbd: fix wrong allocation size update in smb2_open()
  ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
  ksmbd: lazy v2 lease break on smb2_write()
  ksmbd: send v2 lease break notification for directory
  ksmbd: downgrade RWH lease caching state to RH for directory
  ksmbd: set v2 lease capability
  ksmbd: set epoch in create context v2 lease
  ksmbd: fix memory leak in smb2_lock()
2023-12-12 10:30:10 -08:00
Ye Bin
6c02757c93 jbd2: fix soft lockup in journal_finish_inode_data_buffers()
There's issue when do io test:
WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G  OE
Call trace:
 dump_backtrace+0x0/0x1a0
 show_stack+0x24/0x30
 dump_stack+0xb0/0x100
 watchdog_timer_fn+0x254/0x3f8
 __hrtimer_run_queues+0x11c/0x380
 hrtimer_interrupt+0xfc/0x2f8
 arch_timer_handler_phys+0x38/0x58
 handle_percpu_devid_irq+0x90/0x248
 generic_handle_irq+0x3c/0x58
 __handle_domain_irq+0x68/0xc0
 gic_handle_irq+0x90/0x320
 el1_irq+0xcc/0x180
 queued_spin_lock_slowpath+0x1d8/0x320
 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
 kjournald2+0xec/0x2f0 [jbd2]
 kthread+0x134/0x138
 ret_from_fork+0x10/0x18

Analyzed informations from vmcore as follows:
(1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
(2) Now is processing the 855th jbd2_inode;
(3) JBD2 task has TIF_NEED_RESCHED flag;
(4) There's no pags in address_space around the 855th jbd2_inode;
(5) There are some process is doing drop caches;
(6) Mounted with 'nodioread_nolock' option;
(7) 128 CPUs;

According to informations from vmcore we know 'journal->j_list_lock' spin lock
competition is fierce. So journal_finish_inode_data_buffers() maybe process
slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors().
However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK,
will not call cond_resched(). So may lead to soft lockup.
journal_finish_inode_data_buffers
  filemap_fdatawait_range_keep_errors
    __filemap_fdatawait_range
      while (index <= end)
        nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
        if (!nr_pages)
           break;    --> If 'nr_pages' is equal zero will break, then will not call cond_resched()
        for (i = 0; i < nr_pages; i++)
          wait_on_page_writeback(page);
        cond_resched();

To solve above issue, add scheduling point in the journal_finish_inode_data_buffers();

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-12-12 10:25:46 -05:00
Kent Overstreet
bedd6fe4d3 bcachefs: Fix nocow locks deadlock
On trylock failure we were waiting for outstanding reads to complete -
but nocow locks need to be held until the whole move is finished.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-11 20:43:11 -05:00
Linus Torvalds
26aff84943 More bcachefs bugfixes for 6.7:
- Fix a rare emergency shutdown path bug: dropping journal pins after
    the filesystem has mostly been torn down is not what we want.
  - Fix some concurrency issues with the btree write buffer and journal
    replay by not using the btree write buffer until journal replay is
    finished
  - A fixup from the prior patch to kill journal pre-reservations: at the
    start of the btree update path, where previously we took a
    pre-reservation, we do at least want to check the journal watermark.
  - Fix a race between dropping device metadata and btree node writes,
    which would re-add a pointer to a device that had just been dropped
  - Fix one of the SCRU lock warnings, in
    bch2_compression_stats_to_text().
  - Partial fix for a rare transaction paths overflow, when indirect
    extents had been split by background tasks, by not running certain
    triggers when they're not needed.
  - Fix for creating a snapshot with implicit source in a subdirectory of
    the containing subvolume
  - Don't unfreeze when we're emergency read-only
  - Fix for rebalance spinning trying to compress unwritten extentns
  - Another deleted_inodes fix, for directories
  - Fix a rare deadlock (usually just an unecessary wait) when flushing
    the journal with an open journal entry.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmV2T4EACgkQE6szbY3K
 bnay1w/+PyH5qwE2gOy17rno6cWSNyKJELUkcqVNqrSTZpuA+TbMbcV8+oOeBnG1
 9/ShwKRvwwNC4HVk6KySoTMo9lRkaZ5wX6DpEsOqxoN8aCp6kqiCUxr0inAAyVdu
 O8FktP83eSX/vERWNlCeGLdi1KsCK0BWVbVMpkiVEO9QhLpS9eo1C8btstDIjbsv
 TVGvKO7IpVgibSBwymQPKpZa6BGN4d6emLlgKStdpVVR1RwJW3eLJwi1EV2hSp1f
 LBnTI5eD64pu+phEb4zE83JX932XAbxdBWaHlN1y3i4l6+sJDu63Y4R8bkbW+rnJ
 cbiyYM5IuAH6MFbbh9rIW8kEIvjrX13mY94oGlK8ClCI9WX129jD5538tEH624U5
 KnhCZpkuzeGC5CVXNAzdJ8NP/Aj9qtKvSyssG6R5ZTitQ1FnTZ391Wb2pIRgj9pm
 yVfpJ/Q4cizVfSsKBvtr0U5I444zq50z+brKwegIoH8uMuGHKXcIgTUOu4q5pKDD
 znjS9eFrQTN2li2HB3LMxuS94yUmozqwgxClMptynLsHVknQH7F3cAdD+mYbwW5Q
 GUOd/QTlpskBYAUfBS8ewllowRjLGDJyrGvbR9Mvitk8CxOLRgoDipdh1K13jDMS
 zCmG1eQgdbtPHTM6fqif8Bu8xtgK7p2r099dcBhhiWmRyLPo5Qw=
 =l5sa
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs bugfixes from Kent Overstreet:

 - Fix a rare emergency shutdown path bug: dropping journal pins after
   the filesystem has mostly been torn down is not what we want.

 - Fix some concurrency issues with the btree write buffer and journal
   replay by not using the btree write buffer until journal replay is
   finished

 - A fixup from the prior patch to kill journal pre-reservations: at the
   start of the btree update path, where previously we took a
   pre-reservation, we do at least want to check the journal watermark.

 - Fix a race between dropping device metadata and btree node writes,
   which would re-add a pointer to a device that had just been dropped

 - Fix one of the SCRU lock warnings, in
   bch2_compression_stats_to_text().

 - Partial fix for a rare transaction paths overflow, when indirect
   extents had been split by background tasks, by not running certain
   triggers when they're not needed.

 - Fix for creating a snapshot with implicit source in a subdirectory of
   the containing subvolume

 - Don't unfreeze when we're emergency read-only

 - Fix for rebalance spinning trying to compress unwritten extentns

 - Another deleted_inodes fix, for directories

 - Fix a rare deadlock (usually just an unecessary wait) when flushing
   the journal with an open journal entry.

* tag 'bcachefs-2023-12-10' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Close journal entry if necessary when flushing all pins
  bcachefs: Fix uninitialized var in bch2_journal_replay()
  bcachefs: Fix deleted inode check for dirs
  bcachefs: rebalance shouldn't attempt to compress unwritten extents
  bcachefs: don't attempt rw on unfreeze when shutdown
  bcachefs: Fix creating snapshot with implict source
  bcachefs: Don't run indirect extent trigger unless inserting/deleting
  bcachefs: Convert compression_stats to for_each_btree_key2
  bcachefs: Fix bch2_extent_drop_ptrs() call
  bcachefs: Fix a journal deadlock in replay
  bcachefs; Don't use btree write buffer until journal replay is finished
  bcachefs: Don't drop journal pins in exit path
2023-12-11 16:13:51 -08:00
David Howells
52bf9f6c09 afs: Fix refcount underflow from error handling race
If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL
server or fileserver), an asynchronous probe to one of its addresses may
fail immediately because sendmsg() returns an error.  When this happens, a
refcount underflow can happen if certain events hit a very small window.

The way this occurs is:

 (1) There are two levels of "call" object, the afs_call and the
     rxrpc_call.  Each of them can be transitioned to a "completed" state
     in the event of success or failure.

 (2) Asynchronous afs_calls are self-referential whilst they are active to
     prevent them from evaporating when they're not being processed.  This
     reference is disposed of when the afs_call is completed.

     Note that an afs_call may only be completed once; once completed
     completing it again will do nothing.

 (3) When a call transmission is made, the app-side rxrpc code queues a Tx
     buffer for the rxrpc I/O thread to transmit.  The I/O thread invokes
     sendmsg() to transmit it - and in the case of failure, it transitions
     the rxrpc_call to the completed state.

 (4) When an rxrpc_call is completed, the app layer is notified.  In this
     case, the app is kafs and it schedules a work item to process events
     pertaining to an afs_call.

 (5) When the afs_call event processor is run, it goes down through the
     RPC-specific handler to afs_extract_data() to retrieve data from rxrpc
     - and, in this case, it picks up the error from the rxrpc_call and
     returns it.

     The error is then propagated to the afs_call and that is completed
     too.  At this point the self-reference is released.

 (6) If the rxrpc I/O thread manages to complete the rxrpc_call within the
     window between rxrpc_send_data() queuing the request packet and
     checking for call completion on the way out, then
     rxrpc_kernel_send_data() will return the error from sendmsg() to the
     app.

 (7) Then afs_make_call() will see an error and will jump to the error
     handling path which will attempt to clean up the afs_call.

 (8) The problem comes when the error handling path in afs_make_call()
     tries to unconditionally drop an async afs_call's self-reference.
     This self-reference, however, may already have been dropped by
     afs_extract_data() completing the afs_call

 (9) The refcount underflows when we return to afs_do_probe_vlserver() and
     that tries to drop its reference on the afs_call.

Fix this by making afs_make_call() attempt to complete the afs_call rather
than unconditionally putting it.  That way, if afs_extract_data() manages
to complete the call first, afs_make_call() won't do anything.

The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and
sticking an msleep() in rxrpc_send_data() after the 'success:' label to
widen the race window.

The error message looks something like:

    refcount_t: underflow; use-after-free.
    WARNING: CPU: 3 PID: 720 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
    ...
    RIP: 0010:refcount_warn_saturate+0xba/0x110
    ...
    afs_put_call+0x1dc/0x1f0 [kafs]
    afs_fs_get_capabilities+0x8b/0xe0 [kafs]
    afs_fs_probe_fileserver+0x188/0x1e0 [kafs]
    afs_lookup_server+0x3bf/0x3f0 [kafs]
    afs_alloc_server_list+0x130/0x2e0 [kafs]
    afs_create_volume+0x162/0x400 [kafs]
    afs_get_tree+0x266/0x410 [kafs]
    vfs_get_tree+0x25/0xc0
    fc_mount+0xe/0x40
    afs_d_automount+0x1b3/0x390 [kafs]
    __traverse_mounts+0x8f/0x210
    step_into+0x340/0x760
    path_openat+0x13a/0x1260
    do_filp_open+0xaf/0x160
    do_sys_openat2+0xaf/0x170

or something like:

    refcount_t: underflow; use-after-free.
    ...
    RIP: 0010:refcount_warn_saturate+0x99/0xda
    ...
    afs_put_call+0x4a/0x175
    afs_send_vl_probes+0x108/0x172
    afs_select_vlserver+0xd6/0x311
    afs_do_cell_detect_alias+0x5e/0x1e9
    afs_cell_detect_alias+0x44/0x92
    afs_validate_fc+0x9d/0x134
    afs_get_tree+0x20/0x2e6
    vfs_get_tree+0x1d/0xc9
    fc_mount+0xe/0x33
    afs_d_automount+0x48/0x9d
    __traverse_mounts+0xe0/0x166
    step_into+0x140/0x274
    open_last_lookups+0x1c1/0x1df
    path_openat+0x138/0x1c3
    do_filp_open+0x55/0xb4
    do_sys_openat2+0x6c/0xb6

Fixes: 34fa47612b ("afs: Fix race in async call refcounting")
Reported-by: Bill MacAllister <bill@ca-zephyr.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052304
Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/2633992.1702073229@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-11 15:40:41 -08:00
Paulo Alcantara
3a42709fa9 smb: client: fix OOB in smb2_query_reparse_point()
Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that
their sum does not wrap to a number that is smaller than @reparse_buf
and we end up with a wild pointer as follows:

  BUG: unable to handle page fault for address: ffff88809c5cd45f
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4a01067 P4D 4a01067 PUD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
  Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff
  49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b>
  04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff
  RSP: 0018:ffffc90000347a90 EFLAGS: 00010212
  RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec
  RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4
  RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100
  R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000
  FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? search_module_extables+0x19/0x60
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x1b6/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
   cifs_get_fattr+0x16e/0xa50 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_acquire+0xbf/0x2b0
   cifs_root_iget+0x163/0x5f0 [cifs]
   cifs_smb3_do_mount+0x5bd/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f02c08d5b1e

Fixes: 2e4564b31b ("smb3: add support for stat of WSL reparse points for special file types")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-11 12:30:39 -06:00
Paulo Alcantara
90d025c2e9 smb: client: fix NULL deref in asn1_ber_decoder()
If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset,
smb2_get_data_area() sets @len to non-zero but return NULL, so
decode_negTokeninit() ends up being called with a NULL @security_blob:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:asn1_ber_decoder+0x173/0xc80
  Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00
  RSP: 0018:ffffc9000063f950 EFLAGS: 00010202
  RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a
  RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000
  R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000
  FS:  00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? __stack_depot_save+0x1e6/0x480
   ? exc_page_fault+0x6f/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? asn1_ber_decoder+0x173/0xc80
   ? check_object+0x40/0x340
   decode_negTokenInit+0x1e/0x30 [cifs]
   SMB2_negotiate+0xc99/0x17c0 [cifs]
   ? smb2_negotiate+0x46/0x60 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   smb2_negotiate+0x46/0x60 [cifs]
   cifs_negotiate_protocol+0xae/0x130 [cifs]
   cifs_get_smb_ses+0x517/0x1040 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? queue_delayed_work_on+0x5d/0x90
   cifs_mount_get_session+0x78/0x200 [cifs]
   dfs_mount_share+0x13a/0x9f0 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_acquire+0xbf/0x2b0
   ? find_nls+0x16/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_mount+0x7e/0x350 [cifs]
   cifs_smb3_do_mount+0x128/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7fce52c2ab1e

Fix this by setting @len to zero when @off == 0 so callers won't
attempt to dereference non-existing data areas.

Reported-by: Robert Morris <rtm@csail.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-11 12:30:39 -06:00
Paulo Alcantara
af1689a9b7 smb: client: fix potential OOBs in smb2_parse_contexts()
Validate offsets and lengths before dereferencing create contexts in
smb2_parse_contexts().

This fixes following oops when accessing invalid create contexts from
server:

  BUG: unable to handle page fault for address: ffff8881178d8cc3
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4a01067 P4D 4a01067 PUD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs]
  Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00
  00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7
  7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00
  RSP: 0018:ffffc900007939e0 EFLAGS: 00010216
  RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90
  RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000
  RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000
  R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000
  R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22
  FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? search_module_extables+0x19/0x60
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x1b6/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? smb2_parse_contexts+0xa0/0x3a0 [cifs]
   SMB2_open+0x38d/0x5f0 [cifs]
   ? smb2_is_path_accessible+0x138/0x260 [cifs]
   smb2_is_path_accessible+0x138/0x260 [cifs]
   cifs_is_path_remote+0x8d/0x230 [cifs]
   cifs_mount+0x7e/0x350 [cifs]
   cifs_smb3_do_mount+0x128/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f8737657b1e

Reported-by: Robert Morris <rtm@csail.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-11 12:30:39 -06:00
Paulo Alcantara
eec04ea119 smb: client: fix OOB in receive_encrypted_standard()
Fix potential OOB in receive_encrypted_standard() if server returned a
large shdr->NextCommand that would end up writing off the end of
@next_buffer.

Fixes: b24df3e30c ("cifs: update receive_encrypted_standard to handle compounded responses")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-11 12:30:39 -06:00
Kent Overstreet
a66ff26b0f bcachefs: Close journal entry if necessary when flushing all pins
Since outstanding journal buffers hold a journal pin, when flushing all
pins we need to close the current journal entry if necessary so its pin
can be released.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-10 16:53:46 -05:00
Kent Overstreet
4a147af208 bcachefs: Fix uninitialized var in bch2_journal_replay()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-10 12:23:07 -05:00
Linus Torvalds
ca20f1622b Char/Misc driver fixes for 6.7-rc5
Here are some small fixes for 6.7-rc5 for a variety of small driver
 subsystems.  Included in here are:
   - debugfs revert for reported issue
   - greybus revert for reported issue
   - greybus fixup for endian build warning
   - coresight driver fixes
   - nvmem driver fixes
   - devcoredump fix
   - parport new device id
   - ndtest build fix
 
 All of these have ben in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZXRxAQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykphwCfaF0Dh6oajneYbo/pq70+an876uYAnjwALPfr
 g2EezrYYUAkkPACOd27t
 =q7gC
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver fixes from Greg KH:
 "Here are some small fixes for 6.7-rc5 for a variety of small driver
  subsystems. Included in here are:

   - debugfs revert for reported issue

   - greybus revert for reported issue

   - greybus fixup for endian build warning

   - coresight driver fixes

   - nvmem driver fixes

   - devcoredump fix

   - parport new device id

   - ndtest build fix

  All of these have ben in linux-next with no reported issues"

* tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nvmem: Do not expect fixed layouts to grab a layout driver
  parport: Add support for Brainboxes IX/UC/PX parallel cards
  Revert "greybus: gb-beagleplay: Ensure le for values in transport"
  greybus: gb-beagleplay: Ensure le for values in transport
  greybus: BeaglePlay driver needs CRC_CCITT
  Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"
  devcoredump: Send uevent once devcd is ready
  ndtest: fix typo class_regster -> class_register
  misc: mei: client.c: fix problem of return '-EOVERFLOW' in mei_cl_write
  misc: mei: client.c: return negative error code in mei_cl_write
  mei: pxp: fix mei_pxp_send_message return value
  coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base
  coresight: ultrasoc-smb: Config SMB buffer before register sink
  coresight: ultrasoc-smb: Fix sleep while close preempt in enable_smb
  Documentation: coresight: fix `make refcheckdocs` warning
  hwtracing: hisi_ptt: Don't try to attach a task
  hwtracing: hisi_ptt: Handle the interrupt in hardirq context
  hwtracing: hisi_ptt: Add dummy callback pmu::read()
  coresight: Fix crash when Perf and sysfs modes are used concurrently
  coresight: etm4x: Remove bogous __exit annotation for some functions
2023-12-09 12:44:10 -08:00
Linus Torvalds
2099306c4e Six smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVzknIACgkQiiy9cAdy
 T1HIsgwAkpJsKz0ReQB0PDNGQ9/ZHnxhQ5V/E1sCtwbXeu50YffIo+hpABtkK8aK
 ViZAEgpcoO0kWzSr7zvPwxkykbDLV8pBXqvwHTg/KxXpxCEGt2VNCP5oOAFFp1IM
 8XpBkitB1rCuM6P6vJ5nmRK1gpwaGS/qwLWqROejvsouur/w/KfOH6bNEq/PMypo
 sUc7N5WP01grnh9/ipvgUBHSjKpWJFZ7Y2eRXirCXBMEHmosqbam0Fac6njYflLE
 4gc8YSrfxHaBqrgschAzcnwQrGY0lEy/UN9KpUOCKqGxP4ha45ni5t9foiCArhI5
 qzA7Ns//zqRcl+/A5DmJi8brvbHWnDzG4wQ5IyEm81w/8W5HqXt+9kGSxs6q8K0n
 MdVBq2eRl3untIRTXtMpp2xvPyAdKOuYZYfe27nv64Il2Xg3xUnh0viuX5M/BHBH
 kIqhMcvfFigdjYcgpgAuDeVXXmjUoW/9+Ybdc19d4ypsrxAGqMjQqLKgBLrAewDu
 DS80ihnn
 =LR8C
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Six smb3 client fixes:

   - Fixes for copy_file_range and clone (cache invalidation and file
     size), also addresses an xfstest failure

   - Fix to return proper error if REMAP_FILE_DEDUP set (also fixes
     xfstest generic/304)

   - Fix potential null pointer reference with DFS

   - Multichannel fix addressing (reverting an earlier patch) some of
     the problems with enabling/disabling channels dynamically

  Still working on a followon multichannel fix to address another issue
  found in reconnect testing that will send next week"

* tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: reconnect worker should take reference on server struct unconditionally
  Revert "cifs: reconnect work should have reference on server struct"
  cifs: Fix non-availability of dedup breaking generic/304
  smb: client: fix potential NULL deref in parse_dfs_referrals()
  cifs: Fix flushing, invalidation and file size with FICLONE
  cifs: Fix flushing, invalidation and file size with copy_file_range()
2023-12-09 12:10:56 -08:00
Linus Torvalds
8e819a7623 31 hotfixes. 10 of these address pre-6.6 issues and are marked cc:stable.
The remainder address post-6.6 issues or aren't considered serious enough
 to justify backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZXKEfwAKCRDdBJ7gKXxA
 jlRpAQCiAp1nSqIz/fOKTzoQRaTDXU/m+C+6ZAXdKLDfvQBhpwEAnxxjZ8IgF+8Z
 Klz/GirHX5w5o7jE2wb8iObo1nR75Qo=
 =omRq
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "31 hotfixes. Ten of these address pre-6.6 issues and are marked
  cc:stable. The remainder address post-6.6 issues or aren't considered
  serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
  nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
  mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
  scripts/gdb: fix lx-device-list-bus and lx-device-list-class
  MAINTAINERS: drop Antti Palosaari
  highmem: fix a memory copy problem in memcpy_from_folio
  nilfs2: fix missing error check for sb_set_blocksize call
  kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
  units: add missing header
  drivers/base/cpu: crash data showing should depends on KEXEC_CORE
  mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
  scripts/gdb/tasks: fix lx-ps command error
  mm/Kconfig: make userfaultfd a menuconfig
  selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
  mm/damon/core: copy nr_accesses when splitting region
  lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
  checkstack: fix printed address
  mm/memory_hotplug: fix error handling in add_memory_resource()
  mm/memory_hotplug: add missing mem_hotplug_lock
  .mailmap: add a new address mapping for Chester Lin
  ...
2023-12-08 08:36:23 -08:00
Namjae Jeon
1373665448 ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
MS confirm that "AISi" name of SMB2_CREATE_ALLOCATION_SIZE in MS-SMB2
specification is a typo. cifs/ksmbd have been using this wrong name from
MS-SMB2. It should be "AlSi". Also It will cause problem when running
smb2.create.open test in smbtorture against ksmbd.

Cc: stable@vger.kernel.org
Fixes: 12197a7fdd ("Clarify SMB2/SMB3 create context and add missing ones")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
a9f106c765 ksmbd: fix wrong allocation size update in smb2_open()
When client send SMB2_CREATE_ALLOCATION_SIZE create context, ksmbd update
old size to ->AllocationSize in smb2 create response. ksmbd_vfs_getattr()
should be called after it to get updated stat result.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
658609d9a6 ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
opinfo_put() could be called twice on error of smb21_lease_break_ack().
It will cause UAF issue if opinfo is referenced on other places.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
c2a721eead ksmbd: lazy v2 lease break on smb2_write()
Don't immediately send directory lease break notification on smb2_write().
Instead, It postpones it until smb2_close().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Namjae Jeon
d47d9886ae ksmbd: send v2 lease break notification for directory
If client send different parent key, different client guid, or there is
no parent lease key flags in create context v2 lease, ksmbd send lease
break to client.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-08 10:11:33 -06:00
Kent Overstreet
6d1980f0af bcachefs: Fix deleted inode check for dirs
We could delete directories transactionally on rmdir()/unlink(), but we
don't; instead, like with regular files we wait for the VFS to call
evict().

That means that our check for directories in the deleted inodes btree is
wrong - the check should be for non-empty directories.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-08 00:39:56 -05:00
Ryusuke Konishi
675abf8df1 nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
If nilfs2 reads a disk image with corrupted segment usage metadata, and
its segment usage information is marked as an error for the segment at the
write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
during log writing.

Segments newly allocated for writing with nilfs_sufile_alloc() will not
have this error flag set, but this unexpected situation will occur if the
segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
segment) was marked in error.

Fix this issue by inserting a sanity check to treat it as a file system
corruption.

Since error returns are not allowed during the execution phase where
nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
segment usage record to be updated and sets it up in a dirty state for
writing.

In addition, nilfs_sufile_set_segment_usage() is also called when
canceling log writing and undoing segment usage update, so in order to
avoid issuing the same kernel warning in that case, in case of
cancellation, avoid checking the error flag in
nilfs_sufile_set_segment_usage().

Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:50 -08:00
Sidhartha Kumar
4a3ef6be03 mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
After commit a08c7193e4 "mm/filemap: remove hugetlb special casing in
filemap.c", hugetlb pages are stored in the page cache in base page sized
indexes.  This leads to multi index stores in the xarray which is only
supporting through CONFIG_XARRAY_MULTI.  The other page cache user of
multi index stores ,THP, selects XARRAY_MULTI.  Have CONFIG_HUGETLB_PAGE
follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE
&& !CONFIG_XARRAY_MULTI config.

Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com
Fixes: a08c7193e4 ("mm/filemap: remove hugetlb special casing in filemap.c")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:49 -08:00
Ryusuke Konishi
d61d0ab573 nilfs2: fix missing error check for sb_set_blocksize call
When mounting a filesystem image with a block size larger than the page
size, nilfs2 repeatedly outputs long error messages with stack traces to
the kernel log, such as the following:

 getblk(): invalid block size 8192 requested
 logical block size: 512
 ...
 Call Trace:
  dump_stack_lvl+0x92/0xd4
  dump_stack+0xd/0x10
  bdev_getblk+0x33a/0x354
  __breadahead+0x11/0x80
  nilfs_search_super_root+0xe2/0x704 [nilfs2]
  load_nilfs+0x72/0x504 [nilfs2]
  nilfs_mount+0x30f/0x518 [nilfs2]
  legacy_get_tree+0x1b/0x40
  vfs_get_tree+0x18/0xc4
  path_mount+0x786/0xa88
  __ia32_sys_mount+0x147/0x1a8
  __do_fast_syscall_32+0x56/0xc8
  do_fast_syscall_32+0x29/0x58
  do_SYSENTER_32+0x15/0x18
  entry_SYSENTER_32+0x98/0xf1
 ...

This overloads the system logger.  And to make matters worse, it sometimes
crashes the kernel with a memory access violation.

This is because the return value of the sb_set_blocksize() call, which
should be checked for errors, is not checked.

The latter issue is due to out-of-buffer memory being accessed based on a
large block size that caused sb_set_blocksize() to fail for buffers read
with the initial minimum block size that remained unupdated in the
super_block structure.

Since nilfs2 mkfs tool does not accept block sizes larger than the system
page size, this has been overlooked.  However, it is possible to create
this situation by intentionally modifying the tool or by passing a
filesystem image created on a system with a large page size to a system
with a smaller page size and mounting it.

Fix this issue by inserting the expected error handling for the call to
sb_set_blocksize().

Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:48 -08:00
Lizhi Xu
eb66b8abae squashfs: squashfs_read_data need to check if the length is 0
When the length passed in is 0, the pagemap_scan_test_walk() caller should
bail.  This error causes at least a WARN_ON().

Link: https://lkml.kernel.org/r/20231116031352.40853-1-lizhi.xu@windriver.com
Reported-by: syzbot+32d3767580a1ea339a81@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000000526f2060a30a085@google.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:45 -08:00
Peter Xu
4980e837ca mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set
The new pagemap ioctl contains a fast path for wr-protections without
looking into category masks.  It forgets to check PM_SCAN_WP_MATCHING
before applying the wr-protections.  It can cause, e.g., pte markers
installed on archs that do not even support uffd wr-protect.

WARNING: CPU: 0 PID: 5059 at mm/memory.c:1520 zap_pte_range mm/memory.c:1520 [inline]

Link: https://lkml.kernel.org/r/20231116201547.536857-3-peterx@redhat.com
Fixes: 12f6b01a0b ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+7ca4b2719dc742b8d0a4@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:45 -08:00
Peter Xu
0dff1b407d mm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check
Patch series "mm/pagemap: A few fixes to the recent PAGEMAP_SCAN".

This series should fix two known reports from syzbot on the new
PAGEMAP_SCAN ioctl():

https://lore.kernel.org/all/000000000000b0e576060a30ee3b@google.com/
https://lore.kernel.org/all/000000000000773fa7060a31e2cc@google.com/

The 3rd patch is something I found when testing these patches.


This patch (of 3):

The new ioctl(PAGEMAP_SCAN) relies on vma wr-protect capability provided
by userfault, however in the vma test it didn't explicitly require the vma
to have wr-protect function enabled, even if PM_SCAN_WP_MATCHING flag is
set.

It means the pagemap code can now apply uffd-wp bit to a page in the vma
even if not registered to userfaultfd at all.

Then in whatever way as long as the pte got written and page fault
resolved, we'll apply the write bit even if uffd-wp bit is set.  We'll see
a pte that has both UFFD_WP and WRITE bit set.  Anything later that looks
up the pte for uffd-wp bit will trigger the warning:

WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

Fix it by doing proper check over the vma attributes when
PM_SCAN_WP_MATCHING is specified.

Link: https://lkml.kernel.org/r/20231116201547.536857-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20231116201547.536857-2-peterx@redhat.com
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+e94c5aaf7890901ebf9b@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Daniel Hill
e597288839 bcachefs: rebalance shouldn't attempt to compress unwritten extents
This fixes a bug where rebalance would loop repeatedly on the same
extents.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 17:43:21 -05:00
Boris Burkov
e85a0adacf btrfs: ensure releasing squota reserve on head refs
A reservation goes through a 3 step lifetime:

- generated during delalloc
- released/counted by ordered_extent allocation
- freed by running delayed ref

That third step depends on must_insert_reserved on the head ref, so the
head ref with that field set owns the reservation. Once you prepare to
run the head ref, must_insert_reserved is unset, which means that
running the ref must free the reservation, whether or not it succeeds,
or else the reservation is leaked. That results in either a risk of
spurious ENOSPC if the fs stays writeable or a warning on unmount if it
is readonly.

The existing squota code was aware of these invariants, but missed a few
cases. Improve it by adding a helper function to use in the cleanup
paths and call it from the existing early returns in running delayed
refs. This also simplifies btrfs_record_squota_delta and struct
btrfs_quota_delta.

This fixes (or at least improves the reliability of) generic/475 with
"mkfs -O squota". On my machine, that test failed ~4/10 times without
this patch and passed 100/100 times with it.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:57 +01:00
Boris Burkov
a86805504b btrfs: don't clear qgroup reserved bit in release_folio
The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for
duplicate reservations. That is two writes to that range in one
transaction shouldn't create two reservations, as the reservation will
only be freed once when the write finally goes down. Therefore, it is
never OK to clear that bit without freeing the associated qgroup
reserve. At this point, we don't want to be freeing the reserve, so mask
off the bit.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:52 +01:00
Boris Burkov
b321a52cce btrfs: free qgroup pertrans reserve on transaction abort
If we abort a transaction, we never run the code that frees the pertrans
qgroup reservation. This results in warnings on unmount as that
reservation has been leaked. The leak isn't a huge issue since the fs is
read-only, but it's better to clean it up when we know we can/should. Do
it during the cleanup_transaction step of aborting.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:49 +01:00
Boris Burkov
9e65bfca24 btrfs: fix qgroup_free_reserved_data int overflow
The reserved data counter and input parameter is a u64, but we
inadvertently accumulate it in an int. Overflowing that int results in
freeing the wrong amount of data and breaking reserve accounting.

Unfortunately, this overflow rot spreads from there, as the qgroup
release/free functions rely on returning an int to take advantage of
negative values for error codes.

Therefore, the full fix is to return the "released" or "freed" amount by
a u64 argument and to return 0 or negative error code via the return
value.

Most of the call sites simply ignore the return value, though some
of them handle the error and count the returned bytes. Change all of
them accordingly.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:46 +01:00
Boris Burkov
f63e1164b9 btrfs: free qgroup reserve when ORDERED_IOERR is set
An ordered extent completing is a critical moment in qgroup reserve
handling, as the ownership of the reservation is handed off from the
ordered extent to the delayed ref. In the happy path we release (unlock)
but do not free (decrement counter) the reservation, and the delayed ref
drives the free. However, on an error, we don't create a delayed ref,
since there is no ref to add. Therefore, free on the error path.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:40 +01:00
Shyam Prasad N
04909192ad cifs: reconnect worker should take reference on server struct unconditionally
Reconnect worker currently assumes that the server struct
is alive and only takes reference on the server if it needs
to call smb2_reconnect.

With the new ability to disable channels based on whether the
server has multichannel disabled, this becomes a problem when
we need to disable established channels. While disabling the
channels and deallocating the server, there could be reconnect
work that could not be cancelled (because it started).

This change forces the reconnect worker to unconditionally
take a reference on the server when it runs.

Also, this change now allows smb2_reconnect to know if it was
called by the reconnect worker. Based on this, the cifs_put_tcp_session
can decide whether it can cancel the reconnect work synchronously or not.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:04:23 -06:00
Shyam Prasad N
8233425248 Revert "cifs: reconnect work should have reference on server struct"
This reverts commit 19a4b9d6c3.

This earlier commit was making an assumption that each mod_delayed_work
called for the reconnect work would result in smb2_reconnect_server
being called twice. This assumption turns out to be untrue. So reverting
this change for now.

I will submit a follow-up patch to fix the actual problem in a different
way.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:03:36 -06:00
Brian Foster
5796230582 bcachefs: don't attempt rw on unfreeze when shutdown
The internal freeze mechanism in bcachefs mostly reuses the generic
rw<->ro transition code. If the fs happens to shutdown during or
after freeze, a transition back to rw can fail. This is expected,
but returning an error from the unfreeze callout prevents the
filesystem from being unfrozen.

Skip the read write transition if the fs is shutdown. This allows
the fs to unfreeze at the vfs level so writes will no longer block,
but will still fail due to the emergency read-only state of the fs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 00:21:30 -05:00
Kent Overstreet
7aebaabfed bcachefs: Fix creating snapshot with implict source
When creating a snapshot without specifying the source subvolume, we use
the subvolume containing the new snapshot.

Previously, this worked if the directory containing the new snapshot was
the subvolume root - but we were using the incorrect helper, and got a
subvolume ID of 0 when the parent directory wasn't the root of the
subvolume, causing an emergency read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 00:21:30 -05:00
David Howells
691a41d8da cifs: Fix non-availability of dedup breaking generic/304
Deduplication isn't supported on cifs, but cifs doesn't reject it, instead
treating it as extent duplication/cloning.  This can cause generic/304 to go
silly and run for hours on end.

Fix cifs to indicate EOPNOTSUPP if REMAP_FILE_DEDUP is set in
->remap_file_range().

Note that it's unclear whether or not commit b073a08016 is meant to cause
cifs to return an error if REMAP_FILE_DEDUP.

Fixes: b073a08016 ("cifs: fix that return -EINVAL when do dedupe operation")
Cc: stable@vger.kernel.org
Suggested-by: Dave Chinner <david@fromorbit.com>
cc: Xiaoli Feng <fengxiaoli0714@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Darrick Wong <darrick.wong@oracle.com>
cc: fstests@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3876191.1701555260@warthog.procyon.org.uk/
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 21:12:04 -06:00
Paulo Alcantara
92414333eb smb: client: fix potential NULL deref in parse_dfs_referrals()
If server returned no data for FSCTL_DFS_GET_REFERRALS, @dfs_rsp will
remain NULL and then parse_dfs_referrals() will dereference it.

Fix this by returning -EIO when no output data is returned.

Besides, we can't fix it in SMB2_ioctl() as some FSCTLs are allowed to
return no data as per MS-SMB2 2.2.32.

Fixes: 9d49640a21 ("CIFS: implement get_dfs_refer for SMB2+")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 21:12:00 -06:00
Namjae Jeon
eb547407f3 ksmbd: downgrade RWH lease caching state to RH for directory
RWH(Read + Write + Handle) caching state is not supported for directory.
ksmbd downgrade it to RH for directory if client send RWH caching lease
state.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Namjae Jeon
18dd1c367c ksmbd: set v2 lease capability
Set SMB2_GLOBAL_CAP_DIRECTORY_LEASING to ->capabilities to inform server
support directory lease to client.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Namjae Jeon
d045850b62 ksmbd: set epoch in create context v2 lease
To support v2 lease(directory lease), ksmbd set epoch in create context
v2 lease response.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Zizhi Wo
8f17527230 ksmbd: fix memory leak in smb2_lock()
In smb2_lock(), if setup_async_work() executes successfully,
work->cancel_argv will bind the argv that generated by kmalloc(). And
release_async_work() is called in ksmbd_conn_try_dequeue_request() or
smb2_lock() to release argv.
However, when setup_async_work function fails, work->cancel_argv has not
been bound to the argv, resulting in the previously allocated argv not
being released. Call kfree() to fix it.

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-05 20:43:23 -06:00
Kent Overstreet
f88d811a23 bcachefs: Don't run indirect extent trigger unless inserting/deleting
This fixes a transaction path overflow reported in the snapshot deletion
path, when moving extents to the correct snapshot.

The root of the issue is that creating/deleting a reflink pointer can
generate an unbounded number of updates, if it is allowed to reference
an unbounded number of indirect extents; to prevent this, merging of
reflink pointers has been disabled.

But there's a hole, which is that copygc/rebalance may fragment existing
extents in the course of moving them around, and if an indirect extent
becomes too fragmented we'll then become unable to delete the reflink
pointer.

The eventual solution is going to be to tweak trigger handling so that
we can process large reflink pointers incrementally when necessary, and
notice that trigger updates don't need to be run for the part of the
reflink pointer not changing. That is going to be a bigger project
though, for another patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
adcf4ee642 bcachefs: Convert compression_stats to for_each_btree_key2
for_each_btree_key2() runs each loop iteration in a btree transaction,
and thus does not cause SRCU lock hold time problems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
131898b0cb bcachefs: Fix bch2_extent_drop_ptrs() call
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and
non-extents iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
87b0d8d3d0 bcachefs: Fix a journal deadlock in replay
Recently, journal pre-reservations were removed. They were for reserving
space ahead of time in the journal for operations that are required for
journal reclaim, e.g. btree key cache flushing and interior node btree
updates.

Instead we have watermarks - only operations for journal reclaim are
allowed when the journal is low on space, and in general we're quite
good about doing operations in the order that will free up space in the
journal quickest when we're low on space. If we're doing a journal
reclaim operation out of order, we usually do it in nonblocking mode if
it's not freeing up space at the end of the journal.

There's an exceptino though - interior btree node update operations have
to be BCH_WATERMARK_reclaim - once they've been started, and they can't
be nonblocking. Generally this is fine because they'll only be a very
small fraction of transaction commits - but there's an exception, which
is during journal replay.

Journal replay does many btree operations, but doesn't need to commit
them to the journal since they're already in the journal. So killing off
of pre-reservation, plus another change to make journal replay more
efficient by initially doing the replay in sorted btree order, made it
possible for the interior update operations replay generates to fill and
deadlock the journal.

Fix this by introducing a new check on journal space at the _start_ of
an interior update operation. This causes us to block if necessary in
exactly the same way as we used to when interior updates took a journal
pre-reservaiton, but without all the expensive accounting
pre-reservations required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
ef6fae4a13 bcachefs; Don't use btree write buffer until journal replay is finished
The keys being replayed by journal replay have to be synchronized with
updates by other threads that overwrite them. We rely on btree node
locks for synchronizing - but since btree write buffer updates take no
btree locks, that won't work.

Instead, simply disable using the btree write buffer until journal
replay is finished.

This fixes a rare backpointers error in the merge_torture_flakey test.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 15:46:31 -05:00
David Howells
c54fc3a4f3 cifs: Fix flushing, invalidation and file size with FICLONE
Fix a number of issues in the cifs filesystem implementation of the FICLONE
ioctl in cifs_remap_file_range().  This is analogous to the previously
fixed bug in cifs_file_copychunk_range() and can share the helper
functions.

Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle.  We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.

Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.

Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet.  This can leaves the local
idea of the EOF at a later point than the server's EOF.  If a clone request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the clone source extends past
the server's EOF.

Fix this by:

 (0) Flush the source region (already done).  The flush does nothing and
     the EOF isn't moved if the source region has no dirty data.

 (1) Move the EOF to the end of the source region if it isn't already at
     least at this point.  If we can't do this, for instance if the server
     doesn't support it, just flush the entire source file.

 (2) Find the folio (if present) at each end of the range, flushing it and
     increasing the region-to-be-invalidated to cover those in their
     entirety.

 (3) Fully discard all the folios covering the range as we want them to be
     reloaded.

 (4) Then perform the extent duplication.

Thirdly, set i_size after doing the duplicate_extents operation as this
value may be used by various things internally.  stat() hides the issue
because setting ->time to 0 causes cifs_getatr() to revalidate the
attributes.

These were causing the cifs/001 xfstest to fail.

Fixes: 04b38d6012 ("vfs: pull btrfs clone API to vfs layer")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
cc: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04 14:15:05 -06:00
David Howells
7b2404a886 cifs: Fix flushing, invalidation and file size with copy_file_range()
Fix a number of issues in the cifs filesystem implementation of the
copy_file_range() syscall in cifs_file_copychunk_range().

Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle.  We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.

Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.

Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet.  This can leaves the local
idea of the EOF at a later point than the server's EOF.  If a copy request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the copy source extends past the
server's EOF.

Fix this by:

 (0) Flush the source region (already done).  The flush does nothing and
     the EOF isn't moved if the source region has no dirty data.

 (1) Move the EOF to the end of the source region if it isn't already at
     least at this point.  If we can't do this, for instance if the server
     doesn't support it, just flush the entire source file.

 (2) Find the folio (if present) at each end of the range, flushing it and
     increasing the region-to-be-invalidated to cover those in their
     entirety.

 (3) Fully discard all the folios covering the range as we want them to be
     reloaded.

 (4) Then perform the copy.

Thirdly, set i_size after doing the copychunk_range operation as this value
may be used by various things internally.  stat() hides the issue because
setting ->time to 0 causes cifs_getatr() to revalidate the attributes.

These were causing the generic/075 xfstest to fail.

Fixes: 620d8745b3 ("Introduce cifs_copy_file_range()")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-04 14:14:43 -06:00
Amir Goldstein
3f29f1c336 fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by
FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit
BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite():

generic/095 5s ...
  kernel BUG at fs/fuse/dir.c:1756!
...
  ? fuse_set_nowrite+0x3d/0xdd
  ? do_raw_spin_unlock+0x88/0x8f
  ? _raw_spin_unlock+0x2d/0x43
  ? fuse_range_is_writeback+0x71/0x84
  fuse_sync_writes+0xf/0x19
  fuse_direct_io+0x167/0x5bd
  fuse_direct_write_iter+0xf0/0x146

Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated
FUSE_DIRECT_IO_ALLOW_MMAP.

Fixes: e78662e818 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:19:32 +01:00
Hangyu Hua
7f8ed28d14 fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
fuse_dax_conn_free() will be called when fuse_fill_super_common() fails
after fuse_dax_conn_alloc(). Then deactivate_locked_super() in
virtio_fs_get_tree() will call virtio_kill_sb() to release the discarded
superblock. This will call fuse_dax_conn_free() again in fuse_conn_put(),
resulting in a possible double free.

Fixes: 1dd539577c ("virtiofs: add a mount option to enable dax")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v5.10
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:16:53 +01:00
Krister Johansen
c4d361f66a fuse: share lookup state between submount and its parent
Fuse submounts do not perform a lookup for the nodeid that they inherit
from their parent.  Instead, the code decrements the nlookup on the
submount's fuse_inode when it is instantiated, and no forget is
performed when a submount root is evicted.

Trouble arises when the submount's parent is evicted despite the
submount itself being in use.  In this author's case, the submount was
in a container and deatched from the initial mount namespace via a
MNT_DEATCH operation.  When memory pressure triggered the shrinker, the
inode from the parent was evicted, which triggered enough forgets to
render the submount's nodeid invalid.

Since submounts should still function, even if their parent goes away,
solve this problem by sharing refcounted state between the parent and
its submount.  When all of the references on this shared state reach
zero, it's safe to forget the final lookup of the fuse nodeid.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: stable@vger.kernel.org
Fixes: 1866d779d5 ("fuse: Allow fuse_fill_super_common() for submounts")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:16:53 +01:00
Tyler Fanelli
c55e0a55b1 fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its
description indicates a purpose of reducing memory footprint. This
may imply that it could be further used to relax other DIRECT_IO
operations in the future.

Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing,
allow shared mmap of DIRECT_IO files while still bypassing the cache
on regular reads and writes.

[Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
Fixes: e78662e818 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-12-04 10:14:39 +01:00
Johannes Berg
88ac06a9f9 Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"
This reverts commit f4acfcd4de ("debugfs: annotate debugfs handlers
vs. removal with lockdep"), it appears to have false positives and
really shouldn't have been in the -rc series with the fixes anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20231202114936.fd55431ab160.I911aa53abeeca138126f690d383a89b13eb05667@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-04 07:43:16 +01:00
Kent Overstreet
0117591e69 bcachefs: Don't drop journal pins in exit path
There's no need to drop journal pins in our exit paths - the code was
trying to have everything cleaned up on any shutdown, but better to just
tweak the assertions a bit.

This fixes a bug where calling into journal reclaim in the exit path
would cass a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-03 12:44:18 -05:00
Linus Torvalds
968f35f4ab five cifs/smb3 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVqcagACgkQiiy9cAdy
 T1E/YgwAtsB7RwxufSE5CB18wKdyBySIIZBSzm9IFTzX92VWGUolui+mQDtqvmRA
 Q+JlOzHrgo+FJNYMUvT8eY5r9GKfIvrqKRsBs5EITUAqg8cGPQzgG+Jgy1NsSuD7
 Of5WBjTbZPMUPGqcI2wf3+/xGLyiQq63thUUCn9QiaYkq0SjOdc/IcZUo8dcHnVB
 N/58QX+JLpEVwjGL5NJtG4EscbyqGBEk9KTg4C9MP7emNG9LuNo/1UJCzs5SFNKk
 TlLeYpVjtjqWhhr48AfXoFfxGO8K7XuFHqiPksw5Lnl8Mcvo8mb1zZvz+xX3g016
 EQ2RzZ+UNQh5qwBALswPNRXlvWUV2gA0cQC1JKY7NgPF5bc/GRefWvsSJd0ycwud
 U1D/tYX2hRYKGZqVu5fbNy848JIDFE2AEBM9nu/77n3pEmeyD+h2F4FSsEpkevNj
 uo0cqw3C0hvMl6O1VM+pA8RfbfQuoDaQK7PyLaZS3cSVKUKcBhCLdC9WKfQGQBLF
 HEWLFzb8
 =g+0x
 -----END PGP SIGNATURE-----

Merge tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Two fallocate fixes

 - Fix warnings from new gcc

 - Two symlink fixes

* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client, common: fix fortify warnings
  cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
  cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
  smb: client: report correct st_size for SMB and NFS symlinks
  smb: client: fix missing mode bits for SMB symlinks
2023-12-03 09:08:26 +09:00
Linus Torvalds
c1c09da07c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmVpr4QACgkQnJ2qBz9k
 QNkoPQf/aurRZZcMHmcYEmd//O59iRXYQVgO9+ORtjqIgB4wHD5q5eYK70eatxmR
 nZkntlFZSTUj9bbgzwOJWzhrsBGfZ2tWY1JFQl/Dlk2nF02rg2aIt1ZksqtNF4wX
 Qt/yGpf+5ubYLxmpxHhBirR3l9DgBppZsTbS2oY5ki17IM4+eOrvMWHHyMcO9Tpb
 jWaiqDueZRL6HCP52cusAJZ6x72vJzivbGGQffbfR4NNGz4YXaiCUQcCnAoW8xcO
 u8xTZ1iH6jrolBSJHNA9fKoYN69aTjPkGFQFnZuOBBgVlUyWYXeJO+9psxusqhE0
 FrghuuGZMhQNkTAKbJTpmeMQ8XRYFQ==
 =Lhm6
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2 fix from Jan Kara:
 "Fix an ext2 bug introduced by changes in ext2 & iomap stepping on each
  other toes (apparently ext2 driver does not get much testing in
  linux-next)"

* tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix ki_pos update for DIO buffered-io fallback case
2023-12-02 06:19:27 +09:00
Linus Torvalds
e6861be452 More bcachefs bugfixes for 6.7
Bigger/user visible fixes:
 
  - bcache & bcachefs were broken with CFI enabled; patch for closures to
    fix type punning
 
  - mark erasure coding as extra-experimental; there are incompatible
    disk space accounting changes coming for erasure coding, and I'm
    still seeing checksum errors in some tests
 
  - several fixes for durability-related issues (durability is a device
    specific setting where we can tell bcachefs that data on a given
    device should be counted as replicated x times )
 
  - a fix for a rare livelock when a btree node merge then updates a
    parent node that is almost full
 
  - fix a race in the device removal path, where dropping a pointer in a
    btree node to a device would be clobbered by an in flight btree write
    updating the btree node key on completion
 
  - fix one SRCU lock hold time warning in the btree gc code - ther's
    still a bunch more of these to fix
 
  - fix a rare race where we'd start copygc before initializing the "are
    we rw" percpu refcount; copygc would think we were already ro and die
    immediately
 
 https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-for-upstream
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmVnoHoACgkQE6szbY3K
 bnbzLBAApVEg3kB3XDCHYw+8AxLbzkuKbuV8FR/w+ULYAmRKbnM5e4pM4UJzwVJ9
 vzBS9KUT4mVNpA5zl7FWmqh5AiJkhbPgb/BijtQiS+gz1ofZ8uCW/DjzWZpaTaT9
 0zz9auiKwzJbBmLXC2lWC28MUPjFNXxlP2pfQPqhpKqlGKBC893hKeJ0Veb6dM1R
 DqkctoWtSQzsNpEaXiQpKBNoNUIlYcFX1XXHn+XpPpWNe80SpMfVNCs2qPkMByu/
 V/QULE9cHI7RTu7oyFY80+9xQDeXDDYZgvtpD7hqNPcyyoix+r/DVz1mZe41XF2B
 bvaJhfcdWePctmiuEXJVXT4HSkwwzC6EKHwi7fejGY56hOvsrEAxNzTEIPRNw5st
 ZkZlxASwFqkiJ3ehy+KRngLX2GZSbJsU4aM5ViQJKtz4rBzGyyf0LmMucdxAoDH5
 zLzsAYaA6FkIZ5e5ZNdTDj7/TMnKWXlU9vTttqIpb8s7qSy+3ejk5NuGitJihZ4R
 LAaCTs1JIsItLP47Ko0ZvmKV6CHlmt+Ht8OBqu73BWJ8vsBTQ8JMK4mGt60bwHvm
 LdEMtp3C3FmXFc06zhKoGgjrletZYO6G4mFBPnQqh1brfFXM1prVg3ftDTqBWkMI
 iAz2chiVc8k0qxoSAqylCYFaGzgiBKzw6YMtqPRmZgfLcq/sJ34=
 =vN+y
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs bugfixes from Kent Overstreet:

 - bcache & bcachefs were broken with CFI enabled; patch for closures to
   fix type punning

 - mark erasure coding as extra-experimental; there are incompatible
   disk space accounting changes coming for erasure coding, and I'm
   still seeing checksum errors in some tests

 - several fixes for durability-related issues (durability is a device
   specific setting where we can tell bcachefs that data on a given
   device should be counted as replicated x times)

 - a fix for a rare livelock when a btree node merge then updates a
   parent node that is almost full

 - fix a race in the device removal path, where dropping a pointer in a
   btree node to a device would be clobbered by an in flight btree write
   updating the btree node key on completion

 - fix one SRCU lock hold time warning in the btree gc code - ther's
   still a bunch more of these to fix

 - fix a rare race where we'd start copygc before initializing the "are
   we rw" percpu refcount; copygc would think we were already ro and die
   immediately

* tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits)
  bcachefs: Extra kthread_should_stop() calls for copygc
  bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
  bcachefs: Fix race between btree writes and metadata drop
  bcachefs: move journal seq assertion
  bcachefs: -EROFS doesn't count as move_extent_start_fail
  bcachefs: trace_move_extent_start_fail() now includes errcode
  bcachefs: Fix split_race livelock
  bcachefs: Fix bucket data type for stripe buckets
  bcachefs: Add missing validation for jset_entry_data_usage
  bcachefs: Fix zstd compress workspace size
  bcachefs: bpos is misaligned on big endian
  bcachefs: Fix ec + durability calculation
  bcachefs: Data update path won't accidentaly grow replicas
  bcachefs: deallocate_extra_replicas()
  bcachefs: Proper refcounting for journal_keys
  bcachefs: preserve device path as device name
  bcachefs: Fix an endianness conversion
  bcachefs: Start gc, copygc, rebalance threads after initing writes ref
  bcachefs: Don't stop copygc thread on device resize
  bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
  ...
2023-12-02 06:02:16 +09:00
Jan Kara
619f75dae2 ext4: fix warning in ext4_dio_write_end_io()
The syzbot has reported that it can hit the warning in
ext4_dio_write_end_io() because i_size < i_disksize. Indeed the
reproducer creates a race between DIO IO completion and truncate
expanding the file and thus ext4_dio_write_end_io() sees an inconsistent
inode state where i_disksize is already updated but i_size is not
updated yet. Since we are careful when setting up DIO write and consider
it extending (and thus performing the IO synchronously with i_rwsem held
exclusively) whenever it goes past either of i_size or i_disksize, we
can use the same test during IO completion without risking entering
ext4_handle_inode_extension() without i_rwsem held. This way we make it
obvious both i_size and i_disksize are large enough when we report DIO
completion without relying on unreliable WARN_ON.

Reported-by:  <syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com>
Fixes: 91562895f8 ("ext4: properly sync file size update after O_SYNC direct IO")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20231130095653.22679-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 23:29:34 -05:00
Zhang Yi
6a3afb6ac6 jbd2: increase the journal IO's priority
Current jbd2 only add REQ_SYNC for descriptor block, metadata log
buffer, commit buffer and superblock buffer, the submitted IO could be
throttled by writeback throttle in block layer, that could lead to
priority inversion in some cases. The log IO looks like a kind of high
priority metadata IO, so it should not be throttled by WBT like QOS
policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from
writeback throttle, and also add REQ_META together indicates it's a
metadata IO.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 23:27:39 -05:00
Zhang Yi
8555922721 jbd2: correct the printing of write_flags in jbd2_write_superblock()
The write_flags print in the trace of jbd2_write_superblock() is not
real, so move the modification before the trace.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231129114740.2686201-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 23:27:39 -05:00
Baokun Li
2dcf5fde6d ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
For files with logical blocks close to EXT_MAX_BLOCKS, the file size
predicted in ext4_mb_normalize_request() may exceed EXT_MAX_BLOCKS.
This can cause some blocks to be preallocated that will not be used.
And after [Fixes], the following issue may be triggered:

=========================================================
 kernel BUG at fs/ext4/mballoc.c:4653!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
 CPU: 1 PID: 2357 Comm: xfs_io 6.7.0-rc2-00195-g0f5cc96c367f
 Hardware name: linux,dummy-virt (DT)
 pc : ext4_mb_use_inode_pa+0x148/0x208
 lr : ext4_mb_use_inode_pa+0x98/0x208
 Call trace:
  ext4_mb_use_inode_pa+0x148/0x208
  ext4_mb_new_inode_pa+0x240/0x4a8
  ext4_mb_use_best_found+0x1d4/0x208
  ext4_mb_try_best_found+0xc8/0x110
  ext4_mb_regular_allocator+0x11c/0xf48
  ext4_mb_new_blocks+0x790/0xaa8
  ext4_ext_map_blocks+0x7cc/0xd20
  ext4_map_blocks+0x170/0x600
  ext4_iomap_begin+0x1c0/0x348
=========================================================

Here is a calculation when adjusting ac_b_ex in ext4_mb_new_inode_pa():

	ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
	if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
		goto adjust_bex;

The problem is that when orig_goal_end is subtracted from ac_b_ex.fe_len
it is still greater than EXT_MAX_BLOCKS, which causes ex.fe_logical to
overflow to a very small value, which ultimately triggers a BUG_ON in
ext4_mb_new_inode_pa() because pa->pa_free < len.

The last logical block of an actual write request does not exceed
EXT_MAX_BLOCKS, so in ext4_mb_normalize_request() also avoids normalizing
the last logical block to exceed EXT_MAX_BLOCKS to avoid the above issue.

The test case in [Link] can reproduce the above issue with 64k block size.

Link: https://patchwork.kernel.org/project/fstests/list/?series=804003
Cc:  <stable@kernel.org> # 6.4
Fixes: 93cdf49f6e ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231127063313.3734294-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 21:57:38 -05:00
Linus Torvalds
6172a5180f Including fixes from bpf and wifi.
Current release - regressions:
 
   - neighbour: fix __randomize_layout crash in struct neighbour
 
   - r8169: fix deadlock on RTL8125 in jumbo mtu mode
 
 Previous releases - regressions:
 
   - wifi:
     - mac80211: fix warning at station removal time
     - cfg80211: fix CQM for non-range use
 
   - tools: ynl-gen: fix unexpected response handling
 
   - octeontx2-af: fix possible buffer overflow
 
   - dpaa2: recycle the RX buffer only after all processing done
 
   - rswitch: fix missing dev_kfree_skb_any() in error path
 
 Previous releases - always broken:
 
   - ipv4: fix uaf issue when receiving igmp query packet
 
   - wifi: mac80211: fix debugfs deadlock at device removal time
 
   - bpf:
     - sockmap: af_unix stream sockets need to hold ref for pair sock
     - netdevsim: don't accept device bound programs
 
   - selftests: fix a char signedness issue
 
   - dsa: mv88e6xxx: fix marvell 6350 probe crash
 
   - octeontx2-pf: restore TC ingress police rules when interface is up
 
   - wangxun: fix memory leak on msix entry
 
   - ravb: keep reverse order of operations in ravb_remove()
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmVobzISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOk4rwP/2qaUstOJVpkO8cG+bRYi3idH9uO/8Yu
 dYgFI4LM826YgbVNVzuiu9Sh7t78dbep/fWQ2quDuZUinWtPmv6RV3UKbDyNWLRr
 iV7sZvXElGsUefixxGANYDUPuCrlr3O230Y8zCN0R65BMurppljs9Pp8FwIqaD+v
 pVs2alb/PeX7g+hPACKPr4Knu8QeZYmzdHoyYeLoMG3PqIgJVU3/8OHHfmnYCdxT
 VSss2LB5FKFCOgetEPGy83KQP7QVaK22GDphZJ4xh7aSewRVP92ORfauiI8To4vQ
 0VnLNcQ+1pXnYzgGdv8oF02e4EP5b0jvrTpqCw1U0QU2s2PARJarzajCXBkwa308
 gXELRpVRpY4+7WEBSX4RGUigurwGGEh/IP/puVtPDr9KU3lFgaTI8wM624Y3Ob/e
 /LVI7a5kUSJysq9/H/QrHjoiuTtV7nCmzBgDqEFSN5hQinSHYKyD0XsUPcLlMJmn
 p6CyQDGHv2ibbg+8TStig0xfmC83N8KfDfcCekSrYxquDMTRtfa2VXofzQiQKDnr
 XNyIURmZAAUVPR6enxlg5Iqzc0mQGumYif7wzsO1bzVzmVZgIDCVxU95hkoRrutU
 qnWXuUGUdieUvXA9HltntTzy2BgJVtg7L/p8YEbd97dxtgK80sbdnjfDswFvEeE4
 nTvE+IDKdCmb
 =QiQp
 -----END PGP SIGNATURE-----

Merge tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and wifi.

  Current release - regressions:

   - neighbour: fix __randomize_layout crash in struct neighbour

   - r8169: fix deadlock on RTL8125 in jumbo mtu mode

  Previous releases - regressions:

   - wifi:
       - mac80211: fix warning at station removal time
       - cfg80211: fix CQM for non-range use

   - tools: ynl-gen: fix unexpected response handling

   - octeontx2-af: fix possible buffer overflow

   - dpaa2: recycle the RX buffer only after all processing done

   - rswitch: fix missing dev_kfree_skb_any() in error path

  Previous releases - always broken:

   - ipv4: fix uaf issue when receiving igmp query packet

   - wifi: mac80211: fix debugfs deadlock at device removal time

   - bpf:
       - sockmap: af_unix stream sockets need to hold ref for pair sock
       - netdevsim: don't accept device bound programs

   - selftests: fix a char signedness issue

   - dsa: mv88e6xxx: fix marvell 6350 probe crash

   - octeontx2-pf: restore TC ingress police rules when interface is up

   - wangxun: fix memory leak on msix entry

   - ravb: keep reverse order of operations in ravb_remove()"

* tag 'net-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  net: ravb: Keep reverse order of operations in ravb_remove()
  net: ravb: Stop DMA in case of failures on ravb_open()
  net: ravb: Start TX queues after HW initialization succeeded
  net: ravb: Make write access to CXR35 first before accessing other EMAC registers
  net: ravb: Use pm_runtime_resume_and_get()
  net: ravb: Check return value of reset_control_deassert()
  net: libwx: fix memory leak on msix entry
  ice: Fix VF Reset paths when interface in a failed over aggregate
  bpf, sockmap: Add af_unix test with both sockets in map
  bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
  tools: ynl-gen: always construct struct ynl_req_state
  ethtool: don't propagate EOPNOTSUPP from dumps
  ravb: Fix races between ravb_tx_timeout_work() and net related ops
  r8169: prevent potential deadlock in rtl8169_close
  r8169: fix deadlock on RTL8125 in jumbo mtu mode
  neighbour: Fix __randomize_layout crash in struct neighbour
  octeontx2-pf: Restore TC ingress police rules when interface is up
  octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64
  net: stmmac: xgmac: Disable FPE MMC interrupts
  octeontx2-af: Fix possible buffer overflow
  ...
2023-12-01 08:24:46 +09:00
Dmitry Antipov
0015eb6e12 smb: client, common: fix fortify warnings
When compiling with gcc version 14.0.0 20231126 (experimental)
and CONFIG_FORTIFY_SOURCE=y, I've noticed the following:

In file included from ./include/linux/string.h:295,
                 from ./include/linux/bitmap.h:12,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/paravirt.h:17,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:53,
                 from ./include/linux/thread_info.h:60,
                 from ./arch/x86/include/asm/preempt.h:9,
                 from ./include/linux/preempt.h:79,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/wait.h:9,
                 from ./include/linux/wait_bit.h:8,
                 from ./include/linux/fs.h:6,
                 from fs/smb/client/smb2pdu.c:18:
In function 'fortify_memcpy_chk',
    inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
  588 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and:

In file included from ./include/linux/string.h:295,
                 from ./include/linux/bitmap.h:12,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/paravirt.h:17,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:53,
                 from ./include/linux/thread_info.h:60,
                 from ./arch/x86/include/asm/preempt.h:9,
                 from ./include/linux/preempt.h:79,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/wait.h:9,
                 from ./include/linux/wait_bit.h:8,
                 from ./include/linux/fs.h:6,
                 from fs/smb/client/cifssmb.c:17:
In function 'fortify_memcpy_chk',
    inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
  588 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In both cases, the fortification logic inteprets calls to 'memcpy()' as an
attempts to copy an amount of data which exceeds the size of the specified
field (i.e. more than 8 bytes from __le64 value) and thus issues an overread
warning. Both of these warnings may be silenced by using the convenient
'struct_group()' quirk.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-30 11:17:03 -06:00
Jakub Kicinski
300fbb247e wireless fixes:
- debugfs had a deadlock (removal vs. use of files),
    fixes going through wireless ACKed by Greg
  - support for HT STAs on 320 MHz channels, even if it's
    not clear that should ever happen (that's 6 GHz), best
    not to WARN()
  - fix for the previous CQM fix that broke most cases
  - various wiphy locking fixes
  - various small driver fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmVnUrkACgkQ10qiO8sP
 aACE4A/9EdhyexRlNKdF3yLh0Q4acuR8IlaLBNgGiQDFxx17djE04p2PTpwkxOyn
 0EPy8dKhhWwoAkbvZ+6ToNFa+Jv9w+C5xVls2osGRuYGSVNlxCNy+8tWSNVT73jp
 rrapEkYHzuz9wBZiJpKwC+mK9uH9gAQgWyYUaTTeOnO/+m3HVhZdU6y71j8gQlm6
 9YFSI3r/VWlq1JpThn8WGULJXOMICWN0Sp4XRhEzHPLjK7MiCLNrrQSyV+uMHHUI
 PmRQN8QW6Oomjbcih3YNn+Geps7xUJFEG4mZvM6GUBXugnIq9t9xmzEbEJyBRIpl
 MGTwrIAyxtvBeHMpB7U/R3RSEyHfSW6fXHgN2S8ZEFTu6v70gP3TK8lL1gWc5mR4
 GogNaGQ0G6LdiHjM7XeGUv/SfzD6HJWa9aR1JbRwapkvxFfM7BEc14gxip0nFDLG
 z9sixiztGmzBsxmDc+h7M6YLlvWe4oqueUyjA7/p42NhYq41Zy/BYK5oUgsuL4ah
 Lsb00O6Mz9m6iPtJlREe81Qsclkjg8xMY2YcCYELXP3v50KwU3THBAOYbh5hvgrL
 T2gibkR0zFAGz1EDcPyQq/SBeUtASPVHyLGUMae6DOcVnOFdAhDTbX0B/TvcnRSq
 e1gvmNywL7wFVZLD/RYZtWW9g6NUcwK+CU59PRBh2OqVb0ow8Ls=
 =X6t2
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
wireless fixes:
 - debugfs had a deadlock (removal vs. use of files),
   fixes going through wireless ACKed by Greg
 - support for HT STAs on 320 MHz channels, even if it's
   not clear that should ever happen (that's 6 GHz), best
   not to WARN()
 - fix for the previous CQM fix that broke most cases
 - various wiphy locking fixes
 - various small driver fixes

* tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: use wiphy locked debugfs for sdata/link
  wifi: mac80211: use wiphy locked debugfs helpers for agg_status
  wifi: cfg80211: add locked debugfs wrappers
  debugfs: add API to allow debugfs operations cancellation
  debugfs: annotate debugfs handlers vs. removal with lockdep
  debugfs: fix automount d_fsdata usage
  wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
  wifi: avoid offset calculation on NULL pointer
  wifi: cfg80211: hold wiphy mutex for send_interface
  wifi: cfg80211: lock wiphy mutex for rfkill poll
  wifi: cfg80211: fix CQM for non-range use
  wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush
  wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta()
  wifi: mt76: mt7925: fix typo in mt7925_init_he_caps
  wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config
====================

Link: https://lore.kernel.org/r/20231129150809.31083-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29 19:43:34 -08:00
David Howells
88010155f0 cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in
smb3_insert_range(), to set i_size after extending the file on the server
and before we do the copy to open the gap (as we don't clean up the EOF
marker if the copy fails).

Fixes: 7fe6fe95b9 ("cifs: add FALLOC_FL_INSERT_RANGE support")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-29 19:59:49 -06:00
David Howells
83d5518b12 cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in
smb3_zero_range(), to set i_size after extending the file on the server.

Fixes: 72c419d9b0 ("cifs: fix smb3_zero_range so it can expand the file-size when required")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-29 19:59:20 -06:00
Kent Overstreet
415e5107b0 bcachefs: Extra kthread_should_stop() calls for copygc
This fixes a bug where going read-only was taking longer than it should
have due to copygc forgetting to check kthread_should_stop()

Additionally: fix a missing is_kthread check in bch2_move_ratelimit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:23 -05:00
Kent Overstreet
463086d998 bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
This eliminates some SRCU warnings: for_each_btree_key2() runs every
loop iteration in a distinct transaction context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:22 -05:00
Kent Overstreet
2111f39459 bcachefs: Fix race between btree writes and metadata drop
btree writes update the btree node key after every write, in order to
update sectors_written, and they also might need to drop pointers if one
of the writes failed in a replicated btree node.

But the btree node might also have had a pointer dropped while the write
was in flight, by bch2_dev_metadata_drop(), and thus there was a bug
where the btree node write would ovewrite the btree node's key with what
it had at the start of the write.

Fix this by dropping pointers not currently in the btree node key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:22 -05:00
Kent Overstreet
ef0beeb8dd bcachefs: move journal seq assertion
journal_cur_seq() can legitimately be used outside of the journal lock,
where this assert can race

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:22 -05:00
Kent Overstreet
1b1bd0fd41 bcachefs: -EROFS doesn't count as move_extent_start_fail
The automated tests check if we've hit too many slowpath/error path
events and fail the test - if we're just shutting down, that naturally
shouldn't count.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 22:58:22 -05:00
Paulo Alcantara
9d63509547 smb: client: report correct st_size for SMB and NFS symlinks
We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse
points as they will be always zero.  Set it to symlink target's length
as specified by POSIX.

This will make stat() family of syscalls return the correct st_size
for such files.

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-28 20:41:41 -06:00
Paulo Alcantara
ef22bb800d smb: client: fix missing mode bits for SMB symlinks
When instantiating inodes for SMB symlinks, add the mode bits from
@cifs_sb->ctx->file_mode as we already do for the other special files.

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-28 20:40:21 -06:00
Kent Overstreet
ae4d612cc1 bcachefs: trace_move_extent_start_fail() now includes errcode
Renamed from trace_move_extent_alloc_mem_fail, because there are other
reasons we colud fail (disk space allocation failure).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 17:18:24 -05:00
Kent Overstreet
5510a4af52 bcachefs: Fix split_race livelock
bch2_btree_update_start() calculates which nodes are going to have to be
split/rewritten, so that we know how many nodes to reserve and how deep
in the tree we have to take locks.

But btree node merges require inserting two keys into the parent node,
not just splits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 17:18:24 -05:00
Kent Overstreet
03013bb0c6 bcachefs: Fix bucket data type for stripe buckets
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 17:18:24 -05:00
Kent Overstreet
d5bd37872a bcachefs: Add missing validation for jset_entry_data_usage
Validation was completely missing for replicas entries in the journal
(not the superblock replicas section) - we can't have replicas entries
pointing to invalid devices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 17:18:24 -05:00
Kent Overstreet
bbc3a46065 bcachefs: Fix zstd compress workspace size
zstd apparently lies about the size of the compression workspace it
requires; if we double it compression succeeds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28 17:18:24 -05:00
Linus Torvalds
18d46e76d7 for-6.7-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmVmHvEACgkQxWXV+ddt
 WDtczA//UdxSPxQwJY1oOQj3k5Zgb/zOThfyD4x5wrDxFAYwVh1XhuyxV2XT8qyq
 ipp+mi9doVykZKLSY+oRAeqjGlF0nIYm1z2PGSU7JXz4VsEj970rsDs3ePwH5TW6
 V8/6VraOIIvmOnTft7aiuM8CXUjyndalNl7RvHu+v6grgAAgQAaly/3CmRIsm/Ui
 2Wb6/J8ciAOBZ8TkFMr0PiTJd+CjUL+1Y9IaYEywujf0nVNJSgHp+R2CpwLDvM/5
 1x6LdRrUKmnY7mhOvC7QwWfQmgsPnj3OuR+3+L+8jULTvcpwka2KEcpCH8/s6mUK
 +4XhKQ4xXOJr8M+KmAUpy1yZZ30G6cDSwnfCWbWRCfR03396tTb08kb6G21fR+NL
 o2qEUOe4DoMbYX/5zd9xEVqbwyGhAIXB0fJ7KJ0RqbaNBh/roRALBVCseP2CFwJE
 P0DE9phjeIGQf3ybdfP7XqnMfk520bqoeV49Akbn2us2SrV1+O9Yjqmj2pbTnljE
 M30Jh/btaiTFtsGB3MBDRRnGhf7F2l1dsmdmMVhdOK8HMY6obcJUdv6YXVLAjBDn
 ATWtUVVizOpHvSZL0G/+1fXqHhLqOnHLY4A97uMjcElK5WJfuYZv8vZK7GVKC/jW
 y5F4w/FPxU8dmhorMGksya2CLMvUsv5dikyAzGHirjEAdyrK1jg=
 =85Pb
 -----END PGP SIGNATURE-----

Merge tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few fixes and message updates:

   - for simple quotas, handle the case when a snapshot is created and
     the target qgroup already exists

   - fix a warning when file descriptor given to send ioctl is not
     writable

   - fix off-by-one condition when checking chunk maps

   - free pages when page array allocation fails during compression
     read, other cases were handled

   - fix memory leak on error handling path in ref-verify debugging
     feature

   - copy missing struct member 'version' in 64/32bit compat send ioctl

   - tree-checker verifies inline backref ordering

   - print messages to syslog on first mount and last unmount

   - update error messages when reading chunk maps"

* tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: send: ensure send_fd is writable
  btrfs: free the allocated memory if btrfs_alloc_page_array() fails
  btrfs: fix 64bit compat send ioctl arguments not initializing version member
  btrfs: make error messages more clear when getting a chunk map
  btrfs: fix off-by-one when checking chunk map includes logical address
  btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()
  btrfs: add dmesg output for first mount and last unmount of a filesystem
  btrfs: do not abort transaction if there is already an existing qgroup
  btrfs: tree-checker: add type and sequence check for inline backrefs
2023-11-28 11:16:04 -08:00
Linus Torvalds
df60cee26a Seven ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVkHAAACgkQiiy9cAdy
 T1GTHwv+O0zp+oiqPezYFnDIp387AjDneUnSlGXyt9Xki0x3i84KiCMlrEnh3W0X
 rej+Dqnxi0/559L4HPUg0fhibFh2eClN1/B28isbep9vYKSv9AH1z1zN8/g1RV5r
 SI+eog1RvBx5DVGuH4+2ChbEJ1ys5StTzEUa4Csln6VCbfTa5rL1X3Lzukc8gPxf
 Qp3fun74xINSHDk2yvEr5Con9inu0NQOT+0IEaT5fCxZVb33tMtFt1NH3n4v+wL3
 SsGhrFvP6GBhSx/m2cPofXquEtE+iHa2/5KYrbP8ypGTFxx5VtRvQrPv/U5y0xme
 sh/C6xiYBc4QmhQf44vmt4OKhLEYTsSOUk2TO8QAgSJc1sIe6VJrwbhmqJsD6YMI
 YyLb1CAN54Yz3Gdi4wrD3uC5BAHr1Ybsx4at6P/7SxyTdILHDZyg9occS23Jd3kq
 Tv/iLz+EQPU0uZavhLug4gFezCpHl3YXa6kVHgwz0rpIHHXUOoNQCe8MjsQSz6wE
 a/FqkgZ2
 =tLLt
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Memory leak fix

 - Fix possible deadlock in open

 - Multiple SMB3 leasing (caching) fixes including:
     - incorrect open count (found via xfstest generic/002 with leases)
     - lease breaking incorrect serialization
     - lease break error handling fix
     - fix sending async response when lease pending

 - Async command fix

* tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error
  ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId
  ksmbd: release interim response after sending status pending response
  ksmbd: move oplock handling after unlock parent dir
  ksmbd: separately allocate ci per dentry
  ksmbd: fix possible deadlock in smb2_open
  ksmbd: prevent memory leak on error return
2023-11-27 17:17:23 -08:00
Johannes Berg
8c88a47435 debugfs: add API to allow debugfs operations cancellation
In some cases there might be longer-running hardware accesses
in debugfs files, or attempts to acquire locks, and we want
to still be able to quickly remove the files.

Introduce a cancellations API to use inside the debugfs handler
functions to be able to cancel such operations on a per-file
basis.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27 11:24:55 +01:00
Johannes Berg
f4acfcd4de debugfs: annotate debugfs handlers vs. removal with lockdep
When you take a lock in a debugfs handler but also try
to remove the debugfs file under that lock, things can
deadlock since the removal has to wait for all users
to finish.

Add lockdep annotations in debugfs_file_get()/_put()
to catch such issues.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27 11:24:50 +01:00
Johannes Berg
0ed04a1847 debugfs: fix automount d_fsdata usage
debugfs_create_automount() stores a function pointer in d_fsdata,
but since commit 7c8d469877 ("debugfs: add support for more
elaborate ->d_fsdata") debugfs_release_dentry() will free it, now
conditionally on DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, but that's not
set for the function pointer in automount. As a result, removing
an automount dentry would attempt to free the function pointer.
Luckily, the only user of this (tracing) never removes it.

Nevertheless, it's safer if we just handle the fsdata in one way,
namely either DEBUGFS_FSDATA_IS_REAL_FOPS_BIT or allocated. Thus,
change the automount to allocate it, and use the real_fops in the
data to indicate whether or not automount is filled, rather than
adding a type tag. At least for now this isn't actually needed,
but the next changes will require it.

Also check in debugfs_file_get() that it gets only called
on regular files, just to make things clearer.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27 11:24:45 +01:00
Linus Torvalds
5b2b1173a9 eventfs fixes:
- With the usage of simple_recursive_remove() recommended by Al Viro,
   the code should not be calling "d_invalidate()" itself. Doing so
   is causing crashes. The code was calling d_invalidate() on the race
   of trying to look up a file while the parent was being deleted.
   This was detected, and the added dentry was having d_invalidate() called
   on it, but the deletion of the directory was also calling d_invalidate()
   on that same dentry.
 
 - A fix to not free the eventfs_inode (ei) until the last dput() was called
   on its ei->dentry made the ei->dentry exist even after it was marked
   for free by setting the ei->is_freed. But code elsewhere still was
   checking if ei->dentry was NULL if ei->is_freed is set and would
   trigger WARN_ON if that was the case. That's no longer true and there
   should not be any warnings when it is true.
 
 - Use GFP_NOFS for allocations done under eventfs_mutex.
   The eventfs_mutex can be taken on file system reclaim, make sure
   that allocations done under that mutex do not trigger file system
   reclaim.
 
 - Clean up code by moving the taking of inode_lock out of the helper
   functions and into where they are needed, and not use the
   parameter to know to take it or not. It must always be held but
   some callers of the helper function have it taken when they were
   called.
 
 - Warn if the inode_lock is not held in the helper functions.
 
 - Warn if eventfs_start_creating() is called without a parent.
   As eventfs is underneath tracefs, all files created will have
   a parent (the top one will have a tracefs parent).
 
 Tracing update;
 
 - Add Mathieu Desnoyers as an official reviewer of the tracing sub system.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZWNdfBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsw1AQC0x3cdNhIhzi1mWh9a8KSH/GJPdl/a
 7t0sv9XT8HV+iQEAyvvr0ov/s7XSAffG2HcYU0WxRGXTI5MyQlzmaIQ9TQo=
 =ZQYD
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt::
 "Eventfs fixes:

   - With the usage of simple_recursive_remove() recommended by Al Viro,
     the code should not be calling "d_invalidate()" itself. Doing so is
     causing crashes. The code was calling d_invalidate() on the race of
     trying to look up a file while the parent was being deleted. This
     was detected, and the added dentry was having d_invalidate() called
     on it, but the deletion of the directory was also calling
     d_invalidate() on that same dentry.

   - A fix to not free the eventfs_inode (ei) until the last dput() was
     called on its ei->dentry made the ei->dentry exist even after it
     was marked for free by setting the ei->is_freed. But code elsewhere
     still was checking if ei->dentry was NULL if ei->is_freed is set
     and would trigger WARN_ON if that was the case. That's no longer
     true and there should not be any warnings when it is true.

   - Use GFP_NOFS for allocations done under eventfs_mutex. The
     eventfs_mutex can be taken on file system reclaim, make sure that
     allocations done under that mutex do not trigger file system
     reclaim.

   - Clean up code by moving the taking of inode_lock out of the helper
     functions and into where they are needed, and not use the parameter
     to know to take it or not. It must always be held but some callers
     of the helper function have it taken when they were called.

   - Warn if the inode_lock is not held in the helper functions.

   - Warn if eventfs_start_creating() is called without a parent. As
     eventfs is underneath tracefs, all files created will have a parent
     (the top one will have a tracefs parent).

  Tracing update:

   - Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"

* tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
  eventfs: Make sure that parent->d_inode is locked in creating files/dirs
  eventfs: Do not allow NULL parent to eventfs_start_creating()
  eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
  eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
  eventfs: Do not invalidate dentry in create_file/dir_dentry()
  eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
2023-11-26 19:48:20 -08:00
Linus Torvalds
4515866db1 five cifs/smb3 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVioPcACgkQiiy9cAdy
 T1Gf7wv/bbwvRAhPi2m51Iv/LZn+XzY639rtvl2RvfuwxzdEsJzmzXHEU4SFYj7w
 9d4vcVndwTzcD8sE9jf6tSgkFd96U9ZEoias+nUdU2CY02DW7C/PVlTckiC4brHD
 2ZOOYqn5w3k8mJGN1dP7+1yw1OtXJeOr65/8JbP12z0eru3lzNn91blbW3faeQSt
 P+dqFTYFygBLiB3AFqlrtXh6TeUpT88UkfYpDrhT+TIiY+BwYKyQBcgfW9Ho12Cx
 FcD5Bc8QhXT2IM0+igjNJgLwqhko6KGz585m/ojABJwIvstl6gtPhT+r07VLAl8K
 87rNuKvIXfdmbaWEcUgOBX9KTa8fduTgVi6s4wj/1jP8NxihOXjDtpRO5xGyTudh
 w3HooLC8SajEzH76MbLRPnmrG07IG+ZRPiYkMIB65sZvXh/rlAM/59k8K+2WMtEN
 cDZ53EO1Y2qd5C/wto/dg3e3eWEC2wxu94x5uWYaXHxAmdDCfoZ4Qj1kSkgSXHd5
 FoMd4xAJ
 =6aLE
 -----END PGP SIGNATURE-----

Merge tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - use after free fix in releasing multichannel interfaces

 - fixes for special file types (report char, block, FIFOs properly when
   created e.g. by NFS to Windows)

 - fixes for reporting various special file types and symlinks properly
   when using SMB1

* tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: introduce cifs_sfu_make_node()
  smb: client: set correct file type from NFS reparse points
  smb: client: introduce ->parse_reparse_point()
  smb: client: implement ->query_reparse_point() for SMB1
  cifs: fix use after free for iface while disabling secondary channels
2023-11-26 08:22:27 -08:00
Kent Overstreet
3f3ae1250e bcachefs: bpos is misaligned on big endian
bkey embeds a bpos that is misaligned on big endian; this is so that
bch2_bkey_swab() works correctly without having to differentiate between
packed and non-packed keys (a debatable design decision).

This means it can't have the __aligned() tag on big endian.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25 21:48:42 -05:00
Kent Overstreet
e4f72bb46a bcachefs: Fix ec + durability calculation
Durability of an erasure coded pointer doesn't add the device
durability; durability is the same for any extent in that stripe so the
calculation only comes from the stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25 21:48:42 -05:00