Commit Graph

82884 Commits

Author SHA1 Message Date
Bob Peterson
f511e60a55 gfs2: Small gfs2_quota_lock cleanup
No need to set error = 0 since it's set further down.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:18 +02:00
Bob Peterson
a4d22e337d gfs2: move qdsb_put and reduce redundancy
This patch looks more invasive than it is. It simply moves function
qdsb_put before qd_unlock, then changes qd_unlock to call it rather than
open coding it. Again, this reduces redundancy.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:18 +02:00
Bob Peterson
03d468f1c0 gfs2: improvements to sysfs status
This patch adds some new fields to the gfs2 status file in sysfs to aid
in debugging.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:18 +02:00
Bob Peterson
9f494e9bdc gfs2: Don't try to sync non-changes
Function need_sync is supposed to determine if a qd element needs to be
synced. If the "change" (qd_change) is zero, it does not need to be
synced because there's literally no change in the value. Before this
patch need_sync returned false if value < 0. That should be <= 0.
This patch changes the check to <=.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:18 +02:00
Bob Peterson
2a4f651167 gfs2: Simplify function need_sync
This patch simplifies function need_sync by eliminating a variable in
favor of just returning the appropriate value as soon as we know it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:18 +02:00
Bob Peterson
e34c16c9c6 gfs2: remove unneeded pg_oflow variable
Function gfs2_write_disk_quota checks if its write overflows onto
another page, and if so, does a second write. Before this patch it kept
two variables for this, but only one is needed. This patch simplifies
it by eliminating pg_oflow.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
f0418e4b56 gfs2: remove unneeded variable done
Function gfs2_write_buf_to_page uses variable done to exit its loop, but
it's unnecessary if we just code an infinite loop and exit when we need.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
d96dad2715 gfs2: pass sdp to gfs2_write_buf_to_page
This patch passes the superblock pointer to gfs2_write_buf_to_page so it
becomes more apparent it's dealing with the system quota file.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
adfd2b5e4f gfs2: pass sdp in to gfs2_write_disk_quota
Like the previous patch, we now pass the superblock pointer to function
gfs2_write_disk_quota. This makes the code more understandable, since it
only operates on the quota inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
ee1768e467 gfs2: Pass sdp to gfs2_adjust_quota
Before this change function gfs2_adjust_quota's first parameter was an
gfs2_inode pointer. But it always pointed to the quota inode. Here we
switch that to pass the superblock pointer, sdp, so it is easier to read
the code and understand that it's only dealing with the quota inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
768963ab07 gfs2: remove dead code for quota writes
Since patch 845802b112 function gfs2_write_buf_to_page checks if the
target inode is jdata or ordered. This function only operates on the
system quota file, which is always jdata, so the check for jdata is
useless. This patch removes it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Bob Peterson
eef46ab713 gfs2: Introduce new quota=quiet mount option
This patch adds a new mount option quota=quiet which is the same as
quota=on but it suppresses gfs2 quota error messages.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
267d1a011e gfs2: Add device name to gfs2_logd and gfs2_quotad
Add the device name to the names of the gfs2_logd and gfs2_quotad kernel
threads to allow for easier identification.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
ab8eecf5d0 gfs2: Rename "freeze_workqueue" to "gfs2_freeze"
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
5c0dc371a2 gfs2: Rename "gfs_recovery" workqueue to "gfs2_recovery"
Rename the "gfs_recovery" workqueue to "gfs2_recovery", and
gfs_recovery_wq to gfs2_recovery_wq.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
e3da6be3d7 gfs2: Fix withdraw race
Function gfs2_withdraw() tries to synchronize concurrent callers by
atomically setting the SDF_WITHDRAWN flag in the first caller, setting
the SDF_WITHDRAW_IN_PROG flag to indicate that a withdraw is in
progress, performing the actual withdraw, and clearing the
SDF_WITHDRAW_IN_PROG flag when done.  All other callers wait for the
SDF_WITHDRAW_IN_PROG flag to be cleared before returning.

This leaves a small window in which callers can find the SDF_WITHDRAWN
flag set before the SDF_WITHDRAW_IN_PROG flag has been set, causing them
to return prematurely, before the withdraw has been completed.

Fix that by setting the SDF_WITHDRAWN and SDF_WITHDRAW_IN_PROG flags
atomically.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
fe0690f0a6 gfs2: Sanitize kthread stopping
Immediately stop the logd and quotad kernel threads when a filesystem
withdraw is detected: those threads aren't doing anything useful after a
withdraw.  (Depends on the extra logd and quotad task struct references
held since commit 7a109f383fa3 ("gfs2: Fix asynchronous thread
destruction").)

In addition, check for kthread_should_stop() in the wait condition in
gfs2_quotad() to stop immediately when kthread_stop() is called.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
e4a8b5481c gfs2: Switch to wait_event in gfs2_quotad
In gfs2_quotad(), switch from an open-coded wait loop to
wait_event_interruptible_timeout().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
fe4f7940d2 gfs2: Fix asynchronous thread destruction
The kernel threads are currently stopped and destroyed synchronously by
gfs2_make_fs_ro() and gfs2_put_super(), and asynchronously by
signal_our_withdraw(), with no synchronization, so the synchronous and
asynchronous contexts can race with each other.

First, when creating the kernel threads, take an extra task struct
reference so that the task struct won't go away immediately when they
terminate.  This allows those kthreads to terminate immediately when
they're done rather than hanging around as zombies until they are reaped
by kthread_stop().  When kthread_stop() is called on a terminated
kthread, it will return immediately.

Second, in signal_our_withdraw(), once the SDF_JOURNAL_LIVE flag has
been cleared, wake up the logd and quotad wait queues instead of
stopping the logd and quotad kthreads.  The kthreads are then expected
to terminate automatically within short time, but if they cannot, they
will not block the withdraw.

For example, if a user process and one of the kthread decide to withdraw
at the same time, only one of them will perform the actual withdraw and
the other will wait for it to be done.  If the kthread ends up being the
one to wait, the withdrawing user process won't be able to stop it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
f66af88e33 gfs2: Stop using gfs2_make_fs_ro for withdraw
[   81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0
[   81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
[   81.392343][ T5532] Call Trace:
[   81.395654][ T5532]  <TASK>
[   81.398603][ T5532]  dump_stack_lvl+0x1b1/0x290
[   81.418421][ T5532]  gfs2_assert_warn_i+0x19a/0x2e0
[   81.423480][ T5532]  gfs2_quota_cleanup+0x4c6/0x6b0
[   81.428611][ T5532]  gfs2_make_fs_ro+0x517/0x610
[   81.457802][ T5532]  gfs2_withdraw+0x609/0x1540
[   81.481452][ T5532]  gfs2_inode_refresh+0xb2d/0xf60
[   81.506658][ T5532]  gfs2_instantiate+0x15e/0x220
[   81.511504][ T5532]  gfs2_glock_wait+0x1d9/0x2a0
[   81.516352][ T5532]  do_sync+0x485/0xc80
[   81.554943][ T5532]  gfs2_quota_sync+0x3da/0x8b0
[   81.559738][ T5532]  gfs2_sync_fs+0x49/0xb0
[   81.564063][ T5532]  sync_filesystem+0xe8/0x220
[   81.568740][ T5532]  generic_shutdown_super+0x6b/0x310
[   81.574112][ T5532]  kill_block_super+0x79/0xd0
[   81.578779][ T5532]  deactivate_locked_super+0xa7/0xf0
[   81.584064][ T5532]  cleanup_mnt+0x494/0x520
[   81.593753][ T5532]  task_work_run+0x243/0x300
[   81.608837][ T5532]  exit_to_user_mode_loop+0x124/0x150
[   81.614232][ T5532]  exit_to_user_mode_prepare+0xb2/0x140
[   81.619820][ T5532]  syscall_exit_to_user_mode+0x26/0x60
[   81.625287][ T5532]  do_syscall_64+0x49/0xb0
[   81.629710][ T5532]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

In this backtrace, gfs2_quota_sync() takes quota data references and
then calls do_sync().  Function do_sync() encounters filesystem
corruption and withdraws the filesystem, which (among other things) calls
gfs2_quota_cleanup().  Function gfs2_quota_cleanup() wrongly assumes
that nobody is holding any quota data references anymore, and destroys
all quota data objects.  When gfs2_quota_sync() then resumes and
dereferences the quota data objects it is holding, those objects are no
longer there.

Function gfs2_quota_cleanup() deals with resource deallocation and can
easily be delayed until gfs2_put_super() in the case of a filesystem
withdraw.  In fact, most of the other work gfs2_make_fs_ro() does is
unnecessary during a withdraw as well, so change signal_our_withdraw()
to skip gfs2_make_fs_ro() and perform the necessary steps directly
instead.

Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches.

Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com
Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
a475c5dd16 gfs2: Free quota data objects synchronously
In gfs2_quota_cleanup(), wait for the quota data objects to be freed
before returning.  Otherwise, there is no guarantee that the quota data
objects will be gone when their kmem cache is destroyed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
bb73ae8ff3 gfs2: Fix initial quota data refcount
Fix the refcount of quota data objects created directly by
gfs2_quota_init(): those are placed into the in-memory quota "database"
for eventual syncing to the main quota file, but they are not actively
held and should thus have an initial refcount of 0.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:17 +02:00
Andreas Gruenbacher
fae2e73a55 gfs2: No more quota complaints after withdraw
Once a filesystem is withdrawn, don't complain about quota changes
that can't be synced to the main quota file anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
faada74a90 gfs2: Factor out duplicate quota data disposal code
Rename gfs2_qd_dispose() to gfs2_qd_dispose_list().  Move some code
duplicated in gfs2_qd_dispose_list() and gfs2_quota_cleanup() into a
new gfs2_qd_dispose() function.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
961fe3422e gfs2: Use gfs2_qd_dispose in gfs2_quota_cleanup
Change gfs2_quota_cleanup() to move the quota data objects to dispose of
on a dispose list and call gfs2_qd_dispose() on that list, like
gfs2_qd_shrink_scan() does, instead of disposing of the quota data
objects directly.

This may look a bit pointless by itself, but it will make more sense in
combination with a fix that follows.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
6b0e9a5f1e gfs2: Fix wrong quota shrinker return value
Function gfs2_qd_isolate must only return LRU_REMOVED when removing the
item from the lru list; otherwise, the number of items on the list will
go wrong.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
e7beb8b6de gfs2: Rename SDF_DEACTIVATING to SDF_KILL
Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious
that this relates to the kill_sb filesystem operation.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
3c69c437bf gfs2: Rename sd_{ glock => kill }_wait
Rename sd_glock_wait to sd_kill_wait: we'll use it for other things
related to "killing" a filesystem on unmount soon (kill_sb).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Bob Peterson
481f6e7d73 gfs2: Use qd_sbd more consequently
Before this patch many of the functions in quota.c got their superblock
pointer, sdp, from the quota_data's glock pointer. That's silly because
the qd already has its own pointer to the superblock (qd_sbd).

This patch changes references to use that instead, eliminating a level
of indirection.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
db77789bae gfs2: journal flush threshold fixes and cleanup
Commit f07b352021 ("GFS2: Made logd daemon take into account log
demand") changed gfs2_ail_flush_reqd() and gfs2_jrnl_flush_reqd() to
take sd_log_blks_needed into account, but the checks in
gfs2_log_commit() were not updated correspondingly.

Once that is fixed, gfs2_jrnl_flush_reqd() and gfs2_ail_flush_reqd() can
be used in gfs2_log_commit().  Make those two helpers available to
gfs2_log_commit() by defining them above gfs2_log_commit().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
b6b8f72a11 gfs2: Fix logd wakeup on I/O error
When quotad detects an I/O error, it sets sd_log_error and then it wakes
up logd to withdraw the filesystem.  However, logd doesn't wake up when
sd_log_error is set.  Fix that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
b74cd55aa9 gfs2: low-memory forced flush fixes
First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag
to determine if an AIL flush should be forced in low-memory situations.
However, it also immediately clears the flag, and when called repeatedly
as in function gfs2_logd, the flag will be lost.  Fix that by pulling
the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.

Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag
whether or not enough pages were written.  If enough pages could be
written, flushing the AIL is unnecessary, though.

Third, gfs2_writepages doesn't wake up logd after setting the
SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react.
It would be preferable to wake up logd, but that hurts the performance
of some workloads and we don't quite understand why so far, so don't
wake up logd so far.

Fixes: b066a4eebd ("gfs2: forcibly flush ail to relieve memory pressure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
6df373b09b gfs2: Switch to wait_event in gfs2_logd
In gfs2_logd(), switch from an open-coded wait loop to
wait_event_interruptible_timeout().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Bob Peterson
66fa9912ec gfs2: conversion deadlock do_promote bypass
Consider the following case:
1. A glock is held in shared mode.
2. A process requests the glock in exclusive mode (rename).
3. Before the lock is granted, more processes (read / ls) request the
   glock in shared mode again.
4. gfs2 sends a request to dlm for the lock in exclusive mode because
   that holder is at the head of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
   response with state == LM_ST_SHARED and LM_OUT_CANCELED.  So at that
   point, the glock is still held in shared mode.
6. finish_xmote gets called to process the response from dlm. It detects
   that the glock is not in the requested mode and no demote is in
   progress, so it moves the canceled holder to the tail of the queue
   and finds the new holder at the head of the queue.  That holder is
   requesting the glock in shared mode.
7. finish_xmote calls do_xmote to transition the glock into shared mode,
   but the glock is already in shared mode and so do_xmote complains
   about that with:
	GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);

Instead, in finish_xmote, after moving the canceled holder to the tail
of the queue, check if any new holders can be granted.  Only call
do_xmote to repeat the dlm request if the holder at the head of the
queue is requesting the glock in a mode that is incompatible with the
mode the glock is currently held in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
0b93bac227 gfs2: Remove LM_FLAG_PRIORITY flag
The last user of this flag was removed in commit b77b4a4815 ("gfs2:
Rework freeze / thaw logic").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
de3e7f97ae gfs2: do_promote cleanup
Change function do_promote to return true on success, and false
otherwise.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
dc0b943523 gfs: Don't use GFP_NOFS in gfs2_unstuff_dinode
Revert the rest of commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS"):

In gfs2_unstuff_dinode(), there is no need to carry out the page cache
allocation under GFP_NOFS because inodes on the "regular" filesystem are
never un-inlined under memory pressure, so switch back from
find_or_create_page() to grab_cache_page() here as well.

Inodes on the "metadata" filesystem can theoretically be un-inlined
under memory pressure, but any page cache allocations in that context
would happen in GFP_NOFS context because those inodes have
inode->i_mapping->gfp_mask set to GFP_NOFS (see the previous patch).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Andreas Gruenbacher
111c7d27a1 gfs2: Use mapping->gfp_mask for metadata inodes
Set mapping->gfp mask to GFP_NOFS for all metadata inodes so that
allocating pages in the address space of those inodes won't call back
into the filesystem.  This allows to switch back from
find_or_create_page() to grab_cache_page() in two places.

Partially reverts commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS").

Thanks to Dan Carpenter <dan.carpenter@linaro.org> for pointing out a
Smatch static checker warning.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:15 +02:00
Minjie Du
5f02d16868 gfs2: increase usage of folio_next_index() helper
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().

Signed-off-by: Minjie Du <duminjie@vivo.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:15 +02:00
Linus Torvalds
02aee814d3 gfs2 fixes
- Fix a freeze consistency check in gfs2_trans_add_meta().
 
 - Don't use filemap_splice_read as it can cause deadlocks on gfs2.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmTSOikUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpggg//Q+1yil5lS+Egf5Z+gL1E0SgXge7k
 CfEcrjSfkIL8LnVgZSpJD8I++nMdXJb533qkGOIwvAifqIP2ZnGrtK2T4cF6PgWs
 iAtKLPJGtg+6HswgGMEpEnl7sSBo4DYE6EH9TCoht9N0nSfJCAVWP2brVxEmnacl
 omIXQymQAAGilVB58tru0XqfvneeHCvsipEYrJ1if1VGQHUwwU5v3uZGiOh2/VB2
 CU8qrA9kX3O+cXyHDED5Rja0pKkZlxogK6/OUPophTGSRDOJZsnT36OfSCAF+Dhl
 TZ1J8pBircPk7nA5u7GRUk6u5HVc2idmOLx6FoRfgJ97tsDz5NodNxEtyJBY0g/O
 mOD6IRqSLSNgTrH9AFff4BauilT+NOCyMoe69Dw6XHCMfxrQB4l9nEH1clQ+nuks
 2jBcjkpEn71dvAJjM+YNU7a2HOmhz7w2zTidv1pN7SIspBRSOD9w5DIt855PNIwd
 y7SkUsT3GVcrVaUd2mNAmM1PWn0Gu+V3tPJbXqfzCEOKLKyOtMkMndm+uHB7wwH1
 25HaNV+Bj8vWbMTjF0KYIkksQ9TgboIzdeV6Q5DrIEyWsXH/pRJF07JVgGjU80/n
 +pIwqj1yVkipPVZ8orKvoWucp+Q3qFN2CHJTdKjwzeHFM2+MlXktHNDdKbAEGRiw
 irK7Mq41PwAMVgY=
 =pPXf
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - Fix a freeze consistency check in gfs2_trans_add_meta()

 - Don't use filemap_splice_read as it can cause deadlocks on gfs2

* tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't use filemap_splice_read
  gfs2: Fix freeze consistency check in gfs2_trans_add_meta
2023-08-08 09:27:08 -07:00
Bob Peterson
0be8432166 gfs2: Don't use filemap_splice_read
Starting with patch 2cb1e08985, gfs2 started using the new function
filemap_splice_read rather than the old (and subsequently deleted)
function generic_file_splice_read.

filemap_splice_read works by taking references to a number of folios in
the page cache and splicing those folios into a pipe.  The folios are
then read from the pipe and the folio references are dropped.  This can
take an arbitrary amount of time.  We cannot allow that in gfs2 because
those folio references will pin the inode glock to the node and prevent
it from being demoted, which can lead to cluster-wide deadlocks.

Instead, use copy_splice_read.

(In addition, the old generic_file_splice_read called into ->read_iter,
which called gfs2_file_read_iter, which took the inode glock during the
operation.  The new filemap_splice_read interface does not take the
inode glock anymore.  This is fixable, but it still wouldn't prevent
cluster-wide deadlocks.)

Fixes: 2cb1e08985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-07 18:42:04 +02:00
Andreas Gruenbacher
2cbd80642b gfs2: Fix freeze consistency check in gfs2_trans_add_meta
Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
sure that no buffers are added to a transaction while the filesystem is
frozen.  With the recent freeze/thaw rework, the SDF_FROZEN flag is
cleared after thaw_super() is called, which is sufficient for
serializing freeze/thaw.

However, other filesystem operations started after thaw_super() may now
be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
which will trigger the SDF_FROZEN check in gfs2_trans_add_meta().  Fix
that by checking the s_writers.frozen state instead.

In addition, make sure not to call gfs2_assert_withdraw() with the
sd_log_lock spin lock held.  Check for a withdrawn filesystem before
checking for a frozen filesystem, and don't pin/add buffers to the
current transaction in case of a failure in either case.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2023-08-07 18:40:51 +02:00
Linus Torvalds
0108963f14 v6.5-rc5.vfs.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZM+bcQAKCRCRxhvAZXjc
 opWnAP9Ik49607Rn3OvFhWiYQp21nJ9NTs4lp5H30gMM3KhOxQEA9YAafIRH3rMs
 zYjmEBwf4FCW9XQ4QgmktsW4Y7RqggE=
 =DCbd
 -----END PGP SIGNATURE-----

Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup

 - Clean up directory iterators and clarify file_needs_f_pos_lock()

* tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: rely on ->iterate_shared to determine f_pos locking
  vfs: get rid of old '->iterate' directory operation
  proc: fix missing conversion to 'iterate_shared'
  open: make RESOLVE_CACHED correctly test for O_TMPFILE
2023-08-06 10:43:52 -07:00
Christian Brauner
7d84d1b9af
fs: rely on ->iterate_shared to determine f_pos locking
Now that we removed ->iterate we don't need to check for either
->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
for ->iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not just does it allow us to avoid
checking f_inode's mode it also actually clearly shows that we're
locking because of readdir.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:36 +02:00
Linus Torvalds
3e32715496
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.

Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.

This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.

The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.

Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:35 +02:00
Linus Torvalds
0a2c2baafa
proc: fix missing conversion to 'iterate_shared'
I'm looking at the directory handling due to the discussion about f_pos
locking (see commit 797964253d: "file: reinstate f_pos locking
optimization for regular files"), and wanting to clean that up.

And one source of ugliness is how we were supposed to move filesystems
over to the '->iterate_shared()' function that only takes the inode lock
for reading many many years ago, but several filesystems still use the
bad old '->iterate()' that takes the inode lock for exclusive access.

See commit 6192269444 ("introduce a parallel variant of ->iterate()")
that also added some documentation stating

      Old method is only used if the new one is absent; eventually it will
      be removed.  Switch while you still can; the old one won't stay.

and that was back in April 2016.  Here we are, many years later, and the
old version is still clearly sadly alive and well.

Now, some of those old style iterators are probably just because the
filesystem may end up having per-inode mutable data that it uses for
iterating a directory, but at least one case is just a mistake.

Al switched over most filesystems to use '->iterate_shared()' back when
it was introduced.  In particular, the /proc filesystem was converted as
one of the first ones in commit f50752eaa0 ("switch all procfs
directories ->iterate_shared()").

But then later one new user of '->iterate()' was then re-introduced by
commit 6d9c939dbe ("procfs: add smack subdir to attrs").

And that's clearly not what we wanted, since that new case just uses the
same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
that other /proc pident directories use, and they are most definitely
safe to use with the inode lock held shared.

So just fix it.

This still leaves a fair number of oddball filesystems using the
old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
overlayfs, and vboxsf), but at least we don't have any remaining in the
core filesystems.

I'm going to add a wrapper function that just drops the read-lock and
takes it as a write lock, so that we can clean up the core vfs layer and
make all the ugly 'this filesystem needs exclusive inode locking' be
just filesystem-internal warts.

I just didn't want to make that conversion when we still had a core user
left.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:35 +02:00
Aleksa Sarai
a0fc452a5d
open: make RESOLVE_CACHED correctly test for O_TMPFILE
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
fast-path check for RESOLVE_CACHED would reject all users passing
O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
for __O_TMPFILE.

Cc: stable@vger.kernel.org # v5.12+
Fixes: 99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:35 +02:00
Linus Torvalds
f6a6916859 small DFS fix
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmTOmLAACgkQiiy9cAdy
 T1EAXAv+L20j8z9XBHdNf2UYsft961NoS3058DC2PBd6xy3yxPDUgrBwFFhx2JGe
 bSH6Q+LfZjJdlfh04iZjwXbEIjFjL8aWpD+gq1F0NenjNXncel39oDaxn7dNgsjl
 FUfkXrsOk/UohnyqoCFPmvzt9dtltShXMjHknIPRcOnFAwtd2GSXvpzQEI3m07cB
 OVQ2sbZ5U1TTxlOIcaJYDSBqpA0yuivohdJkcn7eJytnXXTfRGzFDstfGP18gWfg
 UbdilYKXvsRp9Y+zR6Z0iv1j3llZTfNBtpF25X7nC3yDTQV36UIdWpKby1jSq/V0
 7o6OuooMv7EPLU7ZP8RldWclyahoTKoV6F82LakKENbzYq75IDAPRfKZL5cd5yhk
 PiGAky8aiAcdJZl3OFhIG3Fa9qGHK3MJDwxXV+RVpNq71UEyM6BhY6wUOOziC6Xk
 xio+Mt2hPuAQLbOf25MMAB9uoz8IDSpcdudCJCwKC+zPSZXZAE/FSJBvzQUwkYLj
 kkWoz+ke
 =AY3g
 -----END PGP SIGNATURE-----

Merge tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:

 - Fix DFS interlink problem (different namespace)

* tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix dfs link mount against w2k8
2023-08-05 13:44:06 -07:00
Linus Torvalds
4593f3c2c6 Two patches to improve RBD exclusive lock interaction with
osd_request_timeout option and another fix to reduce the potential for
 erroneous blocklisting -- this time in CephFS.  All going to stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmTNFFUTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi5I8B/9a8C5ed0XfTadHcHX5VQsY3b//4rgp
 0VYkQbjYnSCwrYRIPsvnL8LeLHzbcPGLpFAQXg7uUlmJ5dpaOz303hKmKt5GdyOR
 qvWka3K4zeG177b6yc1srqs0cEsCLpQrn+krnvOl5v87QdFsCP/bsJMOrJ9mlhdM
 9GjkjDRn6jvNyOLGbn3kIvwCRF9NH6/nHzjBcTUzvS8fBUye02o9C1H6ZQ7sYjKH
 sJnmQCNCFHEqdaVjDZ7mw/doIrAbmTV6sgusuPjiF5bHILzX4oWG4UJmRpHFV//S
 JPQgMp2DNjP8tW9aCVLVVVV5t5AKBr84etF59DaFNflk27U3COJWkE0a
 =gw7n
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two patches to improve RBD exclusive lock interaction with
  osd_request_timeout option and another fix to reduce the potential for
  erroneous blocklisting -- this time in CephFS. All going to stable"

* tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client:
  libceph: fix potential hang in ceph_osdc_notify()
  rbd: prevent busy loop when requesting exclusive lock
  ceph: defer stopping mdsc delayed_work
2023-08-04 11:29:38 -07:00
Linus Torvalds
797964253d file: reinstate f_pos locking optimization for regular files
In commit 20ea1e7d13 ("file: always lock position for
FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
pidfd_getfd() could get a reference to the file even when it didn't have
an elevated file count due to threading of other sharing cases.

But Mateusz Guzik reports that the extra locking is actually measurable,
so let's re-introduce the optimization, and only force the locking for
directory traversal.

Directories need the lock for correctness reasons, while regular files
only need it for "POSIX semantics".  Since pidfd_getfd() is about
debuggers etc special things that are _way_ outside of POSIX, we can
relax the rules for that case.

Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-04 11:22:14 -07:00