Commit Graph

2930 Commits

Author SHA1 Message Date
Andreas Gruenbacher
3f4475bf24 Revert "GFS2: Don't add all glocks to the lru"
This reverts commit e7ccaf5fe1.

Before commit e7ccaf5fe1, every time a resource group glock was
dequeued by gfs2_glock_dq(), it was added to the glock LRU list even
though the glock was still referenced by the resource group and could
never be evicted, anyway.  Commit e7ccaf5fe1 added a GLOF_LRU hack to
avoid that overhead for resource group glocks, and that hack was since
adopted for some other types of glocks as well.

We now no longer add glocks to the glock LRU list while they are still
referenced.  This solves the underlying problem, and obsoletes the
GLOF_LRU hack.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
767fd5a016 gfs2: Revise glock reference counting model
In the current glock reference counting model, a bias of one is added to
a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED).
A glock is removed from the lru_list when it is enqueued, and added back
when it is dequeued.  This isn't a very appropriate model because most
glocks are held for long periods of time (for example, the inode "owns"
references to its inode and iopen glocks as long as the inode is cached
even when the glock state changes to LM_ST_UNLOCKED), and they can only
be freed when they are no longer referenced, anyway.

Fix this by getting rid of the refcount bias for locked glocks.  That
way, we can use lockref_put_or_lock() to efficiently drop all but the
last glock reference, and put the glock onto the lru_list when the last
reference is dropped.  When find_insert_glock() returns a reference to a
cached glock, it removes the glock from the lru_list.

Dumping the "glocks" and "glstats" debugfs files also takes glock
references, but instead of removing the glocks from the lru_list in that
case as well, we leave them on the list.  This ensures that dumping
those files won't perturb the order of the glocks on the lru_list.

In addition, when the last reference to an *unlocked* glock is dropped,
we immediately free it; this preserves the preexisting behavior.  If it
later turns out that caching unlocked glocks is useful in some
situations, we can change the caching strategy.

It is currently unclear if a glock that has no active references can
have the GLF_LFLUSH flag set.  To make sure that such a glock won't
accidentally be evicted due to memory pressure, we add a GLF_LFLUSH
check to gfs2_dispose_glock_lru().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
30e388d573 gfs2: Switch to a per-filesystem glock workqueue
Switch to a per-filesystem glock workqueue.  Additional workqueues are
cheap nowadays, and keeping separate workqueues allows to flush the work
of each filesystem without affecting the others.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
51568ac2e9 gfs2: Report when glocks cannot be freed for a long time
When glocks cannot be freed for a long time, avoid the "task blocked for
more than N seconds" messages and report how many glocks are still
outstanding, instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
8f6b8f142b gfs2: gfs2_glock_get cleanup
Clean up the messy code in gfs2_glock_get().  No change in
functionality.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
c8758ad005 gfs2: Invert the GLF_INITIAL flag
Invert the meaning of the GLF_INITIAL flag: right now, when GLF_INITIAL
is set, a DLM lock exists and we have a valid identifier for it; when
GLF_INITIAL is cleared, no DLM lock exists (yet).  This is confusing.
In addition, it makes more sense to highlight the exceptional case
(i.e., no DLM lock exists yet) in glock dumps and trace points than to
highlight the common case.

To avoid confusion between the "old" and the "new" meaning of the flag,
use 'a' instead of 'I' to represent the flag.

For improved code consistency, check if the GLF_INITIAL flag is cleared
to determine whether a DLM lock exists instead of checking if the lock
identifier is non-zero.

Document what the flag is used for.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
c8cf2d9f18 gfs2: Remove outdated comment in glock_work_func
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29 15:34:55 +02:00
Andreas Gruenbacher
edeb180f1c gfs2: Rename handle_callback to request_demote
Function handle_callback() is used to request a glock demote.  This
often happens in response to a conflicting remote locking request and
subsequent bast callback from DLM, but there are other reasons for
triggering a demote request as well, such as when trying to release a
glock in response to memory pressure.  To clarify that, rename the
function to request_demote().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Andreas Gruenbacher
1fb5f67e21 gfs2: Rename GLF_FROZEN to GLF_HAVE_FROZEN_REPLY
The GLF_FROZEN flag indicates that a reply to a DLM locking request has
been received, but should not be processed at this time.  To clarify
that meaning, rename the flag to GLF_HAVE_FROZEN_REPLY.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Andreas Gruenbacher
0a0383a93e gfs2: Rename GLF_REPLY_PENDING to GLF_HAVE_REPLY
The GLF_REPLY_PENDING flag indicates to glock_work_func() that in
response to a locking request, DLM has sent a reply that needs to be
processed.  A flag with that name could as well indicate that we are
waiting on a reply from DLM, however.  To disambiguate these two cases,
rename the flag to GLF_HAVE_REPLY.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Andreas Gruenbacher
121e730112 gfs2: Rename GLF_FREEING to GLF_UNLOCKED
Rename the GLF_FREEING flag to GLF_UNLOCKED, and the ->go_free glock
operation to ->go_unlocked.  This mechanism is used to wait for the
underlying DLM lock to be unlocked; being able to free the glock is a
consequence of the DLM lock being unlocked.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Andreas Gruenbacher
932a9052dc gfs2: Remove useless return statement in run_queue
The return statement at the end of run_queue() is useless.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Andreas Gruenbacher
99b8520c00 gfs2: Remove unnecessary function prototype
Function __gfs2_glock_dq() gets defined before it is used, so there is
no need for a separate function declaration.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-28 16:59:53 +02:00
Linus Torvalds
38da32ee70 bd_inode series
Replacement of bdev->bd_inode with sane(r) set of primitives.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwjlgAKCRBZ7Krx/gZQ
 66OmAP9nhZLASn/iM2+979I6O0GW+vid+uLh48uW3d+LbsmVIgD9GYpR+cuLQ/xj
 mJESWfYKOVSpFFSrqlzKg9PQlU/GFgs=
 =6LRp
 -----END PGP SIGNATURE-----

Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull bdev bd_inode updates from Al Viro:
 "Replacement of bdev->bd_inode with sane(r) set of primitives by me and
  Yu Kuai"

* tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RIP ->bd_inode
  dasd_format(): killing the last remaining user of ->bd_inode
  nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode
  block/bdev.c: use the knowledge of inode/bdev coallocation
  gfs2: more obvious initializations of mapping->host
  fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping
  blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
  grow_dev_folio(): we only want ->bd_inode->i_mapping there
  use ->bd_mapping instead of ->bd_inode->i_mapping
  block_device: add a pointer to struct address_space (page cache of bdev)
  missing helpers: bdev_unhash(), bdev_drop()
  block: move two helpers into bdev.c
  block2mtd: prevent direct access of bd_inode
  dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
  blkdev_write_iter(): saner way to get inode and bdev
  bcachefs: remove dead function bdev_sectors()
  ext4: remove block_device_ejected()
  erofs_buf: store address_space instead of inode
  erofs: switch erofs_bread() to passing offset instead of block number
2024-05-21 09:51:42 -07:00
Linus Torvalds
9518ae6ec5 gfs2 updates
- Properly fix the glock shrinker this time: it broke in commit "gfs2:
   Make glock lru list scanning safer" and commit "gfs2: fix glock
   shrinker ref issues" wasn't actually enough to fix it.
 
 - On unmount, keep glocks around long enough that no more dlm callbacks
   can occur on them.
 
 - Some more folio conversion patches from Matthew Wilcox.
 
 - Lots of other smaller fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmZCfuIUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTravg//UIAXF+o8POtTCozn9QCfPqP5eWIF
 r879u7cf1o2yH43Qfx9urQKrNFuqQnDoDbbkCnDNQW5nsWWAjIAidG2Vi1j02YdM
 HTsU8890Y9uVlFtJ41jbRIkk2KtIZgKtKRGtPx7+FlWdCFB5ai36/aKTXRK34xcY
 KOBM2dDpnPdByERJ7+EPRC0EonhSZKhug3PqVfaCcj3p9S0eZtQHQdxhnPSF8wDK
 8ds2ErUV8N5DQtfR5WZ4N89JTk83CgFMuNaSXp8Kc8b/Wxa4KuKBTz+h6K90CJMP
 HKAnCz+qYro9ptoQCdXc7Wi1lWWS8kNuzXpTrQyDiVC5TCivxaBwQssOT2dVyB4+
 dlLL/4ikXDDYOq6r5q1rU8eVpKkCuzImCRtUW3rG4q7/Rvva4dfsVqcD/ISuxYD/
 fmXw/cb3zMDC06/59OusEFlwrltHjxH/2/IPS9ZQLLcP5An8LIbemDPdMvD7s2si
 b/OjgWA4M08cdBgKAhF4Kzla762y78kQYtZ/v18CI+I2Kr9fRXM74OI/rjTLSRim
 EyFH82lrpU8b38UpbElG4LI6FfZGvGCwmZYaEFQx8MUuvqAOceS2TH1z7v65Ekfu
 3NSqabKi0PGcH4oydfLSdQMxoc6glgZpTl3L/rx09qTnm3SFH/ghKVlivODbM8OK
 oJNhtATgTFbODuk=
 =d50l
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Properly fix the glock shrinker this time: it broke in commit "gfs2:
   Make glock lru list scanning safer" and commit "gfs2: fix glock
   shrinker ref issues" wasn't actually enough to fix it

 - On unmount, keep glocks around long enough that no more dlm callbacks
   can occur on them

 - Some more folio conversion patches from Matthew Wilcox

 - Lots of other smaller fixes and cleanups

* tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (27 commits)
  gfs2: make timeout values more explicit
  gfs2: Convert gfs2_aspace_writepage() to use a folio
  gfs2: Add a migrate_folio operation for journalled files
  gfs2: Simplify gfs2_read_super
  gfs2: Convert gfs2_page_mkwrite() to use a folio
  gfs2: gfs2_freeze_unlock cleanup
  gfs2: Remove and replace gfs2_glock_queue_work
  gfs2: do_xmote fixes
  gfs2: finish_xmote cleanup
  gfs2: Unlock fewer glocks on unmount
  gfs2: Fix potential glock use-after-free on unmount
  gfs2: Remove ill-placed consistency check
  gfs2: Fix lru_count accounting
  gfs2: Fix "Make glock lru list scanning safer"
  Revert "gfs2: fix glock shrinker ref issues"
  gfs2: Fix "ignore unlock failures after withdraw"
  gfs2: Get rid of unnecessary test_and_set_bit
  gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lru
  gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_async
  gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw
  ...
2024-05-14 17:35:22 -07:00
Wolfram Sang
c1c53c26e3 gfs2: make timeout values more explicit
'timeout' is a vague name for the return value of wait_event_*_timeout
because it actually returns the time left. Because the variable is never
used later, just drop the return value. Since variable 'timeout' is then
only used to carry a fixed timeout value, drop this in favor of a fixed
function argument as in the other call to wait_event_timeout() above.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-07 12:42:48 +02:00
Matthew Wilcox (Oracle)
50fabd42cb gfs2: Convert gfs2_aspace_writepage() to use a folio
Convert the incoming struct page to a folio and use it throughout.
Saves six calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-03 21:01:02 +02:00
Matthew Wilcox (Oracle)
b844048011 gfs2: Add a migrate_folio operation for journalled files
For journalled data, folio migration currently works by writing the folio
back, freeing the folio and faulting the new folio back in.  We can
bypass that by telling the migration code to migrate the buffer_heads
attached to our folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-03 21:01:02 +02:00
Al Viro
2d0026a490 gfs2: more obvious initializations of mapping->host
what's going on is copying the ->host of bdev's address_space

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240411145346.2516848-4-viro@zeniv.linux.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-03 02:36:51 -04:00
Matthew Wilcox (Oracle)
75377ae754 gfs2: Simplify gfs2_read_super
Use submit_bio_wait() instead of hand-rolling our own synchronous
wait.  Also allocate the BIO on the stack since we're not deep in
the call stack at this point.

There's no need to kmap the page, since it isn't allocated from HIGHMEM.
Turn the GFP_NOFS allocation into GFP_KERNEL; if the page allocator
enters reclaim, we cannot be called as the filesystem has not yet been
initialised and so has no pages to reclaim.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-02 19:24:08 +02:00
Matthew Wilcox (Oracle)
f3851fed07 gfs2: Convert gfs2_page_mkwrite() to use a folio
Convert the incoming page to a folio and use it throughout, saving
several calls to compound_head().  Also use 'pos' for file position
rather than the ambiguous 'offset' and convert 'length' to type size_t
in case we get some truly ridiculous sized folios in the future.  This
function should now be large-folio safe, but I may have missed
something.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-29 13:04:40 +02:00
Andreas Gruenbacher
fcd63086bc gfs2: gfs2_freeze_unlock cleanup
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh
as its argument, so clean up the code by passing in sdp instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-29 12:35:15 +02:00
Andreas Gruenbacher
1e86044402 gfs2: Remove and replace gfs2_glock_queue_work
There are no more callers of gfs2_glock_queue_work() left, so remove
that helper.  With that, we can now rename __gfs2_glock_queue_work()
back to gfs2_glock_queue_work() to get rid of some unnecessary clutter.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
9947a06d29 gfs2: do_xmote fixes
Function do_xmote() is called with the glock spinlock held.  Commit
86934198ee added a 'goto skip_inval' statement at the beginning of the
function to further below where the glock spinlock is expected not to be
held anymore.  Then it added code there that requires the glock spinlock
to be held.  This doesn't make sense; fix this up by dropping and
retaking the spinlock where needed.

In addition, when ->lm_lock() returned an error, do_xmote() didn't fail
the locking operation, and simply left the glock hanging; fix that as
well.  (This is a much older error.)

Fixes: 86934198ee ("gfs2: Clear flags when withdraw prevents xmote")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
1cd28e1586 gfs2: finish_xmote cleanup
Currently, function finish_xmote() takes and releases the glock
spinlock.  However, all of its callers immediately take that spinlock
again, so it makes more sense to take the spin lock before calling
finish_xmote() already.

With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY
flag outside of the glock spinlock, but it also takes that spinlock
immediately thereafter.  Change that to set the bit when the spinlock is
already held.  This allows to switch from test_and_clear_bit() to
test_bit() and clear_bit() in glock_work_func().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
a3730c5ec5 gfs2: Unlock fewer glocks on unmount
At unmount time, we would generally like to explicitly unlock as few
glocks as possible for efficiency.  We are already skipping glocks that
don't have a lock value block (LVB), but we can also skip glocks which
are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses
DLM_LOCK_EX under the name LM_ST_EXCLUSIVE).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
d98779e687 gfs2: Fix potential glock use-after-free on unmount
When a DLM lockspace is released and there ares still locks in that
lockspace, DLM will unlock those locks automatically.  Commit
fb6791d100 started exploiting this behavior to speed up filesystem
unmount: gfs2 would simply free glocks it didn't want to unlock and then
release the lockspace.  This didn't take the bast callbacks for
asynchronous lock contention notifications into account, which remain
active until until a lock is unlocked or its lockspace is released.

To prevent those callbacks from accessing deallocated objects, put the
glocks that should not be unlocked on the sd_dead_glocks list, release
the lockspace, and only then free those glocks.

As an additional measure, ignore unexpected ast and bast callbacks if
the receiving glock is dead.

Fixes: fb6791d100 ("GFS2: skip dlm_unlock calls in unmount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
59f6000579 gfs2: Remove ill-placed consistency check
This consistency check was originally added by commit 9287c6452d
("gfs2: Fix occasional glock use-after-free").  It is ill-placed in
gfs2_glock_free() because if it holds there, it must equally hold in
__gfs2_glock_put() already.  Either way, the check doesn't seem
necessary anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24 19:48:20 +02:00
Andreas Gruenbacher
7a1ad9d812 gfs2: Fix lru_count accounting
Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is
moved onto the dispose list.  When such a glock is then stolen from the
dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock,
lru_count will be decremented again, so the counter will eventually go
negative.

This bug has existed in one form or another since at least commit
97cc1025b1 ("GFS2: Kill two daemons with one patch").

Fix this by only decrementing lru_count when we actually remove a glock
and schedule for it to be unlocked and dropped.  We also don't need to
remove and then re-add glocks when we can just as well move them back
onto the lru_list when necessary.

In addition, return the number of glocks freed as we should, not the
number of glocks moved onto the dispose list.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24 19:46:38 +02:00
Andreas Gruenbacher
acf1f42faf gfs2: Fix "Make glock lru list scanning safer"
Commit 228804a35c tried to add a refcount check to
gfs2_scan_glock_lru() to make sure that glocks that are still referenced
cannot be freed.  It failed to account for the bias state_change() adds
to the refcount for held glocks, so held glocks are no longer removed
from the glock cache, which can lead to out-of-memory problems.  Fix
that.  (The inodes those glocks are associated with do get shrunk and do
get pushed out of memory.)

In addition, use the same eligibility check in gfs2_scan_glock_lru() and
gfs2_dispose_glock_lru().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:58 +02:00
Andreas Gruenbacher
c9a0a4b028 Revert "gfs2: fix glock shrinker ref issues"
This reverts commit 62862485a4.

Commit 62862485a4 tried to fix issues introduced by commit
228804a35c ("gfs2: Make glock lru list scanning safer"), but like that
commit, it failed to account for the bias state_change() adds to the
glock reference count for locked glocks.  Revert commit 62862485a4 so
that we can fix commit 228804a35c properly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:58 +02:00
Andreas Gruenbacher
5d92311119 gfs2: Fix "ignore unlock failures after withdraw"
Commit 3e11e53041 tries to suppress dlm_lock() lock conversion errors
that occur when the lockspace has already been released.

It does that by setting and checking the SDF_SKIP_DLM_UNLOCK flag.  This
conflicts with the intended meaning of the SDF_SKIP_DLM_UNLOCK flag, so
check whether the lockspace is still allocated instead.

(Given the current DLM API, checking for this kind of error after the
fact seems easier that than to make sure that the lockspace is still
allocated before calling dlm_lock().  Changing the DLM API so that users
maintain the lockspace references themselves would be an option.)

Fixes: 3e11e53041 ("GFS2: ignore unlock failures after withdraw")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:58 +02:00
Andreas Gruenbacher
262ee3a07e gfs2: Get rid of unnecessary test_and_set_bit
The GLF_LOCK flag is protected by the gl->gl_lockref.lock spin lock
which is held when entering run_queue(), so we can use test_bit() and
set_bit() here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:58 +02:00
Andreas Gruenbacher
927cfc90d2 gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lru
In gfs2_dispose_glock_lru(), we want to skip glocks which are in the
process of transitioning state (as indicated by the set GLF_LOCK flag),
but we we don't need to set that flag for requesting a state transition.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
ee2be7d7c7 gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_async
Function gfs2_glock_queue_put() puts a glock reference by enqueuing
glock work instead of putting the reference directly.  This ensures that
the operation won't sleep, but it is costly and really only necessary
when putting the final glock reference.  Replace it with a new
gfs2_glock_put_async() function that only queues glock work when putting
the last glock reference.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
f80d882edc gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw
In function signal_our_withdraw(), we are calling gfs2_glock_queue_put()
in a context in which we are actually allowed to sleep, so replace that
with a simple call to gfs2_glock_put().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
35264909e9 gfs2: Fix NULL pointer dereference in gfs2_log_flush
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush
lock to provide exclusion against gfs2_log_flush().

In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before
dereferencing it.  Otherwise, we could run into a NULL pointer
dereference when outstanding glock work races with an unmount
(glock_work_func -> run_queue -> do_xmote -> inode_go_sync ->
gfs2_log_flush).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
b01189333e gfs2: Don't forget to complete delayed withdraw
Commit fffe9bee14 ("gfs2: Delay withdraw from atomic context")
switched from gfs2_withdraw() to gfs2_withdraw_delayed() in
gfs2_ail_error(), but failed to then check if a delayed withdraw had
occurred.  Fix that by adding the missing check in __gfs2_ail_flush(),
where the spin locks are already dropped and a withdraw is possible.

Fixes: fffe9bee14 ("gfs2: Delay withdraw from atomic context")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
3592bfaf74 gfs2: Use [NO_]CREATE consistently for gfs2_glock_get
When calling gfs2_glock_get(), consistently pass the CREATE or NO_CREATE
flag for the create argument.  This is the only place that passes 0
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
52c2d389fe gfs2: Follow-up to flag rename in sysfs status file
Follow up to commit e7beb8b6de ("gfs2: Rename SDF_DEACTIVATING to
SDF_KILL") and change the description of the SDF_KILL flag from
"Deactivating" to "Killing".

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
795405c4b9 gfs2: Remove unnecessary gfs2_meta_check_ii argument
The type argument of gfs2_meta_check_ii() is always set to "magic
number", so remove that argument and hardcode the string in
gfs2_meta_check_ii().  Change the string to "bad magic number" to
emphasize that the problem is the incorrect magic number.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andreas Gruenbacher
b204b1b61e gfs2: Get rid of newlines in log messages
Get rid of attempts to create multi-line syslog entries; this only makes
the messages harder to read.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Andrew Price
10398ef57a gfs2: Improve gfs2_consist_inode() usage
gfs2_consist_inode() logs an error message with the source file and line
number. When we jump before calling it, the line number becomes less
useful as it no longer relates to the source of the error. To aid
troubleshooting, replace the gotos with the gfs2_consist_inode() calls
so that the error messages are more informative.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09 18:35:57 +02:00
Linus Torvalds
928a87efa4 gfs2 fix
- Fix boundary check in punch_hole
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmYBaM0UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrzqw/9GpK71h1dIA8vYqInumdrUabksLKy
 jRMR2ZxfzBKLdAfgn9AS3nrWNos72vjAxbjYCi/fbY9uvIK1/zzq7Ef7601kCetM
 NzxShY8AwLJa9mO8O5yReLL7O/61gjlcdD6rSjkYwphWuobd5vpudKkibgpdJyH8
 bn6U1/2K5ASFtWyTRbudOIsz4AqPUE6ZB4KxSuCDx7uFiQjnuh6sk8wfg48pdig7
 GAsNPmBFfWAQXClPnI/WFG0hpkuRIK1hk9ITWx1ybu2JqaNeVXRBqGoRZbEkPYju
 qEkp4oT3j/1siBz1sMOjC5tfmAzhLvAeL61pD2EOcm5Bpd3iKJibYt/uCIpYFHM0
 WfRcUmqEduN1zhDuSR4KSe49JQ5dFXVf83YqUgbtrHFiHHXNBYYqFNUVfcDAB1p7
 IH9AlNd82zyxJ3fsBX7VpEbGC2qNa3K8hYO7px8DNVrPGzW7AhPF1Lsh0OE9GlZU
 H5f70Nryi98iwadbePBUchTrx0S3iYjk2TQgLGf5L/lAl6J/MRNG31kittDtehri
 cct/JBr8sUAK014TS5NxPbpxqDnVot3UsYk7h6s7WdmM1svfs7j5f1mo3ovMEGqX
 io5Z6pFEE7n1ce5hbieDKr3JFh6LxP1ArUSY8oz5rR0shE2XHMcIdq3J26Vfi0Q0
 4VjdBic/7rUUBXI=
 =QXEK
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix boundary check in punch_hole

* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix invalid metadata access in punch_hole
2024-03-25 10:53:39 -07:00
Linus Torvalds
f88c3fb81c mm, slab: remove last vestiges of SLAB_MEM_SPREAD
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own.  But let's just rip off
the band-aid and get it over and done with.  I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-12 20:32:19 -07:00
Linus Torvalds
0f1a876682 vfs-6.9.uuid
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem5LwAKCRCRxhvAZXjc
 onZsAQCjMNabNWAty2VBAQrNIpGkZ+AMA2DxEajPldaPiJH5zQEA9ea7feB3T47i
 NUrXXfMQ5DSop+k5Y65pPkEpbX4rhQo=
 =NZgd
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs uuid updates from Christian Brauner:
 "This adds two new ioctl()s for getting the filesystem uuid and
  retrieving the sysfs path based on the path of a mounted filesystem.
  Getting the filesystem uuid has been implemented in filesystem
  specific code for a while it's now lifted as a generic ioctl"

* tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: add support for FS_IOC_GETFSSYSFSPATH
  fs: add FS_IOC_GETFSSYSFSPATH
  fat: Hook up sb->s_uuid
  fs: FS_IOC_GETUUID
  ovl: convert to super_set_uuid()
  fs: super_set_uuid()
2024-03-11 11:02:06 -07:00
Linus Torvalds
0c750012e8 vfs-6.9.file
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4tQAKCRCRxhvAZXjc
 ohnfAP4sm946PZfiC4y5Euk96WDC3hC8WCSBar+fpFmYVzeD9wEAy+NVCsjkMElz
 vqNxwFULUwQjFxxvsM9gvhrgGUud1AE=
 =UZk/
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull file locking updates from Christian Brauner:
 "A few years ago struct file_lock_context was added to allow for
  separate lists to track different types of file locks instead of using
  a singly-linked list for all of them.

  Now leases no longer need to be tracked using struct file_lock.
  However, a lot of the infrastructure is identical for leases and locks
  so separating them isn't trivial.

  This splits a group of fields used by both file locks and leases into
  a new struct file_lock_core. The new core struct is embedded in struct
  file_lock. Coccinelle was used to convert a lot of the callers to deal
  with the move, with the remaining 25% or so converted by hand.

  Afterwards several internal functions in fs/locks.c are made to work
  with struct file_lock_core. Ultimately this allows to split struct
  file_lock into struct file_lock and struct file_lease. The file lease
  APIs are then converted to take struct file_lease"

* tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits)
  filelock: fix deadlock detection in POSIX locking
  filelock: always define for_each_file_lock()
  smb: remove redundant check
  filelock: don't do security checks on nfsd setlease calls
  filelock: split leases out of struct file_lock
  filelock: remove temporary compatibility macros
  smb/server: adapt to breakup of struct file_lock
  smb/client: adapt to breakup of struct file_lock
  ocfs2: adapt to breakup of struct file_lock
  nfsd: adapt to breakup of struct file_lock
  nfs: adapt to breakup of struct file_lock
  lockd: adapt to breakup of struct file_lock
  fuse: adapt to breakup of struct file_lock
  gfs2: adapt to breakup of struct file_lock
  dlm: adapt to breakup of struct file_lock
  ceph: adapt to breakup of struct file_lock
  afs: adapt to breakup of struct file_lock
  9p: adapt to breakup of struct file_lock
  filelock: convert seqfile handling to use file_lock_core
  filelock: convert locks_translate_pid to take file_lock_core
  ...
2024-03-11 10:37:45 -07:00
Linus Torvalds
54126fafea vfs-6.9.iomap
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4UQAKCRCRxhvAZXjc
 ouERAQDg63R9s3bKmUgGqngf9cfr//VCTE+WVARwOUTdn2iDbwEA1IME7X1kL/Vz
 EdhEjyqO6xom+ao/Vqxe0XIDNz70vgs=
 =8RdE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull iomap updates from Christian Brauner:

 - Restore read-write hints in struct bio through the bi_write_hint
   member for the sake of UFS devices in mobile applications. This can
   result in up to 40% lower write amplification in UFS devices. The
   patch series that builds on this will be coming in via the SCSI
   maintainers (Bart)

 - Overhaul the iomap writeback code. Afterwards ->map_blocks() is able
   to map multiple blocks at once as long as they're in the same folio.
   This reduces CPU usage for buffered write workloads on e.g., xfs on
   systems with lots of cores (Christoph)

 - Record processed bytes in iomap_iter() trace event (Kassey)

 - Extend iomap_writepage_map() trace event after Christoph's
   ->map_block() changes to map mutliple blocks at once (Zhang)

* tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  iomap: Add processed for iomap_iter
  iomap: add pos and dirty_len into trace_iomap_writepage_map
  block, fs: Restore the per-bio/request data lifetime fields
  fs: Propagate write hints to the struct block_device inode
  fs: Move enum rw_hint into a new header file
  fs: Split fcntl_rw_hint()
  fs: Verify write lifetime constants at compile time
  fs: Fix rw_hint validation
  iomap: pass the length of the dirty region to ->map_blocks
  iomap: map multiple blocks at a time
  iomap: submit ioends immediately
  iomap: factor out a iomap_writepage_map_block helper
  iomap: only call mapping_set_error once for each failed bio
  iomap: don't chain bios
  iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
  iomap: clean up the iomap_alloc_ioend calling convention
  iomap: move all remaining per-folio logic into iomap_writepage_map
  iomap: factor out a iomap_writepage_handle_eof helper
  iomap: move the PF_MEMALLOC check to iomap_writepages
  iomap: move the io_folios field out of struct iomap_ioend
  ...
2024-03-11 10:07:03 -07:00
Andrew Price
c95346ac91 gfs2: Fix invalid metadata access in punch_hole
In punch_hole(), when the offset lies in the final block for a given
height, there is no hole to punch, but the maximum size check fails to
detect that.  Consequently, punch_hole() will try to punch a hole beyond
the end of the metadata and fail.  Fix the maximum size check.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-03-11 17:11:18 +01:00
Kent Overstreet
a4af51ce22
fs: super_set_uuid()
Some weird old filesytems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.

And add a helper super_set_uuid(), for setting nonstandard length uuids.

Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-08 21:19:59 +01:00