This reverts commit e7ccaf5fe1.
Before commit e7ccaf5fe1, every time a resource group glock was
dequeued by gfs2_glock_dq(), it was added to the glock LRU list even
though the glock was still referenced by the resource group and could
never be evicted, anyway. Commit e7ccaf5fe1 added a GLOF_LRU hack to
avoid that overhead for resource group glocks, and that hack was since
adopted for some other types of glocks as well.
We now no longer add glocks to the glock LRU list while they are still
referenced. This solves the underlying problem, and obsoletes the
GLOF_LRU hack.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
In the current glock reference counting model, a bias of one is added to
a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED).
A glock is removed from the lru_list when it is enqueued, and added back
when it is dequeued. This isn't a very appropriate model because most
glocks are held for long periods of time (for example, the inode "owns"
references to its inode and iopen glocks as long as the inode is cached
even when the glock state changes to LM_ST_UNLOCKED), and they can only
be freed when they are no longer referenced, anyway.
Fix this by getting rid of the refcount bias for locked glocks. That
way, we can use lockref_put_or_lock() to efficiently drop all but the
last glock reference, and put the glock onto the lru_list when the last
reference is dropped. When find_insert_glock() returns a reference to a
cached glock, it removes the glock from the lru_list.
Dumping the "glocks" and "glstats" debugfs files also takes glock
references, but instead of removing the glocks from the lru_list in that
case as well, we leave them on the list. This ensures that dumping
those files won't perturb the order of the glocks on the lru_list.
In addition, when the last reference to an *unlocked* glock is dropped,
we immediately free it; this preserves the preexisting behavior. If it
later turns out that caching unlocked glocks is useful in some
situations, we can change the caching strategy.
It is currently unclear if a glock that has no active references can
have the GLF_LFLUSH flag set. To make sure that such a glock won't
accidentally be evicted due to memory pressure, we add a GLF_LFLUSH
check to gfs2_dispose_glock_lru().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Switch to a per-filesystem glock workqueue. Additional workqueues are
cheap nowadays, and keeping separate workqueues allows to flush the work
of each filesystem without affecting the others.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When glocks cannot be freed for a long time, avoid the "task blocked for
more than N seconds" messages and report how many glocks are still
outstanding, instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Invert the meaning of the GLF_INITIAL flag: right now, when GLF_INITIAL
is set, a DLM lock exists and we have a valid identifier for it; when
GLF_INITIAL is cleared, no DLM lock exists (yet). This is confusing.
In addition, it makes more sense to highlight the exceptional case
(i.e., no DLM lock exists yet) in glock dumps and trace points than to
highlight the common case.
To avoid confusion between the "old" and the "new" meaning of the flag,
use 'a' instead of 'I' to represent the flag.
For improved code consistency, check if the GLF_INITIAL flag is cleared
to determine whether a DLM lock exists instead of checking if the lock
identifier is non-zero.
Document what the flag is used for.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function handle_callback() is used to request a glock demote. This
often happens in response to a conflicting remote locking request and
subsequent bast callback from DLM, but there are other reasons for
triggering a demote request as well, such as when trying to release a
glock in response to memory pressure. To clarify that, rename the
function to request_demote().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The GLF_FROZEN flag indicates that a reply to a DLM locking request has
been received, but should not be processed at this time. To clarify
that meaning, rename the flag to GLF_HAVE_FROZEN_REPLY.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The GLF_REPLY_PENDING flag indicates to glock_work_func() that in
response to a locking request, DLM has sent a reply that needs to be
processed. A flag with that name could as well indicate that we are
waiting on a reply from DLM, however. To disambiguate these two cases,
rename the flag to GLF_HAVE_REPLY.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename the GLF_FREEING flag to GLF_UNLOCKED, and the ->go_free glock
operation to ->go_unlocked. This mechanism is used to wait for the
underlying DLM lock to be unlocked; being able to free the glock is a
consequence of the DLM lock being unlocked.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function __gfs2_glock_dq() gets defined before it is used, so there is
no need for a separate function declaration.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Replacement of bdev->bd_inode with sane(r) set of primitives.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwjlgAKCRBZ7Krx/gZQ
66OmAP9nhZLASn/iM2+979I6O0GW+vid+uLh48uW3d+LbsmVIgD9GYpR+cuLQ/xj
mJESWfYKOVSpFFSrqlzKg9PQlU/GFgs=
=6LRp
-----END PGP SIGNATURE-----
Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull bdev bd_inode updates from Al Viro:
"Replacement of bdev->bd_inode with sane(r) set of primitives by me and
Yu Kuai"
* tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RIP ->bd_inode
dasd_format(): killing the last remaining user of ->bd_inode
nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode
block/bdev.c: use the knowledge of inode/bdev coallocation
gfs2: more obvious initializations of mapping->host
fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping
blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
grow_dev_folio(): we only want ->bd_inode->i_mapping there
use ->bd_mapping instead of ->bd_inode->i_mapping
block_device: add a pointer to struct address_space (page cache of bdev)
missing helpers: bdev_unhash(), bdev_drop()
block: move two helpers into bdev.c
block2mtd: prevent direct access of bd_inode
dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
blkdev_write_iter(): saner way to get inode and bdev
bcachefs: remove dead function bdev_sectors()
ext4: remove block_device_ejected()
erofs_buf: store address_space instead of inode
erofs: switch erofs_bread() to passing offset instead of block number
- Properly fix the glock shrinker this time: it broke in commit "gfs2:
Make glock lru list scanning safer" and commit "gfs2: fix glock
shrinker ref issues" wasn't actually enough to fix it.
- On unmount, keep glocks around long enough that no more dlm callbacks
can occur on them.
- Some more folio conversion patches from Matthew Wilcox.
- Lots of other smaller fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmZCfuIUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTravg//UIAXF+o8POtTCozn9QCfPqP5eWIF
r879u7cf1o2yH43Qfx9urQKrNFuqQnDoDbbkCnDNQW5nsWWAjIAidG2Vi1j02YdM
HTsU8890Y9uVlFtJ41jbRIkk2KtIZgKtKRGtPx7+FlWdCFB5ai36/aKTXRK34xcY
KOBM2dDpnPdByERJ7+EPRC0EonhSZKhug3PqVfaCcj3p9S0eZtQHQdxhnPSF8wDK
8ds2ErUV8N5DQtfR5WZ4N89JTk83CgFMuNaSXp8Kc8b/Wxa4KuKBTz+h6K90CJMP
HKAnCz+qYro9ptoQCdXc7Wi1lWWS8kNuzXpTrQyDiVC5TCivxaBwQssOT2dVyB4+
dlLL/4ikXDDYOq6r5q1rU8eVpKkCuzImCRtUW3rG4q7/Rvva4dfsVqcD/ISuxYD/
fmXw/cb3zMDC06/59OusEFlwrltHjxH/2/IPS9ZQLLcP5An8LIbemDPdMvD7s2si
b/OjgWA4M08cdBgKAhF4Kzla762y78kQYtZ/v18CI+I2Kr9fRXM74OI/rjTLSRim
EyFH82lrpU8b38UpbElG4LI6FfZGvGCwmZYaEFQx8MUuvqAOceS2TH1z7v65Ekfu
3NSqabKi0PGcH4oydfLSdQMxoc6glgZpTl3L/rx09qTnm3SFH/ghKVlivODbM8OK
oJNhtATgTFbODuk=
=d50l
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Properly fix the glock shrinker this time: it broke in commit "gfs2:
Make glock lru list scanning safer" and commit "gfs2: fix glock
shrinker ref issues" wasn't actually enough to fix it
- On unmount, keep glocks around long enough that no more dlm callbacks
can occur on them
- Some more folio conversion patches from Matthew Wilcox
- Lots of other smaller fixes and cleanups
* tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (27 commits)
gfs2: make timeout values more explicit
gfs2: Convert gfs2_aspace_writepage() to use a folio
gfs2: Add a migrate_folio operation for journalled files
gfs2: Simplify gfs2_read_super
gfs2: Convert gfs2_page_mkwrite() to use a folio
gfs2: gfs2_freeze_unlock cleanup
gfs2: Remove and replace gfs2_glock_queue_work
gfs2: do_xmote fixes
gfs2: finish_xmote cleanup
gfs2: Unlock fewer glocks on unmount
gfs2: Fix potential glock use-after-free on unmount
gfs2: Remove ill-placed consistency check
gfs2: Fix lru_count accounting
gfs2: Fix "Make glock lru list scanning safer"
Revert "gfs2: fix glock shrinker ref issues"
gfs2: Fix "ignore unlock failures after withdraw"
gfs2: Get rid of unnecessary test_and_set_bit
gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lru
gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_async
gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw
...
'timeout' is a vague name for the return value of wait_event_*_timeout
because it actually returns the time left. Because the variable is never
used later, just drop the return value. Since variable 'timeout' is then
only used to carry a fixed timeout value, drop this in favor of a fixed
function argument as in the other call to wait_event_timeout() above.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Convert the incoming struct page to a folio and use it throughout.
Saves six calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
For journalled data, folio migration currently works by writing the folio
back, freeing the folio and faulting the new folio back in. We can
bypass that by telling the migration code to migrate the buffer_heads
attached to our folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Use submit_bio_wait() instead of hand-rolling our own synchronous
wait. Also allocate the BIO on the stack since we're not deep in
the call stack at this point.
There's no need to kmap the page, since it isn't allocated from HIGHMEM.
Turn the GFP_NOFS allocation into GFP_KERNEL; if the page allocator
enters reclaim, we cannot be called as the filesystem has not yet been
initialised and so has no pages to reclaim.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Convert the incoming page to a folio and use it throughout, saving
several calls to compound_head(). Also use 'pos' for file position
rather than the ambiguous 'offset' and convert 'length' to type size_t
in case we get some truly ridiculous sized folios in the future. This
function should now be large-folio safe, but I may have missed
something.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh
as its argument, so clean up the code by passing in sdp instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
There are no more callers of gfs2_glock_queue_work() left, so remove
that helper. With that, we can now rename __gfs2_glock_queue_work()
back to gfs2_glock_queue_work() to get rid of some unnecessary clutter.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function do_xmote() is called with the glock spinlock held. Commit
86934198ee added a 'goto skip_inval' statement at the beginning of the
function to further below where the glock spinlock is expected not to be
held anymore. Then it added code there that requires the glock spinlock
to be held. This doesn't make sense; fix this up by dropping and
retaking the spinlock where needed.
In addition, when ->lm_lock() returned an error, do_xmote() didn't fail
the locking operation, and simply left the glock hanging; fix that as
well. (This is a much older error.)
Fixes: 86934198ee ("gfs2: Clear flags when withdraw prevents xmote")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Currently, function finish_xmote() takes and releases the glock
spinlock. However, all of its callers immediately take that spinlock
again, so it makes more sense to take the spin lock before calling
finish_xmote() already.
With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY
flag outside of the glock spinlock, but it also takes that spinlock
immediately thereafter. Change that to set the bit when the spinlock is
already held. This allows to switch from test_and_clear_bit() to
test_bit() and clear_bit() in glock_work_func().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
At unmount time, we would generally like to explicitly unlock as few
glocks as possible for efficiency. We are already skipping glocks that
don't have a lock value block (LVB), but we can also skip glocks which
are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses
DLM_LOCK_EX under the name LM_ST_EXCLUSIVE).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
When a DLM lockspace is released and there ares still locks in that
lockspace, DLM will unlock those locks automatically. Commit
fb6791d100 started exploiting this behavior to speed up filesystem
unmount: gfs2 would simply free glocks it didn't want to unlock and then
release the lockspace. This didn't take the bast callbacks for
asynchronous lock contention notifications into account, which remain
active until until a lock is unlocked or its lockspace is released.
To prevent those callbacks from accessing deallocated objects, put the
glocks that should not be unlocked on the sd_dead_glocks list, release
the lockspace, and only then free those glocks.
As an additional measure, ignore unexpected ast and bast callbacks if
the receiving glock is dead.
Fixes: fb6791d100 ("GFS2: skip dlm_unlock calls in unmount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
This consistency check was originally added by commit 9287c6452d
("gfs2: Fix occasional glock use-after-free"). It is ill-placed in
gfs2_glock_free() because if it holds there, it must equally hold in
__gfs2_glock_put() already. Either way, the check doesn't seem
necessary anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is
moved onto the dispose list. When such a glock is then stolen from the
dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock,
lru_count will be decremented again, so the counter will eventually go
negative.
This bug has existed in one form or another since at least commit
97cc1025b1 ("GFS2: Kill two daemons with one patch").
Fix this by only decrementing lru_count when we actually remove a glock
and schedule for it to be unlocked and dropped. We also don't need to
remove and then re-add glocks when we can just as well move them back
onto the lru_list when necessary.
In addition, return the number of glocks freed as we should, not the
number of glocks moved onto the dispose list.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit 228804a35c tried to add a refcount check to
gfs2_scan_glock_lru() to make sure that glocks that are still referenced
cannot be freed. It failed to account for the bias state_change() adds
to the refcount for held glocks, so held glocks are no longer removed
from the glock cache, which can lead to out-of-memory problems. Fix
that. (The inodes those glocks are associated with do get shrunk and do
get pushed out of memory.)
In addition, use the same eligibility check in gfs2_scan_glock_lru() and
gfs2_dispose_glock_lru().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This reverts commit 62862485a4.
Commit 62862485a4 tried to fix issues introduced by commit
228804a35c ("gfs2: Make glock lru list scanning safer"), but like that
commit, it failed to account for the bias state_change() adds to the
glock reference count for locked glocks. Revert commit 62862485a4 so
that we can fix commit 228804a35c properly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit 3e11e53041 tries to suppress dlm_lock() lock conversion errors
that occur when the lockspace has already been released.
It does that by setting and checking the SDF_SKIP_DLM_UNLOCK flag. This
conflicts with the intended meaning of the SDF_SKIP_DLM_UNLOCK flag, so
check whether the lockspace is still allocated instead.
(Given the current DLM API, checking for this kind of error after the
fact seems easier that than to make sure that the lockspace is still
allocated before calling dlm_lock(). Changing the DLM API so that users
maintain the lockspace references themselves would be an option.)
Fixes: 3e11e53041 ("GFS2: ignore unlock failures after withdraw")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The GLF_LOCK flag is protected by the gl->gl_lockref.lock spin lock
which is held when entering run_queue(), so we can use test_bit() and
set_bit() here.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_dispose_glock_lru(), we want to skip glocks which are in the
process of transitioning state (as indicated by the set GLF_LOCK flag),
but we we don't need to set that flag for requesting a state transition.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_glock_queue_put() puts a glock reference by enqueuing
glock work instead of putting the reference directly. This ensures that
the operation won't sleep, but it is costly and really only necessary
when putting the final glock reference. Replace it with a new
gfs2_glock_put_async() function that only queues glock work when putting
the last glock reference.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In function signal_our_withdraw(), we are calling gfs2_glock_queue_put()
in a context in which we are actually allowed to sleep, so replace that
with a simple call to gfs2_glock_put().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush
lock to provide exclusion against gfs2_log_flush().
In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before
dereferencing it. Otherwise, we could run into a NULL pointer
dereference when outstanding glock work races with an unmount
(glock_work_func -> run_queue -> do_xmote -> inode_go_sync ->
gfs2_log_flush).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit fffe9bee14 ("gfs2: Delay withdraw from atomic context")
switched from gfs2_withdraw() to gfs2_withdraw_delayed() in
gfs2_ail_error(), but failed to then check if a delayed withdraw had
occurred. Fix that by adding the missing check in __gfs2_ail_flush(),
where the spin locks are already dropped and a withdraw is possible.
Fixes: fffe9bee14 ("gfs2: Delay withdraw from atomic context")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When calling gfs2_glock_get(), consistently pass the CREATE or NO_CREATE
flag for the create argument. This is the only place that passes 0
instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Follow up to commit e7beb8b6de ("gfs2: Rename SDF_DEACTIVATING to
SDF_KILL") and change the description of the SDF_KILL flag from
"Deactivating" to "Killing".
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The type argument of gfs2_meta_check_ii() is always set to "magic
number", so remove that argument and hardcode the string in
gfs2_meta_check_ii(). Change the string to "bad magic number" to
emphasize that the problem is the incorrect magic number.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Get rid of attempts to create multi-line syslog entries; this only makes
the messages harder to read.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
gfs2_consist_inode() logs an error message with the source file and line
number. When we jump before calling it, the line number becomes less
useful as it no longer relates to the source of the error. To aid
troubleshooting, replace the gotos with the gfs2_consist_inode() calls
so that the error messages are more informative.
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.
This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.
Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem5LwAKCRCRxhvAZXjc
onZsAQCjMNabNWAty2VBAQrNIpGkZ+AMA2DxEajPldaPiJH5zQEA9ea7feB3T47i
NUrXXfMQ5DSop+k5Y65pPkEpbX4rhQo=
=NZgd
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs uuid updates from Christian Brauner:
"This adds two new ioctl()s for getting the filesystem uuid and
retrieving the sysfs path based on the path of a mounted filesystem.
Getting the filesystem uuid has been implemented in filesystem
specific code for a while it's now lifted as a generic ioctl"
* tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: add support for FS_IOC_GETFSSYSFSPATH
fs: add FS_IOC_GETFSSYSFSPATH
fat: Hook up sb->s_uuid
fs: FS_IOC_GETUUID
ovl: convert to super_set_uuid()
fs: super_set_uuid()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4tQAKCRCRxhvAZXjc
ohnfAP4sm946PZfiC4y5Euk96WDC3hC8WCSBar+fpFmYVzeD9wEAy+NVCsjkMElz
vqNxwFULUwQjFxxvsM9gvhrgGUud1AE=
=UZk/
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull file locking updates from Christian Brauner:
"A few years ago struct file_lock_context was added to allow for
separate lists to track different types of file locks instead of using
a singly-linked list for all of them.
Now leases no longer need to be tracked using struct file_lock.
However, a lot of the infrastructure is identical for leases and locks
so separating them isn't trivial.
This splits a group of fields used by both file locks and leases into
a new struct file_lock_core. The new core struct is embedded in struct
file_lock. Coccinelle was used to convert a lot of the callers to deal
with the move, with the remaining 25% or so converted by hand.
Afterwards several internal functions in fs/locks.c are made to work
with struct file_lock_core. Ultimately this allows to split struct
file_lock into struct file_lock and struct file_lease. The file lease
APIs are then converted to take struct file_lease"
* tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits)
filelock: fix deadlock detection in POSIX locking
filelock: always define for_each_file_lock()
smb: remove redundant check
filelock: don't do security checks on nfsd setlease calls
filelock: split leases out of struct file_lock
filelock: remove temporary compatibility macros
smb/server: adapt to breakup of struct file_lock
smb/client: adapt to breakup of struct file_lock
ocfs2: adapt to breakup of struct file_lock
nfsd: adapt to breakup of struct file_lock
nfs: adapt to breakup of struct file_lock
lockd: adapt to breakup of struct file_lock
fuse: adapt to breakup of struct file_lock
gfs2: adapt to breakup of struct file_lock
dlm: adapt to breakup of struct file_lock
ceph: adapt to breakup of struct file_lock
afs: adapt to breakup of struct file_lock
9p: adapt to breakup of struct file_lock
filelock: convert seqfile handling to use file_lock_core
filelock: convert locks_translate_pid to take file_lock_core
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4UQAKCRCRxhvAZXjc
ouERAQDg63R9s3bKmUgGqngf9cfr//VCTE+WVARwOUTdn2iDbwEA1IME7X1kL/Vz
EdhEjyqO6xom+ao/Vqxe0XIDNz70vgs=
=8RdE
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
- Restore read-write hints in struct bio through the bi_write_hint
member for the sake of UFS devices in mobile applications. This can
result in up to 40% lower write amplification in UFS devices. The
patch series that builds on this will be coming in via the SCSI
maintainers (Bart)
- Overhaul the iomap writeback code. Afterwards ->map_blocks() is able
to map multiple blocks at once as long as they're in the same folio.
This reduces CPU usage for buffered write workloads on e.g., xfs on
systems with lots of cores (Christoph)
- Record processed bytes in iomap_iter() trace event (Kassey)
- Extend iomap_writepage_map() trace event after Christoph's
->map_block() changes to map mutliple blocks at once (Zhang)
* tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
iomap: Add processed for iomap_iter
iomap: add pos and dirty_len into trace_iomap_writepage_map
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
iomap: pass the length of the dirty region to ->map_blocks
iomap: map multiple blocks at a time
iomap: submit ioends immediately
iomap: factor out a iomap_writepage_map_block helper
iomap: only call mapping_set_error once for each failed bio
iomap: don't chain bios
iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
iomap: clean up the iomap_alloc_ioend calling convention
iomap: move all remaining per-folio logic into iomap_writepage_map
iomap: factor out a iomap_writepage_handle_eof helper
iomap: move the PF_MEMALLOC check to iomap_writepages
iomap: move the io_folios field out of struct iomap_ioend
...
In punch_hole(), when the offset lies in the final block for a given
height, there is no hole to punch, but the maximum size check fails to
detect that. Consequently, punch_hole() will try to punch a hole beyond
the end of the metadata and fail. Fix the maximum size check.
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Some weird old filesytems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.
And add a helper super_set_uuid(), for setting nonstandard length uuids.
Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>