No need to set error = 0 since it's set further down.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch looks more invasive than it is. It simply moves function
qdsb_put before qd_unlock, then changes qd_unlock to call it rather than
open coding it. Again, this reduces redundancy.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch adds some new fields to the gfs2 status file in sysfs to aid
in debugging.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function need_sync is supposed to determine if a qd element needs to be
synced. If the "change" (qd_change) is zero, it does not need to be
synced because there's literally no change in the value. Before this
patch, need_sync returned false if value < 0; that check should be <= 0,
so this patch changes it to <=.
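For illustration, a minimal hedged sketch of the adjusted check (the real
need_sync() in fs/gfs2/quota.c also weighs the on-disk value against the
warn and limit values; locking and that threshold logic are elided here):

    /* Hedged sketch; locking and threshold logic are elided. */
    static bool need_sync(struct gfs2_quota_data *qd)
    {
            s64 value = qd->qd_change;      /* pending, not yet synced change */

            if (value <= 0)                 /* was "value < 0" before this patch */
                    return false;
            /* ... compare against the warn/limit values to decide ... */
            return true;
    }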
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch simplifies function need_sync by eliminating a variable in
favor of just returning the appropriate value as soon as we know it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_write_disk_quota checks if its write overflows onto
another page, and if so, does a second write. Before this patch it kept
two variables for this, but only one is needed. This patch simplifies
it by eliminating pg_oflow.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_write_buf_to_page uses variable done to exit its loop, but
that's unnecessary if we simply code an infinite loop and break out when we
need to.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch passes the superblock pointer to gfs2_write_buf_to_page so it
becomes more apparent it's dealing with the system quota file.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Like the previous patch, we now pass the superblock pointer to function
gfs2_write_disk_quota. This makes the code more understandable, since it
only operates on the quota inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this change, function gfs2_adjust_quota's first parameter was a
gfs2_inode pointer. But it always pointed to the quota inode. Here we
switch that to pass the superblock pointer, sdp, so it is easier to read
the code and understand that it's only dealing with the quota inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Since commit 845802b112, function gfs2_write_buf_to_page checks whether the
target inode is jdata or ordered. This function only operates on the
system quota file, which is always jdata, so the check for jdata is
useless. This patch removes it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch adds a new mount option, quota=quiet, which is the same as
quota=on but suppresses gfs2 quota error messages.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add the device name to the names of the gfs2_logd and gfs2_quotad kernel
threads to allow for easier identification.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename the "gfs_recovery" workqueue to "gfs2_recovery", and
gfs_recovery_wq to gfs2_recovery_wq.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_withdraw() tries to synchronize concurrent callers by
atomically setting the SDF_WITHDRAWN flag in the first caller, setting
the SDF_WITHDRAW_IN_PROG flag to indicate that a withdraw is in
progress, performing the actual withdraw, and clearing the
SDF_WITHDRAW_IN_PROG flag when done. All other callers wait for the
SDF_WITHDRAW_IN_PROG flag to be cleared before returning.
This leaves a small window in which callers can find the SDF_WITHDRAWN
flag set before the SDF_WITHDRAW_IN_PROG flag has been set, causing them
to return prematurely, before the withdraw has been completed.
Fix that by setting the SDF_WITHDRAWN and SDF_WITHDRAW_IN_PROG flags
atomically.
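A hedged sketch of the combined update (flag and field names as in
fs/gfs2/incore.h; error handling simplified):

    /* Set both flags in one atomic update so no caller can observe
     * SDF_WITHDRAWN without SDF_WITHDRAW_IN_PROG also being set. */
    unsigned long old = READ_ONCE(sdp->sd_flags), new;

    do {
            if (old & BIT(SDF_WITHDRAWN)) {
                    /* Another caller is withdrawing; wait for it to finish. */
                    wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG,
                                TASK_UNINTERRUPTIBLE);
                    return -1;
            }
            new = old | BIT(SDF_WITHDRAWN) | BIT(SDF_WITHDRAW_IN_PROG);
    } while (unlikely(!try_cmpxchg(&sdp->sd_flags, &old, new)));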
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Immediately stop the logd and quotad kernel threads when a filesystem
withdraw is detected: those threads aren't doing anything useful after a
withdraw. (Depends on the extra logd and quotad task struct references
held since commit 7a109f383fa3 ("gfs2: Fix asynchronous thread
destruction").)
In addition, check for kthread_should_stop() in the wait condition in
gfs2_quotad() to stop immediately when kthread_stop() is called.
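For example (a hedged sketch; the other wakeup conditions in gfs2_quotad()
are abbreviated):

    t = wait_event_interruptible_timeout(sdp->sd_quota_wait,
                    sdp->sd_statfs_force_sync ||
                    gfs2_withdrawn(sdp) ||
                    kthread_should_stop(),          /* new: exit promptly */
                    t);
    if (kthread_should_stop())
            break;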
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The kernel threads are currently stopped and destroyed synchronously by
gfs2_make_fs_ro() and gfs2_put_super(), and asynchronously by
signal_our_withdraw(), with no synchronization, so the synchronous and
asynchronous contexts can race with each other.
First, when creating the kernel threads, take an extra task struct
reference so that the task struct won't go away immediately when they
terminate. This allows those kthreads to terminate immediately when
they're done rather than hanging around as zombies until they are reaped
by kthread_stop(). When kthread_stop() is called on a terminated
kthread, it will return immediately.
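A hedged sketch of that reference handling (thread naming and error
handling simplified):

    struct task_struct *t;

    t = kthread_run(gfs2_logd, sdp, "gfs2_logd/%s", sdp->sd_fsname);
    if (IS_ERR(t))
            return PTR_ERR(t);
    get_task_struct(t);             /* keep the task_struct alive past thread exit */
    sdp->sd_logd_process = t;

    /* later, on unmount: */
    kthread_stop(t);                /* returns immediately if the thread already exited */
    put_task_struct(t);             /* drop the extra reference */
    sdp->sd_logd_process = NULL;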
Second, in signal_our_withdraw(), once the SDF_JOURNAL_LIVE flag has
been cleared, wake up the logd and quotad wait queues instead of
stopping the logd and quotad kthreads. The kthreads are then expected
to terminate automatically within a short time, but if they cannot, they
will not block the withdraw.
For example, if a user process and one of the kthreads decide to withdraw
at the same time, only one of them will perform the actual withdraw and
the other will wait for it to be done. If the kthread ends up being the
one to wait, the withdrawing user process won't be able to stop it.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[ 81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0
[ 81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
[ 81.392343][ T5532] Call Trace:
[ 81.395654][ T5532] <TASK>
[ 81.398603][ T5532] dump_stack_lvl+0x1b1/0x290
[ 81.418421][ T5532] gfs2_assert_warn_i+0x19a/0x2e0
[ 81.423480][ T5532] gfs2_quota_cleanup+0x4c6/0x6b0
[ 81.428611][ T5532] gfs2_make_fs_ro+0x517/0x610
[ 81.457802][ T5532] gfs2_withdraw+0x609/0x1540
[ 81.481452][ T5532] gfs2_inode_refresh+0xb2d/0xf60
[ 81.506658][ T5532] gfs2_instantiate+0x15e/0x220
[ 81.511504][ T5532] gfs2_glock_wait+0x1d9/0x2a0
[ 81.516352][ T5532] do_sync+0x485/0xc80
[ 81.554943][ T5532] gfs2_quota_sync+0x3da/0x8b0
[ 81.559738][ T5532] gfs2_sync_fs+0x49/0xb0
[ 81.564063][ T5532] sync_filesystem+0xe8/0x220
[ 81.568740][ T5532] generic_shutdown_super+0x6b/0x310
[ 81.574112][ T5532] kill_block_super+0x79/0xd0
[ 81.578779][ T5532] deactivate_locked_super+0xa7/0xf0
[ 81.584064][ T5532] cleanup_mnt+0x494/0x520
[ 81.593753][ T5532] task_work_run+0x243/0x300
[ 81.608837][ T5532] exit_to_user_mode_loop+0x124/0x150
[ 81.614232][ T5532] exit_to_user_mode_prepare+0xb2/0x140
[ 81.619820][ T5532] syscall_exit_to_user_mode+0x26/0x60
[ 81.625287][ T5532] do_syscall_64+0x49/0xb0
[ 81.629710][ T5532] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In this backtrace, gfs2_quota_sync() takes quota data references and
then calls do_sync(). Function do_sync() encounters filesystem
corruption and withdraws the filesystem, which (among other things) calls
gfs2_quota_cleanup(). Function gfs2_quota_cleanup() wrongly assumes
that nobody is holding any quota data references anymore, and destroys
all quota data objects. When gfs2_quota_sync() then resumes and
dereferences the quota data objects it is holding, those objects are no
longer there.
Function gfs2_quota_cleanup() deals with resource deallocation and can
easily be delayed until gfs2_put_super() in the case of a filesystem
withdraw. In fact, most of the other work gfs2_make_fs_ro() does is
unnecessary during a withdraw as well, so change signal_our_withdraw()
to skip gfs2_make_fs_ro() and perform the necessary steps directly
instead.
Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches.
Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com
Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_quota_cleanup(), wait for the quota data objects to be freed
before returning. Otherwise, there is no guarantee that the quota data
objects will be gone when their kmem cache is destroyed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix the refcount of quota data objects created directly by
gfs2_quota_init(): those are placed into the in-memory quota "database"
for eventual syncing to the main quota file, but they are not actively
held and should thus have an initial refcount of 0.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Once a filesystem is withdrawn, don't complain about quota changes
that can't be synced to the main quota file anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename gfs2_qd_dispose() to gfs2_qd_dispose_list(). Move some code
duplicated in gfs2_qd_dispose_list() and gfs2_quota_cleanup() into a
new gfs2_qd_dispose() function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Change gfs2_quota_cleanup() to move the quota data objects to dispose of
on a dispose list and call gfs2_qd_dispose() on that list, like
gfs2_qd_shrink_scan() does, instead of disposing of the quota data
objects directly.
This may look a bit pointless by itself, but it will make more sense in
combination with a fix that follows.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_qd_isolate must only return LRU_REMOVED when removing the
item from the lru list; otherwise, the number of items on the list will
become incorrect.
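A hedged sketch of the corrected return-value discipline (modeled on
gfs2_qd_isolate(); details simplified):

    static enum lru_status gfs2_qd_isolate(struct list_head *item,
                    struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
    {
            struct list_head *dispose = arg;
            struct gfs2_quota_data *qd =
                    list_entry(item, struct gfs2_quota_data, qd_lru);

            if (!spin_trylock(&qd->qd_lockref.lock))
                    return LRU_SKIP;

            if (qd->qd_lockref.count == 0) {
                    lockref_mark_dead(&qd->qd_lockref);
                    list_lru_isolate_move(lru, &qd->qd_lru, dispose);
                    spin_unlock(&qd->qd_lockref.lock);
                    return LRU_REMOVED;     /* the item actually left the list */
            }

            spin_unlock(&qd->qd_lockref.lock);
            return LRU_SKIP;                /* still on the list: don't claim removal */
    }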
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious
that this relates to the kill_sb filesystem operation.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename sd_glock_wait to sd_kill_wait: we'll use it for other things
related to "killing" a filesystem on unmount soon (kill_sb).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch many of the functions in quota.c got their superblock
pointer, sdp, from the quota_data's glock pointer. That's silly because
the qd already has its own pointer to the superblock (qd_sbd).
This patch changes references to use that instead, eliminating a level
of indirection.
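For example (before/after fragments; qd_gl and qd_sbd are existing fields
of struct gfs2_quota_data):

    /* before: reach the superblock through the glock */
    struct gfs2_sbd *sdp = qd->qd_gl->gl_name.ln_sbd;

    /* after: use the quota data's own back pointer */
    struct gfs2_sbd *sdp = qd->qd_sbd;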
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit f07b352021 ("GFS2: Made logd daemon take into account log
demand") changed gfs2_ail_flush_reqd() and gfs2_jrnl_flush_reqd() to
take sd_log_blks_needed into account, but the checks in
gfs2_log_commit() were not updated correspondingly.
Once that is fixed, gfs2_jrnl_flush_reqd() and gfs2_ail_flush_reqd() can
be used in gfs2_log_commit(). Make those two helpers available to
gfs2_log_commit() by defining them above gfs2_log_commit().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When quotad detects an I/O error, it sets sd_log_error and then it wakes
up logd to withdraw the filesystem. However, logd doesn't wake up when
sd_log_error is set. Fix that.
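A hedged sketch of the logd wait condition with that check included (the
other conditions are abbreviated from fs/gfs2/log.c):

    t = wait_event_interruptible_timeout(sdp->sd_logd_waitq,
                    test_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags) ||
                    gfs2_ail_flush_reqd(sdp) ||
                    gfs2_jrnl_flush_reqd(sdp) ||
                    sdp->sd_log_error ||            /* new: I/O error from quotad */
                    kthread_should_stop(),
                    t);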
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag
to determine if an AIL flush should be forced in low-memory situations.
However, it also immediately clears the flag, and when called repeatedly
as in function gfs2_logd, the flag will be lost. Fix that by pulling
the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.
Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag
whether or not enough pages were written. If enough pages could be
written, flushing the AIL is unnecessary, though.
Third, gfs2_writepages doesn't wake up logd after setting the
SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react.
It would be preferable to wake up logd, but that hurts the performance of
some workloads and we don't yet understand why, so don't wake up logd for
now.
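A hedged sketch of the first fix, with the flag check moved into
gfs2_logd()'s main loop (helper names as in fs/gfs2/log.c; simplified):

    /* Test and clear the flag here, outside gfs2_ail_flush_reqd(), so
     * repeated polling of that helper can no longer lose it. */
    if (test_and_clear_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags) ||
        gfs2_ail_flush_reqd(sdp)) {
            gfs2_ail1_start(sdp);
            gfs2_ail1_wait(sdp);
            gfs2_ail1_empty(sdp, 0);
    }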
Fixes: b066a4eebd ("gfs2: forcibly flush ail to relieve memory pressure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Consider the following case:
1. A glock is held in shared mode.
2. A process requests the glock in exclusive mode (rename).
3. Before the lock is granted, more processes (read / ls) request the
glock in shared mode again.
4. gfs2 sends a request to dlm for the lock in exclusive mode because
that holder is at the head of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
response with state == LM_ST_SHARED and LM_OUT_CANCELED. So at that
point, the glock is still held in shared mode.
6. finish_xmote gets called to process the response from dlm. It detects
that the glock is not in the requested mode and no demote is in
progress, so it moves the canceled holder to the tail of the queue
and finds the new holder at the head of the queue. That holder is
requesting the glock in shared mode.
7. finish_xmote calls do_xmote to transition the glock into shared mode,
but the glock is already in shared mode and so do_xmote complains
about that with:
GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
Instead, in finish_xmote, after moving the canceled holder to the tail
of the queue, check if any new holders can be granted. Only call
do_xmote to repeat the dlm request if the holder at the head of the
queue is requesting the glock in a mode that is incompatible with the
mode the glock is currently held in.
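A rough, hedged sketch of that idea (control flow is simplified and the
compatibility test is reduced to "same mode as currently held"; this is not
the literal patch):

    /* dlm canceled the request: requeue the canceled holder and
     * re-evaluate the head of the queue. */
    list_move_tail(&gh->gh_list, &gl->gl_holders);
    gh = find_first_waiter(gl);
    gl->gl_target = gh->gh_state;
    if (gl->gl_target == gl->gl_state) {
            /* Compatible: grant the waiting holders directly instead of
             * calling do_xmote(), which would trip the GLOCK_BUG_ON(). */
            do_promote(gl);
    } else {
            /* Incompatible: repeat the request towards dlm. */
            do_xmote(gl, gh, gl->gl_target);
    }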
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The last user of this flag was removed in commit b77b4a4815 ("gfs2:
Rework freeze / thaw logic").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Revert the rest of commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS"):
In gfs2_unstuff_dinode(), there is no need to carry out the page cache
allocation under GFP_NOFS because inodes on the "regular" filesystem are
never un-inlined under memory pressure, so switch back from
find_or_create_page() to grab_cache_page() here as well.
Inodes on the "metadata" filesystem can theoretically be un-inlined
under memory pressure, but any page cache allocations in that context
would happen in GFP_NOFS context because those inodes have
inode->i_mapping->gfp_mask set to GFP_NOFS (see the previous patch).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Set mapping->gfp mask to GFP_NOFS for all metadata inodes so that
allocating pages in the address space of those inodes won't call back
into the filesystem. This allows us to switch back from
find_or_create_page() to grab_cache_page() in two places.
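A hedged sketch of the combination (page cache helpers; the inode here
stands for one of gfs2's metadata inodes):

    /* Done once when the metadata inode is set up, so page cache
     * allocations in its mapping never recurse into the filesystem: */
    mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);

    /* ...which lets the call sites use the plain helper again: */
    struct page *page = grab_cache_page(inode->i_mapping, index);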
Partially reverts commit 220cca2a4f ("GFS2: Change truncate page
allocation to be GFP_NOFS").
Thanks to Dan Carpenter <dan.carpenter@linaro.org> for pointing out a
Smatch static checker warning.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
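For example (before/after fragments):

    pgoff_t next;

    /* before */
    next = folio->index + folio_nr_pages(folio);

    /* after: same value, via the helper */
    next = folio_next_index(folio);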
Signed-off-by: Minjie Du <duminjie@vivo.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Merge tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- Fix a freeze consistency check in gfs2_trans_add_meta()
- Don't use filemap_splice_read as it can cause deadlocks on gfs2
* tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't use filemap_splice_read
gfs2: Fix freeze consistency check in gfs2_trans_add_meta
Since commit 2cb1e08985, gfs2 has been using the new function
filemap_splice_read rather than the old (and subsequently deleted)
function generic_file_splice_read.
filemap_splice_read works by taking references to a number of folios in
the page cache and splicing those folios into a pipe. The folios are
then read from the pipe and the folio references are dropped. This can
take an arbitrary amount of time. We cannot allow that in gfs2 because
those folio references will pin the inode glock to the node and prevent
it from being demoted, which can lead to cluster-wide deadlocks.
Instead, use copy_splice_read.
(In addition, the old generic_file_splice_read called into ->read_iter,
which called gfs2_file_read_iter, which took the inode glock during the
operation. The new filemap_splice_read interface does not take the
inode glock anymore. This is fixable, but it still wouldn't prevent
cluster-wide deadlocks.)
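A hedged sketch of the relevant hook (other members of gfs2's
file_operations are omitted here):

    const struct file_operations gfs2_file_fops = {
            .read_iter      = gfs2_file_read_iter,
            .splice_read    = copy_splice_read,     /* was filemap_splice_read */
            /* remaining operations unchanged */
    };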
Fixes: 2cb1e08985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
sure that no buffers are added to a transaction while the filesystem is
frozen. With the recent freeze/thaw rework, the SDF_FROZEN flag is
cleared after thaw_super() is called, which is sufficient for
serializing freeze/thaw.
However, other filesystem operations started after thaw_super() may now
be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
which will trigger the SDF_FROZEN check in gfs2_trans_add_meta(). Fix
that by checking the s_writers.frozen state instead.
In addition, make sure not to call gfs2_assert_withdraw() with the
sd_log_lock spin lock held. Check for a withdrawn filesystem before
checking for a frozen filesystem, and don't pin/add buffers to the
current transaction in case of a failure in either case.
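A hedged sketch of the reordered checks (simplified; performed before
pinning the buffer and without sd_log_lock held):

    if (unlikely(gfs2_withdrawn(sdp)))
            return;                 /* withdrawn: don't add the buffer */
    if (unlikely(sdp->sd_vfs->s_writers.frozen == SB_FREEZE_COMPLETE)) {
            fs_info(sdp, "GFS2: adding buffer while frozen\n");
            gfs2_assert_withdraw(sdp, 0);
            return;                 /* frozen: don't add the buffer */
    }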
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
- Clean up directory iterators and clarify file_needs_f_pos_lock()
* tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: rely on ->iterate_shared to determine f_pos locking
vfs: get rid of old '->iterate' directory operation
proc: fix missing conversion to 'iterate_shared'
open: make RESOLVE_CACHED correctly test for O_TMPFILE
Now that we removed ->iterate we don't need to check for either
->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
for ->iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not only does it allow us to avoid checking
f_inode's mode, it also clearly shows that we're locking because of
readdir.
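A hedged sketch of the simplified helper (as in fs/file.c of this era):

    static bool file_needs_f_pos_lock(struct file *file)
    {
            return (file->f_mode & FMODE_ATOMIC_POS) &&
                   (file_count(file) > 1 || file->f_op->iterate_shared);
    }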
Signed-off-by: Christian Brauner <brauner@kernel.org>
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got converted to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
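A hedged sketch of such a wrapper (the name and the exact locking
primitives in the actual patch may differ):

    static int wrap_directory_iterator(struct file *file, struct dir_context *ctx,
                                       int (*iter)(struct file *, struct dir_context *))
    {
            struct inode *inode = file_inode(file);
            int ret;

            /* The caller holds the inode lock shared; upgrade to exclusive. */
            inode_unlock_shared(inode);
            inode_lock(inode);
            ret = iter(file, ctx);                  /* legacy exclusive iterator */
            downgrade_write(&inode->i_rwsem);       /* back to read mode */
            return ret;
    }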
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
I'm looking at the directory handling due to the discussion about f_pos
locking (see commit 797964253d: "file: reinstate f_pos locking
optimization for regular files"), and wanting to clean that up.
And one source of ugliness is how we were supposed to move filesystems
over to the '->iterate_shared()' function that only takes the inode lock
for reading many many years ago, but several filesystems still use the
bad old '->iterate()' that takes the inode lock for exclusive access.
See commit 6192269444 ("introduce a parallel variant of ->iterate()")
that also added some documentation stating
Old method is only used if the new one is absent; eventually it will
be removed. Switch while you still can; the old one won't stay.
and that was back in April 2016. Here we are, many years later, and the
old version is still clearly sadly alive and well.
Now, some of those old style iterators are probably just because the
filesystem may end up having per-inode mutable data that it uses for
iterating a directory, but at least one case is just a mistake.
Al switched over most filesystems to use '->iterate_shared()' back when
it was introduced. In particular, the /proc filesystem was converted as
one of the first ones in commit f50752eaa0 ("switch all procfs
directories ->iterate_shared()").
But then one new user of '->iterate()' was later re-introduced by
commit 6d9c939dbe ("procfs: add smack subdir to attrs").
And that's clearly not what we wanted, since that new case just uses the
same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
that other /proc pident directories use, and they are most definitely
safe to use with the inode lock held shared.
So just fix it.
This still leaves a fair number of oddball filesystems using the
old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
overlayfs, and vboxsf), but at least we don't have any remaining in the
core filesystems.
I'm going to add a wrapper function that just drops the read-lock and
takes it as a write lock, so that we can clean up the core vfs layer and
make all the ugly 'this filesystem needs exclusive inode locking' be
just filesystem-internal warts.
I just didn't want to make that conversion when we still had a core user
left.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
fast-path check for RESOLVE_CACHED would reject all users passing
O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
for __O_TMPFILE.
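A hedged sketch of the corrected fast-path test in build_open_flags()
(surrounding code abbreviated):

    if (how->resolve & RESOLVE_CACHED) {
            /* Don't bother even trying for create/truncate/tmpfile opens. */
            if (flags & (O_TRUNC | O_CREAT | __O_TMPFILE))
                    return -EAGAIN;         /* was O_TMPFILE, which includes O_DIRECTORY */
    }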
Cc: stable@vger.kernel.org # v5.12+
Fixes: 99668f6180 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Merge tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
- Fix DFS interlink problem (different namespace)
* tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix dfs link mount against w2k8
Merge tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two patches to improve RBD exclusive lock interaction with
osd_request_timeout option and another fix to reduce the potential for
erroneous blocklisting -- this time in CephFS. All going to stable"
* tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client:
libceph: fix potential hang in ceph_osdc_notify()
rbd: prevent busy loop when requesting exclusive lock
ceph: defer stopping mdsc delayed_work
In commit 20ea1e7d13 ("file: always lock position for
FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
pidfd_getfd() could get a reference to the file even when it didn't have
an elevated file count due to threading or other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable,
so let's re-introduce the optimization, and only force the locking for
directory traversal.
Directories need the lock for correctness reasons, while regular files
only need it for "POSIX semantics". Since pidfd_getfd() is about
debuggers etc special things that are _way_ outside of POSIX, we can
relax the rules for that case.
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>