Before this patch, each individual "go_lock" glock operation (glop)
checked the GL_SKIP flag, and if set, would skip further processing.
This patch changes the logic so the go_lock caller, function go_promote,
checks the GL_SKIP flag before calling the go_lock op in the first place.
This avoids having to unnecessarily unlock gl_lockref.lock only to
re-lock it again.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Somehow, the GL_SKIP flag was missed when dumping glock holders.
This patch adds it to function hflags2str. I added it at the end because
I wanted Holder and Skip flags together to read "Hs" rather than "sH"
to avoid confusion with "Shared" ("SH") holder state.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, function gfs2_rgrp_go_lock checked if GL_SKIP and
ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps if
ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply removes
the redundant check.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Also disable page faults during direct I/O requests and implement a
similar kind of retry logic as in the buffered I/O case.
The retry logic in the direct I/O case differs from the buffered I/O
case in the following way: direct I/O doesn't provide the kinds of
consistency guarantees between concurrent reads and writes that buffered
I/O provides, so once we lose the inode glock while faulting in user
pages, we always resume the operation. We never need to return a
partial read or write.
This locking problem was originally reported by Jan Kara. Linus came up
with the idea of disabling page faults. Many thanks to Al Viro and
Matthew Wilcox for their feedback.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a done_before argument to iomap_dio_rw that indicates how much of
the request has already been transferred. When the request succeeds, we
report that done_before additional bytes were tranferred. This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.
We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted. The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already tranferred.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode glock. There is a possibility
that the memory is mapped to the same file, in which case we'd recurse
on the same glock.
We could detect and work around this simple case of recursive locking,
but more complex scenarios exist that involve multiple glocks,
processes, and cluster nodes, and working around all of those cases
isn't practical or even possible.
Avoid these kinds of problems by disabling page faults while holding the
inode glock. If a page fault would occur, we either end up with a
partial read or write or with -EFAULT if nothing could be read or
written. In either case, we know that we're not done with the
operation, so we indicate that we're willing to give up the inode glock
and then we fault in the missing pages. If that made us lose the inode
glock, we return a partial read or write. Otherwise, we resume the
operation.
This locking problem was originally reported by Jan Kara. Linus came up
with the idea of disabling page faults. Many thanks to Al Viro and
Matthew Wilcox for their feedback.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Now that gfs2_file_buffered_write is the only remaining user of
ip->i_gh, we can move the glock holder to the stack (or rather, use the
one we already have on the stack); there is no need for keeping the
holder in the inode anymore.
This is slightly complicated by the fact that we're using ip->i_gh for
the statfs inode in gfs2_file_buffered_write as well. Writing to the
statfs inode isn't very common, so allocate the statfs holder
dynamically when needed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages. It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.
Move the locking into gfs2_file_buffered_write instead. We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of an active holder and that holder has the HIF_MAY_DEMOTE flag
set, the holder will be demoted before the incoming locking request is
granted.
Note that this mechanism demotes active holders (with the HIF_HOLDER
flag set), while before we were only demoting glocks without any active
holders. This allows processes to keep hold of locks that may form a
cyclic locking dependency; the core glock logic will then break those
dependencies in case a conflicting locking request occurs. We'll use
this to avoid giving up the inode glock proactively before faulting in
pages.
Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
Later, they call gfs2_holder_disallow_demote() to clear the flag again,
and then they check if their holder is still queued: if it is, they are
still holding the glock; if it isn't, they can re-acquire the glock (or
abort).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pass the first current glock holder into function may_grant and
deobfuscate the logic there.
While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make
that build cleanly, de-constify the may_grant arguments.
We're now using function find_first_holder in do_promote, so move the
function's definition above do_promote.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a wrapper around iomap_file_buffered_write. We'll add code for when
the operation needs to be retried here later.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull gfs2 setattr updates from Al Viro:
"Make it possible for filesystems to use a generic 'may_setattr()' and
switch gfs2 to using it"
* 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gfs2: Switch to may_setattr in gfs2_setattr
fs: Move notify_change permission checks into may_setattr
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYTDKKAAKCRDh3BK/laaZ
PG9PAQCUF0fdBlCKudwSEt5PV5xemycL9OCAlYCd7d4XbBIe9wEA6sVJL9J+OwV2
aF0NomiXtJccE+S9+byjVCyqSzQJGQQ=
=6L2Y
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Copy up immutable/append/sync/noatime attributes (Amir Goldstein)
- Improve performance by enabling RCU lookup.
- Misc fixes and improvements
The reason this touches so many files is that the ->get_acl() method now
gets a "bool rcu" argument. The ->get_acl() API was updated based on
comments from Al and Linus:
Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/
* tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: enable RCU'd ->get_acl()
vfs: add rcu argument to ->get_acl() callback
ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
ovl: use kvalloc in xattr copy-up
ovl: update ctime when changing fileattr
ovl: skip checking lower file's i_writecount on truncate
ovl: relax lookup error on mismatch origin ftype
ovl: do not set overlay.opaque for new directories
ovl: add ovl_allow_offline_changes() helper
ovl: disable decoding null uuid with redirect_dir
ovl: consistent behavior for immutable/append-only inodes
ovl: copy up sync/noatime fileattr flags
ovl: pass ovl_fs to ovl_check_setxattr()
fs: add generic helper for filling statx attribute flags
- Simplify the bio_end_page usage in the buffered IO code.
- Support reading inline data at nonzero offsets for erofs.
- Fix some typos and bad grammar.
- Convert kmap_atomic usage in the inline data read path.
- Add some extra inline data input checking.
- Fix a memory corruption bug stemming from iomap_swapfile_activate
trying to activate more pages than mm was expecting.
- Pass errnos through the page writeback code so that writeback errors
are reported correctly instead of being munged to EIO.
- Replace iomap_apply with a open-coded iterator loops to reduce the
number of indirect calls by a third to a half.
- Refactor the fsdax code to use iomap iterators instead of the
open-coded iomap_apply code that it had before.
- Format file range iomap tracepoint data in hexadecimal and
standardize the names used in the pretty-print string.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmEnwC0ACgkQ+H93GTRK
tOtVOQ//Zu9ul2ZmPARMV8xyAfopnLpmggREOthFbPkDZ3z3ZgRpPxlbAvWEEKnj
VDNLFNj204rDojuxP/YSdxgiawLod7dYfXIwwft8R8oI7MdgVQhpvimUi5bkz/Od
X5pmFDe84INfFvEztOgC+sPk1RI/ToQLgrcIffWMWfF2iyVkNVMCD5MMe6LoH1la
9GbVCfPx6Y2Nffaa8EuAEgaCo7FMPc81bvQG4qpeqXyX8qql/r5n4YENhkn3n4qa
zI4F2lgqwbelFkamZOYNDjtLt13lb7Ze0PoFOpmTZUqlyybqhRxDvJ+OxZn8W6zH
20pxWx/RCXhCp/sS6DRcYyf7WKoIfdGDkxed7aSuhJ+VKKtBtsjMoy7dh5IY5RJa
8L1DMat6xtea8Glx04SF7Vib0n/An9oHOTzLEWxsUlRaPhW68uVpKgXuGLTAf+dc
ztJhlQ9pLX0D2NmgGlkXN8d4F1XEH2BgyIrtF6UNtMbyIlCREHM9HELJs6JzKl6U
a4ivJXyaq8o/hlXr8IMWUOTVubS0i+hgvvQjnVJmcSTJxhH10mPPJLnNsGX6heD9
SlnnXRbD03iqsbMJP/R431VKooryOSKBc86IEECkuMz3RUfw75DGAnLtETnT1rsA
71rSVf5NaCGZ2hV4du6jv53TS7yrPpqkxJHyDWD1WP4xGPbO1XA=
=iVns
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.15-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"The most notable externally visible change for this cycle is the
addition of support for reads to inline tail fragments of files, which
was requested by the erofs developers; and a correction for a kernel
memory corruption bug if the sysadmin tries to activate a swapfile
with more pages than the swapfile header suggests.
We also now report writeback completion errors to the file mapping
correctly, instead of munging all errors into EIO.
Internally, the bulk of the changes are Christoph's patchset to reduce
the indirect function call count by a third to a half by converting
iomap iteration from a loop pattern to a generator/consumer pattern.
As an added bonus, fsdax no longer open-codes iomap apply loops.
Summary:
- Simplify the bio_end_page usage in the buffered IO code.
- Support reading inline data at nonzero offsets for erofs.
- Fix some typos and bad grammar.
- Convert kmap_atomic usage in the inline data read path.
- Add some extra inline data input checking.
- Fix a memory corruption bug stemming from iomap_swapfile_activate
trying to activate more pages than mm was expecting.
- Pass errnos through the page writeback code so that writeback
errors are reported correctly instead of being munged to EIO.
- Replace iomap_apply with a open-coded iterator loops to reduce the
number of indirect calls by a third to a half.
- Refactor the fsdax code to use iomap iterators instead of the
open-coded iomap_apply code that it had before.
- Format file range iomap tracepoint data in hexadecimal and
standardize the names used in the pretty-print string"
* tag 'iomap-5.15-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits)
iomap: standardize tracepoint formatting and storage
mm/swap: consider max pages in iomap_swapfile_add_extent
iomap: move loop control code to iter.c
iomap: constify iomap_iter_srcmap
fsdax: switch the fault handlers to use iomap_iter
fsdax: factor out a dax_fault_actor() helper
fsdax: factor out helpers to simplify the dax fault code
iomap: rework unshare flag
iomap: pass an iomap_iter to various buffered I/O helpers
iomap: remove iomap_apply
fsdax: switch dax_iomap_rw to use iomap_iter
iomap: switch iomap_swapfile_activate to use iomap_iter
iomap: switch iomap_seek_data to use iomap_iter
iomap: switch iomap_seek_hole to use iomap_iter
iomap: switch iomap_bmap to use iomap_iter
iomap: switch iomap_fiemap to use iomap_iter
iomap: switch __iomap_dio_rw to use iomap_iter
iomap: switch iomap_page_mkwrite to use iomap_iter
iomap: switch iomap_zero_range to use iomap_iter
iomap: switch iomap_file_unshare to use iomap_iter
...
* Various withdraw related fixes (freeze glock recursion, thread
initialization / destruction order, journal recovery, glock cleanup,
withdraw under journal lock).
* Some error message improvements.
* Various minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmEtRyUUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrMMA//ZBr+FmdlDmFmQ75R/50fy9ytBGb4
Z5K8xEbkI4i4zHA8SjJat2pnWccvZ8ahQkIKmYnRmLntegWZy1jKCqXabnnp+v8I
i2ceRR25uZnCMgP63VSpQnuZ2K6uXdg1f3nrUwcyUnUV1Y5W4wqMXZjY4V+2Hesu
HUJFpk8FRxmKQkbcKa9PWdda/LnbT72siW0zTt0wJpXTcFyzXD5vZRrTdRxdvMQA
xEDPKb1r2kKSQmvpziq1dVxbo9PHS7dSE9bzDPhA4aedXYGVmwwGjMJH7hAeKP+T
fhnOnxmOHYK/kUUi+VquONShd0RkHjnqpc10HJ5uCQfS7qETNg5hOOaw4X0S9bXZ
Ni+uVtZwS526RnTGCZCeS19bkCu7wPa+/m68LXot5+oA5+nFXMTY9EiT/QIfkusj
V2+BWMojjcFo9dpGcQagsmshurQH5iPKhR6qHgkNqV+lvn/c0C+GvlxQKz1tnjRS
oVJdOkan2hAPPVfRuCnPloz9bec6iATfz5UFT9RCEU4z2sBnwCn17BQcMpU0II8B
cYLV83BKSXtSLWe5KcUSx/Oim++qUtyM+cv1BKH4ECi2iuQ9pK+iL4qFys2qQZPR
NzyWftriOqpqNomED5kqiF9pAeK02oHakfFsGVbm5hVpJvuMRcX6yHNlgGbiehTC
+0Y7QiOc5NX9zeY=
=QBuF
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v5.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Various withdraw related fixes (freeze glock recursion, thread
initialization / destruction order, journal recovery, glock cleanup,
withdraw under journal lock).
- Some error message improvements.
- Various minor cleanups.
* tag 'gfs2-v5.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove redundant check from gfs2_glock_dq
gfs2: Delay withdraw from atomic context
gfs2: Don't call dlm after protocol is unmounted
gfs2: don't stop reads while withdraw in progress
gfs2: Mark journal inodes as "don't cache"
gfs2: nit: gfs2_drop_inode shouldn't return bool
gfs2: Eliminate vestigial HIF_FIRST
gfs2: Make recovery error more readable
gfs2: Don't release and reacquire local statfs bh
gfs2: init system threads before freeze lock
gfs2: tiny cleanup in gfs2_log_reserve
gfs2: trivial clean up of gfs2_ail_error
gfs2: be more verbose replaying invalid rgrp blocks
gfs2: Fix glock recursion in freeze_go_xmote_bh
gfs2: Fix memory leak of object lsi on error return path
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.
I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.
This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
In function gfs2_glock_dq, it checks to see if this is the fast path.
Before this patch, it checked both "find_first_holder(gl) == NULL" and
list_empty(&gl->gl_holders), which is redundant. If gl_holders is empty
then find_first_holder must return NULL. This patch removes the
redundancy.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, if function __gfs2_ail_flush detected an error
syncing the ail list, it call gfs2_ail_error which called gfs2_withdraw.
Since __gfs2_ail_flush deals with a specific glock, we shouldn't withdraw
immediately because the withdraw code (signal_our_withdraw) uses glocks
in its processing.
This patch changes the call from gfs2_withdraw to gfs2_withdraw_delayed
which defers the withdraw until a more appropriate context, such as the
logd daemon, discovers the intent to withdraw.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
In the gfs2 withdraw sequence, the dlm protocol is unmounted with a call
to lm_unmount. After a withdraw, users are allowed to unmount the
withdrawn file system. But at that point we may still have glocks left
over that we need to free via unmount's call to gfs2_gl_hash_clear.
These glocks may have never been completed because of whatever problem
caused the withdraw (IO errors or whatever).
Before this patch, function gdlm_put_lock would still try to call into
dlm to unlock these leftover glocks, which resulted in dlm returning
-EINVAL because the lock space was abandoned. These glocks were never
freed because there was no mechanism after that to free them.
This patch adds a check to gdlm_put_lock to see if the locking protocol
was inactive (DFL_UNMOUNT flag) and if so, free the glock and not
make the invalid call into dlm.
I could have combined this "if" with the one that follows, related to
leftover glock LVBs, but I felt the code was more readable with its own
if clause.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
When gfs2 withdraws a file system, it calls signal_our_withdraw which
triggers another node to replay the withdrawing node's journal. Then it
waits until it knows the journal has been replayed. Part of this wait is
to repeatedly call check_journal_clean which calls gfs2_jdesc_check,
which checks to see if the journal is sane. As part of its sanity checks
it needs to re-read its journal's metadata. But with today's code, any
attempt to re-read the metadata results in -EIO because of a check for
the file system withdraw in function gfs2_meta_wait.
This patch adds an additional check for SDF_WITHDRAW_IN_PROG, to tell
if the read is done while the withdraw is in progress. In that case
we allow the metadata read to not be rejected. Therefore the metadata
check is done properly, so the withdraw sequence can finish normally.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, journal inodes were considered regular inodes,
which meant that instead of evicting them, function iput_final would
just put them on the lru for later processing. If the file system
withdrew for whatever reason, the withdraw would never be seen until
the inode was evicted, which could be indefinitely.
This patch marks all journal inodes as "don't cache" which means
function iput_final will evict them immediately, allowing us to
properly recover the journal on other cluster nodes.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Today, gfs2_drop_inode can return "false" for an int value.
I'm sure this was just an oversight. Change to int value.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, withdraws could cause an error that looked like:
Journal recovery skipped for 0 until next mount.
This patch changes it to a more readable:
Journal recovery skipped for jid 0 until next mount.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, several functions in gfs2 related to the updating
of the statfs file used a newly acquired/read buffer_head for the
local statfs file. This is completely unnecessary, because other nodes
should never update it. Recreating the buffer is a waste of time.
This patch allows gfs2 to read in the local statefs buffer_head at
mount time and keep it around until unmount time.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Patch 96b1454f2e ("gfs2: move freeze glock outside the make_fs_rw and _ro
functions") changed the gfs2 mount sequence so that it holds the freeze
lock before calling gfs2_make_fs_rw. Before this patch, gfs2_make_fs_rw
called init_threads to initialize the quotad and logd threads. That is a
problem if the system needs to withdraw due to IO errors early in the
mount sequence, for example, while initializing the system statfs inode:
1. An IO error causes the statfs glock to not sync properly after
recovery, and leaves items on the ail list.
2. The leftover items on the ail list causes its do_xmote call to fail,
which makes it want to withdraw. But since the glock code cannot
withdraw (because the withdraw sequence uses glocks) it relies upon
the logd daemon to initiate the withdraw.
3. The withdraw can never be performed by the logd daemon because all
this takes place before the logd daemon is started.
This patch moves function init_threads from super.c to ops_fstype.c
and it changes gfs2_fill_super to start its threads before holding the
freeze lock, and if there's an error, stop its threads after releasing
it. This allows the logd to run unblocked by the freeze lock. Thus,
the logd daemon can perform its withdraw sequence properly.
Fixes: 96b1454f2e ("gfs2: move freeze glock outside the make_fs_rw and _ro functions")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Function gfs2_log_reserve was setting revoke_blks to 0. There's no
need because it calculates it shortly thereafter. This patch removes
the unnecessary set.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
This patch does not change function. It adds variable sdp to clean up
function gfs2_ail_error and make it more readable.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
This patch adds some crucial information when journal replay detects a
replay of an obsolete rgrp block. For example, it wasn't printing the
journal id or the generation number played. This just supplements what
is logged in this unusual case.
The function that actually complains about the replaying of an obsolete
rgrp block has been split off to avoid long lines and sparse warnings.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Add a rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
These aren't actually used by the only instance implementing the methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The permission check in gfs2_setattr is an old and outdated version of
may_setattr(). Switch to the updated version.
Fixes fstest generic/079.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We must not call gfs2_consist (which does a file system withdraw) from
the freeze glock's freeze_go_xmote_bh function because the withdraw
will try to use the freeze glock, thus causing a glock recursion error.
This patch changes freeze_go_xmote_bh to call function
gfs2_assert_withdraw_delayed instead of gfs2_consist to avoid recursion.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
In the case where IS_ERR(lsi->si_sc_inode) is true the error exit path
to free_local does not kfree the allocated object lsi leading to a memory
leak. Fix this by kfree'ing lst before taking the error exit path.
Addresses-Coverity: ("Resource leak")
Fixes: 97fd734ba1 ("gfs2: lookup local statfs inodes prior to journal recovery")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmDa9GoUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTr2Wg//VPDf8gD07ixZ9jOujwCB/0q5n403
doTtonF+26VVN0e3aZY5M949cW/65ogKYiZgzC6JlCkqsGg+8Y8xQ2mxKtUPQoY7
lGCU1uQVnoZMWpRzcigH3iGFz+yyW6PFH7lgr+mcyqBHNPc1zPK5oioV3KKQTtZH
XUP5FGZ9A3noty/feMCcicjAfh3iqwBSa7+QqPe9OlZ08AeDbaHatFzSnOPerpEh
eIOzv795q9MVKOghq+JHgELbnX58IrI3FVWDAWS9+KBfgdb23DV/78mmA9rjjP91
UXD5IJ7hYDQiPkKHX6uHrLyQaIkZlXISEAYiEmiNzfVysopl88P1JAV6n6MfrCVx
MX3ZEvOFAe5aSXjUMXPdhgYgf7dg1gV5L6zmDSTngQqegGxpp8tSXiUY1f5U/JU3
hVSwN0H1xOd6w9GbZM8GJq316gM8DHZGS/0UyZjHNYrH1rt4m7C0K0aukbelJ0EM
YdYceC6zONSk+ltM90Cvfy0WPO7nVm9hrG8nJcyxzyuIptrKqa9BCzLV7FvlObVE
Ia3MQf0j7wF+33Ro3VcyQTV2xadqFIzeJ2yLxJZmJ8pf/Pyu8pB0/+8Ug3Xrve+w
i1Q4y4LikJNNhyhBYt2Qev+myIx7VE8WoQ7tFwOmMHhKs9DbTP4WpyJL/x7Wy9wg
34KFHHVp+jIeUwI=
=47Ww
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
"Various minor gfs2 cleanups and fixes"
* tag 'gfs2-v5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Clean up gfs2_unstuff_dinode
gfs2: Unstuff before locking page in gfs2_page_mkwrite
gfs2: Clean up the error handling in gfs2_page_mkwrite
gfs2: Fix error handling in init_statfs
gfs2: Fix underflow in gfs2_page_mkwrite
gfs2: Use list_move_tail instead of list_del/list_add_tail
gfs2: Fix do_gfs2_set_flags description
The only difference between iomap_set_page_dirty() and
__set_page_dirty_nobuffers() is that the latter includes a debugging check
that a !Uptodate page has private data.
Link: https://lkml.kernel.org/r/20210615162342.1669332-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire
that method up for the missing instances.
[hch@lst.de: ecryptfs: add a ->set_page_dirty cludge]
Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little. All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_page_mkwrite, unstuff inodes before locking the page. That
way, we won't have to pass in the locked page to gfs2_unstuff_inode,
and gfs2_unstuff_inode can look up and lock the page itself.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
We're setting an error number so that block_page_mkwrite_return
translates it into the corresponding VM_FAULT_* code in several places,
but this is getting confusing, so set the VM_FAULT_* codes directly
instead. (No change in functionality.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
On an error path, init_statfs calls iput(pn) after pn has already been put.
Fix that by setting pn to NULL after the initial iput.
Fixes: 97fd734ba1 ("gfs2: lookup local statfs inodes prior to journal recovery")
Cc: stable@vger.kernel.org # v5.10+
Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
On filesystems with a block size smaller than PAGE_SIZE and non-empty
files smaller then PAGE_SIZE, gfs2_page_mkwrite could end up allocating
excess blocks beyond the end of the file, similar to fallocate. This
doesn't make sense; fix it.
Reported-by: Bob Peterson <rpeterso@redhat.com>
Fixes: 184b4e6085 ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Using list_move_tail() instead of list_del() + list_add_tail().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit 88b631cbfb ("gfs2: convert to fileattr") changed the argument list
without updating the description.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This reverts commit b7f55d928e.
As explained by Linus in [*], write faults on a mmap region are reads
from a filesysten point of view, so taking the inode glock exclusively
on write faults is incorrect.
Instead, when a page is marked writable, the .page_mkwrite vm operation
will be called, which is where the exclusive lock taking needs to
happen. I got this wrong because of a broken test case that made me
believe .page_mkwrite isn't getting called when it actually is.
[*] https://lore.kernel.org/lkml/CAHk-=wj8EWr_D65i4oRSj2FTbrc6RdNydNNCGxeabRnwtoU=3Q@mail.gmail.com/
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to
remove the glock from the lru list in __gfs2_glock_put().
On the shrink scan path, the same flag is cleared under lru_lock but because
of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the
put side can be made without deleting the glock from the lru list.
Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to
ensure correct behavior on both sides - clear GLF_LRU after list_del under
lru_lock.
Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When a write fault occurs, we need to take the inode glock of the underlying
inode in exclusive mode. Otherwise, there's no guarantee that the dirty page
will be written back to disk.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, the system ail lists were cleaned up if the logd
process withdrew, but on other withdraws, they were not cleaned up.
This included the cleaning up of the revokes as well.
This patch reorganizes things a bit so that all withdraws (not just logd)
clean up the ail lists, including any pending revokes.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, gfs2 would deadlock because of the following
sequence during mount:
mount
gfs2_fill_super
gfs2_make_fs_rw <--- Detects IO error with glock
kthread_stop(sdp->sd_quotad_process);
<--- Blocked waiting for quotad to finish
logd
Detects IO error and the need to withdraw
calls gfs2_withdraw
gfs2_make_fs_ro
kthread_stop(sdp->sd_quotad_process);
<--- Blocked waiting for quotad to finish
gfs2_quotad
gfs2_statfs_sync
gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted
glock_work_func
do_xmote <---Detects IO error, can't release glock: blocked on withdraw
glops->go_inval
glock_blocked_by_withdraw
requeue glock work & exit <--- work requeued, blocked by withdraw
This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, in the unlikely event that gfs2_glock_dq encountered
a withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.
This patch unlocks the lockref spin_lock before waiting for recovery.
Fixes: 601ef0d52e ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>