linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-09-22 12:44:11 +08:00

Author	SHA1	Message	Date
Andreas Gruenbacher	fe3e397668	gfs2: Rework the log space allocation logic The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-22 21:16:22 +01:00
Andreas Gruenbacher	71b219f4e5	gfs2: Minor calc_reserved cleanup No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-22 21:16:22 +01:00
Andreas Gruenbacher	76fce65489	gfs2: Move function gfs2_ail_empty_tr Move this function further up in log.c so that we can use it in the next patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:25 +01:00
Andreas Gruenbacher	5cb738b5fb	gfs2: Get rid of current_tail() Keep the current value of the updated log tail in the super block as sb_log_flush_tail instead of computing it on the fly. This avoids unnecessary sd_ail_lock taking and cleans up the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:25 +01:00
Andreas Gruenbacher	297de3180d	gfs2: Use a tighter bound in gfs2_trans_begin Use a tighter bound for the number of blocks required by transactions in gfs2_trans_begin: in the worst case, we'll have mixed data and metadata, so we'll need a log desciptor for each type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:25 +01:00
Andreas Gruenbacher	5ae8fff8d0	gfs2: Clean up gfs2_log_reserve Wake up log waiters in gfs2_log_release when log space has actually become available. This is a much better place for the wakeup than gfs2_logd. Check if enough log space is immeditely available before anything else. If there isn't, use io_wait_event to wait instead of open-coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:24 +01:00
Andreas Gruenbacher	4a3d049db4	gfs2: Don't wait for journal flush in clean_journal Commit `588bff95c9` added gfs2_write_log_header() and started using it in clean_journal(), with an additional call to log_flush_wait() at the end of gfs2_write_log_header() which is unnecessary for clean_journal(). Move that call out of gfs2_write_log_header() to restore the previous behavior. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:24 +01:00
Andreas Gruenbacher	c1eba1b0bc	gfs2: Move lock flush locking to gfs2_trans_{begin,end} Move the read locking of sd_log_flush_lock from gfs2_log_reserve to gfs2_trans_begin, and its unlocking from gfs2_log_release to gfs2_trans_end. Use gfs2_log_release in two places in which it was open coded before. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:24 +01:00
Andreas Gruenbacher	f3708fb59f	gfs2: Get rid of sd_reserving_log This counter and the associated wait queue are only used so that gfs2_make_fs_ro can efficiently wait for all pending log space allocations to fail after setting the filesystem to read-only. This comes at the cost of waking up that wait queue very frequently. Instead, when gfs2_log_reserve fails because the filesystem has become read-only, Wake up sd_log_waitq. In gfs2_make_fs_ro, set the file system read-only and then wait until all the log space has been released. Give up and report the problem after a while. With that, sd_reserving_log and sd_reserving_log_wait can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:24 +01:00
Andreas Gruenbacher	c968f5788b	gfs2: Clean up on-stack transactions Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only needs to be set in the exceptional case of on-stack transactions. Split off __gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded version in gfs2_ail_empty_gl. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 18:37:10 +01:00
Andreas Gruenbacher	15e20a301a	gfs2: Use sb_start_intwrite in gfs2_ail_empty_gl Commit `2e60d7683c` ("GFS2: update freeze code to use freeze/thaw_super on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite protection for the on-stack transactions in gfs2_ail_empty_gl with no explanation. I can't think of a valid reason for doing that, so revert that change. This simplifies the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-02-03 16:33:16 +01:00
Andreas Gruenbacher	6e80674af0	gfs2: Clean up ail2_empty Clean up the logic in ail2_empty (no functional change). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:50 +01:00
Andreas Gruenbacher	e7501bf88c	gfs2: Rename gfs2_{write => flush}_revokes Function gfs2_write_revokes doesn't actually write any revokes; instead, it adds revokes to the system transaction during a flush. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:49 +01:00
Andreas Gruenbacher	625a8edd5e	gfs2: Minor debugging improvement Split the assert in gfs2_trans_end into two parts. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:45 +01:00
Andreas Gruenbacher	6188e8777d	gfs2: Some documentation updates The calc_reserved description claims that buf_limit is 502 (on 4k filesystems), but it is actually 503. Fix / clarify the entire description. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:07 +01:00
Andreas Gruenbacher	5a4e9c607e	gfs2: Minor gfs2_write_revokes cleanups Clean up the computations in gfs2_write_revokes (no change in functionality). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:07 +01:00
Andreas Gruenbacher	458094c2c6	gfs2: Simplify the buf_limit and databuf_limit definitions The BUF_OFFSET and DATABUF_OFFSET definitions are only used in buf_limit and databuf_limit, respectively, and the rounding done in those definitions is immediately wiped out by dividing by the element size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:17:07 +01:00
Andreas Gruenbacher	736b2f778f	gfs2: Un-obfuscate function jdesc_find_i Clean up this function to show that it is trivial. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-01-19 21:16:43 +01:00
Bob Peterson	6e5c4ea37a	gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only Function signal_our_withdraw needs to work on file systems that have been partially frozen. To do this, it called flush_workqueue(gfs2_freeze_wq). This this wrong because it waits for ALL file systems to be unfrozen, not just the one we're withdrawing from. It should only wait for the targetted file system to be unfrozen. Otherwise it would wait until ALL file systems are thawed before signaling the withdraw. This patch changes signal_our_withdraw so it calls flush_work() for the target file system's freeze work (only) to be completed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-03 17:04:41 +01:00
Bob Peterson	dd64fe8167	gfs2: Remove sb_start_write from gfs2_statfs_sync Before this patch, function gfs2_statfs_sync called sb_start_write and sb_end_write. This is completely unnecessary because, aside from grabbing glocks, gfs2_statfs_sync does all its updates to statfs with a transaction: gfs2_trans_begin and _end. And transactions always do sb_start_intwrite in gfs2_trans_begin and sb_end_intwrite in gfs2_trans_end. This patch simply removes the call to sb_start_write. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-03 17:04:24 +01:00
Tom Rix	28c332b941	gfs2: remove trailing semicolons from macro definitions The macro use will already have a semicolon. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-01 00:25:21 +01:00
Andreas Gruenbacher	a55a47a3bc	Revert "GFS2: Prevent delete work from occurring on glocks used for create" Since commit `a0e3cc65fa` ("gfs2: Turn gl_delete into a delayed work"), we're cancelling any pending delete work of an iopen glock before attaching a new inode to that glock in gfs2_create_inode. This means that delete_work_func can no longer be queued or running when attaching the iopen glock to the new inode, and we can revert commit `a4923865ea` ("GFS2: Prevent delete work from occurring on glocks used for create"), which tried to achieve the same but in a racy way. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-01 00:25:21 +01:00
Andreas Gruenbacher	e3a77eebfa	gfs2: Make inode operations static The inode operations are not used outside inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-01 00:25:21 +01:00
Andreas Gruenbacher	dd0ecf5441	gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: `a0e3cc65fa` ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-12-01 00:21:10 +01:00
Andreas Gruenbacher	82e938bd53	gfs2: Upgrade shared glocks for atime updates Commit `20f829999c` ("gfs2: Rework read and page fault locking") lifted the glock lock taking from the low-level ->readpage and ->readahead address space operations to the higher-level ->read_iter file and ->fault vm operations. The glocks are still taken in LM_ST_SHARED mode only. On filesystems mounted without the noatime option, ->read_iter sometimes needs to update the atime as well, though. Right now, this leads to a failed locking mode assertion in gfs2_dirty_inode. Fix that by introducing a new update_time inode operation. There, if the glock is held non-exclusively, upgrade it to an exclusive lock. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: `20f829999c` ("gfs2: Rework read and page fault locking") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-26 19:58:25 +01:00
Bob Peterson	f39e7d3aae	gfs2: Don't freeze the file system during unmount GFS2's freeze/thaw mechanism uses a special freeze glock to control its operation. It does this with a sync glock operation (glops.c) called freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the glops function causes the file system to be frozen. This is intended. However, GFS2's mount and unmount processes also hold the freeze glock to prevent other processes, perhaps on different cluster nodes, from mounting the frozen file system in read-write mode. Before this patch, there was no check in freeze_go_sync for whether a freeze in intended or whether the glock demote was caused by a normal unmount. So it was trying to freeze the file system it's trying to unmount, which ends up in a deadlock. This patch adds an additional check to freeze_go_sync so that demotes of the freeze glock are ignored if they come from the unmount process. Fixes: `20b3291290` ("gfs2: Fix regression in freeze_go_sync") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-25 18:12:08 +01:00
Bob Peterson	778721510e	gfs2: check for empty rgrp tree in gfs2_ri_update If gfs2 tries to mount a (corrupt) file system that has no resource groups it still tries to set preferences on the first one, which causes a kernel null pointer dereference. This patch adds a check to function gfs2_ri_update so this condition is detected and reported back as an error. Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-25 18:10:55 +01:00
Alexander Aring	515b269d5b	gfs2: set lockdep subclass for iopen glocks This patch introduce a new globs attribute to define the subclass of the glock lockref spinlock. This avoid the following lockdep warning, which occurs when we lock an inode lock while an iopen lock is held: ============================================ WARNING: possible recursive locking detected 5.10.0-rc3+ #4990 Not tainted -------------------------------------------- kworker/0:1/12 is trying to acquire lock: ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20 but task is already holding lock: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&gl->gl_lockref.lock); lock(&gl->gl_lockref.lock); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by kworker/0:1/12: #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 stack backtrace: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: delete_workqueue delete_work_func Call Trace: dump_stack+0x8b/0xb0 __lock_acquire.cold+0x19e/0x2e3 lock_acquire+0x150/0x410 ? lockref_get+0x9/0x20 _raw_spin_lock+0x27/0x40 ? lockref_get+0x9/0x20 lockref_get+0x9/0x20 delete_work_func+0x188/0x260 process_one_work+0x237/0x540 worker_thread+0x4d/0x3b0 ? process_one_work+0x540/0x540 kthread+0x127/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-24 23:45:58 +01:00
Alexander Aring	16e6281b6b	gfs2: Fix deadlock dumping resource group glocks Commit `0e539ca1bb` ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump") introduced additional locking in gfs2_rgrp_go_dump, which is also used for dumping resource group glocks via debugfs. However, on that code path, the glock spin lock is already taken in dump_glock, and taking it again in gfs2_glock2rgrp leads to deadlock. This can be reproduced with: $ mkfs.gfs2 -O -p lock_nolock /dev/FOO $ mount /dev/FOO /mnt/foo $ touch /mnt/foo/bar $ cat /sys/kernel/debug/gfs2/FOO/glocks Fix that by not taking the glock spin lock inside the go_dump callback. Fixes: `0e539ca1bb` ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-24 23:45:58 +01:00
Bob Peterson	20b3291290	gfs2: Fix regression in freeze_go_sync Patch `541656d3a5` ("gfs2: freeze should work on read-only mounts") changed the check for glock state in function freeze_go_sync() from "gl->gl_state == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it regressed gfs2's freeze/thaw mechanism because it caused only the freezing node (which requests the glock in EX) to queue freeze work. All nodes go through this go_sync code path during the freeze to drop their SHared hold on the freeze glock, allowing the freezing node to acquire it in EXclusive mode. But all the nodes must freeze access to the file system locally, so they ALL must queue freeze work. The freeze_work calls freeze_func, which makes a request to reacquire the freeze glock in SH, effectively blocking until the thaw from the EX holder. Once thawed, the freezing node drops its EX hold on the freeze glock, then the (blocked) freeze_func reacquires the freeze glock in SH again (on all nodes, including the freezer) so all nodes go back to a thawed state. This patch changes the check back to gl_state == LM_ST_SHARED like it was prior to `541656d3a5`. Fixes: `541656d3a5` ("gfs2: freeze should work on read-only mounts") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-18 16:28:11 +01:00
Bob Peterson	4e79e3f08e	gfs2: Fix case in which ail writes are done to jdata holes Patch `b2a846dbef` ("gfs2: Ignore journal log writes for jdata holes") tried (unsuccessfully) to fix a case in which writes were done to jdata blocks, the blocks are sent to the ail list, then a punch_hole or truncate operation caused the blocks to be freed. In other words, the ail items are for jdata holes. Before `b2a846dbef`, the jdata hole caused function gfs2_block_map to return -EIO, which was eventually interpreted as an IO error to the journal, and then withdraw. This patch changes function gfs2_get_block_noalloc, which is only used for jdata writes, so it returns -ENODATA rather than -EIO, and when -ENODATA is returned to gfs2_ail1_start_one, the error is ignored. We can safely ignore it because gfs2_ail1_start_one is only called when the jdata pages have already been written and truncated, so the ail1 content no longer applies. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-12 18:55:20 +01:00
Bob Peterson	d3039c0615	Revert "gfs2: Ignore journal log writes for jdata holes" This reverts commit `b2a846dbef`. That commit changed the behavior of function gfs2_block_map to return -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is false. While that fixed the intended problem for jdata, it also broke other callers of gfs2_block_map such as some jdata block reads. Before the patch, an encountered hole would be skipped and the buffer seen as unmapped by the caller. The patch changed the behavior to return -ENODATA, which is interpreted as an error by the caller. The -ENODATA return code should be restricted to the specific case where jdata holes are encountered during ail1 writes. That will be done in a later patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-12 18:41:57 +01:00
Zhang Qilong	bc923818b1	gfs2: fix possible reference leak in gfs2_check_blk_type In the fail path of gfs2_check_blk_type, forgetting to call gfs2_glock_dq_uninit will result in rgd_gh reference leak. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-12 13:09:07 +01:00
Alexander Aring	da7d554f7c	gfs2: Wake up when sd_glock_disposal becomes zero Commit `fc0e38dae6` ("GFS2: Fix glock deallocation race") fixed a sd_glock_disposal accounting bug by adding a missing atomic_dec statement, but it failed to wake up sd_glock_wait when that decrement causes sd_glock_disposal to reach zero. As a consequence, gfs2_gl_hash_clear can now run into a 10-minute timeout instead of being woken up. Add the missing wakeup. Fixes: `fc0e38dae6` ("GFS2: Fix glock deallocation race") Cc: stable@vger.kernel.org # v2.6.39+ Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-03 14:39:11 +01:00
Andreas Gruenbacher	6bd1c7bd4e	gfs2: Don't call cancel_delayed_work_sync from within delete work function Right now, we can end up calling cancel_delayed_work_sync from within delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup -> gfs2_cancel_delete_work. When that happens, it will result in a deadlock. Instead, gfs2_inode_lookup should skip the call to gfs2_cancel_delete_work when called from delete_work_func (blktype == GFS2_BLKST_UNLINKED). Reported-by: Alexander Ahring Oder Aring <aahringo@redhat.com> Fixes: `a0e3cc65fa` ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-11-02 21:34:47 +01:00
Bob Peterson	c5c6872469	gfs2: check for live vs. read-only file system in gfs2_fitrim Before this patch, gfs2_fitrim was not properly checking for a "live" file system. If the file system had something to trim and the file system was read-only (or spectator) it would start the trim, but when it starts the transaction, gfs2_trans_begin returns -EROFS (read-only file system) and it errors out. However, if the file system was already trimmed so there's no work to do, it never called gfs2_trans_begin. That code is bypassed so it never returns the error. Instead, it returns a good return code with 0 work. All this makes for inconsistent behavior: The same fstrim command can return -EROFS in one case and 0 in another. This tripped up xfstests generic/537 which reports the error as: +fstrim with unrecovered metadata just ate your filesystem This patch adds a check for a "live" (iow, active journal, iow, RW) file system, and if not, returns the error properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:47 +01:00
Bob Peterson	7e5b926699	gfs2: don't initialize statfs_change inodes in spectator mode Before commit `97fd734ba1`, the local statfs_changeX inode was never initialized for spectator mounts. However, it still checks for spectator mounts when unmounting everything. There's no good reason to lookup the statfs_changeX files because spectators cannot perform recovery. It still, however, needs the master statfs file for statfs calls. This patch adds the check for spectator mounts to init_statfs. Fixes: `97fd734ba1` ("gfs2: lookup local statfs inodes prior to journal recovery") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:46 +01:00
Bob Peterson	4a55752ae2	gfs2: Split up gfs2_meta_sync into inode and rgrp versions Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write the address space for the metadata being synced. That's great for inodes, but resource groups all point to the same superblock-address space, sdp->sd_aspace. Each rgrp has its own range of blocks on which it should operate. That meant every time an rgrp's metadata was synced, it would write all of them instead of just the range. This patch eliminates function gfs2_meta_sync and tailors specific metasync functions for inodes and rgrps. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:46 +01:00
Bob Peterson	c4af59bd44	gfs2: init_journal's undo directive should also undo the statfs inodes Hi, Before this patch, function init_journal's "undo" directive jumped to label fail_jinode_gh. But now that it does statfs initialization, it needs to jump to fail_statfs instead. Failure to do so means that mount failures after init_journal is successful will neglect to let go of the proper statfs information, stranding the statfs_changeX inodes. This makes it impossible to free its glocks, and results in: gfs2: fsid=sda.s: G: s:EX n:2/805f f:Dqob t:EX d:UN/603701000 a:0 v:0 r:4 m:200 p:1 gfs2: fsid=sda.s: H: s:EX f:H e:0 p:1397947 [(ended)] init_journal+0x548/0x890 [gfs2] gfs2: fsid=sda.s: I: n:6/32863 t:8 f:0x00 d:0x00000201 s:24 p:0 gfs2: fsid=sda.s: G: s:SH n:5/805f f:Dqob t:SH d:UN/603712000 a:0 v:0 r:3 m:200 p:0 gfs2: fsid=sda.s: H: s:SH f:EH e:0 p:1397947 [(ended)] gfs2_inode_lookup+0x1fb/0x410 [gfs2] VFS: Busy inodes after unmount of sda. Self-destruct in 5 seconds. Have a nice day... The next time the file system is mounted, it then reuses the same glocks, which ends in a kernel NULL pointer dereference when trying to dump the reused glock. This patch makes the "undo" function of init_journal jump to fail_statfs so the statfs files are properly deconstructed upon failure. Fixes: `97fd734ba1` ("gfs2: lookup local statfs inodes prior to journal recovery") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:46 +01:00
Bob Peterson	a9dd945cce	gfs2: Add missing truncate_inode_pages_final for sd_aspace Gfs2 creates an address space for its rgrps called sd_aspace, but it never called truncate_inode_pages_final on it. This confused vfs greatly which tried to reference the address space after gfs2 had freed the superblock that contained it. This patch adds a call to truncate_inode_pages_final for sd_aspace, thus avoiding the use-after-free. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:46 +01:00
Bob Peterson	d0f17d3883	gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling return_all_reservations, but return_all_reservations still dereferences rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the call to return_all_reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-29 22:16:36 +01:00
Abhi Das	bedb0f056f	gfs2: Recover statfs info in journal head Apply the outstanding statfs changes in the journal head to the master statfs file. Zero out the local statfs file for good measure. Previously, statfs updates would be read in from the local statfs inode and synced to the master statfs inode during recovery. We now use the statfs updates in the journal head to update the master statfs inode instead of reading in from the local statfs inode. To preserve backward compatibility with kernels that can't do this, we still need to keep the local statfs inode up to date by writing changes to it. At some point in the future, we can do away with the local statfs inodes altogether and keep the statfs changes solely in the journal. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-23 15:47:38 +02:00
Abhi Das	97fd734ba1	gfs2: lookup local statfs inodes prior to journal recovery We need to lookup the master statfs inode and the local statfs inodes earlier in the mount process (in init_journal) so journal recovery can use them when it attempts to recover the statfs info. We lookup all the local statfs inodes and store them in a linked list to allow a node to recover statfs info for other nodes in the cluster. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-23 15:47:14 +02:00
Abhi Das	730926982d	gfs2: Add fields for statfs info in struct gfs2_log_header_host And read these in __get_log_header() from the log header. Also make gfs2_statfs_change_out() non-static so it can be used outside of super.c Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-20 23:16:22 +02:00
Andreas Gruenbacher	ed3adb375b	gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync Once a withdraw has occurred, ignore errors that are the consequence of the withdraw. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-20 23:16:22 +02:00
Bob Peterson	23cfb0c3d8	gfs2: Eliminate gl_vm The gfs2_glock structure has a gl_vm member, introduced in commit `7005c3e4ae` ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores the location of resource groups within their address space. This structure is in a union with iopen glock specific fields. It was introduced because at unmount time, the resource group objects were destroyed before flushing out any pending resource group glock work, and flushing out such work could require flushing / truncating the address space. Since commit `b3422cacdd` ("gfs2: Rework how rgrp buffer_heads are managed"), any pending resource group glock work is flushed out before destroying the resource group objects. So the resource group objects will now always exist in rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values where needed instead of caching them. This also eliminates the union. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-20 23:16:22 +02:00
Bob Peterson	2ffed5290b	gfs2: Only access gl_delete for iopen glocks Only initialize gl_delete for iopen glocks, but more importantly, only access it for iopen glocks in flush_delete_work: flush_delete_work is called for different types of glocks including rgrp glocks, and those use gl_vm which is in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm, which results in general memory corruption. Fixes: `a0e3cc65fa` ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-20 23:16:22 +02:00
Bob Peterson	dbffb29dac	gfs2: Fix comments to glock_hash_walk The comments before function glock_hash_walk had the wrong name and an extra parameter. This simply fixes the comments. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-20 23:16:16 +02:00
Bob Peterson	e2c6c8a797	gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated when a glock had a holder queued. It was only checked for inode glocks, although set and cleared by all glocks, and it was only used to determine whether the glock should be held for the minimum hold time before releasing. The problem is that the flag is not accurate at all. If a process holds the glock, the flag is set. When they dequeue the glock, it only cleared the flag in cases when the state actually changed. So if the state doesn't change, the flag may still be set, even when nothing is queued. This happens to iopen glocks often: the get held in SH, then the file is closed, but the glock remains in SH mode. We don't need a special flag to indicate this: we can simply tell whether the glock has any items queued to the holders queue. It's a waste of cpu time to maintain it. This patch eliminates the flag in favor of simply checking list_empty on the glock holders. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-15 17:04:53 +02:00
Bob Peterson	b2a846dbef	gfs2: Ignore journal log writes for jdata holes When flushing out its ail1 list, gfs2_write_jdata_page calls function __block_write_full_page passing in function gfs2_get_block_noalloc. But there was a problem when a process wrote to a jdata file, then truncated it or punched a hole, leaving references to the blocks within the new hole in its ail list, which are to be written to the journal log. In writing them to the journal, after calling gfs2_block_map, function gfs2_get_block_noalloc determined that the (hole-punched) block was not mapped, so it returned -EIO to generic_writepages, which passed it back to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming there was a real IO error writing to the journal. This might be a valid error when writing metadata to the journal, but for journaled data writes, it does not warrant a withdraw. This patch adds a check to function gfs2_block_map that makes an exception for journaled data writes that correspond to jdata holes: If the iomap get function returns a block type of IOMAP_HOLE, it instead returns -ENODATA which does not cause the withdraw. Other errors are returned as before. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2020-10-15 14:29:04 +02:00

1 2 3 4 5 ...

2343 Commits