linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 12:28:41 +08:00

Author	SHA1	Message	Date
Kevin Hao	edd13270fa	gfs2: Use wait_event_freezable_timeout() for freezable kthread A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: try_to_freeze(); wait_event_interruptible_timeout(); We can change it to a simple wait_event_freezable_timeout() and then eliminate a function call. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-21 22:53:46 +01:00
Kevin Hao	76e7211ca1	gfs2: Add missing set_freezable() for freezable kthread The kernel thread function gfs2_logd() and gfs2_quotad() invoke the try_to_freeze() in its loop. But all the kernel threads are no-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-21 22:53:35 +01:00
Matthew Wilcox (Oracle)	ff7a85af5a	gfs2: Remove use of error flag in journal reads Conventionally, we use the uptodate bit to signal whether a read encountered an error or not. Use folio_end_read() to set the uptodate bit on success. Also use filemap_set_wb_err() to communicate the errno instead of the more heavy-weight mapping_set_error(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 21:29:41 +01:00
Andreas Gruenbacher	e0f1f02178	gfs2: Lift withdraw check out of gfs2_ail1_empty Lift the check for the SDF_WITHDRAWING flag out of gfs2_ail1_empty() and into its callers. This is needed so that gfs2_flush_revokes() can drop the sd_log_lock spinlock before triggering a withdraw if necessary. Instead of checking for the SDF_WITHDRAWING flag, use gfs2_withdrawing(). Also, the low-level code triggering the delayed withdraw reports when there is a problem, so there is no need to report that again. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 21:29:41 +01:00
Andreas Gruenbacher	4d927b03a6	gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn This function checks whether the filesystem has been been marked to be withdrawn eventually or has been withdrawn already. Rename this function to avoid confusing code like checking for gfs2_withdrawing() when gfs2_withdrawn() has already returned true. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 21:29:40 +01:00
Andreas Gruenbacher	015af1af44	gfs2: Mark withdraws as unlikely Mark the gfs2_withdrawn(), gfs2_withdrawing(), and gfs2_withdraw_in_prog() inline functions as likely to return %false. This allows to get rid of likely() and unlikely() annotations at the call sites of those functions. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 21:29:40 +01:00
Andreas Gruenbacher	4710642807	gfs2: Minor gfs2_ail1_empty cleanup Change gfs2_ail1_empty() to return %true when the ail1 list is empty. Based on that, make the loop in empty_ail1_list() more obvious. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 21:29:40 +01:00
Al Viro	34d63b8162	gfs2: use is_subdir() ... instead of reimplementing it with misguiding name (is_ancestor(x, y) would normally imply "x is an ancestor of y", not the other way round). With races, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 12:46:52 +01:00
Al Viro	34d7224643	gfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-20 12:46:52 +01:00
Abhi Das	dd00aaeb34	gfs2: Use GL_NOBLOCK flag for non-blocking lookups Add the GL_NOBLOCK flag to the locking requests in gfs2_permission() and gfs2_drevalidate() when called with the MAY_NOT_BLOCK flag and LOOKUP_RCU flag, respectively. This will cause the locking requests to be handled without sleeping if possible. We bail out with -ECHILD if we can't grant the glock immediately. Make sure not to dget() + dput() the parent dentry in gfs2_drevalidate() in LOOKUP_RCU mode; dput() is a sleeping operation. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:24:33 +01:00
Andreas Gruenbacher	f9f229c1f7	gfs2: Add GL_NOBLOCK flag Add a GL_NOBLOCK flag for trying to take a glock without sleeping. This will be used for implementing non-blocking lookup (MAY_NOT_BLOCK in gfs2_permission, LOOKUP_RCU in gfs2_drevalidate). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:24:33 +01:00
Randy Dunlap	95d0f62525	gfs2: rgrp: fix kernel-doc warnings Fix kernel-doc warnings found when using "W=1". rgrp.c:162: warning: missing initial short description on line: * gfs2_bit_search rgrp.c:1200: warning: Function parameter or member 'gl' not described in 'gfs2_rgrp_go_instantiate' rgrp.c:1200: warning: Excess function parameter 'gh' description in 'gfs2_rgrp_go_instantiate' rgrp.c:1970: warning: missing initial short description on line: * gfs2_rgrp_used_recently Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:24:33 +01:00
Edward Adam Davis	71733b4922	gfs2: fix kernel BUG in gfs2_quota_cleanup [Syz report] kernel BUG at fs/gfs2/quota.c:1508! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 5060 Comm: syz-executor505 Not tainted 6.7.0-rc3-syzkaller-00134-g994d5c58e50e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:gfs2_quota_cleanup+0x6b5/0x6c0 fs/gfs2/quota.c:1508 Code: fe e9 cf fd ff ff 44 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 2d fe ff ff 4c 89 ef e8 b6 19 23 fe e9 20 fe ff ff e8 ec 11 c7 fd 90 <0f> 0b e8 84 9c 4f 07 0f 1f 40 00 66 0f 1f 00 55 41 57 41 56 41 54 RSP: 0018:ffffc9000409f9e0 EFLAGS: 00010293 RAX: ffffffff83c76854 RBX: 0000000000000002 RCX: ffff888026001dc0 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffffc9000409fb00 R08: ffffffff83c762b0 R09: 1ffff1100fd38015 R10: dffffc0000000000 R11: ffffed100fd38016 R12: dffffc0000000000 R13: ffff88807e9c0828 R14: ffff888014693580 R15: ffff88807e9c0000 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f16d1bd70f8 CR3: 0000000027199000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> gfs2_put_super+0x2e1/0x940 fs/gfs2/super.c:611 generic_shutdown_super+0x13a/0x2c0 fs/super.c:696 kill_block_super+0x44/0x90 fs/super.c:1667 deactivate_locked_super+0xc1/0x130 fs/super.c:484 cleanup_mnt+0x426/0x4c0 fs/namespace.c:1256 task_work_run+0x24a/0x300 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa34/0x2750 kernel/exit.c:871 do_group_exit+0x206/0x2c0 kernel/exit.c:1021 __do_sys_exit_group kernel/exit.c:1032 [inline] __se_sys_exit_group kernel/exit.c:1030 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1030 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b ... [pid 5060] fsconfig(4, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0) = 0 [pid 5060] exit_group(1) = ? ... [Analysis] When the task exits, it will execute cleanup_mnt() to recycle the mounted gfs2 file system, but it performs a system call fsconfig(4, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0) before executing the task exit operation. This will execute the following kernel path to complete the setting of SDF_JOURNAL_LIVE for sd_flags: SYSCALL_DEFINE5(fsconfig, ..)-> vfs_fsconfig_locked()-> vfs_cmd_reconfigure()-> gfs2_reconfigure()-> gfs2_make_fs_rw()-> set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); [Fix] Add SDF_NORECOVERY check in gfs2_quota_cleanup() to avoid checking SDF_JOURNAL_LIVE on the path where gfs2 is being unmounted. Reported-and-tested-by: syzbot+3b6e67ac2b646da57862@syzkaller.appspotmail.com Fixes: `f66af88e33` ("gfs2: Stop using gfs2_make_fs_ro for withdraw") Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:24:33 +01:00
Andreas Gruenbacher	1181f2d9fe	gfs2: Fix inode_go_instantiate description Fixes a "function parameter or member gl not described in inode_go_instantiate" warning. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:24:33 +01:00
Osama Muhammad	8877243bea	gfs2: Fix kernel NULL pointer dereference in gfs2_rgrp_dump Syzkaller has reported a NULL pointer dereference when accessing rgd->rd_rgl in gfs2_rgrp_dump(). This can happen when creating rgd->rd_gl fails in read_rindex_entry(). Add a NULL pointer check in gfs2_rgrp_dump() to prevent that. Reported-and-tested-by: syzbot+da0fc229cc1ff4bb2e6d@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=da0fc229cc1ff4bb2e6d Fixes: `72244b6bc7` ("gfs2: improve debug information when lvb mismatches are found") Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-12-18 14:16:36 +01:00
Nhat Pham	0a97c01cd2	list_lru: allow explicit memcg and NUMA node selection Patch series "workload-specific and memory pressure-driven zswap writeback", v8. There are currently several issues with zswap writeback: 1. There is only a single global LRU for zswap, making it impossible to perform worload-specific shrinking - an memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking: https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u But this solution leaves a lot to be desired, as we still do not have an avenue for an memcg to free up its own memory locked up in the zswap pool. 2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages). This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis. As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improved the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds. This patch (of 6): The interface of list_lru is based on the assumption that the list node and the data it represents belong to the same allocated on the correct node/memcg. While this assumption is valid for existing slab objects LRU such as dentries and inodes, it is undocumented, and rather inflexible for certain potential list_lru users (such as the upcoming zswap shrinker and the THP shrinker). It has caused us a lot of issues during our development. This patch changes list_lru interface so that the caller must explicitly specify numa node and memcg when adding and removing objects. The old list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and list_lru_del_obj(), respectively. It also extends the list_lru API with a new function, list_lru_putback, which undoes a previous list_lru_isolate call. Unlike list_lru_add, it does not increment the LRU node count (as list_lru_isolate does not decrement the node count). list_lru_putback also allows for explicit memcg and NUMA node selection. Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-12 10:57:01 -08:00
Matthew Wilcox (Oracle)	af7628d6ec	fs: convert error_remove_page to error_remove_folio There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-10 16:51:42 -08:00
Matthew Wilcox (Oracle)	78c3c11268	gfs2: convert stuffed_readpage() to stuffed_read_folio() Use folio_fill_tail() to implement the unstuffing and folio_end_read() to simultaneously mark the folio uptodate and unlock it. Unifies a couple of code paths. Link: https://lkml.kernel.org/r/20231107212643.3490372-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-12-10 16:51:36 -08:00
Matthew Wilcox (Oracle)	600f111ef5	fs: Rename mapping private members It is hard to find where mapping->private_lock, mapping->private_list and mapping->private_data are used, due to private_XXX being a relatively common name for variables and structure members in the kernel. To fit with other members of struct address_space, rename them all to have an i_ prefix. Tested with an allmodconfig build. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20231117215823.2821906-1-willy@infradead.org Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-21 11:57:10 +01:00
Alexander Aring	0c08699744	dlm: implement EXPORT_OP_ASYNC_LOCK This patch is activating the EXPORT_OP_ASYNC_LOCK export flag to signal lockd that both filesystems are able to handle async lock requests. The cluster filesystems gfs2 and ocfs2 will redirect their lock requests to DLMs plock implementation that can handle async lock requests. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2023-11-16 11:59:19 -06:00
Linus Torvalds	1a0507d878	gfs2 fixes - Don't update inode timestamps for direct writes (performance regression fix). - Skip no-op quota records instead of panicing. - Fix a RCU race in gfs2_permission(). - Various other smaller fixes and cleanups all over the place. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmVKReAUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqZ0Q//RigZFdejtWEJ2GnrZfHCRxXXjF7A uXyK8PKv6xaHmF+I4PcEr/5yBkOxPhld0sBcsqdNUlvFbEl4AqLrXoSTMB2Fbq2h ycPTSPdGTFWjFJBIpN5F1LOQ0urxuS+fKkmxruaFwNywZqbPY0fYs7ZN5K76y0cL OKBcbOg5Q39Z+fqrxB0mf7WNIBfUxAsdelGQM3VK1ptNNxebmvBdNaWIhG7vWyWm MpWpPRwm906tOrMwhmy1oCCY6RrVC2naLlROOi58iodQ9sIF4MqmKOoejbbll6BH 1XMtOPQfO+IWWhq0/AX/xBL+nxuukko6V72Qg15e4sDhbla+vYrYEs4P+MZ5UU0u dM1MxnmV3xjKBw8KrnwMZL88ExuBhm6HLzKWlshJ4wye21Y3s2OKA2zCK2u3NBc3 GwYtNLBidoe8TjVD0ZeKeQJt2nZeKRrIWqQteaTHYpLHlRzCajEpmFS+ejEu59WC 8qw8MjTR2P6m8bPYNTbvxh6Lw4cE6ZHCj71nSPwrEDeU2QPQAmMKQg0bgTM7vpU7 HKl92Av20xhM+vyb/N670KcR/4yXEkUtbOzazyj/r/XB361luCFP9D9dRiNl7tOo TU0otksrkjJ7QrnVd3XUrA2N7iQEIeCVcx0k+KTGrJh7+yVzbUKaJ5W4Tsd+TVzB h/lfHAYS4FTd8nA= =YfOA -----END PGP SIGNATURE----- Merge tag 'gfs2-v6.6-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Don't update inode timestamps for direct writes (performance regression fix) - Skip no-op quota records instead of panicing - Fix a RCU race in gfs2_permission() - Various other smaller fixes and cleanups all over the place * tag 'gfs2-v6.6-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (24 commits) gfs2: don't withdraw if init_threads() got interrupted gfs2: remove dead code in add_to_queue gfs2: Fix slab-use-after-free in gfs2_qd_dealloc gfs2: Silence "suspicious RCU usage in gfs2_permission" warning gfs2: fs: derive f_fsid from s_uuid gfs2: No longer use 'extern' in function declarations gfs2: Rename gfs2_lookup_{ simple => meta } gfs2: Convert gfs2_internal_read to folios gfs2: Convert stuffed_readpage to folios gfs2: Minor gfs2_write_jdata_batch PAGE_SIZE cleanup gfs2: Get rid of gfs2_alloc_blocks generation parameter gfs2: Add metapath_dibh helper gfs2: Clean up quota.c:print_message gfs2: Clean up gfs2_alloc_parms initializers gfs2: Two quota=account mode fixes gfs2: Stop using GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT gfs2: setattr_chown: Add missing initialization gfs2: fix an oops in gfs2_permission gfs2: ignore negated quota changes gfs2: Don't update inode timestamps for direct writes ...	2023-11-07 11:54:17 -08:00
Andreas Gruenbacher	0cdc6f44e9	gfs2: don't withdraw if init_threads() got interrupted In gfs2_fill_super(), when mounting a gfs2 filesystem is interrupted, kthread_create() can return -EINTR. When that happens, we roll back what has already been done and abort the mount. Since commit `62dd0f98a0` ("gfs2: Flag a withdraw if init_threads() fails), we are calling gfs2_withdraw_delayed() in gfs2_fill_super(); first via gfs2_make_fs_rw(), then directly. But gfs2_withdraw_delayed() only marks the filesystem as withdrawing and relies on a caller further up the stack to do the actual withdraw, which doesn't exist in the gfs2_fill_super() case. Because the filesystem is marked as withdrawing / withdrawn, function gfs2_lm_unmount() doesn't release the dlm lockspace, so when we try to mount that filesystem again, we get: gfs2: fsid=gohan:gohan0: Trying to join cluster "lock_dlm", "gohan:gohan0" gfs2: fsid=gohan:gohan0: dlm_new_lockspace error -17 Since commit `b77b4a4815` ("gfs2: Rework freeze / thaw logic"), the deadlock this gfs2_withdraw_delayed() call was supposed to work around cannot occur anymore because freeze_go_callback() won't take the sb->s_umount semaphore unconditionally anymore, so we can get rid of the gfs2_withdraw_delayed() in gfs2_fill_super() entirely. Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: stable@vger.kernel.org # v6.5+	2023-11-06 01:51:26 +01:00
Su Hui	bb25b97562	gfs2: remove dead code in add_to_queue clang static analyzer complains that value stored to 'gh' is never read. The code of this line is useless after commit `0b93bac227` ("gfs2: Remove LM_FLAG_PRIORITY flag"). Remove this code to save space. Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Juntong Deng	bdcb8aa434	gfs2: Fix slab-use-after-free in gfs2_qd_dealloc In gfs2_put_super(), whether withdrawn or not, the quota should be cleaned up by gfs2_quota_cleanup(). Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu callback) has run for all gfs2_quota_data objects, resulting in use-after-free. Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling gfs2_make_fs_ro(), there is no need to call them again. Reported-by: syzbot+29c47e9e51895928698c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Andreas Gruenbacher	074d7306a4	gfs2: Silence "suspicious RCU usage in gfs2_permission" warning Commit `0abd1557e2` added rcu_dereference() for dereferencing ip->i_gl in gfs2_permission. This now causes lockdep to complain when gfs2_permission is called in non-RCU context: WARNING: suspicious RCU usage in gfs2_permission Switch to rcu_dereference_check() and check for the MAY_NOT_BLOCK flag to shut up lockdep when we know that dereferencing ip->i_gl is safe. Fixes: `0abd1557e2` ("gfs2: fix an oops in gfs2_permission") Reported-by: syzbot+3e5130844b0c0e2b4948@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Amir Goldstein	d6fc6c9363	gfs2: fs: derive f_fsid from s_uuid gfs2 already has optional persistent uuid. Use that uuid to report f_fsid in statfs(2), same as ext2/ext4/zonefs. This allows gfs2 to be monitored by fanotify filesystem watch. for example, with inotify-tools 4.23.8.0, the following command can be used to watch changes over entire filesystem: fsnotifywatch --filesystem /mnt/gfs2 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Andreas Gruenbacher	0b2355fe91	gfs2: No longer use 'extern' in function declarations For non-static function declarations, external linkage is implied and the 'extern' keyword isn't needed. Some static checkers complain about the overuse of 'extern', so clean up all the function declarations. In addition, remove 'extern' from the definition of free_local_statfs_inodes(); it isn't needed there, either. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Andreas Gruenbacher	062fb90389	gfs2: Rename gfs2_lookup_{ simple => meta } Function gfs2_lookup_simple() is used for looking up inodes in the metadata directory tree, so rename it to gfs2_lookup_meta() to closer match its purpose. Clean the function up a little on the way. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Andreas Gruenbacher	be7f6a6b0b	gfs2: Convert gfs2_internal_read to folios Change gfs2_internal_read() to use folios. Convert sizes to size_t. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:26 +01:00
Andreas Gruenbacher	7fa4964b35	gfs2: Convert stuffed_readpage to folios Change stuffed_readpage() to take a folio instead of a page. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:25 +01:00
Andreas Gruenbacher	d6d64dac1d	gfs2: Minor gfs2_write_jdata_batch PAGE_SIZE cleanup In gfs2_write_jdata_batch(), to compute the number of blocks, compute the total size of the folio batch instead of the number of pages it contains. Not a functional change. Note that we don't currently allow mounting filesystems with a block size bigger than the page size. We could change that after converting the page cache to folios. The page cache would then only contain block-size or bigger folios, so rounding wouldn't become an issue here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:25 +01:00
Andreas Gruenbacher	4c7b3f7fb7	gfs2: Get rid of gfs2_alloc_blocks generation parameter Get rid of the generation parameter of gfs2_alloc_blocks(): we only ever set the generation of the current inode while creating it, so do so directly. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-06 01:51:25 +01:00
Linus Torvalds	8f6f76a6a2	As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - "kdump: use generic functions to simplify crashkernel reservation in arch", from Baoquan He. This is mainly cleanups and consolidation of the "crashkernel=" kernel parameter handling. - After much discussion, David Laight's "minmax: Relax type checks in min() and max()" is here. Hopefully reduces some typecasting and the use of min_t() and max_t(). - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.therad_group. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0= =On4u -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...	2023-11-02 20:53:31 -10:00
Linus Torvalds	ecae0bd517	Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series "Fixes and cleanups to compaction". - Joel Fernandes has a patchset ("Optimize mremap during mutual alignment within PMD") which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested. - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series "Do not try to access unaccepted memory" Adrian Hunter provides some fixups for the recently-added "unaccepted memory' feature. To increase the feature's checking coverage. "Plug a few gaps where RAM is exposed without checking if it is unaccepted memory". - In the series "cleanups for lockless slab shrink" Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code. - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series "use refcount+RCU method to implement lockless slab shrink". - David Hildenbrand contributes some maintenance work for the rmap code in the series "Anon rmap cleanups". - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series "mm: migrate: more folio conversion and unification". - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series "Add and use bdev_getblk()". - In the series "Use nth_page() in place of direct struct page manipulation" Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames. - In the series "mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO" has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use. - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code rationalization and folio conversions in the hugetlb code. - Yin Fengwei has improved mlock()'s handling of large folios in the series "support large folio for mlock" - In the series "Expose swapcache stat for memcg v1" Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2. - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named "MDWE without inheritance". - Kefeng Wang has provided the series "mm: convert numa balancing functions to use a folio" which does what it says. - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch makes is possible for a process to propagate KSM treatment across exec(). - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use "high bandwidth memory" in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named "memory tiering: calculate abstract distance based on ACPI HMAT" - In the series "Smart scanning mode for KSM" Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans. - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series "mm: memcg: fix tracking of pending stats updates values". - In the series "Implement IOCTL to get and optionally clear info about PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU. - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance" - a bunch of relatively minor maintenance tweaks to this code. - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series "Handle more faults under the VMA lock". Some rationalizations of the fault path became possible as a result. - In the series "mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups and folio conversions. - In the series "various improvements to the GUP interface" Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements. - Andrey Konovalov has sent along the series "kasan: assorted fixes and improvements" which does those things. - Some page allocator maintenance work from Kemeng Shi in the series "Two minor cleanups to break_down_buddy_pages". - In thes series "New selftest for mm" Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults. - In the series "Add folio_end_read" Matthew Wilcox provides cleanups and an optimization to the core pagecache code. - Nhat Pham has added memcg accounting for hugetlb memory in the series "hugetlb memcg accounting". - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series "Abstract vma_merge() and split_vma()". - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series "Fix page_owner's use of free timestamps". - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series "permit write-sealed memfd read-only shared mappings". - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series "Batch hugetlb vmemmap modification operations". - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series "Finish the create_empty_buffers() transition". - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series "mm: PCP high auto-tuning". - Roman Gushchin has contributed the patchset "mm: improve performance of accounted kernel memory allocations" which improves their performance by ~30% as measured by a micro-benchmark. - folio conversions from Kefeng Wang in the series "mm: convert page cpupid functions to folios". - Some kmemleak fixups in Liu Shixin's series "Some bugfix about kmemleak". - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series "handle memoryless nodes more appropriately". - khugepaged conversions from Vishal Moola in the series "Some khugepaged folio conversions". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y FgeUPAD1oasg6CP+INZvCj34waNxwAc= =E+Y4 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...	2023-11-02 19:38:47 -10:00
Andreas Gruenbacher	92099f0c92	gfs2: Add metapath_dibh helper Add a metapath_dibh() helper for extracting the inode's buffer head from a metapath. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	1f7b0a84c8	gfs2: Clean up quota.c:print_message Function print_message() in quota.c doesn't return a meaningful return value. Turn it into a void function and stop abusing it for setting variable error to 0 in gfs2_quota_check(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	f7e4c610cb	gfs2: Clean up gfs2_alloc_parms initializers When intializing a struct, all fields that are not explicitly mentioned are zeroed out already. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:10:00 +01:00
Andreas Gruenbacher	703df114f9	gfs2: Two quota=account mode fixes Make sure we don't skip accounting for quota changes with the quota=account mount option. Reviewed-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-11-02 20:09:38 +01:00
Linus Torvalds	14ab6d425e	vfs-6.7.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC YkBLU3lXqQ87nfu28ExFAzh10hG2jwM= =m4pB -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...	2023-10-30 09:47:13 -10:00
Linus Torvalds	7352a6765c	vfs-6.7.xattr -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppWAAKCRCRxhvAZXjc okB2AP4jjoRErJBwj245OIDJqzoj4m4UVOVd0MH2AkiSpANczwD/TToChdpusY2y qAYg1fQoGMbDVlb7Txaj9qI9ieCf9w0= =2PXg -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs xattr updates from Christian Brauner: "The 's_xattr' field of 'struct super_block' currently requires a mutable table of 'struct xattr_handler' entries (although each handler itself is const). However, no code in vfs actually modifies the tables. This changes the type of 's_xattr' to allow const tables, and modifies existing file systems to move their tables to .rodata. This is desirable because these tables contain entries with function pointers in them; moving them to .rodata makes it considerably less likely to be modified accidentally or maliciously at runtime" * tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits) const_structs.checkpatch: add xattr_handler net: move sockfs_xattr_handlers to .rodata shmem: move shmem_xattr_handlers to .rodata overlayfs: move xattr tables to .rodata xfs: move xfs_xattr_handlers to .rodata ubifs: move ubifs_xattr_handlers to .rodata squashfs: move squashfs_xattr_handlers to .rodata smb: move cifs_xattr_handlers to .rodata reiserfs: move reiserfs_xattr_handlers to .rodata orangefs: move orangefs_xattr_handlers to .rodata ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata ntfs3: move ntfs_xattr_handlers to .rodata nfs: move nfs4_xattr_handlers to .rodata kernfs: move kernfs_xattr_handlers to .rodata jfs: move jfs_xattr_handlers to .rodata jffs2: move jffs2_xattr_handlers to .rodata hfsplus: move hfsplus_xattr_handlers to .rodata hfs: move hfs_xattr_handlers to .rodata gfs2: move gfs2_xattr_handlers_max to .rodata fuse: move fuse_xattr_handlers to .rodata ...	2023-10-30 09:29:44 -10:00
Linus Torvalds	3b3f874cc1	vfs-6.7.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTpoQAAKCRCRxhvAZXjc ovFNAQDgIRjXfZ1Ku+USxsRRdqp8geJVaNc3PuMmYhOYhUenqgEAmC1m+p0y31dS P6+HlL16Mqgu0tpLCcJK9BibpDZ0Ew4= =7yD1 -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Rename and export helpers that get write access to a mount. They are used in overlayfs to get write access to the upper mount. - Print the pretty name of the root device on boot failure. This helps in scenarios where we would usually only print "unknown-block(1,2)". - Add an internal SB_I_NOUMASK flag. This is another part in the endless POSIX ACL saga in a way. When POSIX ACLs are enabled via SB_POSIXACL the vfs cannot strip the umask because if the relevant inode has POSIX ACLs set it might take the umask from there. But if the inode doesn't have any POSIX ACLs set then we apply the umask in the filesytem itself. So we end up with: (1) no SB_POSIXACL -> strip umask in vfs (2) SB_POSIXACL -> strip umask in filesystem The umask semantics associated with SB_POSIXACL allowed filesystems that don't even support POSIX ACLs at all to raise SB_POSIXACL purely to avoid umask stripping. That specifically means NFS v4 and Overlayfs. NFS v4 does it because it delegates this to the server and Overlayfs because it needs to delegate umask stripping to the upper filesystem, i.e., the filesystem used as the writable layer. This went so far that SB_POSIXACL is raised eve on kernels that don't even have POSIX ACL support at all. Stop this blatant abuse and add SB_I_NOUMASK which is an internal superblock flag that filesystems can raise to opt out of umask handling. That should really only be the two mentioned above. It's not that we want any filesystems to do this. Ideally we have all umask handling always in the vfs. - Make overlayfs use SB_I_NOUMASK too. - Now that we have SB_I_NOUMASK, stop checking for SB_POSIXACL in IS_POSIXACL() if the kernel doesn't have support for it. This is a very old patch but it's only possible to do this now with the wider cleanup that was done. - Follow-up work on fake path handling from last cycle. Citing mostly from Amir: When overlayfs was first merged, overlayfs files of regular files and directories, the ones that are installed in file table, had a "fake" path, namely, f_path is the overlayfs path and f_inode is the "real" inode on the underlying filesystem. In v6.5, we took another small step by introducing of the backing_file container and the file_real_path() helper. This change allowed vfs and filesystem code to get the "real" path of an overlayfs backing file. With this change, we were able to make fsnotify work correctly and report events on the "real" filesystem objects that were accessed via overlayfs. This method works fine, but it still leaves the vfs vulnerable to new code that is not aware of files with fake path. A recent example is commit `db1d1e8b98` ("IMA: use vfs_getattr_nosec to get the i_version"). This commit uses direct referencing to f_path in IMA code that otherwise uses file_inode() and file_dentry() to reference the filesystem objects that it is measuring. This contains work to switch things around: instead of having filesystem code opt-in to get the "real" path, have generic code opt-in for the "fake" path in the few places that it is needed. Is it far more likely that new filesystems code that does not use the file_dentry() and file_real_path() helpers will end up causing crashes or averting LSM/audit rules if we keep the "fake" path exposed by default. This change already makes file_dentry() moot, but for now we did not change this helper just added a WARN_ON() in ovl_d_real() to catch if we have made any wrong assumptions. After the dust settles on this change, we can make file_dentry() a plain accessor and we can drop the inode argument to ->d_real(). - Switch struct file to SLAB_TYPESAFE_BY_RCU. This looks like a small change but it really isn't and I would like to see everyone on their tippie toes for any possible bugs from this work. Essentially we've been doing most of what SLAB_TYPESAFE_BY_RCU for files since a very long time because of the nasty interactions between the SCM_RIGHTS file descriptor garbage collection. So extending it makes a lot of sense but it is a subtle change. There are almost no places that fiddle with file rcu semantics directly and the ones that did mess around with struct file internal under rcu have been made to stop doing that because it really was always dodgy. I forgot to put in the link tag for this change and the discussion in the commit so adding it into the merge message: https://lore.kernel.org/r/20230926162228.68666-1-mjguzik@gmail.com Cleanups: - Various smaller pipe cleanups including the removal of a spin lock that was only used to protect against writes without pipe_lock() from O_NOTIFICATION_PIPE aka watch queues. As that was never implemented remove the additional locking from pipe_write(). - Annotate struct watch_filter with the new __counted_by attribute. - Clarify do_unlinkat() cleanup so that it doesn't look like an extra iput() is done that would cause issues. - Simplify file cleanup when the file has never been opened. - Use module helper instead of open-coding it. - Predict error unlikely for stale retry. - Use WRITE_ONCE() for mount expiry field instead of just commenting that one hopes the compiler doesn't get smart. Fixes: - Fix readahead on block devices. - Fix writeback when layztime is enabled and inodes whose timestamp is the only thing that changed reside on wb->b_dirty_time. This caused excessively large zombie memory cgroup when lazytime was enabled as such inodes weren't handled fast enough. - Convert BUG_ON() to WARN_ON_ONCE() in open_last_lookups()" * tag 'vfs-6.7.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (26 commits) file, i915: fix file reference for mmap_singleton() vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookups writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs chardev: Simplify usage of try_module_get() ovl: rely on SB_I_NOUMASK fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=n fs: store real path instead of fake path in backing file f_path fs: create helper file_user_path() for user displayed mapped file path fs: get mnt_writers count for an open backing file's real path vfs: stop counting on gcc not messing with mnt_expiry_mark if not asked vfs: predict the error in retry_estale as unlikely backing file: free directly vfs: fix readahead(2) on block devices io_uring: use files_lookup_fd_locked() file: convert to SLAB_TYPESAFE_BY_RCU vfs: shave work on failed file open fs: simplify misleading code to remove ambiguity regarding ihold()/iput() watch_queue: Annotate struct watch_filter with __counted_by fs/pipe: use spinlock in pipe_read() only if there is a watch_queue fs/pipe: remove unnecessary spinlock from pipe_write() ...	2023-10-30 09:14:19 -10:00
Matthew Wilcox (Oracle)	0a88810d9b	buffer: remove folio_create_empty_buffers() With all users converted, remove the old create_empty_buffers() and rename folio_create_empty_buffers() to create_empty_buffers(). Link: https://lkml.kernel.org/r/20231016201114.1928083-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:10 -07:00
Matthew Wilcox (Oracle)	4064a0aa8a	gfs2: convert gfs2_write_buf_to_page() to use a folio Remove several folio->page->folio conversions. Link: https://lkml.kernel.org/r/20231016201114.1928083-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	c646e57372	gfs2: convert gfs2_getjdatabuf to use a folio Use the folio APIs, saving four hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20231016201114.1928083-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	0eb751791d	gfs2: convert gfs2_getbuf() to folios Remove several folio->page->folio conversions. Also use __GFP_NOFAIL instead of calling yield() and the new get_nth_bh(). Link: https://lkml.kernel.org/r/20231016201114.1928083-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Matthew Wilcox (Oracle)	81cb277ebd	gfs2: convert inode unstuffing to use a folio Use the folio APIs, removing numerous hidden calls to compound_head(). Also remove the stale comment about the page being looked up if it's NULL. Link: https://lkml.kernel.org/r/20231016201114.1928083-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-25 16:47:08 -07:00
Andreas Gruenbacher	1358706907	gfs2: Stop using GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT Header gfs2_ondisk.h defines GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT in a misguided attempt to abstract away the fact that sectors on block devices are 512 or (1 << 9) bytes in size. Stop using those definitions. I would be inclinded to remove those definitions altogether, but the gfs2 user-space tools are using them. In addition, instead of GFS2_SB(inode)->sd_sb.sb_bsize_shift, simply use inode->i_blkbits. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andrew Price <anprice@redhat.com>	2023-10-23 11:47:13 +02:00
Andreas Gruenbacher	2d8d799061	gfs2: setattr_chown: Add missing initialization Add a missing initialization of variable ap in setattr_chown(). Without, chown() may be able to bypass quotas. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-10-23 11:47:13 +02:00
Christian Brauner	0ede61d858	file: convert to SLAB_TYPESAFE_BY_RCU In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing. The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists. With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference. In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu(). In addition, it isn't possible to access or check fields in struct file without first aqcuiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either one of this is a bug. Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch. Cc: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-19 11:02:48 +02:00
Jeff Layton	580f721b6f	gfs2: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-38-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-18 14:08:21 +02:00
Wedson Almeida Filho	89491fafa8	gfs2: move gfs2_xattr_handlers_max to .rodata This makes it harder for accidental or malicious changes to gfs2_xattr_handlers_max at runtime. Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Link: https://lore.kernel.org/r/20230930050033.41174-13-wedsonaf@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-09 16:24:18 +02:00
Andreas Gruenbacher	6309727ef2	kthread: add kthread_stop_put Add a kthread_stop_put() helper that stops a thread and puts its task struct. Use it to replace the various instances of kthread_stop() followed by put_task_struct(). Remove the kthread_stop_put() macro in usbip that is similar but doesn't return the result of kthread_stop(). [agruenba@redhat.com: fix kerneldoc comment] Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com [akpm@linux-foundation.org: document kthread_stop_put()'s argument] Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-04 10:41:57 -07:00
Qi Zheng	8ee0fd9c10	gfs2: dynamically allocate the gfs2-qd shrinker Use new APIs to dynamically allocate the gfs2-qd shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-10-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-04 10:32:23 -07:00
Qi Zheng	a304c23cd6	gfs2: dynamically allocate the gfs2-glock shrinker Use new APIs to dynamically allocate the gfs2-glock shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-9-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-04 10:32:23 -07:00
Al Viro	0abd1557e2	gfs2: fix an oops in gfs2_permission In RCU mode, we might race with gfs2_evict_inode(), which zeroes ->i_gl. Freeing of the object it points to is RCU-delayed, so if we manage to fetch the pointer before it's been replaced with NULL, we are fine. Check if we'd fetched NULL and treat that as "bail out and tell the caller to get out of RCU mode". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-10-03 16:47:21 +02:00
Bob Peterson	4c6a08125f	gfs2: ignore negated quota changes When lots of quota changes are made, there may be cases in which an inode's quota information is increased and then decreased, such as when blocks are added to a file, then deleted from it. If the timing is right, function do_qc can add pending quota changes to a transaction, then later, another call to do_qc can negate those changes, resulting in a net gain of 0. The quota_change information is recorded in the qc buffer (and qd element of the inode as well). The buffer is added to the transaction by the first call to do_qc, but a subsequent call changes the value from non-zero back to zero. At that point it's too late to remove the buffer_head from the transaction. Later, when the quota sync code is called, the zero-change qd element is discovered and flagged as an assert warning. If the fs is mounted with errors=panic, the kernel will panic. This is usually seen when files are truncated and the quota changes are negated by punch_hole/truncate which uses gfs2_quota_hold and gfs2_quota_unhold rather than block allocations that use gfs2_quota_lock and gfs2_quota_unlock which automatically do quota sync. This patch solves the problem by adding a check to qd_check_sync such that net-zero quota changes already added to the transaction are no longer deemed necessary to be synced, and skipped. In this case references are taken for the qd and the slot from do_qc so those need to be put. The normal sequence of events for a normal non-zero quota change is as follows: gfs2_quota_change do_qc qd_hold slot_hold Later, when the changes are to be synced: gfs2_quota_sync qd_fish qd_check_sync gets qd ref via lockref_get_not_dead do_sync do_qc(QC_SYNC) qd_put lockref_put_or_lock qd_unlock qd_put lockref_put_or_lock In the net-zero change case, we add a check to qd_check_sync so it puts the qd and slot references acquired in gfs2_quota_change and skip the unneeded sync. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-10-03 16:24:00 +02:00
Andreas Gruenbacher	089f4eb003	gfs2: Don't update inode timestamps for direct writes During direct reads and writes, the caller is holding the inode glock in deferred mode, which doesn't allow metadata updates. However, a previous change caused callers to update the inode modification time before carrying out direct writes, which caused the inode glock to be converted to exclusive mode for the timestamp update, only to be immediately converted back to deferred mode for the direct write. This locks out other direct readers and writers and wreaks havoc on performance. Fix that by reverting to not updating the inode modification time for direct writes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-22 13:42:33 +02:00
Andreas Gruenbacher	21d9067efc	gfs2: Get rid of the gfs2_glock_is_held_* helpers Those helpers don't add any clarity and are easy to use wrong. Spell them out to make more obvious what's happening. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-22 13:42:19 +02:00
Andreas Gruenbacher	b4bf3d5c37	gfs2: Remove unused gfs2_extent_length argument The limit argument of gfs2_extent_length() is unused. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 23:13:21 +02:00
Andreas Gruenbacher	bbacb395ac	gfs2: Remove freeze_go_demote_ok Before commit `b77b4a4815` ("gfs2: Rework freeze / thaw logic"), the freeze glock was kept around in the glock cache in shared mode without being actively held while a filesystem is in thawed state. In that state, memory pressure could have eventually evicted the freeze glock, and the freeze_go_demote_ok callback was needed to prevent that from happening. With the freeze / thaw rework, the freeze glock is now always actively held in shared mode while a filesystem is thawed, and the freeze_go_demote_ok hack is no longer needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 23:13:21 +02:00
Andreas Gruenbacher	18c1db313e	gfs2: Simplify function gfs2_upgrade_iopen_glock When trying to upgrade the iopen glock, gfs2_upgrade_iopen_glock() tries to take the iopen glock with the LM_FLAG_TRY_1CB flag set before trying to take it without the LM_FLAG_TRY or LM_FLAG_TRY_1CB flags set. Both calls will cause the lock contention bast callbacks to be invoked throughout the cluster, and we really don't need them to be invoked twice. Remove the first LM_FLAG_TRY_1CB call to eliminate unnecessary dlm traffic. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 23:13:20 +02:00
Bob Peterson	fb95d53608	gfs2: Fix quota=quiet oversight Patch `eef46ab713` introduced a new gfs2 quota=quiet mount option. Checks for the new option were added to quota.c, but a check in gfs2_quota_lock_check() was overlooked. This patch adds the missing check. Fixes: `eef46ab713` ("gfs2: Introduce new quota=quiet mount option") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 16:26:24 +02:00
Bob Peterson	62862485a4	gfs2: fix glock shrinker ref issues Before this patch, function gfs2_scan_glock_lru would only try to free glocks that had a reference count of 0. But if the reference count ever got to 0, the glock should have already been freed. Shrinker function gfs2_dispose_glock_lru checks whether glocks on the LRU are demote_ok, and if so, tries to demote them. But that's only possible if the reference count is at least 1. This patch changes gfs2_scan_glock_lru so it will try to demote and/or dispose of glocks that have a reference count of 1 and which are either demotable, or are already unlocked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 16:00:50 +02:00
Andreas Gruenbacher	52954b7509	gfs2: Fix another freeze/thaw hang On a thawed filesystem, the freeze glock is held in shared mode. In order to initiate a cluster-wide freeze, the node initiating the freeze drops the freeze glock and grabs it in exclusive mode. The other nodes recognize this as contention on the freeze glock; function freeze_go_callback is invoked. This indicates to them that they must freeze the filesystem locally, drop the freeze glock, and then re-acquire it in shared mode before being able to unfreeze the filesystem locally. While a node is trying to re-acquire the freeze glock in shared mode, additional contention can occur. In that case, the node must behave in the same way as above. Unfortunately, freeze_go_callback() contains a check that causes it to bail out when the freeze glock isn't held in shared mode. Fix that to allow the glock to be unlocked or held in shared mode. In addition, update a reference to trylock_super() which has been renamed to super_trylock_shared() in the meantime. Fixes: `b77b4a4815` ("gfs2: Rework freeze / thaw logic") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-18 16:00:49 +02:00
Linus Torvalds	65d6e954e3	gfs2 fixes - Fix a glock state (non-)transition bug when a dlm request times out and is canceled, and we have locking requests that can now be granted immediately. - Various fixes and cleanups in how the logd and quotad daemons are woken up and terminated. - Fix several bugs in the quota data reference counting and shrinking. Free quota data objects synchronously in put_super() instead of letting call_rcu() run wild. - Make sure not to deallocate quota data during a withdraw; rather, defer quota data deallocation to put_super(). Withdraws can happen in contexts in which callers on the stack are holding quota data references. - Many minor quota fixes and cleanups by Bob. - Update the the mailing list address for gfs2 and dlm. (It's the same list for both and we are moving it to gfs2@lists.linux.dev.) - Various other minor cleanups. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmT3T7UUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqxhw/+IWp+4cY4htNkTRG7xkheTeQ+5whG NU40mp7Hj+WY5GoHqsk676q1pBkVAq5mNN1kt9S/oC6lLHrdu1HLpdIkgFow2nAC nDqlEqx9/Da9Q4H/+K442usO90S4o1MmOXOE9xcGcvJLqK4FLDOVDXbUWa43OXrK 4HxgjgGSNPF4itD+U0o/V4f19jQ+4cNwmo6hGLyhsYillaUHiIXJezQlH7XycByM qGJqlG6odJ56wE38NG8Bt9Lj+91PsLLqO1TJxSYyzpf0h9QGQ2ySvu6/esTXwxWO XRuT4db7yjyAUhJoJMw+YU77xWQTz0/jriIDS7VqzvR1ns3GPaWdtb31TdUTBG4H IvBA8ep3oxHtcYFoPzCLBXgOIDej6KjAgS3RSv51yLeaZRHFUBc21fTSXbcDTIUs gkusZlRNQ9ANdBCVyf8hZxbE54HnaBJ8dKMZtynOXJEHs0EtGV8YKCNIpkFLxOvE vZkKcRsmVtuZ9fVhX1iH7dYmcsCMPI8RNo47k7hHk2EG8dU+eqyPSbi4QCmErNFf DlqX+fIuiDtOkbmWcrb2qdphn6j6bMLhDaOMJGIBOmgOPi+AU9dNAfmtu1cG4u1b 2TFyUISayiwqHJQgguzvDed15fxexYdgoLB7O9t9TMbCENxisguNa5TsAN6ZkiLQ 0hY6h80xSR2kCPU= =EonA -----END PGP SIGNATURE----- Merge tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix a glock state (non-)transition bug when a dlm request times out and is canceled, and we have locking requests that can now be granted immediately - Various fixes and cleanups in how the logd and quotad daemons are woken up and terminated - Fix several bugs in the quota data reference counting and shrinking. Free quota data objects synchronously in put_super() instead of letting call_rcu() run wild - Make sure not to deallocate quota data during a withdraw; rather, defer quota data deallocation to put_super(). Withdraws can happen in contexts in which callers on the stack are holding quota data references - Many minor quota fixes and cleanups by Bob - Update the the mailing list address for gfs2 and dlm. (It's the same list for both and we are moving it to gfs2@lists.linux.dev) - Various other minor cleanups * tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (51 commits) MAINTAINERS: Update dlm mailing list MAINTAINERS: Update gfs2 mailing list gfs2: change qd_slot_count to qd_slot_ref gfs2: check for no eligible quota changes gfs2: Remove useless assignment gfs2: simplify slot_get gfs2: Simplify qd2offset gfs2: introduce qd_bh_get_or_undo gfs2: Remove quota allocation info from quota file gfs2: use constant for array size gfs2: Set qd_sync_gen in do_sync gfs2: Remove useless err set gfs2: Small gfs2_quota_lock cleanup gfs2: move qdsb_put and reduce redundancy gfs2: improvements to sysfs status gfs2: Don't try to sync non-changes gfs2: Simplify function need_sync gfs2: remove unneeded pg_oflow variable gfs2: remove unneeded variable done gfs2: pass sdp to gfs2_write_buf_to_page ...	2023-09-05 13:00:28 -07:00
Bob Peterson	0e072cac92	gfs2: change qd_slot_count to qd_slot_ref Variable qd_slot_count is a reference count, not a count of slots. This patch renames it to qd_slot_ref to make that more clear. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	06aa6fd31a	gfs2: check for no eligible quota changes Before this patch, function gfs2_quota_sync would always allocate a page full of memory and increment its quota sync generation number. This happened even when the system was completely idle or if no blocks were allocated or quota changes made. This patch adds function qd_changed to determine if any changes have been made that qualify for a quota sync. If not, it avoids the memory allocation and bumping the generation number, along with all the additional work it would do. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	36a740916a	gfs2: Remove useless assignment This assignment is unnecessary because if error was not already 0, it would have branched to an error label already. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	9ab7b78a13	gfs2: simplify slot_get Simplify function slot_get and get rid of the goto that jumps into the middle of an else branch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	8f190c97a4	gfs2: Simplify qd2offset This is a minor cleanup of function qd2offset. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	7dbc6ae60d	gfs2: introduce qd_bh_get_or_undo This patch is an attempt to force some consistency in quota sync processing. Two functions (qd_fish and gfs2_quota_unlock) called qd_check_sync, after which they both called bh_get, and if that failed, they took the same steps to undo the actions of qd_check_sync. This patch introduces a new function, qd_bh_get_or_undo, which performs the same steps, reducing code redundancy. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	3932e50730	gfs2: Remove quota allocation info from quota file Function do_sync called gfs2_qa_get and put for quota allocation data. But the inode in question is the system master quota file, which is never subject to quotas. Therefore, a qa structure should be unnecessary and if anything accesses it, it's probably a bug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	c9ff3c65c2	gfs2: use constant for array size Function gfs2_quota_unlock declared an array of 4 qd elements. We have a constant for that, we should be using it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	fce17cb0ee	gfs2: Set qd_sync_gen in do_sync Func do_sync was called in two places: gfs2_quota_unlock and gfs2_quota_sync. In gfs2_quota_sync it updated qd_sync_gen to the latest superblock sync gen, if do_sync was successful. In gfs2_quota_unlock it didn't update the value. That can only lead to extra work, for example, if the value is synced by gfs2_quota_unlock but still has the old value. This patch moves the setting of qd_sync_gen inside do_sync so we are guaranteed consistency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	dec64ae37b	gfs2: Remove useless err set Function gfs2_adjust_quota set variable err, then set it again to a different value. This patch removes the redundant set. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	f511e60a55	gfs2: Small gfs2_quota_lock cleanup No need to set error = 0 since it's set further down. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	a4d22e337d	gfs2: move qdsb_put and reduce redundancy This patch looks more invasive than it is. It simply moves function qdsb_put before qd_unlock, then changes qd_unlock to call it rather than open coding it. Again, this reduces redundancy. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	03d468f1c0	gfs2: improvements to sysfs status This patch adds some new fields to the gfs2 status file in sysfs to aid in debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	9f494e9bdc	gfs2: Don't try to sync non-changes Function need_sync is supposed to determine if a qd element needs to be synced. If the "change" (qd_change) is zero, it does not need to be synced because there's literally no change in the value. Before this patch need_sync returned false if value < 0. That should be <= 0. This patch changes the check to <=. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	2a4f651167	gfs2: Simplify function need_sync This patch simplifies function need_sync by eliminating a variable in favor of just returning the appropriate value as soon as we know it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:18 +02:00
Bob Peterson	e34c16c9c6	gfs2: remove unneeded pg_oflow variable Function gfs2_write_disk_quota checks if its write overflows onto another page, and if so, does a second write. Before this patch it kept two variables for this, but only one is needed. This patch simplifies it by eliminating pg_oflow. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	f0418e4b56	gfs2: remove unneeded variable done Function gfs2_write_buf_to_page uses variable done to exit its loop, but it's unnecessary if we just code an infinite loop and exit when we need. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	d96dad2715	gfs2: pass sdp to gfs2_write_buf_to_page This patch passes the superblock pointer to gfs2_write_buf_to_page so it becomes more apparent it's dealing with the system quota file. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	adfd2b5e4f	gfs2: pass sdp in to gfs2_write_disk_quota Like the previous patch, we now pass the superblock pointer to function gfs2_write_disk_quota. This makes the code more understandable, since it only operates on the quota inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	ee1768e467	gfs2: Pass sdp to gfs2_adjust_quota Before this change function gfs2_adjust_quota's first parameter was an gfs2_inode pointer. But it always pointed to the quota inode. Here we switch that to pass the superblock pointer, sdp, so it is easier to read the code and understand that it's only dealing with the quota inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	768963ab07	gfs2: remove dead code for quota writes Since patch `845802b112` function gfs2_write_buf_to_page checks if the target inode is jdata or ordered. This function only operates on the system quota file, which is always jdata, so the check for jdata is useless. This patch removes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Bob Peterson	eef46ab713	gfs2: Introduce new quota=quiet mount option This patch adds a new mount option quota=quiet which is the same as quota=on but it suppresses gfs2 quota error messages. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	267d1a011e	gfs2: Add device name to gfs2_logd and gfs2_quotad Add the device name to the names of the gfs2_logd and gfs2_quotad kernel threads to allow for easier identification. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	ab8eecf5d0	gfs2: Rename "freeze_workqueue" to "gfs2_freeze" Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	5c0dc371a2	gfs2: Rename "gfs_recovery" workqueue to "gfs2_recovery" Rename the "gfs_recovery" workqueue to "gfs2_recovery", and gfs_recovery_wq to gfs2_recovery_wq. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	e3da6be3d7	gfs2: Fix withdraw race Function gfs2_withdraw() tries to synchronize concurrent callers by atomically setting the SDF_WITHDRAWN flag in the first caller, setting the SDF_WITHDRAW_IN_PROG flag to indicate that a withdraw is in progress, performing the actual withdraw, and clearing the SDF_WITHDRAW_IN_PROG flag when done. All other callers wait for the SDF_WITHDRAW_IN_PROG flag to be cleared before returning. This leaves a small window in which callers can find the SDF_WITHDRAWN flag set before the SDF_WITHDRAW_IN_PROG flag has been set, causing them to return prematurely, before the withdraw has been completed. Fix that by setting the SDF_WITHDRAWN and SDF_WITHDRAW_IN_PROG flags atomically. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	fe0690f0a6	gfs2: Sanitize kthread stopping Immediately stop the logd and quotad kernel threads when a filesystem withdraw is detected: those threads aren't doing anything useful after a withdraw. (Depends on the extra logd and quotad task struct references held since commit 7a109f383fa3 ("gfs2: Fix asynchronous thread destruction").) In addition, check for kthread_should_stop() in the wait condition in gfs2_quotad() to stop immediately when kthread_stop() is called. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	e4a8b5481c	gfs2: Switch to wait_event in gfs2_quotad In gfs2_quotad(), switch from an open-coded wait loop to wait_event_interruptible_timeout(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	fe4f7940d2	gfs2: Fix asynchronous thread destruction The kernel threads are currently stopped and destroyed synchronously by gfs2_make_fs_ro() and gfs2_put_super(), and asynchronously by signal_our_withdraw(), with no synchronization, so the synchronous and asynchronous contexts can race with each other. First, when creating the kernel threads, take an extra task struct reference so that the task struct won't go away immediately when they terminate. This allows those kthreads to terminate immediately when they're done rather than hanging around as zombies until they are reaped by kthread_stop(). When kthread_stop() is called on a terminated kthread, it will return immediately. Second, in signal_our_withdraw(), once the SDF_JOURNAL_LIVE flag has been cleared, wake up the logd and quotad wait queues instead of stopping the logd and quotad kthreads. The kthreads are then expected to terminate automatically within short time, but if they cannot, they will not block the withdraw. For example, if a user process and one of the kthread decide to withdraw at the same time, only one of them will perform the actual withdraw and the other will wait for it to be done. If the kthread ends up being the one to wait, the withdrawing user process won't be able to stop it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	f66af88e33	gfs2: Stop using gfs2_make_fs_ro for withdraw [ 81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0 [ 81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 [ 81.392343][ T5532] Call Trace: [ 81.395654][ T5532] <TASK> [ 81.398603][ T5532] dump_stack_lvl+0x1b1/0x290 [ 81.418421][ T5532] gfs2_assert_warn_i+0x19a/0x2e0 [ 81.423480][ T5532] gfs2_quota_cleanup+0x4c6/0x6b0 [ 81.428611][ T5532] gfs2_make_fs_ro+0x517/0x610 [ 81.457802][ T5532] gfs2_withdraw+0x609/0x1540 [ 81.481452][ T5532] gfs2_inode_refresh+0xb2d/0xf60 [ 81.506658][ T5532] gfs2_instantiate+0x15e/0x220 [ 81.511504][ T5532] gfs2_glock_wait+0x1d9/0x2a0 [ 81.516352][ T5532] do_sync+0x485/0xc80 [ 81.554943][ T5532] gfs2_quota_sync+0x3da/0x8b0 [ 81.559738][ T5532] gfs2_sync_fs+0x49/0xb0 [ 81.564063][ T5532] sync_filesystem+0xe8/0x220 [ 81.568740][ T5532] generic_shutdown_super+0x6b/0x310 [ 81.574112][ T5532] kill_block_super+0x79/0xd0 [ 81.578779][ T5532] deactivate_locked_super+0xa7/0xf0 [ 81.584064][ T5532] cleanup_mnt+0x494/0x520 [ 81.593753][ T5532] task_work_run+0x243/0x300 [ 81.608837][ T5532] exit_to_user_mode_loop+0x124/0x150 [ 81.614232][ T5532] exit_to_user_mode_prepare+0xb2/0x140 [ 81.619820][ T5532] syscall_exit_to_user_mode+0x26/0x60 [ 81.625287][ T5532] do_syscall_64+0x49/0xb0 [ 81.629710][ T5532] entry_SYSCALL_64_after_hwframe+0x63/0xcd In this backtrace, gfs2_quota_sync() takes quota data references and then calls do_sync(). Function do_sync() encounters filesystem corruption and withdraws the filesystem, which (among other things) calls gfs2_quota_cleanup(). Function gfs2_quota_cleanup() wrongly assumes that nobody is holding any quota data references anymore, and destroys all quota data objects. When gfs2_quota_sync() then resumes and dereferences the quota data objects it is holding, those objects are no longer there. Function gfs2_quota_cleanup() deals with resource deallocation and can easily be delayed until gfs2_put_super() in the case of a filesystem withdraw. In fact, most of the other work gfs2_make_fs_ro() does is unnecessary during a withdraw as well, so change signal_our_withdraw() to skip gfs2_make_fs_ro() and perform the necessary steps directly instead. Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches. Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	a475c5dd16	gfs2: Free quota data objects synchronously In gfs2_quota_cleanup(), wait for the quota data objects to be freed before returning. Otherwise, there is no guarantee that the quota data objects will be gone when their kmem cache is destroyed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	bb73ae8ff3	gfs2: Fix initial quota data refcount Fix the refcount of quota data objects created directly by gfs2_quota_init(): those are placed into the in-memory quota "database" for eventual syncing to the main quota file, but they are not actively held and should thus have an initial refcount of 0. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:17 +02:00
Andreas Gruenbacher	fae2e73a55	gfs2: No more quota complaints after withdraw Once a filesystem is withdrawn, don't complain about quota changes that can't be synced to the main quota file anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:16 +02:00
Andreas Gruenbacher	faada74a90	gfs2: Factor out duplicate quota data disposal code Rename gfs2_qd_dispose() to gfs2_qd_dispose_list(). Move some code duplicated in gfs2_qd_dispose_list() and gfs2_quota_cleanup() into a new gfs2_qd_dispose() function. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:16 +02:00
Andreas Gruenbacher	961fe3422e	gfs2: Use gfs2_qd_dispose in gfs2_quota_cleanup Change gfs2_quota_cleanup() to move the quota data objects to dispose of on a dispose list and call gfs2_qd_dispose() on that list, like gfs2_qd_shrink_scan() does, instead of disposing of the quota data objects directly. This may look a bit pointless by itself, but it will make more sense in combination with a fix that follows. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2023-09-05 15:58:16 +02:00

1 2 3 4 5 ...

2917 Commits