linux/fs/afs
Yang Shi 16380f52b7 mm: page_ref: remove folio_try_get_rcu()
commit fa2690af57 upstream.

The below bug was reported on a non-SMP kernel:

[  275.267158][ T4335] ------------[ cut here ]------------
[  275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
[  275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
[  275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
[  275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[  275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202
[  275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000
[  275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034
[  275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006
[  275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136
[  275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008
[  275.276790][ T4335] FS:  00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000
[  275.277570][ T4335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0
[  275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  275.280201][ T4335] Call Trace:
[  275.280499][ T4335]  <TASK>
[ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
[ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153)
[ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174)
[ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212)
[ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264)
[ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.285278][ T4335] try_grab_folio (mm/gup.c:148)
[ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1))
[ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188)
[ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825)
[ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1))
[ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209)
[ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204)
[ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
[ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106)
[ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350)
[ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350)
[ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
[ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573)
[ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360
[ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10
[ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573)
[ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0
[ 275.293384][ T4335] ? hlock_class (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228)
[ 275.293780][ T4335] ? __pfx_process_vm_rw_core+0x10/0x10
[ 275.294350][ T4335] process_vm_rw (mm/process_vm_access.c:284)
[ 275.294748][ T4335] ? __pfx_process_vm_rw (mm/process_vm_access.c:259)
[ 275.295197][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
[ 275.295634][ T4335] __x64_sys_process_vm_readv (mm/process_vm_access.c:291)
[ 275.296139][ T4335] ? syscall_enter_from_user_mode (kernel/entry/common.c:94 kernel/entry/common.c:112)
[ 275.296642][ T4335] do_syscall_64 (arch/x86/entry/common.c:51 (discriminator 1) arch/x86/entry/common.c:82 (discriminator 1))
[ 275.297032][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
[ 275.297470][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
[ 275.297988][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.298389][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
[ 275.298906][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.299304][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.299703][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.300115][ T4335] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)

This BUG is the VM_BUG_ON(!in_atomic() && !irqs_disabled()) assertion in
folio_ref_try_add_rcu() for non-SMP kernel.

The process_vm_readv() calls GUP to pin the THP. An optimization for
pinning THP instroduced by commit 57edfcfd34 ("mm/gup: accelerate thp
gup even for "pages != NULL"") calls try_grab_folio() to pin the THP,
but try_grab_folio() is supposed to be called in atomic context for
non-SMP kernel, for example, irq disabled or preemption disabled, due to
the optimization introduced by commit e286781d5f ("mm: speculative
page references").

The commit efa7df3e3b ("mm: align larger anonymous mappings on THP
boundaries") is not actually the root cause although it was bisected to.
It just makes the problem exposed more likely.

The follow up discussion suggested the optimization for non-SMP kernel
may be out-dated and not worth it anymore [1].  So removing the
optimization to silence the BUG.

However calling try_grab_folio() in GUP slow path actually is
unnecessary, so the following patch will clean this up.

[1] https://lore.kernel.org/linux-mm/821cf1d6-92b9-4ac4-bacc-d8f2364ac14f@paulmck-laptop/

Link: https://lkml.kernel.org/r/20240625205350.1777481-1-yang@os.amperecomputing.com
Fixes: 57edfcfd34 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25 09:50:56 +02:00
..
addr_list.c afs: Use kfree_rcu() instead of casting kfree() to rcu_callback_t 2020-03-13 10:47:33 -07:00
afs_cm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
afs_fs.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
afs_vl.h afs: Implement client support for the YFSVL.GetCellName RPC op 2020-06-04 15:37:57 +01:00
afs.h rxrpc: Fix timeout of a call that hasn't yet been granted a channel 2023-05-01 07:43:19 +01:00
callback.c afs: fix the usage of read_seqbegin_or_lock() in afs_lookup_volume_rcu() 2024-02-05 20:14:16 +00:00
cell.c afs: Fix overwriting of result of DNS query 2024-01-01 12:42:34 +00:00
cmservice.c rxrpc: Tidy up abort generation infrastructure 2023-01-06 09:43:32 +00:00
dir_edit.c afs: fix the afs_dir_get_folio return value 2023-05-06 10:10:08 -07:00
dir_silly.c netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context 2022-06-09 13:55:00 -07:00
dir.c afs: Revert "afs: Hide silly-rename files from userspace" 2024-03-26 18:20:03 -04:00
dynroot.c afs: Fix dynamic root lookup DNS check 2024-01-01 12:42:33 +00:00
file.c splice: Use filemap_splice_read() instead of generic_file_splice_read() 2023-05-24 08:42:17 -06:00
flock.c fs: remove locks_inode 2023-01-11 06:52:43 -05:00
fs_operation.c netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context 2022-06-09 13:55:00 -07:00
fs_probe.c afs: Fix lost servers_outstanding count 2022-12-22 11:40:35 +00:00
fsclient.c afs: Use the operation issue time instead of the reply time for callbacks 2022-09-01 11:44:13 +01:00
inode.c fs: pass the request_mask to generic_fillattr 2023-08-09 08:56:36 +02:00
internal.h afs: Fix use-after-free due to get/remove race in volume tree 2024-01-01 12:42:34 +00:00
Kconfig afs: Convert afs to use the new fscache API 2022-01-07 13:44:47 +00:00
main.c afs: Convert afs to use the new fscache API 2022-01-07 13:44:47 +00:00
Makefile afs: Convert afs to use the new fscache API 2022-01-07 13:44:47 +00:00
misc.c afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked 2022-09-06 21:33:01 -04:00
mntpt.c afs: Don't cross .backup mountpoint from backup volume 2024-06-16 13:47:30 +02:00
proc.c afs: Use refcount_t rather than atomic_t 2022-08-02 18:10:11 +01:00
protocol_afs.h afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server 2021-09-13 09:14:21 +01:00
protocol_uae.h afs: Add support for the UAE error table 2019-06-28 18:37:53 +01:00
protocol_yfs.h afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server 2021-09-13 09:14:21 +01:00
rotate.c afs: Adjust ACK interpretation to try and cope with NAT 2022-05-22 21:03:02 +01:00
rxrpc.c afs: Fix refcount underflow from error handling race 2023-12-20 17:01:43 +01:00
security.c fs: port ->permission() to pass mnt_idmap 2023-01-19 09:24:28 +01:00
server_list.c afs: Fix afs_server_list to be cleaned up with RCU 2023-12-03 07:33:02 +01:00
server.c afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*() 2024-02-05 20:14:16 +00:00
super.c afs: Fix file locking on R/O volumes to operate in local mode 2023-12-03 07:33:05 +01:00
vl_alias.c afs: Add tracing for cell refcount and active user count 2020-10-16 14:39:21 +01:00
vl_list.c afs: Use refcount_t rather than atomic_t 2022-08-02 18:10:11 +01:00
vl_probe.c afs: Fix vlserver probe RTT handling 2023-06-16 14:43:41 -07:00
vl_rotate.c afs: Return ENOENT if no cell DNS record can be found 2023-12-03 07:33:05 +01:00
vlclient.c afs: Fix fall-through warnings for Clang 2021-05-25 07:30:34 -10:00
volume.c afs: Increase buffer size in afs_update_volume_status() 2024-03-01 13:35:07 +01:00
write.c mm: page_ref: remove folio_try_get_rcu() 2024-07-25 09:50:56 +02:00
xattr.c fs: port xattr to mnt_idmap 2023-01-19 09:24:28 +01:00
xdr_fs.h afs: Fix directory entry size calculation 2021-01-04 12:25:19 +00:00
yfsclient.c afs: Use the operation issue time instead of the reply time for callbacks 2022-09-01 11:44:13 +01:00