mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check

pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
some architectures.  We really should check for
pmd_present()/pud_present() first.

This should explain the report we got on ppc64 (which has
CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));

Likely we had a PMD migration entry for which pmd_leaf() did not trigger. 
We raced with restoring the PMD migration entry, and suddenly saw a
pmd_leaf().  In this case, pte_offset_map_lock() saved us from more
trouble, because it rechecks the PMD value, but we would not have
processed the migration entry -- which is not too bad because the only
user of FW_MIGRATION is KSM for unsharing, and KSM only applies to small
folios.

Further, we shouldn't re-read the PMD/PUD value for our warning, the
primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.

As a side note, we are currently not implementing FW_MIGRATION support for
PUD migration entries, which likely should exist due to hugetlb.  Add a
TODO so this won't fall through the cracks if more FW_MIGRATION users get
added.

Was able to write a quick reproducer and verify that the issue no longer triggers with this fix.

https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c

Without this fix after a couple of seconds in a VM with 2 NUMA nodes:

[   54.333753] ------------[ cut here ]------------
[   54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815 folio_walk_start+0x48f/0x6e0
[   54.336455] Modules linked in: ...
[   54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not tainted 6.12.0-rc2+ #81
[   54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[   54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
[   54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74 da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
[   54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
[   54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX: ffff994380000000
[   54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI: 0000000000000000
[   54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09: 0000000000000001
[   54.357820] R10: 0000000000000200 R11: 0000000000000001 R12: ffffb7e4c430bd10
[   54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15: ffff994604a89b00
[   54.360564] FS:  00007f004ae006c0(0000) GS:ffff9947f7400000(0000) knlGS:0000000000000000
[   54.362111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4: 0000000000770ef0
[   54.364615] PKRU: 55555554
[   54.365153] Call Trace:
[   54.365646]  <TASK>
[   54.366073]  ? __warn.cold+0xb7/0x14d
[   54.366796]  ? folio_walk_start+0x48f/0x6e0
[   54.367628]  ? report_bug+0xff/0x140
[   54.368324]  ? handle_bug+0x58/0x90
[   54.369019]  ? exc_invalid_op+0x17/0x70
[   54.369771]  ? asm_exc_invalid_op+0x1a/0x20
[   54.370606]  ? folio_walk_start+0x48f/0x6e0
[   54.371415]  ? folio_walk_start+0x9e/0x6e0
[   54.372227]  do_pages_move+0x1c5/0x680
[   54.372972]  kernel_move_pages+0x1a1/0x2b0
[   54.373804]  __x64_sys_move_pages+0x25/0x30

Link: https://lkml.kernel.org/r/20241015111236.1290921-1-david@redhat.com
Fixes: aa39ca6940 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
David Hildenbrand 2024-10-15 13:12:36 +02:00 committed by Andrew Morton
parent 8198375843
commit 7c18d48110

View File

@ -744,7 +744,8 @@ struct folio *folio_walk_start(struct folio_walk *fw,
pud = pudp_get(pudp); pud = pudp_get(pudp);
if (pud_none(pud)) if (pud_none(pud))
goto not_found; goto not_found;
if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pud_leaf(pud)) { if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
(!pud_present(pud) || pud_leaf(pud))) {
ptl = pud_lock(vma->vm_mm, pudp); ptl = pud_lock(vma->vm_mm, pudp);
pud = pudp_get(pudp); pud = pudp_get(pudp);
@ -753,6 +754,10 @@ struct folio *folio_walk_start(struct folio_walk *fw,
fw->pudp = pudp; fw->pudp = pudp;
fw->pud = pud; fw->pud = pud;
/*
* TODO: FW_MIGRATION support for PUD migration entries
* once there are relevant users.
*/
if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) { if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
spin_unlock(ptl); spin_unlock(ptl);
goto not_found; goto not_found;
@ -769,12 +774,13 @@ struct folio *folio_walk_start(struct folio_walk *fw,
} }
pmd_table: pmd_table:
VM_WARN_ON_ONCE(pud_leaf(*pudp)); VM_WARN_ON_ONCE(!pud_present(pud) || pud_leaf(pud));
pmdp = pmd_offset(pudp, addr); pmdp = pmd_offset(pudp, addr);
pmd = pmdp_get_lockless(pmdp); pmd = pmdp_get_lockless(pmdp);
if (pmd_none(pmd)) if (pmd_none(pmd))
goto not_found; goto not_found;
if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pmd_leaf(pmd)) { if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
(!pmd_present(pmd) || pmd_leaf(pmd))) {
ptl = pmd_lock(vma->vm_mm, pmdp); ptl = pmd_lock(vma->vm_mm, pmdp);
pmd = pmdp_get(pmdp); pmd = pmdp_get(pmdp);
@ -786,7 +792,7 @@ pmd_table:
if (pmd_none(pmd)) { if (pmd_none(pmd)) {
spin_unlock(ptl); spin_unlock(ptl);
goto not_found; goto not_found;
} else if (!pmd_leaf(pmd)) { } else if (pmd_present(pmd) && !pmd_leaf(pmd)) {
spin_unlock(ptl); spin_unlock(ptl);
goto pte_table; goto pte_table;
} else if (pmd_present(pmd)) { } else if (pmd_present(pmd)) {
@ -812,7 +818,7 @@ pmd_table:
} }
pte_table: pte_table:
VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp))); VM_WARN_ON_ONCE(!pmd_present(pmd) || pmd_leaf(pmd));
ptep = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl); ptep = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
if (!ptep) if (!ptep)
goto not_found; goto not_found;