2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 03:33:59 +08:00
linux-next/arch/powerpc/mm
Michael Ellerman 87e65ad7bd powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
When we enabled STRICT_KERNEL_RWX we received some reports of boot
failures when using the Hash MMU and running under phyp. The crashes
are intermittent, and often exhibit as a completely unresponsive
system, or possibly an oops.

One example, which was caught in xmon:

  [   14.068327][    T1] devtmpfs: mounted
  [   14.069302][    T1] Freeing unused kernel memory: 5568K
  [   14.142060][  T347] BUG: Unable to handle kernel instruction fetch
  [   14.142063][    T1] Run /sbin/init as init process
  [   14.142074][  T347] Faulting instruction address: 0xc000000000004400
  cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
      pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
      lr: c0000000001862d4: update_rq_clock+0x44/0x110
      sp: c00000000c747880
     msr: 8000000040001031
    current = 0xc00000000c60d380
    paca    = 0xc00000001ec9de80   irqmask: 0x03   irq_happened: 0x01
      pid   = 347, comm = kworker/2:1
  ...
  enter ? for help
  [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
  [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
  [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
  [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
  [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
  [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
  [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
  [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
  [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
  [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
  [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c

This shows that CPU 2, which was idle, woke up and then appears to
randomly take an instruction fault on a completely valid area of
kernel text.

The cause turns out to be the call to hash__mark_rodata_ro(), late in
boot. Due to the way we layout text and rodata, that function actually
changes the permissions for all of text and rodata to read-only plus
execute.

To do the permission change we use a hypervisor call, H_PROTECT. On
phyp that appears to be implemented by briefly removing the mapping of
the kernel text, before putting it back with the updated permissions.
If any other CPU is executing during that window, it will see spurious
faults on the kernel text and/or data, leading to crashes.

To fix it we use stop machine to collect all other CPUs, and then have
them drop into real mode (MMU off), while we change the mapping. That
way they are unaffected by the mapping temporarily disappearing.

We don't see this bug on KVM because KVM always use VPM=1, where
faults are directed to the hypervisor, and the fault will be
serialised vs the h_protect() by HPTE_V_HVLOCK.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
2021-04-08 21:17:43 +10:00
..
book3s32 powerpc/32s: Move KUEP locking/unlocking in C 2021-03-29 13:22:10 +11:00
book3s64 powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR 2021-04-08 21:17:43 +10:00
kasan powerpc updates for 5.10 2020-10-16 12:21:15 -07:00
nohash powerpc: Enable KFENCE for PPC32 2021-03-24 14:09:30 +11:00
ptdump powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr() 2021-02-09 01:10:15 +11:00
copro_fault.c mm: clean up the last pieces of page fault accountings 2020-08-12 10:58:04 -07:00
dma-noncoherent.c dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h> 2020-10-06 07:07:06 +02:00
drmem.c pseries/drmem: don't cache node id in drmem_lmb struct 2020-09-02 11:00:21 +10:00
fault.c powerpc/mm: Remove unneeded #ifdef CONFIG_PPC_MEM_KEYS 2021-03-29 13:22:15 +11:00
hugetlbpage.c powerpc/mm: Enable compound page check for both THP and HugeTLB 2021-02-11 23:35:06 +11:00
init_32.c powerpc: Enable KFENCE for PPC32 2021-03-24 14:09:30 +11:00
init_64.c Merge branch 'fixes' into next 2020-09-14 22:57:18 +10:00
init-common.c powerpc: Inline setup_kup() 2020-12-15 13:13:49 +11:00
ioremap_32.c powerpc/ioremap: warn on early use of ioremap() 2019-11-19 19:38:38 +11:00
ioremap_64.c powerpc: remove __ioremap_at and __iounmap_at 2020-06-02 10:59:10 -07:00
ioremap.c mm/memremap_pages: Introduce memremap_compat_align() 2020-02-20 16:58:55 -08:00
maccess.c powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed() 2020-12-08 10:22:09 +11:00
Makefile The new preemtible kmap_local() implementation: 2020-12-14 18:35:53 -08:00
mem.c powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used 2021-03-24 14:09:29 +11:00
mmap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mmu_context.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mmu_decl.h powerpc: Enable KFENCE for PPC32 2021-03-24 14:09:30 +11:00
numa.c powerpc/numa: Fix a regression on memoryless node 0 2020-11-27 22:06:21 +11:00
pgtable_32.c powerpc/32s: Don't hash_preload() kernel text 2020-12-04 01:01:31 +11:00
pgtable_64.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
pgtable-frag.c powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings 2020-07-20 22:57:56 +10:00
pgtable.c powerpc/mm: Add PG_dcache_clean to indicate dcache clean state 2021-02-11 23:35:06 +11:00
slice.c powerpc: Replace _ALIGN_UP() by ALIGN() 2020-05-11 23:15:15 +10:00