linux/arch/x86/mm
Dave Hansen 54e3d9434e x86/mm: Remove "INVPCID single" feature tracking
From: Dave Hansen <dave.hansen@linux.intel.com>

tl;dr: Replace a synthetic X86_FEATURE with a hardware X86_FEATURE
       and a check of existing per-cpu state.

== Background ==

There are three features in play here:
 1. Good old Page Table Isolation (PTI)
 2. Process Context IDentifiers (PCIDs) which allow entries from
    multiple address spaces to be in the TLB at once.
 3. Support for the "Invalidate PCID" (INVPCID) instruction,
    specifically the "individual address" mode (aka. mode 0).

When all *three* of these are in place, INVPCID can and should be used
to flush out individual addresses in the PTI user address space.
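
For reference, the individual-address form takes a 16-byte descriptor
in memory (PCID in bits 0-11, linear address in bits 64-127) and
invalidates exactly one translation for that PCID.  A minimal,
illustrative sketch is below; the function name is made up, and the
kernel wraps this operation behind its own helper
(invpcid_flush_one()):

    #include <linux/types.h>        /* u64 */

    /*
     * Illustrative only: INVPCID "individual address" mode (type 0)
     * invalidates the translation for one linear address tagged with
     * one PCID.  Descriptor layout: PCID in bits 0-11, bits 12-63
     * reserved (must be zero), linear address in bits 64-127.  This is
     * the form that can #GP when CR4.PCIDE has not been set up yet.
     */
    static inline void invpcid_single_address(unsigned long pcid,
                                              unsigned long addr)
    {
            struct { u64 pcid_and_rsvd; u64 addr; } desc = { pcid, addr };
            unsigned long type = 0;         /* 0 == individual address */

            /* AT&T operand order: descriptor (memory), then type (register). */
            asm volatile("invpcid %[desc], %[type]"
                         : : [desc] "m" (desc), [type] "r" (type) : "memory");
    }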

But there's a wrinkle or two: First, this INVPCID mode is dependent on
CR4.PCIDE.  Even if X86_FEATURE_INVPCID==1, the instruction may #GP
without setting up CR4.  Second, TLB flushing is done very early, even
before CR4 is fully set up.  That means even if PTI, PCID and INVPCID
are supported, there is *still* a window where INVPCID can #GP.

== Problem ==

The current code seems to work, but mostly by chance, and there are a
bunch of ways it can go wrong.  It's also somewhat hard to follow
since X86_FEATURE_INVPCID_SINGLE is set far away from its lone user.

== Solution ==

Make "INVPCID single" more robust and easier to follow by placing all
the logic in one place.  Remove X86_FEATURE_INVPCID_SINGLE.

Make two explicit checks before using INVPCID:
 1. Check that the system supports INVPCID itself (boot_cpu_has())
 2. Then check the CR4.PCIDE shadow to ensure that the CPU
    can safely use INVPCID for individual address invalidation.

The CR4 check *always* works and is not affected by any X86_FEATURE_*
twiddling or inconsistencies between the boot and secondary CPUs.
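
Put together, the guarded flush can look roughly like the sketch below
(illustrative shape and function name only; the real change lives in
arch/x86/mm/tlb.c and relies on existing kernel-internal helpers such
as invpcid_flush_one(), user_pcid() and invalidate_user_asid()):

    static void flush_one_user_addr(unsigned long addr, u16 asid)
    {
            /*
             * 1) The hardware must support INVPCID at all, and
             * 2) CR4.PCIDE must already be set on this CPU.  The per-cpu
             *    CR4 shadow (cpu_tlbstate.cr4) is read instead of any
             *    X86_FEATURE bit, so the check is also safe during early
             *    boot, before CR4 is fully set up.
             */
            if (boot_cpu_has(X86_FEATURE_INVPCID) &&
                (this_cpu_read(cpu_tlbstate.cr4) & X86_CR4_PCIDE))
                    invpcid_flush_one(user_pcid(asid), addr);
            else
                    /* Mark the user ASID stale so it is flushed later. */
                    invalidate_user_asid(asid);
    }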

This has been tested on non-Meltdown hardware by using pti=on and
then flipping PCID and INVPCID support with qemu.
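
For example (illustrative qemu invocations, not part of the patch), PTI
can be forced on the kernel command line while qemu masks the relevant
CPUID bits:

    # baseline: PCID and INVPCID both advertised
    qemu-system-x86_64 -enable-kvm -cpu host \
        -kernel bzImage -append "pti=on console=ttyS0" ...

    # hide INVPCID only
    qemu-system-x86_64 -enable-kvm -cpu host,-invpcid \
        -kernel bzImage -append "pti=on console=ttyS0" ...

    # hide both PCID and INVPCID
    qemu-system-x86_64 -enable-kvm -cpu host,-pcid,-invpcid \
        -kernel bzImage -append "pti=on console=ttyS0" ...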

== Aside ==

How does this code even work today?  By chance, I think.  First, PTI
is initialized around the same time that the boot CPU sets
CR4.PCIDE=1.  There are currently no TLB invalidations when PTI=1 but
CR4.PCIDE=0.  That means that the X86_FEATURE_INVPCID_SINGLE check is
never even reached.

this_cpu_has() is also very nasty to use in this context because the
boot CPU reaches here before cpu_data(0) has been initialized.  It
happens to work for X86_FEATURE_INVPCID_SINGLE since it's a
software-defined feature, but it would fall over for a
hardware-derived X86_FEATURE.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230718170630.7922E235%40davehans-spike.ostc.intel.com
2023-08-03 10:34:05 -07:00
pat - Address -Wmissing-prototype warnings 2023-06-26 16:43:54 -07:00
amdtopology.c x86/mm: Replace nodes_weight() with nodes_empty() where appropriate 2022-04-10 22:35:38 +02:00
cpu_entry_area.c x86/mm: Do not shuffle CPU entry areas without KASLR 2023-03-22 10:42:47 -07:00
debug_pagetables.c x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
dump_pagetables.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
extable.c x86-64: mm: clarify the 'positive addresses' user address rules 2023-05-03 10:37:22 -07:00
fault.c mm: introduce new 'lock_mm_and_find_vma()' page fault helper 2023-06-24 14:12:54 -07:00
highmem_32.c x86/mm: Include asm/numa.h for set_highmem_pages_init() 2023-05-18 11:56:18 -07:00
hugetlbpage.c arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging 2022-11-08 15:57:25 -08:00
ident_map.c x86/mm/ident_map: Check for errors from ident_pud_init() 2020-10-28 14:48:30 +01:00
init_32.c x86/mm: Remove Xen-PV leftovers from init_32.c 2023-06-09 11:00:21 +02:00
init_64.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
init.c x86/mm: Remove "INVPCID single" feature tracking 2023-08-03 10:34:05 -07:00
iomap_32.c io-mapping: Cleanup atomic iomap 2020-11-06 23:14:58 +01:00
ioremap.c x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM 2023-03-26 23:42:40 +02:00
kasan_init_64.c x86/kasan: Populate shadow for shared chunk of the CPU entry area 2022-12-15 10:37:28 -08:00
kaslr.c x86/mm: Avoid using set_pgd() outside of real PGD pages 2023-06-16 11:46:42 -07:00
kmmio.c x86/mm/kmmio: Remove redundant preempt_disable() 2022-12-12 10:54:48 -05:00
kmsan_shadow.c x86: kmsan: handle CPU entry area 2022-10-03 14:03:26 -07:00
maccess.c x86: Share definition of __is_canonical_address() 2022-02-02 13:11:42 +01:00
Makefile x86: kmsan: handle CPU entry area 2022-10-03 14:03:26 -07:00
mem_encrypt_amd.c - Fix a race window where load_unaligned_zeropad() could cause 2023-06-26 16:32:47 -07:00
mem_encrypt_boot.S x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros 2022-12-15 10:37:27 -08:00
mem_encrypt_identity.c - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
mem_encrypt.c virtio: replace arch_has_restricted_virtio_memory_access() 2022-06-06 08:22:01 +02:00
mm_internal.h x86/mm: thread pgprot_t through init_memory_mapping() 2020-04-10 15:36:21 -07:00
mmap.c x86/mm/mmap: Fix -Wmissing-prototypes warnings 2020-04-22 20:19:48 +02:00
mmio-mod.c x86: Replace cpumask_weight() with cpumask_empty() where appropriate 2022-04-10 22:35:38 +02:00
numa_32.c x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit 2020-05-28 18:34:30 +02:00
numa_64.c
numa_emulation.c x86/mm: Replace nodes_weight() with nodes_empty() where appropriate 2022-04-10 22:35:38 +02:00
numa_internal.h
numa.c x86/numa: Use cpumask_available instead of hardcoded NULL check 2022-08-03 11:44:57 +02:00
pf_in.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
pf_in.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
pgprot.c x86/mm: move protection_map[] inside the platform 2022-07-17 17:14:38 -07:00
pgtable_32.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
pgtable.c x86/mm: Only check uniform after calling mtrr_type_lookup() 2023-06-01 15:04:33 +02:00
physaddr.c mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions 2019-12-10 10:12:55 +01:00
physaddr.h
pkeys.c x86/pkeys: Clarify PKRU_AD_KEY macro 2022-06-07 16:06:33 -07:00
pti.c x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros 2022-12-15 10:37:27 -08:00
srat.c
testmmiotrace.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
tlb.c x86/mm: Remove "INVPCID single" feature tracking 2023-08-03 10:34:05 -07:00