x86/mm: Check shadow stack page fault errors

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow stack mappings can fault during normal,
valid operation, just like regular accesses to regular mappings. Shadow
stacks need some of the same features, such as delayed allocation, swap
and copy-on-write, and the kernel needs to use faults to implement those
features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate
a fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs
to create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in
userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
for shadow stack accesses, even if the access was a shadow stack read.
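
As a rough illustration of that rule, the mapping from error code bits to
fault flags can be sketched in standalone C. The X86_PF_* values match the
bit numbers documented in this patch, while the SKETCH_FLAG_* names and
values and the sketch_fault_flags() helper are assumptions made up for this
example, not the kernel's definitions.

```c
#include <assert.h>

/* Error code bits, numbered as in enum x86_pf_error_code. */
enum sketch_pf_bits {
	X86_PF_WRITE = 1 << 1,
	X86_PF_INSTR = 1 << 4,
	X86_PF_SHSTK = 1 << 6,
};

static_assert(X86_PF_SHSTK == 0x40, "shadow stack faults use bit 6");

/* Hypothetical stand-ins for the MM fault flags; values are arbitrary. */
enum sketch_fault_flag {
	SKETCH_FLAG_WRITE       = 1 << 0,
	SKETCH_FLAG_INSTRUCTION = 1 << 1,
};

/* Translate a page fault error code into the flags passed to the MM. */
unsigned int sketch_fault_flags(unsigned long error_code)
{
	unsigned int flags = 0;

	/* Shadow stack reads and writes are both reported as writes. */
	if (error_code & X86_PF_SHSTK)
		flags |= SKETCH_FLAG_WRITE;
	if (error_code & X86_PF_WRITE)
		flags |= SKETCH_FLAG_WRITE;
	if (error_code & X86_PF_INSTR)
		flags |= SKETCH_FLAG_INSTRUCTION;

	return flags;
}
```

In this sketch a shadow stack read (X86_PF_SHSTK set, X86_PF_WRITE clear)
yields the same flags as an ordinary write fault.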

To make this clearer, consider the following example.
If a process has a shadow stack, and forks, the shadow stack PTEs will
become read-only due to COW. If the CPU in one process performs a shadow
stack read access to the shadow stack, for example executing a RET and
causing the CPU to read the shadow stack copy of the return address, then
in order for the fault to be resolved the PTE will need to be set with
shadow stack permissions. But then the memory would be changeable from
userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger
COW, otherwise the shared page would be changeable from both processes.
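
The fork scenario can be modeled with a toy example. Everything here
(struct toy_page, struct toy_pte, toy_resolve_shstk_fault()) is
hypothetical and greatly simplified; it only shows why resolving the fault
without breaking COW would leave a shared page writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a physical page and how many processes map it. */
struct toy_page {
	int sharers;
};

/* Toy model: one process's mapping of a shadow stack page. */
struct toy_pte {
	struct toy_page *page;
	bool shstk_writable;	/* shadow stack PTEs are always writable */
};

/*
 * Resolve a shadow stack fault. 'fresh' is a spare page used for the
 * private copy when COW is broken. Copying the contents is omitted.
 */
void toy_resolve_shstk_fault(struct toy_pte *pte, struct toy_page *fresh,
			     bool treat_as_write)
{
	assert(pte && pte->page && fresh);

	if (treat_as_write && pte->page->sharers > 1) {
		/* Break COW: give this process its own page. */
		pte->page->sharers--;
		fresh->sharers = 1;
		pte->page = fresh;
	}

	/* The only way to make the access succeed is a shadow stack PTE. */
	pte->shstk_writable = true;
}

/* True if this PTE allows shadow stack writes to a still-shared page. */
bool toy_shared_and_writable(const struct toy_pte *pte)
{
	return pte->shstk_writable && pte->page->sharers > 1;
}
```

With treat_as_write set to false for a page that has two sharers,
toy_shared_and_writable() reports the problem described above; with it set
to true, the faulting process ends up with a private page.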

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows or when a shadow stack access targets a non-shadow-stack
mapping. Generate errors for these invalid shadow stack accesses as well.
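
The permission rules can be summarized as a small truth-table style
sketch. struct vma_model and model_access_error() are hypothetical names
for this example; the real check operates on error code bits and
vma->vm_flags and covers more cases than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the VMA properties that matter here. */
struct vma_model {
	bool shadow_stack;	/* models VM_SHADOW_STACK */
	bool writable;		/* models VM_WRITE */
};

/* Returns true if the access must be rejected as an error. */
bool model_access_error(bool shstk_access, bool write_access,
			const struct vma_model *vma)
{
	assert(vma);

	/* Shadow stack accesses are only valid on writable shadow stack VMAs. */
	if (shstk_access)
		return !vma->shadow_stack || !vma->writable;

	/* Ordinary writes may never target a shadow stack VMA. */
	if (write_access)
		return vma->shadow_stack || !vma->writable;

	/* Other checks (e.g. read permission) are left out of this sketch. */
	return false;
}
```

For example, a shadow stack access to an ordinary writable mapping is
rejected, and so is a regular store to a shadow stack mapping.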

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-16-rick.p.edgecombe%40intel.com
Rick Edgecombe 2023-06-12 17:10:41 -07:00
parent 54007f8182
commit fd5439e0c9
2 changed files with 24 additions and 0 deletions


@@ -11,6 +11,7 @@
  * bit 3 == 1: use of reserved bit detected
  * bit 4 == 1: fault was an instruction fetch
  * bit 5 == 1: protection keys block access
+ * bit 6 == 1: shadow stack access fault
  * bit 15 == 1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD = 1 << 3,
 	X86_PF_INSTR = 1 << 4,
 	X86_PF_PK = 1 << 5,
+	X86_PF_SHSTK = 1 << 6,
 	X86_PF_SGX = 1 << 15,
 };


@@ -1112,8 +1112,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;

+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1305,6 +1319,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

+	/*
+	 * Read-only permissions can not be expressed in shadow stack PTEs.
+	 * Treat all shadow stack accesses as WRITE faults. This ensures
+	 * that the MM will prepare everything (e.g., break COW) such that
+	 * maybe_mkwrite() can create a proper shadow stack PTE.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)