License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0                                                 11139
and resulted in the first patch in this series.
If that file was under a */uapi/* path, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           270
GPL-2.0+ WITH Linux-syscall-note                          169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
LGPL-2.1+ WITH Linux-syscall-note                          15
GPL-1.0+ WITH Linux-syscall-note                           14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
LGPL-2.0+ WITH Linux-syscall-note                           4
LGPL-2.1 WITH Linux-syscall-note                            3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based in part on an older version of FOSSology, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
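
The distinction between header and source files mentioned above can be illustrated with a short sketch. This is not the script that Thomas and Greg used; it is a minimal, hypothetical C illustration (the names has_suffix() and spdx_tag_line() are made up here) of the kernel convention that .c files carry the tag in a "//" comment while .h files carry it in a "/* */" comment. Other file types are deliberately left unhandled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if `path` ends with the suffix `ext` (for example ".h"). */
static int has_suffix(const char *path, const char *ext)
{
	size_t plen = strlen(path), elen = strlen(ext);

	return plen >= elen && strcmp(path + plen - elen, ext) == 0;
}

/*
 * Format the SPDX tag line for `path` into `buf`.
 * Returns snprintf()'s result, or -1 if this sketch does not handle
 * the file type.
 */
static int spdx_tag_line(const char *path, const char *license,
			 char *buf, size_t len)
{
	assert(path && license && buf);

	/* C source files use a C++-style comment for the tag. */
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s",
				license);

	/* Header files use a classic block comment for the tag. */
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */",
				license);

	return -1;
}
```

Calling spdx_tag_line() with a path ending in ".c" and the license "GPL-2.0" yields the same first line that this file carries.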
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00

// SPDX-License-Identifier: GPL-2.0
/*
 * Core of Xen paravirt_ops implementation.
 *
 * This file contains the xen_paravirt_ops structure itself, and the
 * implementations for:
 * - privileged instructions
 * - interrupt flags
 * - segment operations
 * - booting and setup
 *
 * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007
 */

#include <linux/cpu.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/preempt.h>
#include <linux/hardirq.h>
#include <linux/percpu.h>
#include <linux/delay.h>
#include <linux/start_kernel.h>
#include <linux/sched.h>
#include <linux/kprobes.h>
#include <linux/kstrtox.h>
#include <linux/memblock.h>
#include <linux/export.h>
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/pci.h>
#include <linux/gfp.h>
#include <linux/edd.h>
#include <linux/reboot.h>
#include <linux/virtio_anchor.h>
#include <linux/stackprotector.h>

#include <xen/xen.h>
#include <xen/events.h>
#include <xen/interface/xen.h>
#include <xen/interface/version.h>
#include <xen/interface/physdev.h>
#include <xen/interface/vcpu.h>
#include <xen/interface/memory.h>
#include <xen/interface/nmi.h>
#include <xen/interface/xen-mca.h>
#include <xen/features.h>
#include <xen/page.h>
#include <xen/hvc-console.h>
#include <xen/acpi.h>

#include <asm/paravirt.h>
#include <asm/apic.h>
#include <asm/page.h>
#include <asm/xen/pci.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/cpuid.h>
#include <asm/fixmap.h>
#include <asm/processor.h>
#include <asm/proto.h>
#include <asm/msr-index.h>
#include <asm/traps.h>
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/reboot.h>
#include <asm/hypervisor.h>
#include <asm/mach_traps.h>
#include <asm/mwait.h>
#include <asm/pci_x86.h>
#include <asm/cpu.h>
#ifdef CONFIG_X86_IOPL_IOPERM
#include <asm/io_bitmap.h>
#endif

#ifdef CONFIG_ACPI
#include <linux/acpi.h>
#include <asm/acpi.h>
#include <acpi/pdc_intel.h>
#include <acpi/processor.h>
#include <xen/interface/platform.h>
#endif

#include "xen-ops.h"
#include "mmu.h"
#include "smp.h"
#include "multicalls.h"
#include "pmu.h"

#include "../kernel/cpu/cpu.h" /* get_cpu_cap() */

void *xen_initial_gdt;

static int xen_cpu_up_prepare_pv(unsigned int cpu);
static int xen_cpu_dead_pv(unsigned int cpu);

struct tls_descs {
	struct desc_struct desc[3];
};

/*
 * Updating the 3 TLS descriptors in the GDT on every task switch is
 * surprisingly expensive so we avoid updating them if they haven't
 * changed. Since Xen writes different descriptors than the one
 * passed in the update_descriptor hypercall we keep shadow copies to
 * compare against.
 */
static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);

static __read_mostly bool xen_msr_safe = IS_ENABLED(CONFIG_XEN_PV_MSR_SAFE);

static int __init parse_xen_msr_safe(char *str)
{
	if (str)
		return kstrtobool(str, &xen_msr_safe);
	return -EINVAL;
}
early_param("xen_msr_safe", parse_xen_msr_safe);

static void __init xen_pv_init_platform(void)
{
	/* PV guests can't operate virtio devices without grants. */
	if (IS_ENABLED(CONFIG_XEN_VIRTIO))
		virtio_set_mem_acc_cb(xen_virtio_restricted_mem_acc);

	populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));

	set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
	HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);

	/* xen clock uses per-cpu vcpu_info, need to init it for boot cpu */
	xen_vcpu_info_reset(0);

	/* pvclock is in shared info area */
	xen_init_time_ops();
}

static void __init xen_pv_guest_late_init(void)
{
#ifndef CONFIG_SMP
	/* Setup shared vcpu info for non-smp configurations */
	xen_setup_vcpu_info_placement();
#endif
}

static __read_mostly unsigned int cpuid_leaf5_ecx_val;
static __read_mostly unsigned int cpuid_leaf5_edx_val;

static void xen_cpuid(unsigned int *ax, unsigned int *bx,
		      unsigned int *cx, unsigned int *dx)
{
	unsigned maskebx = ~0;

	/*
	 * Mask out inconvenient features, to try and disable as many
	 * unsupported kernel subsystems as possible.
	 */
	switch (*ax) {
	case CPUID_MWAIT_LEAF:
		/* Synthesize the values.. */
		*ax = 0;
		*bx = 0;
		*cx = cpuid_leaf5_ecx_val;
		*dx = cpuid_leaf5_edx_val;
		return;

	case 0xb:
		/* Suppress extended topology stuff */
		maskebx = 0;
		break;
	}

	asm(XEN_EMULATE_PREFIX "cpuid"
		: "=a" (*ax),
		  "=b" (*bx),
		  "=c" (*cx),
		  "=d" (*dx)
		: "0" (*ax), "2" (*cx));

	*bx &= maskebx;
}

static bool __init xen_check_mwait(void)
{
#ifdef CONFIG_ACPI
	struct xen_platform_op op = {
		.cmd			= XENPF_set_processor_pminfo,
		.u.set_pminfo.id	= -1,
		.u.set_pminfo.type	= XEN_PM_PDC,
	};
	uint32_t buf[3];
	unsigned int ax, bx, cx, dx;
	unsigned int mwait_mask;

	/* We need to determine whether it is OK to expose the MWAIT
	 * capability to the kernel to harvest deeper than C3 states from ACPI
	 * _CST using the processor_harvest_xen.c module. For this to work, we
	 * need to gather the MWAIT_LEAF values (which the cstate.c code
	 * checks against). The hypervisor won't expose the MWAIT flag because
	 * it would break backwards compatibility; so we will find out directly
	 * from the hardware and hypercall.
	 */
	if (!xen_initial_domain())
		return false;

	/*
	 * When running under platform earlier than Xen4.2, do not expose
	 * mwait, to avoid the risk of loading native acpi pad driver
	 */
	if (!xen_running_on_version_or_later(4, 2))
		return false;

	ax = 1;
	cx = 0;

	native_cpuid(&ax, &bx, &cx, &dx);

	mwait_mask = (1 << (X86_FEATURE_EST % 32)) |
		     (1 << (X86_FEATURE_MWAIT % 32));

	if ((cx & mwait_mask) != mwait_mask)
		return false;

	/* We need to emulate the MWAIT_LEAF and for that we need both
	 * ecx and edx. The hypercall provides only partial information.
	 */

	ax = CPUID_MWAIT_LEAF;
	bx = 0;
	cx = 0;
	dx = 0;

	native_cpuid(&ax, &bx, &cx, &dx);

	/* Ask the Hypervisor whether to clear ACPI_PDC_C_C2C3_FFH. If so,
	 * don't expose MWAIT_LEAF and let ACPI pick the IOPORT version of C3.
	 */
	buf[0] = ACPI_PDC_REVISION_ID;
	buf[1] = 1;
	buf[2] = (ACPI_PDC_C_CAPABILITY_SMP | ACPI_PDC_EST_CAPABILITY_SWSMP);

	set_xen_guest_handle(op.u.set_pminfo.pdc, buf);

	if ((HYPERVISOR_platform_op(&op) == 0) &&
	    (buf[2] & (ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH))) {
		cpuid_leaf5_ecx_val = cx;
		cpuid_leaf5_edx_val = dx;
	}
	return true;
#else
	return false;
#endif
}

static bool __init xen_check_xsave(void)
{
	unsigned int cx, xsave_mask;

	cx = cpuid_ecx(1);

	xsave_mask = (1 << (X86_FEATURE_XSAVE % 32)) |
		     (1 << (X86_FEATURE_OSXSAVE % 32));

	/* Xen will set CR4.OSXSAVE if supported and not disabled by force */
	return (cx & xsave_mask) == xsave_mask;
}

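/*
 * Illustrative sketch, not part of the kernel source: the "% 32" idiom
 * used by xen_check_mwait() and xen_check_xsave() turns a feature
 * number, encoded as word * 32 + bit, into a bit mask for one 32-bit
 * CPUID register. The user-space program below assumes that XSAVE and
 * OSXSAVE are bits 26 and 27 of word 4 (CPUID.1:ECX); the FEATURE_*
 * macros and helper names are hypothetical stand-ins for this example.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings: word 4 is CPUID.1:ECX, bits 26 and 27. */
#define FEATURE_XSAVE	(4 * 32 + 26)
#define FEATURE_OSXSAVE	(4 * 32 + 27)

/* Mask for `feature` inside CPUID.1:ECX; the feature must live in word 4. */
static uint32_t ecx1_bit(unsigned int feature)
{
	assert(feature / 32 == 4);
	return UINT32_C(1) << (feature % 32);
}

/* True only if every bit of `mask` is set in `reg`. */
static int has_all(uint32_t reg, uint32_t mask)
{
	return (reg & mask) == mask;
}

/* Mirrors the check above: both XSAVE and OSXSAVE must be reported. */
static int xsave_usable(uint32_t cpuid1_ecx)
{
	uint32_t mask = ecx1_bit(FEATURE_XSAVE) | ecx1_bit(FEATURE_OSXSAVE);

	return has_all(cpuid1_ecx, mask);
}
```
/*
 * Comparing against the full mask, rather than testing for a non-zero
 * result, is what rejects the case where only one of the two bits is
 * set.
 */
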
static void __init xen_init_capabilities(void)
{
	setup_force_cpu_cap(X86_FEATURE_XENPV);
	setup_clear_cpu_cap(X86_FEATURE_DCA);
	setup_clear_cpu_cap(X86_FEATURE_APERFMPERF);
	setup_clear_cpu_cap(X86_FEATURE_MTRR);
	setup_clear_cpu_cap(X86_FEATURE_ACC);
	setup_clear_cpu_cap(X86_FEATURE_X2APIC);
	setup_clear_cpu_cap(X86_FEATURE_SME);
	setup_clear_cpu_cap(X86_FEATURE_LKGS);

	/*
	 * Xen PV would need some work to support PCID: CR3 handling as well
	 * as xen_flush_tlb_others() would need updating.
	 */
	setup_clear_cpu_cap(X86_FEATURE_PCID);

	if (!xen_initial_domain())
		setup_clear_cpu_cap(X86_FEATURE_ACPI);

	if (xen_check_mwait())
		setup_force_cpu_cap(X86_FEATURE_MWAIT);
	else
		setup_clear_cpu_cap(X86_FEATURE_MWAIT);

	if (!xen_check_xsave()) {
		setup_clear_cpu_cap(X86_FEATURE_XSAVE);
		setup_clear_cpu_cap(X86_FEATURE_OSXSAVE);
	}
}

static noinstr void xen_set_debugreg(int reg, unsigned long val)
{
	HYPERVISOR_set_debugreg(reg, val);
}

static noinstr unsigned long xen_get_debugreg(int reg)
{
	return HYPERVISOR_get_debugreg(reg);
}

static void xen_end_context_switch(struct task_struct *next)
{
	xen_mc_flush();
	paravirt_end_context_switch(next);
}

static unsigned long xen_store_tr(void)
{
	return 0;
}

/*
 * Set the page permissions for a particular virtual address. If the
 * address is a vmalloc mapping (or other non-linear mapping), then
 * find the linear mapping of the page and also set its protections to
 * match.
 */
static void set_aliased_prot(void *v, pgprot_t prot)
{
	int level;
	pte_t *ptep;
	pte_t pte;
	unsigned long pfn;
	unsigned char dummy;
	void *va;

	ptep = lookup_address((unsigned long)v, &level);
	BUG_ON(ptep == NULL);

	pfn = pte_pfn(*ptep);
	pte = pfn_pte(pfn, prot);

	/*
	 * Careful: update_va_mapping() will fail if the virtual address
	 * we're poking isn't populated in the page tables. We don't
	 * need to worry about the direct map (that's always in the page
	 * tables), but we need to be careful about vmap space. In
	 * particular, the top level page table can lazily propagate
	 * entries between processes, so if we've switched mms since we
	 * vmapped the target in the first place, we might not have the
	 * top-level page table entry populated.
	 *
	 * We disable preemption because we want the same mm active when
	 * we probe the target and when we issue the hypercall. We'll
	 * have the same nominal mm, but if we're a kernel thread, lazy
	 * mm dropping could change our pgd.
	 *
	 * Out of an abundance of caution, this uses __get_user() to fault
	 * in the target address just in case there's some obscure case
	 * in which the target address isn't readable.
	 */

	preempt_disable();

	copy_from_kernel_nofault(&dummy, v, 1);

	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
		BUG();

	va = __va(PFN_PHYS(pfn));

	if (va != v && HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0))
		BUG();

	preempt_enable();
}

static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
	int i;

	/*
	 * We need to mark all the aliases of the LDT pages RO. We
	 * don't need to call vm_flush_aliases(), though, since that's
	 * only responsible for flushing aliases out of the TLBs, not the
	 * page tables, and Xen will flush the TLB for us if needed.
	 *
	 * To avoid confusing future readers: none of this is necessary
	 * to load the LDT. The hypervisor only checks this when the
	 * LDT is faulted in due to subsequent descriptor access.
	 */

	for (i = 0; i < entries; i += entries_per_page)
		set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
}

static void xen_free_ldt(struct desc_struct *ldt, unsigned entries)
{
	const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
	int i;

	for (i = 0; i < entries; i += entries_per_page)
		set_aliased_prot(ldt + i, PAGE_KERNEL);
}

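/*
 * Illustrative sketch, not part of the kernel source: the loops in
 * xen_alloc_ldt() and xen_free_ldt() advance by one page worth of
 * descriptors per iteration, so set_aliased_prot() runs once per LDT
 * page rather than once per entry. The user-space program below counts
 * those iterations. It assumes a 4096-byte page and 8-byte LDT entries;
 * both constants and the function name are assumptions made for this
 * example only.
 */
```c
#include <assert.h>

#define PAGE_SIZE_BYTES	4096u	/* assumed page size */
#define LDT_ENTRY_BYTES	8u	/* assumed size of one LDT descriptor */

/* Number of per-page protection updates needed for `entries` descriptors. */
static unsigned int ldt_pages_touched(unsigned int entries)
{
	const unsigned int per_page = PAGE_SIZE_BYTES / LDT_ENTRY_BYTES;
	unsigned int i, pages = 0;

	/* The stride only makes sense if entries tile a page exactly. */
	assert(PAGE_SIZE_BYTES % LDT_ENTRY_BYTES == 0);

	/* Same stride as the kernel loops: one step per page of entries. */
	for (i = 0; i < entries; i += per_page)
		pages++;

	return pages;
}
```
/*
 * Under these assumptions a page holds 512 entries, so an LDT of 513
 * entries needs two updates while one of 512 entries needs only one.
 */
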
static void xen_set_ldt(const void *addr, unsigned entries)
{
	struct mmuext_op *op;
	struct multicall_space mcs = xen_mc_entry(sizeof(*op));

	trace_xen_cpu_set_ldt(addr, entries);

	op = mcs.args;
	op->cmd = MMUEXT_SET_LDT;
	op->arg1.linear_addr = (unsigned long)addr;
	op->arg2.nr_ents = entries;

	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);

	xen_mc_issue(PARAVIRT_LAZY_CPU);
}

static void xen_load_gdt(const struct desc_ptr *dtr)
{
	unsigned long va = dtr->address;
	unsigned int size = dtr->size + 1;
	unsigned long pfn, mfn;
	int level;
	pte_t *ptep;
	void *virt;

	/* @size should be at most GDT_SIZE which is smaller than PAGE_SIZE. */
	BUG_ON(size > PAGE_SIZE);
	BUG_ON(va & ~PAGE_MASK);

	/*
	 * The GDT is per-cpu and is in the percpu data area.
	 * That can be virtually mapped, so we need to do a
	 * page-walk to get the underlying MFN for the
	 * hypercall. The page can also be in the kernel's
	 * linear range, so we need to RO that mapping too.
	 */
	ptep = lookup_address(va, &level);
	BUG_ON(ptep == NULL);

	pfn = pte_pfn(*ptep);
	mfn = pfn_to_mfn(pfn);
	virt = __va(PFN_PHYS(pfn));

	make_lowmem_page_readonly((void *)va);
	make_lowmem_page_readonly(virt);

	if (HYPERVISOR_set_gdt(&mfn, size / sizeof(struct desc_struct)))
		BUG();
}

/*
 * load_gdt for early boot, when the gdt is only mapped once
 */
static void __init xen_load_gdt_boot(const struct desc_ptr *dtr)
{
	unsigned long va = dtr->address;
	unsigned int size = dtr->size + 1;
	unsigned long pfn, mfn;
	pte_t pte;

	/* @size should be at most GDT_SIZE which is smaller than PAGE_SIZE. */
	BUG_ON(size > PAGE_SIZE);
	BUG_ON(va & ~PAGE_MASK);

	pfn = virt_to_pfn(va);
	mfn = pfn_to_mfn(pfn);

	pte = pfn_pte(pfn, PAGE_KERNEL_RO);

	if (HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0))
		BUG();

	if (HYPERVISOR_set_gdt(&mfn, size / sizeof(struct desc_struct)))
		BUG();
}

static inline bool desc_equal(const struct desc_struct *d1,
			      const struct desc_struct *d2)
{
	return !memcmp(d1, d2, sizeof(*d1));
}

static void load_TLS_descriptor(struct thread_struct *t,
				unsigned int cpu, unsigned int i)
{
	struct desc_struct *shadow = &per_cpu(shadow_tls_desc, cpu).desc[i];
	struct desc_struct *gdt;
	xmaddr_t maddr;
	struct multicall_space mc;

	if (desc_equal(shadow, &t->tls_array[i]))
		return;

	*shadow = t->tls_array[i];

	gdt = get_cpu_gdt_rw(cpu);
	maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
	mc = __xen_mc_entry(0);

	MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);
}

static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
{
	/*
	 * In lazy mode we need to zero %fs, otherwise we may get an
	 * exception between the new %fs descriptor being loaded and
	 * %fs being effectively cleared at __switch_to().
	 */
	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU)
		loadsegment(fs, 0);

	xen_mc_batch();

	load_TLS_descriptor(t, cpu, 0);
	load_TLS_descriptor(t, cpu, 1);
	load_TLS_descriptor(t, cpu, 2);

	xen_mc_issue(PARAVIRT_LAZY_CPU);
}

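/*
 * Illustrative sketch, not part of the kernel source: the shadow-copy
 * pattern used by load_TLS_descriptor() skips an expensive update when
 * the requested value matches what was requested last time. The
 * user-space program below models that with a counter in place of the
 * hypercall. struct desc, write_descriptor() and load_tls_slot() are
 * hypothetical stand-ins, and the per-CPU dimension is left out.
 */
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLS_SLOTS 3

struct desc {
	uint64_t raw;			/* stand-in for struct desc_struct */
};

static struct desc shadow[TLS_SLOTS];	/* last value requested per slot */
static unsigned int expensive_writes;	/* counts simulated hypercalls */

/* Stand-in for the costly update; here it only counts calls. */
static void write_descriptor(unsigned int slot, const struct desc *d)
{
	(void)slot;
	(void)d;
	expensive_writes++;
}

/* Update a slot only when the new value differs from its shadow copy. */
static void load_tls_slot(unsigned int slot, const struct desc *d)
{
	assert(slot < TLS_SLOTS);

	if (!memcmp(&shadow[slot], d, sizeof(*d)))
		return;

	shadow[slot] = *d;
	write_descriptor(slot, d);
}
```
/*
 * The shadow stores the value that was requested, not whatever ends up
 * in the real table, which is why the comparison stays valid even when
 * the hypervisor writes a different descriptor than the one passed in.
 */
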
static void xen_load_gs_index(unsigned int idx)
{
	if (HYPERVISOR_set_segment_base(SEGBASE_GS_USER_SEL, idx))
		BUG();
}

static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
				const void *ptr)
{
	xmaddr_t mach_lp = arbitrary_virt_to_machine(&dt[entrynum]);
	u64 entry = *(u64 *)ptr;

	trace_xen_cpu_write_ldt_entry(dt, entrynum, entry);

	preempt_disable();

	xen_mc_flush();
	if (HYPERVISOR_update_descriptor(mach_lp.maddr, entry))
		BUG();

	preempt_enable();
}

void noist_exc_debug(struct pt_regs *regs);

DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
{
	/* On Xen PV, NMI doesn't use IST. The C part is the same as native. */
	exc_nmi(regs);
}

DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
{
	/* On Xen PV, DF doesn't use IST. The C part is the same as native. */
	exc_double_fault(regs, error_code);
}

DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
{
	/*
	 * There's no IST on Xen PV, but we still need to dispatch
	 * to the correct handler.
	 */
	if (user_mode(regs))
		noist_exc_debug(regs);
	else
		exc_debug(regs);
}

DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap)
{
	/* This should never happen and there is no way to handle it. */
	instrumentation_begin();
	pr_err("Unknown trap in Xen PV mode.");
	BUG();
	instrumentation_end();
}

#ifdef CONFIG_X86_MCE
DEFINE_IDTENTRY_RAW(xenpv_exc_machine_check)
{
	/*
	 * There's no IST on Xen PV, but we still need to dispatch
	 * to the correct handler.
	 */
	if (user_mode(regs))
		noist_exc_machine_check(regs);
	else
		exc_machine_check(regs);
}
#endif

struct trap_array_entry {
	void (*orig)(void);
	void (*xen)(void);
	bool ist_okay;
};

#define TRAP_ENTRY(func, ist_ok) {			\
	.orig		= asm_##func,			\
	.xen		= xen_asm_##func,		\
	.ist_okay	= ist_ok }

#define TRAP_ENTRY_REDIR(func, ist_ok) {		\
	.orig		= asm_##func,			\
	.xen		= xen_asm_xenpv_##func,		\
	.ist_okay	= ist_ok }

static struct trap_array_entry trap_array[] = {
	TRAP_ENTRY_REDIR(exc_debug,			true  ),
	TRAP_ENTRY_REDIR(exc_double_fault,		true  ),
#ifdef CONFIG_X86_MCE
	TRAP_ENTRY_REDIR(exc_machine_check,		true  ),
#endif
	TRAP_ENTRY_REDIR(exc_nmi,			true  ),
	TRAP_ENTRY(exc_int3,				false ),
	TRAP_ENTRY(exc_overflow,			false ),
#ifdef CONFIG_IA32_EMULATION
	{ entry_INT80_compat,          xen_entry_INT80_compat,          false },
#endif
	TRAP_ENTRY(exc_page_fault,			false ),
	TRAP_ENTRY(exc_divide_error,			false ),
	TRAP_ENTRY(exc_bounds,				false ),
	TRAP_ENTRY(exc_invalid_op,			false ),
	TRAP_ENTRY(exc_device_not_available,		false ),
	TRAP_ENTRY(exc_coproc_segment_overrun,		false ),
	TRAP_ENTRY(exc_invalid_tss,			false ),
	TRAP_ENTRY(exc_segment_not_present,		false ),
	TRAP_ENTRY(exc_stack_segment,			false ),
	TRAP_ENTRY(exc_general_protection,		false ),
	TRAP_ENTRY(exc_spurious_interrupt_bug,		false ),
	TRAP_ENTRY(exc_coprocessor_error,		false ),
	TRAP_ENTRY(exc_alignment_check,			false ),
	TRAP_ENTRY(exc_simd_coprocessor_error,		false ),
#ifdef CONFIG_X86_KERNEL_IBT
	TRAP_ENTRY(exc_control_protection,		false ),
#endif
};

static bool __ref get_trap_addr(void **addr, unsigned int ist)
|
2017-09-01 01:42:49 +08:00
|
|
|
{
|
|
|
|
unsigned int nr;
|
|
|
|
bool ist_okay = false;
|
2021-01-25 21:42:07 +08:00
|
|
|
bool found = false;
|
2017-09-01 01:42:49 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Replace trap handler addresses by Xen specific ones.
|
|
|
|
* Check for known traps using IST and whitelist them.
|
|
|
|
* The debugger ones are the only ones we care about.
|
2020-02-26 06:33:31 +08:00
|
|
|
* Xen will handle faults like double_fault, so we should never see
|
2017-09-01 01:42:49 +08:00
|
|
|
* them. Warn if there's an unexpected IST-using fault handler.
|
|
|
|
*/
|
|
|
|
for (nr = 0; nr < ARRAY_SIZE(trap_array); nr++) {
|
|
|
|
struct trap_array_entry *entry = trap_array + nr;
|
|
|
|
|
|
|
|
if (*addr == entry->orig) {
|
|
|
|
*addr = entry->xen;
|
|
|
|
ist_okay = entry->ist_okay;
|
2021-01-25 21:42:07 +08:00
|
|
|
found = true;
|
2017-09-01 01:42:49 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-11-24 16:42:21 +08:00
|
|
|
if (nr == ARRAY_SIZE(trap_array) &&
|
|
|
|
*addr >= (void *)early_idt_handler_array[0] &&
|
|
|
|
*addr < (void *)early_idt_handler_array[NUM_EXCEPTION_VECTORS]) {
|
|
|
|
nr = (*addr - (void *)early_idt_handler_array[0]) /
|
|
|
|
EARLY_IDT_HANDLER_SIZE;
|
|
|
|
*addr = (void *)xen_early_idt_handler_array[nr];
|
2021-01-25 21:42:07 +08:00
|
|
|
found = true;
|
2017-11-24 16:42:21 +08:00
|
|
|
}
|
|
|
|
|
2021-01-25 21:42:07 +08:00
|
|
|
if (!found)
|
|
|
|
*addr = (void *)xen_asm_exc_xen_unknown_trap;
|
|
|
|
|
|
|
|
if (WARN_ON(found && ist != 0 && !ist_okay))
|
2017-09-01 01:42:49 +08:00
|
|
|
return false;
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
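The lookup in get_trap_addr() is a linear search over a small table that maps a native handler address to its replacement. A minimal user-space sketch of that pattern follows; the struct and function names (handler_map, remap_handler) are hypothetical stand-ins for illustration, not kernel identifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct trap_array_entry. */
struct handler_map {
	void *orig;	/* native handler address */
	void *xen;	/* replacement handler address */
	bool ist_okay;	/* replacement may be used with an IST stack */
};

/*
 * Replace *addr with its mapped counterpart, if one exists.
 * Returns true when a mapping was found; *addr and *ist_okay are
 * only written in that case.
 */
static bool remap_handler(const struct handler_map *map, size_t n,
			  void **addr, bool *ist_okay)
{
	for (size_t i = 0; i < n; i++) {
		if (*addr == map[i].orig) {
			*addr = map[i].xen;
			*ist_okay = map[i].ist_okay;
			return true;
		}
	}
	return false;
}
```

Unlike the kernel function, this sketch leaves an unmatched address untouched instead of redirecting it to an "unknown trap" handler.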
|
|
|
|
|
2017-03-15 01:35:41 +08:00
|
|
|
static int cvt_gate_to_trap(int vector, const gate_desc *val,
|
|
|
|
struct trap_info *info)
|
|
|
|
{
|
|
|
|
unsigned long addr;
|
|
|
|
|
2017-08-28 14:47:37 +08:00
|
|
|
if (val->bits.type != GATE_TRAP && val->bits.type != GATE_INTERRUPT)
|
2017-03-15 01:35:41 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
info->vector = vector;
|
|
|
|
|
2017-08-28 14:47:37 +08:00
|
|
|
addr = gate_offset(val);
|
2017-09-01 01:42:49 +08:00
|
|
|
if (!get_trap_addr((void **)&addr, val->bits.ist))
|
2017-03-15 01:35:41 +08:00
|
|
|
return 0;
|
|
|
|
info->address = addr;
|
|
|
|
|
2017-08-28 14:47:37 +08:00
|
|
|
info->cs = gate_segment(val);
|
|
|
|
info->flags = val->bits.dpl;
|
2017-03-15 01:35:41 +08:00
|
|
|
/* interrupt gates clear IF */
|
2017-08-28 14:47:37 +08:00
|
|
|
if (val->bits.type == GATE_INTERRUPT)
|
2017-03-15 01:35:41 +08:00
|
|
|
info->flags |= 1 << 2;
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Locations of each CPU's IDT */
|
|
|
|
static DEFINE_PER_CPU(struct desc_ptr, idt_desc);
|
|
|
|
|
|
|
|
/* Set an IDT entry. If the entry is part of the current IDT, then
|
|
|
|
also update Xen. */
|
|
|
|
static void xen_write_idt_entry(gate_desc *dt, int entrynum, const gate_desc *g)
|
|
|
|
{
|
|
|
|
unsigned long p = (unsigned long)&dt[entrynum];
|
|
|
|
unsigned long start, end;
|
|
|
|
|
|
|
|
trace_xen_cpu_write_idt_entry(dt, entrynum, g);
|
|
|
|
|
|
|
|
preempt_disable();
|
|
|
|
|
|
|
|
start = __this_cpu_read(idt_desc.address);
|
|
|
|
end = start + __this_cpu_read(idt_desc.size) + 1;
|
|
|
|
|
|
|
|
xen_mc_flush();
|
|
|
|
|
|
|
|
native_write_idt_entry(dt, entrynum, g);
|
|
|
|
|
|
|
|
if (p >= start && (p + 8) <= end) {
|
|
|
|
struct trap_info info[2];
|
|
|
|
|
|
|
|
info[1].address = 0;
|
|
|
|
|
|
|
|
if (cvt_gate_to_trap(entrynum, g, &info[0]))
|
|
|
|
if (HYPERVISOR_set_trap_table(info))
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
|
|
|
|
preempt_enable();
|
|
|
|
}
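The bounds test in xen_write_idt_entry() derives an exclusive end address from a descriptor-table base and its limit (the limit is the table size minus one), then checks that the written entry lies fully inside. A small sketch of that arithmetic, assuming a hypothetical helper name in_table and a caller-supplied entry length:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the len bytes starting at p lie inside a table that
 * starts at base and whose limit field holds (size in bytes - 1).
 */
static bool in_table(uintptr_t p, size_t len, uintptr_t base, uint16_t limit)
{
	uintptr_t end = base + limit + 1;	/* one past the last valid byte */

	return p >= base && p + len <= end;
}
```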
|
|
|
|
|
2021-09-21 00:15:11 +08:00
|
|
|
static unsigned xen_convert_trap_info(const struct desc_ptr *desc,
|
|
|
|
struct trap_info *traps, bool full)
|
2017-03-15 01:35:41 +08:00
|
|
|
{
|
|
|
|
unsigned in, out, count;
|
|
|
|
|
|
|
|
count = (desc->size+1) / sizeof(gate_desc);
|
|
|
|
BUG_ON(count > 256);
|
|
|
|
|
|
|
|
for (in = out = 0; in < count; in++) {
|
|
|
|
gate_desc *entry = (gate_desc *)(desc->address) + in;
|
|
|
|
|
2021-09-21 00:15:11 +08:00
|
|
|
if (cvt_gate_to_trap(in, entry, &traps[out]) || full)
|
2017-03-15 01:35:41 +08:00
|
|
|
out++;
|
|
|
|
}
|
2021-09-21 00:15:11 +08:00
|
|
|
|
|
|
|
return out;
|
2017-03-15 01:35:41 +08:00
|
|
|
}
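xen_convert_trap_info() walks the input with one index and advances a second, output index only when an entry is accepted (or when every slot is to be kept). The sketch below shows the same two-index compaction on plain integers; convert() and convert_all() are hypothetical names, and the "accept even values" rule is only there to make the example concrete.

```c
#include <stddef.h>

/* Hypothetical converter: accepts even inputs and stores half the value. */
static int convert(int in, int *slot)
{
	if (in % 2)
		return 0;
	*slot = in / 2;
	return 1;
}

/*
 * Convert count inputs into out[], which must have room for count
 * entries. Returns the number of output slots consumed. With keep_all
 * set, rejected inputs still consume a slot, which is left unmodified.
 */
static size_t convert_all(const int *in, size_t count, int *out, int keep_all)
{
	size_t i, n = 0;

	for (i = 0; i < count; i++) {
		if (convert(in[i], &out[n]) || keep_all)
			n++;
	}
	return n;
}
```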
|
|
|
|
|
|
|
|
void xen_copy_trap_info(struct trap_info *traps)
|
|
|
|
{
|
|
|
|
const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
|
|
|
|
|
2021-09-21 00:15:11 +08:00
|
|
|
xen_convert_trap_info(desc, traps, true);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Load a new IDT into Xen. In principle this can be per-CPU, so we
|
|
|
|
hold a spinlock to protect the static traps[] array (static because
|
|
|
|
it avoids allocation, and saves stack space). */
|
|
|
|
static void xen_load_idt(const struct desc_ptr *desc)
|
|
|
|
{
|
|
|
|
static DEFINE_SPINLOCK(lock);
|
|
|
|
static struct trap_info traps[257];
|
2022-09-20 10:45:14 +08:00
|
|
|
static const struct trap_info zero = { };
|
2021-09-21 00:15:11 +08:00
|
|
|
unsigned out;
|
2017-03-15 01:35:41 +08:00
|
|
|
|
|
|
|
trace_xen_cpu_load_idt(desc);
|
|
|
|
|
|
|
|
spin_lock(&lock);
|
|
|
|
|
|
|
|
memcpy(this_cpu_ptr(&idt_desc), desc, sizeof(idt_desc));
|
|
|
|
|
2021-09-21 00:15:11 +08:00
|
|
|
out = xen_convert_trap_info(desc, traps, false);
|
2022-09-20 10:45:14 +08:00
|
|
|
traps[out] = zero;
|
2017-03-15 01:35:41 +08:00
|
|
|
|
|
|
|
xen_mc_flush();
|
|
|
|
if (HYPERVISOR_set_trap_table(traps))
|
|
|
|
BUG();
|
|
|
|
|
|
|
|
spin_unlock(&lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Write a GDT descriptor entry. Ignore LDT descriptors, since
|
|
|
|
they're handled differently. */
|
|
|
|
static void xen_write_gdt_entry(struct desc_struct *dt, int entry,
|
|
|
|
const void *desc, int type)
|
|
|
|
{
|
|
|
|
trace_xen_cpu_write_gdt_entry(dt, entry, desc, type);
|
|
|
|
|
|
|
|
preempt_disable();
|
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case DESC_LDT:
|
|
|
|
case DESC_TSS:
|
|
|
|
/* ignore */
|
|
|
|
break;
|
|
|
|
|
|
|
|
default: {
|
|
|
|
xmaddr_t maddr = arbitrary_virt_to_machine(&dt[entry]);
|
|
|
|
|
|
|
|
xen_mc_flush();
|
|
|
|
if (HYPERVISOR_update_descriptor(maddr.maddr, *(u64 *)desc))
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Version of write_gdt_entry for use at early boot-time needed to
|
|
|
|
* update an entry as simply as possible.
|
|
|
|
*/
|
|
|
|
static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
|
|
|
|
const void *desc, int type)
|
|
|
|
{
|
|
|
|
trace_xen_cpu_write_gdt_entry(dt, entry, desc, type);
|
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case DESC_LDT:
|
|
|
|
case DESC_TSS:
|
|
|
|
/* ignore */
|
|
|
|
break;
|
|
|
|
|
|
|
|
default: {
|
|
|
|
xmaddr_t maddr = virt_to_machine(&dt[entry]);
|
|
|
|
|
|
|
|
if (HYPERVISOR_update_descriptor(maddr.maddr, *(u64 *)desc))
|
|
|
|
dt[entry] = *(struct desc_struct *)desc;
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-11-02 15:59:10 +08:00
|
|
|
static void xen_load_sp0(unsigned long sp0)
|
2017-03-15 01:35:41 +08:00
|
|
|
{
|
|
|
|
struct multicall_space mcs;
|
|
|
|
|
|
|
|
mcs = xen_mc_entry(0);
|
2017-11-02 15:59:10 +08:00
|
|
|
MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0);
|
2017-03-15 01:35:41 +08:00
|
|
|
xen_mc_issue(PARAVIRT_LAZY_CPU);
|
2017-12-04 22:07:29 +08:00
|
|
|
this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
2020-02-18 23:47:12 +08:00
|
|
|
#ifdef CONFIG_X86_IOPL_IOPERM
|
2020-07-18 07:53:55 +08:00
|
|
|
static void xen_invalidate_io_bitmap(void)
|
|
|
|
{
|
|
|
|
struct physdev_set_iobitmap iobitmap = {
|
2020-07-21 18:02:17 +08:00
|
|
|
.bitmap = NULL,
|
2020-07-18 07:53:55 +08:00
|
|
|
.nr_ports = 0,
|
|
|
|
};
|
|
|
|
|
|
|
|
native_tss_invalidate_io_bitmap();
|
|
|
|
HYPERVISOR_physdev_op(PHYSDEVOP_set_iobitmap, &iobitmap);
|
|
|
|
}
|
|
|
|
|
2020-02-18 23:47:12 +08:00
|
|
|
static void xen_update_io_bitmap(void)
|
|
|
|
{
|
|
|
|
struct physdev_set_iobitmap iobitmap;
|
|
|
|
struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
|
|
|
|
|
|
|
|
native_tss_update_io_bitmap();
|
|
|
|
|
|
|
|
iobitmap.bitmap = (uint8_t *)(&tss->x86_tss) +
|
|
|
|
tss->x86_tss.io_bitmap_base;
|
|
|
|
if (tss->x86_tss.io_bitmap_base == IO_BITMAP_OFFSET_INVALID)
|
|
|
|
iobitmap.nr_ports = 0;
|
|
|
|
else
|
|
|
|
iobitmap.nr_ports = IO_BITMAP_BITS;
|
|
|
|
|
|
|
|
HYPERVISOR_physdev_op(PHYSDEVOP_set_iobitmap, &iobitmap);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-03-15 01:35:41 +08:00
|
|
|
static void xen_io_delay(void)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static DEFINE_PER_CPU(unsigned long, xen_cr0_value);
|
|
|
|
|
|
|
|
static unsigned long xen_read_cr0(void)
|
|
|
|
{
|
|
|
|
unsigned long cr0 = this_cpu_read(xen_cr0_value);
|
|
|
|
|
|
|
|
if (unlikely(cr0 == 0)) {
|
|
|
|
cr0 = native_read_cr0();
|
|
|
|
this_cpu_write(xen_cr0_value, cr0);
|
|
|
|
}
|
|
|
|
|
|
|
|
return cr0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xen_write_cr0(unsigned long cr0)
|
|
|
|
{
|
|
|
|
struct multicall_space mcs;
|
|
|
|
|
|
|
|
this_cpu_write(xen_cr0_value, cr0);
|
|
|
|
|
|
|
|
/* Only pay attention to cr0.TS; everything else is
|
|
|
|
ignored. */
|
|
|
|
mcs = xen_mc_entry(0);
|
|
|
|
|
|
|
|
MULTI_fpu_taskswitch(mcs.mc, (cr0 & X86_CR0_TS) != 0);
|
|
|
|
|
|
|
|
xen_mc_issue(PARAVIRT_LAZY_CPU);
|
|
|
|
}
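xen_read_cr0() and xen_write_cr0() together form a small cache: a stored value of 0 means "not read yet", the first read fills it from the slow path, and every write updates it. A single-threaded sketch of that idea, with hypothetical names and an illustrative register value that is not taken from the source:

```c
/* Cached copy of the register; 0 doubles as the "not cached yet" marker. */
static unsigned long cached_value;
/* Counts how often the slow path ran, for demonstration only. */
static unsigned long slow_reads;

/* Hypothetical slow read; the constant is only an illustrative value. */
static unsigned long slow_read(void)
{
	slow_reads++;
	return 0x80050033UL;
}

static unsigned long read_cached(void)
{
	if (cached_value == 0)
		cached_value = slow_read();
	return cached_value;
}

static void write_cached(unsigned long v)
{
	cached_value = v;
}
```

The sentinel only works because a valid value is never 0; the kernel code additionally keeps one copy per CPU, which this sketch does not model.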
|
|
|
|
|
|
|
|
static void xen_write_cr4(unsigned long cr4)
|
|
|
|
{
|
|
|
|
cr4 &= ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PCE);
|
|
|
|
|
|
|
|
native_write_cr4(cr4);
|
|
|
|
}
|
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
static u64 xen_do_read_msr(unsigned int msr, int *err)
|
2017-03-15 01:35:41 +08:00
|
|
|
{
|
2022-09-26 18:33:03 +08:00
|
|
|
u64 val = 0; /* Avoid uninitialized value for safe variant. */
|
2017-03-15 01:35:41 +08:00
|
|
|
|
|
|
|
if (pmu_msr_read(msr, &val, err))
|
|
|
|
return val;
|
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
if (err)
|
|
|
|
val = native_read_msr_safe(msr, err);
|
|
|
|
else
|
|
|
|
val = native_read_msr(msr);
|
|
|
|
|
2017-03-15 01:35:41 +08:00
|
|
|
switch (msr) {
|
|
|
|
case MSR_IA32_APICBASE:
|
2018-12-10 18:03:00 +08:00
|
|
|
val &= ~X2APIC_ENABLE;
|
2017-03-15 01:35:41 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
return val;
|
|
|
|
}
|
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
static void set_seg(unsigned int which, unsigned int low, unsigned int high,
|
|
|
|
int *err)
|
2017-03-15 01:35:41 +08:00
|
|
|
{
|
2022-09-26 18:33:03 +08:00
|
|
|
u64 base = ((u64)high << 32) | low;
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
if (HYPERVISOR_set_segment_base(which, base) == 0)
|
|
|
|
return;
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
if (err)
|
|
|
|
*err = -EIO;
|
|
|
|
else
|
|
|
|
WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base);
|
|
|
|
}
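set_seg() receives the MSR value as two 32-bit halves and joins them into one 64-bit base. The cast before the shift matters: shifting a 32-bit value left by 32 would not produce the intended result. A short sketch, using the hypothetical helper names msr_join and msr_split:

```c
#include <stdint.h>

/* Join the low and high 32-bit halves into one 64-bit value. */
static uint64_t msr_join(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* Split a 64-bit value back into its two 32-bit halves. */
static void msr_split(uint64_t val, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)val;
	*high = (uint32_t)(val >> 32);
}
```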
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
/*
|
|
|
|
* Support write_msr_safe() and write_msr() semantics.
|
|
|
|
* With err == NULL write_msr() semantics are selected.
|
|
|
|
* Supplying an err pointer requires err to be pre-initialized with 0.
|
|
|
|
*/
|
|
|
|
static void xen_do_write_msr(unsigned int msr, unsigned int low,
|
|
|
|
unsigned int high, int *err)
|
|
|
|
{
|
2017-03-15 01:35:41 +08:00
|
|
|
switch (msr) {
|
2022-09-26 18:33:03 +08:00
|
|
|
case MSR_FS_BASE:
|
|
|
|
set_seg(SEGBASE_FS, low, high, err);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case MSR_KERNEL_GS_BASE:
|
|
|
|
set_seg(SEGBASE_GS_USER, low, high, err);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case MSR_GS_BASE:
|
|
|
|
set_seg(SEGBASE_GS_KERNEL, low, high, err);
|
2017-03-15 01:35:41 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
case MSR_STAR:
|
|
|
|
case MSR_CSTAR:
|
|
|
|
case MSR_LSTAR:
|
|
|
|
case MSR_SYSCALL_MASK:
|
|
|
|
case MSR_IA32_SYSENTER_CS:
|
|
|
|
case MSR_IA32_SYSENTER_ESP:
|
|
|
|
case MSR_IA32_SYSENTER_EIP:
|
|
|
|
/* Fast syscall setup is all done in hypercalls, so
|
|
|
|
these are all ignored. Stub them out here to stop
|
|
|
|
Xen console noise. */
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
2022-09-26 18:33:03 +08:00
|
|
|
if (!pmu_msr_write(msr, low, high, err)) {
|
|
|
|
if (err)
|
|
|
|
*err = native_write_msr_safe(msr, low, high);
|
|
|
|
else
|
|
|
|
native_write_msr(msr, low, high);
|
|
|
|
}
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
2022-09-26 18:33:03 +08:00
|
|
|
}
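The comment above xen_do_write_msr() describes one function serving two calling conventions: a non-NULL err pointer selects the "safe" variant that reports failure to the caller, while NULL selects the variant that only warns. A user-space sketch of that convention follows; checked_write() and the warnings counter are hypothetical, and the failure is simulated by a flag.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Number of warnings emitted, kept so the behaviour can be observed. */
static int warnings;

/*
 * Perform an operation that fails when fail is non-zero.
 * err != NULL: report failure through *err, which the caller must
 *              have set to 0 beforehand.
 * err == NULL: emit a warning instead.
 */
static void checked_write(int fail, int *err)
{
	if (!fail)
		return;

	if (err) {
		*err = -EIO;
	} else {
		warnings++;
		fprintf(stderr, "write failed\n");
	}
}
```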
|
|
|
|
|
|
|
|
static u64 xen_read_msr_safe(unsigned int msr, int *err)
|
|
|
|
{
|
|
|
|
return xen_do_read_msr(msr, err);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int xen_write_msr_safe(unsigned int msr, unsigned int low,
|
|
|
|
unsigned int high)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
xen_do_write_msr(msr, low, high, &err);
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2022-09-26 18:33:03 +08:00
|
|
|
return err;
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static u64 xen_read_msr(unsigned int msr)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
2022-09-26 19:16:56 +08:00
|
|
|
return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
|
|
|
|
{
|
2022-09-26 19:16:56 +08:00
|
|
|
int err;
|
|
|
|
|
|
|
|
xen_do_write_msr(msr, low, high, xen_msr_safe ? &err : NULL);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* This is called once we have the cpu_possible_mask */
|
2018-07-20 04:55:31 +08:00
|
|
|
void __init xen_setup_vcpu_info_placement(void)
|
2017-03-15 01:35:41 +08:00
|
|
|
{
|
|
|
|
int cpu;
|
|
|
|
|
|
|
|
for_each_possible_cpu(cpu) {
|
|
|
|
/* Set up direct vCPU id mapping for PV guests. */
|
|
|
|
per_cpu(xen_vcpu_id, cpu) = cpu;
|
2021-10-28 15:27:47 +08:00
|
|
|
xen_vcpu_setup(cpu);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
2021-10-28 15:27:47 +08:00
|
|
|
pv_ops.irq.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
|
|
|
|
pv_ops.irq.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
|
|
|
|
pv_ops.irq.irq_enable = __PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
|
|
|
|
pv_ops.mmu.read_cr2 = __PV_IS_CALLEE_SAVE(xen_read_cr2_direct);
|
2017-03-15 01:35:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct pv_info xen_info __initconst = {
|
|
|
|
.extra_user_64bit_cs = FLAT_USER_CS64,
|
|
|
|
.name = "Xen",
|
|
|
|
};
|
|
|
|
|
x86/xen: Rework the xen_{cpu,irq,mmu}_ops[] arrays
In order to allow objtool to make sense of all the various paravirt
functions, it needs to either parse whole pv_ops[] tables, or observe
individual assignments in the form:
bf87: 48 c7 05 00 00 00 00 00 00 00 00 movq $0x0,0x0(%rip)
bf92 <xen_init_spinlocks+0x5f>
bf8a: R_X86_64_PC32 pv_ops+0x268
As is, xen_cpu_ops[] is at offset +0 in pv_ops[] and could thus be
parsed as a 'normal' pv_ops[] table, however xen_irq_ops[] and
xen_mmu_ops[] are not.
Worse, both the latter two are compiled into the individual assignment
form by current GCC, but that's not something one can rely on.
Therefore, convert all three into full pv_ops[] tables. This has the
benefit of not needing to teach objtool about the offsets and
resulting in more conservative code-gen.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095149.057262522@infradead.org
2021-06-24 17:41:22 +08:00
|
|
|
static const typeof(pv_ops) xen_cpu_ops __initconst = {
|
|
|
|
.cpu = {
|
|
|
|
.cpuid = xen_cpuid,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.set_debugreg = xen_set_debugreg,
|
|
|
|
.get_debugreg = xen_get_debugreg,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.read_cr0 = xen_read_cr0,
|
|
|
|
.write_cr0 = xen_write_cr0,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.write_cr4 = xen_write_cr4,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2023-01-13 03:43:44 +08:00
|
|
|
.wbinvd = pv_native_wbinvd,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.read_msr = xen_read_msr,
|
|
|
|
.write_msr = xen_write_msr,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.read_msr_safe = xen_read_msr_safe,
|
|
|
|
.write_msr_safe = xen_write_msr_safe,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.read_pmc = xen_read_pmc,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.load_tr_desc = paravirt_nop,
|
|
|
|
.set_ldt = xen_set_ldt,
|
|
|
|
.load_gdt = xen_load_gdt,
|
|
|
|
.load_idt = xen_load_idt,
|
|
|
|
.load_tls = xen_load_tls,
|
|
|
|
.load_gs_index = xen_load_gs_index,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.alloc_ldt = xen_alloc_ldt,
|
|
|
|
.free_ldt = xen_free_ldt,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.store_tr = xen_store_tr,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.write_ldt_entry = xen_write_ldt_entry,
|
|
|
|
.write_gdt_entry = xen_write_gdt_entry,
|
|
|
|
.write_idt_entry = xen_write_idt_entry,
|
|
|
|
.load_sp0 = xen_load_sp0,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2020-02-18 23:47:12 +08:00
|
|
|
#ifdef CONFIG_X86_IOPL_IOPERM
|
2021-06-24 17:41:22 +08:00
|
|
|
.invalidate_io_bitmap = xen_invalidate_io_bitmap,
|
|
|
|
.update_io_bitmap = xen_update_io_bitmap,
|
2020-02-18 23:47:12 +08:00
|
|
|
#endif
|
2021-06-24 17:41:22 +08:00
|
|
|
.io_delay = xen_io_delay,
|
2017-03-15 01:35:41 +08:00
|
|
|
|
2021-06-24 17:41:22 +08:00
|
|
|
.start_context_switch = paravirt_start_context_switch,
|
|
|
|
.end_context_switch = xen_end_context_switch,
|
|
|
|
},
|
2017-03-15 01:35:41 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static void xen_restart(char *msg)
|
|
|
|
{
|
|
|
|
xen_reboot(SHUTDOWN_reboot);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xen_machine_halt(void)
|
|
|
|
{
|
|
|
|
xen_reboot(SHUTDOWN_poweroff);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xen_machine_power_off(void)
|
|
|
|
{
|
2022-05-10 07:32:22 +08:00
|
|
|
do_kernel_power_off();
|
2017-03-15 01:35:41 +08:00
|
|
|
xen_reboot(SHUTDOWN_poweroff);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xen_crash_shutdown(struct pt_regs *regs)
|
|
|
|
{
|
|
|
|
xen_reboot(SHUTDOWN_crash);
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct machine_ops xen_machine_ops __initconst = {
|
|
|
|
.restart = xen_restart,
|
|
|
|
.halt = xen_machine_halt,
|
|
|
|
.power_off = xen_machine_power_off,
|
|
|
|
.shutdown = xen_machine_halt,
|
|
|
|
.crash_shutdown = xen_crash_shutdown,
|
|
|
|
.emergency_restart = xen_emergency_restart,
|
|
|
|
};
|
|
|
|
|
|
|
|
static unsigned char xen_get_nmi_reason(void)
|
|
|
|
{
|
|
|
|
unsigned char reason = 0;
|
|
|
|
|
|
|
|
/* Construct a value which looks like it came from port 0x61. */
|
|
|
|
if (test_bit(_XEN_NMIREASON_io_error,
|
|
|
|
&HYPERVISOR_shared_info->arch.nmi_reason))
|
|
|
|
reason |= NMI_REASON_IOCHK;
|
|
|
|
if (test_bit(_XEN_NMIREASON_pci_serr,
|
|
|
|
&HYPERVISOR_shared_info->arch.nmi_reason))
|
|
|
|
reason |= NMI_REASON_SERR;
|
|
|
|
|
|
|
|
return reason;
|
|
|
|
}
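xen_get_nmi_reason() translates individual source bits into the bit layout a reader of port 0x61 would expect. The sketch below shows that bit-by-bit translation in isolation. The bit positions and mask values are assumptions chosen for illustration; the authoritative definitions live in the Xen and x86 headers.

```c
/* Assumed source bit numbers (illustrative, not copied from Xen headers). */
#define SRC_IO_ERROR_BIT	0
#define SRC_PCI_SERR_BIT	1

/* Assumed destination masks in the port-0x61-style byte (illustrative). */
#define REASON_IOCHK		0x40
#define REASON_SERR		0x80

/* Build the destination byte from the bits set in src. */
static unsigned char translate_reason(unsigned long src)
{
	unsigned char reason = 0;

	if (src & (1UL << SRC_IO_ERROR_BIT))
		reason |= REASON_IOCHK;
	if (src & (1UL << SRC_PCI_SERR_BIT))
		reason |= REASON_SERR;

	return reason;
}
```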

static void __init xen_boot_params_init_edd(void)
{
#if IS_ENABLED(CONFIG_EDD)
	struct xen_platform_op op;
	struct edd_info *edd_info;
	u32 *mbr_signature;
	unsigned nr;
	int ret;

	edd_info = boot_params.eddbuf;
	mbr_signature = boot_params.edd_mbr_sig_buffer;

	op.cmd = XENPF_firmware_info;

	op.u.firmware_info.type = XEN_FW_DISK_INFO;
	for (nr = 0; nr < EDDMAXNR; nr++) {
		struct edd_info *info = edd_info + nr;

		op.u.firmware_info.index = nr;
		info->params.length = sizeof(info->params);
		set_xen_guest_handle(op.u.firmware_info.u.disk_info.edd_params,
				     &info->params);
		ret = HYPERVISOR_platform_op(&op);
		if (ret)
			break;

#define C(x) info->x = op.u.firmware_info.u.disk_info.x
		C(device);
		C(version);
		C(interface_support);
		C(legacy_max_cylinder);
		C(legacy_max_head);
		C(legacy_sectors_per_track);
#undef C
	}
	boot_params.eddbuf_entries = nr;

	op.u.firmware_info.type = XEN_FW_DISK_MBR_SIGNATURE;
	for (nr = 0; nr < EDD_MBR_SIG_MAX; nr++) {
		op.u.firmware_info.index = nr;
		ret = HYPERVISOR_platform_op(&op);
		if (ret)
			break;
		mbr_signature[nr] = op.u.firmware_info.u.disk_mbr_signature.mbr_signature;
	}
	boot_params.edd_mbr_sig_buf_entries = nr;
#endif
}

/*
 * Set up the GDT and segment registers for -fstack-protector. Until
 * we do this, we have to be careful not to call any stack-protected
 * function, which is most of the kernel.
 */
static void __init xen_setup_gdt(int cpu)
{
	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot;
	pv_ops.cpu.load_gdt = xen_load_gdt_boot;

	switch_gdt_and_percpu_base(cpu);

	pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry;
	pv_ops.cpu.load_gdt = xen_load_gdt;
}

static void __init xen_dom0_set_legacy_features(void)
{
	x86_platform.legacy.rtc = 1;
}

static void __init xen_domu_set_legacy_features(void)
{
	x86_platform.legacy.rtc = 0;
}

extern void early_xen_iret_patch(void);

/* First C function to be called on Xen boot */
asmlinkage __visible void __init xen_start_kernel(struct start_info *si)
{
	struct physdev_set_iopl set_iopl;
	unsigned long initrd_start = 0;
	int rc;

	if (!si)
		return;

	clear_bss();

	xen_start_info = si;

	__text_gen_insn(&early_xen_iret_patch,
			JMP32_INSN_OPCODE, &early_xen_iret_patch, &xen_iret,
			JMP32_INSN_SIZE);

	xen_domain_type = XEN_PV_DOMAIN;
	xen_start_flags = xen_start_info->flags;

	xen_setup_features();

	/* Install Xen paravirt ops */
	pv_info = xen_info;
	pv_ops.cpu = xen_cpu_ops.cpu;
	xen_init_irq_ops();

	/*
	 * Setup xen_vcpu early because it is needed for
	 * local_irq_disable(), irqs_disabled(), e.g. in printk().
	 *
	 * Don't do the full vcpu_info placement stuff until we have
	 * the cpu_possible_mask and a non-dummy shared_info.
	 */
	xen_vcpu_info_reset(0);

	x86_platform.get_nmi_reason = xen_get_nmi_reason;
	x86_platform.realmode_reserve = x86_init_noop;
	x86_platform.realmode_init = x86_init_noop;

	x86_init.resources.memory_setup = xen_memory_setup;
	x86_init.irqs.intr_mode_select = x86_init_noop;
	x86_init.irqs.intr_mode_init = x86_init_noop;
	x86_init.oem.arch_setup = xen_arch_setup;
	x86_init.oem.banner = xen_banner;
	x86_init.hyper.init_platform = xen_pv_init_platform;
	x86_init.hyper.guest_late_init = xen_pv_guest_late_init;

	/*
	 * Set up some pagetable state before starting to set any ptes.
	 */

	xen_setup_machphys_mapping();
	xen_init_mmu_ops();

	/* Prevent unwanted bits from being set in PTEs. */
	__supported_pte_mask &= ~_PAGE_GLOBAL;
	__default_kernel_pte_mask &= ~_PAGE_GLOBAL;

	/* Get mfn list */
	xen_build_dynamic_phys_to_machine();

	/* Work out if we support NX */
	get_cpu_cap(&boot_cpu_data);
	x86_configure_nx();

	/*
	 * Set up kernel GDT and segment registers, mainly so that
	 * -fstack-protector code can be executed.
	 */
	xen_setup_gdt(0);

	/* Determine virtual and physical address sizes */
	get_cpu_address_sizes(&boot_cpu_data);

	/* Let's presume PV guests always boot on vCPU with id 0. */
	per_cpu(xen_vcpu_id, 0) = 0;

	idt_setup_early_handler();

	xen_init_capabilities();

#ifdef CONFIG_X86_LOCAL_APIC
	/*
	 * Set up the basic APIC ops.
	 */
	xen_init_apic();
#endif

	machine_ops = xen_machine_ops;

	/*
	 * The only reliable way to retain the initial address of the
	 * percpu gdt_page is to remember it here, so we can go and
	 * mark it RW later, when the initial percpu area is freed.
	 */
	xen_initial_gdt = &per_cpu(gdt_page, 0);

	xen_smp_init();

#ifdef CONFIG_ACPI_NUMA
	/*
	 * The pages we get from Xen are not related to machine pages, so
	 * any NUMA information the kernel tries to get from ACPI will
	 * be meaningless. Prevent it from trying.
	 */
	disable_srat();
#endif
	WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_pv, xen_cpu_dead_pv));

	local_irq_disable();
	early_boot_irqs_disabled = true;

	xen_raw_console_write("mapping kernel into physical memory\n");
	xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base,
				   xen_start_info->nr_pages);
	xen_reserve_special_pages();

	/*
	 * We used to do this in xen_arch_setup, but that is too late
	 * on AMD, where early_cpu_init (run before ->arch_setup()) calls
	 * early_amd_init, which pokes port 0xcf8.
	 */
	set_iopl.iopl = 1;
	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
	if (rc != 0)
		xen_raw_printk("physdev_op failed %d\n", rc);

	if (xen_start_info->mod_start) {
		if (xen_start_info->flags & SIF_MOD_START_PFN)
			initrd_start = PFN_PHYS(xen_start_info->mod_start);
		else
			initrd_start = __pa(xen_start_info->mod_start);
	}

	/* Poke various useful things into boot_params */
	boot_params.hdr.type_of_loader = (9 << 4) | 0;
	boot_params.hdr.ramdisk_image = initrd_start;
	boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
	boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
	boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;

	if (!xen_initial_domain()) {
		if (pci_xen)
			x86_init.pci.arch_init = pci_xen_init;
		x86_platform.set_legacy_features =
				xen_domu_set_legacy_features;
	} else {
		const struct dom0_vga_console_info *info =
			(void *)((char *)xen_start_info +
				 xen_start_info->console.dom0.info_off);
		struct xen_platform_op op = {
			.cmd = XENPF_firmware_info,
			.interface_version = XENPF_INTERFACE_VERSION,
			.u.firmware_info.type = XEN_FW_KBD_SHIFT_FLAGS,
		};

		x86_platform.set_legacy_features =
				xen_dom0_set_legacy_features;
		xen_init_vga(info, xen_start_info->console.dom0.info_size);
		xen_start_info->console.domU.mfn = 0;
		xen_start_info->console.domU.evtchn = 0;

		if (HYPERVISOR_platform_op(&op) == 0)
			boot_params.kbd_status = op.u.firmware_info.u.kbd_shift_flags;

		/* Make sure ACS will be enabled */
		pci_request_acs();

		xen_acpi_sleep_register();

		xen_boot_params_init_edd();

#ifdef CONFIG_ACPI
		/*
		 * Disable selecting "Firmware First mode" for correctable
		 * memory errors, as this is the duty of the hypervisor to
		 * decide.
		 */
		acpi_disable_cmcff = 1;
#endif
	}

	xen_add_preferred_consoles();

#ifdef CONFIG_PCI
	/* PCI BIOS service won't work from a PV guest. */
	pci_probe &= ~PCI_PROBE_BIOS;
#endif
	xen_raw_console_write("about to get started...\n");

	/* We need this for printk timestamps */
	xen_setup_runstate_info(0);

	xen_efi_init(&boot_params);

	/* Start the world */
	cr4_init_shadow(); /* 32b kernel does this in i386_start_kernel() */
	x86_64_start_reservations((char *)__pa_symbol(&boot_params));
}
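
The `(9 << 4) | 0` written into `boot_params.hdr.type_of_loader` above packs two nibbles into one byte. As I understand the x86 boot protocol, the high nibble is a loader identifier (with 9 assigned to Xen) and the low nibble is a loader version; treat that reading as an assumption to check against the boot protocol documentation. The helpers below are hypothetical names for a minimal user-space sketch, not kernel functions.

```c
/*
 * Hypothetical helpers illustrating the assumed 0xTV layout of
 * type_of_loader: T = loader identifier, V = loader version.
 */
#define DEMO_LOADER_ID_XEN	9	/* assumed identifier for Xen */

/* Pack a 4-bit loader id and a 4-bit version into one byte. */
static unsigned char make_loader_type(unsigned int id, unsigned int version)
{
	return (unsigned char)(((id & 0xf) << 4) | (version & 0xf));
}

/* Extract the loader identifier (high nibble). */
static unsigned int loader_id(unsigned char type)
{
	return type >> 4;
}

/* Extract the loader version (low nibble). */
static unsigned int loader_version(unsigned char type)
{
	return type & 0xf;
}
```

With these helpers, `make_loader_type(DEMO_LOADER_ID_XEN, 0)` yields the same byte as the expression used in xen_start_kernel().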

static int xen_cpu_up_prepare_pv(unsigned int cpu)
{
	int rc;

	if (per_cpu(xen_vcpu, cpu) == NULL)
		return -ENODEV;

	xen_setup_timer(cpu);

	rc = xen_smp_intr_init(cpu);
	if (rc) {
		WARN(1, "xen_smp_intr_init() for CPU %d failed: %d\n",
		     cpu, rc);
		return rc;
	}

	rc = xen_smp_intr_init_pv(cpu);
	if (rc) {
		WARN(1, "xen_smp_intr_init_pv() for CPU %d failed: %d\n",
		     cpu, rc);
		return rc;
	}

	return 0;
}

static int xen_cpu_dead_pv(unsigned int cpu)
{
	xen_smp_intr_free(cpu);
	xen_smp_intr_free_pv(cpu);

	xen_teardown_timer(cpu);

	return 0;
}

static uint32_t __init xen_platform_pv(void)
{
	if (xen_pv_domain())
		return xen_cpuid_base();

	return 0;
}

const __initconst struct hypervisor_x86 x86_hyper_xen_pv = {
	.name = "Xen PV",
	.detect = xen_platform_pv,
	.type = X86_HYPER_XEN_PV,
	.runtime.pin_vcpu = xen_pin_vcpu,
	.ignore_nopv = true,
};