License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If the file was under a */uapi/* path, the identifier was "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                            # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
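
The commit message above notes that header and source .c files need different comment types for the SPDX tag. The following is a minimal sketch of that distinction only, not the script Thomas and Greg actually used. The function name `spdx_tag_line` is hypothetical, and the sketch assumes the kernel convention of a `//` comment for .c files and a `/* */` comment for .h files; every other file type is rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the kernel tree): format the SPDX tag
 * line for 'path' into 'buf'. Returns 0 on success, -1 if the file type
 * is not handled by this sketch or 'buf' is too small.
 */
int spdx_tag_line(const char *path, const char *license,
		  char *buf, size_t len)
{
	const char *ext;
	int n;

	assert(path && license && buf);

	/* Use the last '.' in the path to find the extension. */
	ext = strrchr(path, '.');
	if (!ext)
		return -1;

	if (strcmp(ext, ".c") == 0)		/* C source: // comment */
		n = snprintf(buf, len, "// SPDX-License-Identifier: %s",
			     license);
	else if (strcmp(ext, ".h") == 0)	/* C header: block comment */
		n = snprintf(buf, len, "/* SPDX-License-Identifier: %s */",
			     license);
	else
		return -1;

	/* Treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with a path ending in .c and the license "GPL-2.0", the helper yields the same form as the first line of this file.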
2017-11-01 22:07:57 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2012-05-22 10:50:07 +08:00
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
|
|
|
|
2008-03-11 06:28:04 +08:00
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/smp.h>
|
2009-02-28 05:25:28 +08:00
|
|
|
#include <linux/prctl.h>
|
2008-03-11 06:28:04 +08:00
|
|
|
#include <linux/slab.h>
|
|
|
|
#include <linux/sched.h>
|
2017-02-01 23:36:40 +08:00
|
|
|
#include <linux/sched/idle.h>
|
2017-02-09 01:51:35 +08:00
|
|
|
#include <linux/sched/debug.h>
|
2017-02-09 01:51:36 +08:00
|
|
|
#include <linux/sched/task.h>
|
2017-02-09 01:51:37 +08:00
|
|
|
#include <linux/sched/task_stack.h>
|
2016-07-14 08:18:56 +08:00
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/export.h>
|
2008-04-25 23:39:01 +08:00
|
|
|
#include <linux/pm.h>
|
2015-04-03 08:01:28 +08:00
|
|
|
#include <linux/tick.h>
|
2009-05-12 10:05:28 +08:00
|
|
|
#include <linux/random.h>
|
2009-09-19 14:40:22 +08:00
|
|
|
#include <linux/user-return-notifier.h>
|
2009-12-08 16:29:42 +08:00
|
|
|
#include <linux/dmi.h>
|
|
|
|
#include <linux/utsname.h>
|
2012-03-26 05:00:04 +08:00
|
|
|
#include <linux/stackprotector.h>
|
|
|
|
#include <linux/cpuidle.h>
|
2018-11-22 10:04:09 +08:00
|
|
|
#include <linux/acpi.h>
|
|
|
|
#include <linux/elf-randomize.h>
|
2009-09-17 22:11:28 +08:00
|
|
|
#include <trace/events/power.h>
|
2009-09-10 01:22:48 +08:00
|
|
|
#include <linux/hw_breakpoint.h>
|
2011-01-20 22:42:52 +08:00
|
|
|
#include <asm/cpu.h>
|
2008-11-11 21:33:44 +08:00
|
|
|
#include <asm/apic.h>
|
2016-12-25 03:46:01 +08:00
|
|
|
#include <linux/uaccess.h>
|
sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df
For 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-15 13:37:34 +08:00
|
|
|
#include <asm/mwait.h>
|
2021-10-22 06:55:10 +08:00
|
|
|
#include <asm/fpu/api.h>
|
2021-10-15 09:16:20 +08:00
|
|
|
#include <asm/fpu/sched.h>
|
2021-10-22 06:55:22 +08:00
|
|
|
#include <asm/fpu/xstate.h>
|
2009-06-02 02:14:55 +08:00
|
|
|
#include <asm/debugreg.h>
|
2012-03-26 05:00:04 +08:00
|
|
|
#include <asm/nmi.h>
|
2014-10-25 06:58:07 +08:00
|
|
|
#include <asm/tlbflush.h>
|
2015-08-13 00:29:40 +08:00
|
|
|
#include <asm/mce.h>
|
2015-07-29 13:41:16 +08:00
|
|
|
#include <asm/vm86.h>
|
2016-08-14 00:38:18 +08:00
|
|
|
#include <asm/switch_to.h>
|
2017-02-21 00:56:14 +08:00
|
|
|
#include <asm/desc.h>
|
2017-03-20 16:16:26 +08:00
|
|
|
#include <asm/prctl.h>
|
2018-04-29 21:21:42 +08:00
|
|
|
#include <asm/spec-ctrl.h>
|
2019-11-12 06:03:21 +08:00
|
|
|
#include <asm/io_bitmap.h>
|
2018-11-22 10:04:09 +08:00
|
|
|
#include <asm/proto.h>
|
2020-09-15 01:04:22 +08:00
|
|
|
#include <asm/frame.h>
|
2021-10-22 22:53:02 +08:00
|
|
|
#include <asm/unwind.h>
|
x86/tdx: Add HLT support for TDX guests
The HLT instruction is a privileged instruction, executing it stops
instruction execution and places the processor in a HALT state. It
is used in kernel for cases like reboot, idle loop and exception fixup
handlers. For the idle case, interrupts will be enabled (using STI)
before the HLT instruction (this is also called safe_halt()).
To support the HLT instruction in TDX guests, it needs to be emulated
using TDVMCALL (hypercall to VMM). More details about it can be found
in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
In TDX guests, executing HLT instruction will generate a #VE, which is
used to emulate the HLT instruction. But #VE based emulation will not
work for the safe_halt() flavor, because it requires STI instruction to
be executed just before the TDCALL. Since idle loop is the only user of
safe_halt() variant, handle it as a special case.
To avoid *safe_halt() call in the idle function, define the
tdx_guest_idle() and use it to override the "x86_idle" function pointer
for a valid TDX guest.
Alternative choices like PV ops have been considered for adding
safe_halt() support. But it was rejected because HLT paravirt calls
only exist under PARAVIRT_XXL, and enabling it in TDX guest just for
safe_halt() use case is not worth the cost.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220405232939.73860-9-kirill.shutemov@linux.intel.com
2022-04-06 07:29:17 +08:00
|
|
|
#include <asm/tdx.h>
|
2012-03-26 05:00:04 +08:00
|
|
|
|
2018-11-26 02:33:47 +08:00
|
|
|
#include "process.h"
|
|
|
|
|
2012-05-03 17:03:01 +08:00
|
|
|
/*
|
|
|
|
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
|
|
|
|
* no more per-task TSS's. The TSS size is kept cacheline-aligned
|
|
|
|
* so they are allowed to end up in the .data..cacheline_aligned
|
|
|
|
* section. Since TSS's are completely CPU-local, we want them
|
|
|
|
* on exact cacheline boundaries, to eliminate cacheline ping-pong.
|
|
|
|
*/
|
2018-01-04 04:39:52 +08:00
|
|
|
__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
|
2015-03-06 11:19:06 +08:00
|
|
|
.x86_tss = {
|
2017-11-02 15:59:13 +08:00
|
|
|
/*
|
|
|
|
* .sp0 is only used when entering ring 0 from a lower
|
|
|
|
* privilege level. Since the init task never runs anything
|
|
|
|
* but ring 0 code, there is no need for a valid value here.
|
|
|
|
* Poison it.
|
|
|
|
*/
|
|
|
|
.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
|
2017-12-04 22:07:21 +08:00
|
|
|
|
2021-01-26 01:34:29 +08:00
|
|
|
#ifdef CONFIG_X86_32
|
2017-12-04 22:07:21 +08:00
|
|
|
.sp1 = TOP_OF_INIT_STACK,
|
|
|
|
|
2015-03-06 11:19:06 +08:00
|
|
|
.ss0 = __KERNEL_DS,
|
|
|
|
.ss1 = __KERNEL_CS,
|
|
|
|
#endif
|
2019-11-12 06:03:20 +08:00
|
|
|
.io_bitmap_base = IO_BITMAP_OFFSET_INVALID,
|
2015-03-06 11:19:06 +08:00
|
|
|
},
|
|
|
|
};
|
2017-12-04 22:07:29 +08:00
|
|
|
EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
|
2012-05-03 17:03:01 +08:00
|
|
|
|
2017-02-22 23:36:16 +08:00
|
|
|
DEFINE_PER_CPU(bool, __tss_limit_invalid);
|
|
|
|
EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
|
2017-02-21 00:56:14 +08:00
|
|
|
|
2012-05-17 06:03:51 +08:00
|
|
|
/*
|
|
|
|
* this gets called so that we can store lazy state into memory and copy the
|
|
|
|
* current task into the new thread.
|
|
|
|
*/
|
2008-03-11 06:28:04 +08:00
|
|
|
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
|
|
|
|
{
|
2015-07-17 18:28:12 +08:00
|
|
|
memcpy(dst, src, arch_task_struct_size);
|
2015-10-31 13:42:46 +08:00
|
|
|
#ifdef CONFIG_VM86
|
|
|
|
dst->thread.vm86 = NULL;
|
|
|
|
#endif
|
2021-10-13 22:55:43 +08:00
|
|
|
/* Drop the copied pointer to current's fpstate */
|
|
|
|
dst->thread.fpu.fpstate = NULL;
|
2021-10-22 06:55:22 +08:00
|
|
|
|
2021-10-15 09:16:04 +08:00
|
|
|
return 0;
|
2008-03-11 06:28:04 +08:00
|
|
|
}
|
2008-04-25 23:39:01 +08:00
|
|
|
|
2021-10-22 06:55:22 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
void arch_release_task_struct(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
if (fpu_state_size_dynamic())
|
|
|
|
fpstate_free(&tsk->thread.fpu);
|
2008-03-11 06:28:04 +08:00
|
|
|
}
|
2021-10-22 06:55:22 +08:00
|
|
|
#endif
|
2008-04-25 23:39:01 +08:00
|
|
|
|
2009-02-28 05:25:28 +08:00
|
|
|
/*
|
x86/ioperm: Prevent a memory leak when fork fails
In the copy_process() routine called by _do_fork(), failure to allocate
a PID (or further along in the function) will trigger an invocation to
exit_thread(). This is done to clean up from an earlier call to
copy_thread_tls(). Naturally, the child task is passed into exit_thread(),
however during the process, io_bitmap_exit() nullifies the parent's
io_bitmap rather than the child's.
As copy_thread_tls() has been called ahead of the failure, the reference
count on the calling thread's io_bitmap is incremented as we would expect.
However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
it should trash the current thread's io_bitmap reference rather than the
child's. This is pretty sneaky in practice, because in all instances but
this one, exit_thread() is called with respect to the current task and
everything works out.
A determined attacker can issue an appropriate ioctl (e.g. KDENABIO) to
get a bitmap allocated, and force a clone3() syscall to fail by passing
in a zeroed clone_args structure. The kernel handles the erroneous struct
and the buggy code path is followed, and even though the parent's reference
to the io_bitmap is trashed, the child still holds a reference and thus
the structure will never be freed.
Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
task_struct argument to operate on.
Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped")
Signed-off-by: Jay Lang <jaytlang@mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable#@vger.kernel.org
Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-25 00:27:39 +08:00
|
|
|
* Free thread data structures etc..
|
2009-02-28 05:25:28 +08:00
|
|
|
*/
|
2016-05-21 08:00:20 +08:00
|
|
|
void exit_thread(struct task_struct *tsk)
|
2009-02-28 05:25:28 +08:00
|
|
|
{
|
2016-05-21 08:00:20 +08:00
|
|
|
struct thread_struct *t = &tsk->thread;
|
2015-04-23 18:33:50 +08:00
|
|
|
struct fpu *fpu = &t->fpu;
|
2019-11-12 06:03:24 +08:00
|
|
|
|
|
|
|
if (test_thread_flag(TIF_IO_BITMAP))
|
2020-05-25 00:27:39 +08:00
|
|
|
io_bitmap_exit(tsk);
|
2012-05-17 06:03:54 +08:00
|
|
|
|
2015-07-29 13:41:16 +08:00
|
|
|
free_vm86(t);
|
|
|
|
|
x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.
drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:
- such as in sys_exit()'s exit_thread() where we know this task
won't execute any user-space instructions anymore and the
next context switch cleans up the FPU. The old FPU state
might still be around in the eagerfpu case but won't be
saved.
- in __restore_xstate_sig(), where we use drop_fpu() before
copying a new state into the fpstate and activating that one.
No user-space instructions can execute between those steps.
- in sys_execve()'s fpu__clear(): there we use drop_fpu() in
the !eagerfpu case, where it's equivalent to a full reinit.
fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:
- handle_signal() uses fpu_reset_state() to reset the FPU state
to init before executing a user-space signal handler. While we
have already saved the original FPU state at this point, and
always restore the original state, the signal handling code
still has to do this reinit, because signals may interrupt
any user-space instruction, and the FPU might be in various
intermediate states (such as an unbalanced x87 stack) that is
not immediately usable for general C signal handler code.
- __restore_xstate_sig() uses fpu_reset_state() when the signal
frame has no FP context. Since the signal handler may have
modified the FPU state, it gets reset back to init state.
- in another branch __restore_xstate_sig() uses fpu_reset_state()
to handle a restoration error: when restore_user_xstate() fails
to restore FPU state and we might have inconsistent FPU data,
fpu_reset_state() is used to reset it back to a known good
state.
- __kernel_fpu_end() uses fpu_reset_state() in an error branch.
This is in a 'must not trigger' error branch, so on bug-free
kernels this never triggers.
- fpu__restore() uses fpu_reset_state() in an error path
as well: if the fpstate was set up with invalid FPU state
(via ptrace or via a signal handler), then it's reset back
to init state.
- likewise, the scheduler's switch_fpu_finish() uses it in a
restoration error path too.
Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:
fpu__drop()
fpu__reset()
This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().
Also add comments to explain what each function does.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-30 01:04:31 +08:00
|
|
|
fpu__drop(fpu);
|
2009-02-28 05:25:28 +08:00
|
|
|
}
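
The "x86/ioperm: Prevent a memory leak when fork fails" message describes why io_bitmap_exit() has to take the task it operates on. The following user-space model is a sketch of that idea only. The types and the names `bitmap_share` and `bitmap_exit` are hypothetical stand-ins, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct io_bitmap {
	int refcnt;
};

struct task {
	struct io_bitmap *io_bitmap;
};

/* The child takes an extra reference on the parent's bitmap, if any. */
void bitmap_share(struct task *parent, struct task *child)
{
	if (parent->io_bitmap) {
		parent->io_bitmap->refcnt++;
		child->io_bitmap = parent->io_bitmap;
	}
}

/*
 * Drop the reference held by 'tsk'. Taking the task as an argument is
 * the point of the fix: the caller may be cleaning up a child that
 * never ran, so "the current task" would be the wrong target.
 */
void bitmap_exit(struct task *tsk)
{
	struct io_bitmap *iobm = tsk->io_bitmap;

	if (!iobm)
		return;

	tsk->io_bitmap = NULL;
	assert(iobm->refcnt > 0);
	if (--iobm->refcnt == 0)
		free(iobm);
}
```

In this model, cleaning up after a failed fork means calling `bitmap_exit()` on the child: the parent keeps its pointer and the reference count returns to its value before the fork attempt.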
|
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
static int set_new_tls(struct task_struct *p, unsigned long tls)
{
	struct user_desc __user *utls = (struct user_desc __user *)tls;

	if (in_ia32_syscall())
		return do_set_thread_area(p, -1, utls, 0);
	else
		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
}
|
|
|
|
|
2022-04-09 07:07:50 +08:00
|
|
|
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
|
2019-11-12 06:03:16 +08:00
|
|
|
{
|
2022-04-09 07:07:50 +08:00
|
|
|
unsigned long clone_flags = args->flags;
|
|
|
|
unsigned long sp = args->stack;
|
|
|
|
unsigned long tls = args->tls;
|
2019-11-12 06:03:16 +08:00
|
|
|
struct inactive_task_frame *frame;
|
|
|
|
struct fork_frame *fork_frame;
|
|
|
|
struct pt_regs *childregs;
|
2019-11-12 06:03:25 +08:00
|
|
|
int ret = 0;
|
2019-11-12 06:03:16 +08:00
|
|
|
|
|
|
|
childregs = task_pt_regs(p);
|
|
|
|
fork_frame = container_of(childregs, struct fork_frame, regs);
|
|
|
|
frame = &fork_frame->frame;
|
|
|
|
|
2020-09-15 01:04:22 +08:00
|
|
|
frame->bp = encode_frame_pointer(childregs);
|
2019-11-12 06:03:16 +08:00
|
|
|
frame->ret_addr = (unsigned long) ret_from_fork;
|
|
|
|
p->thread.sp = (unsigned long) fork_frame;
|
2019-11-12 06:03:21 +08:00
|
|
|
p->thread.io_bitmap = NULL;
|
2021-09-17 17:20:04 +08:00
|
|
|
p->thread.iopl_warn = 0;
|
2019-11-12 06:03:16 +08:00
|
|
|
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
|
|
|
|
|
|
|
|
#ifdef CONFIG_X86_64
|
2020-05-29 04:13:53 +08:00
|
|
|
current_save_fsgs();
|
|
|
|
p->thread.fsindex = current->thread.fsindex;
|
|
|
|
p->thread.fsbase = current->thread.fsbase;
|
|
|
|
p->thread.gsindex = current->thread.gsindex;
|
|
|
|
p->thread.gsbase = current->thread.gsbase;
|
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
savesegment(es, p->thread.es);
|
|
|
|
savesegment(ds, p->thread.ds);
|
|
|
|
#else
|
|
|
|
p->thread.sp0 = (unsigned long) (childregs + 1);
|
2022-03-25 23:39:52 +08:00
|
|
|
savesegment(gs, p->thread.gs);
|
2019-11-12 06:03:16 +08:00
|
|
|
/*
|
|
|
|
* Clear all status flags including IF and set fixed bit. 64bit
|
|
|
|
* does not have this initialization as the frame does not contain
|
|
|
|
* flags. The flags consistency (especially vs. AC) is there
|
|
|
|
* ensured via objtool, which lacks 32bit support.
|
|
|
|
*/
|
|
|
|
frame->flags = X86_EFLAGS_FIXED;
|
|
|
|
#endif
|
|
|
|
|
2022-04-12 23:18:48 +08:00
|
|
|
fpu_clone(p, clone_flags, args->fn);
|
2021-10-15 09:16:04 +08:00
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
/* Kernel thread ? */
|
2021-05-05 19:03:10 +08:00
|
|
|
if (unlikely(p->flags & PF_KTHREAD)) {
|
2021-06-23 20:02:18 +08:00
|
|
|
p->thread.pkru = pkru_get_init_value();
|
2019-11-12 06:03:16 +08:00
|
|
|
memset(childregs, 0, sizeof(struct pt_regs));
|
2022-04-12 23:18:48 +08:00
|
|
|
kthread_frame_init(frame, args->fn, args->fn_arg);
|
2019-11-12 06:03:16 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-06-23 20:02:18 +08:00
|
|
|
/*
|
|
|
|
* Clone current's PKRU value from hardware. tsk->thread.pkru
|
|
|
|
* is only valid when scheduled out.
|
|
|
|
*/
|
|
|
|
p->thread.pkru = read_pkru();
|
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
frame->bx = 0;
|
|
|
|
*childregs = *current_pt_regs();
|
|
|
|
childregs->ax = 0;
|
|
|
|
if (sp)
|
|
|
|
childregs->sp = sp;
|
|
|
|
|
2022-04-12 23:18:48 +08:00
|
|
|
if (unlikely(args->fn)) {
|
2021-05-05 19:03:10 +08:00
|
|
|
/*
|
2022-04-12 23:18:48 +08:00
|
|
|
* A user space thread, but it doesn't return to
|
|
|
|
* ret_after_fork().
|
2021-05-05 19:03:10 +08:00
|
|
|
*
|
|
|
|
* In order to indicate that to tools like gdb,
|
|
|
|
* we reset the stack and instruction pointers.
|
|
|
|
*
|
|
|
|
* It does the same kernel frame setup to return to a kernel
|
|
|
|
* function that a kernel thread does.
|
|
|
|
*/
|
|
|
|
childregs->sp = 0;
|
|
|
|
childregs->ip = 0;
|
2022-04-12 23:18:48 +08:00
|
|
|
kthread_frame_init(frame, args->fn, args->fn_arg);
|
2021-05-05 19:03:10 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
/* Set a new TLS for the child thread? */
|
2019-11-12 06:03:25 +08:00
|
|
|
if (clone_flags & CLONE_SETTLS)
|
2019-11-12 06:03:16 +08:00
|
|
|
ret = set_new_tls(p, tls);
|
2019-11-12 06:03:25 +08:00
|
|
|
|
|
|
|
if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
|
|
|
|
io_bitmap_share(p);
|
|
|
|
|
2019-11-12 06:03:16 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-06-23 20:02:13 +08:00
|
|
|
static void pkru_flush_thread(void)
{
	/*
	 * If PKRU is enabled the default PKRU value has to be loaded into
	 * the hardware right here (similar to context switch).
	 */
	pkru_write_default();
}
|
|
|
|
|
2009-02-28 05:25:28 +08:00
|
|
|
void flush_thread(void)
|
|
|
|
{
|
|
|
|
struct task_struct *tsk = current;
|
|
|
|
|
2009-09-10 01:22:48 +08:00
|
|
|
flush_ptrace_hw_breakpoint(tsk);
|
2009-02-28 05:25:28 +08:00
|
|
|
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
|
2015-01-20 02:52:12 +08:00
|
|
|
|
2021-06-23 20:02:12 +08:00
|
|
|
fpu_flush_thread();
|
2021-06-23 20:02:13 +08:00
|
|
|
pkru_flush_thread();
|
2009-02-28 05:25:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void disable_TSC(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (!test_and_set_thread_flag(TIF_NOTSC))
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOTSC in the current running context.
|
|
|
|
*/
|
2017-02-14 16:11:04 +08:00
|
|
|
cr4_set_bits(X86_CR4_TSD);
|
2009-02-28 05:25:28 +08:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
static void enable_TSC(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
if (test_and_clear_thread_flag(TIF_NOTSC))
|
|
|
|
/*
|
|
|
|
* Must flip the CPU state synchronously with
|
|
|
|
* TIF_NOTSC in the current running context.
|
|
|
|
*/
|
2017-02-14 16:11:04 +08:00
|
|
|
cr4_clear_bits(X86_CR4_TSD);
|
2009-02-28 05:25:28 +08:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
int get_tsc_mode(unsigned long adr)
{
	unsigned int val;

	if (test_thread_flag(TIF_NOTSC))
		val = PR_TSC_SIGSEGV;
	else
		val = PR_TSC_ENABLE;

	return put_user(val, (unsigned int __user *)adr);
}
|
|
|
|
|
|
|
|
int set_tsc_mode(unsigned int val)
{
	if (val == PR_TSC_SIGSEGV)
		disable_TSC();
	else if (val == PR_TSC_ENABLE)
		enable_TSC();
	else
		return -EINVAL;

	return 0;
}
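
get_tsc_mode() and set_tsc_mode() are reached from user space through the PR_GET_TSC and PR_SET_TSC options of prctl(2). The following sketch shows the user-space side on Linux. The helper names `tsc_mode_name` and `query_tsc_mode` are hypothetical, and the sketch assumes an architecture where these prctl options are implemented; elsewhere the call fails and the helper returns -1.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: human-readable name for a PR_TSC_* value. */
const char *tsc_mode_name(int mode)
{
	switch (mode) {
	case PR_TSC_ENABLE:
		return "enabled";
	case PR_TSC_SIGSEGV:
		return "sigsegv";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: ask the kernel for the calling thread's TSC mode.
 * Returns PR_TSC_ENABLE or PR_TSC_SIGSEGV, or -1 if prctl() fails
 * (for example on an architecture without PR_GET_TSC support).
 */
int query_tsc_mode(void)
{
	int mode = 0;

	/* The kernel stores the mode through the pointer in arg2. */
	if (prctl(PR_GET_TSC, &mode) != 0)
		return -1;

	/* get_tsc_mode() only ever reports one of these two values. */
	assert(mode == PR_TSC_ENABLE || mode == PR_TSC_SIGSEGV);
	return mode;
}
```

A caller could switch modes with `prctl(PR_SET_TSC, PR_TSC_SIGSEGV)`, after which, per set_tsc_mode() and disable_TSC(), CR4.TSD is set for the thread so that reading the timestamp counter faults.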
|
|
|
|
|
2017-03-20 16:16:26 +08:00
|
|
|
DEFINE_PER_CPU(u64, msr_misc_features_shadow);
|
|
|
|
|
|
|
|
static void set_cpuid_faulting(bool on)
{
	u64 msrval;

	msrval = this_cpu_read(msr_misc_features_shadow);
	msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
	msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT);
	this_cpu_write(msr_misc_features_shadow, msrval);
	wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);
}
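
set_cpuid_faulting() updates one bit of a cached MSR value: it clears the bit, then ORs in the new state, so the result does not depend on the previous state. The following standalone sketch shows only that bit arithmetic. The macro and function names are hypothetical, and the bit position 0 is an assumption made for illustration; the real value comes from the kernel's MSR definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define CPUID_FAULT_BIT	0
#define CPUID_FAULT	(UINT64_C(1) << CPUID_FAULT_BIT)

/*
 * Hypothetical helper: return 'shadow' with the CPUID-fault bit forced
 * to 'on', leaving all other bits untouched.
 */
uint64_t with_cpuid_fault(uint64_t shadow, bool on)
{
	shadow &= ~CPUID_FAULT;				/* clear old state */
	shadow |= (uint64_t)on << CPUID_FAULT_BIT;	/* set new state */
	return shadow;
}
```

Because the bit is cleared before it is set, applying the helper twice with the same argument gives the same value as applying it once.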
|
|
|
|
|
|
|
|
static void disable_cpuid(void)
{
	preempt_disable();
	if (!test_and_set_thread_flag(TIF_NOCPUID)) {
		/*
		 * Must flip the CPU state synchronously with
		 * TIF_NOCPUID in the current running context.
		 */
		set_cpuid_faulting(true);
	}
	preempt_enable();
}
|
|
|
|
|
|
|
|
static void enable_cpuid(void)
{
	preempt_disable();
	if (test_and_clear_thread_flag(TIF_NOCPUID)) {
		/*
		 * Must flip the CPU state synchronously with
		 * TIF_NOCPUID in the current running context.
		 */
		set_cpuid_faulting(false);
	}
	preempt_enable();
}
|
|
|
|
|
|
|
|
static int get_cpuid_mode(void)
{
	return !test_thread_flag(TIF_NOCPUID);
}
|
|
|
|
|
2022-05-12 20:04:08 +08:00
|
|
|
static int set_cpuid_mode(unsigned long cpuid_enabled)
|
2017-03-20 16:16:26 +08:00
|
|
|
{
|
2019-03-30 02:52:59 +08:00
|
|
|
if (!boot_cpu_has(X86_FEATURE_CPUID_FAULT))
|
2017-03-20 16:16:26 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
if (cpuid_enabled)
|
|
|
|
enable_cpuid();
|
|
|
|
else
|
|
|
|
disable_cpuid();
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Called immediately after a successful exec.
|
|
|
|
*/
|
|
|
|
void arch_setup_new_exec(void)
|
|
|
|
{
|
|
|
|
/* If cpuid was previously disabled for this task, re-enable it. */
|
|
|
|
if (test_thread_flag(TIF_NOCPUID))
|
|
|
|
enable_cpuid();
|
2019-01-17 06:01:36 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Don't inherit TIF_SSBD across exec boundary when
|
|
|
|
* PR_SPEC_DISABLE_NOEXEC is used.
|
|
|
|
*/
|
|
|
|
if (test_thread_flag(TIF_SSBD) &&
|
|
|
|
task_spec_ssb_noexec(current)) {
|
|
|
|
clear_thread_flag(TIF_SSBD);
|
|
|
|
task_clear_spec_ssb_disable(current);
|
|
|
|
task_clear_spec_ssb_noexec(current);
|
2021-11-29 21:06:53 +08:00
|
|
|
speculation_ctrl_update(read_thread_flags());
|
2019-01-17 06:01:36 +08:00
|
|
|
}
|
2017-03-20 16:16:26 +08:00
|
|
|
}
|
|
|
|
|
2019-11-13 04:40:33 +08:00
|
|
|
#ifdef CONFIG_X86_IOPL_IOPERM
|
2019-11-12 06:03:23 +08:00
|
|
|
static inline void switch_to_bitmap(unsigned long tifp)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Invalidate I/O bitmap if the previous task used it. This prevents
|
|
|
|
* any possible leakage of an active I/O bitmap.
|
|
|
|
*
|
|
|
|
* If the next task has an I/O bitmap it will handle it on exit to
|
|
|
|
* user mode.
|
|
|
|
*/
|
|
|
|
if (tifp & _TIF_IO_BITMAP)
|
2020-07-18 07:53:55 +08:00
|
|
|
tss_invalidate_io_bitmap();
|
2019-11-12 06:03:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
|
2019-11-12 06:03:22 +08:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Copy at least the byte range of the incoming task's bitmap which
|
|
|
|
* covers the permitted I/O ports.
|
|
|
|
*
|
|
|
|
* If the previous task which used an I/O bitmap had more bits
|
|
|
|
* permitted, then the copy needs to cover those as well so they
|
|
|
|
* get turned off.
|
|
|
|
*/
|
|
|
|
memcpy(tss->io_bitmap.bitmap, iobm->bitmap,
|
|
|
|
max(tss->io_bitmap.prev_max, iobm->max));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Store the new max and the sequence number of this bitmap
|
|
|
|
* and a pointer to the bitmap itself.
|
|
|
|
*/
|
|
|
|
tss->io_bitmap.prev_max = iobm->max;
|
|
|
|
tss->io_bitmap.prev_sequence = iobm->sequence;
|
|
|
|
}
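The reason for copying max(prev_max, max) bytes can be seen in a small userspace model. This is only a sketch: the struct names, the 16-byte bitmap size and the helper names are hypothetical and are not the kernel's definitions. As in the x86 I/O bitmap, a cleared bit means the port is permitted.

```c
#include <string.h>

#define MODEL_BITMAP_BYTES 16	/* hypothetical size, chosen for illustration */

/* Hypothetical stand-in for a task's I/O bitmap. */
struct model_task_bitmap {
	unsigned int max;				/* bytes covering permitted ports */
	unsigned char bitmap[MODEL_BITMAP_BYTES];	/* 0 bit = port permitted */
};

/* Hypothetical stand-in for the per-CPU TSS copy of the bitmap. */
struct model_tss_bitmap {
	unsigned int prev_max;				/* max of the last copied bitmap */
	unsigned char bitmap[MODEL_BITMAP_BYTES];
};

static unsigned int model_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Copy enough bytes to cover both the incoming task's permitted range and
 * the range the previous user had permitted, so stale permissions are
 * overwritten with the incoming task's "denied" bits.
 */
static void model_copy_io_bitmap(struct model_tss_bitmap *tss,
				 const struct model_task_bitmap *iobm)
{
	memcpy(tss->bitmap, iobm->bitmap, model_max(tss->prev_max, iobm->max));
	tss->prev_max = iobm->max;
}
```

If only iobm->max bytes were copied, a task with a small permitted range that follows a task with a larger one would inherit the larger task's cleared bits beyond its own range.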
|
|
|
|
|
2019-11-12 06:03:23 +08:00
|
|
|
/**
|
2022-04-14 14:21:10 +08:00
|
|
|
* native_tss_update_io_bitmap - Update I/O bitmap before exiting to user mode
|
2019-11-12 06:03:23 +08:00
|
|
|
*/
|
2020-02-18 23:47:12 +08:00
|
|
|
void native_tss_update_io_bitmap(void)
|
2017-02-14 16:11:02 +08:00
|
|
|
{
|
2018-11-26 02:33:47 +08:00
|
|
|
struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
|
2019-11-30 23:00:53 +08:00
|
|
|
struct thread_struct *t = &current->thread;
|
2019-11-12 06:03:28 +08:00
|
|
|
u16 *base = &tss->x86_tss.io_bitmap_base;
|
2018-11-26 02:33:47 +08:00
|
|
|
|
2019-11-30 23:00:53 +08:00
|
|
|
if (!test_thread_flag(TIF_IO_BITMAP)) {
|
2020-07-18 07:53:55 +08:00
|
|
|
native_tss_invalidate_io_bitmap();
|
2019-11-30 23:00:53 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_ENABLED(CONFIG_X86_IOPL_IOPERM) && t->iopl_emul == 3) {
|
|
|
|
*base = IO_BITMAP_OFFSET_VALID_ALL;
|
|
|
|
} else {
|
|
|
|
struct io_bitmap *iobm = t->io_bitmap;
|
|
|
|
|
2017-02-14 16:11:02 +08:00
|
|
|
/*
|
2019-11-30 23:00:53 +08:00
|
|
|
* Only copy bitmap data when the sequence number differs. The
|
|
|
|
* update time is accounted to the incoming task.
|
2017-02-14 16:11:02 +08:00
|
|
|
*/
|
2019-11-30 23:00:53 +08:00
|
|
|
if (tss->io_bitmap.prev_sequence != iobm->sequence)
|
|
|
|
tss_copy_io_bitmap(tss, iobm);
|
|
|
|
|
|
|
|
/* Enable the bitmap */
|
|
|
|
*base = IO_BITMAP_OFFSET_VALID_MAP;
|
2017-02-14 16:11:02 +08:00
|
|
|
}
|
2019-11-30 23:00:53 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that the TSS limit is covering the IO bitmap. It might have
|
|
|
|
* been cut down by a VMEXIT to 0x67 which would cause a subsequent I/O
|
|
|
|
* access from user space to trigger a #GP because the bitmap is outside
|
|
|
|
* the TSS limit.
|
|
|
|
*/
|
|
|
|
refresh_tss_limit();
|
2017-02-14 16:11:02 +08:00
|
|
|
}
|
2019-11-13 04:40:33 +08:00
|
|
|
#else /* CONFIG_X86_IOPL_IOPERM */
|
|
|
|
static inline void switch_to_bitmap(unsigned long tifp) { }
|
|
|
|
#endif
|
2017-02-14 16:11:02 +08:00
|
|
|
|
2018-05-10 03:53:09 +08:00
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
|
|
|
|
struct ssb_state {
|
|
|
|
struct ssb_state *shared_state;
|
|
|
|
raw_spinlock_t lock;
|
|
|
|
unsigned int disable_state;
|
|
|
|
unsigned long local_state;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define LSTATE_SSB 0
|
|
|
|
|
|
|
|
static DEFINE_PER_CPU(struct ssb_state, ssb_state);
|
|
|
|
|
|
|
|
void speculative_store_bypass_ht_init(void)
|
2018-04-29 21:21:42 +08:00
|
|
|
{
|
2018-05-10 03:53:09 +08:00
|
|
|
struct ssb_state *st = this_cpu_ptr(&ssb_state);
|
|
|
|
unsigned int this_cpu = smp_processor_id();
|
|
|
|
unsigned int cpu;
|
|
|
|
|
|
|
|
st->local_state = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Shared state setup happens once on the first bringup
|
|
|
|
* of the CPU. It's not destroyed on CPU hotunplug.
|
|
|
|
*/
|
|
|
|
if (st->shared_state)
|
|
|
|
return;
|
|
|
|
|
|
|
|
raw_spin_lock_init(&st->lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Go over HT siblings and check whether one of them has set up the
|
|
|
|
* shared state pointer already.
|
|
|
|
*/
|
|
|
|
for_each_cpu(cpu, topology_sibling_cpumask(this_cpu)) {
|
|
|
|
if (cpu == this_cpu)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!per_cpu(ssb_state, cpu).shared_state)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* Link it to the state of the sibling: */
|
|
|
|
st->shared_state = per_cpu(ssb_state, cpu).shared_state;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* First HT sibling to come up on the core. Link shared state of
|
|
|
|
* the first HT sibling to itself. The siblings on the same core
|
|
|
|
* which come up later will see the shared state pointer and link
|
2021-03-18 22:28:01 +08:00
|
|
|
* themselves to the state of this CPU.
|
2018-05-10 03:53:09 +08:00
|
|
|
*/
|
|
|
|
st->shared_state = st;
|
|
|
|
}
|
2018-04-29 21:21:42 +08:00
|
|
|
|
2018-05-10 03:53:09 +08:00
|
|
|
/*
|
|
|
|
* Logic is: First HT sibling enables SSBD for both siblings in the core
|
|
|
|
* and last sibling to disable it, disables it for the whole core. This is how
|
|
|
|
* MSR_SPEC_CTRL works in "hardware":
|
|
|
|
*
|
|
|
|
* CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
|
|
|
|
*/
|
|
|
|
static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
struct ssb_state *st = this_cpu_ptr(&ssb_state);
|
|
|
|
u64 msr = x86_amd_ls_cfg_base;
|
|
|
|
|
|
|
|
if (!static_cpu_has(X86_FEATURE_ZEN)) {
|
|
|
|
msr |= ssbd_tif_to_amd_ls_cfg(tifn);
|
2018-04-29 21:21:42 +08:00
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
2018-05-10 03:53:09 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (tifn & _TIF_SSBD) {
|
|
|
|
/*
|
|
|
|
* Since this can race with prctl(), block reentry on the
|
|
|
|
* same CPU.
|
|
|
|
*/
|
|
|
|
if (__test_and_set_bit(LSTATE_SSB, &st->local_state))
|
|
|
|
return;
|
|
|
|
|
|
|
|
msr |= x86_amd_ls_cfg_ssbd_mask;
|
|
|
|
|
|
|
|
raw_spin_lock(&st->shared_state->lock);
|
|
|
|
/* First sibling enables SSBD: */
|
|
|
|
if (!st->shared_state->disable_state)
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
st->shared_state->disable_state++;
|
|
|
|
raw_spin_unlock(&st->shared_state->lock);
|
2018-04-29 21:21:42 +08:00
|
|
|
} else {
|
2018-05-10 03:53:09 +08:00
|
|
|
if (!__test_and_clear_bit(LSTATE_SSB, &st->local_state))
|
|
|
|
return;
|
|
|
|
|
|
|
|
raw_spin_lock(&st->shared_state->lock);
|
|
|
|
st->shared_state->disable_state--;
|
|
|
|
if (!st->shared_state->disable_state)
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
raw_spin_unlock(&st->shared_state->lock);
|
2018-04-29 21:21:42 +08:00
|
|
|
}
|
|
|
|
}
|
2018-05-10 03:53:09 +08:00
|
|
|
#else
|
|
|
|
static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
u64 msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn);
|
|
|
|
|
|
|
|
wrmsrl(MSR_AMD64_LS_CFG, msr);
|
|
|
|
}
|
|
|
|
#endif
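The "first sibling enables, last sibling disables" rule described above can be modelled in plain C. This is a single-threaded sketch with hypothetical names; it leaves out the raw spinlock and the MSR write, and a boolean stands in for the SSBD bit of the core-wide MSR.

```c
#include <stdbool.h>

/* Hypothetical model of the state shared by the HT siblings of one core. */
struct model_core {
	unsigned int disable_count;	/* siblings currently requesting SSBD */
	bool ssbd_msr;			/* stands in for the SSBD bit in the core MSR */
};

/* Hypothetical model of one HT sibling. */
struct model_sibling {
	struct model_core *core;
	bool local_ssbd;		/* stands in for the LSTATE_SSB bit */
};

static void model_set_ssbd(struct model_sibling *s, bool want)
{
	if (want) {
		/* Already requested by this sibling: nothing to do. */
		if (s->local_ssbd)
			return;
		s->local_ssbd = true;
		/* First sibling to ask enables SSBD for the whole core. */
		if (s->core->disable_count == 0)
			s->core->ssbd_msr = true;
		s->core->disable_count++;
	} else {
		/* This sibling never requested SSBD: nothing to undo. */
		if (!s->local_ssbd)
			return;
		s->local_ssbd = false;
		s->core->disable_count--;
		/* Last sibling to drop the request disables it for the core. */
		if (s->core->disable_count == 0)
			s->core->ssbd_msr = false;
	}
}
```

The per-sibling flag keeps the counter balanced: a sibling can add at most one reference, so a repeated enable or disable request does not skew the count.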
|
|
|
|
|
2018-05-17 23:09:18 +08:00
|
|
|
static __always_inline void amd_set_ssb_virt_state(unsigned long tifn)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* SSBD has the same definition in SPEC_CTRL and VIRT_SPEC_CTRL,
|
|
|
|
* so ssbd_tif_to_spec_ctrl() just works.
|
|
|
|
*/
|
|
|
|
wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn));
|
|
|
|
}
|
|
|
|
|
2018-11-26 02:33:35 +08:00
|
|
|
/*
|
|
|
|
* Update the MSRs managing speculation control, during context switch.
|
|
|
|
*
|
|
|
|
* tifp: Previous task's thread flags
|
|
|
|
* tifn: Next task's thread flags
|
|
|
|
*/
|
|
|
|
static __always_inline void __speculation_ctrl_update(unsigned long tifp,
|
|
|
|
unsigned long tifn)
|
2018-05-10 03:53:09 +08:00
|
|
|
{
|
2018-11-26 02:33:46 +08:00
|
|
|
unsigned long tif_diff = tifp ^ tifn;
|
2018-11-26 02:33:35 +08:00
|
|
|
u64 msr = x86_spec_ctrl_base;
|
|
|
|
bool updmsr = false;
|
|
|
|
|
2019-04-15 01:51:06 +08:00
|
|
|
lockdep_assert_irqs_disabled();
|
|
|
|
|
2020-01-06 04:19:43 +08:00
|
|
|
/* Handle change of TIF_SSBD depending on the mitigation method. */
|
|
|
|
if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
|
|
|
|
if (tif_diff & _TIF_SSBD)
|
2018-11-26 02:33:35 +08:00
|
|
|
amd_set_ssb_virt_state(tifn);
|
2020-01-06 04:19:43 +08:00
|
|
|
} else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) {
|
|
|
|
if (tif_diff & _TIF_SSBD)
|
2018-11-26 02:33:35 +08:00
|
|
|
amd_set_core_ssb_state(tifn);
|
2020-01-06 04:19:43 +08:00
|
|
|
} else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
|
|
|
|
static_cpu_has(X86_FEATURE_AMD_SSBD)) {
|
|
|
|
updmsr |= !!(tif_diff & _TIF_SSBD);
|
|
|
|
msr |= ssbd_tif_to_spec_ctrl(tifn);
|
2018-11-26 02:33:35 +08:00
|
|
|
}
|
2018-05-10 03:53:09 +08:00
|
|
|
|
2020-01-06 04:19:43 +08:00
|
|
|
/* Only evaluate TIF_SPEC_IB if conditional STIBP is enabled. */
|
2018-11-26 02:33:46 +08:00
|
|
|
if (IS_ENABLED(CONFIG_SMP) &&
|
|
|
|
static_branch_unlikely(&switch_to_cond_stibp)) {
|
|
|
|
updmsr |= !!(tif_diff & _TIF_SPEC_IB);
|
|
|
|
msr |= stibp_tif_to_spec_ctrl(tifn);
|
|
|
|
}
|
|
|
|
|
2018-11-26 02:33:35 +08:00
|
|
|
if (updmsr)
|
2022-06-15 05:15:54 +08:00
|
|
|
write_spec_ctrl_current(msr, false);
|
2018-05-10 03:53:09 +08:00
|
|
|
}
|
|
|
|
|
2018-11-28 17:56:57 +08:00
|
|
|
static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)
|
2018-05-10 03:53:09 +08:00
|
|
|
{
|
2018-11-28 17:56:57 +08:00
|
|
|
if (test_and_clear_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE)) {
|
|
|
|
if (task_spec_ssb_disable(tsk))
|
|
|
|
set_tsk_thread_flag(tsk, TIF_SSBD);
|
|
|
|
else
|
|
|
|
clear_tsk_thread_flag(tsk, TIF_SSBD);
|
x86/speculation: Add prctl() control for indirect branch speculation
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
indirect branch speculation via STIBP and IBPB.
Invocations:
Check indirect branch speculation status with
- prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
Enable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
Disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
Force disable indirect branch speculation with
- prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
See Documentation/userspace-api/spec_ctrl.rst.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
2018-11-26 02:33:53 +08:00
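A userspace caller might wrap the status query from the commit message roughly as follows. This is a sketch, not part of the patch: the two helper functions are hypothetical, and the fallback defines are only there for older userspace headers (the values are the ones used in the uapi prctl header).

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL	52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH	1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL		(1UL << 0)
#define PR_SPEC_ENABLE		(1UL << 1)
#define PR_SPEC_DISABLE		(1UL << 2)
#define PR_SPEC_FORCE_DISABLE	(1UL << 3)
#endif

/* Hypothetical helper: query the indirect branch speculation status. */
static long query_indirect_branch_speculation(void)
{
	return prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
}

/* Hypothetical helper: turn the returned bit mask into a short description. */
static const char *describe_ib_state(long ret)
{
	if (ret < 0)
		return "unavailable (prctl failed)";
	if (ret == 0)
		return "not affected";
	if (!(ret & PR_SPEC_PRCTL))
		return "not controllable per task";
	if (ret & PR_SPEC_FORCE_DISABLE)
		return "force-disabled";
	if (ret & PR_SPEC_DISABLE)
		return "disabled";
	if (ret & PR_SPEC_ENABLE)
		return "enabled";
	return "unknown";
}
```

A caller could print describe_ib_state(query_indirect_branch_speculation()) before deciding whether to issue one of the PR_SET_SPECULATION_CTRL invocations listed above. The result depends on the running kernel and CPU, so no particular output is assumed here.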
|
|
|
|
|
|
|
if (task_spec_ib_disable(tsk))
|
|
|
|
set_tsk_thread_flag(tsk, TIF_SPEC_IB);
|
|
|
|
else
|
|
|
|
clear_tsk_thread_flag(tsk, TIF_SPEC_IB);
|
2018-11-28 17:56:57 +08:00
|
|
|
}
|
|
|
|
/* Return the updated threadinfo flags */
|
2021-11-29 21:06:53 +08:00
|
|
|
return read_task_thread_flags(tsk);
|
2018-05-10 03:53:09 +08:00
|
|
|
}
|
2018-04-29 21:21:42 +08:00
|
|
|
|
2018-11-26 02:33:34 +08:00
|
|
|
void speculation_ctrl_update(unsigned long tif)
|
2018-04-29 21:21:42 +08:00
|
|
|
{
|
2019-04-15 01:51:06 +08:00
|
|
|
unsigned long flags;
|
|
|
|
|
2018-11-26 02:33:35 +08:00
|
|
|
/* Forced update. Make sure all relevant TIF flags are different */
|
2019-04-15 01:51:06 +08:00
|
|
|
local_irq_save(flags);
|
2018-11-26 02:33:35 +08:00
|
|
|
__speculation_ctrl_update(~tif, tif);
|
2019-04-15 01:51:06 +08:00
|
|
|
local_irq_restore(flags);
|
2018-04-29 21:21:42 +08:00
|
|
|
}
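Passing ~tif as the "previous" flags works because the update path only looks at the XOR of the two flag words. The sketch below models that with hypothetical flag bit positions and helper names; it is not the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only. */
#define MODEL_TIF_SSBD		(1UL << 5)
#define MODEL_TIF_SPEC_IB	(1UL << 9)

/* An update is needed when a relevant flag differs between prev and next. */
static bool model_needs_msr_write(unsigned long tifp, unsigned long tifn)
{
	unsigned long tif_diff = tifp ^ tifn;

	return (tif_diff & (MODEL_TIF_SSBD | MODEL_TIF_SPEC_IB)) != 0;
}

/*
 * Forced update: ~tif ^ tif has every bit set, so every "did this flag
 * change" test succeeds regardless of the actual flag values.
 */
static bool model_forced_update(unsigned long tif)
{
	return model_needs_msr_write(~tif, tif);
}
```

With identical previous and next flags the model reports no write; with the inverted word it always reports one, which matches the "make sure all relevant TIF flags are different" comment.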
|
|
|
|
|
2018-11-28 17:56:57 +08:00
|
|
|
/* Called from seccomp/prctl update */
|
|
|
|
void speculation_ctrl_update_current(void)
|
|
|
|
{
|
|
|
|
preempt_disable();
|
|
|
|
speculation_ctrl_update(speculation_ctrl_update_tif(current));
|
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
2020-04-21 17:20:29 +08:00
|
|
|
static inline void cr4_toggle_bits_irqsoff(unsigned long mask)
|
|
|
|
{
|
|
|
|
unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);
|
|
|
|
|
|
|
|
newval = cr4 ^ mask;
|
|
|
|
if (newval != cr4) {
|
|
|
|
this_cpu_write(cpu_tlbstate.cr4, newval);
|
|
|
|
__write_cr4(newval);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-11-26 02:33:47 +08:00
|
|
|
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
|
2009-02-28 05:25:28 +08:00
|
|
|
{
|
2017-02-14 16:11:02 +08:00
|
|
|
unsigned long tifp, tifn;
|
2009-02-28 05:25:28 +08:00
|
|
|
|
2021-11-29 21:06:53 +08:00
|
|
|
tifn = read_task_thread_flags(next_p);
|
|
|
|
tifp = read_task_thread_flags(prev_p);
|
2019-11-12 06:03:23 +08:00
|
|
|
|
|
|
|
switch_to_bitmap(tifp);
|
2017-02-14 16:11:02 +08:00
|
|
|
|
|
|
|
propagate_user_return_notify(prev_p, next_p);
|
|
|
|
|
2017-02-14 16:11:03 +08:00
|
|
|
if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) &&
|
|
|
|
arch_has_block_step()) {
|
|
|
|
unsigned long debugctl, msk;
|
2010-03-25 21:51:51 +08:00
|
|
|
|
2017-02-14 16:11:03 +08:00
|
|
|
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
|
2010-03-25 21:51:51 +08:00
|
|
|
debugctl &= ~DEBUGCTLMSR_BTF;
|
2017-02-14 16:11:03 +08:00
|
|
|
msk = tifn & _TIF_BLOCKSTEP;
|
|
|
|
debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT;
|
|
|
|
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
|
2010-03-25 21:51:51 +08:00
|
|
|
}
|
2009-02-28 05:25:28 +08:00
|
|
|
|
2017-02-14 16:11:04 +08:00
|
|
|
if ((tifp ^ tifn) & _TIF_NOTSC)
|
2017-11-25 11:29:07 +08:00
|
|
|
cr4_toggle_bits_irqsoff(X86_CR4_TSD);
|
2017-03-20 16:16:26 +08:00
|
|
|
|
|
|
|
if ((tifp ^ tifn) & _TIF_NOCPUID)
|
|
|
|
set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
|
2018-04-29 21:21:42 +08:00
|
|
|
|
2018-11-28 17:56:57 +08:00
|
|
|
if (likely(!((tifp | tifn) & _TIF_SPEC_FORCE_UPDATE))) {
|
|
|
|
__speculation_ctrl_update(tifp, tifn);
|
|
|
|
} else {
|
|
|
|
speculation_ctrl_update_tif(prev_p);
|
|
|
|
tifn = speculation_ctrl_update_tif(next_p);
|
|
|
|
|
|
|
|
/* Enforce MSR update to ensure consistent state */
|
|
|
|
__speculation_ctrl_update(~tifn, tifn);
|
|
|
|
}
|
2009-02-28 05:25:28 +08:00
|
|
|
}
|
|
|
|
|
2008-06-10 00:35:28 +08:00
|
|
|
/*
|
|
|
|
* Idle related variables and functions
|
|
|
|
*/
|
2010-11-04 00:06:14 +08:00
|
|
|
unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
|
2008-06-10 00:35:28 +08:00
|
|
|
EXPORT_SYMBOL(boot_option_idle_override);
|
|
|
|
|
2013-02-10 10:45:03 +08:00
|
|
|
static void (*x86_idle)(void);
|
2008-06-10 00:35:28 +08:00
|
|
|
|
2012-03-26 05:00:04 +08:00
|
|
|
#ifndef CONFIG_SMP
|
|
|
|
static inline void play_dead(void)
|
|
|
|
{
|
|
|
|
BUG();
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2013-03-22 05:50:03 +08:00
|
|
|
void arch_cpu_idle_enter(void)
|
|
|
|
{
|
2016-12-13 21:14:17 +08:00
|
|
|
tsc_verify_tsc_adjust(false);
|
2013-03-22 05:50:03 +08:00
|
|
|
local_touch_nmi();
|
|
|
|
}
|
2012-03-26 05:00:04 +08:00
|
|
|
|
2013-03-22 05:50:03 +08:00
|
|
|
void arch_cpu_idle_dead(void)
|
|
|
|
{
|
|
|
|
play_dead();
|
|
|
|
}
|
2012-03-26 05:00:04 +08:00
|
|
|
|
2013-03-22 05:50:03 +08:00
|
|
|
/*
|
|
|
|
* Called from the generic idle code.
|
|
|
|
*/
|
|
|
|
void arch_cpu_idle(void)
|
|
|
|
{
|
2014-01-30 01:45:12 +08:00
|
|
|
x86_idle();
|
2012-03-26 05:00:04 +08:00
|
|
|
}
|
|
|
|
|
2008-06-10 00:35:28 +08:00
|
|
|
/*
|
2013-03-22 05:50:03 +08:00
|
|
|
* We use this if we don't have any better idle routine.
|
2008-06-10 00:35:28 +08:00
|
|
|
*/
|
2016-10-08 08:02:55 +08:00
|
|
|
void __cpuidle default_idle(void)
|
2008-06-10 00:35:28 +08:00
|
|
|
{
|
2020-11-20 18:50:35 +08:00
|
|
|
raw_safe_halt();
|
2008-06-10 00:35:28 +08:00
|
|
|
}
|
2019-07-04 07:51:25 +08:00
|
|
|
#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
|
2008-06-10 00:35:28 +08:00
|
|
|
EXPORT_SYMBOL(default_idle);
|
|
|
|
#endif
|
|
|
|
|
2013-02-10 12:08:07 +08:00
|
|
|
#ifdef CONFIG_XEN
|
|
|
|
bool xen_set_default_idle(void)
|
2011-11-22 07:02:02 +08:00
|
|
|
{
|
2013-02-10 10:45:03 +08:00
|
|
|
bool ret = !!x86_idle;
|
2011-11-22 07:02:02 +08:00
|
|
|
|
2013-02-10 10:45:03 +08:00
|
|
|
x86_idle = default_idle;
|
2011-11-22 07:02:02 +08:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
2013-02-10 12:08:07 +08:00
|
|
|
#endif
|
2017-07-18 05:10:28 +08:00
|
|
|
|
2022-03-08 23:30:47 +08:00
|
|
|
void __noreturn stop_this_cpu(void *dummy)
|
2008-11-11 21:33:44 +08:00
|
|
|
{
|
|
|
|
local_irq_disable();
|
|
|
|
/*
|
|
|
|
* Remove this CPU:
|
|
|
|
*/
|
2009-03-13 12:19:54 +08:00
|
|
|
set_cpu_online(smp_processor_id(), false);
|
2008-11-11 21:33:44 +08:00
|
|
|
disable_local_APIC();
|
2015-08-13 00:29:40 +08:00
|
|
|
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
|
2008-11-11 21:33:44 +08:00
|
|
|
|
2018-01-18 07:41:41 +08:00
|
|
|
/*
|
|
|
|
* Use wbinvd on processors that support SME. This provides support
|
|
|
|
* for performing a successful kexec when going from SME inactive
|
|
|
|
* to SME active (or vice-versa). The cache must be cleared so that
|
|
|
|
* if there are entries with the same physical address, both with and
|
|
|
|
* without the encryption bit, they don't race each other when flushed
|
|
|
|
* and potentially end up with the wrong entry being committed to
|
|
|
|
* memory.
|
2022-02-16 11:44:46 +08:00
|
|
|
*
|
|
|
|
* Test the CPUID bit directly because the machine might've cleared
|
|
|
|
* X86_FEATURE_SME due to cmdline options.
|
2018-01-18 07:41:41 +08:00
|
|
|
*/
|
2022-02-16 11:44:46 +08:00
|
|
|
if (cpuid_eax(0x8000001f) & BIT(0))
|
2018-01-18 07:41:41 +08:00
|
|
|
native_wbinvd();
|
2017-07-18 05:10:28 +08:00
|
|
|
for (;;) {
|
|
|
|
/*
|
2018-01-18 07:41:41 +08:00
|
|
|
* Use native_halt() so that memory contents don't change
|
|
|
|
* (stack usage and variables) after possibly issuing the
|
|
|
|
* native_wbinvd() above.
|
2017-07-18 05:10:28 +08:00
|
|
|
*/
|
2018-01-18 07:41:41 +08:00
|
|
|
native_halt();
|
2017-07-18 05:10:28 +08:00
|
|
|
}
|
2008-04-25 23:39:01 +08:00
|
|
|
}
|
|
|
|
|
2008-06-10 01:15:00 +08:00
|
|
|
/*
|
2016-12-10 02:29:11 +08:00
|
|
|
* AMD Erratum 400 aware idle routine. We handle it the same way as C3 power
|
|
|
|
* states (local apic timer and TSC stop).
|
2020-11-20 18:50:35 +08:00
|
|
|
*
|
|
|
|
* XXX this function is completely buggered vs RCU and tracing.
|
2008-06-10 01:15:00 +08:00
|
|
|
*/
|
2011-04-02 04:59:53 +08:00
|
|
|
static void amd_e400_idle(void)
|
2008-06-10 01:15:00 +08:00
|
|
|
{
|
2016-12-10 02:29:11 +08:00
|
|
|
/*
|
|
|
|
* We cannot use static_cpu_has_bug() here because X86_BUG_AMD_APIC_C1E
|
|
|
|
* gets set after static_cpu_has() places have been converted via
|
|
|
|
* alternatives.
|
|
|
|
*/
|
|
|
|
if (!boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
|
|
|
|
default_idle();
|
|
|
|
return;
|
2008-06-10 01:15:00 +08:00
|
|
|
}
|
|
|
|
|
2016-12-10 02:29:11 +08:00
|
|
|
tick_broadcast_enter();
|
2008-06-10 01:15:00 +08:00
|
|
|
|
2016-12-10 02:29:11 +08:00
|
|
|
default_idle();
|
2008-06-17 15:12:03 +08:00
|
|
|
|
2016-12-10 02:29:11 +08:00
|
|
|
/*
|
|
|
|
* The switch back from broadcast mode needs to be called with
|
|
|
|
* interrupts disabled.
|
|
|
|
*/
|
2020-11-20 18:50:35 +08:00
|
|
|
raw_local_irq_disable();
|
2016-12-10 02:29:11 +08:00
|
|
|
tick_broadcast_exit();
|
2020-11-20 18:50:35 +08:00
|
|
|
raw_local_irq_enable();
|
2008-06-10 01:15:00 +08:00
|
|
|
}
|
|
|
|
|
sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df
For 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-15 13:37:34 +08:00
|
|
|
/*
|
2022-06-07 02:03:35 +08:00
|
|
|
* Prefer MWAIT over HALT if MWAIT is supported, MWAIT_CPUID leaf
|
|
|
|
* exists and whenever MONITOR/MWAIT extensions are present there is at
|
|
|
|
* least one C1 substate.
|
2014-01-15 13:37:34 +08:00
|
|
|
*
|
2022-06-07 02:03:35 +08:00
|
|
|
* Do not prefer MWAIT if MONITOR instruction has a bug or idle=nomwait
|
|
|
|
* is passed on the kernel command line.
|
2014-01-15 13:37:34 +08:00
|
|
|
*/
|
|
|
|
static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
|
|
|
|
{
|
2022-06-07 02:03:35 +08:00
|
|
|
u32 eax, ebx, ecx, edx;
|
|
|
|
|
2022-06-07 02:03:34 +08:00
|
|
|
/* User has disallowed the use of MWAIT. Fall back to HALT */
|
|
|
|
if (boot_option_idle_override == IDLE_NOMWAIT)
|
2014-01-15 13:37:34 +08:00
|
|
|
return 0;
|
|
|
|
|
2022-06-07 02:03:35 +08:00
|
|
|
/* MWAIT is not supported on this platform. Fall back to HALT */
|
|
|
|
if (!cpu_has(c, X86_FEATURE_MWAIT))
|
2014-01-15 13:37:34 +08:00
|
|
|
return 0;
|
|
|
|
|
2022-06-07 02:03:35 +08:00
|
|
|
/* Monitor has a bug. Fallback to HALT */
|
|
|
|
if (boot_cpu_has_bug(X86_BUG_MONITOR))
|
2014-01-15 13:37:34 +08:00
|
|
|
return 0;
|
|
|
|
|
2022-06-07 02:03:35 +08:00
|
|
|
cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If MWAIT extensions are not available, it is safe to use MWAIT
|
|
|
|
* with EAX=0, ECX=0.
|
|
|
|
*/
|
|
|
|
if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If MWAIT extensions are available, there should be at least one
|
|
|
|
* MWAIT C1 substate present.
|
|
|
|
*/
|
|
|
|
return (edx & MWAIT_C1_SUBSTATE_MASK);
|
2014-01-15 13:37:34 +08:00
|
|
|
}
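A minimal user-space sketch of the decision made by the CPUID leaf 5 check above. It takes the ECX and EDX values that the leaf would return instead of executing CPUID itself. The `sketch_` names are hypothetical, and the two mask values are assumptions meant to mirror `CPUID5_ECX_EXTENSIONS_SUPPORTED` and `MWAIT_C1_SUBSTATE_MASK`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, intended to mirror the kernel's mwait definitions. */
#define SKETCH_CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1u
#define SKETCH_MWAIT_C1_SUBSTATE_MASK          0xf0u

/*
 * Hypothetical helper: decide whether MWAIT C1 looks usable, given the
 * ECX and EDX register values reported by CPUID leaf 5.
 */
static bool sketch_mwait_c1_usable(uint32_t ecx, uint32_t edx)
{
	/* No MWAIT extensions: plain MWAIT with EAX=0, ECX=0 is safe. */
	if (!(ecx & SKETCH_CPUID5_ECX_EXTENSIONS_SUPPORTED))
		return true;

	/* Extensions present: require at least one C1 substate. */
	return (edx & SKETCH_MWAIT_C1_SUBSTATE_MASK) != 0;
}
```

With these assumed masks, `sketch_mwait_c1_usable(1, 0x10)` reports C1 as usable, while `sketch_mwait_c1_usable(1, 0x0f)` does not, because only bits 4..7 of EDX describe C1 substates.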
|
|
|
|
|
|
|
|
/*
|
2015-05-26 16:28:09 +08:00
|
|
|
* MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
|
|
|
|
* with interrupts enabled and no flags, which is backwards compatible with the
|
|
|
|
* original MWAIT implementation.
|
2014-01-15 13:37:34 +08:00
|
|
|
*/
|
2016-10-08 08:02:55 +08:00
|
|
|
static __cpuidle void mwait_idle(void)
|
2014-01-15 13:37:34 +08:00
|
|
|
{
|
2014-01-19 00:14:44 +08:00
|
|
|
if (!current_set_polling_and_test()) {
|
|
|
|
if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
|
2016-01-29 01:02:51 +08:00
|
|
|
mb(); /* quirk */
|
2014-01-15 13:37:34 +08:00
|
|
|
clflush((void *)&current_thread_info()->flags);
|
2016-01-29 01:02:51 +08:00
|
|
|
mb(); /* quirk */
|
2014-01-19 00:14:44 +08:00
|
|
|
}
|
2014-01-15 13:37:34 +08:00
|
|
|
|
|
|
|
__monitor((void *)&current_thread_info()->flags, 0, 0);
|
|
|
|
if (!need_resched())
|
|
|
|
__sti_mwait(0, 0);
|
|
|
|
else
|
2020-11-20 18:50:35 +08:00
|
|
|
raw_local_irq_enable();
|
2014-01-19 00:14:44 +08:00
|
|
|
} else {
|
2020-11-20 18:50:35 +08:00
|
|
|
raw_local_irq_enable();
|
2014-01-19 00:14:44 +08:00
|
|
|
}
|
|
|
|
__current_clr_polling();
|
2014-01-15 13:37:34 +08:00
|
|
|
}
|
|
|
|
|
x86: delete __cpuinit usage from all x86 files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up(), which are arch independent (kernel/cpu.c),
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/x86 uses of the __cpuinit macros from
all C files. x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-06-19 06:23:59 +08:00
|
|
|
void select_idle_routine(const struct cpuinfo_x86 *c)
|
2008-04-25 23:39:01 +08:00
|
|
|
{
|
2009-01-28 00:07:08 +08:00
|
|
|
#ifdef CONFIG_SMP
|
2013-03-22 05:50:03 +08:00
|
|
|
if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
|
2012-05-22 10:50:07 +08:00
|
|
|
pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
|
2008-04-25 23:39:01 +08:00
|
|
|
#endif
|
2013-03-22 05:50:03 +08:00
|
|
|
if (x86_idle || boot_option_idle_override == IDLE_POLL)
|
2008-06-09 22:59:53 +08:00
|
|
|
return;
|
|
|
|
|
2016-12-10 02:29:09 +08:00
|
|
|
if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
|
2012-05-22 10:50:07 +08:00
|
|
|
pr_info("using AMD E400 aware idle routine\n");
|
2013-02-10 10:45:03 +08:00
|
|
|
x86_idle = amd_e400_idle;
|
2014-01-15 13:37:34 +08:00
|
|
|
} else if (prefer_mwait_c1_over_halt(c)) {
|
|
|
|
pr_info("using mwait in idle threads\n");
|
|
|
|
x86_idle = mwait_idle;
|
x86/tdx: Add HLT support for TDX guests
The HLT instruction is a privileged instruction, executing it stops
instruction execution and places the processor in a HALT state. It
is used in kernel for cases like reboot, idle loop and exception fixup
handlers. For the idle case, interrupts will be enabled (using STI)
before the HLT instruction (this is also called safe_halt()).
To support the HLT instruction in TDX guests, it needs to be emulated
using TDVMCALL (hypercall to VMM). More details about it can be found
in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
In TDX guests, executing HLT instruction will generate a #VE, which is
used to emulate the HLT instruction. But #VE based emulation will not
work for the safe_halt() flavor, because it requires STI instruction to
be executed just before the TDCALL. Since idle loop is the only user of
safe_halt() variant, handle it as a special case.
To avoid *safe_halt() call in the idle function, define the
tdx_guest_idle() and use it to override the "x86_idle" function pointer
for a valid TDX guest.
Alternative choices like PV ops have been considered for adding
safe_halt() support. But it was rejected because HLT paravirt calls
only exist under PARAVIRT_XXL, and enabling it in TDX guest just for
safe_halt() use case is not worth the cost.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220405232939.73860-9-kirill.shutemov@linux.intel.com
2022-04-06 07:29:17 +08:00
|
|
|
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
|
|
|
|
pr_info("using TDX aware idle routine\n");
|
|
|
|
x86_idle = tdx_safe_halt;
|
2008-06-09 22:59:53 +08:00
|
|
|
} else
|
2013-02-10 10:45:03 +08:00
|
|
|
x86_idle = default_idle;
|
2008-04-25 23:39:01 +08:00
|
|
|
}
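The priority order used by select_idle_routine() above can be sketched as a pure function. This is only an illustration under stated assumptions: the struct, enum, and function names are hypothetical, and the boolean fields stand in for the kernel's real checks (`x86_idle`, `boot_option_idle_override`, `boot_cpu_has_bug()`, `prefer_mwait_c1_over_halt()`, `cpu_feature_enabled()`):

```c
#include <stdbool.h>

/* Hypothetical labels for the idle routines chosen above. */
enum sketch_idle_choice {
	SKETCH_KEEP_CURRENT,	/* early return, nothing changed */
	SKETCH_AMD_E400_IDLE,
	SKETCH_MWAIT_IDLE,
	SKETCH_TDX_SAFE_HALT,
	SKETCH_DEFAULT_IDLE,
};

/* Hypothetical snapshot of the conditions the real function tests. */
struct sketch_cpu_state {
	bool idle_already_set;	/* x86_idle is non-NULL */
	bool idle_poll;		/* idle=poll was given */
	bool bug_amd_e400;
	bool prefer_mwait_c1;
	bool tdx_guest;
};

static enum sketch_idle_choice sketch_select_idle(struct sketch_cpu_state s)
{
	if (s.idle_already_set || s.idle_poll)
		return SKETCH_KEEP_CURRENT;
	if (s.bug_amd_e400)
		return SKETCH_AMD_E400_IDLE;
	if (s.prefer_mwait_c1)
		return SKETCH_MWAIT_IDLE;
	if (s.tdx_guest)
		return SKETCH_TDX_SAFE_HALT;
	return SKETCH_DEFAULT_IDLE;
}
```

The point of the sketch is the ordering: the AMD E400 workaround wins over MWAIT, MWAIT wins over the TDX routine, and the HALT-based default is only the last resort.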
|
|
|
|
|
2016-12-10 02:29:11 +08:00
|
|
|
void amd_e400_c1e_apic_setup(void)
|
2009-03-17 12:20:34 +08:00
|
|
|
{
|
2016-12-10 02:29:11 +08:00
|
|
|
if (boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
|
|
|
|
pr_info("Switch to broadcast mode on CPU%d\n", smp_processor_id());
|
|
|
|
local_irq_disable();
|
|
|
|
tick_broadcast_force();
|
|
|
|
local_irq_enable();
|
|
|
|
}
|
2009-03-17 12:20:34 +08:00
|
|
|
}
|
|
|
|
|
2016-12-10 02:29:10 +08:00
|
|
|
void __init arch_post_acpi_subsys_init(void)
|
|
|
|
{
|
|
|
|
u32 lo, hi;
|
|
|
|
|
|
|
|
if (!boot_cpu_has_bug(X86_BUG_AMD_E400))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* AMD E400 detection needs to happen after ACPI has been enabled. If
|
|
|
|
* the machine is affected K8_INTP_C1E_ACTIVE_MASK bits are set in
|
|
|
|
* MSR_K8_INT_PENDING_MSG.
|
|
|
|
*/
|
|
|
|
rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
|
|
|
|
if (!(lo & K8_INTP_C1E_ACTIVE_MASK))
|
|
|
|
return;
|
|
|
|
|
|
|
|
boot_cpu_set_bug(X86_BUG_AMD_APIC_C1E);
|
|
|
|
|
|
|
|
if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
|
|
|
|
mark_tsc_unstable("TSC halt in AMD C1E");
|
|
|
|
pr_info("System has AMD C1E enabled\n");
|
|
|
|
}
|
|
|
|
|
2008-04-25 23:39:01 +08:00
|
|
|
static int __init idle_setup(char *str)
|
|
|
|
{
|
2008-07-05 19:53:36 +08:00
|
|
|
if (!str)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2008-04-25 23:39:01 +08:00
|
|
|
if (!strcmp(str, "poll")) {
|
2012-05-22 10:50:07 +08:00
|
|
|
pr_info("using polling idle threads\n");
|
2010-11-04 00:06:14 +08:00
|
|
|
boot_option_idle_override = IDLE_POLL;
|
2013-03-22 05:50:03 +08:00
|
|
|
cpu_idle_poll_ctrl(true);
|
2010-11-04 00:06:14 +08:00
|
|
|
} else if (!strcmp(str, "halt")) {
|
2008-06-24 17:58:53 +08:00
|
|
|
/*
|
|
|
|
* When the boot option of idle=halt is added, halt is
|
|
|
|
* forced to be used for CPU idle. In such case CPU C2/C3
|
|
|
|
* won't be used again.
|
|
|
|
* To continue to load the CPU idle driver, don't touch
|
|
|
|
* the boot_option_idle_override.
|
|
|
|
*/
|
2013-02-10 10:45:03 +08:00
|
|
|
x86_idle = default_idle;
|
2010-11-04 00:06:14 +08:00
|
|
|
boot_option_idle_override = IDLE_HALT;
|
2008-06-24 18:01:09 +08:00
|
|
|
} else if (!strcmp(str, "nomwait")) {
|
|
|
|
/*
|
|
|
|
* If the boot option of "idle=nomwait" is added,
|
2022-06-07 02:03:34 +08:00
|
|
|
* it means that mwait will be disabled for CPU C1/C2/C3
|
|
|
|
* states.
|
2008-06-24 18:01:09 +08:00
|
|
|
*/
|
2010-11-04 00:06:14 +08:00
|
|
|
boot_option_idle_override = IDLE_NOMWAIT;
|
2008-06-24 17:58:53 +08:00
|
|
|
} else
|
2008-04-25 23:39:01 +08:00
|
|
|
return -1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
early_param("idle", idle_setup);
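A minimal user-space sketch of the string matching done by idle_setup() above, assuming a simplified return convention: the selected override on success, `-EINVAL` for a missing value, and `-1` for an unrecognized one. The enum and function names are hypothetical, and the side effects of the real handler (setting `x86_idle`, calling `cpu_idle_poll_ctrl()`) are left out:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the IDLE_* override values. */
enum sketch_idle_override {
	SKETCH_IDLE_NO_OVERRIDE,
	SKETCH_IDLE_HALT,
	SKETCH_IDLE_NOMWAIT,
	SKETCH_IDLE_POLL,
};

/*
 * Map the value of an "idle=" option to an override.
 * Returns a non-negative enum value, -EINVAL if no value was given,
 * or -1 if the value is not recognized.
 */
static int sketch_parse_idle(const char *str)
{
	if (!str)
		return -EINVAL;

	if (!strcmp(str, "poll"))
		return SKETCH_IDLE_POLL;
	if (!strcmp(str, "halt"))
		return SKETCH_IDLE_HALT;
	if (!strcmp(str, "nomwait"))
		return SKETCH_IDLE_NOMWAIT;

	return -1;
}
```

Note that "mwait" is rejected here, which matches the code above: only "poll", "halt", and "nomwait" are accepted.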
|
|
|
|
|
2009-05-12 10:05:28 +08:00
|
|
|
unsigned long arch_align_stack(unsigned long sp)
|
|
|
|
{
|
|
|
|
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
|
2022-10-05 22:43:38 +08:00
|
|
|
sp -= prandom_u32_max(8192);
|
2009-05-12 10:05:28 +08:00
|
|
|
return sp & ~0xf;
|
|
|
|
}
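The arithmetic in arch_align_stack() above can be illustrated with a small sketch. It assumes the random offset is passed in by the caller (a value below 8192) instead of coming from the kernel's random number helper; the function name is hypothetical:

```c
/*
 * Hypothetical helper: lower the stack pointer by a caller-supplied
 * offset, then round down to a 16-byte boundary by clearing the low
 * four bits.
 */
static unsigned long sketch_align_stack(unsigned long sp, unsigned long offset)
{
	sp -= offset;
	return sp & ~0xfUL;
}
```

For example, with `sp = 0x7ffd1234` and an offset of 0 the result is `0x7ffd1230`; with an offset of 8191 (`0x1fff`) the intermediate value is `0x7ffcf235`, which rounds down to `0x7ffcf230`.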
|
|
|
|
|
|
|
|
unsigned long arch_randomize_brk(struct mm_struct *mm)
|
|
|
|
{
|
2016-10-12 04:53:56 +08:00
|
|
|
return randomize_page(mm->brk, 0x02000000);
|
2009-05-12 10:05:28 +08:00
|
|
|
}
|
|
|
|
|
2015-09-30 16:38:23 +08:00
|
|
|
/*
|
|
|
|
* Called from fs/proc with a reference on @p to find the function
|
|
|
|
* which called into schedule(). This needs to be done carefully
|
|
|
|
* because the task might wake up and we might look at a stack
|
|
|
|
* changing under us.
|
|
|
|
*/
|
2021-09-30 06:02:14 +08:00
|
|
|
unsigned long __get_wchan(struct task_struct *p)
|
2015-09-30 16:38:23 +08:00
|
|
|
{
|
2021-10-22 22:53:02 +08:00
|
|
|
struct unwind_state state;
|
|
|
|
unsigned long addr = 0;
|
2015-09-30 16:38:23 +08:00
|
|
|
|
2021-11-19 17:29:47 +08:00
|
|
|
if (!try_get_task_stack(p))
|
|
|
|
return 0;
|
|
|
|
|
2021-10-22 22:53:02 +08:00
|
|
|
for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
|
|
|
|
unwind_next_frame(&state)) {
|
|
|
|
addr = unwind_get_return_address(&state);
|
|
|
|
if (!addr)
|
|
|
|
break;
|
|
|
|
if (in_sched_functions(addr))
|
|
|
|
continue;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2021-11-19 17:29:47 +08:00
|
|
|
put_task_stack(p);
|
|
|
|
|
2021-10-22 22:53:02 +08:00
|
|
|
return addr;
|
2015-09-30 16:38:23 +08:00
|
|
|
}
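The frame walk in __get_wchan() above can be sketched without the kernel unwinder. This is only an illustration: the stack is modeled as an array of return addresses, and "scheduler code" is modeled as a single half-open address range. Both are assumptions, and all names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for in_sched_functions(): one address range. */
static bool sketch_in_sched(unsigned long addr,
			    unsigned long start, unsigned long end)
{
	return addr >= start && addr < end;
}

/*
 * Walk the return addresses from the innermost frame outwards and
 * stop at the first one that is either zero or outside the scheduler
 * range. As in the loop above, if every frame lies inside the
 * scheduler range, the last address seen is returned.
 */
static unsigned long sketch_wchan(const unsigned long *ret_addrs, size_t n,
				  unsigned long sched_start,
				  unsigned long sched_end)
{
	unsigned long addr = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		addr = ret_addrs[i];
		if (!addr)
			break;
		if (sketch_in_sched(addr, sched_start, sched_end))
			continue;
		break;
	}

	return addr;
}
```

The sketch omits the try_get_task_stack()/put_task_stack() pairing, which in the real function keeps the task's stack alive while it is being walked.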
|
2017-03-20 16:16:23 +08:00
|
|
|
|
2022-05-12 20:04:08 +08:00
|
|
|
long do_arch_prctl_common(int option, unsigned long arg2)
|
2017-03-20 16:16:23 +08:00
|
|
|
{
|
2017-03-20 16:16:26 +08:00
|
|
|
switch (option) {
|
|
|
|
case ARCH_GET_CPUID:
|
|
|
|
return get_cpuid_mode();
|
|
|
|
case ARCH_SET_CPUID:
|
2022-05-12 20:04:08 +08:00
|
|
|
return set_cpuid_mode(arg2);
|
2021-10-22 06:55:10 +08:00
|
|
|
case ARCH_GET_XCOMP_SUPP:
|
|
|
|
case ARCH_GET_XCOMP_PERM:
|
|
|
|
case ARCH_REQ_XCOMP_PERM:
|
2022-01-05 20:35:12 +08:00
|
|
|
case ARCH_GET_XCOMP_GUEST_PERM:
|
|
|
|
case ARCH_REQ_XCOMP_GUEST_PERM:
|
2022-05-12 20:04:08 +08:00
|
|
|
return fpu_xstate_prctl(option, arg2);
|
2017-03-20 16:16:26 +08:00
|
|
|
}
|
|
|
|
|
2017-03-20 16:16:23 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|