linux/arch/x86/power/hibernate_64.c
Linus Torvalds 408c9861c6 Power management updates for v4.13-rc1
- Rework suspend-to-idle to allow it to take wakeup events signaled
    by the EC into account on ACPI-based platforms in order to properly
    support power button wakeup from suspend-to-idle on recent Dell
    laptops (Rafael Wysocki).
 
    That includes the core suspend-to-idle code rework, support for
    the Low Power S0 _DSM interface, and support for the ACPI INT0002
    Virtual GPIO device from Hans de Goede (required for USB keyboard
    wakeup from suspend-to-idle to work on some machines).
 
  - Stop trying to export the current CPU frequency via /proc/cpuinfo
    on x86 as that is inaccurate and confusing (Len Brown).
 
  - Rework the way in which the current CPU frequency is exported by
    the kernel (over the cpufreq sysfs interface) on x86 systems with
    the APERF and MPERF registers by always using values read from
    these registers, when available, to compute the current frequency
    regardless of which cpufreq driver is in use (Len Brown).
 
  - Rework the PCI/ACPI device wakeup infrastructure to remove the
    questionable and artificial distinction between "devices that
    can wake up the system from sleep states" and "devices that can
    generate wakeup signals in the working state" from it, which
    allows the code to be simplified quite a bit (Rafael Wysocki).
 
  - Fix the wakeup IRQ framework by making it use SRCU instead of
    RCU which doesn't allow sleeping in the read-side critical
    sections, but which in turn is expected to be allowed by the
    IRQ bus locking infrastructure (Thomas Gleixner).
 
  - Modify some computations in the intel_pstate driver to avoid
    rounding errors resulting from them (Srinivas Pandruvada).
 
  - Reduce the overhead of the intel_pstate driver in the HWP
    (hardware-managed P-states) mode and when the "performance"
    P-state selection algorithm is in use by making it avoid
    registering scheduler callbacks in those cases (Len Brown).
 
  - Rework the energy_performance_preference sysfs knob in
    intel_pstate by changing the values that correspond to
    different symbolic hint names used by it (Len Brown).
 
  - Make it possible to use more than one cpuidle driver at the same
    time on ARM (Daniel Lezcano).
 
  - Make it possible to prevent the cpuidle menu governor from using
    the 0 state by disabling it via sysfs (Nicholas Piggin).
 
  - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1
    on AMD systems (Yazen Ghannam).
 
  - Make the CPPC cpufreq driver take the lowest nonlinear performance
    information into account (Prashanth Prakash).
 
  - Add support for hi3660 to the cpufreq-dt driver, fix the
    imx6q driver and clean up the sfi, exynos5440 and intel_pstate
    drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila,
    Rafael Wysocki, Tao Wang).
 
  - Fix a few minor issues in the generic power domains (genpd)
    framework and clean it up somewhat (Krzysztof Kozlowski,
    Mikko Perttunen, Viresh Kumar).
 
  - Fix a couple of minor issues in the operating performance points
    (OPP) framework and clean it up somewhat (Viresh Kumar).
 
  - Fix a CONFIG dependency in the hibernation core and clean it up
    slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
 
  - Add rk3228 support to the rockchip-io adaptive voltage scaling
    (AVS) driver (David Wu).
 
  - Fix an incorrect bit shift operation in the RAPL power capping
    driver (Adam Lessnau).
 
  - Add support for the EPP field in the HWP (hardware managed
    P-states) control register, HWP.EPP, to the x86_energy_perf_policy
    tool and update msr-index.h with HWP.EPP values (Len Brown).
 
  - Fix some minor issues in the turbostat tool (Len Brown).
 
  - Add support for AMD family 0x17 CPUs to the cpupower tool and fix
    a minor issue in it (Sherry Hurwitz).
 
  - Assorted cleanups, mostly related to the constification of some
    data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
    Kozlowski).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZWrICAAoJEILEb/54YlRxZYMQAIRhfbyDxKq+ByvSilUS8kTA
 AItwJ8FFzykhiwN75Cqabg4rAGyWma7IRs1vzU7zeC1aEQIn+bTQtvk+utZNI+g2
 ANFlDha20q/sXsP/CDMMTIAdW9tSOC0TOvFI9s2V2Y8dJZhoekO4ctx34FAfUS5d
 Ao6rwSAWCMsCXcGaTAlqTA+TEJmBG7u6Iq6hq6ngltoFwOv3mWWBVn52VVaJ7SMp
 9/IPbbLGMFAedrgEBRGCR+MME1xZZpvcZIJaTt1Mgn7Cx3cJaysIUAvqY/SsvFGq
 5FcUTcF2qpK3+AGawiAxZIjvOBsGRtIwqKinNIzYWs/NjiIdzmgVAmTeuPtTqp+5
 HFehUdtkFcnuDnLqSNzAaZUa7tw84cJkwnbVMnesx0MkG6rZ1SeL22E2Sabpcdsh
 3Yo1ThzJSxi59DhiiE92EQnNCEjmCldRy+8q5Ag035muxl6EJYvuNBMnZv/BMCUn
 ltSNOrmps1DlN+Col8ORIeNzQ1YjYzWMqKAYzSbyccm4ug/iSHx0/DuESmQ4GTlF
 YCwkmqyWiHrBwpl51jc+4a7SGlMmKRqU+MJes0CjagaaqoUAb8qeBOpzEJ0yNwjZ
 wtI41l6blE6kbMD3yqGdCfiB2S7GlPVoxa15eX1wRyLH3fLjwwrzJirEaiBS86tI
 1PzHZEOmBlh3DYC6DBKA
 =Wsph
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The big ticket items here are the rework of suspend-to-idle in order
  to add proper support for power button wakeup from it on recent Dell
  laptops and the rework of interfaces exporting the current CPU
  frequency on x86.

  In addition to that, support for a few new pieces of hardware is
  added, the PCI/ACPI device wakeup infrastructure is simplified
  significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
  bus locking infrastructure.

  Also, there are some functional improvements for intel_pstate, tools
  updates and small fixes and cleanups all over.

  Specifics:

   - Rework suspend-to-idle to allow it to take wakeup events signaled
     by the EC into account on ACPI-based platforms in order to properly
     support power button wakeup from suspend-to-idle on recent Dell
     laptops (Rafael Wysocki).

     That includes the core suspend-to-idle code rework, support for the
     Low Power S0 _DSM interface, and support for the ACPI INT0002
     Virtual GPIO device from Hans de Goede (required for USB keyboard
     wakeup from suspend-to-idle to work on some machines).

   - Stop trying to export the current CPU frequency via /proc/cpuinfo
     on x86 as that is inaccurate and confusing (Len Brown).

   - Rework the way in which the current CPU frequency is exported by
     the kernel (over the cpufreq sysfs interface) on x86 systems with
     the APERF and MPERF registers by always using values read from
     these registers, when available, to compute the current frequency
     regardless of which cpufreq driver is in use (Len Brown).

   - Rework the PCI/ACPI device wakeup infrastructure to remove the
     questionable and artificial distinction between "devices that can
     wake up the system from sleep states" and "devices that can
     generate wakeup signals in the working state" from it, which allows
     the code to be simplified quite a bit (Rafael Wysocki).

   - Fix the wakeup IRQ framework by making it use SRCU instead of RCU
     which doesn't allow sleeping in the read-side critical sections,
     but which in turn is expected to be allowed by the IRQ bus locking
     infrastructure (Thomas Gleixner).

   - Modify some computations in the intel_pstate driver to avoid
     rounding errors resulting from them (Srinivas Pandruvada).

   - Reduce the overhead of the intel_pstate driver in the HWP
     (hardware-managed P-states) mode and when the "performance" P-state
     selection algorithm is in use by making it avoid registering
     scheduler callbacks in those cases (Len Brown).

   - Rework the energy_performance_preference sysfs knob in intel_pstate
     by changing the values that correspond to different symbolic hint
     names used by it (Len Brown).

   - Make it possible to use more than one cpuidle driver at the same
     time on ARM (Daniel Lezcano).

   - Make it possible to prevent the cpuidle menu governor from using
     the 0 state by disabling it via sysfs (Nicholas Piggin).

   - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
     AMD systems (Yazen Ghannam).

   - Make the CPPC cpufreq driver take the lowest nonlinear performance
     information into account (Prashanth Prakash).

   - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
     driver and clean up the sfi, exynos5440 and intel_pstate drivers
     (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
     Wysocki, Tao Wang).

   - Fix a few minor issues in the generic power domains (genpd)
     framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
     Perttunen, Viresh Kumar).

   - Fix a couple of minor issues in the operating performance points
     (OPP) framework and clean it up somewhat (Viresh Kumar).

   - Fix a CONFIG dependency in the hibernation core and clean it up
     slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).

   - Add rk3228 support to the rockchip-io adaptive voltage scaling
     (AVS) driver (David Wu).

   - Fix an incorrect bit shift operation in the RAPL power capping
     driver (Adam Lessnau).

   - Add support for the EPP field in the HWP (hardware managed
     P-states) control register, HWP.EPP, to the x86_energy_perf_policy
     tool and update msr-index.h with HWP.EPP values (Len Brown).

   - Fix some minor issues in the turbostat tool (Len Brown).

   - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
     minor issue in it (Sherry Hurwitz).

   - Assorted cleanups, mostly related to the constification of some
     data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
     Kozlowski)"

* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  PM: hibernate: constify attribute_group structures.
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  PM / Domains: Fix missing default_power_down_ok comment
  PM / Domains: Fix unsafe iteration over modified list of domains
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
  PM / Domains: Call driver's noirq callbacks
  PM / core: Drop run_wake flag from struct dev_pm_info
  PCI / PM: Simplify device wakeup settings code
  PCI / PM: Drop pme_interrupt flag from struct pci_dev
  ACPI / PM: Consolidate device wakeup settings code
  ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
  PM / QoS: constify *_attribute_group.
  PM / AVS: rockchip-io: add io selectors and supplies for rk3228
  powercap/RAPL: prevent overridding bits outside of the mask
  PM / sysfs: Constify attribute groups
  ...
2017-07-04 13:39:41 -07:00

331 lines
8.1 KiB
C

/*
* Hibernation support for x86-64
*
* Distribute under GPLv2
*
* Copyright (c) 2007 Rafael J. Wysocki <rjw@sisk.pl>
* Copyright (c) 2002 Pavel Machek <pavel@ucw.cz>
* Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
*/
#include <linux/gfp.h>
#include <linux/smp.h>
#include <linux/suspend.h>
#include <linux/scatterlist.h>
#include <linux/kdebug.h>
#include <crypto/hash.h>
#include <asm/e820/api.h>
#include <asm/init.h>
#include <asm/proto.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mtrr.h>
#include <asm/sections.h>
#include <asm/suspend.h>
#include <asm/tlbflush.h>
/* Defined in hibernate_asm_64.S */
extern asmlinkage __visible int restore_image(void);
/*
* Address to jump to in the last phase of restore in order to get to the image
* kernel's text (this value is passed in the image header).
*/
unsigned long restore_jump_address __visible;
unsigned long jump_address_phys;
/*
* Value of the cr3 register from before the hibernation (this value is passed
* in the image header).
*/
unsigned long restore_cr3 __visible;
unsigned long temp_level4_pgt __visible;
unsigned long relocated_restore_code __visible;
static int set_up_temporary_text_mapping(pgd_t *pgd)
{
pmd_t *pmd;
pud_t *pud;
p4d_t *p4d;
/*
* The new mapping only has to cover the page containing the image
* kernel's entry point (jump_address_phys), because the switch over to
* it is carried out by relocated code running from a page allocated
* specifically for this purpose and covered by the identity mapping, so
* the temporary kernel text mapping is only needed for the final jump.
* Moreover, in that mapping the virtual address of the image kernel's
* entry point must be the same as its virtual address in the image
* kernel (restore_jump_address), so the image kernel's
* restore_registers() code doesn't find itself in a different area of
* the virtual address space after switching over to the original page
* tables used by the image kernel.
*/
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
p4d = (p4d_t *)get_safe_page(GFP_ATOMIC);
if (!p4d)
return -ENOMEM;
}
pud = (pud_t *)get_safe_page(GFP_ATOMIC);
if (!pud)
return -ENOMEM;
pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
if (!pmd)
return -ENOMEM;
set_pmd(pmd + pmd_index(restore_jump_address),
__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
set_pud(pud + pud_index(restore_jump_address),
__pud(__pa(pmd) | _KERNPG_TABLE));
if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
set_p4d(p4d + p4d_index(restore_jump_address), __p4d(__pa(pud) | _KERNPG_TABLE));
set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(p4d) | _KERNPG_TABLE));
} else {
/* No p4d for 4-level paging: point the pgd to the pud page table */
set_pgd(pgd + pgd_index(restore_jump_address), __pgd(__pa(pud) | _KERNPG_TABLE));
}
return 0;
}
static void *alloc_pgt_page(void *context)
{
return (void *)get_safe_page(GFP_ATOMIC);
}
static int set_up_temporary_mappings(void)
{
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
.page_flag = __PAGE_KERNEL_LARGE_EXEC,
.offset = __PAGE_OFFSET,
};
unsigned long mstart, mend;
pgd_t *pgd;
int result;
int i;
pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
if (!pgd)
return -ENOMEM;
/* Prepare a temporary mapping for the kernel text */
result = set_up_temporary_text_mapping(pgd);
if (result)
return result;
/* Set up the direct mapping from scratch */
for (i = 0; i < nr_pfn_mapped; i++) {
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
result = kernel_ident_mapping_init(&info, pgd, mstart, mend);
if (result)
return result;
}
temp_level4_pgt = __pa(pgd);
return 0;
}
static int relocate_restore_code(void)
{
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
relocated_restore_code = get_safe_page(GFP_ATOMIC);
if (!relocated_restore_code)
return -ENOMEM;
memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
/* Make the page containing the relocated code executable */
pgd = (pgd_t *)__va(read_cr3_pa()) +
pgd_index(relocated_restore_code);
p4d = p4d_offset(pgd, relocated_restore_code);
if (p4d_large(*p4d)) {
set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
goto out;
}
pud = pud_offset(p4d, relocated_restore_code);
if (pud_large(*pud)) {
set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
goto out;
}
pmd = pmd_offset(pud, relocated_restore_code);
if (pmd_large(*pmd)) {
set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
goto out;
}
pte = pte_offset_kernel(pmd, relocated_restore_code);
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
out:
__flush_tlb_all();
return 0;
}
int swsusp_arch_resume(void)
{
int error;
/* We have got enough memory and from now on we cannot recover */
error = set_up_temporary_mappings();
if (error)
return error;
error = relocate_restore_code();
if (error)
return error;
restore_image();
return 0;
}
/*
* pfn_is_nosave - check if given pfn is in the 'nosave' section
*/
int pfn_is_nosave(unsigned long pfn)
{
unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
}
#define MD5_DIGEST_SIZE 16
struct restore_data_record {
unsigned long jump_address;
unsigned long jump_address_phys;
unsigned long cr3;
unsigned long magic;
u8 e820_digest[MD5_DIGEST_SIZE];
};
#define RESTORE_MAGIC 0x23456789ABCDEF01UL
#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
/**
* get_e820_md5 - calculate md5 according to given e820 table
*
* @table: the e820 table to be calculated
* @buf: the md5 result to be stored to
*/
static int get_e820_md5(struct e820_table *table, void *buf)
{
struct scatterlist sg;
struct crypto_ahash *tfm;
int size;
int ret = 0;
tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
return -ENOMEM;
{
AHASH_REQUEST_ON_STACK(req, tfm);
size = offsetof(struct e820_table, entries) + sizeof(struct e820_entry) * table->nr_entries;
ahash_request_set_tfm(req, tfm);
sg_init_one(&sg, (u8 *)table, size);
ahash_request_set_callback(req, 0, NULL, NULL);
ahash_request_set_crypt(req, &sg, buf, size);
if (crypto_ahash_digest(req))
ret = -EINVAL;
ahash_request_zero(req);
}
crypto_free_ahash(tfm);
return ret;
}
static void hibernation_e820_save(void *buf)
{
get_e820_md5(e820_table_firmware, buf);
}
static bool hibernation_e820_mismatch(void *buf)
{
int ret;
u8 result[MD5_DIGEST_SIZE];
memset(result, 0, MD5_DIGEST_SIZE);
/* If there is no digest in suspend kernel, let it go. */
if (!memcmp(result, buf, MD5_DIGEST_SIZE))
return false;
ret = get_e820_md5(e820_table_firmware, result);
if (ret)
return true;
return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
}
#else
static void hibernation_e820_save(void *buf)
{
}
static bool hibernation_e820_mismatch(void *buf)
{
/* If md5 is not builtin for restore kernel, let it go. */
return false;
}
#endif
/**
* arch_hibernation_header_save - populate the architecture specific part
* of a hibernation image header
* @addr: address to save the data at
*/
int arch_hibernation_header_save(void *addr, unsigned int max_size)
{
struct restore_data_record *rdr = addr;
if (max_size < sizeof(struct restore_data_record))
return -EOVERFLOW;
rdr->jump_address = (unsigned long)restore_registers;
rdr->jump_address_phys = __pa_symbol(restore_registers);
rdr->cr3 = restore_cr3;
rdr->magic = RESTORE_MAGIC;
hibernation_e820_save(rdr->e820_digest);
return 0;
}
/**
* arch_hibernation_header_restore - read the architecture specific data
* from the hibernation image header
* @addr: address to read the data from
*/
int arch_hibernation_header_restore(void *addr)
{
struct restore_data_record *rdr = addr;
restore_jump_address = rdr->jump_address;
jump_address_phys = rdr->jump_address_phys;
restore_cr3 = rdr->cr3;
if (rdr->magic != RESTORE_MAGIC) {
pr_crit("Unrecognized hibernate image header format!\n");
return -EINVAL;
}
if (hibernation_e820_mismatch(rdr->e820_digest)) {
pr_crit("Hibernate inconsistent memory map detected!\n");
return -ENODEV;
}
return 0;
}