linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-12 15:44:01 +08:00

Author	SHA1	Message	Date
Srinivas Pandruvada	46992d6b55	cpufreq: intel_pstate: round up min_perf limits In some use cases, user wants to enforce a minimum performance limit on CPUs. But because of simple division the resultant performance is 100 MHz less than the desired in some cases. For example when the maximum frequency is 3.50 GHz, setting scaling_min_frequency to 1.6 GHz always results in 1.5 GHz minimum. With simple round up, the frequency can be set to 1.6 GHz to minimum in this case. This round up is already done to max_policy_pct and max_perf, so do the same for min_policy_pct and min_perf. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-22 02:31:48 +01:00
Rafael J. Wysocki	001c76f05b	cpufreq: intel_pstate: Generic governors support There may be reasons to use generic cpufreq governors (eg. schedutil) on Intel platforms instead of the intel_pstate driver's internal governor. However, that currently can only be done by disabling intel_pstate altogether and using the acpi-cpufreq driver instead of it, which is subject to limitations. First of all, acpi-cpufreq only works on systems where the _PSS object is present in the ACPI tables for all logical CPUs. Second, on those systems acpi-cpufreq will only use frequencies listed by _PSS which may be suboptimal. In particular, by convention, the whole turbo range is represented in _PSS as a single P-state and the frequency assigned to it is greater by 1 MHz than the greatest non-turbo frequency listed by _PSS. That may confuse governors to use turbo frequencies less frequently which may lead to suboptimal performance. For this reason, make it possible to use the intel_pstate driver with generic cpufreq governors as a "normal" cpufreq driver. That mode is enforced by adding intel_pstate=passive to the kernel command line and cannot be disabled at run time. In that mode, intel_pstate provides a cpufreq driver interface including the ->target() and ->fast_switch() callbacks and is listed in scaling_driver as "intel_cpufreq". Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Doug Smythies <dsmythies@telus.net>	2016-11-21 14:32:32 +01:00
Rafael J. Wysocki	d0ea59e188	cpufreq: intel_pstate: Request P-states control from SMM if needed Currently, intel_pstate is unable to control P-states on my IvyBridge-based Acer Aspire S5, because they are controlled by SMM on that machine by default and it is necessary to request OS control of P-states from it via the SMI Command register exposed in the ACPI FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other cpufreq drivers for x86 platforms do. Address this problem by making intel_pstate use the ACPI-defined mechanism as well. However, intel_pstate is not modular and it doesn't need the module refcount tricks played by acpi_processor_notify_smm(), so export the core of this function to it as acpi_processor_pstate_control() and make it call that. [The changes in processor_perflib.c related to this should not make any functional difference for the acpi_processor_notify_smm() users]. To be safe, only call acpi_processor_notify_smm() from intel_pstate if ACPI _PPC support is enabled in it. Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-11-17 22:47:47 +01:00
Srinivas Pandruvada	7f7a516ee3	cpufreq: intel_pstate: Use CPU load based algorithm for PM_MOBILE Use get_target_pstate_use_cpu_load() to calculate target P-State for devices, with the preferred power management profile in ACPI FADT set to PM_MOBILE. This may help in resolving some thermal issues caused by low sustained cpu bound workloads. The current algorithm tend to over provision in this case as it doesn't look at the CPU busyness. Also included the fix from Arnd Bergmann <arnd@arndb.de> to solve compile issue, when CONFIG_ACPI is not defined. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-14 21:25:23 +01:00
Srinivas Pandruvada	a410c03d66	cpufreq: intel_pstate: protect limits variable The limits variable gets modified from intel_pstate sysfs and also gets modified from cpufreq sysfs. So protect with a mutex to keep data integrity, when they are getting modified from multiple threads. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-01 06:10:54 +01:00
Srinivas Pandruvada	5879f87739	cpufreq: intel_pstate: Reduce impact due to rounding error When policy->max and policy->min are same, in some cases they don't result in the same frequency cap. The max_policy_pct is rounded up but not min_perf_pct. So even when they are same, results in different percentage or maximum and minimum. Since minimum is a conservative value for power, a lower value without rounding is better in most of the cases, unless user wants policy->max = policy->min. This change uses use the same policy percentage when policy->max and policy->min are same. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-01 06:04:06 +01:00
Srinivas Pandruvada	eae48f046f	cpufreq: intel_pstate: Per CPU P-State limits Intel P-State offers two interface to set performance limits: - Intel P-State sysfs /sys/devices/system/cpu/intel_pstate/max_perf_pct /sys/devices/system/cpu/intel_pstate/min_perf_pct - cpufreq /sys/devices/system/cpu/cpu/cpufreq/scaling_max_freq /sys/devices/system/cpu/cpu/cpufreq/scaling_min_freq In the current implementation both of the above methods, change limits to every CPU in the system. Moreover the limits placed using cpufreq policy interface also presented in the Intel P-State sysfs via modified max_perf_pct and min_per_pct during sysfs reads. This allows to check percent of reduced/increased performance, irrespective of method used to limit. There are some new generations of processors, where it is possible to have limits placed on individual CPU cores. Using cpufreq interface it is possible to set limits on each CPU. But the current processing will use last limits placed on all CPUs. So the per core limit feature of CPUs can't be used. This change brings in capability to set P-States limits for each CPU, with some limitations. In this case what should be the read of max_perf_pct and min_perf_pct? It can be most restrictive limits placed on any CPU or max possible performance on any given CPU on which no limits are placed. In either case someone will have issue. So the consensus is, we can't have both sysfs controls present when user wants to use limit per core limits. - By default per-core-control feature is not enabled. So no one will notice any difference. - The way to enable is by kernel command line intel_pstate=per_cpu_perf_limits - When the per-core-controls are enabled there is no display of for both read and write on /sys/devices/system/cpu/intel_pstate/max_perf_pct /sys/devices/system/cpu/intel_pstate/min_perf_pct - User can change limits using /sys/devices/system/cpu/cpu/cpufreq/scaling_max_freq /sys/devices/system/cpu/cpu/cpufreq/scaling_min_freq /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor - User can still observe turbo percent and number of P-States from /sys/devices/system/cpu/intel_pstate/turbo_pct /sys/devices/system/cpu/intel_pstate/num_pstates - User can read write system wide turbo status /sys/devices/system/cpu/no_turbo While changing this BUG_ON is changed to WARN_ON, as they are not fatal errors for the system. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-01 06:04:06 +01:00
Rafael J. Wysocki	fe0f59c412	Merge back earlier cpufreq material for v4.10.	2016-10-30 06:12:50 +01:00
Rafael J. Wysocki	2f1d407ada	cpufreq: intel_pstate: Always set max P-state in performance mode The only times at which intel_pstate checks the policy set for a given CPU is the initialization of that CPU and updates of its policy settings from cpufreq when intel_pstate_set_policy() is invoked. That is insufficient, however, because intel_pstate uses the same P-state selection function for all CPUs regardless of the policy setting for each of them and the P-state limits are shared between them. Thus if the policy is set to "performance" for a particular CPU, it may not behave as expected if the cpufreq settings are changed subsequently for another CPU. That can be easily demonstrated by writing "performance" to scaling_governor for all CPUs and then switching it to "powersave" for one of them in which case all of the CPUs will behave as though their scaling_governor were all "powersave" (even though the policy still appears to be "performance" for the remaining CPUs). Fix this problem by modifying intel_pstate_adjust_busy_pstate() to always set the P-state to the maximum allowed by the current limits for all CPUs whose policy is set to "performance". Note that it still is recommended to always change the policy setting in the same way for all CPUs even with this fix applied to avoid confusion. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-24 23:20:25 +02:00
Rafael J. Wysocki	a6c6ead141	cpufreq: intel_pstate: Set P-state upfront in performance mode After commit `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) the cpufreq governor callbacks may not be invoked on NOHZ_FULL CPUs and, in particular, switching to the "performance" policy via sysfs may not have any effect on them. That is a problem, because it usually is desirable to squeeze the last bit of performance out of those CPUs, so work around it by setting the maximum P-state (within the limits) in intel_pstate_set_policy() upfront when the policy is CPUFREQ_POLICY_PERFORMANCE. Fixes: `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-10-21 22:18:22 +02:00
Srinivas Pandruvada	185d82456e	cpufreq: intel_pstate: Remove PID debugfs when not used When target state is calculated using get_target_pstate_use_cpu_load(), PID controller is not used, hence it has no effect on performance. So don't present debugfs entries to tune PID controller. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-21 22:16:26 +02:00
Rafael J. Wysocki	1d29815ef2	cpufreq: intel_pstate: Drop boost_iowait flag The "IOwait boosting" mechanism is only used by the get_target_pstate_use_cpu_load() governor function and the boost_iowait flag in pid_params is always set when that function is in use (and it is never set otherwise). This means that the boost_iowait flag is in fact redundant and may be dropped. For this reason, replace the boost_iowait flag check in intel_pstate_update_util() with an equivalent check against pstate_funcs.get_target_pstate and drop that flag. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-10-21 22:13:51 +02:00
Rafael J. Wysocki	3954517e2f	cpufreq: intel_pstate: Fix struct pstate_adjust_policy kerneldoc It looks like the name of struct pstate_adjust_policy was updated without updating its kerneldoc comment accordingly, so fix that mistake. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-12 20:58:14 +02:00
Rafael J. Wysocki	0843e83c1a	cpufreq: intel_pstate: Proportional algorithm for Atom The PID algorithm used by the intel_pstate driver tends to drive performance to the minimum for workloads with utilization below the setpoint, which is undesirable, so replace it with a modified "proportional" algorithm on Atom. The new algorithm will set the new P-state to be 1.25 times the available maximum times the (frequency-invariant) utilization during the previous sampling period except when the target P-state computed this way is lower than the average P-state during the previous sampling period. In the latter case, it will increase the target by 50% of the difference between it and the average P-state to prevent performance from dropping down too fast in some cases. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-10-12 20:58:13 +02:00
Rafael J. Wysocki	f00593a4bd	cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() Make the comment explaining the meaning of the perf_scaled variable in get_target_pstate_use_performance() more straightforward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-09 18:54:57 +02:00
Srinivas Pandruvada	f9f4872df6	cpufreq: intel_pstate: Fix unsafe HWP MSR access This is a requirement that MSR MSR_PM_ENABLE must be set to 0x01 before reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is scheduled on a CPU which is not same as policy->cpu or migrates to a different CPU before calling msr read for MSR_HWP_CAPABILITIES, it is possible that MSR_PM_ENABLE was not to set to 0x01 on that CPU. This will cause GP fault. So like other places in this path rdmsrl_on_cpu should be used instead of rdmsrl. Moreover the scope of MSR_HWP_CAPABILITIES is on per thread basis, so it should be read from the same CPU, for which MSR MSR_HWP_REQUEST is getting set. dmesg dump or warning: [ 22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70 [ 22.014492] unchecked MSR access error: RDMSR from 0x771 [ 22.014493] Modules linked in: [ 22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1 ... ... [ 22.014516] Call Trace: [ 22.014542] [<ffffffff813d7dd1>] dump_stack+0x63/0x82 [ 22.014558] [<ffffffff8107bc8b>] __warn+0xcb/0xf0 [ 22.014561] [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60 [ 22.014563] [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70 [ 22.014564] [<ffffffff810677d9>] fixup_exception+0x39/0x50 [ 22.014604] [<ffffffff8102e400>] do_general_protection+0x80/0x150 [ 22.014610] [<ffffffff817f9ec8>] general_protection+0x28/0x30 [ 22.014635] [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0 [ 22.014642] [<ffffffff810600c7>] ? native_read_msr+0x7/0x40 [ 22.014657] [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130 [ 22.014660] [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340 [ 22.014662] [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0 [ 22.014664] [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0 [ 22.014666] [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120 [ 22.014669] [<ffffffff816833a6>] cpufreq_online+0x406/0x820 [ 22.014671] [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90 [ 22.014717] [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100 [ 22.014719] [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210 [ 22.014749] [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5 [ 22.014751] [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12 Cc: 4.3+ <stable@vger.kernel.org> # 4.3+ Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-09 18:54:13 +02:00
Rafael J. Wysocki	b6e2511782	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-10-02 01:42:33 +02:00
Srinivas Pandruvada	3ba7bcaa36	cpufreq: intel_pstate: Add io_boost trace Add io_boost percent to current pstate_sample tracepoint. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-16 23:55:30 +02:00
Rafael J. Wysocki	09c448d3c6	cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm Modify the P-state selection algorithm for Atom processors to use the new SCHED_CPUFREQ_IOWAIT flag instead of the questionable get_cpu_iowait_time_us() function. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-14 02:28:13 +02:00
Julia Lawall	42ce8921cc	intel_pstate: constify local structures For structure types defined in the same file or local header files, find top-level static structure declarations that have the following properties: 1. Never reassigned. 2. Address never taken 3. Not passed to a top-level macro call 4. No pointer or array-typed field passed to a function or stored in a variable. Declare structures having all of these properties as const. Done using Coccinelle. Based on a suggestion by Joe Perches <joe@perches.com>. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-09-13 02:40:24 +02:00
Rafael J. Wysocki	58919e83c8	cpufreq / sched: Pass flags to cpufreq_update_util() It is useful to know the reason why cpufreq_update_util() has just been called and that can be passed as flags to cpufreq_update_util() and to the ->func() callback in struct update_util_data. However, doing that in addition to passing the util and max arguments they already take would be clumsy, so avoid it. Instead, use the observation that the schedutil governor is part of the scheduler proper, so it can access scheduler data directly. This allows the util and max arguments of cpufreq_update_util() and the ->func() callback in struct update_util_data to be replaced with a flags one, but schedutil has to be modified to follow. Thus make the schedutil governor obtain the CFS utilization information from the scheduler and use the "RT" and "DL" flags instead of the special utilization value of ULONG_MAX to track updates from the RT and DL sched classes. Make it non-modular too to avoid having to export scheduler variables to modules at large. Next, update all of the other users of cpufreq_update_util() and the ->func() callback in struct update_util_data accordingly. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-08-16 22:14:55 +02:00
Rafael J. Wysocki	e2b3b80de5	Merge branches 'pm-sleep', 'pm-cpufreq', 'pm-core' and 'pm-opp' * pm-sleep: x86/power/64: Do not refer to __PAGE_OFFSET from assembly code * pm-cpufreq: cpufreq: Do not default-yes CPU_FREQ_STAT cpufreq: intel_pstate: Add more out-of-band IDs * pm-core: PM-wakeup: Delete unnecessary checks before three function calls * pm-opp: PM / OPP: optimize dev_pm_opp_set_rate() performance a bit	2016-08-05 15:46:55 +02:00
Srinivas Pandruvada	65c1262f40	cpufreq: intel_pstate: Add more out-of-band IDs Add Skylake-X and Broadwell-X IDs for out-of-band (OBB) control of P-States. For these processors, if MSR_MISC_PWR_MGMT BIT(8) == 1, then the Intel P-State driver should exit as OS can't control P-States. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Subject/changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-28 23:58:17 +02:00
Rafael J. Wysocki	bc841e260c	Merge branch 'pm-cpu' * pm-cpu: x86: remove duplicate turbo ratio limit MSRs tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT	2016-07-25 13:46:30 +02:00
Srinivas Pandruvada	da7de91c3e	cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT The MSR MSR_HWP_INTERRUPT is valid only when CPUID.06H:EAX[8] = 1, so check for feature before accessing this MSR. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-21 14:29:30 +02:00
Rafael J. Wysocki	bc95a454b6	intel_pstate: Update cpu_frequency tracepoint every time Currently, intel_pstate only updates the cpu_frequency tracepoint if the new P-state to set is different from the current one, but that causes powertop to report 100% idle on an 100% loaded system sometimes. Prevent that from happening by updating the cpu_frequency tracepoint every time intel_pstate_update_pstate() is called. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>-	2016-07-21 14:28:37 +02:00
Carsten Emde	2630abc243	cpufreq: intel_pstate: clean remnant struct element When I was working with the Intel P state driver I came across a remnant struct element that is no longer needed after the function intel_pstate_calc_freq() was retired. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-21 14:26:00 +02:00
Jan Kiszka	5fc8f707a2	intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some MSR 0x80000648 or so. Mask out the relevant level bits 0 and 1. Found while running over the Jailhouse hypervisor which became upset about this strange MSR index. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-11 15:12:30 +02:00
Srinivas Pandruvada	100cf6f277	cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-07 15:31:58 +02:00
Rafael J. Wysocki	5d1191ab6c	Merge back earlier cpufreq material for v4.8.	2016-07-04 13:21:43 +02:00
Jisheng Zhang	4a7cb7a96a	intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly pid_params is written once by copy_pid_params() during initialization, and thereafter is mostly read by hot path intel_pstate_update_util(). The read of pid_params gets more after commit `a4675fbc4a` ("cpufreq: intel_pstate: Replace timers with utilization update callbacks") pstate_funcs is written once by copy_cpu_funcs() during initialization, and thereafter is mostly read by hot path intel_pstate_update_util() hwp_active is written to once during initialization and thereafter is mostly read by hot path intel_pstate_update_util(). The fact that they are mostly read and not written to makes them candidates for __read_mostly declarations. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-28 00:04:04 +02:00
Jisheng Zhang	29327c84ba	intel_pstate: add __init/__initdata marker to some functions/variables These functions/variables are not needed after booting, so mark them as __init or __initdata. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-28 00:04:04 +02:00
Jisheng Zhang	eed436095e	intel_pstate: Fix incorrect placement of __initdata __initdata should be placed between the variable name and equal sign (if there is) for the variable to be placed in the intended section. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-28 00:04:04 +02:00
Rafael J. Wysocki	5ab666e095	intel_pstate: Do not clear utilization update hooks on policy changes intel_pstate_set_policy() is invoked by the cpufreq core during driver initialization, on changes of policy attributes (minimim and maximum frequency, for example) via sysfs and via CPU notifications from the platform firmware. On some platforms the latter may occur relatively often. Commit `bb6ab52f2b` (intel_pstate: Do not set utilization update hook too early) made intel_pstate_set_policy() clear the CPU's utilization update hook before updating the policy attributes for it (and set the hook again after doind that), but that involves invoking synchronize_sched() and adds overhead to the CPU notifications mentioned above and to the sched-RCU handling in general. That extra overhead is arguably not necessary, because updating policy attributes when the CPU's utilization update hook is active should not lead to any adverse effects, so drop the clearing of the hook from intel_pstate_set_policy() and make it check if the hook has been set already when attempting to set it. Fixes: `bb6ab52f2b` (intel_pstate: Do not set utilization update hook too early) Reported-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-27 23:47:15 +02:00
Rafael J. Wysocki	56b7808572	Merge back earlier cpufreq changes for v4.8.	2016-06-20 14:31:41 +02:00
Srinivas Pandruvada	b00345d199	cpufreq: intel_pstate: Adjust _PSS[0] freqeuency if needed The maximum turbo P-State used by the intel_pstate driver may be limited by ACPI _PSS table entry 0. After commit `9522a2ff9c` (cpufreq: intel_pstate: Enforce _PPC limits), the maximum performance on servers will be capped by the _PSS table entry 0 by default. Even though that is formally correct, it may lead to preformance regressions in some cases. Namely, if the _PSS table entry 0 is not the maximum turbo P-State, performance measured after commit `9522a2ff9c` will not match the performance measured before that commit on the same system. For this reason, modify the code to always use the maximum turbo frequency as the one that corresponds to _PSS table entry 0 if turbo is enabled in the BIOS. This way, the performance levels from before commit `9522a2ff9c` will be restored on the affected systems. Fixes: `9522a2ff9c` (cpufreq: intel_pstate: Enforce _PPC limits) Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-15 01:56:47 +02:00
Srinivas Pandruvada	41bad47f76	cpufreq: intel_pstate: Broxton support Add Broxton CPU model number. Broxton requires core_params to get performance limits via MSRs, but it is an Atom platform, which requires more power optimized algorithm. So the P state selection will use similar algorithm as other Atom platforms. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-13 23:49:39 +02:00
Rafael J. Wysocki	b77b565108	Merge branch 'x86/cpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into x86/cpu Pull recent changes related to x86 CPU model representations from tip.	2016-06-13 23:48:23 +02:00
Dave Hansen	5b20c94488	x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver Another straightforward replacement of magic numbers. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: jacob.jun.pan@intel.com Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-06-08 13:03:26 +02:00
Srinivas Pandruvada	983e600e88	cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo When turbo is disabled, the ->set_policy() interface is broken. For example, when turbo is disabled and cpuinfo.max = `2900000` (full max turbo frequency), setting the limits results in frequency less than the requested one: Set 1000000 KHz results in 0700000 KHz Set 1500000 KHz results in 1100000 KHz Set 2000000 KHz results in 1500000 KHz This is because the limits->max_perf fraction is calculated using the max turbo frequency as the reference, but when the max P-State is capped in intel_pstate_get_min_max(), the reference is not the max turbo P-State. This results in reducing max P-State. One option is to always use max turbo as reference for calculating limits. But this will not be correct. By definition the intel_pstate sysfs limits, shows percentage of available performance. So when BIOS has disabled turbo, the available performance is max non turbo. So the max_perf_pct should still show 100%. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Subject & changelog, rewrite in fewer lines of code ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-08 03:22:40 +02:00
Srinivas Pandruvada	2c2c1af449	cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() The limits->max_perf is rounded_up but immediately overwritten by another assignment to limits->max_perf. Move that operation to the correct location. While here also added a pr_debug() call in ->set_policy to aid in debugging. Fixes: `785ee27881` (cpufreq: intel_pstate: Fix limits->max_perf rounding error) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Subject & changelog ] Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-08 03:22:39 +02:00
Srinivas Pandruvada	6cacd115a8	cpufreq: intel_pstate: Downgrade print level for _PPC Downgrade pr_info to pr_debug for the "_PPC limits will be enforced" message. In server systems with many cores this message is annoying. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-30 15:22:02 +02:00
Rafael J. Wysocki	c749c64f45	intel_pstate: Simplify conditional in intel_pstate_set_policy() One of the if () statements in intel_pstate_set_policy() causes another if () to be evaluated if the condition is true and it doesn't do anything else, so merge the two if () statements into one. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-05-18 02:26:33 +02:00
Rafael J. Wysocki	1aa7a6e2b8	intel_pstate: Clean up get_target_pstate_use_performance() The comments and the core_busy variable name in get_target_pstate_use_performance() are totally confusing, so modify them to reflect what's going on. The results of the computations should be the same as before. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:38 +02:00
Rafael J. Wysocki	8edb0a6e48	intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Notice that get_avg_pstate() can use sample.core_avg_perf instead of carrying the same division again, so make it do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Rafael J. Wysocki	a1c9787dc3	intel_pstate: Clarify average performance computation The core_pct_busy field of struct sample actually contains the average performace during the last sampling period (in percent) and not the utilization of the core as suggested by its name which is confusing. For this reason, change the name of that field to core_avg_perf and rename the function that computes its value accordingly. Also notice that storing this value as percentage requires a costly integer multiplication to be carried out in a hot path, so instead store it as an "extended fixed point" value with more fraction bits and update the code using it accordingly (it is better to change the name of the field along with its meaning in one go than to make those two changes separately, as that would likely lead to more confusion). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Chen Yu	4578ee7e1d	intel_pstate: Avoid unnecessary synchronize_sched() during initialization Currently, in intel_pstate_clear_update_util_hook(), after clearing the utilization update hook, we leverage synchronize_sched() to deal with synchronization, which is a little bit time-costly because synchronize_sched() has to wait for all the CPUs to go through a grace period. Actually, the synchronize_sched() is not necessary if the utilization update hook has not been set for the given CPU yet, so make the driver check if that's the case and avoid the synchronize_sched() call then. Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371 Tested-by: Tian Ye <yex.tian@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw : Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:56:34 +02:00
Rafael J. Wysocki	f96fd0c86f	intel_pstate: Clean up intel_pstate_get() intel_pstate_get() contains a local variable that's initialized but never used and it can be written in fewer lines of code, so clean it up. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-05-09 22:55:36 +02:00
Rafael J. Wysocki	da43af961b	Merge cpufreq fixes going into v4.6. * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform cpufreq: intel_pstate: Fix processing for turbo activation ratio	2016-05-06 22:01:14 +02:00
Srinivas Pandruvada	e59a8f7ff4	cpufreq: intel_pstate: Ignore _PPC processing under HWP When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:43:47 +02:00

1 2 3 4 5

218 Commits