When build with W=1 and "-Werror=format-truncation", below error is
observed in coretemp driver,
drivers/hwmon/coretemp.c: In function 'create_core_data':
>> drivers/hwmon/coretemp.c:393:34: error: '%s' directive output may be truncated writing likely 5 or more bytes into a region of size between 3 and 13 [-Werror=format-truncation=]
393 | "temp%d_%s", attr_no, suffixes[i]);
| ^~
drivers/hwmon/coretemp.c:393:26: note: assuming directive output of 5 bytes
393 | "temp%d_%s", attr_no, suffixes[i]);
| ^~~~~~~~~~~
drivers/hwmon/coretemp.c:392:17: note: 'snprintf' output 7 or more bytes (assuming 22) into a destination of size 19
392 | snprintf(tdata->attr_name[i], CORETEMP_NAME_LENGTH,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
393 | "temp%d_%s", attr_no, suffixes[i]);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Given that
1. '%d' could take 10 charactors,
2. '%s' could take 10 charactors ("crit_alarm"),
3. "temp", "_" and the NULL terminator take 6 charactors,
fix the problem by increasing CORETEMP_NAME_LENGTH to 28.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Fixes: 7108b80a54 ("hwmon/coretemp: Handle large core ID value")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310200443.iD3tUbbK-lkp@intel.com/
Link: https://lore.kernel.org/r/20231025122316.836400-1-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The refinement of tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET
has been changed for several times.
Now, the raw value from MSR is used without refinement. Thus remove the
obsolete comment.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230330103346.6044-2-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
After commit c0c67f8761 ("hwmon: (coretemp) Add support for dynamic
tjmax"), tjmax value is retrieved from MSR every time the temperature is
read.
This means that, with debug message enabled, the tjmax debug message is
printed out for every single temperature read for any CPU. This spams
the syslog.
Ideally, as tjmax is package scope unique, the debug message should show
once when tjmax is changed for one package. But this requires inventing
some new per-package data in the coretemp driver, and this is overkill.
To keep the code simple, delete the tjmax debug message.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230330103346.6044-1-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Coretemp's platform driver is unconventional. All the real work is done
globally by the initcall and CPU hotplug notifiers, while the "driver"
effectively just wraps an allocation and the registration of the hwmon
interface in a long-winded round-trip through the driver core. The whole
logic of dynamically creating and destroying platform devices to bring
the interfaces up and down is error prone, since it assumes
platform_device_add() will synchronously bind the driver and set drvdata
before it returns, thus results in a NULL dereference if drivers_autoprobe
is turned off for the platform bus. Furthermore, the unusual approach of
doing that from within a CPU hotplug notifier, already commented in the
code that it deadlocks suspend, also causes lockdep issues for other
drivers or subsystems which may want to legitimately register a CPU
hotplug notifier from a platform bus notifier.
All of these issues can be solved by ripping this unusual behaviour out
completely, simply tying the platform devices to the lifetime of the
module itself, and directly managing the hwmon interfaces from the
hotplug notifiers. There is a slight user-visible change in that
/sys/bus/platform/drivers/coretemp will no longer appear, and
/sys/devices/platform/coretemp.n will remain present if package n is
hotplugged off, but hwmon users should really only be looking for the
presence of the hwmon interfaces, whose behaviour remains unchanged.
Link: https://lore.kernel.org/lkml/20220922101036.87457-1-janusz.krzysztofik@linux.intel.com/
Link: https://gitlab.freedesktop.org/drm/intel/issues/6641
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://lore.kernel.org/r/20230103114620.15319-1-janusz.krzysztofik@linux.intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The coretemp driver uses rdmsr_on_cpu calls to read
MSR_IA32_PACKAGE_THERM_STATUS/MSR_IA32_THERM_STATUS registers,
which contain information about current core temperature.
For certain low latency applications, the RDMSR interruption exceeds
the applications requirements.
So do not create core files in sysfs, for CPUs which have
isolation and nohz_full enabled.
Temperature information from the housekeeping cores should be
sufficient to infer die temperature.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Link: https://lore.kernel.org/r/Y5zT6B1mY9/pnwJV@tpad
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can be changed at
runtime when the Intel SST-PP (Intel Speed Select Technology -
Performance Profile) level is changed. As a result, the ttarget value
also becomes dyamic.
Improve the code to always get updated ttarget value.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113153145.32696-4-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can be changed at
runtime when the Intel SST-PP (Intel Speed Select Technology -
Performance Profile) level is changed.
Improve the code to always use updated tjmax when it can be retrieved
from MSR_IA32_TEMPERATURE_TARGET.
When tjmax can not be retrieved from MSR_IA32_TEMPERATURE_TARGET, still
follow the previous logic and always use a static tjmax value.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113153145.32696-3-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Rearrange the tjmax handling code so that it can be used directly in
the sysfs attribute callbacks without forward declarations.
No functional change in this patch.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113153145.32696-2-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Checking for the valid bit of IA32_THERM_STATUS is removed in commit
bf6ea084eb ("hwmon: (coretemp) Do not return -EAGAIN for low
temperatures"), and temp_data->valid is set and never cleared when the
temperature has been read once.
Remove the obsolete temp_data->valid field.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221108075051.5139-2-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put(). So call it after using to avoid refcount leak.
Fixes: 14513ee696 ("hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221118093303.214163-1-yangyingliang@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The coretemp driver supports up to a hard-coded limit of 128 cores.
Today, the driver can not support a core with an ID above that limit.
Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they
may be sparse and they may be large.
Update the driver to map arbitrary core ID numbers into appropriate
array indexes so that 128 cores can be supported, no matter the encoding
of core ID's.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com
We have bool so use it consistently in all the drivers.
The following Coccinelle script was used:
@@
identifier T;
type t = { char, int };
@@
struct T {
...
- t valid;
+ bool valid;
...
}
@@
identifier v;
@@
(
- v->valid = 0
+ v->valid = false
|
- v->valid = 1
+ v->valid = true
)
followed by sed to fixup the comments:
sed '/bool valid;/{s/!=0/true/;s/zero/false/}'
Few whitespace changes were fixed manually. All modified drivers were
compile-tested.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://lore.kernel.org/r/20210924195202.27917-1-fercerpav@gmail.com
[groeck: Fixed up 'u8 valid' to 'boool valid' in atxp1.c]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131509.859324598@linutronix.de
In coretemp_init(), 'zone_devices' is allocated through kcalloc().
However, it is not deallocated in the following execution if
platform_driver_register() fails, leading to a memory leak. To fix this
issue, introduce the 'outzone' label to free 'zone_devices' before
returning the error.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Link: https://lore.kernel.org/r/1566248402-6538-1-git-send-email-wenwen@cs.uga.edu
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull x86 topology updates from Ingo Molnar:
"Implement multi-die topology support on Intel CPUs and expose the die
topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.
These changes should have no effect on the kernel's existing
understanding of topologies, i.e. there should be no behavioral impact
on cache, NUMA, scheduler, perf and other topologies and overall
system performance"
* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support
perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support
hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages
perf/x86/intel/cstate: Support multi-die/package
perf/x86/intel/rapl: Support multi-die/package
perf/x86/intel/uncore: Support multi-die/package
topology: Create core_cpus and die_cpus sysfs attributes
topology: Create package_cpus sysfs attribute
hwmon/coretemp: Support multi-die/package
powercap/intel_rapl: Update RAPL domain name and debug messages
thermal/x86_pkg_temp_thermal: Support multi-die/package
powercap/intel_rapl: Support multi-die/package
powercap/intel_rapl: Simplify rapl_find_package()
x86/topology: Define topology_logical_die_id()
x86/topology: Define topology_die_id()
cpu/topology: Export die_id
x86/topology: Create topology_max_die_per_package()
x86/topology: Add CPUID.1F multi-die/package support
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license this program
is distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 12 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.745497013@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syntax update only -- no logical or functional change.
In response to the new multi-die/package changes, update variable names to
use the more generic thermal "zone" terminology, instead of "package", as
the zones can refer to either packages or die.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/facecfd3525d55c2051f63a7ec709aeb03cc1dc1.1557769318.git.len.brown@intel.com
Package temperature sensors are actually implemented in hardware per-die.
Update coretemp to be "die-aware", so it can expose mulitple sensors per
package, instead of just one. No change to single-die/package systems.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-hwmon@vger.kernel.org
Link: https://lkml.kernel.org/r/ec2868f35113a01ff72d9041e0b97fc6a1c7df84.1557769318.git.len.brown@intel.com
Replace S_<PERMS> with octal values.
The conversion was done automatically with coccinelle. The semantic patches
and the scripts used to generate this commit log are available at
https://github.com/groeck/coccinelle-patches/hwmon/.
This patch does not introduce functional changes. It was verified by
compiling the old and new files and comparing text and data sizes.
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
x86_mask is a confusing name which is hard to associate with the
processor's stepping.
Additionally, correct an indent issue in lib/cpu.c.
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.
Use pci_get_domain_bus_and_slot() with a domain number of 0 where we can't
extract the domain number. Other places, use the actual domain number from
the device.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The recent conversion to the hotplug state machine missed that the original
hotplug notifiers did not execute in the frozen state, which is used on
suspend on resume.
This does not matter on single socket machines, but on multi socket systems
this breaks when the device for a non-boot socket is removed when the last
CPU of that socket is brought offline. The device removal locks up the
machine hard w/o any debug output.
Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.
Thanks to Tommi for providing debug information patiently while I failed to
spot the obvious.
Fixes: e00ca5df37 ("hwmon: (coretemp) Convert to hotplug state machine")
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Keeping track of the per package platform devices requires an extra object,
which is held in a linked list.
The maximum number of packages is known at init() time. So the extra object
and linked list management can be replaced by an array of platform device
pointers in which the per package devices pointers can be stored. Lookup
becomes a simple array lookup instead of a list walk.
The mutex protecting the list can be removed as well because the array is
only accessed from cpu hotplug callbacks which are already serialized.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The cpu online callback returns success unconditionally even when the
device has no support, micro code mismatches or device allocation fails.
Only if CPU_HOTPLUG is disabled, the init function checks whether the
device list is empty and removes the driver.
This does not make sense. If CPU HOTPLUG is enabled then there is no point
to keep the driver around when it failed to initialize on the already
online cpus. The chance that not yet online CPUs will provide a functional
interface later is very close to zero.
Add proper error return codes, so the setup of the cpu hotplug states fails
when the device cannot be initialized and remove all the magic cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Install the callbacks via the state machine. Setup and teardown are handled
by the hotplug core.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-hwmon@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: rt@linuxtronix.de
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20161117183541.8588-5-bigeasy@linutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
No point in looking up the same thing over and over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The coretemp driver provides a sysfs interface per physical core. If
hyperthreading is enabled and one of the siblings goes offline the sysfs
interface is removed and then immeditately created again for the
sibling. The only difference of them is the target cpu for the
rdmsr_on_cpu() in the sysfs show functions.
It's way simpler to keep a cpumask of cpus which are active in a package
and only remove the interface when the last sibling goes offline. Otherwise
just move the target cpu for the sysfs show functions to the still online
sibling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
When a CPU is offlined nothing checks whether it is the target CPU for the
package temperature sysfs interface.
As a consequence all future readouts of the package temperature return
crap:
90000
which is Tjmax of that package.
Check whether the outgoing CPU is the target for the package and assign it
to some other still online CPU in the package. Protect the change against
the rdmsr_on_cpu() in show_crit_alarm().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
A new limit selected arbitrarily as power of two greater than
required minimum for Xeon Phi processor (72 for Knights Landing).
Currently driver is not able to handle cores with core ID greater than 32.
Such attempt ends up with the following error in dmesg:
coretemp coretemp.0: Adding Core XXX failed
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The former duplicates the functionality of the latter but is
neither documented nor arch-independent.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-4-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
By extracting the only part that differs we can allow static checking
of the format string, and possibly save a little .rodata.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
[Guenter Roeck: continuation line alignment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
This reverts commit 9fb6c9c73b.
Tjmax on some Intel CPUs is below 85 degrees C. One known example is
L5630 with Tjmax of 71 degrees C. There are other Xeon processors with
Tjmax of 70 or 80 degrees C. Also, the Intel IA32 System Programming
document states that the temperature target is in bits 23:16 of MSR 0x1a2
(MSR_TEMPERATURE_TARGET), which is 8 bits, not 7.
So even if turbostat uses similar checks to validate Tjmax, there is no
evidence that the checks are actually required. On the contrary, the
checks are known to cause problems and therefore need to be removed.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=75071.
Fixes: 9fb6c9c hwmon: (coretemp) Refine TjMax detection
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The purpose of this single series of commits from Srivatsa S Bhat (with
a small piece from Gautham R Shenoy) touching multiple subsystems that use
CPU hotplug notifiers is to provide a way to register them that will not
lead to deadlocks with CPU online/offline operations as described in the
changelog of commit 93ae4f978c (CPU hotplug: Provide lockless versions
of callback registration functions).
The first three commits in the series introduce the API and document it
and the rest simply goes through the users of CPU hotplug notifiers and
converts them to using the new method.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+
4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g
KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0
BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ
rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs
j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW
ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv
lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM
IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD
cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf
StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ
sWl//rg76nb13dFjvF+q
=SW7Q
-----END PGP SIGNATURE-----
Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
"The purpose of this single series of commits from Srivatsa S Bhat
(with a small piece from Gautham R Shenoy) touching multiple
subsystems that use CPU hotplug notifiers is to provide a way to
register them that will not lead to deadlocks with CPU online/offline
operations as described in the changelog of commit 93ae4f978c ("CPU
hotplug: Provide lockless versions of callback registration
functions").
The first three commits in the series introduce the API and document
it and the rest simply goes through the users of CPU hotplug notifiers
and converts them to using the new method"
* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
net/iucv/iucv.c: Fix CPU hotplug callback registration
net/core/flow.c: Fix CPU hotplug callback registration
mm, zswap: Fix CPU hotplug callback registration
mm, vmstat: Fix CPU hotplug callback registration
profile: Fix CPU hotplug callback registration
trace, ring-buffer: Fix CPU hotplug callback registration
xen, balloon: Fix CPU hotplug callback registration
hwmon, via-cputemp: Fix CPU hotplug callback registration
hwmon, coretemp: Fix CPU hotplug callback registration
thermal, x86-pkg-temp: Fix CPU hotplug callback registration
octeon, watchdog: Fix CPU hotplug callback registration
oprofile, nmi-timer: Fix CPU hotplug callback registration
intel-idle: Fix CPU hotplug callback registration
clocksource, dummy-timer: Fix CPU hotplug callback registration
drivers/base/topology.c: Fix CPU hotplug callback registration
acpi-cpufreq: Fix CPU hotplug callback registration
zsmalloc: Fix CPU hotplug callback registration
scsi, fcoe: Fix CPU hotplug callback registration
scsi, bnx2fc: Fix CPU hotplug callback registration
scsi, bnx2i: Fix CPU hotplug callback registration
...
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the hwmon coretemp code by using this latter form of callback
registration.
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Simplify code, reduce code size, and attach sysfs attributes to hwmon device.
For this driver, the only attribute created is the name attribute.
Other attributes are still created and removed dynamically as cores
are added or removed.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
This simplifies error handling.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
Instead of creating each attribute individually, use sysfs_create_group
to create all attributes for one core with a single call.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
Some Intel CPUs do not set the 'valid' bit in IA32_THERM_STATUS if the
temperature is too low to be measured. This condition will not change until
the CPU is hot enough for its temperature to be measured. Returning an error
in such conditions is not very useful. Drop checking the valid bit and just
return the reported temperature instead.
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Intel's turbostat code uses only 7 bits from MSR_IA32_TEMPERATURE_TARGET to
read TjMax, and also only accepts it if the reported temperature is at least
85 degrees C. Play safe and do the same.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Since we now have to use PCI IDs to detect CPU types anyway, use this mechanism
to detect CE41x0 CPUs. Advantage is that it only requires a single entry and
covers all variants of CE41x0, including those unknown to us.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Atom S12x0 CPUs are identified by the CPU host bridge ID. Add an override
table based on PCI IDs as well as code to detect it.
PCI access functions can now be called with PCI disabled, so unlike previous
attempts to use PCI IDs, the code no longer depends on it. If PCI is disabled,
the CPU will not be identified correctly. Since it is unlikely that anything
will work in this case, this is an acceptable limitation.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.
Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Display warning "Unable to read TjMax from CPU x" only if the CPU
is supposed to support it. This is not the case for the various Atom CPUs.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the drivers/hwmon uses of the __cpuinit macros
from all C files.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: lm-sensors@lm-sensors.org
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Corentin Labbe <corentin.labbe@geomatys.fr>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Juerg Haefliger <juergh@gmail.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Roger Lucas <vt8231@hiddenengine.co.uk>
Cc: Marc Hulsman <m.hulsman@tudelft.nl>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>