The current notifiers have the following error handling pattern all
over the place:
int err, nr;
err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr);
if (err & NOTIFIER_STOP_MASK)
__foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL)
And aside from the endless repetition thereof, it is broken. Consider
blocking notifiers; both calls take and drop the rwsem, this means
that the notifier list can change in between the two calls, making @nr
meaningless.
Fix this by replacing all the __foo_notifier_call_chain() functions
with foo_notifier_call_chain_robust() that embeds the above pattern,
but ensures it is inside a single lock region.
Note: I switched atomic_notifier_call_chain_robust() to use
the spinlock, since RCU cannot provide the guarantee
required for the recovery.
Note: software_resume() error handling was broken afaict.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org
Important fixes:
- in s2idle, use timekeeping_freeze trace mark instead of
machine_suspend to denote entry into s2idle mode.
- in s2idle, use machine_suspend trace mark to create a new virtual
device called "s2idle_enter_<n>x". It denotes an s2idle_enter call
loop of <n> iterations where s2idle was never actually achieved.
It isn't counted as "freeze time" in the header.
- in s2idle, only show multiple freeze times if s2idle went in and
out of resume_noirq. Otherwise multiple freezes are shown with
"waking" time subtracted (waking time is time spent outside s2idle
dealing with wakeups).
- in s2idle summaries, include "FREEZEWAKE" as an issue when at
least 1ms is spent waking from s2idle. A clean run should only
wake for the rtc timer.
- add support for device callbacks with matching names in the same
phase. In rare cases some devices register multiple callbacks from
separate drivers using the same name. Without this fix only one is
shown.
- add kparamsfmt string back to fix bootgraph
General updates:
- when suspend_machine is missing, error says "failed in
suspend_machine"
- extract target count/time and add to summary title if -multi
used
- include any instances of "timeout" in dmesg as issues to be
logged.
- fix ftrace parse to handle any number of flags (instead of
just 4).
- remove sync/async_device string from device detail, remains in
hover.
- when using callgraph (-f) add driver name to callgraph titles.
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The single user could have called freeze_secondary_cpus() directly.
Since this function was a source of confusion, remove it as it's
just a pointless wrapper.
While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.
Done automatically via:
git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
sleepgraph:
- force usage of python3 instead of using system default
- fix bugzilla 204773 (https://bugzilla.kernel.org/show_bug.cgi?id=204773)
- fix issue of platform info not being reset in -multi (logs fill up)
- change -ftop call to "pm_suspend", this is one level below state_store
- add -wificheck command to read out the current wifi device details
- change -wifi behavior to poll /proc/net/wireless for wifi connect
- add wifi reconnect time to timeline, include time in summary column
- add "fail on wifi_resume" to timeline and summary when wifi fails
- add a set of commands to collect data before/after suspend in the log
- add "-cmdinfo" command which prints out all the data collected
- check for cmd info tools at start, print found/missing in green/red
- fix kernel suspend time calculation: tool used to look for start of
pm_suspend_console, but the order has changed. latest kernel starts
with ksys_sync, use this instead
- include time spent in mem/disk in the header (same as freeze/standby)
- ignore turbostat 32-bit capability warnings
- print to result.txt when -skiphtml is used, just say result: pass
- don't exit on SIGTSTP, it's a ctrl-Z and the tool may come back
- -multi argument supports duration as well as count: hours, minutes, seconds
- update the -multi status output to be more informative
- -maxfail sets maximum consecutive fails before a -multi run is aborted
- in -summary, ignore dmesg/ftrace/html files that are 0 size
bootgraph:
- force usage of python3 instead of using system default
README:
- add endurance testing instructions
Makefile:
- remove pycache on uninstall
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
sleepgraph:
- kprobe_events won't set correctly if the data is buffered
- force sysvals.setVal to be unbuffered and use binary mode
- tested in both python2 and python3
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204773
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Upgrade bootgraph/sleepgraph to be able to run on python2 and python3.
Both now simply require python, the system can choose which to use.
bootgraph python3 update:
- add floor function to handle integer arithmetic
- change argument loop to use next() instead of args.next()
- open dmesg log and popen in binary, use decode(ascii, ignore)
- sort all html data to allow diff between python versions
- change exception handler to use python3 as instead of comma
sleepgraph python3 update:
- import configparser not ConfigParser (p2 needs python-configparser)
- add floor function to handle integer arithmetic
- change argument loop to use next() instead of args.next()
- handle popen output in binary, use decode(ascii, ignore)
- sort all html/output data to allow diff between python versions
- force gzip open to use text mode, same for file open
- ensure no binary data is written to logs (ascii convert devprops info)
- use codecs library to handle zlib encoding for mcelog data
- remove all uses of python3.7 keyword "async" as members or vars
- assume all FPDT and DMI data is in binary string form
sleepgraph:
- turbostat will be used by default if it's found & the mode is freeze
- a new option "-noturbostat" will disable its use
- fix bug where two callgraphs with the same start time overwrite.
- fix s2idle processing where two suspend/resume_machines occur back2back
- update getexec function to use which first (assuming PATH exists)
- new platforminfo data in log with: lspci, gpe counts, /proc/interrupts
- new data is zipped, b64 encoded, and tacked on the end of ftrace
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bootgraph:
- dmesg log format has changed, update parser in two places
- fix prints in preparation for upgrade to python3
sleepgraph:
- fix prints in preparation for upgrade to python3
- add new trace events and kprobes to cover freeze more completely
- add new -ftop callgraph trace over suspend_devices_and_enter
- add -wifi option to check if a wifi connection is active
- add -skipkprobe option to suppress unwanted kprobes in dev mode
- add kernel params and sysinfo to the log output
- don't crash if /dev/mem is throwing IO errors, ignore FPDT and DMI
- fix kprobe length calculation when calls are recursive
- add several new kernel issue definitions for USB, ACPI, ATA, etc
- enable turbostat output to be read from stdout instead of from file
- add BIOS call data to the timeline from acpi_ps_execute_method kprobe
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
sleepgraph:
- add support for parsing kernel issues from timeline dmesg logs
- with -summary, generate a summary-issues.html for kernel issues found
- with -summary, generate a summary-devices.html for device callback times
- when recreating a timeline, use -o to set the output html filename
- capture mcelog data when hardware errors occur and store in log
- add -turbostat option to capture power data during freeze
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
bootgraph & sleepgraph:
- funnel all prints through the pprint function
- remove superfluous print calls, arrange them in single blocks
- flush stdout on every print, enables log capture on hang
sleepgraph:
- in -summary, if all tests have the same host+kernel+mode, add to title
- update verbose device detail print to include machine suspend/resume
- match tKernSus and tKernRes to pm_prepare/restore_console
- fully support multiple suspend/resumes in a single timeline
- enable various disk modes (disk-suspend, disk-test_resume, etc)
- add warnings when -display (xset) fails
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
general:
- add battery charge data before and after test
- remove special s0i3 handling
- remove melding of dmesg & ftrace data in old kernels, use one only
- updates to various kprobes in trace (ksys_sync, etc)
- enable pm_debug_messages during the test
- instrument more subsystems with dev functions (phy0)
error handling:
- return codes for tool show the status of the test run
- 0: success, 1: general error (no timeline), 2: fail (suspend aborted)
- monitor output of /sys/power/state, mark as failure if exception occurs
- add signal handler when using -result to catch tool exceptions
display control
- add -x commands for testing xset with mode settings and status
- allow display setting to on, off, suspend, standby
- add display mode change info to the log, along with a warning on fail
s2idle (freeze)
- remove fixed 10-phase dependency, allow any phase order & any count
- multiple phase occurences show as phase_nameN e.g. suspend_noirq3
- if multiple freezes occur, print multiple time values in header
summary:
- add new columns to summary output: issues, worst suspend/resume devices
- worst device: includes summation of all phases of suspend or resume
- issues: includes WARNING/ERROR/BUG from dmesg log, and other issues
- s2idle: multiple freezes show as FREEZExN in the issues column
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
general changes:
- make python dependent on version2 to enable clearlinux
- upgrade dmesg error/warning extraction to be more detailed
- enable logs generated from -cmd runs to be processed in gzip form
- add notification on power mode entry failure into the timeline
- add -battery option to show if battery is connected and its charge
summary changes (output of -summary):
- add -genhtml option to regenerate missing timelines from logs found
- add min/max/median/avg data to the summary page with links to the data
- add highlight to minimum, maximum, and median tests
- add result column to summary (pass or fail) with red highlight on fail
- add issues column to summary with a list of dmesg err/warn/bugs
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- add -cgskip option to reduce callgraph output size
- add -cgfilter option to focus on a list of devices
- add -result option for exporting batch test results
- removed all phoronix hooks, use -result to enable batch testing
- change -usbtopo to -devinfo, now prints all devices
- add -gzip option to read/write logs in gz format
- add -bufsize option to manually control ftrace buffer size
- add -sync option to run filesystem sync prior to test
- add -display option to enable/disable the display prior to test
- add -rs option to enable/disable runtime suspend on all devices for test
- add installed config files to search path
- add kernel error/warning links into the timeline
- fix callgraph trace to better handle interrupts
- include command string and kernel params in timeline output header
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- name change: analyze_boot.py to bootgraph.py
- name change: analyze_suspend.py to sleepgraph.py
- added config files for easier sleepgraph usage
- added example.cfg which describes all config options
- added cgskip.txt definition for slimmer callgraphs
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>