mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-11 21:38:32 +08:00
devlink: convert devlink-health.txt to rst format
Update the devlink-health documentation to use the newer ReStructuredText format. Note that it's unclear what OOB stood for, and it has been left as-is without a proper first-use expansion of the acronym. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
parent
f4bdd71036
commit
f7555fd199
114
Documentation/networking/devlink/devlink-health.rst
Normal file
114
Documentation/networking/devlink/devlink-health.rst
Normal file
@ -0,0 +1,114 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============
|
||||
Devlink Health
|
||||
==============
|
||||
|
||||
Background
|
||||
==========
|
||||
|
||||
The ``devlink`` health mechanism is targeted for Real Time Alerting, in
|
||||
order to know when something bad happened to a PCI device.
|
||||
|
||||
* Provide alert debug information.
|
||||
* Self healing.
|
||||
* If problem needs vendor support, provide a way to gather all needed
|
||||
debugging information.
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
The main idea is to unify and centralize driver health reports in the
|
||||
generic ``devlink`` instance and allow the user to set different
|
||||
attributes of the health reporting and recovery procedures.
|
||||
|
||||
The ``devlink`` health reporter:
|
||||
Device driver creates a "health reporter" per each error/health type.
|
||||
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
|
||||
or unknown (driver specific).
|
||||
For each registered health reporter a driver can issue error/health reports
|
||||
asynchronously. All health reports handling is done by ``devlink``.
|
||||
Device driver can provide specific callbacks for each "health reporter", e.g.:
|
||||
|
||||
* Recovery procedures
|
||||
* Diagnostics procedures
|
||||
* Object dump procedures
|
||||
* OOB initial parameters
|
||||
|
||||
Different parts of the driver can register different types of health reporters
|
||||
with different handlers.
|
||||
|
||||
Actions
|
||||
=======
|
||||
|
||||
Once an error is reported, devlink health will perform the following actions:
|
||||
|
||||
* A log is being send to the kernel trace events buffer
|
||||
* Health status and statistics are being updated for the reporter instance
|
||||
* Object dump is being taken and saved at the reporter instance (as long as
|
||||
there is no other dump which is already stored)
|
||||
* Auto recovery attempt is being done. Depends on:
|
||||
- Auto-recovery configuration
|
||||
- Grace period vs. time passed since last recover
|
||||
|
||||
User Interface
|
||||
==============
|
||||
|
||||
User can access/change each reporter's parameters and driver specific callbacks
|
||||
via ``devlink``, e.g per error type (per health reporter):
|
||||
|
||||
* Configure reporter's generic parameters (like: disable/enable auto recovery)
|
||||
* Invoke recovery procedure
|
||||
* Run diagnostics
|
||||
* Object dump
|
||||
|
||||
.. list-table:: List of devlink health interfaces
|
||||
:widths: 10 90
|
||||
|
||||
* - Name
|
||||
- Description
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_GET``
|
||||
- Retrieves status and configuration info per DEV and reporter.
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
|
||||
- Allows reporter-related configuration setting.
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
|
||||
- Triggers a reporter's recovery procedure.
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
|
||||
- Retrieves diagnostics data from a reporter on a device.
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
|
||||
- Retrieves the last stored dump. Devlink health
|
||||
saves a single dump. If an dump is not already stored by the devlink
|
||||
for this reporter, devlink generates a new dump.
|
||||
dump output is defined by the reporter.
|
||||
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
|
||||
- Clears the last saved dump file for the specified reporter.
|
||||
|
||||
The following diagram provides a general overview of ``devlink-health``::
|
||||
|
||||
netlink
|
||||
+--------------------------+
|
||||
| |
|
||||
| + |
|
||||
| | |
|
||||
+--------------------------+
|
||||
|request for ops
|
||||
|(diagnose,
|
||||
mlx5_core devlink |recover,
|
||||
|dump)
|
||||
+--------+ +--------------------------+
|
||||
| | | reporter| |
|
||||
| | | +---------v----------+ |
|
||||
| | ops execution | | | |
|
||||
| <----------------------------------+ | |
|
||||
| | | | | |
|
||||
| | | + ^------------------+ |
|
||||
| | | | request for ops |
|
||||
| | | | (recover, dump) |
|
||||
| | | | |
|
||||
| | | +-+------------------+ |
|
||||
| | health report | | health handler | |
|
||||
| +-------------------------------> | |
|
||||
| | | +--------------------+ |
|
||||
| | health reporter create | |
|
||||
| +----------------------------> |
|
||||
+--------+ +--------------------------+
|
@ -1,86 +0,0 @@
|
||||
The health mechanism is targeted for Real Time Alerting, in order to know when
|
||||
something bad had happened to a PCI device
|
||||
- Provide alert debug information
|
||||
- Self healing
|
||||
- If problem needs vendor support, provide a way to gather all needed debugging
|
||||
information.
|
||||
|
||||
The main idea is to unify and centralize driver health reports in the
|
||||
generic devlink instance and allow the user to set different
|
||||
attributes of the health reporting and recovery procedures.
|
||||
|
||||
The devlink health reporter:
|
||||
Device driver creates a "health reporter" per each error/health type.
|
||||
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
|
||||
or unknown (driver specific).
|
||||
For each registered health reporter a driver can issue error/health reports
|
||||
asynchronously. All health reports handling is done by devlink.
|
||||
Device driver can provide specific callbacks for each "health reporter", e.g.
|
||||
- Recovery procedures
|
||||
- Diagnostics and object dump procedures
|
||||
- OOB initial parameters
|
||||
Different parts of the driver can register different types of health reporters
|
||||
with different handlers.
|
||||
|
||||
Once an error is reported, devlink health will do the following actions:
|
||||
* A log is being send to the kernel trace events buffer
|
||||
* Health status and statistics are being updated for the reporter instance
|
||||
* Object dump is being taken and saved at the reporter instance (as long as
|
||||
there is no other dump which is already stored)
|
||||
* Auto recovery attempt is being done. Depends on:
|
||||
- Auto-recovery configuration
|
||||
- Grace period vs. time passed since last recover
|
||||
|
||||
The user interface:
|
||||
User can access/change each reporter's parameters and driver specific callbacks
|
||||
via devlink, e.g per error type (per health reporter)
|
||||
- Configure reporter's generic parameters (like: disable/enable auto recovery)
|
||||
- Invoke recovery procedure
|
||||
- Run diagnostics
|
||||
- Object dump
|
||||
|
||||
The devlink health interface (via netlink):
|
||||
DEVLINK_CMD_HEALTH_REPORTER_GET
|
||||
Retrieves status and configuration info per DEV and reporter.
|
||||
DEVLINK_CMD_HEALTH_REPORTER_SET
|
||||
Allows reporter-related configuration setting.
|
||||
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
|
||||
Triggers a reporter's recovery procedure.
|
||||
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
|
||||
Retrieves diagnostics data from a reporter on a device.
|
||||
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
|
||||
Retrieves the last stored dump. Devlink health
|
||||
saves a single dump. If an dump is not already stored by the devlink
|
||||
for this reporter, devlink generates a new dump.
|
||||
dump output is defined by the reporter.
|
||||
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
|
||||
Clears the last saved dump file for the specified reporter.
|
||||
|
||||
|
||||
netlink
|
||||
+--------------------------+
|
||||
| |
|
||||
| + |
|
||||
| | |
|
||||
+--------------------------+
|
||||
|request for ops
|
||||
|(diagnose,
|
||||
mlx5_core devlink |recover,
|
||||
|dump)
|
||||
+--------+ +--------------------------+
|
||||
| | | reporter| |
|
||||
| | | +---------v----------+ |
|
||||
| | ops execution | | | |
|
||||
| <----------------------------------+ | |
|
||||
| | | | | |
|
||||
| | | + ^------------------+ |
|
||||
| | | | request for ops |
|
||||
| | | | (recover, dump) |
|
||||
| | | | |
|
||||
| | | +-+------------------+ |
|
||||
| | health report | | health handler | |
|
||||
| +-------------------------------> | |
|
||||
| | | +--------------------+ |
|
||||
| | health reporter create | |
|
||||
| +----------------------------> |
|
||||
+--------+ +--------------------------+
|
@ -9,6 +9,7 @@ Contents:
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
devlink-health
|
||||
devlink-info-versions
|
||||
devlink-trap
|
||||
devlink-trap-netdevsim
|
||||
|
Loading…
Reference in New Issue
Block a user