mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-11-11 12:28:41 +08:00
KVM/arm fixes for 5.3
- A bunch of switch/case fall-through annotation, fixing one actual bug - Fix PMU reset bug - Add missing exception class debug strings -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1Bzw8PHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDlXYP/ixqJzqpJetTrvpiUpmLjhp4YwjjOxqyeQvo bWy/EFz8bSWbTZlwAAstFDVmtGenuwaiOakChvV8GH6USYqRsYdvc/sJu0evQplJ JQtOzGhyv1NuM0s9wYBcstAH+YAW+gBK5YFnowreheuidK/1lo3C/EnR2DxCtNal gpV3qQt8qfw3ysGlpC/fDjjOYw4lDkFa6CSx9uk3/587fPBqHANRY/i87nJxmhhX lGeCJcOrY3cy1HhbedFwxVt4Q/ZbHf0UhTfgwvsBYw7BaWmB1ymoEOoktQcUWoKb LL0rBe+OxNQgRnJpn3fMEHiCAmXaI9qE4dohFOl1J3dQvCElcV/jWjkXDD1+KgzW S2XZGB6yxet93Fh1x6xv4i6ATJvmZeTIDUXi9KkjcDiycB9YMCDYY2ejTbQv5VUP V0DghGGDd3d8sY7dEjxwBakuJ6nqKixSouQaNsWuBTm7tVpEVS8yW+hqWs/IVI5b 48SDbxaNpKvx7sAyhuWAjCFbZeIm0hd//JN3JoxazF9i9PKuqnZLbNv/ME6hmzj+ LrETwaAbjsw5Au+ST+OdT2UiauiBm9C6Kg62qagHrKJviuK941+3hjH8aj/e0pYk a0DQxumiyofXPQ0pVe8ZfqlPptONz+EKyAsrOm8AjLJ+bBdRUNHLcZKYj7em7YiE pANc8/T+ =kcDj -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.3 - A bunch of switch/case fall-through annotation, fixing one actual bug - Fix PMU reset bug - Add missing exception class debug strings
This commit is contained in:
commit
0e1c438c44
1
.gitignore
vendored
1
.gitignore
vendored
@ -30,6 +30,7 @@
|
||||
*.lz4
|
||||
*.lzma
|
||||
*.lzo
|
||||
*.mod
|
||||
*.mod.c
|
||||
*.o
|
||||
*.o.*
|
||||
|
3
CREDITS
3
CREDITS
@ -1770,7 +1770,6 @@ S: USA
|
||||
|
||||
N: Dave Jones
|
||||
E: davej@codemonkey.org.uk
|
||||
W: http://www.codemonkey.org.uk
|
||||
D: Assorted VIA x86 support.
|
||||
D: 2.5 AGPGART overhaul.
|
||||
D: CPUFREQ maintenance.
|
||||
@ -3120,7 +3119,7 @@ S: France
|
||||
N: Rik van Riel
|
||||
E: riel@redhat.com
|
||||
W: http://www.surriel.com/
|
||||
D: Linux-MM site, Documentation/sysctl/*, swap/mm readaround
|
||||
D: Linux-MM site, Documentation/admin-guide/sysctl/*, swap/mm readaround
|
||||
D: kswapd fixes, random kernel hacker, rmap VM,
|
||||
D: nl.linux.org administrator, minor scheduler additions
|
||||
S: Red Hat Boston
|
||||
|
@ -11,7 +11,7 @@ Description:
|
||||
Kernel code may export it for complete or partial access.
|
||||
|
||||
GPIOs are identified as they are inside the kernel, using integers in
|
||||
the range 0..INT_MAX. See Documentation/gpio for more information.
|
||||
the range 0..INT_MAX. See Documentation/admin-guide/gpio for more information.
|
||||
|
||||
/sys/class/gpio
|
||||
/export ... asks the kernel to export a GPIO to userspace
|
||||
|
@ -1,6 +1,6 @@
|
||||
rfkill - radio frequency (RF) connector kill switch support
|
||||
|
||||
For details to this subsystem look at Documentation/rfkill.txt.
|
||||
For details to this subsystem look at Documentation/driver-api/rfkill.rst.
|
||||
|
||||
What: /sys/class/rfkill/rfkill[0-9]+/claim
|
||||
Date: 09-Jul-2007
|
||||
|
@ -423,23 +423,6 @@ Description:
|
||||
(e.g. driver restart on the VM which owns the VF).
|
||||
|
||||
|
||||
sysfs interface for NetEffect RNIC Low-Level iWARP driver (nes)
|
||||
---------------------------------------------------------------
|
||||
|
||||
What: /sys/class/infiniband/nesX/hw_rev
|
||||
What: /sys/class/infiniband/nesX/hca_type
|
||||
What: /sys/class/infiniband/nesX/board_id
|
||||
Date: Feb, 2008
|
||||
KernelVersion: v2.6.25
|
||||
Contact: linux-rdma@vger.kernel.org
|
||||
Description:
|
||||
hw_rev: (RO) Hardware revision number
|
||||
|
||||
hca_type: (RO) Host Channel Adapter type (NEX020)
|
||||
|
||||
board_id: (RO) Manufacturing board id
|
||||
|
||||
|
||||
sysfs interface for Chelsio T4/T5 RDMA driver (cxgb4)
|
||||
-----------------------------------------------------
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
rfkill - radio frequency (RF) connector kill switch support
|
||||
|
||||
For details to this subsystem look at Documentation/rfkill.txt.
|
||||
For details to this subsystem look at Documentation/driver-api/rfkill.rst.
|
||||
|
||||
For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
|
||||
Documentation/ABI/removed/sysfs-class-rfkill.
|
||||
|
@ -61,7 +61,7 @@ Date: October 2002
|
||||
Contact: Linux Memory Management list <linux-mm@kvack.org>
|
||||
Description:
|
||||
The node's hit/miss statistics, in units of pages.
|
||||
See Documentation/numastat.txt
|
||||
See Documentation/admin-guide/numastat.rst
|
||||
|
||||
What: /sys/devices/system/node/nodeX/distance
|
||||
Date: October 2002
|
||||
|
@ -120,3 +120,23 @@ Description: These files show the system reset cause, as following: ComEx
|
||||
the last reset cause.
|
||||
|
||||
The files are read only.
|
||||
|
||||
Date: June 2019
|
||||
KernelVersion: 5.3
|
||||
Contact: Vadim Pasternak <vadimpmellanox.com>
|
||||
Description: These files show the system reset cause, as following:
|
||||
COMEX thermal shutdown; wathchdog power off or reset was derived
|
||||
by one of the next components: COMEX, switch board or by Small Form
|
||||
Factor mezzanine, reset requested from ASIC, reset cuased by BIOS
|
||||
reload. Value 1 in file means this is reset cause, 0 - otherwise.
|
||||
Only one of the above causes could be 1 at the same time, representing
|
||||
only last reset cause.
|
||||
|
||||
The files are read only.
|
||||
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd
|
||||
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd
|
||||
|
@ -29,4 +29,4 @@ Description:
|
||||
17 - sectors discarded
|
||||
18 - time spent discarding
|
||||
|
||||
For more details refer to Documentation/iostats.txt
|
||||
For more details refer to Documentation/admin-guide/iostats.rst
|
||||
|
@ -15,7 +15,7 @@ Description:
|
||||
9 - I/Os currently in progress
|
||||
10 - time spent doing I/Os (ms)
|
||||
11 - weighted time spent doing I/Os (ms)
|
||||
For more details refer Documentation/iostats.txt
|
||||
For more details refer Documentation/admin-guide/iostats.rst
|
||||
|
||||
|
||||
What: /sys/block/<disk>/<part>/stat
|
||||
|
@ -45,7 +45,7 @@ Description:
|
||||
- Values below -2 are rejected with -EINVAL
|
||||
|
||||
For more information, see
|
||||
Documentation/laptops/disk-shock-protection.txt
|
||||
Documentation/admin-guide/laptops/disk-shock-protection.rst
|
||||
|
||||
|
||||
What: /sys/block/*/device/ncq_prio_enable
|
||||
|
@ -376,10 +376,42 @@ Description:
|
||||
supply. Normally this is configured based on the type of
|
||||
connection made (e.g. A configured SDP should output a maximum
|
||||
of 500mA so the input current limit is set to the same value).
|
||||
Use preferably input_power_limit, and for problems that can be
|
||||
solved using power limit use input_current_limit.
|
||||
|
||||
Access: Read, Write
|
||||
Valid values: Represented in microamps
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/input_voltage_limit
|
||||
Date: May 2019
|
||||
Contact: linux-pm@vger.kernel.org
|
||||
Description:
|
||||
This entry configures the incoming VBUS voltage limit currently
|
||||
set in the supply. Normally this is configured based on
|
||||
system-level knowledge or user input (e.g. This is part of the
|
||||
Pixel C's thermal management strategy to effectively limit the
|
||||
input power to 5V when the screen is on to meet Google's skin
|
||||
temperature targets). Note that this feature should not be
|
||||
used for safety critical things.
|
||||
Use preferably input_power_limit, and for problems that can be
|
||||
solved using power limit use input_voltage_limit.
|
||||
|
||||
Access: Read, Write
|
||||
Valid values: Represented in microvolts
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/input_power_limit
|
||||
Date: May 2019
|
||||
Contact: linux-pm@vger.kernel.org
|
||||
Description:
|
||||
This entry configures the incoming power limit currently set
|
||||
in the supply. Normally this is configured based on
|
||||
system-level knowledge or user input. Use preferably this
|
||||
feature to limit the incoming power and use current/voltage
|
||||
limit only for problems that can be solved using power limit.
|
||||
|
||||
Access: Read, Write
|
||||
Valid values: Represented in microwatts
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/online,
|
||||
Date: May 2007
|
||||
Contact: linux-pm@vger.kernel.org
|
||||
|
30
Documentation/ABI/testing/sysfs-class-power-wilco
Normal file
30
Documentation/ABI/testing/sysfs-class-power-wilco
Normal file
@ -0,0 +1,30 @@
|
||||
What: /sys/class/power_supply/wilco-charger/charge_type
|
||||
Date: April 2019
|
||||
KernelVersion: 5.2
|
||||
Description:
|
||||
What charging algorithm to use:
|
||||
|
||||
Standard: Fully charges battery at a standard rate.
|
||||
Adaptive: Battery settings adaptively optimized based on
|
||||
typical battery usage pattern.
|
||||
Fast: Battery charges over a shorter period.
|
||||
Trickle: Extends battery lifespan, intended for users who
|
||||
primarily use their Chromebook while connected to AC.
|
||||
Custom: A low and high threshold percentage is specified.
|
||||
Charging begins when level drops below
|
||||
charge_control_start_threshold, and ceases when
|
||||
level is above charge_control_end_threshold.
|
||||
|
||||
What: /sys/class/power_supply/wilco-charger/charge_control_start_threshold
|
||||
Date: April 2019
|
||||
KernelVersion: 5.2
|
||||
Description:
|
||||
Used when charge_type="Custom", as described above. Measured in
|
||||
percentages. The valid range is [50, 95].
|
||||
|
||||
What: /sys/class/power_supply/wilco-charger/charge_control_end_threshold
|
||||
Date: April 2019
|
||||
KernelVersion: 5.2
|
||||
Description:
|
||||
Used when charge_type="Custom", as described above. Measured in
|
||||
percentages. The valid range is [55, 100].
|
@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
|
||||
Description:
|
||||
The powercap/ class sub directory belongs to the power cap
|
||||
subsystem. Refer to
|
||||
Documentation/power/powercap/powercap.txt for details.
|
||||
Documentation/power/powercap/powercap.rst for details.
|
||||
|
||||
What: /sys/class/powercap/<control type>
|
||||
Date: September 2013
|
||||
|
@ -1,6 +1,6 @@
|
||||
switchtec - Microsemi Switchtec PCI Switch Management Endpoint
|
||||
|
||||
For details on this subsystem look at Documentation/switchtec.txt.
|
||||
For details on this subsystem look at Documentation/driver-api/switchtec.rst.
|
||||
|
||||
What: /sys/class/switchtec
|
||||
Date: 05-Jan-2017
|
||||
|
@ -34,7 +34,7 @@ Description: CPU topology files that describe kernel limits related to
|
||||
present: cpus that have been identified as being present in
|
||||
the system.
|
||||
|
||||
See Documentation/cputopology.txt for more information.
|
||||
See Documentation/admin-guide/cputopology.rst for more information.
|
||||
|
||||
|
||||
What: /sys/devices/system/cpu/probe
|
||||
@ -103,7 +103,7 @@ Description: CPU topology files that describe a logical CPU's relationship
|
||||
thread_siblings_list: human-readable list of cpu#'s hardware
|
||||
threads within the same core as cpu#
|
||||
|
||||
See Documentation/cputopology.txt for more information.
|
||||
See Documentation/admin-guide/cputopology.rst for more information.
|
||||
|
||||
|
||||
What: /sys/devices/system/cpu/cpuidle/current_driver
|
||||
|
@ -31,7 +31,7 @@ Description:
|
||||
To control the LED display, use the following :
|
||||
echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
|
||||
where T control the 3 letters display, and DDD the 3 digits display.
|
||||
The DDD table can be found in Documentation/laptops/asus-laptop.txt
|
||||
The DDD table can be found in Documentation/admin-guide/laptops/asus-laptop.rst
|
||||
|
||||
What: /sys/devices/platform/asus_laptop/bluetooth
|
||||
Date: January 2007
|
||||
|
@ -36,3 +36,13 @@ KernelVersion: 3.5
|
||||
Contact: "AceLan Kao" <acelan.kao@canonical.com>
|
||||
Description:
|
||||
Resume on lid open. 1 means on, 0 means off.
|
||||
|
||||
What: /sys/devices/platform/<platform>/fan_boost_mode
|
||||
Date: Sep 2019
|
||||
KernelVersion: 5.3
|
||||
Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
|
||||
Description:
|
||||
Fan boost mode:
|
||||
* 0 - normal,
|
||||
* 1 - overboost,
|
||||
* 2 - silent
|
||||
|
@ -1,7 +1,7 @@
|
||||
What: /sys/devices/platform/<i2c-demux-name>/available_masters
|
||||
Date: January 2016
|
||||
KernelVersion: 4.6
|
||||
Contact: Wolfram Sang <wsa@the-dreams.de>
|
||||
Contact: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
||||
Description:
|
||||
Reading the file will give you a list of masters which can be
|
||||
selected for a demultiplexed bus. The format is
|
||||
@ -12,7 +12,7 @@ Description:
|
||||
What: /sys/devices/platform/<i2c-demux-name>/current_master
|
||||
Date: January 2016
|
||||
KernelVersion: 4.6
|
||||
Contact: Wolfram Sang <wsa@the-dreams.de>
|
||||
Contact: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
||||
Description:
|
||||
This file selects/shows the active I2C master for a demultiplexed
|
||||
bus. It uses the <index> value from the file 'available_masters'.
|
||||
|
@ -212,7 +212,7 @@ The standard 64-bit addressing device would do something like this::
|
||||
|
||||
If the device only supports 32-bit addressing for descriptors in the
|
||||
coherent allocations, but supports full 64-bits for streaming mappings
|
||||
it would look like this:
|
||||
it would look like this::
|
||||
|
||||
if (dma_set_mask(dev, DMA_BIT_MASK(64))) {
|
||||
dev_warn(dev, "mydev: No suitable DMA available\n");
|
||||
|
@ -1,4 +1,8 @@
|
||||
ACPI considerations for PCI host bridges
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================================
|
||||
ACPI considerations for PCI host bridges
|
||||
========================================
|
||||
|
||||
The general rule is that the ACPI namespace should describe everything the
|
||||
OS might use unless there's another way for the OS to find it [1, 2].
|
||||
@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge
|
||||
|
||||
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
|
||||
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
|
||||
General Flags: Bit [0] Ignored
|
||||
General Flags: Bit [0] Ignored
|
||||
|
||||
Extended Address Space Descriptor (.4)
|
||||
General Flags: Bit [0] Consumer/Producer:
|
||||
1–This device consumes this resource
|
||||
0–This device produces and consumes this resource
|
||||
General Flags: Bit [0] Consumer/Producer:
|
||||
|
||||
* 1 – This device consumes this resource
|
||||
* 0 – This device produces and consumes this resource
|
||||
|
||||
[5] ACPI 6.2, sec 19.6.43:
|
||||
ResourceUsage specifies whether the Memory range is consumed by
|
13
Documentation/PCI/endpoint/index.rst
Normal file
13
Documentation/PCI/endpoint/index.rst
Normal file
@ -0,0 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================
|
||||
PCI Endpoint Framework
|
||||
======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
pci-endpoint
|
||||
pci-endpoint-cfs
|
||||
pci-test-function
|
||||
pci-test-howto
|
@ -1,41 +1,51 @@
|
||||
CONFIGURING PCI ENDPOINT USING CONFIGFS
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================================
|
||||
Configuring PCI Endpoint Using CONFIGFS
|
||||
=======================================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
|
||||
PCI endpoint function and to bind the endpoint function
|
||||
with the endpoint controller. (For introducing other mechanisms to
|
||||
configure the PCI Endpoint Function refer to [1]).
|
||||
|
||||
*) Mounting configfs
|
||||
Mounting configfs
|
||||
=================
|
||||
|
||||
The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
|
||||
directory. configfs can be mounted using the following command.
|
||||
directory. configfs can be mounted using the following command::
|
||||
|
||||
mount -t configfs none /sys/kernel/config
|
||||
|
||||
*) Directory Structure
|
||||
Directory Structure
|
||||
===================
|
||||
|
||||
The pci_ep configfs has two directories at its root: controllers and
|
||||
functions. Every EPC device present in the system will have an entry in
|
||||
the *controllers* directory and and every EPF driver present in the system
|
||||
will have an entry in the *functions* directory.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/
|
||||
.. controllers/
|
||||
.. functions/
|
||||
/sys/kernel/config/pci_ep/
|
||||
.. controllers/
|
||||
.. functions/
|
||||
|
||||
*) Creating EPF Device
|
||||
Creating EPF Device
|
||||
===================
|
||||
|
||||
Every registered EPF driver will be listed in controllers directory. The
|
||||
entries corresponding to EPF driver will be created by the EPF core.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/functions/
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... <EPF Device 21>/
|
||||
.. <EPF Driver2>/
|
||||
... <EPF Device 12>/
|
||||
... <EPF Device 22>/
|
||||
/sys/kernel/config/pci_ep/functions/
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... <EPF Device 21>/
|
||||
.. <EPF Driver2>/
|
||||
... <EPF Device 12>/
|
||||
... <EPF Device 22>/
|
||||
|
||||
In order to create a <EPF device> of the type probed by <EPF Driver>, the
|
||||
user has to create a directory inside <EPF DriverN>.
|
||||
@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
|
||||
used to configure the standard configuration header of the endpoint function.
|
||||
(These entries are created by the framework when any new <EPF Device> is
|
||||
created)
|
||||
::
|
||||
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... vendorid
|
||||
... deviceid
|
||||
... revid
|
||||
... progif_code
|
||||
... subclass_code
|
||||
... baseclass_code
|
||||
... cache_line_size
|
||||
... subsys_vendor_id
|
||||
... subsys_id
|
||||
... interrupt_pin
|
||||
.. <EPF Driver1>/
|
||||
... <EPF Device 11>/
|
||||
... vendorid
|
||||
... deviceid
|
||||
... revid
|
||||
... progif_code
|
||||
... subclass_code
|
||||
... baseclass_code
|
||||
... cache_line_size
|
||||
... subsys_vendor_id
|
||||
... subsys_id
|
||||
... interrupt_pin
|
||||
|
||||
*) EPC Device
|
||||
EPC Device
|
||||
==========
|
||||
|
||||
Every registered EPC device will be listed in controllers directory. The
|
||||
entries corresponding to EPC device will be created by the EPC core.
|
||||
::
|
||||
|
||||
/sys/kernel/config/pci_ep/controllers/
|
||||
.. <EPC Device1>/
|
||||
... <Symlink EPF Device11>/
|
||||
... <Symlink EPF Device12>/
|
||||
... start
|
||||
.. <EPC Device2>/
|
||||
... <Symlink EPF Device21>/
|
||||
... <Symlink EPF Device22>/
|
||||
... start
|
||||
/sys/kernel/config/pci_ep/controllers/
|
||||
.. <EPC Device1>/
|
||||
... <Symlink EPF Device11>/
|
||||
... <Symlink EPF Device12>/
|
||||
... start
|
||||
.. <EPC Device2>/
|
||||
... <Symlink EPF Device21>/
|
||||
... <Symlink EPF Device22>/
|
||||
... start
|
||||
|
||||
The <EPC Device> directory will have a list of symbolic links to
|
||||
<EPF Device>. These symbolic links should be created by the user to
|
||||
@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
|
||||
"1" is written to this field, the endpoint device will be ready to
|
||||
establish the link with the host. This is usually done after
|
||||
all the EPF devices are created and linked with the EPC device.
|
||||
|
||||
::
|
||||
|
||||
| controllers/
|
||||
| <Directory: EPC name>/
|
||||
@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
|
||||
| interrupt_pin
|
||||
| function
|
||||
|
||||
[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
|
||||
[1] :doc:`pci-endpoint`
|
@ -1,11 +1,13 @@
|
||||
PCI ENDPOINT FRAMEWORK
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
This document is a guide to use the PCI Endpoint Framework in order to create
|
||||
endpoint controller driver, endpoint function driver, and using configfs
|
||||
interface to bind the function driver to the controller driver.
|
||||
|
||||
1. Introduction
|
||||
Introduction
|
||||
============
|
||||
|
||||
Linux has a comprehensive PCI subsystem to support PCI controllers that
|
||||
operates in Root Complex mode. The subsystem has capability to scan PCI bus,
|
||||
@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an
|
||||
EP system which can have a wide variety of use cases from testing or
|
||||
validation, co-processor accelerator, etc.
|
||||
|
||||
2. PCI Endpoint Core
|
||||
PCI Endpoint Core
|
||||
=================
|
||||
|
||||
The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
|
||||
library, the Endpoint Function library, and the configfs layer to bind the
|
||||
endpoint function with the endpoint controller.
|
||||
|
||||
2.1 PCI Endpoint Controller(EPC) Library
|
||||
PCI Endpoint Controller(EPC) Library
|
||||
------------------------------------
|
||||
|
||||
The EPC library provides APIs to be used by the controller that can operate
|
||||
in endpoint mode. It also provides APIs to be used by function driver/library
|
||||
in order to implement a particular endpoint function.
|
||||
|
||||
2.1.1 APIs for the PCI controller Driver
|
||||
APIs for the PCI controller Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI controller driver.
|
||||
|
||||
*) devm_pci_epc_create()/pci_epc_create()
|
||||
* devm_pci_epc_create()/pci_epc_create()
|
||||
|
||||
The PCI controller driver should implement the following ops:
|
||||
|
||||
* write_header: ops to populate configuration space header
|
||||
* set_bar: ops to configure the BAR
|
||||
* clear_bar: ops to reset the BAR
|
||||
@ -51,110 +57,116 @@ by the PCI controller driver.
|
||||
The PCI controller driver can then create a new EPC device by invoking
|
||||
devm_pci_epc_create()/pci_epc_create().
|
||||
|
||||
*) devm_pci_epc_destroy()/pci_epc_destroy()
|
||||
* devm_pci_epc_destroy()/pci_epc_destroy()
|
||||
|
||||
The PCI controller driver can destroy the EPC device created by either
|
||||
devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
|
||||
pci_epc_destroy().
|
||||
|
||||
*) pci_epc_linkup()
|
||||
* pci_epc_linkup()
|
||||
|
||||
In order to notify all the function devices that the EPC device to which
|
||||
they are linked has established a link with the host, the PCI controller
|
||||
driver should invoke pci_epc_linkup().
|
||||
|
||||
*) pci_epc_mem_init()
|
||||
* pci_epc_mem_init()
|
||||
|
||||
Initialize the pci_epc_mem structure used for allocating EPC addr space.
|
||||
|
||||
*) pci_epc_mem_exit()
|
||||
* pci_epc_mem_exit()
|
||||
|
||||
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
|
||||
|
||||
2.1.2 APIs for the PCI Endpoint Function Driver
|
||||
|
||||
APIs for the PCI Endpoint Function Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint function driver.
|
||||
|
||||
*) pci_epc_write_header()
|
||||
* pci_epc_write_header()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_write_header() to
|
||||
write the standard configuration header to the endpoint controller.
|
||||
|
||||
*) pci_epc_set_bar()
|
||||
* pci_epc_set_bar()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_set_bar() to configure
|
||||
the Base Address Register in order for the host to assign PCI addr space.
|
||||
Register space of the function driver is usually configured
|
||||
using this API.
|
||||
|
||||
*) pci_epc_clear_bar()
|
||||
* pci_epc_clear_bar()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_clear_bar() to reset
|
||||
the BAR.
|
||||
|
||||
*) pci_epc_raise_irq()
|
||||
* pci_epc_raise_irq()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_raise_irq() to raise
|
||||
Legacy Interrupt, MSI or MSI-X Interrupt.
|
||||
|
||||
*) pci_epc_mem_alloc_addr()
|
||||
* pci_epc_mem_alloc_addr()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
|
||||
allocate memory address from EPC addr space which is required to access
|
||||
RC's buffer
|
||||
|
||||
*) pci_epc_mem_free_addr()
|
||||
* pci_epc_mem_free_addr()
|
||||
|
||||
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
|
||||
free the memory space allocated using pci_epc_mem_alloc_addr().
|
||||
|
||||
2.1.3 Other APIs
|
||||
Other APIs
|
||||
~~~~~~~~~~
|
||||
|
||||
There are other APIs provided by the EPC library. These are used for binding
|
||||
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
|
||||
using these APIs.
|
||||
|
||||
*) pci_epc_get()
|
||||
* pci_epc_get()
|
||||
|
||||
Get a reference to the PCI endpoint controller based on the device name of
|
||||
the controller.
|
||||
|
||||
*) pci_epc_put()
|
||||
* pci_epc_put()
|
||||
|
||||
Release the reference to the PCI endpoint controller obtained using
|
||||
pci_epc_get()
|
||||
|
||||
*) pci_epc_add_epf()
|
||||
* pci_epc_add_epf()
|
||||
|
||||
Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
|
||||
can have up to 8 functions according to the specification.
|
||||
|
||||
*) pci_epc_remove_epf()
|
||||
* pci_epc_remove_epf()
|
||||
|
||||
Remove the PCI endpoint function from PCI endpoint controller.
|
||||
|
||||
*) pci_epc_start()
|
||||
* pci_epc_start()
|
||||
|
||||
The PCI endpoint function driver should invoke pci_epc_start() once it
|
||||
has configured the endpoint function and wants to start the PCI link.
|
||||
|
||||
*) pci_epc_stop()
|
||||
* pci_epc_stop()
|
||||
|
||||
The PCI endpoint function driver should invoke pci_epc_stop() to stop
|
||||
the PCI LINK.
|
||||
|
||||
2.2 PCI Endpoint Function(EPF) Library
|
||||
|
||||
PCI Endpoint Function(EPF) Library
|
||||
----------------------------------
|
||||
|
||||
The EPF library provides APIs to be used by the function driver and the EPC
|
||||
library to provide endpoint mode functionality.
|
||||
|
||||
2.2.1 APIs for the PCI Endpoint Function Driver
|
||||
APIs for the PCI Endpoint Function Driver
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint function driver.
|
||||
|
||||
*) pci_epf_register_driver()
|
||||
* pci_epf_register_driver()
|
||||
|
||||
The PCI Endpoint Function driver should implement the following ops:
|
||||
* bind: ops to perform when a EPC device has been bound to EPF device
|
||||
@ -166,50 +178,54 @@ by the PCI endpoint function driver.
|
||||
The PCI Function driver can then register the PCI EPF driver by using
|
||||
pci_epf_register_driver().
|
||||
|
||||
*) pci_epf_unregister_driver()
|
||||
* pci_epf_unregister_driver()
|
||||
|
||||
The PCI Function driver can unregister the PCI EPF driver by using
|
||||
pci_epf_unregister_driver().
|
||||
|
||||
*) pci_epf_alloc_space()
|
||||
* pci_epf_alloc_space()
|
||||
|
||||
The PCI Function driver can allocate space for a particular BAR using
|
||||
pci_epf_alloc_space().
|
||||
|
||||
*) pci_epf_free_space()
|
||||
* pci_epf_free_space()
|
||||
|
||||
The PCI Function driver can free the allocated space
|
||||
(using pci_epf_alloc_space) by invoking pci_epf_free_space().
|
||||
|
||||
2.2.2 APIs for the PCI Endpoint Controller Library
|
||||
APIs for the PCI Endpoint Controller Library
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists the APIs that the PCI Endpoint core provides to be used
|
||||
by the PCI endpoint controller library.
|
||||
|
||||
*) pci_epf_linkup()
|
||||
* pci_epf_linkup()
|
||||
|
||||
The PCI endpoint controller library invokes pci_epf_linkup() when the
|
||||
EPC device has established the connection to the host.
|
||||
|
||||
2.2.2 Other APIs
|
||||
Other APIs
|
||||
~~~~~~~~~~
|
||||
|
||||
There are other APIs provided by the EPF library. These are used to notify
|
||||
the function driver when the EPF device is bound to the EPC device.
|
||||
pci-ep-cfs.c can be used as reference for using these APIs.
|
||||
|
||||
*) pci_epf_create()
|
||||
* pci_epf_create()
|
||||
|
||||
Create a new PCI EPF device by passing the name of the PCI EPF device.
|
||||
This name will be used to bind the the EPF device to a EPF driver.
|
||||
|
||||
*) pci_epf_destroy()
|
||||
* pci_epf_destroy()
|
||||
|
||||
Destroy the created PCI EPF device.
|
||||
|
||||
*) pci_epf_bind()
|
||||
* pci_epf_bind()
|
||||
|
||||
pci_epf_bind() should be invoked when the EPF device has been bound to
|
||||
a EPC device.
|
||||
|
||||
*) pci_epf_unbind()
|
||||
* pci_epf_unbind()
|
||||
|
||||
pci_epf_unbind() should be invoked when the binding between EPC device
|
||||
and EPF device is lost.
|
@ -1,5 +1,10 @@
|
||||
PCI TEST
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=================
|
||||
PCI Test Function
|
||||
=================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
Traditionally PCI RC has always been validated by using standard
|
||||
PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
|
||||
@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers:
|
||||
8) PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
9) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
|
||||
*) PCI_ENDPOINT_TEST_MAGIC
|
||||
* PCI_ENDPOINT_TEST_MAGIC
|
||||
|
||||
This register will be used to test BAR0. A known pattern will be written
|
||||
and read back from MAGIC register to verify BAR0.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_COMMAND:
|
||||
* PCI_ENDPOINT_TEST_COMMAND
|
||||
|
||||
This register will be used by the host driver to indicate the function
|
||||
that the endpoint device must perform.
|
||||
|
||||
Bitfield Description:
|
||||
Bit 0 : raise legacy IRQ
|
||||
Bit 1 : raise MSI IRQ
|
||||
Bit 2 : raise MSI-X IRQ
|
||||
Bit 3 : read command (read data from RC buffer)
|
||||
Bit 4 : write command (write data to RC buffer)
|
||||
Bit 5 : copy command (copy data from one RC buffer to another
|
||||
RC buffer)
|
||||
======== ================================================================
|
||||
Bitfield Description
|
||||
======== ================================================================
|
||||
Bit 0 raise legacy IRQ
|
||||
Bit 1 raise MSI IRQ
|
||||
Bit 2 raise MSI-X IRQ
|
||||
Bit 3 read command (read data from RC buffer)
|
||||
Bit 4 write command (write data to RC buffer)
|
||||
Bit 5 copy command (copy data from one RC buffer to another RC buffer)
|
||||
======== ================================================================
|
||||
|
||||
*) PCI_ENDPOINT_TEST_STATUS
|
||||
* PCI_ENDPOINT_TEST_STATUS
|
||||
|
||||
This register reflects the status of the PCI endpoint device.
|
||||
|
||||
Bitfield Description:
|
||||
Bit 0 : read success
|
||||
Bit 1 : read fail
|
||||
Bit 2 : write success
|
||||
Bit 3 : write fail
|
||||
Bit 4 : copy success
|
||||
Bit 5 : copy fail
|
||||
Bit 6 : IRQ raised
|
||||
Bit 7 : source address is invalid
|
||||
Bit 8 : destination address is invalid
|
||||
======== ==============================
|
||||
Bitfield Description
|
||||
======== ==============================
|
||||
Bit 0 read success
|
||||
Bit 1 read fail
|
||||
Bit 2 write success
|
||||
Bit 3 write fail
|
||||
Bit 4 copy success
|
||||
Bit 5 copy fail
|
||||
Bit 6 IRQ raised
|
||||
Bit 7 source address is invalid
|
||||
Bit 8 destination address is invalid
|
||||
======== ==============================
|
||||
|
||||
*) PCI_ENDPOINT_TEST_SRC_ADDR
|
||||
* PCI_ENDPOINT_TEST_SRC_ADDR
|
||||
|
||||
This register contains the source address (RC buffer address) for the
|
||||
COPY/READ command.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_DST_ADDR
|
||||
* PCI_ENDPOINT_TEST_DST_ADDR
|
||||
|
||||
This register contains the destination address (RC buffer address) for
|
||||
the COPY/WRITE command.
|
||||
|
||||
*) PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
* PCI_ENDPOINT_TEST_IRQ_TYPE
|
||||
|
||||
This register contains the interrupt type (Legacy/MSI) triggered
|
||||
for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
|
||||
|
||||
Possible types:
|
||||
- Legacy : 0
|
||||
- MSI : 1
|
||||
- MSI-X : 2
|
||||
|
||||
*) PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
====== ==
|
||||
Legacy 0
|
||||
MSI 1
|
||||
MSI-X 2
|
||||
====== ==
|
||||
|
||||
* PCI_ENDPOINT_TEST_IRQ_NUMBER
|
||||
|
||||
This register contains the triggered ID interrupt.
|
||||
|
||||
Admissible values:
|
||||
- Legacy : 0
|
||||
- MSI : [1 .. 32]
|
||||
- MSI-X : [1 .. 2048]
|
||||
|
||||
====== ===========
|
||||
Legacy 0
|
||||
MSI [1 .. 32]
|
||||
MSI-X [1 .. 2048]
|
||||
====== ===========
|
@ -1,38 +1,51 @@
|
||||
PCI TEST USERGUIDE
|
||||
Kishon Vijay Abraham I <kishon@ti.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================
|
||||
PCI Test User Guide
|
||||
===================
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
This document is a guide to help users use pci-epf-test function driver
|
||||
and pci_endpoint_test host driver for testing PCI. The list of steps to
|
||||
be followed in the host side and EP side is given below.
|
||||
|
||||
1. Endpoint Device
|
||||
Endpoint Device
|
||||
===============
|
||||
|
||||
1.1 Endpoint Controller Devices
|
||||
Endpoint Controller Devices
|
||||
---------------------------
|
||||
|
||||
To find the list of endpoint controller devices in the system:
|
||||
To find the list of endpoint controller devices in the system::
|
||||
|
||||
# ls /sys/class/pci_epc/
|
||||
51000000.pcie_ep
|
||||
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||
|
||||
# ls /sys/kernel/config/pci_ep/controllers
|
||||
51000000.pcie_ep
|
||||
|
||||
1.2 Endpoint Function Drivers
|
||||
|
||||
To find the list of endpoint function drivers in the system:
|
||||
Endpoint Function Drivers
|
||||
-------------------------
|
||||
|
||||
To find the list of endpoint function drivers in the system::
|
||||
|
||||
# ls /sys/bus/pci-epf/drivers
|
||||
pci_epf_test
|
||||
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled
|
||||
If PCI_ENDPOINT_CONFIGFS is enabled::
|
||||
|
||||
# ls /sys/kernel/config/pci_ep/functions
|
||||
pci_epf_test
|
||||
|
||||
1.3 Creating pci-epf-test Device
|
||||
|
||||
Creating pci-epf-test Device
|
||||
----------------------------
|
||||
|
||||
PCI endpoint function device can be created using the configfs. To create
|
||||
pci-epf-test device, the following commands can be used
|
||||
pci-epf-test device, the following commands can be used::
|
||||
|
||||
# mount -t configfs none /sys/kernel/config
|
||||
# cd /sys/kernel/config/pci_ep/
|
||||
@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
|
||||
be probed by pci_epf_test driver.
|
||||
|
||||
The PCI endpoint framework populates the directory with the following
|
||||
configurable fields.
|
||||
configurable fields::
|
||||
|
||||
# ls functions/pci_epf_test/func1
|
||||
baseclass_code interrupt_pin progif_code subsys_id
|
||||
@ -51,67 +64,83 @@ configurable fields.
|
||||
|
||||
The PCI endpoint function driver populates these entries with default values
|
||||
when the device is bound to the driver. The pci-epf-test driver populates
|
||||
vendorid with 0xffff and interrupt_pin with 0x0001
|
||||
vendorid with 0xffff and interrupt_pin with 0x0001::
|
||||
|
||||
# cat functions/pci_epf_test/func1/vendorid
|
||||
0xffff
|
||||
# cat functions/pci_epf_test/func1/interrupt_pin
|
||||
0x0001
|
||||
|
||||
1.4 Configuring pci-epf-test Device
|
||||
|
||||
Configuring pci-epf-test Device
|
||||
-------------------------------
|
||||
|
||||
The user can configure the pci-epf-test device using configfs entry. In order
|
||||
to change the vendorid and the number of MSI interrupts used by the function
|
||||
device, the following commands can be used.
|
||||
device, the following commands can be used::
|
||||
|
||||
# echo 0x104c > functions/pci_epf_test/func1/vendorid
|
||||
# echo 0xb500 > functions/pci_epf_test/func1/deviceid
|
||||
# echo 16 > functions/pci_epf_test/func1/msi_interrupts
|
||||
# echo 8 > functions/pci_epf_test/func1/msix_interrupts
|
||||
|
||||
1.5 Binding pci-epf-test Device to EP Controller
|
||||
|
||||
Binding pci-epf-test Device to EP Controller
|
||||
--------------------------------------------
|
||||
|
||||
In order for the endpoint function device to be useful, it has to be bound to
|
||||
a PCI endpoint controller driver. Use the configfs to bind the function
|
||||
device to one of the controller driver present in the system.
|
||||
device to one of the controller driver present in the system::
|
||||
|
||||
# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
|
||||
|
||||
Once the above step is completed, the PCI endpoint is ready to establish a link
|
||||
with the host.
|
||||
|
||||
1.6 Start the Link
|
||||
|
||||
Start the Link
|
||||
--------------
|
||||
|
||||
In order for the endpoint device to establish a link with the host, the _start_
|
||||
field should be populated with '1'.
|
||||
field should be populated with '1'::
|
||||
|
||||
# echo 1 > controllers/51000000.pcie_ep/start
|
||||
|
||||
2. RootComplex Device
|
||||
|
||||
2.1 lspci Output
|
||||
RootComplex Device
|
||||
==================
|
||||
|
||||
Note that the devices listed here correspond to the value populated in 1.4 above
|
||||
lspci Output
|
||||
------------
|
||||
|
||||
Note that the devices listed here correspond to the value populated in 1.4
|
||||
above::
|
||||
|
||||
00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
|
||||
01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
|
||||
|
||||
2.2 Using Endpoint Test function Device
|
||||
|
||||
Using Endpoint Test function Device
|
||||
-----------------------------------
|
||||
|
||||
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
|
||||
tests. To compile this tool the following commands should be used:
|
||||
tests. To compile this tool the following commands should be used::
|
||||
|
||||
# cd <kernel-dir>
|
||||
# make -C tools/pci
|
||||
|
||||
or if you desire to compile and install in your system:
|
||||
or if you desire to compile and install in your system::
|
||||
|
||||
# cd <kernel-dir>
|
||||
# make -C tools/pci install
|
||||
|
||||
The tool and script will be located in <rootfs>/usr/bin/
|
||||
|
||||
2.2.1 pcitest.sh Output
|
||||
|
||||
pcitest.sh Output
|
||||
~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
# pcitest.sh
|
||||
BAR tests
|
||||
|
18
Documentation/PCI/index.rst
Normal file
18
Documentation/PCI/index.rst
Normal file
@ -0,0 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Linux PCI Bus Subsystem
|
||||
=======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:numbered:
|
||||
|
||||
pci
|
||||
picebus-howto
|
||||
pci-iov-howto
|
||||
msi-howto
|
||||
acpi-info
|
||||
pci-error-recovery
|
||||
pcieaer-howto
|
||||
endpoint/index
|
@ -1,13 +1,16 @@
|
||||
The MSI Driver Guide HOWTO
|
||||
Tom L Nguyen tom.l.nguyen@intel.com
|
||||
10/03/2003
|
||||
Revised Feb 12, 2004 by Martine Silbermann
|
||||
email: Martine.Silbermann@hp.com
|
||||
Revised Jun 25, 2004 by Tom L Nguyen
|
||||
Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
|
||||
Copyright 2003, 2008 Intel Corporation
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
1. About this guide
|
||||
==========================
|
||||
The MSI Driver Guide HOWTO
|
||||
==========================
|
||||
|
||||
:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
|
||||
|
||||
:Copyright: 2003, 2008 Intel Corporation
|
||||
|
||||
About this guide
|
||||
================
|
||||
|
||||
This guide describes the basics of Message Signaled Interrupts (MSIs),
|
||||
the advantages of using MSI over traditional interrupt mechanisms, how
|
||||
@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
|
||||
try if a device doesn't support MSIs.
|
||||
|
||||
|
||||
2. What are MSIs?
|
||||
What are MSIs?
|
||||
==============
|
||||
|
||||
A Message Signaled Interrupt is a write from the device to a special
|
||||
address which causes an interrupt to be received by the CPU.
|
||||
@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
|
||||
a time.
|
||||
|
||||
|
||||
3. Why use MSIs?
|
||||
Why use MSIs?
|
||||
=============
|
||||
|
||||
There are three reasons why using MSIs can give an advantage over
|
||||
traditional pin-based interrupts.
|
||||
@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue
|
||||
in a network card or each port in a storage controller.
|
||||
|
||||
|
||||
4. How to use MSIs
|
||||
How to use MSIs
|
||||
===============
|
||||
|
||||
PCI devices are initialised to use pin-based interrupts. The device
|
||||
driver has to set up the device to use MSI or MSI-X. Not all machines
|
||||
support MSIs correctly, and for those machines, the APIs described below
|
||||
will simply fail and the device will continue to use pin-based interrupts.
|
||||
|
||||
4.1 Include kernel support for MSIs
|
||||
Include kernel support for MSIs
|
||||
-------------------------------
|
||||
|
||||
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
|
||||
option enabled. This option is only available on some architectures,
|
||||
@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example,
|
||||
on x86, you must also enable X86_UP_APIC or SMP in order to see the
|
||||
CONFIG_PCI_MSI option.
|
||||
|
||||
4.2 Using MSI
|
||||
Using MSI
|
||||
---------
|
||||
|
||||
Most of the hard work is done for the driver in the PCI layer. The driver
|
||||
simply has to request that the PCI layer set up the MSI capability for this
|
||||
device.
|
||||
|
||||
To automatically use MSI or MSI-X interrupt vectors, use the following
|
||||
function:
|
||||
function::
|
||||
|
||||
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
|
||||
unsigned int max_vecs, unsigned int flags);
|
||||
@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
|
||||
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
|
||||
|
||||
To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
|
||||
vectors, use the following function:
|
||||
vectors, use the following function::
|
||||
|
||||
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
|
||||
|
||||
Any allocated resources should be freed before removing the device using
|
||||
the following function:
|
||||
the following function::
|
||||
|
||||
void pci_free_irq_vectors(struct pci_dev *dev);
|
||||
|
||||
@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
|
||||
as possible, likely up to the limit supported by the device. If nvec is
|
||||
larger than the number supported by the device it will automatically be
|
||||
capped to the supported limit, so there is no need to query the number of
|
||||
vectors supported beforehand:
|
||||
vectors supported beforehand::
|
||||
|
||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES)
|
||||
if (nvec < 0)
|
||||
@ -135,7 +143,7 @@ vectors supported beforehand:
|
||||
If a driver is unable or unwilling to deal with a variable number of MSI
|
||||
interrupts it can request a particular number of interrupts by passing that
|
||||
number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||
'max_vecs' parameters:
|
||||
'max_vecs' parameters::
|
||||
|
||||
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
|
||||
if (ret < 0)
|
||||
@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
|
||||
|
||||
The most notorious example of the request type described above is enabling
|
||||
the single MSI mode for a device. It could be done by passing two 1s as
|
||||
'min_vecs' and 'max_vecs':
|
||||
'min_vecs' and 'max_vecs'::
|
||||
|
||||
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
|
||||
if (ret < 0)
|
||||
goto out_err;
|
||||
|
||||
Some devices might not support using legacy line interrupts, in which case
|
||||
the driver can specify that only MSI or MSI-X is acceptable:
|
||||
the driver can specify that only MSI or MSI-X is acceptable::
|
||||
|
||||
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
|
||||
if (nvec < 0)
|
||||
goto out_err;
|
||||
|
||||
4.3 Legacy APIs
|
||||
Legacy APIs
|
||||
-----------
|
||||
|
||||
The following old APIs to enable and disable MSI or MSI-X interrupts should
|
||||
not be used in new code:
|
||||
not be used in new code::
|
||||
|
||||
pci_enable_msi() /* deprecated */
|
||||
pci_disable_msi() /* deprecated */
|
||||
@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count
|
||||
of vectors we might have to revisit that decision and add a
|
||||
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
|
||||
|
||||
4.4 Considerations when using MSIs
|
||||
Considerations when using MSIs
|
||||
------------------------------
|
||||
|
||||
4.4.1 Spinlocks
|
||||
Spinlocks
|
||||
~~~~~~~~~
|
||||
|
||||
Most device drivers have a per-device spinlock which is taken in the
|
||||
interrupt handler. With pin-based interrupts or a single MSI, it is not
|
||||
@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using
|
||||
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
|
||||
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
|
||||
|
||||
4.5 How to tell whether MSI/MSI-X is enabled on a device
|
||||
How to tell whether MSI/MSI-X is enabled on a device
|
||||
----------------------------------------------------
|
||||
|
||||
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
|
||||
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
|
||||
@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
|
||||
or "-" (disabled).
|
||||
|
||||
|
||||
5. MSI quirks
|
||||
MSI quirks
|
||||
==========
|
||||
|
||||
Several PCI chipsets or devices are known not to support MSIs.
|
||||
The PCI stack provides three ways to disable MSIs:
|
||||
@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs:
|
||||
2. on all devices behind a specific bridge
|
||||
3. on a single device
|
||||
|
||||
5.1. Disabling MSIs globally
|
||||
Disabling MSIs globally
|
||||
-----------------------
|
||||
|
||||
Some host chipsets simply don't support MSIs properly. If we're
|
||||
lucky, the manufacturer knows this and has indicated it in the ACPI
|
||||
@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be
|
||||
in your best interests to report the problem to linux-pci@vger.kernel.org
|
||||
including a full 'lspci -v' so we can add the quirks to the kernel.
|
||||
|
||||
5.2. Disabling MSIs below a bridge
|
||||
Disabling MSIs below a bridge
|
||||
-----------------------------
|
||||
|
||||
Some PCI bridges are not able to route MSIs between busses properly.
|
||||
In this case, MSIs must be disabled on all devices behind the bridge.
|
||||
@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). As with host chipsets,
|
||||
Linux mostly knows about them and automatically enables MSIs if it can.
|
||||
If you have a bridge unknown to Linux, you can enable
|
||||
MSIs in configuration space using whatever method you know works, then
|
||||
enable MSIs on that bridge by doing:
|
||||
enable MSIs on that bridge by doing::
|
||||
|
||||
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
|
||||
|
||||
@ -244,7 +259,8 @@ below this bridge.
|
||||
Again, please notify linux-pci@vger.kernel.org of any bridges that need
|
||||
special handling.
|
||||
|
||||
5.3. Disabling MSIs on a single device
|
||||
Disabling MSIs on a single device
|
||||
---------------------------------
|
||||
|
||||
Some devices are known to have faulty MSI implementations. Usually this
|
||||
is handled in the individual device driver, but occasionally it's necessary
|
||||
@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use
|
||||
of MSI. While this is a convenient workaround for the driver author,
|
||||
it is not good practice, and should not be emulated.
|
||||
|
||||
5.4. Finding why MSIs are disabled on a device
|
||||
Finding why MSIs are disabled on a device
|
||||
-----------------------------------------
|
||||
|
||||
From the above three sections, you can see that there are many reasons
|
||||
why MSIs may not be enabled for a given device. Your first step should
|
||||
@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
|
||||
for your machine. You should also check your .config to be sure you
|
||||
have enabled CONFIG_PCI_MSI.
|
||||
|
||||
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||
/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
|
||||
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
|
||||
or disabled (0). If 0 is found in any of the msi_bus files belonging
|
||||
to bridges between the PCI root and the device, MSIs are disabled.
|
||||
|
@ -1,12 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
PCI Error Recovery
|
||||
------------------
|
||||
February 2, 2006
|
||||
==================
|
||||
PCI Error Recovery
|
||||
==================
|
||||
|
||||
Current document maintainer:
|
||||
Linas Vepstas <linasvepstas@gmail.com>
|
||||
updated by Richard Lary <rlary@us.ibm.com>
|
||||
and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
|
||||
|
||||
:Authors: - Linas Vepstas <linasvepstas@gmail.com>
|
||||
- Richard Lary <rlary@us.ibm.com>
|
||||
- Mike Mason <mmlnx@us.ibm.com>
|
||||
|
||||
|
||||
Many PCI bus controllers are able to detect a variety of hardware
|
||||
@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
|
||||
|
||||
|
||||
Detailed Design
|
||||
---------------
|
||||
===============
|
||||
|
||||
Design and implementation details below, based on a chain of
|
||||
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
|
||||
|
||||
@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
|
||||
and the actual recovery steps taken are platform dependent. The
|
||||
arch/powerpc implementation will simulate a PCI hotplug remove/add.
|
||||
|
||||
This structure has the form:
|
||||
struct pci_error_handlers
|
||||
{
|
||||
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
||||
int (*mmio_enabled)(struct pci_dev *dev);
|
||||
int (*slot_reset)(struct pci_dev *dev);
|
||||
void (*resume)(struct pci_dev *dev);
|
||||
};
|
||||
This structure has the form::
|
||||
|
||||
The possible channel states are:
|
||||
enum pci_channel_state {
|
||||
pci_channel_io_normal, /* I/O channel is in normal state */
|
||||
pci_channel_io_frozen, /* I/O to channel is blocked */
|
||||
pci_channel_io_perm_failure, /* PCI card is dead */
|
||||
};
|
||||
struct pci_error_handlers
|
||||
{
|
||||
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
|
||||
int (*mmio_enabled)(struct pci_dev *dev);
|
||||
int (*slot_reset)(struct pci_dev *dev);
|
||||
void (*resume)(struct pci_dev *dev);
|
||||
};
|
||||
|
||||
Possible return values are:
|
||||
enum pci_ers_result {
|
||||
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
||||
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
||||
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
||||
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
||||
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
||||
};
|
||||
The possible channel states are::
|
||||
|
||||
enum pci_channel_state {
|
||||
pci_channel_io_normal, /* I/O channel is in normal state */
|
||||
pci_channel_io_frozen, /* I/O to channel is blocked */
|
||||
pci_channel_io_perm_failure, /* PCI card is dead */
|
||||
};
|
||||
|
||||
Possible return values are::
|
||||
|
||||
enum pci_ers_result {
|
||||
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
|
||||
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
|
||||
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
|
||||
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
|
||||
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
|
||||
};
|
||||
|
||||
A driver does not have to implement all of these callbacks; however,
|
||||
if it implements any, it must implement error_detected(). If a callback
|
||||
@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
|
||||
|
||||
All drivers participating in this system must implement this call.
|
||||
The driver must return one of the following result codes:
|
||||
- PCI_ERS_RESULT_CAN_RECOVER:
|
||||
Driver returns this if it thinks it might be able to recover
|
||||
the HW by just banging IOs or if it wants to be given
|
||||
a chance to extract some diagnostic information (see
|
||||
mmio_enable, below).
|
||||
- PCI_ERS_RESULT_NEED_RESET:
|
||||
Driver returns this if it can't recover without a
|
||||
slot reset.
|
||||
- PCI_ERS_RESULT_DISCONNECT:
|
||||
Driver returns this if it doesn't want to recover at all.
|
||||
|
||||
- PCI_ERS_RESULT_CAN_RECOVER
|
||||
Driver returns this if it thinks it might be able to recover
|
||||
the HW by just banging IOs or if it wants to be given
|
||||
a chance to extract some diagnostic information (see
|
||||
mmio_enable, below).
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it can't recover without a
|
||||
slot reset.
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Driver returns this if it doesn't want to recover at all.
|
||||
|
||||
The next step taken will depend on the result codes returned by the
|
||||
drivers.
|
||||
@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
|
||||
If the platform is unable to recover the slot, the next step
|
||||
is STEP 6 (Permanent Failure).
|
||||
|
||||
>>> The current powerpc implementation assumes that a device driver will
|
||||
>>> *not* schedule or semaphore in this routine; the current powerpc
|
||||
>>> implementation uses one kernel thread to notify all devices;
|
||||
>>> thus, if one device sleeps/schedules, all devices are affected.
|
||||
>>> Doing better requires complex multi-threaded logic in the error
|
||||
>>> recovery implementation (e.g. waiting for all notification threads
|
||||
>>> to "join" before proceeding with recovery.) This seems excessively
|
||||
>>> complex and not worth implementing.
|
||||
.. note::
|
||||
|
||||
>>> The current powerpc implementation doesn't much care if the device
|
||||
>>> attempts I/O at this point, or not. I/O's will fail, returning
|
||||
>>> a value of 0xff on read, and writes will be dropped. If more than
|
||||
>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
||||
>>> assumes that the device driver has gone into an infinite loop
|
||||
>>> and prints an error to syslog. A reboot is then required to
|
||||
>>> get the device working again.
|
||||
The current powerpc implementation assumes that a device driver will
|
||||
*not* schedule or semaphore in this routine; the current powerpc
|
||||
implementation uses one kernel thread to notify all devices;
|
||||
thus, if one device sleeps/schedules, all devices are affected.
|
||||
Doing better requires complex multi-threaded logic in the error
|
||||
recovery implementation (e.g. waiting for all notification threads
|
||||
to "join" before proceeding with recovery.) This seems excessively
|
||||
complex and not worth implementing.
|
||||
|
||||
The current powerpc implementation doesn't much care if the device
|
||||
attempts I/O at this point, or not. I/O's will fail, returning
|
||||
a value of 0xff on read, and writes will be dropped. If more than
|
||||
EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
|
||||
assumes that the device driver has gone into an infinite loop
|
||||
and prints an error to syslog. A reboot is then required to
|
||||
get the device working again.
|
||||
|
||||
STEP 2: MMIO Enabled
|
||||
-------------------
|
||||
--------------------
|
||||
The platform re-enables MMIO to the device (but typically not the
|
||||
DMA), and then calls the mmio_enabled() callback on all affected
|
||||
device drivers.
|
||||
@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
|
||||
without a slot reset or a link reset, it will not call this callback, and
|
||||
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
|
||||
|
||||
>>> The following is proposed; no platform implements this yet:
|
||||
>>> Proposal: All I/O's should be done _synchronously_ from within
|
||||
>>> this callback, errors triggered by them will be returned via
|
||||
>>> the normal pci_check_whatever() API, no new error_detected()
|
||||
>>> callback will be issued due to an error happening here. However,
|
||||
>>> such an error might cause IOs to be re-blocked for the whole
|
||||
>>> segment, and thus invalidate the recovery that other devices
|
||||
>>> on the same segment might have done, forcing the whole segment
|
||||
>>> into one of the next states, that is, link reset or slot reset.
|
||||
.. note::
|
||||
|
||||
The following is proposed; no platform implements this yet:
|
||||
Proposal: All I/O's should be done _synchronously_ from within
|
||||
this callback, errors triggered by them will be returned via
|
||||
the normal pci_check_whatever() API, no new error_detected()
|
||||
callback will be issued due to an error happening here. However,
|
||||
such an error might cause IOs to be re-blocked for the whole
|
||||
segment, and thus invalidate the recovery that other devices
|
||||
on the same segment might have done, forcing the whole segment
|
||||
into one of the next states, that is, link reset or slot reset.
|
||||
|
||||
The driver should return one of the following result codes:
|
||||
- PCI_ERS_RESULT_RECOVERED
|
||||
Driver returns this if it thinks the device is fully
|
||||
functional and thinks it is ready to start
|
||||
normal driver operations again. There is no
|
||||
guarantee that the driver will actually be
|
||||
allowed to proceed, as another driver on the
|
||||
same segment might have failed and thus triggered a
|
||||
slot reset on platforms that support it.
|
||||
- PCI_ERS_RESULT_RECOVERED
|
||||
Driver returns this if it thinks the device is fully
|
||||
functional and thinks it is ready to start
|
||||
normal driver operations again. There is no
|
||||
guarantee that the driver will actually be
|
||||
allowed to proceed, as another driver on the
|
||||
same segment might have failed and thus triggered a
|
||||
slot reset on platforms that support it.
|
||||
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it thinks the device is not
|
||||
recoverable in its current state and it needs a slot
|
||||
reset to proceed.
|
||||
- PCI_ERS_RESULT_NEED_RESET
|
||||
Driver returns this if it thinks the device is not
|
||||
recoverable in its current state and it needs a slot
|
||||
reset to proceed.
|
||||
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above. Total failure, no recovery even after
|
||||
reset driver dead. (To be defined more precisely)
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above. Total failure, no recovery even after
|
||||
reset driver dead. (To be defined more precisely)
|
||||
|
||||
The next step taken depends on the results returned by the drivers.
|
||||
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
|
||||
@ -293,31 +303,33 @@ device will be considered "dead" in this case.
|
||||
Drivers for multi-function cards will need to coordinate among
|
||||
themselves as to which driver instance will perform any "one-shot"
|
||||
or global device initialization. For example, the Symbios sym53cxx2
|
||||
driver performs device init only from PCI function 0:
|
||||
driver performs device init only from PCI function 0::
|
||||
|
||||
+ if (PCI_FUNC(pdev->devfn) == 0)
|
||||
+ sym_reset_scsi_bus(np, 0);
|
||||
+ if (PCI_FUNC(pdev->devfn) == 0)
|
||||
+ sym_reset_scsi_bus(np, 0);
|
||||
|
||||
Result codes:
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above.
|
||||
Result codes:
|
||||
- PCI_ERS_RESULT_DISCONNECT
|
||||
Same as above.
|
||||
|
||||
Drivers for PCI Express cards that require a fundamental reset must
|
||||
set the needs_freset bit in the pci_dev structure in their probe function.
|
||||
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
|
||||
PCI card types:
|
||||
PCI card types::
|
||||
|
||||
+ /* Set EEH reset type to fundamental if required by hba */
|
||||
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
||||
+ pdev->needs_freset = 1;
|
||||
+
|
||||
+ /* Set EEH reset type to fundamental if required by hba */
|
||||
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
|
||||
+ pdev->needs_freset = 1;
|
||||
+
|
||||
|
||||
Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
|
||||
Failure).
|
||||
|
||||
>>> The current powerpc implementation does not try a power-cycle
|
||||
>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
||||
>>> However, it probably should.
|
||||
.. note::
|
||||
|
||||
The current powerpc implementation does not try a power-cycle
|
||||
reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
|
||||
However, it probably should.
|
||||
|
||||
|
||||
STEP 5: Resume Operations
|
||||
@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
|
||||
That is, the recovery API only requires that:
|
||||
|
||||
- There is no guarantee that interrupt delivery can proceed from any
|
||||
device on the segment starting from the error detection and until the
|
||||
slot_reset callback is called, at which point interrupts are expected
|
||||
to be fully operational.
|
||||
device on the segment starting from the error detection and until the
|
||||
slot_reset callback is called, at which point interrupts are expected
|
||||
to be fully operational.
|
||||
|
||||
- There is no guarantee that interrupt delivery is stopped, that is,
|
||||
a driver that gets an interrupt after detecting an error, or that detects
|
||||
an error within the interrupt handler such that it prevents proper
|
||||
ack'ing of the interrupt (and thus removal of the source) should just
|
||||
return IRQ_NOTHANDLED. It's up to the platform to deal with that
|
||||
condition, typically by masking the IRQ source during the duration of
|
||||
the error handling. It is expected that the platform "knows" which
|
||||
interrupts are routed to error-management capable slots and can deal
|
||||
with temporarily disabling that IRQ number during error processing (this
|
||||
isn't terribly complex). That means some IRQ latency for other devices
|
||||
sharing the interrupt, but there is simply no other way. High end
|
||||
platforms aren't supposed to share interrupts between many devices
|
||||
anyway :)
|
||||
a driver that gets an interrupt after detecting an error, or that detects
|
||||
an error within the interrupt handler such that it prevents proper
|
||||
ack'ing of the interrupt (and thus removal of the source) should just
|
||||
return IRQ_NOTHANDLED. It's up to the platform to deal with that
|
||||
condition, typically by masking the IRQ source during the duration of
|
||||
the error handling. It is expected that the platform "knows" which
|
||||
interrupts are routed to error-management capable slots and can deal
|
||||
with temporarily disabling that IRQ number during error processing (this
|
||||
isn't terribly complex). That means some IRQ latency for other devices
|
||||
sharing the interrupt, but there is simply no other way. High end
|
||||
platforms aren't supposed to share interrupts between many devices
|
||||
anyway :)
|
||||
|
||||
>>> Implementation details for the powerpc platform are discussed in
|
||||
>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
||||
.. note::
|
||||
|
||||
>>> As of this writing, there is a growing list of device drivers with
|
||||
>>> patches implementing error recovery. Not all of these patches are in
|
||||
>>> mainline yet. These may be used as "examples":
|
||||
>>>
|
||||
>>> drivers/scsi/ipr
|
||||
>>> drivers/scsi/sym53c8xx_2
|
||||
>>> drivers/scsi/qla2xxx
|
||||
>>> drivers/scsi/lpfc
|
||||
>>> drivers/next/bnx2.c
|
||||
>>> drivers/next/e100.c
|
||||
>>> drivers/net/e1000
|
||||
>>> drivers/net/e1000e
|
||||
>>> drivers/net/ixgb
|
||||
>>> drivers/net/ixgbe
|
||||
>>> drivers/net/cxgb3
|
||||
>>> drivers/net/s2io.c
|
||||
>>> drivers/net/qlge
|
||||
Implementation details for the powerpc platform are discussed in
|
||||
the file Documentation/powerpc/eeh-pci-error-recovery.txt
|
||||
|
||||
The End
|
||||
-------
|
||||
As of this writing, there is a growing list of device drivers with
|
||||
patches implementing error recovery. Not all of these patches are in
|
||||
mainline yet. These may be used as "examples":
|
||||
|
||||
- drivers/scsi/ipr
|
||||
- drivers/scsi/sym53c8xx_2
|
||||
- drivers/scsi/qla2xxx
|
||||
- drivers/scsi/lpfc
|
||||
- drivers/next/bnx2.c
|
||||
- drivers/next/e100.c
|
||||
- drivers/net/e1000
|
||||
- drivers/net/e1000e
|
||||
- drivers/net/ixgb
|
||||
- drivers/net/ixgbe
|
||||
- drivers/net/cxgb3
|
||||
- drivers/net/s2io.c
|
||||
- drivers/net/qlge
|
@ -1,14 +1,19 @@
|
||||
PCI Express I/O Virtualization Howto
|
||||
Copyright (C) 2009 Intel Corporation
|
||||
Yu Zhao <yu.zhao@intel.com>
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
Update: November 2012
|
||||
-- sysfs-based SRIOV enable-/disable-ment
|
||||
Donald Dutile <ddutile@redhat.com>
|
||||
====================================
|
||||
PCI Express I/O Virtualization Howto
|
||||
====================================
|
||||
|
||||
1. Overview
|
||||
:Copyright: |copy| 2009 Intel Corporation
|
||||
:Authors: - Yu Zhao <yu.zhao@intel.com>
|
||||
- Donald Dutile <ddutile@redhat.com>
|
||||
|
||||
1.1 What is SR-IOV
|
||||
Overview
|
||||
========
|
||||
|
||||
What is SR-IOV
|
||||
--------------
|
||||
|
||||
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
|
||||
capability which makes one physical device appear as multiple virtual
|
||||
@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
|
||||
operates on the register set so it can be functional and appear as a
|
||||
real existing PCI device.
|
||||
|
||||
2. User Guide
|
||||
User Guide
|
||||
==========
|
||||
|
||||
2.1 How can I enable SR-IOV capability
|
||||
How can I enable SR-IOV capability
|
||||
----------------------------------
|
||||
|
||||
Multiple methods are available for SR-IOV enablement.
|
||||
In the first method, the device driver (PF driver) will control the
|
||||
@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
|
||||
numvfs <= totalvfs.
|
||||
The second method is the recommended method for new/future VF devices.
|
||||
|
||||
2.2 How can I use the Virtual Functions
|
||||
How can I use the Virtual Functions
|
||||
-----------------------------------
|
||||
|
||||
The VF is treated as hot-plugged PCI devices in the kernel, so they
|
||||
should be able to work in the same way as real PCI devices. The VF
|
||||
requires device driver that is same as a normal PCI device's.
|
||||
|
||||
3. Developer Guide
|
||||
Developer Guide
|
||||
===============
|
||||
|
||||
3.1 SR-IOV API
|
||||
SR-IOV API
|
||||
----------
|
||||
|
||||
To enable SR-IOV capability:
|
||||
(a) For the first method, in the driver:
|
||||
|
||||
(a) For the first method, in the driver::
|
||||
|
||||
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
|
||||
'nr_virtfn' is number of VFs to be enabled.
|
||||
(b) For the second method, from sysfs:
|
||||
|
||||
'nr_virtfn' is number of VFs to be enabled.
|
||||
|
||||
(b) For the second method, from sysfs::
|
||||
|
||||
echo 'nr_virtfn' > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||
|
||||
To disable SR-IOV capability:
|
||||
(a) For the first method, in the driver:
|
||||
|
||||
(a) For the first method, in the driver::
|
||||
|
||||
void pci_disable_sriov(struct pci_dev *dev);
|
||||
(b) For the second method, from sysfs:
|
||||
|
||||
(b) For the second method, from sysfs::
|
||||
|
||||
echo 0 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
|
||||
|
||||
To enable auto probing VFs by a compatible driver on the host, run
|
||||
command below before enabling SR-IOV capabilities. This is the
|
||||
default behavior.
|
||||
::
|
||||
|
||||
echo 1 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||
|
||||
To disable auto probing VFs by a compatible driver on the host, run
|
||||
command below before enabling SR-IOV capabilities. Updating this
|
||||
entry will not affect VFs which are already probed.
|
||||
::
|
||||
|
||||
echo 0 > \
|
||||
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
|
||||
|
||||
3.2 Usage example
|
||||
Usage example
|
||||
-------------
|
||||
|
||||
Following piece of code illustrates the usage of the SR-IOV API.
|
||||
::
|
||||
|
||||
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||
{
|
||||
pci_enable_sriov(dev, NR_VIRTFN);
|
||||
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||
{
|
||||
pci_enable_sriov(dev, NR_VIRTFN);
|
||||
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_remove(struct pci_dev *dev)
|
||||
{
|
||||
pci_disable_sriov(dev);
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_resume(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_shutdown(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
||||
{
|
||||
if (numvfs > 0) {
|
||||
...
|
||||
pci_enable_sriov(dev, numvfs);
|
||||
...
|
||||
return numvfs;
|
||||
}
|
||||
if (numvfs == 0) {
|
||||
....
|
||||
pci_disable_sriov(dev);
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
static struct pci_driver dev_driver = {
|
||||
.name = "SR-IOV Physical Function driver",
|
||||
.id_table = dev_id_table,
|
||||
.probe = dev_probe,
|
||||
.remove = dev_remove,
|
||||
.suspend = dev_suspend,
|
||||
.resume = dev_resume,
|
||||
.shutdown = dev_shutdown,
|
||||
.sriov_configure = dev_sriov_configure,
|
||||
};
|
||||
static void dev_remove(struct pci_dev *dev)
|
||||
{
|
||||
pci_disable_sriov(dev);
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dev_resume(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void dev_shutdown(struct pci_dev *dev)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
|
||||
{
|
||||
if (numvfs > 0) {
|
||||
...
|
||||
pci_enable_sriov(dev, numvfs);
|
||||
...
|
||||
return numvfs;
|
||||
}
|
||||
if (numvfs == 0) {
|
||||
....
|
||||
pci_disable_sriov(dev);
|
||||
...
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
static struct pci_driver dev_driver = {
|
||||
.name = "SR-IOV Physical Function driver",
|
||||
.id_table = dev_id_table,
|
||||
.probe = dev_probe,
|
||||
.remove = dev_remove,
|
||||
.suspend = dev_suspend,
|
||||
.resume = dev_resume,
|
||||
.shutdown = dev_shutdown,
|
||||
.sriov_configure = dev_sriov_configure,
|
||||
};
|
@ -1,10 +1,12 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
How To Write Linux PCI Drivers
|
||||
==============================
|
||||
How To Write Linux PCI Drivers
|
||||
==============================
|
||||
|
||||
by Martin Mares <mj@ucw.cz> on 07-Feb-2000
|
||||
updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006
|
||||
:Authors: - Martin Mares <mj@ucw.cz>
|
||||
- Grant Grundler <grundler@parisc-linux.org>
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The world of PCI is vast and full of (mostly unpleasant) surprises.
|
||||
Since each CPU architecture implements different chip-sets and PCI devices
|
||||
have different requirements (erm, "features"), the result is the PCI support
|
||||
@ -15,8 +17,7 @@ PCI device drivers.
|
||||
A more complete resource is the third edition of "Linux Device Drivers"
|
||||
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
|
||||
LDD3 is available for free (under Creative Commons License) from:
|
||||
|
||||
http://lwn.net/Kernel/LDD3/
|
||||
http://lwn.net/Kernel/LDD3/.
|
||||
|
||||
However, keep in mind that all documents are subject to "bit rot".
|
||||
Refer to the source code if things are not working as described here.
|
||||
@ -25,9 +26,8 @@ Please send questions/comments/patches about Linux PCI API to the
|
||||
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
|
||||
|
||||
|
||||
|
||||
0. Structure of PCI drivers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Structure of PCI drivers
|
||||
========================
|
||||
PCI drivers "discover" PCI devices in a system via pci_register_driver().
|
||||
Actually, it's the other way around. When the PCI generic code discovers
|
||||
a new device, the driver with a matching "description" will be notified.
|
||||
@ -42,24 +42,25 @@ pointers and thus dictates the high level structure of a driver.
|
||||
Once the driver knows about a PCI device and takes ownership, the
|
||||
driver generally needs to perform the following initialization:
|
||||
|
||||
Enable the device
|
||||
Request MMIO/IOP resources
|
||||
Set the DMA mask size (for both coherent and streaming DMA)
|
||||
Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
Access device configuration space (if needed)
|
||||
Register IRQ handler (request_irq())
|
||||
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
Enable DMA/processing engines
|
||||
- Enable the device
|
||||
- Request MMIO/IOP resources
|
||||
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||
- Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
- Access device configuration space (if needed)
|
||||
- Register IRQ handler (request_irq())
|
||||
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
- Enable DMA/processing engines
|
||||
|
||||
When done using the device, and perhaps the module needs to be unloaded,
|
||||
the driver needs to take the follow steps:
|
||||
Disable the device from generating IRQs
|
||||
Release the IRQ (free_irq())
|
||||
Stop all DMA activity
|
||||
Release DMA buffers (both streaming and coherent)
|
||||
Unregister from other subsystems (e.g. scsi or netdev)
|
||||
Release MMIO/IOP resources
|
||||
Disable the device
|
||||
|
||||
- Disable the device from generating IRQs
|
||||
- Release the IRQ (free_irq())
|
||||
- Stop all DMA activity
|
||||
- Release DMA buffers (both streaming and coherent)
|
||||
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||
- Release MMIO/IOP resources
|
||||
- Disable the device
|
||||
|
||||
Most of these topics are covered in the following sections.
|
||||
For the rest look at LDD3 or <linux/pci.h> .
|
||||
@ -70,99 +71,38 @@ completely empty or just returning an appropriate error codes to avoid
|
||||
lots of ifdefs in the drivers.
|
||||
|
||||
|
||||
pci_register_driver() call
|
||||
==========================
|
||||
|
||||
1. pci_register_driver() call
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
PCI device drivers call pci_register_driver() during their
|
||||
PCI device drivers call ``pci_register_driver()`` during their
|
||||
initialization with a pointer to a structure describing the driver
|
||||
(struct pci_driver):
|
||||
(``struct pci_driver``):
|
||||
|
||||
field name Description
|
||||
---------- ------------------------------------------------------
|
||||
id_table Pointer to table of device ID's the driver is
|
||||
interested in. Most drivers should export this
|
||||
table using MODULE_DEVICE_TABLE(pci,...).
|
||||
.. kernel-doc:: include/linux/pci.h
|
||||
:functions: pci_driver
|
||||
|
||||
probe This probing function gets called (during execution
|
||||
of pci_register_driver() for already existing
|
||||
devices or later if a new device gets inserted) for
|
||||
all PCI devices which match the ID table and are not
|
||||
"owned" by the other drivers yet. This function gets
|
||||
passed a "struct pci_dev *" for each device whose
|
||||
entry in the ID table matches the device. The probe
|
||||
function returns zero when the driver chooses to
|
||||
take "ownership" of the device or an error code
|
||||
(negative number) otherwise.
|
||||
The probe function always gets called from process
|
||||
context, so it can sleep.
|
||||
|
||||
remove The remove() function gets called whenever a device
|
||||
being handled by this driver is removed (either during
|
||||
deregistration of the driver or when it's manually
|
||||
pulled out of a hot-pluggable slot).
|
||||
The remove function always gets called from process
|
||||
context, so it can sleep.
|
||||
|
||||
suspend Put device into low power state.
|
||||
suspend_late Put device into low power state.
|
||||
|
||||
resume_early Wake device from low power state.
|
||||
resume Wake device from low power state.
|
||||
|
||||
(Please see Documentation/power/pci.txt for descriptions
|
||||
of PCI Power Management and the related functions.)
|
||||
|
||||
shutdown Hook into reboot_notifier_list (kernel/sys.c).
|
||||
Intended to stop any idling DMA operations.
|
||||
Useful for enabling wake-on-lan (NIC) or changing
|
||||
the power state of a device before reboot.
|
||||
e.g. drivers/net/e100.c.
|
||||
|
||||
err_handler See Documentation/PCI/pci-error-recovery.txt
|
||||
|
||||
|
||||
The ID table is an array of struct pci_device_id entries ending with an
|
||||
The ID table is an array of ``struct pci_device_id`` entries ending with an
|
||||
all-zero entry. Definitions with static const are generally preferred.
|
||||
|
||||
Each entry consists of:
|
||||
.. kernel-doc:: include/linux/mod_devicetable.h
|
||||
:functions: pci_device_id
|
||||
|
||||
vendor,device Vendor and device ID to match (or PCI_ANY_ID)
|
||||
|
||||
subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID)
|
||||
subdevice,
|
||||
|
||||
class Device class, subclass, and "interface" to match.
|
||||
See Appendix D of the PCI Local Bus Spec or
|
||||
include/linux/pci_ids.h for a full list of classes.
|
||||
Most drivers do not need to specify class/class_mask
|
||||
as vendor/device is normally sufficient.
|
||||
|
||||
class_mask limit which sub-fields of the class field are compared.
|
||||
See drivers/scsi/sym53c8xx_2/ for example of usage.
|
||||
|
||||
driver_data Data private to the driver.
|
||||
Most drivers don't need to use driver_data field.
|
||||
Best practice is to use driver_data as an index
|
||||
into a static list of equivalent device types,
|
||||
instead of using it as a pointer.
|
||||
|
||||
|
||||
Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up
|
||||
Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
|
||||
a pci_device_id table.
|
||||
|
||||
New PCI IDs may be added to a device driver pci_ids table at runtime
|
||||
as shown below:
|
||||
as shown below::
|
||||
|
||||
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
||||
/sys/bus/pci/drivers/{driver}/new_id
|
||||
echo "vendor device subvendor subdevice class class_mask driver_data" > \
|
||||
/sys/bus/pci/drivers/{driver}/new_id
|
||||
|
||||
All fields are passed in as hexadecimal values (no leading 0x).
|
||||
The vendor and device fields are mandatory, the others are optional. Users
|
||||
need pass only as many optional fields as necessary:
|
||||
o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
||||
o class and classmask fields default to 0
|
||||
o driver_data defaults to 0UL.
|
||||
|
||||
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
|
||||
- class and classmask fields default to 0
|
||||
- driver_data defaults to 0UL.
|
||||
|
||||
Note that driver_data must match the value used by any of the pci_device_id
|
||||
entries defined in the driver. This makes the driver_data field mandatory
|
||||
@ -175,29 +115,31 @@ When the driver exits, it just calls pci_unregister_driver() and the PCI layer
|
||||
automatically calls the remove hook for all devices handled by the driver.
|
||||
|
||||
|
||||
1.1 "Attributes" for driver functions/data
|
||||
"Attributes" for driver functions/data
|
||||
--------------------------------------
|
||||
|
||||
Please mark the initialization and cleanup functions where appropriate
|
||||
(the corresponding macros are defined in <linux/init.h>):
|
||||
|
||||
====== =================================================
|
||||
__init Initialization code. Thrown away after the driver
|
||||
initializes.
|
||||
__exit Exit code. Ignored for non-modular drivers.
|
||||
====== =================================================
|
||||
|
||||
Tips on when/where to use the above attributes:
|
||||
o The module_init()/module_exit() functions (and all
|
||||
- The module_init()/module_exit() functions (and all
|
||||
initialization functions called _only_ from these)
|
||||
should be marked __init/__exit.
|
||||
|
||||
o Do not mark the struct pci_driver.
|
||||
- Do not mark the struct pci_driver.
|
||||
|
||||
o Do NOT mark a function if you are not sure which mark to use.
|
||||
- Do NOT mark a function if you are not sure which mark to use.
|
||||
Better to not mark the function than mark the function wrong.
|
||||
|
||||
|
||||
|
||||
2. How to find PCI devices manually
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
How to find PCI devices manually
|
||||
================================
|
||||
|
||||
PCI drivers should have a really good reason for not using the
|
||||
pci_register_driver() interface to search for PCI devices.
|
||||
@ -207,17 +149,17 @@ E.g. combined serial/parallel port/floppy controller.
|
||||
|
||||
A manual search may be performed using the following constructs:
|
||||
|
||||
Searching by vendor and device ID:
|
||||
Searching by vendor and device ID::
|
||||
|
||||
struct pci_dev *dev = NULL;
|
||||
while (dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev))
|
||||
configure_device(dev);
|
||||
|
||||
Searching by class ID (iterate in a similar way):
|
||||
Searching by class ID (iterate in a similar way)::
|
||||
|
||||
pci_get_class(CLASS_ID, dev)
|
||||
|
||||
Searching by both vendor/device and subsystem vendor/device ID:
|
||||
Searching by both vendor/device and subsystem vendor/device ID::
|
||||
|
||||
pci_get_subsys(VENDOR_ID,DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev).
|
||||
|
||||
@ -230,21 +172,20 @@ the pci_dev that they return. You must eventually (possibly at module unload)
|
||||
decrement the reference count on these devices by calling pci_dev_put().
|
||||
|
||||
|
||||
|
||||
3. Device Initialization Steps
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Device Initialization Steps
|
||||
===========================
|
||||
|
||||
As noted in the introduction, most PCI drivers need the following steps
|
||||
for device initialization:
|
||||
|
||||
Enable the device
|
||||
Request MMIO/IOP resources
|
||||
Set the DMA mask size (for both coherent and streaming DMA)
|
||||
Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
Access device configuration space (if needed)
|
||||
Register IRQ handler (request_irq())
|
||||
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
Enable DMA/processing engines.
|
||||
- Enable the device
|
||||
- Request MMIO/IOP resources
|
||||
- Set the DMA mask size (for both coherent and streaming DMA)
|
||||
- Allocate and initialize shared control data (pci_allocate_coherent())
|
||||
- Access device configuration space (if needed)
|
||||
- Register IRQ handler (request_irq())
|
||||
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
|
||||
- Enable DMA/processing engines.
|
||||
|
||||
The driver can access PCI config space registers at any time.
|
||||
(Well, almost. When running BIST, config space can go away...but
|
||||
@ -252,26 +193,29 @@ that will just result in a PCI Bus Master Abort and config reads
|
||||
will return garbage).
|
||||
|
||||
|
||||
3.1 Enable the PCI device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Enable the PCI device
|
||||
---------------------
|
||||
Before touching any device registers, the driver needs to enable
|
||||
the PCI device by calling pci_enable_device(). This will:
|
||||
o wake up the device if it was in suspended state,
|
||||
o allocate I/O and memory regions of the device (if BIOS did not),
|
||||
o allocate an IRQ (if BIOS did not).
|
||||
|
||||
NOTE: pci_enable_device() can fail! Check the return value.
|
||||
- wake up the device if it was in suspended state,
|
||||
- allocate I/O and memory regions of the device (if BIOS did not),
|
||||
- allocate an IRQ (if BIOS did not).
|
||||
|
||||
[ OS BUG: we don't check resource allocations before enabling those
|
||||
resources. The sequence would make more sense if we called
|
||||
pci_request_resources() before calling pci_enable_device().
|
||||
Currently, the device drivers can't detect the bug when when two
|
||||
devices have been allocated the same range. This is not a common
|
||||
problem and unlikely to get fixed soon.
|
||||
.. note::
|
||||
pci_enable_device() can fail! Check the return value.
|
||||
|
||||
.. warning::
|
||||
OS BUG: we don't check resource allocations before enabling those
|
||||
resources. The sequence would make more sense if we called
|
||||
pci_request_resources() before calling pci_enable_device().
|
||||
Currently, the device drivers can't detect the bug when when two
|
||||
devices have been allocated the same range. This is not a common
|
||||
problem and unlikely to get fixed soon.
|
||||
|
||||
This has been discussed before but not changed as of 2.6.19:
|
||||
http://lkml.org/lkml/2006/3/2/194
|
||||
|
||||
This has been discussed before but not changed as of 2.6.19:
|
||||
http://lkml.org/lkml/2006/3/2/194
|
||||
]
|
||||
|
||||
pci_set_master() will enable DMA by setting the bus master bit
|
||||
in the PCI_COMMAND register. It also fixes the latency timer value if
|
||||
@ -288,8 +232,8 @@ pci_try_set_mwi() to have the system do its best effort at enabling
|
||||
Mem-Wr-Inval.
|
||||
|
||||
|
||||
3.2 Request MMIO/IOP resources
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Request MMIO/IOP resources
|
||||
--------------------------
|
||||
Memory (MMIO), and I/O port addresses should NOT be read directly
|
||||
from the PCI device config space. Use the values in the pci_dev structure
|
||||
as the PCI "bus address" might have been remapped to a "host physical"
|
||||
@ -304,9 +248,10 @@ Conversely, drivers should call pci_release_region() AFTER
|
||||
calling pci_disable_device().
|
||||
The idea is to prevent two devices colliding on the same address range.
|
||||
|
||||
[ See OS BUG comment above. Currently (2.6.19), The driver can only
|
||||
determine MMIO and IO Port resource availability _after_ calling
|
||||
pci_enable_device(). ]
|
||||
.. tip::
|
||||
See OS BUG comment above. Currently (2.6.19), The driver can only
|
||||
determine MMIO and IO Port resource availability _after_ calling
|
||||
pci_enable_device().
|
||||
|
||||
Generic flavors of pci_request_region() are request_mem_region()
|
||||
(for MMIO ranges) and request_region() (for IO Port ranges).
|
||||
@ -316,12 +261,13 @@ BARs.
|
||||
Also see pci_request_selected_regions() below.
|
||||
|
||||
|
||||
3.3 Set the DMA mask size
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
[ If anything below doesn't make sense, please refer to
|
||||
Documentation/DMA-API.txt. This section is just a reminder that
|
||||
drivers need to indicate DMA capabilities of the device and is not
|
||||
an authoritative source for DMA interfaces. ]
|
||||
Set the DMA mask size
|
||||
---------------------
|
||||
.. note::
|
||||
If anything below doesn't make sense, please refer to
|
||||
Documentation/DMA-API.txt. This section is just a reminder that
|
||||
drivers need to indicate DMA capabilities of the device and is not
|
||||
an authoritative source for DMA interfaces.
|
||||
|
||||
While all drivers should explicitly indicate the DMA capability
|
||||
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
|
||||
@ -342,23 +288,23 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
|
||||
("consistent") data.
|
||||
|
||||
|
||||
3.4 Setup shared control data
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Setup shared control data
|
||||
-------------------------
|
||||
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
|
||||
memory. See Documentation/DMA-API.txt for a full description of
|
||||
the DMA APIs. This section is just a reminder that it needs to be done
|
||||
before enabling DMA on the device.
|
||||
|
||||
|
||||
3.5 Initialize device registers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Initialize device registers
|
||||
---------------------------
|
||||
Some drivers will need specific "capability" fields programmed
|
||||
or other "vendor specific" register initialized or reset.
|
||||
E.g. clearing pending interrupts.
|
||||
|
||||
|
||||
3.6 Register IRQ handler
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Register IRQ handler
|
||||
--------------------
|
||||
While calling request_irq() is the last step described here,
|
||||
this is often just another intermediate step to initialize a device.
|
||||
This step can often be deferred until the device is opened for use.
|
||||
@ -396,6 +342,7 @@ and msix_enabled flags in the pci_dev structure after calling
|
||||
pci_alloc_irq_vectors.
|
||||
|
||||
There are (at least) two really good reasons for using MSI:
|
||||
|
||||
1) MSI is an exclusive interrupt vector by definition.
|
||||
This means the interrupt handler doesn't have to verify
|
||||
its device caused the interrupt.
|
||||
@ -410,24 +357,23 @@ See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
|
||||
of MSI/MSI-X usage.
|
||||
|
||||
|
||||
|
||||
4. PCI device shutdown
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
PCI device shutdown
|
||||
===================
|
||||
|
||||
When a PCI device driver is being unloaded, most of the following
|
||||
steps need to be performed:
|
||||
|
||||
Disable the device from generating IRQs
|
||||
Release the IRQ (free_irq())
|
||||
Stop all DMA activity
|
||||
Release DMA buffers (both streaming and consistent)
|
||||
Unregister from other subsystems (e.g. scsi or netdev)
|
||||
Disable device from responding to MMIO/IO Port addresses
|
||||
Release MMIO/IO Port resource(s)
|
||||
- Disable the device from generating IRQs
|
||||
- Release the IRQ (free_irq())
|
||||
- Stop all DMA activity
|
||||
- Release DMA buffers (both streaming and consistent)
|
||||
- Unregister from other subsystems (e.g. scsi or netdev)
|
||||
- Disable device from responding to MMIO/IO Port addresses
|
||||
- Release MMIO/IO Port resource(s)
|
||||
|
||||
|
||||
4.1 Stop IRQs on the device
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Stop IRQs on the device
|
||||
-----------------------
|
||||
How to do this is chip/device specific. If it's not done, it opens
|
||||
the possibility of a "screaming interrupt" if (and only if)
|
||||
the IRQ is shared with another device.
|
||||
@ -446,16 +392,16 @@ MSI and MSI-X are defined to be exclusive interrupts and thus
|
||||
are not susceptible to the "screaming interrupt" problem.
|
||||
|
||||
|
||||
4.2 Release the IRQ
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
Release the IRQ
|
||||
---------------
|
||||
Once the device is quiesced (no more IRQs), one can call free_irq().
|
||||
This function will return control once any pending IRQs are handled,
|
||||
"unhook" the drivers IRQ handler from that IRQ, and finally release
|
||||
the IRQ if no one else is using it.
|
||||
|
||||
|
||||
4.3 Stop all DMA activity
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Stop all DMA activity
|
||||
---------------------
|
||||
It's extremely important to stop all DMA operations BEFORE attempting
|
||||
to deallocate DMA control data. Failure to do so can result in memory
|
||||
corruption, hangs, and on some chip-sets a hard crash.
|
||||
@ -467,8 +413,8 @@ While this step sounds obvious and trivial, several "mature" drivers
|
||||
didn't get this step right in the past.
|
||||
|
||||
|
||||
4.4 Release DMA buffers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Release DMA buffers
|
||||
-------------------
|
||||
Once DMA is stopped, clean up streaming DMA first.
|
||||
I.e. unmap data buffers and return buffers to "upstream"
|
||||
owners if there is one.
|
||||
@ -478,8 +424,8 @@ Then clean up "consistent" buffers which contain the control data.
|
||||
See Documentation/DMA-API.txt for details on unmapping interfaces.
|
||||
|
||||
|
||||
4.5 Unregister from other subsystems
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Unregister from other subsystems
|
||||
--------------------------------
|
||||
Most low level PCI device drivers support some other subsystem
|
||||
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
|
||||
driver isn't losing resources from that other subsystem.
|
||||
@ -487,31 +433,30 @@ If this happens, typically the symptom is an Oops (panic) when
|
||||
the subsystem attempts to call into a driver that has been unloaded.
|
||||
|
||||
|
||||
4.6 Disable Device from responding to MMIO/IO Port addresses
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Disable Device from responding to MMIO/IO Port addresses
|
||||
--------------------------------------------------------
|
||||
io_unmap() MMIO or IO Port resources and then call pci_disable_device().
|
||||
This is the symmetric opposite of pci_enable_device().
|
||||
Do not access device registers after calling pci_disable_device().
|
||||
|
||||
|
||||
4.7 Release MMIO/IO Port Resource(s)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Release MMIO/IO Port Resource(s)
|
||||
--------------------------------
|
||||
Call pci_release_region() to mark the MMIO or IO Port range as available.
|
||||
Failure to do so usually results in the inability to reload the driver.
|
||||
|
||||
|
||||
How to access PCI config space
|
||||
==============================
|
||||
|
||||
5. How to access PCI config space
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You can use pci_(read|write)_config_(byte|word|dword) to access the config
|
||||
space of a device represented by struct pci_dev *. All these functions return 0
|
||||
when successful or an error code (PCIBIOS_...) which can be translated to a text
|
||||
string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
||||
You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
|
||||
space of a device represented by `struct pci_dev *`. All these functions return
|
||||
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
|
||||
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
|
||||
devices don't fail.
|
||||
|
||||
If you don't have a struct pci_dev available, you can call
|
||||
pci_bus_(read|write)_config_(byte|word|dword) to access a given device
|
||||
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
|
||||
and function on that bus.
|
||||
|
||||
If you access fields in the standard portion of the config header, please
|
||||
@ -522,10 +467,10 @@ pci_find_capability() for the particular capability and it will find the
|
||||
corresponding register block for you.
|
||||
|
||||
|
||||
Other interesting functions
|
||||
===========================
|
||||
|
||||
6. Other interesting functions
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
============================= ================================================
|
||||
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
|
||||
bus and slot and number. If the device is
|
||||
found, its reference count is increased.
|
||||
@ -539,11 +484,11 @@ pci_set_drvdata() Set private driver data pointer for a pci_dev
|
||||
pci_get_drvdata() Return private driver data pointer for a pci_dev
|
||||
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
|
||||
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
|
||||
============================= ================================================
|
||||
|
||||
|
||||
|
||||
7. Miscellaneous hints
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
Miscellaneous hints
|
||||
===================
|
||||
|
||||
When displaying PCI device names to the user (for example when a driver wants
|
||||
to tell the user what card has it found), please use pci_name(pci_dev).
|
||||
@ -559,9 +504,8 @@ on the bus need to be capable of doing it, so this is something which needs
|
||||
to be handled by platform and generic code, not individual drivers.
|
||||
|
||||
|
||||
|
||||
8. Vendor and device identifications
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Vendor and device identifications
|
||||
=================================
|
||||
|
||||
Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
|
||||
are shared across multiple drivers. You can add private definitions in
|
||||
@ -575,28 +519,27 @@ There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
|
||||
and https://github.com/pciutils/pciids.
|
||||
|
||||
|
||||
|
||||
9. Obsolete functions
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
Obsolete functions
|
||||
==================
|
||||
|
||||
There are several functions which you might come across when trying to
|
||||
port an old driver to the new PCI interface. They are no longer present
|
||||
in the kernel as they aren't compatible with hotplug or PCI domains or
|
||||
having sane locking.
|
||||
|
||||
================= ===========================================
|
||||
pci_find_device() Superseded by pci_get_device()
|
||||
pci_find_subsys() Superseded by pci_get_subsys()
|
||||
pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||
pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
|
||||
|
||||
================= ===========================================
|
||||
|
||||
The alternative is the traditional PCI device driver that walks PCI
|
||||
device lists. This is still possible but discouraged.
|
||||
|
||||
|
||||
|
||||
10. MMIO Space and "Write Posting"
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
MMIO Space and "Write Posting"
|
||||
==============================
|
||||
|
||||
Converting a driver from using I/O Port space to using MMIO space
|
||||
often requires some additional changes. Specifically, "write posting"
|
||||
@ -609,14 +552,14 @@ the CPU before the transaction has reached its destination.
|
||||
|
||||
Thus, timing sensitive code should add readl() where the CPU is
|
||||
expected to wait before doing other work. The classic "bit banging"
|
||||
sequence works fine for I/O Port space:
|
||||
sequence works fine for I/O Port space::
|
||||
|
||||
for (i = 8; --i; val >>= 1) {
|
||||
outb(val & 1, ioport_reg); /* write bit */
|
||||
udelay(10);
|
||||
}
|
||||
|
||||
The same sequence for MMIO space should be:
|
||||
The same sequence for MMIO space should be::
|
||||
|
||||
for (i = 8; --i; val >>= 1) {
|
||||
writeb(val & 1, mmio_reg); /* write bit */
|
||||
@ -633,4 +576,3 @@ handle the PCI master abort on all platforms if the PCI device is
|
||||
expected to not respond to a readl(). Most x86 platforms will allow
|
||||
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
|
||||
(e.g. ~0). But many RISC platforms will crash (a.k.a."Hard Fail").
|
||||
|
@ -1,21 +1,29 @@
|
||||
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
||||
T. Long Nguyen <tom.l.nguyen@intel.com>
|
||||
Yanmin Zhang <yanmin.zhang@intel.com>
|
||||
07/29/2006
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
===========================================================
|
||||
The PCI Express Advanced Error Reporting Driver Guide HOWTO
|
||||
===========================================================
|
||||
|
||||
1. Overview
|
||||
:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
|
||||
- Yanmin Zhang <yanmin.zhang@intel.com>
|
||||
|
||||
1.1 About this guide
|
||||
:Copyright: |copy| 2006 Intel Corporation
|
||||
|
||||
Overview
|
||||
===========
|
||||
|
||||
About this guide
|
||||
----------------
|
||||
|
||||
This guide describes the basics of the PCI Express Advanced Error
|
||||
Reporting (AER) driver and provides information on how to use it, as
|
||||
well as how to enable the drivers of endpoint devices to conform with
|
||||
PCI Express AER driver.
|
||||
|
||||
1.2 Copyright (C) Intel Corporation 2006.
|
||||
|
||||
1.3 What is the PCI Express AER Driver?
|
||||
What is the PCI Express AER Driver?
|
||||
-----------------------------------
|
||||
|
||||
PCI Express error signaling can occur on the PCI Express link itself
|
||||
or on behalf of transactions initiated on the link. PCI Express
|
||||
@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
|
||||
Express Advanced Error Reporting capability. The PCI Express AER
|
||||
driver provides three basic functions:
|
||||
|
||||
- Gathers the comprehensive error information if errors occurred.
|
||||
- Reports error to the users.
|
||||
- Performs error recovery actions.
|
||||
- Gathers the comprehensive error information if errors occurred.
|
||||
- Reports error to the users.
|
||||
- Performs error recovery actions.
|
||||
|
||||
AER driver only attaches root ports which support PCI-Express AER
|
||||
capability.
|
||||
|
||||
|
||||
2. User Guide
|
||||
User Guide
|
||||
==========
|
||||
|
||||
2.1 Include the PCI Express AER Root Driver into the Linux Kernel
|
||||
Include the PCI Express AER Root Driver into the Linux Kernel
|
||||
-------------------------------------------------------------
|
||||
|
||||
The PCI Express AER Root driver is a Root Port service driver attached
|
||||
to the PCI Express Port Bus driver. If a user wants to use it, the driver
|
||||
@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
|
||||
depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
|
||||
CONFIG_PCIEAER = y.
|
||||
|
||||
2.2 Load PCI Express AER Root Driver
|
||||
Load PCI Express AER Root Driver
|
||||
--------------------------------
|
||||
|
||||
Some systems have AER support in firmware. Enabling Linux AER support at
|
||||
the same time the firmware handles AER may result in unpredictable
|
||||
@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
|
||||
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
|
||||
Specification for details regarding _OSC usage.
|
||||
|
||||
2.3 AER error output
|
||||
AER error output
|
||||
----------------
|
||||
|
||||
When a PCIe AER error is captured, an error message will be output to
|
||||
console. If it's a correctable error, it is output as a warning.
|
||||
Otherwise, it is printed as an error. So users could choose different
|
||||
log level to filter out correctable error messages.
|
||||
|
||||
Below shows an example:
|
||||
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
||||
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
||||
0000:50:00.0: [20] Unsupported Request (First)
|
||||
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
||||
Below shows an example::
|
||||
|
||||
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
|
||||
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
|
||||
0000:50:00.0: [20] Unsupported Request (First)
|
||||
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
|
||||
|
||||
In the example, 'Requester ID' means the ID of the device who sends
|
||||
the error message to root port. Pls. refer to pci express specs for
|
||||
other fields.
|
||||
|
||||
2.4 AER Statistics / Counters
|
||||
AER Statistics / Counters
|
||||
-------------------------
|
||||
|
||||
When PCIe AER errors are captured, the counters / statistics are also exposed
|
||||
in the form of sysfs attributes which are documented at
|
||||
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
|
||||
|
||||
3. Developer Guide
|
||||
Developer Guide
|
||||
===============
|
||||
|
||||
To enable AER aware support requires a software driver to configure
|
||||
the AER capability structure within its device and to provide callbacks.
|
||||
@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
|
||||
errors because device specific errors will still get sent directly to
|
||||
the device driver.
|
||||
|
||||
3.1 Configure the AER capability structure
|
||||
Configure the AER capability structure
|
||||
--------------------------------------
|
||||
|
||||
AER aware drivers of PCI Express component need change the device
|
||||
control registers to enable AER. They also could change AER registers,
|
||||
@ -128,9 +144,11 @@ including mask and severity registers. Helper function
|
||||
pci_enable_pcie_error_reporting could be used to enable AER. See
|
||||
section 3.3.
|
||||
|
||||
3.2. Provide callbacks
|
||||
Provide callbacks
|
||||
-----------------
|
||||
|
||||
3.2.1 callback reset_link to reset pci express link
|
||||
callback reset_link to reset pci express link
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This callback is used to reset the pci express physical link when a
|
||||
fatal error happens. The root port aer service driver provides a
|
||||
@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
|
||||
|
||||
In struct pcie_port_service_driver, a new pointer, reset_link, is
|
||||
added.
|
||||
::
|
||||
|
||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
|
||||
|
||||
Section 3.2.2.2 provides more detailed info on when to call
|
||||
reset_link.
|
||||
|
||||
3.2.2 PCI error-recovery callbacks
|
||||
PCI error-recovery callbacks
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The PCI Express AER Root driver uses error callbacks to coordinate
|
||||
with downstream device drivers associated with a hierarchy in question
|
||||
@ -161,7 +181,8 @@ definitions of the callbacks.
|
||||
|
||||
Below sections specify when to call the error callback functions.
|
||||
|
||||
3.2.2.1 Correctable errors
|
||||
Correctable errors
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Correctable errors pose no impacts on the functionality of
|
||||
the interface. The PCI Express protocol can recover without any
|
||||
@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
|
||||
require any recovery actions. The AER driver clears the device's
|
||||
correctable error status register accordingly and logs these errors.
|
||||
|
||||
3.2.2.2 Non-correctable (non-fatal and fatal) errors
|
||||
Non-correctable (non-fatal and fatal) errors
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
If an error message indicates a non-fatal error, performing link reset
|
||||
at upstream is not required. The AER driver calls error_detected(dev,
|
||||
pci_channel_io_normal) to all drivers associated within a hierarchy in
|
||||
question. for example,
|
||||
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
|
||||
question. for example::
|
||||
|
||||
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
|
||||
|
||||
If Upstream port A captures an AER error, the hierarchy consists of
|
||||
Downstream port B and EndPoint.
|
||||
|
||||
@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
|
||||
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
|
||||
to mmio_enabled.
|
||||
|
||||
3.3 helper functions
|
||||
helper functions
|
||||
----------------
|
||||
::
|
||||
|
||||
int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
||||
|
||||
3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
|
||||
pci_enable_pcie_error_reporting enables the device to send error
|
||||
messages to root port when an error is detected. Note that devices
|
||||
don't enable the error reporting by default, so device drivers need
|
||||
call this function to enable it.
|
||||
|
||||
3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
||||
::
|
||||
|
||||
int pci_disable_pcie_error_reporting(struct pci_dev *dev);
|
||||
|
||||
pci_disable_pcie_error_reporting disables the device to send error
|
||||
messages to root port when an error is detected.
|
||||
|
||||
3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
|
||||
::
|
||||
|
||||
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
|
||||
|
||||
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
|
||||
error status register.
|
||||
|
||||
3.4 Frequent Asked Questions
|
||||
Frequent Asked Questions
|
||||
------------------------
|
||||
|
||||
Q: What happens if a PCI Express device driver does not provide an
|
||||
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
||||
Q:
|
||||
What happens if a PCI Express device driver does not provide an
|
||||
error recovery handler (pci_driver->err_handler is equal to NULL)?
|
||||
|
||||
A: The devices attached with the driver won't be recovered. If the
|
||||
error is fatal, kernel will print out warning messages. Please refer
|
||||
to section 3 for more information.
|
||||
A:
|
||||
The devices attached with the driver won't be recovered. If the
|
||||
error is fatal, kernel will print out warning messages. Please refer
|
||||
to section 3 for more information.
|
||||
|
||||
Q: What happens if an upstream port service driver does not provide
|
||||
callback reset_link?
|
||||
Q:
|
||||
What happens if an upstream port service driver does not provide
|
||||
callback reset_link?
|
||||
|
||||
A: Fatal error recovery will fail if the errors are reported by the
|
||||
upstream ports who are attached by the service driver.
|
||||
A:
|
||||
Fatal error recovery will fail if the errors are reported by the
|
||||
upstream ports who are attached by the service driver.
|
||||
|
||||
Q: How does this infrastructure deal with driver that is not PCI
|
||||
Express aware?
|
||||
Q:
|
||||
How does this infrastructure deal with driver that is not PCI
|
||||
Express aware?
|
||||
|
||||
A: This infrastructure calls the error callback functions of the
|
||||
driver when an error happens. But if the driver is not aware of
|
||||
PCI Express, the device might not report its own errors to root
|
||||
port.
|
||||
A:
|
||||
This infrastructure calls the error callback functions of the
|
||||
driver when an error happens. But if the driver is not aware of
|
||||
PCI Express, the device might not report its own errors to root
|
||||
port.
|
||||
|
||||
Q: What modifications will that driver need to make it compatible
|
||||
with the PCI Express AER Root driver?
|
||||
Q:
|
||||
What modifications will that driver need to make it compatible
|
||||
with the PCI Express AER Root driver?
|
||||
|
||||
A: It could call the helper functions to enable AER in devices and
|
||||
cleanup uncorrectable status register. Pls. refer to section 3.3.
|
||||
A:
|
||||
It could call the helper functions to enable AER in devices and
|
||||
cleanup uncorrectable status register. Pls. refer to section 3.3.
|
||||
|
||||
|
||||
4. Software error injection
|
||||
Software error injection
|
||||
========================
|
||||
|
||||
Debugging PCIe AER error recovery code is quite difficult because it
|
||||
is hard to trigger real hardware errors. Software based error
|
||||
@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named
|
||||
|
||||
Then, you need a user space tool named aer-inject, which can be gotten
|
||||
from:
|
||||
|
||||
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
|
||||
|
||||
More information about aer-inject can be found in the document comes
|
@ -1,16 +1,23 @@
|
||||
The PCI Express Port Bus Driver Guide HOWTO
|
||||
Tom L Nguyen tom.l.nguyen@intel.com
|
||||
11/03/2004
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
1. About this guide
|
||||
===========================================
|
||||
The PCI Express Port Bus Driver Guide HOWTO
|
||||
===========================================
|
||||
|
||||
:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
|
||||
:Copyright: |copy| 2004 Intel Corporation
|
||||
|
||||
About this guide
|
||||
================
|
||||
|
||||
This guide describes the basics of the PCI Express Port Bus driver
|
||||
and provides information on how to enable the service drivers to
|
||||
register/unregister with the PCI Express Port Bus Driver.
|
||||
|
||||
2. Copyright 2004 Intel Corporation
|
||||
|
||||
3. What is the PCI Express Port Bus Driver
|
||||
What is the PCI Express Port Bus Driver
|
||||
=======================================
|
||||
|
||||
A PCI Express Port is a logical PCI-PCI Bridge structure. There
|
||||
are two types of PCI Express Port: the Root Port and the Switch
|
||||
@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
|
||||
be handled by a single complex driver or be individually distributed
|
||||
and handled by corresponding service drivers.
|
||||
|
||||
4. Why use the PCI Express Port Bus Driver?
|
||||
Why use the PCI Express Port Bus Driver?
|
||||
========================================
|
||||
|
||||
In existing Linux kernels, the Linux Device Driver Model allows a
|
||||
physical device to be handled by only a single driver. The PCI
|
||||
@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
|
||||
to the corresponding service drivers as required. Some key
|
||||
advantages of using the PCI Express Port Bus driver are listed below:
|
||||
|
||||
- Allow multiple service drivers to run simultaneously on
|
||||
a PCI-PCI Bridge Port device.
|
||||
- Allow multiple service drivers to run simultaneously on
|
||||
a PCI-PCI Bridge Port device.
|
||||
|
||||
- Allow service drivers implemented in an independent
|
||||
staged approach.
|
||||
- Allow service drivers implemented in an independent
|
||||
staged approach.
|
||||
|
||||
- Allow one service driver to run on multiple PCI-PCI Bridge
|
||||
Port devices.
|
||||
- Allow one service driver to run on multiple PCI-PCI Bridge
|
||||
Port devices.
|
||||
|
||||
- Manage and distribute resources of a PCI-PCI Bridge Port
|
||||
device to requested service drivers.
|
||||
- Manage and distribute resources of a PCI-PCI Bridge Port
|
||||
device to requested service drivers.
|
||||
|
||||
5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
||||
Configuring the PCI Express Port Bus Driver vs. Service Drivers
|
||||
===============================================================
|
||||
|
||||
5.1 Including the PCI Express Port Bus Driver Support into the Kernel
|
||||
Including the PCI Express Port Bus Driver Support into the Kernel
|
||||
-----------------------------------------------------------------
|
||||
|
||||
Including the PCI Express Port Bus driver depends on whether the PCI
|
||||
Express support is included in the kernel config. The kernel will
|
||||
automatically include the PCI Express Port Bus driver as a kernel
|
||||
driver when the PCI Express support is enabled in the kernel.
|
||||
|
||||
5.2 Enabling Service Driver Support
|
||||
Enabling Service Driver Support
|
||||
-------------------------------
|
||||
|
||||
PCI device drivers are implemented based on Linux Device Driver Model.
|
||||
All service drivers are PCI device drivers. As discussed above, it is
|
||||
@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
|
||||
Failure to do so will result an identity mismatch, which prevents
|
||||
the PCI Express Port Bus driver from loading a service driver.
|
||||
|
||||
5.2.1 pcie_port_service_register
|
||||
pcie_port_service_register
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||
|
||||
This API replaces the Linux Driver Model's pci_register_driver API. A
|
||||
service driver should always calls pcie_port_service_register at
|
||||
@ -99,69 +112,76 @@ module init. Note that after service driver being loaded, calls
|
||||
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
||||
necessary since these calls are executed by the PCI Port Bus driver.
|
||||
|
||||
5.2.2 pcie_port_service_unregister
|
||||
pcie_port_service_unregister
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
::
|
||||
|
||||
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
||||
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
|
||||
|
||||
pcie_port_service_unregister replaces the Linux Driver Model's
|
||||
pci_unregister_driver. It's always called by service driver when a
|
||||
module exits.
|
||||
|
||||
5.2.3 Sample Code
|
||||
Sample Code
|
||||
~~~~~~~~~~~
|
||||
|
||||
Below is sample service driver code to initialize the port service
|
||||
driver data structure.
|
||||
::
|
||||
|
||||
static struct pcie_port_service_id service_id[] = { {
|
||||
.vendor = PCI_ANY_ID,
|
||||
.device = PCI_ANY_ID,
|
||||
.port_type = PCIE_RC_PORT,
|
||||
.service_type = PCIE_PORT_SERVICE_AER,
|
||||
}, { /* end: all zeroes */ }
|
||||
};
|
||||
static struct pcie_port_service_id service_id[] = { {
|
||||
.vendor = PCI_ANY_ID,
|
||||
.device = PCI_ANY_ID,
|
||||
.port_type = PCIE_RC_PORT,
|
||||
.service_type = PCIE_PORT_SERVICE_AER,
|
||||
}, { /* end: all zeroes */ }
|
||||
};
|
||||
|
||||
static struct pcie_port_service_driver root_aerdrv = {
|
||||
.name = (char *)device_name,
|
||||
.id_table = &service_id[0],
|
||||
static struct pcie_port_service_driver root_aerdrv = {
|
||||
.name = (char *)device_name,
|
||||
.id_table = &service_id[0],
|
||||
|
||||
.probe = aerdrv_load,
|
||||
.remove = aerdrv_unload,
|
||||
.probe = aerdrv_load,
|
||||
.remove = aerdrv_unload,
|
||||
|
||||
.suspend = aerdrv_suspend,
|
||||
.resume = aerdrv_resume,
|
||||
};
|
||||
.suspend = aerdrv_suspend,
|
||||
.resume = aerdrv_resume,
|
||||
};
|
||||
|
||||
Below is a sample code for registering/unregistering a service
|
||||
driver.
|
||||
::
|
||||
|
||||
static int __init aerdrv_service_init(void)
|
||||
{
|
||||
int retval = 0;
|
||||
static int __init aerdrv_service_init(void)
|
||||
{
|
||||
int retval = 0;
|
||||
|
||||
retval = pcie_port_service_register(&root_aerdrv);
|
||||
if (!retval) {
|
||||
/*
|
||||
* FIX ME
|
||||
*/
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
retval = pcie_port_service_register(&root_aerdrv);
|
||||
if (!retval) {
|
||||
/*
|
||||
* FIX ME
|
||||
*/
|
||||
}
|
||||
return retval;
|
||||
}
|
||||
|
||||
static void __exit aerdrv_service_exit(void)
|
||||
{
|
||||
pcie_port_service_unregister(&root_aerdrv);
|
||||
}
|
||||
static void __exit aerdrv_service_exit(void)
|
||||
{
|
||||
pcie_port_service_unregister(&root_aerdrv);
|
||||
}
|
||||
|
||||
module_init(aerdrv_service_init);
|
||||
module_exit(aerdrv_service_exit);
|
||||
module_init(aerdrv_service_init);
|
||||
module_exit(aerdrv_service_exit);
|
||||
|
||||
6. Possible Resource Conflicts
|
||||
Possible Resource Conflicts
|
||||
===========================
|
||||
|
||||
Since all service drivers of a PCI-PCI Bridge Port device are
|
||||
allowed to run simultaneously, below lists a few of possible resource
|
||||
conflicts with proposed solutions.
|
||||
|
||||
6.1 MSI and MSI-X Vector Resource
|
||||
MSI and MSI-X Vector Resource
|
||||
-----------------------------
|
||||
|
||||
Once MSI or MSI-X interrupts are enabled on a device, it stays in this
|
||||
mode until they are disabled again. Since service drivers of the same
|
||||
@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
|
||||
call request_irq/free_irq. In addition, the interrupt mode is stored
|
||||
in the field interrupt_mode of struct pcie_device.
|
||||
|
||||
6.3 PCI Memory/IO Mapped Regions
|
||||
PCI Memory/IO Mapped Regions
|
||||
----------------------------
|
||||
|
||||
Service drivers for PCI Express Power Management (PME), Advanced
|
||||
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
|
||||
@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
|
||||
that all service drivers will be well behaved and not overwrite
|
||||
other service driver's configuration settings.
|
||||
|
||||
6.4 PCI Config Registers
|
||||
PCI Config Registers
|
||||
--------------------
|
||||
|
||||
Each service driver runs its PCI config operations on its own
|
||||
capability structure except the PCI Express capability structure, in
|
@ -1,3 +1,7 @@
|
||||
==================
|
||||
Control Groupstats
|
||||
==================
|
||||
|
||||
Control Groupstats is inspired by the discussion at
|
||||
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics as
|
||||
suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.
|
||||
@ -19,9 +23,9 @@ about tasks blocked on I/O. If CONFIG_TASK_DELAY_ACCT is disabled, this
|
||||
information will not be available.
|
||||
|
||||
To extract cgroup statistics a utility very similar to getdelays.c
|
||||
has been developed, the sample output of the utility is shown below
|
||||
has been developed, the sample output of the utility is shown below::
|
||||
|
||||
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a"
|
||||
sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
|
||||
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup"
|
||||
sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
|
||||
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a"
|
||||
sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
|
||||
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup"
|
||||
sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
|
@ -1,5 +1,6 @@
|
||||
================
|
||||
Delay accounting
|
||||
----------------
|
||||
================
|
||||
|
||||
Tasks encounter delays in execution when they wait
|
||||
for some kernel resource to become available e.g. a
|
||||
@ -39,7 +40,9 @@ in detail in a separate document in this directory. Taskstats returns a
|
||||
generic data structure to userspace corresponding to per-pid and per-tgid
|
||||
statistics. The delay accounting functionality populates specific fields of
|
||||
this structure. See
|
||||
|
||||
include/linux/taskstats.h
|
||||
|
||||
for a description of the fields pertaining to delay accounting.
|
||||
It will generally be in the form of counters returning the cumulative
|
||||
delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
|
||||
@ -61,13 +64,16 @@ also serves as an example of using the taskstats interface.
|
||||
Usage
|
||||
-----
|
||||
|
||||
Compile the kernel with
|
||||
Compile the kernel with::
|
||||
|
||||
CONFIG_TASK_DELAY_ACCT=y
|
||||
CONFIG_TASKSTATS=y
|
||||
|
||||
Delay accounting is enabled by default at boot up.
|
||||
To disable, add
|
||||
To disable, add::
|
||||
|
||||
nodelayacct
|
||||
|
||||
to the kernel boot options. The rest of the instructions
|
||||
below assume this has not been done.
|
||||
|
||||
@ -78,40 +84,43 @@ The utility also allows a given command to be
|
||||
executed and the corresponding delays to be
|
||||
seen.
|
||||
|
||||
General format of the getdelays command
|
||||
General format of the getdelays command::
|
||||
|
||||
getdelays [-t tgid] [-p pid] [-c cmd...]
|
||||
getdelays [-t tgid] [-p pid] [-c cmd...]
|
||||
|
||||
|
||||
Get delays, since system boot, for pid 10
|
||||
# ./getdelays -p 10
|
||||
(output similar to next case)
|
||||
Get delays, since system boot, for pid 10::
|
||||
|
||||
Get sum of delays, since system boot, for all pids with tgid 5
|
||||
# ./getdelays -t 5
|
||||
# ./getdelays -p 10
|
||||
(output similar to next case)
|
||||
|
||||
Get sum of delays, since system boot, for all pids with tgid 5::
|
||||
|
||||
# ./getdelays -t 5
|
||||
|
||||
|
||||
CPU count real total virtual total delay total
|
||||
7876 92005750 100000000 24001500
|
||||
IO count delay total
|
||||
0 0
|
||||
SWAP count delay total
|
||||
0 0
|
||||
RECLAIM count delay total
|
||||
0 0
|
||||
CPU count real total virtual total delay total
|
||||
7876 92005750 100000000 24001500
|
||||
IO count delay total
|
||||
0 0
|
||||
SWAP count delay total
|
||||
0 0
|
||||
RECLAIM count delay total
|
||||
0 0
|
||||
|
||||
Get delays seen in executing a given simple command
|
||||
# ./getdelays -c ls /
|
||||
Get delays seen in executing a given simple command::
|
||||
|
||||
bin data1 data3 data5 dev home media opt root srv sys usr
|
||||
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
|
||||
# ./getdelays -c ls /
|
||||
|
||||
bin data1 data3 data5 dev home media opt root srv sys usr
|
||||
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
|
||||
|
||||
|
||||
CPU count real total virtual total delay total
|
||||
CPU count real total virtual total delay total
|
||||
6 4000250 4000000 0
|
||||
IO count delay total
|
||||
IO count delay total
|
||||
0 0
|
||||
SWAP count delay total
|
||||
SWAP count delay total
|
||||
0 0
|
||||
RECLAIM count delay total
|
||||
RECLAIM count delay total
|
||||
0 0
|
14
Documentation/accounting/index.rst
Normal file
14
Documentation/accounting/index.rst
Normal file
@ -0,0 +1,14 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========
|
||||
Accounting
|
||||
==========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
cgroupstats
|
||||
delay-accounting
|
||||
psi
|
||||
taskstats
|
||||
taskstats-struct
|
@ -35,14 +35,14 @@ Pressure interface
|
||||
Pressure information for each resource is exported through the
|
||||
respective file in /proc/pressure/ -- cpu, memory, and io.
|
||||
|
||||
The format for CPU is as such:
|
||||
The format for CPU is as such::
|
||||
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
|
||||
and for memory and IO:
|
||||
and for memory and IO::
|
||||
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
|
||||
The "some" line indicates the share of time in which at least some
|
||||
tasks are stalled on a given resource.
|
||||
@ -77,9 +77,9 @@ To register a trigger user has to open psi interface file under
|
||||
/proc/pressure/ representing the resource to be monitored and write the
|
||||
desired threshold and time window. The open file descriptor should be
|
||||
used to wait for trigger events using select(), poll() or epoll().
|
||||
The following format is used:
|
||||
The following format is used::
|
||||
|
||||
<some|full> <stall amount in us> <time window in us>
|
||||
<some|full> <stall amount in us> <time window in us>
|
||||
|
||||
For example writing "some 150000 1000000" into /proc/pressure/memory
|
||||
would add 150ms threshold for partial memory stall measured within
|
||||
@ -115,18 +115,20 @@ trigger is closed.
|
||||
Userspace monitor usage example
|
||||
===============================
|
||||
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <stdio.h>
|
||||
#include <poll.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
::
|
||||
|
||||
/*
|
||||
* Monitor memory partial stall with 1s tracking window size
|
||||
* and 150ms threshold.
|
||||
*/
|
||||
int main() {
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <stdio.h>
|
||||
#include <poll.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
|
||||
/*
|
||||
* Monitor memory partial stall with 1s tracking window size
|
||||
* and 150ms threshold.
|
||||
*/
|
||||
int main() {
|
||||
const char trig[] = "some 150000 1000000";
|
||||
struct pollfd fds;
|
||||
int n;
|
||||
@ -165,7 +167,7 @@ int main() {
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
Cgroup2 interface
|
||||
=================
|
@ -1,5 +1,6 @@
|
||||
====================
|
||||
The struct taskstats
|
||||
--------------------
|
||||
====================
|
||||
|
||||
This document contains an explanation of the struct taskstats fields.
|
||||
|
||||
@ -10,16 +11,24 @@ There are three different groups of fields in the struct taskstats:
|
||||
the common fields and basic accounting fields are collected for
|
||||
delivery at do_exit() of a task.
|
||||
2) Delay accounting fields
|
||||
These fields are placed between
|
||||
/* Delay accounting fields start */
|
||||
and
|
||||
/* Delay accounting fields end */
|
||||
These fields are placed between::
|
||||
|
||||
/* Delay accounting fields start */
|
||||
|
||||
and::
|
||||
|
||||
/* Delay accounting fields end */
|
||||
|
||||
Their values are collected if CONFIG_TASK_DELAY_ACCT is set.
|
||||
3) Extended accounting fields
|
||||
These fields are placed between
|
||||
/* Extended accounting fields start */
|
||||
and
|
||||
/* Extended accounting fields end */
|
||||
These fields are placed between::
|
||||
|
||||
/* Extended accounting fields start */
|
||||
|
||||
and::
|
||||
|
||||
/* Extended accounting fields end */
|
||||
|
||||
Their values are collected if CONFIG_TASK_XACCT is set.
|
||||
|
||||
4) Per-task and per-thread context switch count statistics
|
||||
@ -31,31 +40,33 @@ There are three different groups of fields in the struct taskstats:
|
||||
Future extension should add fields to the end of the taskstats struct, and
|
||||
should not change the relative position of each field within the struct.
|
||||
|
||||
::
|
||||
|
||||
struct taskstats {
|
||||
struct taskstats {
|
||||
|
||||
1) Common and basic accounting fields::
|
||||
|
||||
1) Common and basic accounting fields:
|
||||
/* The version number of this struct. This field is always set to
|
||||
* TAKSTATS_VERSION, which is defined in <linux/taskstats.h>.
|
||||
* Each time the struct is changed, the value should be incremented.
|
||||
*/
|
||||
__u16 version;
|
||||
|
||||
/* The exit code of a task. */
|
||||
/* The exit code of a task. */
|
||||
__u32 ac_exitcode; /* Exit status */
|
||||
|
||||
/* The accounting flags of a task as defined in <linux/acct.h>
|
||||
/* The accounting flags of a task as defined in <linux/acct.h>
|
||||
* Defined values are AFORK, ASU, ACOMPAT, ACORE, and AXSIG.
|
||||
*/
|
||||
__u8 ac_flag; /* Record flags */
|
||||
|
||||
/* The value of task_nice() of a task. */
|
||||
/* The value of task_nice() of a task. */
|
||||
__u8 ac_nice; /* task_nice */
|
||||
|
||||
/* The name of the command that started this task. */
|
||||
/* The name of the command that started this task. */
|
||||
char ac_comm[TS_COMM_LEN]; /* Command name */
|
||||
|
||||
/* The scheduling discipline as set in task->policy field. */
|
||||
/* The scheduling discipline as set in task->policy field. */
|
||||
__u8 ac_sched; /* Scheduling discipline */
|
||||
|
||||
__u8 ac_pad[3];
|
||||
@ -64,26 +75,27 @@ struct taskstats {
|
||||
__u32 ac_pid; /* Process ID */
|
||||
__u32 ac_ppid; /* Parent process ID */
|
||||
|
||||
/* The time when a task begins, in [secs] since 1970. */
|
||||
/* The time when a task begins, in [secs] since 1970. */
|
||||
__u32 ac_btime; /* Begin time [sec since 1970] */
|
||||
|
||||
/* The elapsed time of a task, in [usec]. */
|
||||
/* The elapsed time of a task, in [usec]. */
|
||||
__u64 ac_etime; /* Elapsed time [usec] */
|
||||
|
||||
/* The user CPU time of a task, in [usec]. */
|
||||
/* The user CPU time of a task, in [usec]. */
|
||||
__u64 ac_utime; /* User CPU time [usec] */
|
||||
|
||||
/* The system CPU time of a task, in [usec]. */
|
||||
/* The system CPU time of a task, in [usec]. */
|
||||
__u64 ac_stime; /* System CPU time [usec] */
|
||||
|
||||
/* The minor page fault count of a task, as set in task->min_flt. */
|
||||
/* The minor page fault count of a task, as set in task->min_flt. */
|
||||
__u64 ac_minflt; /* Minor Page Fault Count */
|
||||
|
||||
/* The major page fault count of a task, as set in task->maj_flt. */
|
||||
__u64 ac_majflt; /* Major Page Fault Count */
|
||||
|
||||
|
||||
2) Delay accounting fields:
|
||||
2) Delay accounting fields::
|
||||
|
||||
/* Delay accounting fields start
|
||||
*
|
||||
* All values, until the comment "Delay accounting fields end" are
|
||||
@ -134,7 +146,8 @@ struct taskstats {
|
||||
/* version 1 ends here */
|
||||
|
||||
|
||||
3) Extended accounting fields
|
||||
3) Extended accounting fields::
|
||||
|
||||
/* Extended accounting fields start */
|
||||
|
||||
/* Accumulated RSS usage in duration of a task, in MBytes-usecs.
|
||||
@ -145,15 +158,15 @@ struct taskstats {
|
||||
*/
|
||||
__u64 coremem; /* accumulated RSS usage in MB-usec */
|
||||
|
||||
/* Accumulated virtual memory usage in duration of a task.
|
||||
/* Accumulated virtual memory usage in duration of a task.
|
||||
* Same as acct_rss_mem1 above except that we keep track of VM usage.
|
||||
*/
|
||||
__u64 virtmem; /* accumulated VM usage in MB-usec */
|
||||
|
||||
/* High watermark of RSS usage in duration of a task, in KBytes. */
|
||||
/* High watermark of RSS usage in duration of a task, in KBytes. */
|
||||
__u64 hiwater_rss; /* High-watermark of RSS usage */
|
||||
|
||||
/* High watermark of VM usage in duration of a task, in KBytes. */
|
||||
/* High watermark of VM usage in duration of a task, in KBytes. */
|
||||
__u64 hiwater_vm; /* High-water virtual memory usage */
|
||||
|
||||
/* The following four fields are I/O statistics of a task. */
|
||||
@ -164,17 +177,23 @@ struct taskstats {
|
||||
|
||||
/* Extended accounting fields end */
|
||||
|
||||
4) Per-task and per-thread statistics
|
||||
4) Per-task and per-thread statistics::
|
||||
|
||||
__u64 nvcsw; /* Context voluntary switch counter */
|
||||
__u64 nivcsw; /* Context involuntary switch counter */
|
||||
|
||||
5) Time accounting for SMT machines
|
||||
5) Time accounting for SMT machines::
|
||||
|
||||
__u64 ac_utimescaled; /* utime scaled on frequency etc */
|
||||
__u64 ac_stimescaled; /* stime scaled on frequency etc */
|
||||
__u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
|
||||
|
||||
6) Extended delay accounting fields for memory reclaim
|
||||
6) Extended delay accounting fields for memory reclaim::
|
||||
|
||||
/* Delay waiting for memory reclaim */
|
||||
__u64 freepages_count;
|
||||
__u64 freepages_delay_total;
|
||||
}
|
||||
|
||||
::
|
||||
|
||||
}
|
@ -1,5 +1,6 @@
|
||||
=============================
|
||||
Per-task statistics interface
|
||||
-----------------------------
|
||||
=============================
|
||||
|
||||
|
||||
Taskstats is a netlink-based interface for sending per-task and
|
||||
@ -65,7 +66,7 @@ taskstats.h file.
|
||||
|
||||
The data exchanged between user and kernel space is a netlink message belonging
|
||||
to the NETLINK_GENERIC family and using the netlink attributes interface.
|
||||
The messages are in the format
|
||||
The messages are in the format::
|
||||
|
||||
+----------+- - -+-------------+-------------------+
|
||||
| nlmsghdr | Pad | genlmsghdr | taskstats payload |
|
||||
@ -167,15 +168,13 @@ extended and the number of cpus grows large.
|
||||
To avoid losing statistics, userspace should do one or more of the following:
|
||||
|
||||
- increase the receive buffer sizes for the netlink sockets opened by
|
||||
listeners to receive exit data.
|
||||
listeners to receive exit data.
|
||||
|
||||
- create more listeners and reduce the number of cpus being listened to by
|
||||
each listener. In the extreme case, there could be one listener for each cpu.
|
||||
Users may also consider setting the cpu affinity of the listener to the subset
|
||||
of cpus to which it listens, especially if they are listening to just one cpu.
|
||||
each listener. In the extreme case, there could be one listener for each cpu.
|
||||
Users may also consider setting the cpu affinity of the listener to the subset
|
||||
of cpus to which it listens, especially if they are listening to just one cpu.
|
||||
|
||||
Despite these measures, if the userspace receives ENOBUFS error messages
|
||||
indicated overflow of receive buffers, it should take measures to handle the
|
||||
loss of data.
|
||||
|
||||
----
|
@ -20,7 +20,7 @@ driver. The aoetools are on sourceforge.
|
||||
|
||||
http://aoetools.sourceforge.net/
|
||||
|
||||
The scripts in this Documentation/aoe directory are intended to
|
||||
The scripts in this Documentation/admin-guide/aoe directory are intended to
|
||||
document the use of the driver and are not necessary if you install
|
||||
the aoetools.
|
||||
|
||||
@ -86,7 +86,7 @@ Using sysfs
|
||||
a convenient way. Users with aoetools should use the aoe-stat
|
||||
command::
|
||||
|
||||
root@makki root# sh Documentation/aoe/status.sh
|
||||
root@makki root# sh Documentation/admin-guide/aoe/status.sh
|
||||
e10.0 eth3 up
|
||||
e10.1 eth3 up
|
||||
e10.2 eth3 up
|
@ -1,5 +1,3 @@
|
||||
:orphan:
|
||||
|
||||
=======================
|
||||
ATA over Ethernet (AoE)
|
||||
=======================
|
@ -11,7 +11,7 @@
|
||||
# udev_rules="/etc/udev/rules.d/"
|
||||
# bash# ls /etc/udev/rules.d/
|
||||
# 10-wacom.rules 50-udev.rules
|
||||
# bash# cp /path/to/linux/Documentation/aoe/udev.txt \
|
||||
# bash# cp /path/to/linux/Documentation/admin-guide/aoe/udev.txt \
|
||||
# /etc/udev/rules.d/60-aoe.rules
|
||||
#
|
||||
|
Before Width: | Height: | Size: 22 KiB After Width: | Height: | Size: 22 KiB |
Before Width: | Height: | Size: 17 KiB After Width: | Height: | Size: 17 KiB |
@ -1,3 +1,7 @@
|
||||
================================
|
||||
kernel data structure for DRBD-9
|
||||
================================
|
||||
|
||||
This describes the in kernel data structure for DRBD-9. Starting with
|
||||
Linux v3.14 we are reorganizing DRBD to use this data structure.
|
||||
|
||||
@ -10,7 +14,7 @@ device is represented by a block device locally.
|
||||
|
||||
The DRBD objects are interconnected to form a matrix as depicted below; a
|
||||
drbd_peer_device object sits at each intersection between a drbd_device and a
|
||||
drbd_connection:
|
||||
drbd_connection::
|
||||
|
||||
/--------------+---------------+.....+---------------\
|
||||
| resource | device | | device |
|
30
Documentation/admin-guide/blockdev/drbd/figures.rst
Normal file
30
Documentation/admin-guide/blockdev/drbd/figures.rst
Normal file
@ -0,0 +1,30 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
.. The here included files are intended to help understand the implementation
|
||||
|
||||
Data flows that Relate some functions, and write packets
|
||||
========================================================
|
||||
|
||||
.. kernel-figure:: DRBD-8.3-data-packets.svg
|
||||
:alt: DRBD-8.3-data-packets.svg
|
||||
:align: center
|
||||
|
||||
.. kernel-figure:: DRBD-data-packets.svg
|
||||
:alt: DRBD-data-packets.svg
|
||||
:align: center
|
||||
|
||||
|
||||
Sub graphs of DRBD's state transitions
|
||||
======================================
|
||||
|
||||
.. kernel-figure:: conn-states-8.dot
|
||||
:alt: conn-states-8.dot
|
||||
:align: center
|
||||
|
||||
.. kernel-figure:: disk-states-8.dot
|
||||
:alt: disk-states-8.dot
|
||||
:align: center
|
||||
|
||||
.. kernel-figure:: node-states-8.dot
|
||||
:alt: node-states-8.dot
|
||||
:align: center
|
@ -1,4 +1,9 @@
|
||||
==========================================
|
||||
Distributed Replicated Block Device - DRBD
|
||||
==========================================
|
||||
|
||||
Description
|
||||
===========
|
||||
|
||||
DRBD is a shared-nothing, synchronously replicated block device. It
|
||||
is designed to serve as a building block for high availability
|
||||
@ -7,10 +12,8 @@ Description
|
||||
|
||||
Please visit http://www.drbd.org to find out more.
|
||||
|
||||
The here included files are intended to help understand the implementation
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
DRBD-8.3-data-packets.svg, DRBD-data-packets.svg
|
||||
relates some functions, and write packets.
|
||||
|
||||
conn-states-8.dot, disk-states-8.dot, node-states-8.dot
|
||||
The sub graphs of DRBD's state transitions
|
||||
data-structure-v9
|
||||
figures
|
@ -11,4 +11,3 @@ digraph peer_states {
|
||||
Unknown -> Primary [ label = "connected" ]
|
||||
Unknown -> Secondary [ label = "connected" ]
|
||||
}
|
||||
|
@ -1,35 +1,37 @@
|
||||
This file describes the floppy driver.
|
||||
=============
|
||||
Floppy Driver
|
||||
=============
|
||||
|
||||
FAQ list:
|
||||
=========
|
||||
|
||||
A FAQ list may be found in the fdutils package (see below), and also
|
||||
A FAQ list may be found in the fdutils package (see below), and also
|
||||
at <http://fdutils.linux.lu/faq.html>.
|
||||
|
||||
|
||||
LILO configuration options (Thinkpad users, read this)
|
||||
======================================================
|
||||
|
||||
The floppy driver is configured using the 'floppy=' option in
|
||||
The floppy driver is configured using the 'floppy=' option in
|
||||
lilo. This option can be typed at the boot prompt, or entered in the
|
||||
lilo configuration file.
|
||||
|
||||
Example: If your kernel is called linux-2.6.9, type the following line
|
||||
at the lilo boot prompt (if you have a thinkpad):
|
||||
Example: If your kernel is called linux-2.6.9, type the following line
|
||||
at the lilo boot prompt (if you have a thinkpad)::
|
||||
|
||||
linux-2.6.9 floppy=thinkpad
|
||||
|
||||
You may also enter the following line in /etc/lilo.conf, in the description
|
||||
of linux-2.6.9:
|
||||
of linux-2.6.9::
|
||||
|
||||
append = "floppy=thinkpad"
|
||||
|
||||
Several floppy related options may be given, example:
|
||||
Several floppy related options may be given, example::
|
||||
|
||||
linux-2.6.9 floppy=daring floppy=two_fdc
|
||||
append = "floppy=daring floppy=two_fdc"
|
||||
|
||||
If you give options both in the lilo config file and on the boot
|
||||
If you give options both in the lilo config file and on the boot
|
||||
prompt, the option strings of both places are concatenated, the boot
|
||||
prompt options coming last. That's why there are also options to
|
||||
restore the default behavior.
|
||||
@ -38,21 +40,23 @@ restore the default behavior.
|
||||
Module configuration options
|
||||
============================
|
||||
|
||||
If you use the floppy driver as a module, use the following syntax:
|
||||
modprobe floppy floppy="<options>"
|
||||
If you use the floppy driver as a module, use the following syntax::
|
||||
|
||||
Example:
|
||||
modprobe floppy floppy="omnibook messages"
|
||||
modprobe floppy floppy="<options>"
|
||||
|
||||
If you need certain options enabled every time you load the floppy driver,
|
||||
you can put:
|
||||
Example::
|
||||
|
||||
options floppy floppy="omnibook messages"
|
||||
modprobe floppy floppy="omnibook messages"
|
||||
|
||||
If you need certain options enabled every time you load the floppy driver,
|
||||
you can put::
|
||||
|
||||
options floppy floppy="omnibook messages"
|
||||
|
||||
in a configuration file in /etc/modprobe.d/.
|
||||
|
||||
|
||||
The floppy driver related options are:
|
||||
The floppy driver related options are:
|
||||
|
||||
floppy=asus_pci
|
||||
Sets the bit mask to allow only units 0 and 1. (default)
|
||||
@ -70,8 +74,7 @@ in a configuration file in /etc/modprobe.d/.
|
||||
Tells the floppy driver that you have only one floppy controller.
|
||||
(default)
|
||||
|
||||
floppy=two_fdc
|
||||
floppy=<address>,two_fdc
|
||||
floppy=two_fdc / floppy=<address>,two_fdc
|
||||
Tells the floppy driver that you have two floppy controllers.
|
||||
The second floppy controller is assumed to be at <address>.
|
||||
This option is not needed if the second controller is at address
|
||||
@ -84,8 +87,7 @@ in a configuration file in /etc/modprobe.d/.
|
||||
floppy=0,thinkpad
|
||||
Tells the floppy driver that you don't have a Thinkpad.
|
||||
|
||||
floppy=omnibook
|
||||
floppy=nodma
|
||||
floppy=omnibook / floppy=nodma
|
||||
Tells the floppy driver not to use Dma for data transfers.
|
||||
This is needed on HP Omnibooks, which don't have a workable
|
||||
DMA channel for the floppy driver. This option is also useful
|
||||
@ -144,14 +146,16 @@ in a configuration file in /etc/modprobe.d/.
|
||||
described in the physical CMOS), or if your BIOS uses
|
||||
non-standard CMOS types. The CMOS types are:
|
||||
|
||||
0 - Use the value of the physical CMOS
|
||||
1 - 5 1/4 DD
|
||||
2 - 5 1/4 HD
|
||||
3 - 3 1/2 DD
|
||||
4 - 3 1/2 HD
|
||||
5 - 3 1/2 ED
|
||||
6 - 3 1/2 ED
|
||||
16 - unknown or not installed
|
||||
== ==================================
|
||||
0 Use the value of the physical CMOS
|
||||
1 5 1/4 DD
|
||||
2 5 1/4 HD
|
||||
3 3 1/2 DD
|
||||
4 3 1/2 HD
|
||||
5 3 1/2 ED
|
||||
6 3 1/2 ED
|
||||
16 unknown or not installed
|
||||
== ==================================
|
||||
|
||||
(Note: there are two valid types for ED drives. This is because 5 was
|
||||
initially chosen to represent floppy *tapes*, and 6 for ED drives.
|
||||
@ -162,8 +166,7 @@ in a configuration file in /etc/modprobe.d/.
|
||||
Print a warning message when an unexpected interrupt is received.
|
||||
(default)
|
||||
|
||||
floppy=no_unexpected_interrupts
|
||||
floppy=L40SX
|
||||
floppy=no_unexpected_interrupts / floppy=L40SX
|
||||
Don't print a message when an unexpected interrupt is received. This
|
||||
is needed on IBM L40SX laptops in certain video modes. (There seems
|
||||
to be an interaction between video and floppy. The unexpected
|
||||
@ -199,47 +202,54 @@ in a configuration file in /etc/modprobe.d/.
|
||||
Sets the floppy DMA channel to <nr> instead of 2.
|
||||
|
||||
floppy=slow
|
||||
Use PS/2 stepping rate:
|
||||
" PS/2 floppies have much slower step rates than regular floppies.
|
||||
Use PS/2 stepping rate::
|
||||
|
||||
PS/2 floppies have much slower step rates than regular floppies.
|
||||
It's been recommended that take about 1/4 of the default speed
|
||||
in some more extreme cases."
|
||||
in some more extreme cases.
|
||||
|
||||
|
||||
Supporting utilities and additional documentation:
|
||||
==================================================
|
||||
|
||||
Additional parameters of the floppy driver can be configured at
|
||||
Additional parameters of the floppy driver can be configured at
|
||||
runtime. Utilities which do this can be found in the fdutils package.
|
||||
This package also contains a new version of mtools which allows to
|
||||
access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
|
||||
It also contains additional documentation about the floppy driver.
|
||||
|
||||
The latest version can be found at fdutils homepage:
|
||||
|
||||
http://fdutils.linux.lu
|
||||
|
||||
The fdutils releases can be found at:
|
||||
|
||||
http://fdutils.linux.lu/download.html
|
||||
|
||||
http://www.tux.org/pub/knaff/fdutils/
|
||||
|
||||
ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
|
||||
|
||||
Reporting problems about the floppy driver
|
||||
==========================================
|
||||
|
||||
If you have a question or a bug report about the floppy driver, mail
|
||||
If you have a question or a bug report about the floppy driver, mail
|
||||
me at Alain.Knaff@poboxes.com . If you post to Usenet, preferably use
|
||||
comp.os.linux.hardware. As the volume in these groups is rather high,
|
||||
be sure to include the word "floppy" (or "FLOPPY") in the subject
|
||||
line. If the reported problem happens when mounting floppy disks, be
|
||||
sure to mention also the type of the filesystem in the subject line.
|
||||
|
||||
Be sure to read the FAQ before mailing/posting any bug reports!
|
||||
Be sure to read the FAQ before mailing/posting any bug reports!
|
||||
|
||||
Alain
|
||||
Alain
|
||||
|
||||
Changelog
|
||||
=========
|
||||
|
||||
10-30-2004 : Cleanup, updating, add reference to module configuration.
|
||||
10-30-2004 :
|
||||
Cleanup, updating, add reference to module configuration.
|
||||
James Nelson <james4765@gmail.com>
|
||||
|
||||
6-3-2000 : Original Document
|
||||
6-3-2000 :
|
||||
Original Document
|
16
Documentation/admin-guide/blockdev/index.rst
Normal file
16
Documentation/admin-guide/blockdev/index.rst
Normal file
@ -0,0 +1,16 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
The Linux RapidIO Subsystem
|
||||
===========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
floppy
|
||||
nbd
|
||||
paride
|
||||
ramdisk
|
||||
zram
|
||||
|
||||
drbd/index
|
@ -1,3 +1,4 @@
|
||||
==================================
|
||||
Network Block Device (TCP version)
|
||||
==================================
|
||||
|
||||
@ -28,4 +29,3 @@ max_part
|
||||
|
||||
nbds_max
|
||||
Number of block devices that should be initialized (default: 16).
|
||||
|
@ -1,15 +1,17 @@
|
||||
|
||||
Linux and parallel port IDE devices
|
||||
===================================
|
||||
Linux and parallel port IDE devices
|
||||
===================================
|
||||
|
||||
PARIDE v1.03 (c) 1997-8 Grant Guenther <grant@torque.net>
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
Owing to the simplicity and near universality of the parallel port interface
|
||||
to personal computers, many external devices such as portable hard-disk,
|
||||
CD-ROM, LS-120 and tape drives use the parallel port to connect to their
|
||||
host computer. While some devices (notably scanners) use ad-hoc methods
|
||||
to pass commands and data through the parallel port interface, most
|
||||
to pass commands and data through the parallel port interface, most
|
||||
external devices are actually identical to an internal model, but with
|
||||
a parallel-port adapter chip added in. Some of the original parallel port
|
||||
adapters were little more than mechanisms for multiplexing a SCSI bus.
|
||||
@ -28,47 +30,50 @@ were to open up a parallel port CD-ROM drive, for instance, one would
|
||||
find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
|
||||
that interconnected a standard PC parallel port cable and a standard
|
||||
IDE cable. It is usually possible to exchange the CD-ROM device with
|
||||
any other device using the IDE interface.
|
||||
any other device using the IDE interface.
|
||||
|
||||
The document describes the support in Linux for parallel port IDE
|
||||
devices. It does not cover parallel port SCSI devices, "ditto" tape
|
||||
drives or scanners. Many different devices are supported by the
|
||||
drives or scanners. Many different devices are supported by the
|
||||
parallel port IDE subsystem, including:
|
||||
|
||||
MicroSolutions backpack CD-ROM
|
||||
MicroSolutions backpack PD/CD
|
||||
MicroSolutions backpack hard-drives
|
||||
MicroSolutions backpack 8000t tape drive
|
||||
SyQuest EZ-135, EZ-230 & SparQ drives
|
||||
Avatar Shark
|
||||
Imation Superdisk LS-120
|
||||
Maxell Superdisk LS-120
|
||||
FreeCom Power CD
|
||||
Hewlett-Packard 5GB and 8GB tape drives
|
||||
Hewlett-Packard 7100 and 7200 CD-RW drives
|
||||
- MicroSolutions backpack CD-ROM
|
||||
- MicroSolutions backpack PD/CD
|
||||
- MicroSolutions backpack hard-drives
|
||||
- MicroSolutions backpack 8000t tape drive
|
||||
- SyQuest EZ-135, EZ-230 & SparQ drives
|
||||
- Avatar Shark
|
||||
- Imation Superdisk LS-120
|
||||
- Maxell Superdisk LS-120
|
||||
- FreeCom Power CD
|
||||
- Hewlett-Packard 5GB and 8GB tape drives
|
||||
- Hewlett-Packard 7100 and 7200 CD-RW drives
|
||||
|
||||
as well as most of the clone and no-name products on the market.
|
||||
|
||||
To support such a wide range of devices, PARIDE, the parallel port IDE
|
||||
subsystem, is actually structured in three parts. There is a base
|
||||
paride module which provides a registry and some common methods for
|
||||
accessing the parallel ports. The second component is a set of
|
||||
high-level drivers for each of the different types of supported devices:
|
||||
accessing the parallel ports. The second component is a set of
|
||||
high-level drivers for each of the different types of supported devices:
|
||||
|
||||
=== =============
|
||||
pd IDE disk
|
||||
pcd ATAPI CD-ROM
|
||||
pf ATAPI disk
|
||||
pt ATAPI tape
|
||||
pg ATAPI generic
|
||||
=== =============
|
||||
|
||||
(Currently, the pg driver is only used with CD-R drives).
|
||||
|
||||
The high-level drivers function according to the relevant standards.
|
||||
The third component of PARIDE is a set of low-level protocol drivers
|
||||
for each of the parallel port IDE adapter chips. Thanks to the interest
|
||||
and encouragement of Linux users from many parts of the world,
|
||||
and encouragement of Linux users from many parts of the world,
|
||||
support is available for almost all known adapter protocols:
|
||||
|
||||
==== ====================================== ====
|
||||
aten ATEN EH-100 (HK)
|
||||
bpck Microsolutions backpack (US)
|
||||
comm DataStor (old-type) "commuter" adapter (TW)
|
||||
@ -83,9 +88,11 @@ support is available for almost all known adapter protocols:
|
||||
ktti KT Technology PHd adapter (SG)
|
||||
on20 OnSpec 90c20 (US)
|
||||
on26 OnSpec 90c26 (US)
|
||||
==== ====================================== ====
|
||||
|
||||
|
||||
2. Using the PARIDE subsystem
|
||||
=============================
|
||||
|
||||
While configuring the Linux kernel, you may choose either to build
|
||||
the PARIDE drivers into your kernel, or to build them as modules.
|
||||
@ -94,10 +101,10 @@ In either case, you will need to select "Parallel port IDE device support"
|
||||
as well as at least one of the high-level drivers and at least one
|
||||
of the parallel port communication protocols. If you do not know
|
||||
what kind of parallel port adapter is used in your drive, you could
|
||||
begin by checking the file names and any text files on your DOS
|
||||
begin by checking the file names and any text files on your DOS
|
||||
installation floppy. Alternatively, you can look at the markings on
|
||||
the adapter chip itself. That's usually sufficient to identify the
|
||||
correct device.
|
||||
correct device.
|
||||
|
||||
You can actually select all the protocol modules, and allow the PARIDE
|
||||
subsystem to try them all for you.
|
||||
@ -105,8 +112,9 @@ subsystem to try them all for you.
|
||||
For the "brand-name" products listed above, here are the protocol
|
||||
and high-level drivers that you would use:
|
||||
|
||||
================ ============ ====== ========
|
||||
Manufacturer Model Driver Protocol
|
||||
|
||||
================ ============ ====== ========
|
||||
MicroSolutions CD-ROM pcd bpck
|
||||
MicroSolutions PD drive pf bpck
|
||||
MicroSolutions hard-drive pd bpck
|
||||
@ -119,8 +127,10 @@ and high-level drivers that you would use:
|
||||
Hewlett-Packard 5GB Tape pt epat
|
||||
Hewlett-Packard 7200e (CD) pcd epat
|
||||
Hewlett-Packard 7200e (CD-R) pg epat
|
||||
================ ============ ====== ========
|
||||
|
||||
2.1 Configuring built-in drivers
|
||||
---------------------------------
|
||||
|
||||
We recommend that you get to know how the drivers work and how to
|
||||
configure them as loadable modules, before attempting to compile a
|
||||
@ -143,7 +153,7 @@ protocol identification number and, for some devices, the drive's
|
||||
chain ID. While your system is booting, a number of messages are
|
||||
displayed on the console. Like all such messages, they can be
|
||||
reviewed with the 'dmesg' command. Among those messages will be
|
||||
some lines like:
|
||||
some lines like::
|
||||
|
||||
paride: bpck registered as protocol 0
|
||||
paride: epat registered as protocol 1
|
||||
@ -158,10 +168,10 @@ the last two digits of the drive's serial number (but read MicroSolutions'
|
||||
documentation about this).
|
||||
|
||||
As an example, let's assume that you have a MicroSolutions PD/CD drive
|
||||
with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
|
||||
EZ-135 connected to the chained port on the PD/CD drive and also an
|
||||
Imation Superdisk connected to port 0x278. You could give the following
|
||||
options on your boot command:
|
||||
with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
|
||||
EZ-135 connected to the chained port on the PD/CD drive and also an
|
||||
Imation Superdisk connected to port 0x278. You could give the following
|
||||
options on your boot command::
|
||||
|
||||
pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
|
||||
|
||||
@ -169,24 +179,27 @@ In the last option, pf.drive1 configures device /dev/pf1, the 0x378
|
||||
is the parallel port base address, the 0 is the protocol registration
|
||||
number and 36 is the chain ID.
|
||||
|
||||
Please note: while PARIDE will work both with and without the
|
||||
Please note: while PARIDE will work both with and without the
|
||||
PARPORT parallel port sharing system that is included by the
|
||||
"Parallel port support" option, PARPORT must be included and enabled
|
||||
if you want to use chains of devices on the same parallel port.
|
||||
|
||||
2.2 Loading and configuring PARIDE as modules
|
||||
----------------------------------------------
|
||||
|
||||
It is much faster and simpler to get to understand the PARIDE drivers
|
||||
if you use them as loadable kernel modules.
|
||||
if you use them as loadable kernel modules.
|
||||
|
||||
Note 1: using these drivers with the "kerneld" automatic module loading
|
||||
system is not recommended for beginners, and is not documented here.
|
||||
Note 1:
|
||||
using these drivers with the "kerneld" automatic module loading
|
||||
system is not recommended for beginners, and is not documented here.
|
||||
|
||||
Note 2: if you build PARPORT support as a loadable module, PARIDE must
|
||||
also be built as loadable modules, and PARPORT must be loaded before the
|
||||
PARIDE modules.
|
||||
Note 2:
|
||||
if you build PARPORT support as a loadable module, PARIDE must
|
||||
also be built as loadable modules, and PARPORT must be loaded before
|
||||
the PARIDE modules.
|
||||
|
||||
To use PARIDE, you must begin by
|
||||
To use PARIDE, you must begin by::
|
||||
|
||||
insmod paride
|
||||
|
||||
@ -195,8 +208,8 @@ among other tasks.
|
||||
|
||||
Then, load as many of the protocol modules as you think you might need.
|
||||
As you load each module, it will register the protocols that it supports,
|
||||
and print a log message to your kernel log file and your console. For
|
||||
example:
|
||||
and print a log message to your kernel log file and your console. For
|
||||
example::
|
||||
|
||||
# insmod epat
|
||||
paride: epat registered as protocol 0
|
||||
@ -205,22 +218,22 @@ example:
|
||||
paride: k971 registered as protocol 2
|
||||
|
||||
Finally, you can load high-level drivers for each kind of device that
|
||||
you have connected. By default, each driver will autoprobe for a single
|
||||
you have connected. By default, each driver will autoprobe for a single
|
||||
device, but you can support up to four similar devices by giving their
|
||||
individual co-ordinates when you load the driver.
|
||||
|
||||
For example, if you had two no-name CD-ROM drives both using the
|
||||
KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
|
||||
you could give the following command:
|
||||
you could give the following command::
|
||||
|
||||
# insmod pcd drive0=0x378,1 drive1=0x3bc,1
|
||||
|
||||
For most adapters, giving a port address and protocol number is sufficient,
|
||||
but check the source files in linux/drivers/block/paride for more
|
||||
but check the source files in linux/drivers/block/paride for more
|
||||
information. (Hopefully someone will write some man pages one day !).
|
||||
|
||||
As another example, here's what happens when PARPORT is installed, and
|
||||
a SyQuest EZ-135 is attached to port 0x378:
|
||||
a SyQuest EZ-135 is attached to port 0x378::
|
||||
|
||||
# insmod paride
|
||||
paride: version 1.0 installed
|
||||
@ -237,46 +250,47 @@ Note that the last line is the output from the generic partition table
|
||||
scanner - in this case it reports that it has found a disk with one partition.
|
||||
|
||||
2.3 Using a PARIDE device
|
||||
--------------------------
|
||||
|
||||
Once the drivers have been loaded, you can access PARIDE devices in the
|
||||
same way as their traditional counterparts. You will probably need to
|
||||
create the device "special files". Here is a simple script that you can
|
||||
cut to a file and execute:
|
||||
cut to a file and execute::
|
||||
|
||||
#!/bin/bash
|
||||
#
|
||||
# mkd -- a script to create the device special files for the PARIDE subsystem
|
||||
#
|
||||
function mkdev {
|
||||
mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
|
||||
}
|
||||
#
|
||||
function pd {
|
||||
D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
|
||||
mkdev pd$D b 45 $[ $1 * 16 ]
|
||||
for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
|
||||
do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
|
||||
done
|
||||
}
|
||||
#
|
||||
cd /dev
|
||||
#
|
||||
for u in 0 1 2 3 ; do pd $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
|
||||
for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
|
||||
#
|
||||
# end of mkd
|
||||
#!/bin/bash
|
||||
#
|
||||
# mkd -- a script to create the device special files for the PARIDE subsystem
|
||||
#
|
||||
function mkdev {
|
||||
mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
|
||||
}
|
||||
#
|
||||
function pd {
|
||||
D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
|
||||
mkdev pd$D b 45 $[ $1 * 16 ]
|
||||
for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
|
||||
do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
|
||||
done
|
||||
}
|
||||
#
|
||||
cd /dev
|
||||
#
|
||||
for u in 0 1 2 3 ; do pd $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
|
||||
for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
|
||||
for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
|
||||
#
|
||||
# end of mkd
|
||||
|
||||
With the device files and drivers in place, you can access PARIDE devices
|
||||
like any other Linux device. For example, to mount a CD-ROM in pcd0, use:
|
||||
like any other Linux device. For example, to mount a CD-ROM in pcd0, use::
|
||||
|
||||
mount /dev/pcd0 /cdrom
|
||||
|
||||
If you have a fresh Avatar Shark cartridge, and the drive is pda, you
|
||||
might do something like:
|
||||
might do something like::
|
||||
|
||||
fdisk /dev/pda -- make a new partition table with
|
||||
partition 1 of type 83
|
||||
@ -289,41 +303,46 @@ might do something like:
|
||||
|
||||
Devices like the Imation superdisk work in the same way, except that
|
||||
they do not have a partition table. For example to make a 120MB
|
||||
floppy that you could share with a DOS system:
|
||||
floppy that you could share with a DOS system::
|
||||
|
||||
mkdosfs /dev/pf0
|
||||
mount /dev/pf0 /mnt
|
||||
|
||||
|
||||
2.4 The pf driver
|
||||
------------------
|
||||
|
||||
The pf driver is intended for use with parallel port ATAPI disk
|
||||
devices. The most common devices in this category are PD drives
|
||||
and LS-120 drives. Traditionally, media for these devices are not
|
||||
partitioned. Consequently, the pf driver does not support partitioned
|
||||
media. This may be changed in a future version of the driver.
|
||||
media. This may be changed in a future version of the driver.
|
||||
|
||||
2.5 Using the pt driver
|
||||
------------------------
|
||||
|
||||
The pt driver for parallel port ATAPI tape drives is a minimal driver.
|
||||
It does not yet support many of the standard tape ioctl operations.
|
||||
It does not yet support many of the standard tape ioctl operations.
|
||||
For best performance, a block size of 32KB should be used. You will
|
||||
probably want to set the parallel port delay to 0, if you can.
|
||||
|
||||
2.6 Using the pg driver
|
||||
------------------------
|
||||
|
||||
The pg driver can be used in conjunction with the cdrecord program
|
||||
to create CD-ROMs. Please get cdrecord version 1.6.1 or later
|
||||
from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
|
||||
your parallel port should ideally be set to EPP mode, and the "port delay"
|
||||
should be set to 0. With those settings it is possible to record at 2x
|
||||
from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
|
||||
your parallel port should ideally be set to EPP mode, and the "port delay"
|
||||
should be set to 0. With those settings it is possible to record at 2x
|
||||
speed without any buffer underruns. If you cannot get the driver to work
|
||||
in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
|
||||
|
||||
|
||||
3. Troubleshooting
|
||||
==================
|
||||
|
||||
3.1 Use EPP mode if you can
|
||||
----------------------------
|
||||
|
||||
The most common problems that people report with the PARIDE drivers
|
||||
concern the parallel port CMOS settings. At this time, none of the
|
||||
@ -332,6 +351,7 @@ If you are able to do so, please set your parallel port into EPP mode
|
||||
using your CMOS setup procedure.
|
||||
|
||||
3.2 Check the port delay
|
||||
-------------------------
|
||||
|
||||
Some parallel ports cannot reliably transfer data at full speed. To
|
||||
offset the errors, the PARIDE protocol modules introduce a "port
|
||||
@ -347,23 +367,25 @@ read the comments at the beginning of the driver source files in
|
||||
linux/drivers/block/paride.
|
||||
|
||||
3.3 Some drives need a printer reset
|
||||
-------------------------------------
|
||||
|
||||
There appear to be a number of "noname" external drives on the market
|
||||
that do not always power up correctly. We have noticed this with some
|
||||
drives based on OnSpec and older Freecom adapters. In these rare cases,
|
||||
the adapter can often be reinitialised by issuing a "printer reset" on
|
||||
the parallel port. As the reset operation is potentially disruptive in
|
||||
multiple device environments, the PARIDE drivers will not do it
|
||||
automatically. You can however, force a printer reset by doing:
|
||||
the parallel port. As the reset operation is potentially disruptive in
|
||||
multiple device environments, the PARIDE drivers will not do it
|
||||
automatically. You can however, force a printer reset by doing::
|
||||
|
||||
insmod lp reset=1
|
||||
rmmod lp
|
||||
|
||||
If you have one of these marginal cases, you should probably build
|
||||
your paride drivers as modules, and arrange to do the printer reset
|
||||
before loading the PARIDE drivers.
|
||||
before loading the PARIDE drivers.
|
||||
|
||||
3.4 Use the verbose option and dmesg if you need help
|
||||
------------------------------------------------------
|
||||
|
||||
While a lot of testing has gone into these drivers to make them work
|
||||
as smoothly as possible, problems will arise. If you do have problems,
|
||||
@ -373,7 +395,7 @@ clues, then please make sure that only one drive is hooked to your system,
|
||||
and that either (a) PARPORT is enabled or (b) no other device driver
|
||||
is using your parallel port (check in /proc/ioports). Then, load the
|
||||
appropriate drivers (you can load several protocol modules if you want)
|
||||
as in:
|
||||
as in::
|
||||
|
||||
# insmod paride
|
||||
# insmod epat
|
||||
@ -394,12 +416,14 @@ by e-mail to grant@torque.net, or join the linux-parport mailing list
|
||||
and post your report there.
|
||||
|
||||
3.5 For more information or help
|
||||
---------------------------------
|
||||
|
||||
You can join the linux-parport mailing list by sending a mail message
|
||||
to
|
||||
to:
|
||||
|
||||
linux-parport-request@torque.net
|
||||
|
||||
with the single word
|
||||
with the single word::
|
||||
|
||||
subscribe
|
||||
|
||||
@ -412,6 +436,4 @@ have in your mail headers, when sending mail to the list server.
|
||||
You might also find some useful information on the linux-parport
|
||||
web pages (although they are not always up to date) at
|
||||
|
||||
http://web.archive.org/web/*/http://www.torque.net/parport/
|
||||
|
||||
|
||||
http://web.archive.org/web/%2E/http://www.torque.net/parport/
|
@ -1,7 +1,8 @@
|
||||
==========================================
|
||||
Using the RAM disk block device with Linux
|
||||
------------------------------------------
|
||||
==========================================
|
||||
|
||||
Contents:
|
||||
.. Contents:
|
||||
|
||||
1) Overview
|
||||
2) Kernel Command Line Parameters
|
||||
@ -42,7 +43,7 @@ rescue floppy disk.
|
||||
2a) Kernel Command Line Parameters
|
||||
|
||||
ramdisk_size=N
|
||||
==============
|
||||
Size of the ramdisk.
|
||||
|
||||
This parameter tells the RAM disk driver to set up RAM disks of N k size. The
|
||||
default is 4096 (4 MB).
|
||||
@ -50,16 +51,13 @@ default is 4096 (4 MB).
|
||||
2b) Module parameters
|
||||
|
||||
rd_nr
|
||||
=====
|
||||
/dev/ramX devices created.
|
||||
/dev/ramX devices created.
|
||||
|
||||
max_part
|
||||
========
|
||||
Maximum partition number.
|
||||
Maximum partition number.
|
||||
|
||||
rd_size
|
||||
=======
|
||||
See ramdisk_size.
|
||||
See ramdisk_size.
|
||||
|
||||
3) Using "rdev -r"
|
||||
------------------
|
||||
@ -71,11 +69,11 @@ to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
|
||||
prompt/wait sequence is to be given before trying to read the RAM disk. Since
|
||||
the RAM disk dynamically grows as data is being written into it, a size field
|
||||
is not required. Bits 11 to 13 are not currently used and may as well be zero.
|
||||
These numbers are no magical secrets, as seen below:
|
||||
These numbers are no magical secrets, as seen below::
|
||||
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
|
||||
./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
|
||||
|
||||
Consider a typical two floppy disk setup, where you will have the
|
||||
kernel on disk one, and have already put a RAM disk image onto disk #2.
|
||||
@ -92,20 +90,23 @@ sequence so that you have a chance to switch floppy disks.
|
||||
The command line equivalent is: "prompt_ramdisk=1"
|
||||
|
||||
Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
|
||||
So to create disk one of the set, you would do:
|
||||
So to create disk one of the set, you would do::
|
||||
|
||||
/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
|
||||
/usr/src/linux# rdev /dev/fd0 /dev/fd0
|
||||
/usr/src/linux# rdev -r /dev/fd0 49152
|
||||
|
||||
If you make a boot disk that has LILO, then for the above, you would use:
|
||||
If you make a boot disk that has LILO, then for the above, you would use::
|
||||
|
||||
append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
|
||||
Since the default start = 0 and the default prompt = 1, you could use:
|
||||
|
||||
Since the default start = 0 and the default prompt = 1, you could use::
|
||||
|
||||
append = "load_ramdisk=1"
|
||||
|
||||
|
||||
4) An Example of Creating a Compressed RAM Disk
|
||||
----------------------------------------------
|
||||
-----------------------------------------------
|
||||
|
||||
To create a RAM disk image, you will need a spare block device to
|
||||
construct it on. This can be the RAM disk device itself, or an
|
||||
@ -120,11 +121,11 @@ a) Decide on the RAM disk size that you want. Say 2 MB for this example.
|
||||
Create it by writing to the RAM disk device. (This step is not currently
|
||||
required, but may be in the future.) It is wise to zero out the
|
||||
area (esp. for disks) so that maximal compression is achieved for
|
||||
the unused blocks of the image that you are about to create.
|
||||
the unused blocks of the image that you are about to create::
|
||||
|
||||
dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
|
||||
|
||||
b) Make a filesystem on it. Say ext2fs for this example.
|
||||
b) Make a filesystem on it. Say ext2fs for this example::
|
||||
|
||||
mke2fs -vm0 /dev/ram0 2048
|
||||
|
||||
@ -133,11 +134,11 @@ c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
|
||||
|
||||
d) Compress the contents of the RAM disk. The level of compression
|
||||
will be approximately 50% of the space used by the files. Unused
|
||||
space on the RAM disk will compress to almost nothing.
|
||||
space on the RAM disk will compress to almost nothing::
|
||||
|
||||
dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
|
||||
|
||||
e) Put the kernel onto the floppy
|
||||
e) Put the kernel onto the floppy::
|
||||
|
||||
dd if=zImage of=/dev/fd0 bs=1k
|
||||
|
||||
@ -146,13 +147,13 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
|
||||
(possibly larger) kernel onto the same floppy later without overlapping
|
||||
the RAM disk image. An offset of 400 kB for kernels about 350 kB in
|
||||
size would be reasonable. Make sure offset+size of ram_image.gz is
|
||||
not larger than the total space on your floppy (usually 1440 kB).
|
||||
not larger than the total space on your floppy (usually 1440 kB)::
|
||||
|
||||
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
|
||||
|
||||
g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
|
||||
For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
|
||||
have 2^15 + 2^14 + 400 = 49552.
|
||||
have 2^15 + 2^14 + 400 = 49552::
|
||||
|
||||
rdev /dev/fd0 /dev/fd0
|
||||
rdev -r /dev/fd0 49552
|
||||
@ -160,15 +161,17 @@ g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
|
||||
That is it. You now have your boot/root compressed RAM disk floppy. Some
|
||||
users may wish to combine steps (d) and (f) by using a pipe.
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
Paul Gortmaker 12/95
|
||||
|
||||
Changelog:
|
||||
----------
|
||||
|
||||
10-22-04 : Updated to reflect changes in command line options, remove
|
||||
10-22-04 :
|
||||
Updated to reflect changes in command line options, remove
|
||||
obsolete references, general cleanup.
|
||||
James Nelson (james4765@gmail.com)
|
||||
|
||||
|
||||
12-95 : Original Document
|
||||
12-95 :
|
||||
Original Document
|
@ -1,7 +1,9 @@
|
||||
========================================
|
||||
zram: Compressed RAM based block devices
|
||||
----------------------------------------
|
||||
========================================
|
||||
|
||||
* Introduction
|
||||
Introduction
|
||||
============
|
||||
|
||||
The zram module creates RAM based block devices named /dev/zram<id>
|
||||
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
|
||||
@ -12,9 +14,11 @@ use as swap disks, various caches under /var and maybe many more :)
|
||||
Statistics for individual zram devices are exported through sysfs nodes at
|
||||
/sys/block/zram<id>/
|
||||
|
||||
* Usage
|
||||
Usage
|
||||
=====
|
||||
|
||||
There are several ways to configure and manage zram device(-s):
|
||||
|
||||
a) using zram and zram_control sysfs attributes
|
||||
b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
|
||||
|
||||
@ -22,7 +26,7 @@ In this document we will describe only 'manual' zram configuration steps,
|
||||
IOW, zram and zram_control sysfs attributes.
|
||||
|
||||
In order to get a better idea about zramctl please consult util-linux
|
||||
documentation, zramctl man-page or `zramctl --help'. Please be informed
|
||||
documentation, zramctl man-page or `zramctl --help`. Please be informed
|
||||
that zram maintainers do not develop/maintain util-linux or zramctl, should
|
||||
you have any questions please contact util-linux@vger.kernel.org
|
||||
|
||||
@ -30,19 +34,23 @@ Following shows a typical sequence of steps for using zram.
|
||||
|
||||
WARNING
|
||||
=======
|
||||
|
||||
For the sake of simplicity we skip error checking parts in most of the
|
||||
examples below. However, it is your sole responsibility to handle errors.
|
||||
|
||||
zram sysfs attributes always return negative values in case of errors.
|
||||
The list of possible return codes:
|
||||
-EBUSY -- an attempt to modify an attribute that cannot be changed once
|
||||
the device has been initialised. Please reset device first;
|
||||
-ENOMEM -- zram was not able to allocate enough memory to fulfil your
|
||||
needs;
|
||||
-EINVAL -- invalid input has been provided.
|
||||
|
||||
======== =============================================================
|
||||
-EBUSY an attempt to modify an attribute that cannot be changed once
|
||||
the device has been initialised. Please reset device first;
|
||||
-ENOMEM zram was not able to allocate enough memory to fulfil your
|
||||
needs;
|
||||
-EINVAL invalid input has been provided.
|
||||
======== =============================================================
|
||||
|
||||
If you use 'echo', the returned value that is changed by 'echo' utility,
|
||||
and, in general case, something like:
|
||||
and, in general case, something like::
|
||||
|
||||
echo 3 > /sys/block/zram0/max_comp_streams
|
||||
if [ $? -ne 0 ];
|
||||
@ -51,7 +59,11 @@ and, in general case, something like:
|
||||
|
||||
should suffice.
|
||||
|
||||
1) Load Module:
|
||||
1) Load Module
|
||||
==============
|
||||
|
||||
::
|
||||
|
||||
modprobe zram num_devices=4
|
||||
This creates 4 devices: /dev/zram{0,1,2,3}
|
||||
|
||||
@ -59,6 +71,8 @@ num_devices parameter is optional and tells zram how many devices should be
|
||||
pre-created. Default: 1.
|
||||
|
||||
2) Set max number of compression streams
|
||||
========================================
|
||||
|
||||
Regardless the value passed to this attribute, ZRAM will always
|
||||
allocate multiple compression streams - one per online CPUs - thus
|
||||
allowing several concurrent compression operations. The number of
|
||||
@ -66,16 +80,20 @@ allocated compression streams goes down when some of the CPUs
|
||||
become offline. There is no single-compression-stream mode anymore,
|
||||
unless you are running a UP system or has only 1 CPU online.
|
||||
|
||||
To find out how many streams are currently available:
|
||||
To find out how many streams are currently available::
|
||||
|
||||
cat /sys/block/zram0/max_comp_streams
|
||||
|
||||
3) Select compression algorithm
|
||||
===============================
|
||||
|
||||
Using comp_algorithm device attribute one can see available and
|
||||
currently selected (shown in square brackets) compression algorithms,
|
||||
change selected compression algorithm (once the device is initialised
|
||||
there is no way to change compression algorithm).
|
||||
|
||||
Examples:
|
||||
Examples::
|
||||
|
||||
#show supported compression algorithms
|
||||
cat /sys/block/zram0/comp_algorithm
|
||||
lzo [lz4]
|
||||
@ -83,20 +101,23 @@ Examples:
|
||||
#select lzo compression algorithm
|
||||
echo lzo > /sys/block/zram0/comp_algorithm
|
||||
|
||||
For the time being, the `comp_algorithm' content does not necessarily
|
||||
For the time being, the `comp_algorithm` content does not necessarily
|
||||
show every compression algorithm supported by the kernel. We keep this
|
||||
list primarily to simplify device configuration and one can configure
|
||||
a new device with a compression algorithm that is not listed in
|
||||
`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
|
||||
`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
|
||||
and, if some of the algorithms were built as modules, it's impossible
|
||||
to list all of them using, for instance, /proc/crypto or any other
|
||||
method. This, however, has an advantage of permitting the usage of
|
||||
custom crypto compression modules (implementing S/W or H/W compression).
|
||||
|
||||
4) Set Disksize
|
||||
===============
|
||||
|
||||
Set disk size by writing the value to sysfs node 'disksize'.
|
||||
The value can be either in bytes or you can use mem suffixes.
|
||||
Examples:
|
||||
Examples::
|
||||
|
||||
# Initialize /dev/zram0 with 50MB disksize
|
||||
echo $((50*1024*1024)) > /sys/block/zram0/disksize
|
||||
|
||||
@ -111,10 +132,13 @@ since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
|
||||
size of the disk when not in use so a huge zram is wasteful.
|
||||
|
||||
5) Set memory limit: Optional
|
||||
=============================
|
||||
|
||||
Set memory limit by writing the value to sysfs node 'mem_limit'.
|
||||
The value can be either in bytes or you can use mem suffixes.
|
||||
In addition, you could change the value in runtime.
|
||||
Examples:
|
||||
Examples::
|
||||
|
||||
# limit /dev/zram0 with 50MB memory
|
||||
echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
|
||||
|
||||
@ -126,7 +150,11 @@ Examples:
|
||||
# To disable memory limit
|
||||
echo 0 > /sys/block/zram0/mem_limit
|
||||
|
||||
6) Activate:
|
||||
6) Activate
|
||||
===========
|
||||
|
||||
::
|
||||
|
||||
mkswap /dev/zram0
|
||||
swapon /dev/zram0
|
||||
|
||||
@ -134,6 +162,7 @@ Examples:
|
||||
mount /dev/zram1 /tmp
|
||||
|
||||
7) Add/remove zram devices
|
||||
==========================
|
||||
|
||||
zram provides a control interface, which enables dynamic (on-demand) device
|
||||
addition and removal.
|
||||
@ -142,44 +171,51 @@ In order to add a new /dev/zramX device, perform read operation on hot_add
|
||||
attribute. This will return either new device's device id (meaning that you
|
||||
can use /dev/zram<id>) or error code.
|
||||
|
||||
Example:
|
||||
Example::
|
||||
|
||||
cat /sys/class/zram-control/hot_add
|
||||
1
|
||||
|
||||
To remove the existing /dev/zramX device (where X is a device id)
|
||||
execute
|
||||
execute::
|
||||
|
||||
echo X > /sys/class/zram-control/hot_remove
|
||||
|
||||
8) Stats:
|
||||
8) Stats
|
||||
========
|
||||
|
||||
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
|
||||
|
||||
A brief description of exported device attributes. For more details please
|
||||
read Documentation/ABI/testing/sysfs-block-zram.
|
||||
|
||||
====================== ====== ===============================================
|
||||
Name access description
|
||||
---- ------ -----------
|
||||
====================== ====== ===============================================
|
||||
disksize RW show and set the device's disk size
|
||||
initstate RO shows the initialization state of the device
|
||||
reset WO trigger device reset
|
||||
mem_used_max WO reset the `mem_used_max' counter (see later)
|
||||
mem_limit WO specifies the maximum amount of memory ZRAM can use
|
||||
to store the compressed data
|
||||
writeback_limit WO specifies the maximum amount of write IO zram can
|
||||
write out to backing device as 4KB unit
|
||||
mem_used_max WO reset the `mem_used_max` counter (see later)
|
||||
mem_limit WO specifies the maximum amount of memory ZRAM can
|
||||
use to store the compressed data
|
||||
writeback_limit WO specifies the maximum amount of write IO zram
|
||||
can write out to backing device as 4KB unit
|
||||
writeback_limit_enable RW show and set writeback_limit feature
|
||||
max_comp_streams RW the number of possible concurrent compress operations
|
||||
max_comp_streams RW the number of possible concurrent compress
|
||||
operations
|
||||
comp_algorithm RW show and change the compression algorithm
|
||||
compact WO trigger memory compaction
|
||||
debug_stat RO this file is used for zram debugging purposes
|
||||
backing_dev RW set up backend storage for zram to write out
|
||||
idle WO mark allocated slot as idle
|
||||
====================== ====== ===============================================
|
||||
|
||||
|
||||
User space is advised to use the following files to read the device statistics.
|
||||
|
||||
File /sys/block/zram<id>/stat
|
||||
|
||||
Represents block layer statistics. Read Documentation/block/stat.txt for
|
||||
Represents block layer statistics. Read Documentation/block/stat.rst for
|
||||
details.
|
||||
|
||||
File /sys/block/zram<id>/io_stat
|
||||
@ -188,23 +224,31 @@ The stat file represents device's I/O statistics not accounted by block
|
||||
layer and, thus, not available in zram<id>/stat file. It consists of a
|
||||
single line of text and contains the following stats separated by
|
||||
whitespace:
|
||||
failed_reads the number of failed reads
|
||||
failed_writes the number of failed writes
|
||||
invalid_io the number of non-page-size-aligned I/O requests
|
||||
|
||||
============= =============================================================
|
||||
failed_reads The number of failed reads
|
||||
failed_writes The number of failed writes
|
||||
invalid_io The number of non-page-size-aligned I/O requests
|
||||
notify_free Depending on device usage scenario it may account
|
||||
|
||||
a) the number of pages freed because of swap slot free
|
||||
notifications or b) the number of pages freed because of
|
||||
REQ_OP_DISCARD requests sent by bio. The former ones are
|
||||
sent to a swap block device when a swap slot is freed,
|
||||
which implies that this disk is being used as a swap disk.
|
||||
notifications
|
||||
b) the number of pages freed because of
|
||||
REQ_OP_DISCARD requests sent by bio. The former ones are
|
||||
sent to a swap block device when a swap slot is freed,
|
||||
which implies that this disk is being used as a swap disk.
|
||||
|
||||
The latter ones are sent by filesystem mounted with
|
||||
discard option, whenever some data blocks are getting
|
||||
discarded.
|
||||
============= =============================================================
|
||||
|
||||
File /sys/block/zram<id>/mm_stat
|
||||
|
||||
The stat file represents device's mm statistics. It consists of a single
|
||||
line of text and contains the following stats separated by whitespace:
|
||||
|
||||
================ =============================================================
|
||||
orig_data_size uncompressed size of data stored in this disk.
|
||||
This excludes same-element-filled pages (same_pages) since
|
||||
no memory is allocated for them.
|
||||
@ -223,58 +267,71 @@ line of text and contains the following stats separated by whitespace:
|
||||
No memory is allocated for such pages.
|
||||
pages_compacted the number of pages freed during compaction
|
||||
huge_pages the number of incompressible pages
|
||||
================ =============================================================
|
||||
|
||||
File /sys/block/zram<id>/bd_stat
|
||||
|
||||
The stat file represents device's backing device statistics. It consists of
|
||||
a single line of text and contains the following stats separated by whitespace:
|
||||
|
||||
============== =============================================================
|
||||
bd_count size of data written in backing device.
|
||||
Unit: 4K bytes
|
||||
bd_reads the number of reads from backing device
|
||||
Unit: 4K bytes
|
||||
bd_writes the number of writes to backing device
|
||||
Unit: 4K bytes
|
||||
============== =============================================================
|
||||
|
||||
9) Deactivate
|
||||
=============
|
||||
|
||||
::
|
||||
|
||||
9) Deactivate:
|
||||
swapoff /dev/zram0
|
||||
umount /dev/zram1
|
||||
|
||||
10) Reset:
|
||||
Write any positive value to 'reset' sysfs node
|
||||
echo 1 > /sys/block/zram0/reset
|
||||
echo 1 > /sys/block/zram1/reset
|
||||
10) Reset
|
||||
=========
|
||||
|
||||
Write any positive value to 'reset' sysfs node::
|
||||
|
||||
echo 1 > /sys/block/zram0/reset
|
||||
echo 1 > /sys/block/zram1/reset
|
||||
|
||||
This frees all the memory allocated for the given device and
|
||||
resets the disksize to zero. You must set the disksize again
|
||||
before reusing the device.
|
||||
|
||||
* Optional Feature
|
||||
Optional Feature
|
||||
================
|
||||
|
||||
= writeback
|
||||
writeback
|
||||
---------
|
||||
|
||||
With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
|
||||
to backing storage rather than keeping it in memory.
|
||||
To use the feature, admin should set up backing device via
|
||||
To use the feature, admin should set up backing device via::
|
||||
|
||||
"echo /dev/sda5 > /sys/block/zramX/backing_dev"
|
||||
echo /dev/sda5 > /sys/block/zramX/backing_dev
|
||||
|
||||
before disksize setting. It supports only partition at this moment.
|
||||
If admin want to use incompressible page writeback, they could do via
|
||||
If admin want to use incompressible page writeback, they could do via::
|
||||
|
||||
"echo huge > /sys/block/zramX/write"
|
||||
echo huge > /sys/block/zramX/write
|
||||
|
||||
To use idle page writeback, first, user need to declare zram pages
|
||||
as idle.
|
||||
as idle::
|
||||
|
||||
"echo all > /sys/block/zramX/idle"
|
||||
echo all > /sys/block/zramX/idle
|
||||
|
||||
From now on, any pages on zram are idle pages. The idle mark
|
||||
will be removed until someone request access of the block.
|
||||
IOW, unless there is access request, those pages are still idle pages.
|
||||
|
||||
Admin can request writeback of those idle pages at right timing via
|
||||
Admin can request writeback of those idle pages at right timing via::
|
||||
|
||||
"echo idle > /sys/block/zramX/writeback"
|
||||
echo idle > /sys/block/zramX/writeback
|
||||
|
||||
With the command, zram writeback idle pages from memory to the storage.
|
||||
|
||||
@ -285,7 +342,7 @@ to guarantee storage health for entire product life.
|
||||
To overcome the concern, zram supports "writeback_limit" feature.
|
||||
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
|
||||
any writeback. IOW, if admin want to apply writeback budget, he should
|
||||
enable writeback_limit_enable via
|
||||
enable writeback_limit_enable via::
|
||||
|
||||
$ echo 1 > /sys/block/zramX/writeback_limit_enable
|
||||
|
||||
@ -296,7 +353,7 @@ until admin set the budget via /sys/block/zramX/writeback_limit.
|
||||
assigned via /sys/block/zramX/writeback_limit is meaninless.)
|
||||
|
||||
If admin want to limit writeback as per-day 400M, he could do it
|
||||
like below.
|
||||
like below::
|
||||
|
||||
$ MB_SHIFT=20
|
||||
$ 4K_SHIFT=12
|
||||
@ -305,16 +362,16 @@ like below.
|
||||
$ echo 1 > /sys/block/zram0/writeback_limit_enable
|
||||
|
||||
If admin want to allow further write again once the bugdet is exausted,
|
||||
he could do it like below
|
||||
he could do it like below::
|
||||
|
||||
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
|
||||
/sys/block/zram0/writeback_limit
|
||||
|
||||
If admin want to see remaining writeback budget since he set,
|
||||
If admin want to see remaining writeback budget since he set::
|
||||
|
||||
$ cat /sys/block/zramX/writeback_limit
|
||||
|
||||
If admin want to disable writeback limit, he could do
|
||||
If admin want to disable writeback limit, he could do::
|
||||
|
||||
$ echo 0 > /sys/block/zramX/writeback_limit_enable
|
||||
|
||||
@ -326,25 +383,35 @@ budget in next setting is user's job.
|
||||
If admin want to measure writeback count in a certain period, he could
|
||||
know it via /sys/block/zram0/bd_stat's 3rd column.
|
||||
|
||||
= memory tracking
|
||||
memory tracking
|
||||
===============
|
||||
|
||||
With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
|
||||
zram block. It could be useful to catch cold or incompressible
|
||||
pages of the process with*pagemap.
|
||||
|
||||
If you enable the feature, you could see block state via
|
||||
/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
|
||||
/sys/kernel/debug/zram/zram0/block_state". The output is as follows::
|
||||
|
||||
300 75.033841 .wh.
|
||||
301 63.806904 s...
|
||||
302 63.806919 ..hi
|
||||
|
||||
First column is zram's block index.
|
||||
Second column is access time since the system was booted
|
||||
Third column is state of the block.
|
||||
(s: same page
|
||||
w: written page to backing store
|
||||
h: huge page
|
||||
i: idle page)
|
||||
First column
|
||||
zram's block index.
|
||||
Second column
|
||||
access time since the system was booted
|
||||
Third column
|
||||
state of the block:
|
||||
|
||||
s:
|
||||
same page
|
||||
w:
|
||||
written page to backing store
|
||||
h:
|
||||
huge page
|
||||
i:
|
||||
idle page
|
||||
|
||||
First line of above example says 300th block is accessed at 75.033841sec
|
||||
and the block's state is huge so it is written back to the backing
|
@ -90,9 +90,9 @@ the disk is not available then you have three options:
|
||||
run a null modem to a second machine and capture the output there
|
||||
using your favourite communication program. Minicom works well.
|
||||
|
||||
(3) Use Kdump (see Documentation/kdump/kdump.rst),
|
||||
(3) Use Kdump (see Documentation/admin-guide/kdump/kdump.rst),
|
||||
extract the kernel ring buffer from old memory with using dmesg
|
||||
gdbmacro in Documentation/kdump/gdbmacros.txt.
|
||||
gdbmacro in Documentation/admin-guide/kdump/gdbmacros.txt.
|
||||
|
||||
Finding the bug's location
|
||||
--------------------------
|
||||
|
@ -3,7 +3,7 @@ Control Groups
|
||||
==============
|
||||
|
||||
Written by Paul Menage <menage@google.com> based on
|
||||
Documentation/cgroup-v1/cpusets.rst
|
||||
Documentation/admin-guide/cgroup-v1/cpusets.rst
|
||||
|
||||
Original copyright statements from cpusets.txt:
|
||||
|
||||
@ -76,7 +76,7 @@ On their own, the only use for cgroups is for simple job
|
||||
tracking. The intention is that other subsystems hook into the generic
|
||||
cgroup support to provide new attributes for cgroups, such as
|
||||
accounting/limiting the resources which processes in a cgroup can
|
||||
access. For example, cpusets (see Documentation/cgroup-v1/cpusets.rst) allow
|
||||
access. For example, cpusets (see Documentation/admin-guide/cgroup-v1/cpusets.rst) allow
|
||||
you to associate a set of CPUs and a set of memory nodes with the
|
||||
tasks in each cgroup.
|
||||
|
@ -49,7 +49,7 @@ hooks, beyond what is already present, required to manage dynamic
|
||||
job placement on large systems.
|
||||
|
||||
Cpusets use the generic cgroup subsystem described in
|
||||
Documentation/cgroup-v1/cgroups.rst.
|
||||
Documentation/admin-guide/cgroup-v1/cgroups.rst.
|
||||
|
||||
Requests by a task, using the sched_setaffinity(2) system call to
|
||||
include CPUs in its CPU affinity mask, and using the mbind(2) and
|
@ -1,5 +1,3 @@
|
||||
:orphan:
|
||||
|
||||
========================
|
||||
Control Groups version 1
|
||||
========================
|
@ -10,7 +10,7 @@ Because VM is getting complex (one of reasons is memcg...), memcg's behavior
|
||||
is complex. This is a document for memcg's internal behavior.
|
||||
Please note that implementation details can be changed.
|
||||
|
||||
(*) Topics on API should be in Documentation/cgroup-v1/memory.rst)
|
||||
(*) Topics on API should be in Documentation/admin-guide/cgroup-v1/memory.rst)
|
||||
|
||||
0. How to record usage ?
|
||||
========================
|
||||
@ -327,7 +327,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
|
||||
You can see charges have been moved by reading ``*.usage_in_bytes`` or
|
||||
memory.stat of both A and B.
|
||||
|
||||
See 8.2 of Documentation/cgroup-v1/memory.rst to see what value should
|
||||
See 8.2 of Documentation/admin-guide/cgroup-v1/memory.rst to see what value should
|
||||
be written to move_charge_at_immigrate.
|
||||
|
||||
9.10 Memory thresholds
|
@ -9,7 +9,7 @@ This is the authoritative documentation on the design, interface and
|
||||
conventions of cgroup v2. It describes all userland-visible aspects
|
||||
of cgroup including core and specific controller behaviors. All
|
||||
future changes must be reflected in this document. Documentation for
|
||||
v1 is available under Documentation/cgroup-v1/.
|
||||
v1 is available under Documentation/admin-guide/cgroup-v1/.
|
||||
|
||||
.. CONTENTS
|
||||
|
||||
@ -1014,7 +1014,7 @@ All time durations are in microseconds.
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for CPU. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
Documentation/accounting/psi.rst for details.
|
||||
|
||||
|
||||
Memory
|
||||
@ -1355,7 +1355,7 @@ PAGE_SIZE multiple when read back.
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for memory. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
Documentation/accounting/psi.rst for details.
|
||||
|
||||
|
||||
Usage Guidelines
|
||||
@ -1498,7 +1498,7 @@ IO Interface Files
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for IO. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
Documentation/accounting/psi.rst for details.
|
||||
|
||||
|
||||
Writeback
|
||||
@ -2124,7 +2124,7 @@ following two functions.
|
||||
a queue (device) has been associated with the bio and
|
||||
before submission.
|
||||
|
||||
wbc_account_io(@wbc, @page, @bytes)
|
||||
wbc_account_cgroup_owner(@wbc, @page, @bytes)
|
||||
Should be called for each data segment being written out.
|
||||
While this function doesn't care exactly when it's called
|
||||
during the writeback session, it's the easiest and most
|
||||
|
@ -1,5 +1,3 @@
|
||||
:orphan:
|
||||
|
||||
=============
|
||||
Device Mapper
|
||||
=============
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user