mirror of
https://github.com/edk2-porting/linux-next.git
synced 2025-01-20 11:34:02 +08:00
These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion continues. Highlights include: - Further work on PDF output, which remains a bit of a pain but should be more solid now. - Five more DocBook template files converted to Sphinx. Only 27 to go... Lots of plain-text files have also been converted and integrated. - Images in binary formats have been replaced with more source-friendly versions. - Various bits of organizational work, including the renaming of various files discussed at the kernel summit. - New documentation for the device_link mechanism. ...and, of course, lots of typo fixes and small updates. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/ T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf pmSJR0hVeRUmA4uw6+Su =A0EV -----END PGP SIGNATURE----- Merge tag 'docs-4.10' of git://git.lwn.net/linux into drm-misc-next Backmerge the docs-next branch from Jon into drm-misc so that we can apply the dma-buf documentation cleanup patches. Git found a conflict where there was none because both drm-misc and docs had identical patches to clean up file rename issues in the rst include directives. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This commit is contained in:
commit
c248891675
@ -14,13 +14,8 @@ Following translations are available on the WWW:
|
||||
- this file.
|
||||
ABI/
|
||||
- info on kernel <-> userspace ABI and relative interface stability.
|
||||
|
||||
BUG-HUNTING
|
||||
- brute force method of doing binary search of patches to find bug.
|
||||
Changes
|
||||
- list of changes that break older software packages.
|
||||
CodingStyle
|
||||
- how the maintainers expect the C code in the kernel to look.
|
||||
- nothing here, just a pointer to process/coding-style.rst.
|
||||
DMA-API.txt
|
||||
- DMA API, pci_ API & extensions for non-consistent memory machines.
|
||||
DMA-API-HOWTO.txt
|
||||
@ -33,8 +28,6 @@ DocBook/
|
||||
- directory with DocBook templates etc. for kernel documentation.
|
||||
EDID/
|
||||
- directory with info on customizing EDID for broken gfx/displays.
|
||||
HOWTO
|
||||
- the process and procedures of how to do Linux kernel development.
|
||||
IPMI.txt
|
||||
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
|
||||
IRQ-affinity.txt
|
||||
@ -46,62 +39,43 @@ IRQ.txt
|
||||
Intel-IOMMU.txt
|
||||
- basic info on the Intel IOMMU virtualization support.
|
||||
Makefile
|
||||
- This file does nothing. Removing it breaks make htmldocs and
|
||||
make distclean.
|
||||
ManagementStyle
|
||||
- how to (attempt to) manage kernel hackers.
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
Makefile.sphinx
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
PCI/
|
||||
- info related to PCI drivers.
|
||||
RCU/
|
||||
- directory with info on RCU (read-copy update).
|
||||
SAK.txt
|
||||
- info on Secure Attention Keys.
|
||||
SM501.txt
|
||||
- Silicon Motion SM501 multimedia companion chip
|
||||
SecurityBugs
|
||||
- procedure for reporting security bugs found in the kernel.
|
||||
SubmitChecklist
|
||||
- Linux kernel patch submission checklist.
|
||||
SubmittingDrivers
|
||||
- procedure to get a new driver source included into the kernel tree.
|
||||
SubmittingPatches
|
||||
- procedure to get a source patch included into the kernel tree.
|
||||
VGA-softcursor.txt
|
||||
- how to change your VGA cursor from a blinking underscore.
|
||||
- nothing here, just a pointer to process/coding-style.rst.
|
||||
accounting/
|
||||
- documentation on accounting and taskstats.
|
||||
acpi/
|
||||
- info on ACPI-specific hooks in the kernel.
|
||||
admin-guide/
|
||||
- info related to Linux users and system admins.
|
||||
aoe/
|
||||
- description of AoE (ATA over Ethernet) along with config examples.
|
||||
applying-patches.txt
|
||||
- description of various trees and how to apply their patches.
|
||||
arm/
|
||||
- directory with info about Linux on the ARM architecture.
|
||||
arm64/
|
||||
- directory with info about Linux on the 64 bit ARM architecture.
|
||||
assoc_array.txt
|
||||
- generic associative array intro.
|
||||
atomic_ops.txt
|
||||
- semantics and behavior of atomic and bitmask operations.
|
||||
auxdisplay/
|
||||
- misc. LCD driver documentation (cfag12864b, ks0108).
|
||||
backlight/
|
||||
- directory with info on controlling backlights in flat panel displays
|
||||
bad_memory.txt
|
||||
- how to use kernel parameters to exclude bad RAM regions.
|
||||
basic_profiling.txt
|
||||
- basic instructions for those who wants to profile Linux kernel.
|
||||
bcache.txt
|
||||
- Block-layer cache on fast SSDs to improve slow (raid) I/O performance.
|
||||
binfmt_misc.txt
|
||||
- info on the kernel support for extra binary formats.
|
||||
blackfin/
|
||||
- directory with documentation for the Blackfin arch.
|
||||
block/
|
||||
- info on the Block I/O (BIO) layer.
|
||||
blockdev/
|
||||
- info on block devices & drivers
|
||||
braille-console.txt
|
||||
- info on how to use serial devices for Braille support.
|
||||
bt8xxgpio.txt
|
||||
- info on how to modify a bt8xx video card for GPIO usage.
|
||||
btmrvl.txt
|
||||
@ -114,18 +88,24 @@ cachetlb.txt
|
||||
- describes the cache/TLB flushing interfaces Linux uses.
|
||||
cdrom/
|
||||
- directory with information on the CD-ROM drivers that Linux has.
|
||||
cgroups/
|
||||
- cgroups features, including cpusets and memory controller.
|
||||
cgroup-v1/
|
||||
- cgroups v1 features, including cpusets and memory controller.
|
||||
cgroup-v2.txt
|
||||
- cgroups v2 features, including cpusets and memory controller.
|
||||
circular-buffers.txt
|
||||
- how to make use of the existing circular buffer infrastructure
|
||||
clk.txt
|
||||
- info on the common clock framework
|
||||
coccinelle.txt
|
||||
- info on how to get and use the Coccinelle code checking tool.
|
||||
cma/
|
||||
- Continuous Memory Area (CMA) debugfs interface.
|
||||
conf.py
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
connector/
|
||||
- docs on the netlink based userspace<->kernel space communication mod.
|
||||
console/
|
||||
- documentation on Linux console drivers.
|
||||
core-api/
|
||||
- documentation on kernel core components.
|
||||
cpu-freq/
|
||||
- info on CPU frequency and voltage scaling.
|
||||
cpu-hotplug.txt
|
||||
@ -150,26 +130,26 @@ debugging-via-ohci1394.txt
|
||||
- how to use firewire like a hardware debugger memory reader.
|
||||
dell_rbu.txt
|
||||
- document demonstrating the use of the Dell Remote BIOS Update driver.
|
||||
development-process/
|
||||
- how to work with the mainline kernel development process.
|
||||
dev-tools/
|
||||
- directory with info on development tools for the kernel.
|
||||
device-mapper/
|
||||
- directory with info on Device Mapper.
|
||||
devices.txt
|
||||
- plain ASCII listing of all the nodes in /dev/ with major minor #'s.
|
||||
dmaengine/
|
||||
- the DMA engine and controller API guides.
|
||||
devicetree/
|
||||
- directory with info on device tree files used by OF/PowerPC/ARM
|
||||
digsig.txt
|
||||
-info on the Digital Signature Verification API
|
||||
dma-buf-sharing.txt
|
||||
- the DMA Buffer Sharing API Guide
|
||||
docutils.conf
|
||||
- nothing here. Just a configuration file for docutils.
|
||||
dontdiff
|
||||
- file containing a list of files that should never be diff'ed.
|
||||
driver-api/
|
||||
- the Linux driver implementer's API guide.
|
||||
driver-model/
|
||||
- directory with info about Linux driver model.
|
||||
dvb/
|
||||
- info on Linux Digital Video Broadcast (DVB) subsystem.
|
||||
dynamic-debug-howto.txt
|
||||
- how to use the dynamic debug (dyndbg) feature.
|
||||
early-userspace/
|
||||
- info about initramfs, klibc, and userspace early during boot.
|
||||
edac.txt
|
||||
@ -178,14 +158,16 @@ efi-stub.txt
|
||||
- How to use the EFI boot stub to bypass GRUB or elilo on EFI systems.
|
||||
eisa.txt
|
||||
- info on EISA bus support.
|
||||
email-clients.txt
|
||||
- info on how to use e-mail to send un-mangled (git) patches.
|
||||
extcon/
|
||||
- directory with porting guide for Android kernel switch driver.
|
||||
isa.txt
|
||||
- info on EISA bus support.
|
||||
fault-injection/
|
||||
- dir with docs about the fault injection capabilities infrastructure.
|
||||
fb/
|
||||
- directory with info on the frame buffer graphics abstraction layer.
|
||||
features/
|
||||
- status of feature implementation on different architectures.
|
||||
filesystems/
|
||||
- info on the vfs and the various filesystems that Linux supports.
|
||||
firmware_class/
|
||||
@ -194,20 +176,22 @@ flexible-arrays.txt
|
||||
- how to make use of flexible sized arrays in linux
|
||||
fmc/
|
||||
- information about the FMC bus abstraction
|
||||
fpga/
|
||||
- FPGA Manager Core.
|
||||
frv/
|
||||
- Fujitsu FR-V Linux documentation.
|
||||
futex-requeue-pi.txt
|
||||
- info on requeueing of tasks from a non-PI futex to a PI futex
|
||||
gcov.txt
|
||||
- use of GCC's coverage testing tool "gcov" with the Linux kernel
|
||||
gcc-plugins.txt
|
||||
- GCC plugin infrastructure.
|
||||
gpio/
|
||||
- gpio related documentation
|
||||
gpu/
|
||||
- directory with information on GPU driver developer's guide.
|
||||
hid/
|
||||
- directory with information on human interface devices
|
||||
highuid.txt
|
||||
- notes on the change from 16 bit to 32 bit user/group IDs.
|
||||
hsi.txt
|
||||
- HSI subsystem overview.
|
||||
hwspinlock.txt
|
||||
- hardware spinlock provides hardware assistance for synchronization
|
||||
timers/
|
||||
@ -218,18 +202,18 @@ hwmon/
|
||||
- directory with docs on various hardware monitoring drivers.
|
||||
i2c/
|
||||
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
|
||||
i2o/
|
||||
- directory with info about the Linux I2O subsystem.
|
||||
x86/i386/
|
||||
- directory with info about Linux on Intel 32 bit architecture.
|
||||
ia64/
|
||||
- directory with info about Linux on Intel 64 bit architecture.
|
||||
ide/
|
||||
- Information regarding the Enhanced IDE drive.
|
||||
iio/
|
||||
- info on industrial IIO configfs support.
|
||||
index.rst
|
||||
- main index for the documentation at ReST format.
|
||||
infiniband/
|
||||
- directory with documents concerning Linux InfiniBand support.
|
||||
init.txt
|
||||
- what to do when the kernel can't find the 1st process to run.
|
||||
initrd.txt
|
||||
- how to use the RAM disk as an initial/temporary root filesystem.
|
||||
input/
|
||||
- info on Linux input device support.
|
||||
intel_txt.txt
|
||||
@ -248,28 +232,16 @@ isapnp.txt
|
||||
- info on Linux ISA Plug & Play support.
|
||||
isdn/
|
||||
- directory with info on the Linux ISDN support, and supported cards.
|
||||
java.txt
|
||||
- info on the in-kernel binary support for Java(tm).
|
||||
ja_JP/
|
||||
- directory with Japanese translations of various documents
|
||||
kbuild/
|
||||
- directory with info about the kernel build process.
|
||||
kernel-doc-nano-HOWTO.txt
|
||||
- outdated info about kernel-doc documentation.
|
||||
kdump/
|
||||
- directory with mini HowTo on getting the crash dump code to work.
|
||||
kernel-docs.txt
|
||||
- listing of various WWW + books that document kernel internals.
|
||||
kernel-documentation.rst
|
||||
doc-guide/
|
||||
- how to write and format reStructuredText kernel documentation
|
||||
kernel-parameters.txt
|
||||
- summary listing of command line / boot prompt args for the kernel.
|
||||
kernel-per-CPU-kthreads.txt
|
||||
- List of all per-CPU kthreads and how they introduce jitter.
|
||||
kmemcheck.txt
|
||||
- info on dynamic checker that detects uses of uninitialized memory.
|
||||
kmemleak.txt
|
||||
- info on how to make use of the kernel memory leak detection system
|
||||
ko_KR/
|
||||
- directory with Korean translations of various documents
|
||||
kobject.txt
|
||||
- info of the kobject infrastructure of the Linux kernel.
|
||||
kprobes.txt
|
||||
@ -284,8 +256,8 @@ ldm.txt
|
||||
- a brief description of LDM (Windows Dynamic Disks).
|
||||
leds/
|
||||
- directory with info about LED handling under Linux.
|
||||
local_ops.txt
|
||||
- semantics and behavior of local atomic operations.
|
||||
livepatch/
|
||||
- info on kernel live patching.
|
||||
locking/
|
||||
- directory with info about kernel locking primitives
|
||||
lockup-watchdogs.txt
|
||||
@ -298,22 +270,24 @@ lzo.txt
|
||||
- kernel LZO decompressor input formats
|
||||
m68k/
|
||||
- directory with info about Linux on Motorola 68k architecture.
|
||||
magic-number.txt
|
||||
- list of magic numbers used to mark/protect kernel data structures.
|
||||
mailbox.txt
|
||||
- How to write drivers for the common mailbox framework (IPC).
|
||||
md.txt
|
||||
- info on boot arguments for the multiple devices driver.
|
||||
media-framework.txt
|
||||
- info on media framework, its data structures, functions and usage.
|
||||
md-cluster.txt
|
||||
- info on shared-device RAID MD cluster.
|
||||
media/
|
||||
- info on media drivers: uAPI, kAPI and driver documentation.
|
||||
memory-barriers.txt
|
||||
- info on Linux kernel memory barriers.
|
||||
memory-devices/
|
||||
- directory with info on parts like the Texas Instruments EMIF driver
|
||||
memory-hotplug.txt
|
||||
- Hotpluggable memory support, how to use and current status.
|
||||
men-chameleon-bus.txt
|
||||
- info on MEN chameleon bus.
|
||||
metag/
|
||||
- directory with info about Linux on Meta architecture.
|
||||
mic/
|
||||
- Intel Many Integrated Core (MIC) architecture device driver.
|
||||
mips/
|
||||
- directory with info about Linux on MIPS architecture.
|
||||
misc-devices/
|
||||
@ -322,12 +296,8 @@ mmc/
|
||||
- directory with info about the MMC subsystem
|
||||
mn10300/
|
||||
- directory with info about the mn10300 architecture port
|
||||
module-signing.txt
|
||||
- Kernel module signing for increased security when loading modules.
|
||||
mtd/
|
||||
- directory with info about memory technology devices (flash)
|
||||
mono.txt
|
||||
- how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
|
||||
namespaces/
|
||||
- directory with various information about namespaces
|
||||
netlabel/
|
||||
@ -336,30 +306,42 @@ networking/
|
||||
- directory with info on various aspects of networking with Linux.
|
||||
nfc/
|
||||
- directory relating info about Near Field Communications support.
|
||||
nios2/
|
||||
- Linux on the Nios II architecture.
|
||||
nommu-mmap.txt
|
||||
- documentation about no-mmu memory mapping support.
|
||||
numastat.txt
|
||||
- info on how to read Numa policy hit/miss statistics in sysfs.
|
||||
oops-tracing.txt
|
||||
- how to decode those nasty internal kernel error dump messages.
|
||||
ntb.txt
|
||||
- info on Non-Transparent Bridge (NTB) drivers.
|
||||
nvdimm/
|
||||
- info on non-volatile devices.
|
||||
nvmem/
|
||||
- info on non volatile memory framework.
|
||||
output/
|
||||
- default directory where html/LaTeX/pdf files will be written.
|
||||
padata.txt
|
||||
- An introduction to the "padata" parallel execution API
|
||||
parisc/
|
||||
- directory with info on using Linux on PA-RISC architecture.
|
||||
parport.txt
|
||||
- how to use the parallel-port driver.
|
||||
parport-lowlevel.txt
|
||||
- description and usage of the low level parallel port functions.
|
||||
pcmcia/
|
||||
- info on the Linux PCMCIA driver.
|
||||
percpu-rw-semaphore.txt
|
||||
- RCU based read-write semaphore optimized for locking for reading
|
||||
perf/
|
||||
- info about the APM X-Gene SoC Performance Monitoring Unit (PMU).
|
||||
phy/
|
||||
- ino on Samsung USB 2.0 PHY adaptation layer.
|
||||
phy.txt
|
||||
- Description of the generic PHY framework.
|
||||
pi-futex.txt
|
||||
- documentation on lightweight priority inheritance futexes.
|
||||
pinctrl.txt
|
||||
- info on pinctrl subsystem and the PINMUX/PINCONF and drivers
|
||||
platform/
|
||||
- List of supported hardware by compal and Dell laptop.
|
||||
pnp.txt
|
||||
- Linux Plug and Play documentation.
|
||||
power/
|
||||
@ -372,14 +354,16 @@ preempt-locking.txt
|
||||
- info on locking under a preemptive kernel.
|
||||
printk-formats.txt
|
||||
- how to get printk format specifiers right
|
||||
process/
|
||||
- how to work with the mainline kernel development process.
|
||||
pps/
|
||||
- directory with information on the pulse-per-second support
|
||||
pti/
|
||||
- directory with info on Intel MID PTI.
|
||||
ptp/
|
||||
- directory with info on support for IEEE 1588 PTP clocks in Linux.
|
||||
pwm.txt
|
||||
- info on the pulse width modulation driver subsystem
|
||||
ramoops.txt
|
||||
- documentation of the ramoops oops/panic logging module.
|
||||
rapidio/
|
||||
- directory with info on RapidIO packet-based fabric interconnect
|
||||
rbtree.txt
|
||||
@ -406,8 +390,6 @@ security/
|
||||
- directory that contains security-related info
|
||||
serial/
|
||||
- directory with info on the low level serial API.
|
||||
serial-console.txt
|
||||
- how to set up Linux with a serial line console as the default.
|
||||
sgi-ioc4.txt
|
||||
- description of the SGI IOC4 PCI (multi function) device.
|
||||
sh/
|
||||
@ -416,24 +398,20 @@ smsc_ece1099.txt
|
||||
-info on the smsc Keyboard Scan Expansion/GPIO Expansion device.
|
||||
sound/
|
||||
- directory with info on sound card support.
|
||||
sparse.txt
|
||||
- info on how to obtain and use the sparse tool for typechecking.
|
||||
spi/
|
||||
- overview of Linux kernel Serial Peripheral Interface (SPI) support.
|
||||
stable_api_nonsense.txt
|
||||
- info on why the kernel does not have a stable in-kernel api or abi.
|
||||
stable_kernel_rules.txt
|
||||
- rules and procedures for the -stable kernel releases.
|
||||
sphinx/
|
||||
- no documentation here, just files required by Sphinx toolchain.
|
||||
sphinx-static/
|
||||
- no documentation here, just files required by Sphinx toolchain.
|
||||
static-keys.txt
|
||||
- info on how static keys allow debug code in hotpaths via patching
|
||||
svga.txt
|
||||
- short guide on selecting video modes at boot via VGA BIOS.
|
||||
sysfs-rules.txt
|
||||
- How not to use sysfs.
|
||||
sync_file.txt
|
||||
- Sync file API guide.
|
||||
sysctl/
|
||||
- directory with info on the /proc/sys/* files.
|
||||
sysrq.txt
|
||||
- info on the magic SysRq key.
|
||||
target/
|
||||
- directory with info on generating TCM v4 fabric .ko modules
|
||||
this_cpu_ops.txt
|
||||
@ -442,39 +420,29 @@ thermal/
|
||||
- directory with information on managing thermal issues (CPU/temp)
|
||||
trace/
|
||||
- directory with info on tracing technologies within linux
|
||||
translations/
|
||||
- translations of this document from English to another language
|
||||
unaligned-memory-access.txt
|
||||
- info on how to avoid arch breaking unaligned memory access in code.
|
||||
unicode.txt
|
||||
- info on the Unicode character/font mapping used in Linux.
|
||||
unshare.txt
|
||||
- description of the Linux unshare system call.
|
||||
usb/
|
||||
- directory with info regarding the Universal Serial Bus.
|
||||
vDSO/
|
||||
- directory with info regarding virtual dynamic shared objects
|
||||
vfio.txt
|
||||
- info on Virtual Function I/O used in guest/hypervisor instances.
|
||||
vgaarbiter.txt
|
||||
- info on enable/disable the legacy decoding on different VGA devices
|
||||
video-output.txt
|
||||
- sysfs class driver interface to enable/disable a video output device.
|
||||
video4linux/
|
||||
- directory with info regarding video/TV/radio cards and linux.
|
||||
virtual/
|
||||
- directory with information on the various linux virtualizations.
|
||||
vm/
|
||||
- directory with info on the Linux vm code.
|
||||
vme_api.txt
|
||||
- file relating info on the VME bus API in linux
|
||||
volatile-considered-harmful.txt
|
||||
- Why the "volatile" type class should not be used
|
||||
w1/
|
||||
- directory with documents regarding the 1-wire (w1) subsystem.
|
||||
watchdog/
|
||||
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
|
||||
wimax/
|
||||
- directory with info about Intel Wireless Wimax Connections
|
||||
workqueue.txt
|
||||
core-api/workqueue.rst
|
||||
- information on the Concurrency Managed Workqueue implementation
|
||||
x86/x86_64/
|
||||
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
|
||||
@ -484,7 +452,5 @@ xtensa/
|
||||
- directory with documents relating to arch/xtensa port/implementation
|
||||
xz.txt
|
||||
- how to make use of the XZ data compression within linux kernel
|
||||
zh_CN/
|
||||
- directory with Chinese translations of various documents
|
||||
zorro.txt
|
||||
- info on writing drivers for Zorro bus devices found on Amigas.
|
||||
|
@ -84,4 +84,4 @@ stable:
|
||||
|
||||
- Kernel-internal symbols. Do not rely on the presence, absence, location, or
|
||||
type of any kernel symbol, either in System.map files or the kernel binary
|
||||
itself. See Documentation/stable_api_nonsense.txt.
|
||||
itself. See Documentation/process/stable-api-nonsense.rst.
|
||||
|
@ -347,7 +347,7 @@ Description:
|
||||
because of fragmentation, SLUB will retry with the minimum order
|
||||
possible depending on its characteristics.
|
||||
When debug_guardpage_minorder=N (N > 0) parameter is specified
|
||||
(see Documentation/kernel-parameters.txt), the minimum possible
|
||||
(see Documentation/admin-guide/kernel-parameters.rst), the minimum possible
|
||||
order is used and this sysfs entry can not be used to change
|
||||
the order at run time.
|
||||
|
||||
|
@ -1,246 +0,0 @@
|
||||
Table of contents
|
||||
=================
|
||||
|
||||
Last updated: 20 December 2005
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Introduction
|
||||
- Devices not appearing
|
||||
- Finding patch that caused a bug
|
||||
-- Finding using git-bisect
|
||||
-- Finding it the old way
|
||||
- Fixing the bug
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Always try the latest kernel from kernel.org and build from source. If you are
|
||||
not confident in doing that please report the bug to your distribution vendor
|
||||
instead of to a kernel developer.
|
||||
|
||||
Finding bugs is not always easy. Have a go though. If you can't find it don't
|
||||
give up. Report as much as you have found to the relevant maintainer. See
|
||||
MAINTAINERS for who that is for the subsystem you have worked on.
|
||||
|
||||
Before you submit a bug report read REPORTING-BUGS.
|
||||
|
||||
Devices not appearing
|
||||
=====================
|
||||
|
||||
Often this is caused by udev. Check that first before blaming it on the
|
||||
kernel.
|
||||
|
||||
Finding patch that caused a bug
|
||||
===============================
|
||||
|
||||
|
||||
|
||||
Finding using git-bisect
|
||||
------------------------
|
||||
|
||||
Using the provided tools with git makes finding bugs easy provided the bug is
|
||||
reproducible.
|
||||
|
||||
Steps to do it:
|
||||
- start using git for the kernel source
|
||||
- read the man page for git-bisect
|
||||
- have fun
|
||||
|
||||
Finding it the old way
|
||||
----------------------
|
||||
|
||||
[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
|
||||
|
||||
This is how to track down a bug if you know nothing about kernel hacking.
|
||||
It's a brute force approach but it works pretty well.
|
||||
|
||||
You need:
|
||||
|
||||
. A reproducible bug - it has to happen predictably (sorry)
|
||||
. All the kernel tar files from a revision that worked to the
|
||||
revision that doesn't
|
||||
|
||||
You will then do:
|
||||
|
||||
. Rebuild a revision that you believe works, install, and verify that.
|
||||
. Do a binary search over the kernels to figure out which one
|
||||
introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
|
||||
you know that 1.3.69 does. Pick a kernel in the middle and build
|
||||
that, like 1.3.50. Build & test; if it works, pick the mid point
|
||||
between .50 and .69, else the mid point between .28 and .50.
|
||||
. You'll narrow it down to the kernel that introduced the bug. You
|
||||
can probably do better than this but it gets tricky.
|
||||
|
||||
. Narrow it down to a subdirectory
|
||||
|
||||
- Copy kernel that works into "test". Let's say that 3.62 works,
|
||||
but 3.63 doesn't. So you diff -r those two kernels and come
|
||||
up with a list of directories that changed. For each of those
|
||||
directories:
|
||||
|
||||
Copy the non-working directory next to the working directory
|
||||
as "dir.63".
|
||||
One directory at time, try moving the working directory to
|
||||
"dir.62" and mv dir.63 dir"time, try
|
||||
|
||||
mv dir dir.62
|
||||
mv dir.63 dir
|
||||
find dir -name '*.[oa]' -print | xargs rm -f
|
||||
|
||||
And then rebuild and retest. Assuming that all related
|
||||
changes were contained in the sub directory, this should
|
||||
isolate the change to a directory.
|
||||
|
||||
Problems: changes in header files may have occurred; I've
|
||||
found in my case that they were self explanatory - you may
|
||||
or may not want to give up when that happens.
|
||||
|
||||
. Narrow it down to a file
|
||||
|
||||
- You can apply the same technique to each file in the directory,
|
||||
hoping that the changes in that file are self contained.
|
||||
|
||||
. Narrow it down to a routine
|
||||
|
||||
- You can take the old file and the new file and manually create
|
||||
a merged file that has
|
||||
|
||||
#ifdef VER62
|
||||
routine()
|
||||
{
|
||||
...
|
||||
}
|
||||
#else
|
||||
routine()
|
||||
{
|
||||
...
|
||||
}
|
||||
#endif
|
||||
|
||||
And then walk through that file, one routine at a time and
|
||||
prefix it with
|
||||
|
||||
#define VER62
|
||||
/* both routines here */
|
||||
#undef VER62
|
||||
|
||||
Then recompile, retest, move the ifdefs until you find the one
|
||||
that makes the difference.
|
||||
|
||||
Finally, you take all the info that you have, kernel revisions, bug
|
||||
description, the extent to which you have narrowed it down, and pass
|
||||
that off to whomever you believe is the maintainer of that section.
|
||||
A post to linux.dev.kernel isn't such a bad idea if you've done some
|
||||
work to narrow it down.
|
||||
|
||||
If you get it down to a routine, you'll probably get a fix in 24 hours.
|
||||
|
||||
My apologies to Linus and the other kernel hackers for describing this
|
||||
brute force approach, it's hardly what a kernel hacker would do. However,
|
||||
it does work and it lets non-hackers help fix bugs. And it is cool
|
||||
because Linux snapshots will let you do this - something that you can't
|
||||
do with vendor supplied releases.
|
||||
|
||||
Fixing the bug
|
||||
==============
|
||||
|
||||
Nobody is going to tell you how to fix bugs. Seriously. You need to work it
|
||||
out. But below are some hints on how to use the tools.
|
||||
|
||||
To debug a kernel, use objdump and look for the hex offset from the crash
|
||||
output to find the valid line of code/assembler. Without debug symbols, you
|
||||
will see the assembler code for the routine shown, but if your kernel has
|
||||
debug symbols the C code will also be available. (Debug symbols can be enabled
|
||||
in the kernel hacking menu of the menu configuration.) For example:
|
||||
|
||||
objdump -r -S -l --disassemble net/dccp/ipv4.o
|
||||
|
||||
NB.: you need to be at the top level of the kernel tree for this to pick up
|
||||
your C files.
|
||||
|
||||
If you don't have access to the code you can also debug on some crash dumps
|
||||
e.g. crash dump output as shown by Dave Miller.
|
||||
|
||||
> EIP is at ip_queue_xmit+0x14/0x4c0
|
||||
> ...
|
||||
> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
|
||||
> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
|
||||
> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
|
||||
>
|
||||
> Put the bytes into a "foo.s" file like this:
|
||||
>
|
||||
> .text
|
||||
> .globl foo
|
||||
> foo:
|
||||
> .byte .... /* bytes from Code: part of OOPS dump */
|
||||
>
|
||||
> Compile it with "gcc -c -o foo.o foo.s" then look at the output of
|
||||
> "objdump --disassemble foo.o".
|
||||
>
|
||||
> Output:
|
||||
>
|
||||
> ip_queue_xmit:
|
||||
> push %ebp
|
||||
> push %edi
|
||||
> push %esi
|
||||
> push %ebx
|
||||
> sub $0xbc, %esp
|
||||
> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
|
||||
> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
||||
> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
||||
|
||||
In addition, you can use GDB to figure out the exact file and line
|
||||
number of the OOPS from the vmlinux file. If you have
|
||||
CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
|
||||
OOPS:
|
||||
|
||||
EIP: 0060:[<c021e50e>] Not tainted VLI
|
||||
|
||||
And use GDB to translate that to human-readable form:
|
||||
|
||||
gdb vmlinux
|
||||
(gdb) l *0xc021e50e
|
||||
|
||||
If you don't have CONFIG_DEBUG_INFO enabled, you use the function
|
||||
offset from the OOPS:
|
||||
|
||||
EIP is at vt_ioctl+0xda8/0x1482
|
||||
|
||||
And recompile the kernel with CONFIG_DEBUG_INFO enabled:
|
||||
|
||||
make vmlinux
|
||||
gdb vmlinux
|
||||
(gdb) p vt_ioctl
|
||||
(gdb) l *(0x<address of vt_ioctl> + 0xda8)
|
||||
or, as one command
|
||||
(gdb) l *(vt_ioctl + 0xda8)
|
||||
|
||||
If you have a call trace, such as :-
|
||||
>Call Trace:
|
||||
> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
|
||||
> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
|
||||
> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
||||
> ...
|
||||
this shows the problem in the :jbd: module. You can load that module in gdb
|
||||
and list the relevant code.
|
||||
gdb fs/jbd/jbd.ko
|
||||
(gdb) p log_wait_commit
|
||||
(gdb) l *(0x<address> + 0xa3)
|
||||
or
|
||||
(gdb) l *(log_wait_commit + 0xa3)
|
||||
|
||||
|
||||
Another very useful option of the Kernel Hacking section in menuconfig is
|
||||
Debug memory allocations. This will help you see whether data has been
|
||||
initialised and not set before use etc. To see the values that get assigned
|
||||
with this look at mm/slab.c and search for POISON_INUSE. When using this an
|
||||
Oops will often show the poisoned data instead of zero which is the default.
|
||||
|
||||
Once you have worked out a fix please submit it upstream. After all open
|
||||
source is about sharing what you do and don't you want to be recognised for
|
||||
your genius?
|
||||
|
||||
Please do read Documentation/SubmittingPatches though to help your code get
|
||||
accepted.
|
File diff suppressed because it is too large
Load Diff
@ -9,12 +9,10 @@
|
||||
DOCBOOKS := z8530book.xml \
|
||||
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
||||
writing_usb_driver.xml networking.xml \
|
||||
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
||||
kernel-api.xml filesystems.xml lsm.xml kgdb.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||
debugobjects.xml sh.xml regulator.xml \
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml \
|
||||
tracepoint.xml w1.xml \
|
||||
80211.xml sh.xml regulator.xml w1.xml \
|
||||
writing_musb_glue_layer.xml crypto-API.xml iio.xml
|
||||
|
||||
ifeq ($(DOCBOOKS),)
|
||||
@ -264,6 +262,7 @@ clean-files := $(DOCBOOKS) \
|
||||
$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, %.xml, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
|
||||
$(index)
|
||||
|
||||
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
|
||||
|
@ -1,142 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<!-- ****************************************************** -->
|
||||
<!-- Header -->
|
||||
<!-- ****************************************************** -->
|
||||
<book id="ALSA-Driver-API">
|
||||
<bookinfo>
|
||||
<title>The ALSA Driver API</title>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This document is free; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This document is distributed in the hope that it will be useful,
|
||||
but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the
|
||||
implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A
|
||||
PARTICULAR PURPOSE</emphasis>. See the GNU General Public License
|
||||
for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
</legalnotice>
|
||||
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter><title>Management of Cards and Devices</title>
|
||||
<sect1><title>Card Management</title>
|
||||
!Esound/core/init.c
|
||||
</sect1>
|
||||
<sect1><title>Device Components</title>
|
||||
!Esound/core/device.c
|
||||
</sect1>
|
||||
<sect1><title>Module requests and Device File Entries</title>
|
||||
!Esound/core/sound.c
|
||||
</sect1>
|
||||
<sect1><title>Memory Management Helpers</title>
|
||||
!Esound/core/memory.c
|
||||
!Esound/core/memalloc.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>PCM API</title>
|
||||
<sect1><title>PCM Core</title>
|
||||
!Esound/core/pcm.c
|
||||
!Esound/core/pcm_lib.c
|
||||
!Esound/core/pcm_native.c
|
||||
!Iinclude/sound/pcm.h
|
||||
</sect1>
|
||||
<sect1><title>PCM Format Helpers</title>
|
||||
!Esound/core/pcm_misc.c
|
||||
</sect1>
|
||||
<sect1><title>PCM Memory Management</title>
|
||||
!Esound/core/pcm_memory.c
|
||||
</sect1>
|
||||
<sect1><title>PCM DMA Engine API</title>
|
||||
!Esound/core/pcm_dmaengine.c
|
||||
!Iinclude/sound/dmaengine_pcm.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Control/Mixer API</title>
|
||||
<sect1><title>General Control Interface</title>
|
||||
!Esound/core/control.c
|
||||
</sect1>
|
||||
<sect1><title>AC97 Codec API</title>
|
||||
!Esound/pci/ac97/ac97_codec.c
|
||||
!Esound/pci/ac97/ac97_pcm.c
|
||||
</sect1>
|
||||
<sect1><title>Virtual Master Control API</title>
|
||||
!Esound/core/vmaster.c
|
||||
!Iinclude/sound/control.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>MIDI API</title>
|
||||
<sect1><title>Raw MIDI API</title>
|
||||
!Esound/core/rawmidi.c
|
||||
</sect1>
|
||||
<sect1><title>MPU401-UART API</title>
|
||||
!Esound/drivers/mpu401/mpu401_uart.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Proc Info API</title>
|
||||
<sect1><title>Proc Info Interface</title>
|
||||
!Esound/core/info.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Compress Offload</title>
|
||||
<sect1><title>Compress Offload API</title>
|
||||
!Esound/core/compress_offload.c
|
||||
!Iinclude/uapi/sound/compress_offload.h
|
||||
!Iinclude/uapi/sound/compress_params.h
|
||||
!Iinclude/sound/compress_driver.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>ASoC</title>
|
||||
<sect1><title>ASoC Core API</title>
|
||||
!Iinclude/sound/soc.h
|
||||
!Esound/soc/soc-core.c
|
||||
<!-- !Esound/soc/soc-cache.c no docbook comments here -->
|
||||
!Esound/soc/soc-devres.c
|
||||
!Esound/soc/soc-io.c
|
||||
!Esound/soc/soc-pcm.c
|
||||
!Esound/soc/soc-ops.c
|
||||
!Esound/soc/soc-compress.c
|
||||
</sect1>
|
||||
<sect1><title>ASoC DAPM API</title>
|
||||
!Esound/soc/soc-dapm.c
|
||||
</sect1>
|
||||
<sect1><title>ASoC DMA Engine API</title>
|
||||
!Esound/soc/soc-generic-dmaengine-pcm.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Miscellaneous Functions</title>
|
||||
<sect1><title>Hardware-Dependent Devices API</title>
|
||||
!Esound/core/hwdep.c
|
||||
</sect1>
|
||||
<sect1><title>Jack Abstraction Layer API</title>
|
||||
!Iinclude/sound/jack.h
|
||||
!Esound/core/jack.c
|
||||
!Esound/soc/soc-jack.c
|
||||
</sect1>
|
||||
<sect1><title>ISA DMA Helpers</title>
|
||||
!Esound/core/isadma.c
|
||||
</sect1>
|
||||
<sect1><title>Other Helper Macros</title>
|
||||
!Iinclude/sound/core.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
</book>
|
@ -1,443 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="debug-objects-guide">
|
||||
<bookinfo>
|
||||
<title>Debug objects life time</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Thomas</firstname>
|
||||
<surname>Gleixner</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>tglx@linutronix.de</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<copyright>
|
||||
<year>2008</year>
|
||||
<holder>Thomas Gleixner</holder>
|
||||
</copyright>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License version 2 as published by the Free Software Foundation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
debugobjects is a generic infrastructure to track the life time
|
||||
of kernel objects and validate the operations on those.
|
||||
</para>
|
||||
<para>
|
||||
debugobjects is useful to check for the following error patterns:
|
||||
<itemizedlist>
|
||||
<listitem><para>Activation of uninitialized objects</para></listitem>
|
||||
<listitem><para>Initialization of active objects</para></listitem>
|
||||
<listitem><para>Usage of freed/destroyed objects</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
debugobjects is not changing the data structure of the real
|
||||
object so it can be compiled in with a minimal runtime impact
|
||||
and enabled on demand with a kernel command line option.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="howto">
|
||||
<title>Howto use debugobjects</title>
|
||||
<para>
|
||||
A kernel subsystem needs to provide a data structure which
|
||||
describes the object type and add calls into the debug code at
|
||||
appropriate places. The data structure to describe the object
|
||||
type needs at minimum the name of the object type. Optional
|
||||
functions can and should be provided to fixup detected problems
|
||||
so the kernel can continue to work and the debug information can
|
||||
be retrieved from a live system instead of hard core debugging
|
||||
with serial consoles and stack trace transcripts from the
|
||||
monitor.
|
||||
</para>
|
||||
<para>
|
||||
The debug calls provided by debugobjects are:
|
||||
<itemizedlist>
|
||||
<listitem><para>debug_object_init</para></listitem>
|
||||
<listitem><para>debug_object_init_on_stack</para></listitem>
|
||||
<listitem><para>debug_object_activate</para></listitem>
|
||||
<listitem><para>debug_object_deactivate</para></listitem>
|
||||
<listitem><para>debug_object_destroy</para></listitem>
|
||||
<listitem><para>debug_object_free</para></listitem>
|
||||
<listitem><para>debug_object_assert_init</para></listitem>
|
||||
</itemizedlist>
|
||||
Each of these functions takes the address of the real object and
|
||||
a pointer to the object type specific debug description
|
||||
structure.
|
||||
</para>
|
||||
<para>
|
||||
Each detected error is reported in the statistics and a limited
|
||||
number of errors are printk'ed including a full stack trace.
|
||||
</para>
|
||||
<para>
|
||||
The statistics are available via /sys/kernel/debug/debug_objects/stats.
|
||||
They provide information about the number of warnings and the
|
||||
number of successful fixups along with information about the
|
||||
usage of the internal tracking objects and the state of the
|
||||
internal tracking objects pool.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="debugfunctions">
|
||||
<title>Debug functions</title>
|
||||
<sect1 id="prototypes">
|
||||
<title>Debug object function reference</title>
|
||||
!Elib/debugobjects.c
|
||||
</sect1>
|
||||
<sect1 id="debug_object_init">
|
||||
<title>debug_object_init</title>
|
||||
<para>
|
||||
This function is called whenever the initialization function
|
||||
of a real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be initialized. Initializing
|
||||
is not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the fixup_init
|
||||
function of the object type description structure if provided
|
||||
by the caller. The fixup function can correct the problem
|
||||
before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to
|
||||
the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects,
|
||||
debugobjects allocates a tracker object for the real object
|
||||
and sets the tracker object state to ODEBUG_STATE_INIT. It
|
||||
verifies that the object is not on the callers stack. If it is
|
||||
on the callers stack then a limited number of warnings
|
||||
including a full stack trace is printk'ed. The calling code
|
||||
must use debug_object_init_on_stack() and remove the object
|
||||
before leaving the function which allocated it. See next
|
||||
section.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_init_on_stack">
|
||||
<title>debug_object_init_on_stack</title>
|
||||
<para>
|
||||
This function is called whenever the initialization function
|
||||
of a real object which resides on the stack is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be initialized. Initializing
|
||||
is not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the fixup_init
|
||||
function of the object type description structure if provided
|
||||
by the caller. The fixup function can correct the problem
|
||||
before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to
|
||||
the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects
|
||||
debugobjects allocates a tracker object for the real object
|
||||
and sets the tracker object state to ODEBUG_STATE_INIT. It
|
||||
verifies that the object is on the callers stack.
|
||||
</para>
|
||||
<para>
|
||||
An object which is on the stack must be removed from the
|
||||
tracker by calling debug_object_free() before the function
|
||||
which allocates the object returns. Otherwise we keep track of
|
||||
stale objects.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_activate">
|
||||
<title>debug_object_activate</title>
|
||||
<para>
|
||||
This function is called whenever the activation function of a
|
||||
real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be activated. Activating is
|
||||
not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the
|
||||
fixup_activate function of the object type description
|
||||
structure if provided by the caller. The fixup function can
|
||||
correct the problem before the real activation of the object
|
||||
happens. E.g. it can deactivate an active object in order to
|
||||
prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects then
|
||||
the fixup_activate function is called if available. This is
|
||||
necessary to allow the legitimate activation of statically
|
||||
allocated and initialized objects. The fixup function checks
|
||||
whether the object is valid and calls the debug_objects_init()
|
||||
function to initialize the tracking of this object.
|
||||
</para>
|
||||
<para>
|
||||
When the activation is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_ACTIVE.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_deactivate">
|
||||
<title>debug_object_deactivate</title>
|
||||
<para>
|
||||
This function is called whenever the deactivation function of
|
||||
a real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be deactivated. Deactivating is not
|
||||
allowed for untracked or destroyed objects.
|
||||
</para>
|
||||
<para>
|
||||
When the deactivation is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_INACTIVE.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_destroy">
|
||||
<title>debug_object_destroy</title>
|
||||
<para>
|
||||
This function is called to mark an object destroyed. This is
|
||||
useful to prevent the usage of invalid objects, which are
|
||||
still available in memory: either statically allocated objects
|
||||
or objects which are freed later.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be destroyed. Destruction is not
|
||||
allowed for active and destroyed objects. When debugobjects
|
||||
detects an error, then it calls the fixup_destroy function of
|
||||
the object type description structure if provided by the
|
||||
caller. The fixup function can correct the problem before the
|
||||
real destruction of the object happens. E.g. it can deactivate
|
||||
an active object in order to prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the destruction is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_DESTROYED.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_free">
|
||||
<title>debug_object_free</title>
|
||||
<para>
|
||||
This function is called before an object is freed.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be freed. Free is not allowed for
|
||||
active objects. When debugobjects detects an error, then it
|
||||
calls the fixup_free function of the object type description
|
||||
structure if provided by the caller. The fixup function can
|
||||
correct the problem before the real free of the object
|
||||
happens. E.g. it can deactivate an active object in order to
|
||||
prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
Note that debug_object_free removes the object from the
|
||||
tracker. Later usage of the object is detected by the other
|
||||
debug checks.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_assert_init">
|
||||
<title>debug_object_assert_init</title>
|
||||
<para>
|
||||
This function is called to assert that an object has been
|
||||
initialized.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not tracked by debugobjects, it calls
|
||||
fixup_assert_init of the object type description structure
|
||||
provided by the caller, with the hardcoded object state
|
||||
ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem
|
||||
by calling debug_object_init and other specific initializing
|
||||
functions.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
ignored.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter id="fixupfunctions">
|
||||
<title>Fixup functions</title>
|
||||
<sect1 id="debug_obj_descr">
|
||||
<title>Debug object type description structure</title>
|
||||
!Iinclude/linux/debugobjects.h
|
||||
</sect1>
|
||||
<sect1 id="fixup_init">
|
||||
<title>fixup_init</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_init is detected. The function takes the
|
||||
address of the object and the state which is currently
|
||||
recorded in the tracker.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_init when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note, that the function needs to call the debug_object_init()
|
||||
function again, after the damage has been repaired in order to
|
||||
keep the state consistent.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="fixup_activate">
|
||||
<title>fixup_activate</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_activate is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_activate when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_NOTAVAILABLE</para></listitem>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note that the function needs to call the debug_object_activate()
|
||||
function again after the damage has been repaired in order to
|
||||
keep the state consistent.
|
||||
</para>
|
||||
<para>
|
||||
The activation of statically initialized objects is a special
|
||||
case. When debug_object_activate() has no tracked object for
|
||||
this object address then fixup_activate() is called with
|
||||
object state ODEBUG_STATE_NOTAVAILABLE. The fixup function
|
||||
needs to check whether this is a legitimate case of a
|
||||
statically initialized object or not. In case it is it calls
|
||||
debug_object_init() and debug_object_activate() to make the
|
||||
object known to the tracker and marked active. In this case
|
||||
the function should return false because this is not a real
|
||||
fixup.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="fixup_destroy">
|
||||
<title>fixup_destroy</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_destroy is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_destroy when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="fixup_free">
|
||||
<title>fixup_free</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_free is detected. Further it can be called
|
||||
from the debug checks in kfree/vfree, when an active object is
|
||||
detected from the debug_check_no_obj_freed() sanity checks.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_free() or debug_check_no_obj_freed()
|
||||
when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="fixup_assert_init">
|
||||
<title>fixup_assert_init</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_assert_init is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_assert_init() with a hardcoded state
|
||||
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the
|
||||
debug bucket.
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note, this function should make sure debug_object_init() is
|
||||
called before returning.
|
||||
</para>
|
||||
<para>
|
||||
The handling of statically initialized objects is a special
|
||||
case. The fixup function should check if this is a legitimate
|
||||
case of a statically initialized object or not. In this case only
|
||||
debug_object_init() should be called to make the object known to
|
||||
the tracker. Then the function should return false because this
|
||||
is not
|
||||
a real fixup.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter id="bugs">
|
||||
<title>Known Bugs And Assumptions</title>
|
||||
<para>
|
||||
None (knock on wood).
|
||||
</para>
|
||||
</chapter>
|
||||
</book>
|
@ -1208,8 +1208,8 @@ static struct block_device_operations opt_fops = {
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Finally, don't forget to read <filename>Documentation/SubmittingPatches</filename>
|
||||
and possibly <filename>Documentation/SubmittingDrivers</filename>.
|
||||
Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename>
|
||||
and possibly <filename>Documentation/process/submitting-drivers.rst</filename>.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
@ -1,112 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Tracepoints">
|
||||
<bookinfo>
|
||||
<title>The Linux Kernel Tracepoint API</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jason</firstname>
|
||||
<surname>Baron</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>jbaron@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>William</firstname>
|
||||
<surname>Cohen</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>wcohen@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
Tracepoints are static probe points that are located in strategic points
|
||||
throughout the kernel. 'Probes' register/unregister with tracepoints
|
||||
via a callback mechanism. The 'probes' are strictly typed functions that
|
||||
are passed a unique set of parameters defined by each tracepoint.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
From this simple callback mechanism, 'probes' can be used to profile, debug,
|
||||
and understand kernel behavior. There are a number of tools that provide a
|
||||
framework for using 'probes'. These tools include Systemtap, ftrace, and
|
||||
LTTng.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Tracepoints are defined in a number of header files via various macros. Thus,
|
||||
the purpose of this document is to provide a clear accounting of the available
|
||||
tracepoints. The intention is to understand not only what tracepoints are
|
||||
available but also to understand where future tracepoints might be added.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The API presented has functions of the form:
|
||||
<function>trace_tracepointname(function parameters)</function>. These are the
|
||||
tracepoints callbacks that are found throughout the code. Registering and
|
||||
unregistering probes with these callback sites is covered in the
|
||||
<filename>Documentation/trace/*</filename> directory.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="irq">
|
||||
<title>IRQ</title>
|
||||
!Iinclude/trace/events/irq.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="signal">
|
||||
<title>SIGNAL</title>
|
||||
!Iinclude/trace/events/signal.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="block">
|
||||
<title>Block IO</title>
|
||||
!Iinclude/trace/events/block.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="workqueue">
|
||||
<title>Workqueue</title>
|
||||
!Iinclude/trace/events/workqueue.h
|
||||
</chapter>
|
||||
</book>
|
@ -45,6 +45,13 @@ GPL version 2.
|
||||
</abstract>
|
||||
|
||||
<revhistory>
|
||||
<revision>
|
||||
<revnumber>0.10</revnumber>
|
||||
<date>2016-10-17</date>
|
||||
<authorinitials>sch</authorinitials>
|
||||
<revremark>Added generic hyperv driver
|
||||
</revremark>
|
||||
</revision>
|
||||
<revision>
|
||||
<revnumber>0.9</revnumber>
|
||||
<date>2009-07-16</date>
|
||||
@ -1033,6 +1040,61 @@ int main()
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="uio_hv_generic" xreflabel="Using Generic driver for Hyper-V VMBUS">
|
||||
<?dbhtml filename="uio_hv_generic.html"?>
|
||||
<title>Generic Hyper-V UIO driver</title>
|
||||
<para>
|
||||
The generic driver is a kernel module named uio_hv_generic.
|
||||
It supports devices on the Hyper-V VMBus similar to uio_pci_generic
|
||||
on PCI bus.
|
||||
</para>
|
||||
|
||||
<sect1 id="uio_hv_generic_binding">
|
||||
<title>Making the driver recognize the device</title>
|
||||
<para>
|
||||
Since the driver does not declare any device GUID's, it will not get loaded
|
||||
automatically and will not automatically bind to any devices, you must load it
|
||||
and allocate id to the driver yourself. For example, to use the network device
|
||||
GUID:
|
||||
<programlisting>
|
||||
modprobe uio_hv_generic
|
||||
echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
If there already is a hardware specific kernel driver for the device, the
|
||||
generic driver still won't bind to it, in this case if you want to use the
|
||||
generic driver (why would you?) you'll have to manually unbind the hardware
|
||||
specific driver and bind the generic driver, like this:
|
||||
<programlisting>
|
||||
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind
|
||||
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
You can verify that the device has been bound to the driver
|
||||
by looking for it in sysfs, for example like the following:
|
||||
<programlisting>
|
||||
ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver
|
||||
</programlisting>
|
||||
Which if successful should print
|
||||
<programlisting>
|
||||
.../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="uio_hv_generic_internals">
|
||||
<title>Things to know about uio_hv_generic</title>
|
||||
<para>
|
||||
On each interrupt, uio_hv_generic sets the Interrupt Disable bit.
|
||||
This prevents the device from generating further interrupts
|
||||
until the bit is cleared. The userspace driver should clear this
|
||||
bit before blocking and waiting for more interrupts.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<appendix id="app1">
|
||||
<title>Further information</title>
|
||||
<itemizedlist>
|
||||
|
@ -1,992 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Linux-USB-API">
|
||||
<bookinfo>
|
||||
<title>The Linux-USB Host Side API</title>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction to USB on Linux</title>
|
||||
|
||||
<para>A Universal Serial Bus (USB) is used to connect a host,
|
||||
such as a PC or workstation, to a number of peripheral
|
||||
devices. USB uses a tree structure, with the host as the
|
||||
root (the system's master), hubs as interior nodes, and
|
||||
peripherals as leaves (and slaves).
|
||||
Modern PCs support several such trees of USB devices, usually
|
||||
one USB 2.0 tree (480 Mbit/sec each) with
|
||||
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
|
||||
connect a USB 1.1 device directly to the machine's "root hub".
|
||||
</para>
|
||||
|
||||
<para>That master/slave asymmetry was designed-in for a number of
|
||||
reasons, one being ease of use. It is not physically possible to
|
||||
assemble (legal) USB cables incorrectly: all upstream "to the host"
|
||||
connectors are the rectangular type (matching the sockets on
|
||||
root hubs), and all downstream connectors are the squarish type
|
||||
(or they are built into the peripheral).
|
||||
Also, the host software doesn't need to deal with distributed
|
||||
auto-configuration since the pre-designated master node manages all that.
|
||||
And finally, at the electrical level, bus protocol overhead is reduced by
|
||||
eliminating arbitration and moving scheduling into the host software.
|
||||
</para>
|
||||
|
||||
<para>USB 1.0 was announced in January 1996 and was revised
|
||||
as USB 1.1 (with improvements in hub specification and
|
||||
support for interrupt-out transfers) in September 1998.
|
||||
USB 2.0 was released in April 2000, adding high-speed
|
||||
transfers and transaction-translating hubs (used for USB 1.1
|
||||
and 1.0 backward compatibility).
|
||||
</para>
|
||||
|
||||
<para>Kernel developers added USB support to Linux early in the 2.2 kernel
|
||||
series, shortly before 2.3 development forked. Updates from 2.3 were
|
||||
regularly folded back into 2.2 releases, which improved reliability and
|
||||
brought <filename>/sbin/hotplug</filename> support as well more drivers.
|
||||
Such improvements were continued in the 2.5 kernel series, where they added
|
||||
USB 2.0 support, improved performance, and made the host controller drivers
|
||||
(HCDs) more consistent. They also simplified the API (to make bugs less
|
||||
likely) and added internal "kerneldoc" documentation.
|
||||
</para>
|
||||
|
||||
<para>Linux can run inside USB devices as well as on
|
||||
the hosts that control the devices.
|
||||
But USB device drivers running inside those peripherals
|
||||
don't do the same things as the ones running inside hosts,
|
||||
so they've been given a different name:
|
||||
<emphasis>gadget drivers</emphasis>.
|
||||
This document does not cover gadget drivers.
|
||||
</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="host">
|
||||
<title>USB Host-Side API Model</title>
|
||||
|
||||
<para>Host-side drivers for USB devices talk to the "usbcore" APIs.
|
||||
There are two. One is intended for
|
||||
<emphasis>general-purpose</emphasis> drivers (exposed through
|
||||
driver frameworks), and the other is for drivers that are
|
||||
<emphasis>part of the core</emphasis>.
|
||||
Such core drivers include the <emphasis>hub</emphasis> driver
|
||||
(which manages trees of USB devices) and several different kinds
|
||||
of <emphasis>host controller drivers</emphasis>,
|
||||
which control individual busses.
|
||||
</para>
|
||||
|
||||
<para>The device model seen by USB drivers is relatively complex.
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>USB supports four kinds of data transfers
|
||||
(control, bulk, interrupt, and isochronous). Two of them (control
|
||||
and bulk) use bandwidth as it's available,
|
||||
while the other two (interrupt and isochronous)
|
||||
are scheduled to provide guaranteed bandwidth.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The device description model includes one or more
|
||||
"configurations" per device, only one of which is active at a time.
|
||||
Devices that are capable of high-speed operation must also support
|
||||
full-speed configurations, along with a way to ask about the
|
||||
"other speed" configurations which might be used.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Configurations have one or more "interfaces", each
|
||||
of which may have "alternate settings". Interfaces may be
|
||||
standardized by USB "Class" specifications, or may be specific to
|
||||
a vendor or device.</para>
|
||||
|
||||
<para>USB device drivers actually bind to interfaces, not devices.
|
||||
Think of them as "interface drivers", though you
|
||||
may not see many devices where the distinction is important.
|
||||
<emphasis>Most USB devices are simple, with only one configuration,
|
||||
one interface, and one alternate setting.</emphasis>
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Interfaces have one or more "endpoints", each of
|
||||
which supports one type and direction of data transfer such as
|
||||
"bulk out" or "interrupt in". The entire configuration may have
|
||||
up to sixteen endpoints in each direction, allocated as needed
|
||||
among all the interfaces.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Data transfer on USB is packetized; each endpoint
|
||||
has a maximum packet size.
|
||||
Drivers must often be aware of conventions such as flagging the end
|
||||
of bulk transfers using "short" (including zero length) packets.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The Linux USB API supports synchronous calls for
|
||||
control and bulk messages.
|
||||
It also supports asynchronous calls for all kinds of data transfer,
|
||||
using request structures called "URBs" (USB Request Blocks).
|
||||
</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>Accordingly, the USB Core API exposed to device drivers
|
||||
covers quite a lot of territory. You'll probably need to consult
|
||||
the USB 2.0 specification, available online from www.usb.org at
|
||||
no cost, as well as class or device specifications.
|
||||
</para>
|
||||
|
||||
<para>The only host-side drivers that actually touch hardware
|
||||
(reading/writing registers, handling IRQs, and so on) are the HCDs.
|
||||
In theory, all HCDs provide the same functionality through the same
|
||||
API. In practice, that's becoming more true on the 2.5 kernels,
|
||||
but there are still differences that crop up especially with
|
||||
fault handling. Different controllers don't necessarily report
|
||||
the same aspects of failures, and recovery from faults (including
|
||||
software-induced ones like unlinking an URB) isn't yet fully
|
||||
consistent.
|
||||
Device driver authors should make a point of doing disconnect
|
||||
testing (while the device is active) with each different host
|
||||
controller driver, to make sure drivers don't have bugs of
|
||||
their own as well as to make sure they aren't relying on some
|
||||
HCD-specific behavior.
|
||||
(You will need external USB 1.1 and/or
|
||||
USB 2.0 hubs to perform all those tests.)
|
||||
</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="types"><title>USB-Standard Types</title>
|
||||
|
||||
<para>In <filename><linux/usb/ch9.h></filename> you will find
|
||||
the USB data types defined in chapter 9 of the USB specification.
|
||||
These data types are used throughout USB, and in APIs including
|
||||
this host side API, gadget APIs, and usbfs.
|
||||
</para>
|
||||
|
||||
!Iinclude/linux/usb/ch9.h
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="hostside"><title>Host-Side Data Types and Macros</title>
|
||||
|
||||
<para>The host side API exposes several layers to drivers, some of
|
||||
which are more necessary than others.
|
||||
These support lifecycle models for host side drivers
|
||||
and devices, and support passing buffers through usbcore to
|
||||
some HCD that performs the I/O for the device driver.
|
||||
</para>
|
||||
|
||||
|
||||
!Iinclude/linux/usb.h
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="usbcore"><title>USB Core APIs</title>
|
||||
|
||||
<para>There are two basic I/O models in the USB API.
|
||||
The most elemental one is asynchronous: drivers submit requests
|
||||
in the form of an URB, and the URB's completion callback
|
||||
handle the next step.
|
||||
All USB transfer types support that model, although there
|
||||
are special cases for control URBs (which always have setup
|
||||
and status stages, but may not have a data stage) and
|
||||
isochronous URBs (which allow large packets and include
|
||||
per-packet fault reports).
|
||||
Built on top of that is synchronous API support, where a
|
||||
driver calls a routine that allocates one or more URBs,
|
||||
submits them, and waits until they complete.
|
||||
There are synchronous wrappers for single-buffer control
|
||||
and bulk transfers (which are awkward to use in some
|
||||
driver disconnect scenarios), and for scatterlist based
|
||||
streaming i/o (bulk or interrupt).
|
||||
</para>
|
||||
|
||||
<para>USB drivers need to provide buffers that can be
|
||||
used for DMA, although they don't necessarily need to
|
||||
provide the DMA mapping themselves.
|
||||
There are APIs to use used when allocating DMA buffers,
|
||||
which can prevent use of bounce buffers on some systems.
|
||||
In some cases, drivers may be able to rely on 64bit DMA
|
||||
to eliminate another kind of bounce buffer.
|
||||
</para>
|
||||
|
||||
!Edrivers/usb/core/urb.c
|
||||
!Edrivers/usb/core/message.c
|
||||
!Edrivers/usb/core/file.c
|
||||
!Edrivers/usb/core/driver.c
|
||||
!Edrivers/usb/core/usb.c
|
||||
!Edrivers/usb/core/hub.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="hcd"><title>Host Controller APIs</title>
|
||||
|
||||
<para>These APIs are only for use by host controller drivers,
|
||||
most of which implement standard register interfaces such as
|
||||
EHCI, OHCI, or UHCI.
|
||||
UHCI was one of the first interfaces, designed by Intel and
|
||||
also used by VIA; it doesn't do much in hardware.
|
||||
OHCI was designed later, to have the hardware do more work
|
||||
(bigger transfers, tracking protocol state, and so on).
|
||||
EHCI was designed with USB 2.0; its design has features that
|
||||
resemble OHCI (hardware does much more work) as well as
|
||||
UHCI (some parts of ISO support, TD list processing).
|
||||
</para>
|
||||
|
||||
<para>There are host controllers other than the "big three",
|
||||
although most PCI based controllers (and a few non-PCI based
|
||||
ones) use one of those interfaces.
|
||||
Not all host controllers use DMA; some use PIO, and there
|
||||
is also a simulator.
|
||||
</para>
|
||||
|
||||
<para>The same basic APIs are available to drivers for all
|
||||
those controllers.
|
||||
For historical reasons they are in two layers:
|
||||
<structname>struct usb_bus</structname> is a rather thin
|
||||
layer that became available in the 2.2 kernels, while
|
||||
<structname>struct usb_hcd</structname> is a more featureful
|
||||
layer (available in later 2.4 kernels and in 2.5) that
|
||||
lets HCDs share common code, to shrink driver size
|
||||
and significantly reduce hcd-specific behaviors.
|
||||
</para>
|
||||
|
||||
!Edrivers/usb/core/hcd.c
|
||||
!Edrivers/usb/core/hcd-pci.c
|
||||
!Idrivers/usb/core/buffer.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="usbfs">
|
||||
<title>The USB Filesystem (usbfs)</title>
|
||||
|
||||
<para>This chapter presents the Linux <emphasis>usbfs</emphasis>.
|
||||
You may prefer to avoid writing new kernel code for your
|
||||
USB driver; that's the problem that usbfs set out to solve.
|
||||
User mode device drivers are usually packaged as applications
|
||||
or libraries, and may use usbfs through some programming library
|
||||
that wraps it. Such libraries include
|
||||
<ulink url="http://libusb.sourceforge.net">libusb</ulink>
|
||||
for C/C++, and
|
||||
<ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java.
|
||||
</para>
|
||||
|
||||
<note><title>Unfinished</title>
|
||||
<para>This particular documentation is incomplete,
|
||||
especially with respect to the asynchronous mode.
|
||||
As of kernel 2.5.66 the code and this (new) documentation
|
||||
need to be cross-reviewed.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>Configure usbfs into Linux kernels by enabling the
|
||||
<emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS),
|
||||
and you get basic support for user mode USB device drivers.
|
||||
Until relatively recently it was often (confusingly) called
|
||||
<emphasis>usbdevfs</emphasis> although it wasn't solving what
|
||||
<emphasis>devfs</emphasis> was.
|
||||
Every USB device will appear in usbfs, regardless of whether or
|
||||
not it has a kernel driver.
|
||||
</para>
|
||||
|
||||
<sect1 id="usbfs-files">
|
||||
<title>What files are in "usbfs"?</title>
|
||||
|
||||
<para>Conventionally mounted at
|
||||
<filename>/proc/bus/usb</filename>, usbfs
|
||||
features include:
|
||||
<itemizedlist>
|
||||
<listitem><para><filename>/proc/bus/usb/devices</filename>
|
||||
... a text file
|
||||
showing each of the USB devices on known to the kernel,
|
||||
and their configuration descriptors.
|
||||
You can also poll() this to learn about new devices.
|
||||
</para></listitem>
|
||||
<listitem><para><filename>/proc/bus/usb/BBB/DDD</filename>
|
||||
... magic files
|
||||
exposing the each device's configuration descriptors, and
|
||||
supporting a series of ioctls for making device requests,
|
||||
including I/O to devices. (Purely for access by programs.)
|
||||
</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para> Each bus is given a number (BBB) based on when it was
|
||||
enumerated; within each bus, each device is given a similar
|
||||
number (DDD).
|
||||
Those BBB/DDD paths are not "stable" identifiers;
|
||||
expect them to change even if you always leave the devices
|
||||
plugged in to the same hub port.
|
||||
<emphasis>Don't even think of saving these in application
|
||||
configuration files.</emphasis>
|
||||
Stable identifiers are available, for user mode applications
|
||||
that want to use them. HID and networking devices expose
|
||||
these stable IDs, so that for example you can be sure that
|
||||
you told the right UPS to power down its second server.
|
||||
"usbfs" doesn't (yet) expose those IDs.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-fstab">
|
||||
<title>Mounting and Access Control</title>
|
||||
|
||||
<para>There are a number of mount options for usbfs, which will
|
||||
be of most interest to you if you need to override the default
|
||||
access control policy.
|
||||
That policy is that only root may read or write device files
|
||||
(<filename>/proc/bus/BBB/DDD</filename>) although anyone may read
|
||||
the <filename>devices</filename>
|
||||
or <filename>drivers</filename> files.
|
||||
I/O requests to the device also need the CAP_SYS_RAWIO capability,
|
||||
</para>
|
||||
|
||||
<para>The significance of that is that by default, all user mode
|
||||
device drivers need super-user privileges.
|
||||
You can change modes or ownership in a driver setup
|
||||
when the device hotplugs, or maye just start the
|
||||
driver right then, as a privileged server (or some activity
|
||||
within one).
|
||||
That's the most secure approach for multi-user systems,
|
||||
but for single user systems ("trusted" by that user)
|
||||
it's more convenient just to grant everyone all access
|
||||
(using the <emphasis>devmode=0666</emphasis> option)
|
||||
so the driver can start whenever it's needed.
|
||||
</para>
|
||||
|
||||
<para>The mount options for usbfs, usable in /etc/fstab or
|
||||
in command line invocations of <emphasis>mount</emphasis>, are:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term><emphasis>busgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>busmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0555)
|
||||
</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0)</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>devmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0644)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0)</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>listmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0444)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0)</para></listitem></varlistentry>
|
||||
</variablelist>
|
||||
|
||||
</para>
|
||||
|
||||
<para>Note that many Linux distributions hard-wire the mount options
|
||||
for usbfs in their init scripts, such as
|
||||
<filename>/etc/rc.d/rc.sysinit</filename>,
|
||||
rather than making it easy to set this per-system
|
||||
policy in <filename>/etc/fstab</filename>.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-devices">
|
||||
<title>/proc/bus/usb/devices</title>
|
||||
|
||||
<para>This file is handy for status viewing tools in user
|
||||
mode, which can scan the text format and ignore most of it.
|
||||
More detailed device status (including class and vendor
|
||||
status) is available from device-specific files.
|
||||
For information about the current format of this file,
|
||||
see the
|
||||
<filename>Documentation/usb/proc_usb_info.txt</filename>
|
||||
file in your Linux kernel sources.
|
||||
</para>
|
||||
|
||||
<para>This file, in combination with the poll() system call, can
|
||||
also be used to detect when devices are added or removed:
|
||||
<programlisting>int fd;
|
||||
struct pollfd pfd;
|
||||
|
||||
fd = open("/proc/bus/usb/devices", O_RDONLY);
|
||||
pfd = { fd, POLLIN, 0 };
|
||||
for (;;) {
|
||||
/* The first time through, this call will return immediately. */
|
||||
poll(&pfd, 1, -1);
|
||||
|
||||
/* To see what's changed, compare the file's previous and current
|
||||
contents or scan the filesystem. (Scanning is more precise.) */
|
||||
}</programlisting>
|
||||
Note that this behavior is intended to be used for informational
|
||||
and debug purposes. It would be more appropriate to use programs
|
||||
such as udev or HAL to initialize a device or start a user-mode
|
||||
helper program, for instance.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-bbbddd">
|
||||
<title>/proc/bus/usb/BBB/DDD</title>
|
||||
|
||||
<para>Use these files in one of these basic ways:
|
||||
</para>
|
||||
|
||||
<para><emphasis>They can be read,</emphasis>
|
||||
producing first the device descriptor
|
||||
(18 bytes) and then the descriptors for the current configuration.
|
||||
See the USB 2.0 spec for details about those binary data formats.
|
||||
You'll need to convert most multibyte values from little endian
|
||||
format to your native host byte order, although a few of the
|
||||
fields in the device descriptor (both of the BCD-encoded fields,
|
||||
and the vendor and product IDs) will be byteswapped for you.
|
||||
Note that configuration descriptors include descriptors for
|
||||
interfaces, altsettings, endpoints, and maybe additional
|
||||
class descriptors.
|
||||
</para>
|
||||
|
||||
<para><emphasis>Perform USB operations</emphasis> using
|
||||
<emphasis>ioctl()</emphasis> requests to make endpoint I/O
|
||||
requests (synchronously or asynchronously) or manage
|
||||
the device.
|
||||
These requests need the CAP_SYS_RAWIO capability,
|
||||
as well as filesystem access permissions.
|
||||
Only one ioctl request can be made on one of these
|
||||
device files at a time.
|
||||
This means that if you are synchronously reading an endpoint
|
||||
from one thread, you won't be able to write to a different
|
||||
endpoint from another thread until the read completes.
|
||||
This works for <emphasis>half duplex</emphasis> protocols,
|
||||
but otherwise you'd use asynchronous i/o requests.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
<sect1 id="usbfs-lifecycle">
|
||||
<title>Life Cycle of User Mode Drivers</title>
|
||||
|
||||
<para>Such a driver first needs to find a device file
|
||||
for a device it knows how to handle.
|
||||
Maybe it was told about it because a
|
||||
<filename>/sbin/hotplug</filename> event handling agent
|
||||
chose that driver to handle the new device.
|
||||
Or maybe it's an application that scans all the
|
||||
/proc/bus/usb device files, and ignores most devices.
|
||||
In either case, it should <function>read()</function> all
|
||||
the descriptors from the device file,
|
||||
and check them against what it knows how to handle.
|
||||
It might just reject everything except a particular
|
||||
vendor and product ID, or need a more complex policy.
|
||||
</para>
|
||||
|
||||
<para>Never assume there will only be one such device
|
||||
on the system at a time!
|
||||
If your code can't handle more than one device at
|
||||
a time, at least detect when there's more than one, and
|
||||
have your users choose which device to use.
|
||||
</para>
|
||||
|
||||
<para>Once your user mode driver knows what device to use,
|
||||
it interacts with it in either of two styles.
|
||||
The simple style is to make only control requests; some
|
||||
devices don't need more complex interactions than those.
|
||||
(An example might be software using vendor-specific control
|
||||
requests for some initialization or configuration tasks,
|
||||
with a kernel driver for the rest.)
|
||||
</para>
|
||||
|
||||
<para>More likely, you need a more complex style driver:
|
||||
one using non-control endpoints, reading or writing data
|
||||
and claiming exclusive use of an interface.
|
||||
<emphasis>Bulk</emphasis> transfers are easiest to use,
|
||||
but only their sibling <emphasis>interrupt</emphasis> transfers
|
||||
work with low speed devices.
|
||||
Both interrupt and <emphasis>isochronous</emphasis> transfers
|
||||
offer service guarantees because their bandwidth is reserved.
|
||||
Such "periodic" transfers are awkward to use through usbfs,
|
||||
unless you're using the asynchronous calls. However, interrupt
|
||||
transfers can also be used in a synchronous "one shot" style.
|
||||
</para>
|
||||
|
||||
<para>Your user-mode driver should never need to worry
|
||||
about cleaning up request state when the device is
|
||||
disconnected, although it should close its open file
|
||||
descriptors as soon as it starts seeing the ENODEV
|
||||
errors.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title>
|
||||
|
||||
<para>To use these ioctls, you need to include the following
|
||||
headers in your userspace program:
|
||||
<programlisting>#include <linux/usb.h>
|
||||
#include <linux/usbdevice_fs.h>
|
||||
#include <asm/byteorder.h></programlisting>
|
||||
The standard USB device model requests, from "Chapter 9" of
|
||||
the USB 2.0 specification, are automatically included from
|
||||
the <filename><linux/usb/ch9.h></filename> header.
|
||||
</para>
|
||||
|
||||
<para>Unless noted otherwise, the ioctl requests
|
||||
described here will
|
||||
update the modification time on the usbfs file to which
|
||||
they are applied (unless they fail).
|
||||
A return of zero indicates success; otherwise, a
|
||||
standard USB error code is returned. (These are
|
||||
documented in
|
||||
<filename>Documentation/usb/error-codes.txt</filename>
|
||||
in your kernel sources.)
|
||||
</para>
|
||||
|
||||
<para>Each of these files multiplexes access to several
|
||||
I/O streams, one per endpoint.
|
||||
Each device has one control endpoint (endpoint zero)
|
||||
which supports a limited RPC style RPC access.
|
||||
Devices are configured
|
||||
by hub_wq (in the kernel) setting a device-wide
|
||||
<emphasis>configuration</emphasis> that affects things
|
||||
like power consumption and basic functionality.
|
||||
The endpoints are part of USB <emphasis>interfaces</emphasis>,
|
||||
which may have <emphasis>altsettings</emphasis>
|
||||
affecting things like which endpoints are available.
|
||||
Many devices only have a single configuration and interface,
|
||||
so drivers for them will ignore configurations and altsettings.
|
||||
</para>
|
||||
|
||||
|
||||
<sect2 id="usbfs-mgmt">
|
||||
<title>Management/Status Requests</title>
|
||||
|
||||
<para>A number of usbfs requests don't deal very directly
|
||||
with device I/O.
|
||||
They mostly relate to device management and status.
|
||||
These are all synchronous requests.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CLAIMINTERFACE</term>
|
||||
<listitem><para>This is used to force usbfs to
|
||||
claim a specific interface,
|
||||
which has not previously been claimed by usbfs or any other
|
||||
kernel driver.
|
||||
The ioctl parameter is an integer holding the number of
|
||||
the interface (bInterfaceNumber from descriptor).
|
||||
</para><para>
|
||||
Note that if your driver doesn't claim an interface
|
||||
before trying to use one of its endpoints, and no
|
||||
other driver has bound to it, then the interface is
|
||||
automatically claimed by usbfs.
|
||||
</para><para>
|
||||
This claim will be released by a RELEASEINTERFACE ioctl,
|
||||
or by closing the file descriptor.
|
||||
File modification time is not updated by this request.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CONNECTINFO</term>
|
||||
<listitem><para>Says whether the device is lowspeed.
|
||||
The ioctl parameter points to a structure like this:
|
||||
<programlisting>struct usbdevfs_connectinfo {
|
||||
unsigned int devnum;
|
||||
unsigned char slow;
|
||||
}; </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
<emphasis>You can't tell whether a "not slow"
|
||||
device is connected at high speed (480 MBit/sec)
|
||||
or just full speed (12 MBit/sec).</emphasis>
|
||||
You should know the devnum value already,
|
||||
it's the DDD value of the device file name.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_GETDRIVER</term>
|
||||
<listitem><para>Returns the name of the kernel driver
|
||||
bound to a given interface (a string). Parameter
|
||||
is a pointer to this structure, which is modified:
|
||||
<programlisting>struct usbdevfs_getdriver {
|
||||
unsigned int interface;
|
||||
char driver[USBDEVFS_MAXDRIVERNAME + 1];
|
||||
};</programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_IOCTL</term>
|
||||
<listitem><para>Passes a request from userspace through
|
||||
to a kernel driver that has an ioctl entry in the
|
||||
<emphasis>struct usb_driver</emphasis> it registered.
|
||||
<programlisting>struct usbdevfs_ioctl {
|
||||
int ifno;
|
||||
int ioctl_code;
|
||||
void *data;
|
||||
};
|
||||
|
||||
/* user mode call looks like this.
|
||||
* 'request' becomes the driver->ioctl() 'code' parameter.
|
||||
* the size of 'param' is encoded in 'request', and that data
|
||||
* is copied to or from the driver->ioctl() 'buf' parameter.
|
||||
*/
|
||||
static int
|
||||
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
|
||||
{
|
||||
struct usbdevfs_ioctl wrapper;
|
||||
|
||||
wrapper.ifno = ifno;
|
||||
wrapper.ioctl_code = request;
|
||||
wrapper.data = param;
|
||||
|
||||
return ioctl (fd, USBDEVFS_IOCTL, &wrapper);
|
||||
} </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
This request lets kernel drivers talk to user mode code
|
||||
through filesystem operations even when they don't create
|
||||
a character or block special device.
|
||||
It's also been used to do things like ask devices what
|
||||
device special file should be used.
|
||||
Two pre-defined ioctls are used
|
||||
to disconnect and reconnect kernel drivers, so
|
||||
that user mode code can completely manage binding
|
||||
and configuration of devices.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RELEASEINTERFACE</term>
|
||||
<listitem><para>This is used to release the claim usbfs
|
||||
made on interface, either implicitly or because of a
|
||||
USBDEVFS_CLAIMINTERFACE call, before the file
|
||||
descriptor is closed.
|
||||
The ioctl parameter is an integer holding the number of
|
||||
the interface (bInterfaceNumber from descriptor);
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>No security check is made to ensure
|
||||
that the task which made the claim is the one
|
||||
which is releasing it.
|
||||
This means that user mode driver may interfere
|
||||
other ones. </emphasis>
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RESETEP</term>
|
||||
<listitem><para>Resets the data toggle value for an endpoint
|
||||
(bulk or interrupt) to DATA0.
|
||||
The ioctl parameter is an integer endpoint number
|
||||
(1 to 15, as identified in the endpoint descriptor),
|
||||
with USB_DIR_IN added if the device's endpoint sends
|
||||
data to the host.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this request.
|
||||
It should probably be removed.</emphasis>
|
||||
Using it typically means the device and driver will lose
|
||||
toggle synchronization. If you really lost synchronization,
|
||||
you likely need to completely handshake with the device,
|
||||
using a request like CLEAR_HALT
|
||||
or SET_INTERFACE.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DROP_PRIVILEGES</term>
|
||||
<listitem><para>This is used to relinquish the ability
|
||||
to do certain operations which are considered to be
|
||||
privileged on a usbfs file descriptor.
|
||||
This includes claiming arbitrary interfaces, resetting
|
||||
a device on which there are currently claimed interfaces
|
||||
from other users, and issuing USBDEVFS_IOCTL calls.
|
||||
The ioctl parameter is a 32 bit mask of interfaces
|
||||
the user is allowed to claim on this file descriptor.
|
||||
You may issue this ioctl more than one time to narrow
|
||||
said mask.
|
||||
</para></listitem></varlistentry>
|
||||
</variablelist>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="usbfs-sync">
|
||||
<title>Synchronous I/O Support</title>
|
||||
|
||||
<para>Synchronous requests involve the kernel blocking
|
||||
until the user mode request completes, either by
|
||||
finishing successfully or by reporting an error.
|
||||
In most cases this is the simplest way to use usbfs,
|
||||
although as noted above it does prevent performing I/O
|
||||
to more than one endpoint at a time.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_BULK</term>
|
||||
<listitem><para>Issues a bulk read or write request to the
|
||||
device.
|
||||
The ioctl parameter is a pointer to this structure:
|
||||
<programlisting>struct usbdevfs_bulktransfer {
|
||||
unsigned int ep;
|
||||
unsigned int len;
|
||||
unsigned int timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};</programlisting>
|
||||
</para><para>The "ep" value identifies a
|
||||
bulk endpoint number (1 to 15, as identified in an endpoint
|
||||
descriptor),
|
||||
masked with USB_DIR_IN when referring to an endpoint which
|
||||
sends data to the host from the device.
|
||||
The length of the data buffer is identified by "len";
|
||||
Recent kernels support requests up to about 128KBytes.
|
||||
<emphasis>FIXME say how read length is returned,
|
||||
and how short reads are handled.</emphasis>.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CLEAR_HALT</term>
|
||||
<listitem><para>Clears endpoint halt (stall) and
|
||||
resets the endpoint toggle. This is only
|
||||
meaningful for bulk or interrupt endpoints.
|
||||
The ioctl parameter is an integer endpoint number
|
||||
(1 to 15, as identified in an endpoint descriptor),
|
||||
masked with USB_DIR_IN when referring to an endpoint which
|
||||
sends data to the host from the device.
|
||||
</para><para>
|
||||
Use this on bulk or interrupt endpoints which have
|
||||
stalled, returning <emphasis>-EPIPE</emphasis> status
|
||||
to a data transfer request.
|
||||
Do not issue the control request directly, since
|
||||
that could invalidate the host's record of the
|
||||
data toggle.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CONTROL</term>
|
||||
<listitem><para>Issues a control request to the device.
|
||||
The ioctl parameter points to a structure like this:
|
||||
<programlisting>struct usbdevfs_ctrltransfer {
|
||||
__u8 bRequestType;
|
||||
__u8 bRequest;
|
||||
__u16 wValue;
|
||||
__u16 wIndex;
|
||||
__u16 wLength;
|
||||
__u32 timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};</programlisting>
|
||||
</para><para>
|
||||
The first eight bytes of this structure are the contents
|
||||
of the SETUP packet to be sent to the device; see the
|
||||
USB 2.0 specification for details.
|
||||
The bRequestType value is composed by combining a
|
||||
USB_TYPE_* value, a USB_DIR_* value, and a
|
||||
USB_RECIP_* value (from
|
||||
<emphasis><linux/usb.h></emphasis>).
|
||||
If wLength is nonzero, it describes the length of the data
|
||||
buffer, which is either written to the device
|
||||
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
|
||||
</para><para>
|
||||
At this writing, you can't transfer more than 4 KBytes
|
||||
of data to or from a device; usbfs has a limit, and
|
||||
some host controller drivers have a limit.
|
||||
(That's not usually a problem.)
|
||||
<emphasis>Also</emphasis> there's no way to say it's
|
||||
not OK to get a short read back from the device.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RESET</term>
|
||||
<listitem><para>Does a USB level device reset.
|
||||
The ioctl parameter is ignored.
|
||||
After the reset, this rebinds all device interfaces.
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this call</emphasis>
|
||||
until some usbcore bugs get fixed,
|
||||
since it does not fully synchronize device, interface,
|
||||
and driver (not just usbfs) state.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SETINTERFACE</term>
|
||||
<listitem><para>Sets the alternate setting for an
|
||||
interface. The ioctl parameter is a pointer to a
|
||||
structure like this:
|
||||
<programlisting>struct usbdevfs_setinterface {
|
||||
unsigned int interface;
|
||||
unsigned int altsetting;
|
||||
}; </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
Those struct members are from some interface descriptor
|
||||
applying to the current configuration.
|
||||
The interface number is the bInterfaceNumber value, and
|
||||
the altsetting number is the bAlternateSetting value.
|
||||
(This resets each endpoint in the interface.)
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SETCONFIGURATION</term>
|
||||
<listitem><para>Issues the
|
||||
<function>usb_set_configuration</function> call
|
||||
for the device.
|
||||
The parameter is an integer holding the number of
|
||||
a configuration (bConfigurationValue from descriptor).
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this call</emphasis>
|
||||
until some usbcore bugs get fixed,
|
||||
since it does not fully synchronize device, interface,
|
||||
and driver (not just usbfs) state.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="usbfs-async">
|
||||
<title>Asynchronous I/O Support</title>
|
||||
|
||||
<para>As mentioned above, there are situations where it may be
|
||||
important to initiate concurrent operations from user mode code.
|
||||
This is particularly important for periodic transfers
|
||||
(interrupt and isochronous), but it can be used for other
|
||||
kinds of USB requests too.
|
||||
In such cases, the asynchronous requests described here
|
||||
are essential. Rather than submitting one request and having
|
||||
the kernel block until it completes, the blocking is separate.
|
||||
</para>
|
||||
|
||||
<para>These requests are packaged into a structure that
|
||||
resembles the URB used by kernel device drivers.
|
||||
(No POSIX Async I/O support here, sorry.)
|
||||
It identifies the endpoint type (USBDEVFS_URB_TYPE_*),
|
||||
endpoint (number, masked with USB_DIR_IN as appropriate),
|
||||
buffer and length, and a user "context" value serving to
|
||||
uniquely identify each request.
|
||||
(It's usually a pointer to per-request data.)
|
||||
Flags can modify requests (not as many as supported for
|
||||
kernel drivers).
|
||||
</para>
|
||||
|
||||
<para>Each request can specify a realtime signal number
|
||||
(between SIGRTMIN and SIGRTMAX, inclusive) to request a
|
||||
signal be sent when the request completes.
|
||||
</para>
|
||||
|
||||
<para>When usbfs returns these urbs, the status value
|
||||
is updated, and the buffer may have been modified.
|
||||
Except for isochronous transfers, the actual_length is
|
||||
updated to say how many bytes were transferred; if the
|
||||
USBDEVFS_URB_DISABLE_SPD flag is set
|
||||
("short packets are not OK"), if fewer bytes were read
|
||||
than were requested then you get an error report.
|
||||
</para>
|
||||
|
||||
<programlisting>struct usbdevfs_iso_packet_desc {
|
||||
unsigned int length;
|
||||
unsigned int actual_length;
|
||||
unsigned int status;
|
||||
};
|
||||
|
||||
struct usbdevfs_urb {
|
||||
unsigned char type;
|
||||
unsigned char endpoint;
|
||||
int status;
|
||||
unsigned int flags;
|
||||
void *buffer;
|
||||
int buffer_length;
|
||||
int actual_length;
|
||||
int start_frame;
|
||||
int number_of_packets;
|
||||
int error_count;
|
||||
unsigned int signr;
|
||||
void *usercontext;
|
||||
struct usbdevfs_iso_packet_desc iso_frame_desc[];
|
||||
};</programlisting>
|
||||
|
||||
<para> For these asynchronous requests, the file modification
|
||||
time reflects when the request was initiated.
|
||||
This contrasts with their use with the synchronous requests,
|
||||
where it reflects when requests complete.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DISCARDURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DISCSIGNAL</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_REAPURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_REAPURBNDELAY</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SUBMITURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
</book>
|
||||
<!-- vim:syntax=sgml:sw=4
|
||||
-->
|
File diff suppressed because it is too large
Load Diff
@ -10,6 +10,8 @@ _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(src
|
||||
SPHINX_CONF = conf.py
|
||||
PAPER =
|
||||
BUILDDIR = $(obj)/output
|
||||
PDFLATEX = xelatex
|
||||
LATEXOPTS = -interaction=batchmode
|
||||
|
||||
# User-friendly check for sphinx-build
|
||||
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
@ -29,7 +31,7 @@ else ifneq ($(DOCBOOKS),)
|
||||
else # HAVE_SPHINX
|
||||
|
||||
# User-friendly check for pdflatex
|
||||
HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
|
||||
# Internal variables.
|
||||
PAPEROPT_a4 = -D latex_paper_size=a4
|
||||
@ -51,8 +53,8 @@ loop_cmd = $(echo-cmd) $(cmd_$(1))
|
||||
# $5 reST source folder relative to $(srctree)/$(src),
|
||||
# e.g. "media" for the linux-tv book-set at ./Documentation/media
|
||||
|
||||
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4);
|
||||
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\
|
||||
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
|
||||
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2;\
|
||||
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
|
||||
$(SPHINXBUILD) \
|
||||
-b $2 \
|
||||
@ -67,16 +69,19 @@ htmldocs:
|
||||
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
|
||||
|
||||
latexdocs:
|
||||
ifeq ($(HAVE_PDFLATEX),0)
|
||||
$(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
|
||||
@echo " SKIP Sphinx $@ target."
|
||||
else # HAVE_PDFLATEX
|
||||
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
|
||||
endif # HAVE_PDFLATEX
|
||||
|
||||
ifeq ($(HAVE_PDFLATEX),0)
|
||||
|
||||
pdfdocs:
|
||||
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
|
||||
@echo " SKIP Sphinx $@ target."
|
||||
|
||||
else # HAVE_PDFLATEX
|
||||
|
||||
pdfdocs: latexdocs
|
||||
ifneq ($(HAVE_PDFLATEX),0)
|
||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex)
|
||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex;)
|
||||
|
||||
endif # HAVE_PDFLATEX
|
||||
|
||||
epubdocs:
|
||||
@ -93,6 +98,7 @@ installmandocs:
|
||||
|
||||
cleandocs:
|
||||
$(Q)rm -rf $(BUILDDIR)
|
||||
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) -C Documentation/media clean
|
||||
|
||||
endif # HAVE_SPHINX
|
||||
|
||||
|
@ -1,841 +1 @@
|
||||
.. _submittingpatches:
|
||||
|
||||
How to Get Your Change Into the Linux Kernel or Care And Operation Of Your Linus Torvalds
|
||||
=========================================================================================
|
||||
|
||||
For a person or company who wishes to submit a change to the Linux
|
||||
kernel, the process can sometimes be daunting if you're not familiar
|
||||
with "the system." This text is a collection of suggestions which
|
||||
can greatly increase the chances of your change being accepted.
|
||||
|
||||
This document contains a large number of suggestions in a relatively terse
|
||||
format. For detailed information on how the kernel development process
|
||||
works, see :ref:`Documentation/development-process <development_process_main>`.
|
||||
Also, read :ref:`Documentation/SubmitChecklist <submitchecklist>`
|
||||
for a list of items to check before
|
||||
submitting code. If you are submitting a driver, also read
|
||||
:ref:`Documentation/SubmittingDrivers <submittingdrivers>`;
|
||||
for device tree binding patches, read
|
||||
Documentation/devicetree/bindings/submitting-patches.txt.
|
||||
|
||||
Many of these steps describe the default behavior of the ``git`` version
|
||||
control system; if you use ``git`` to prepare your patches, you'll find much
|
||||
of the mechanical work done for you, though you'll still need to prepare
|
||||
and document a sensible set of patches. In general, use of ``git`` will make
|
||||
your life as a kernel developer easier.
|
||||
|
||||
Creating and Sending your Change
|
||||
********************************
|
||||
|
||||
|
||||
0) Obtain a current source tree
|
||||
-------------------------------
|
||||
|
||||
If you do not have a repository with the current kernel source handy, use
|
||||
``git`` to obtain one. You'll want to start with the mainline repository,
|
||||
which can be grabbed with::
|
||||
|
||||
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
|
||||
|
||||
Note, however, that you may not want to develop against the mainline tree
|
||||
directly. Most subsystem maintainers run their own trees and want to see
|
||||
patches prepared against those trees. See the **T:** entry for the subsystem
|
||||
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
|
||||
the tree is not listed there.
|
||||
|
||||
It is still possible to download kernel releases via tarballs (as described
|
||||
in the next section), but that is the hard way to do kernel development.
|
||||
|
||||
1) ``diff -up``
|
||||
---------------
|
||||
|
||||
If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN``
|
||||
to create patches. Git generates patches in this form by default; if
|
||||
you're using ``git``, you can skip this section entirely.
|
||||
|
||||
All changes to the Linux kernel occur in the form of patches, as
|
||||
generated by :manpage:`diff(1)`. When creating your patch, make sure to
|
||||
create it in "unified diff" format, as supplied by the ``-u`` argument
|
||||
to :manpage:`diff(1)`.
|
||||
Also, please use the ``-p`` argument which shows which C function each
|
||||
change is in - that makes the resultant ``diff`` a lot easier to read.
|
||||
Patches should be based in the root kernel source directory,
|
||||
not in any lower subdirectory.
|
||||
|
||||
To create a patch for a single file, it is often sufficient to do::
|
||||
|
||||
SRCTREE= linux
|
||||
MYFILE= drivers/net/mydriver.c
|
||||
|
||||
cd $SRCTREE
|
||||
cp $MYFILE $MYFILE.orig
|
||||
vi $MYFILE # make your change
|
||||
cd ..
|
||||
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
|
||||
|
||||
To create a patch for multiple files, you should unpack a "vanilla",
|
||||
or unmodified kernel source tree, and generate a ``diff`` against your
|
||||
own source tree. For example::
|
||||
|
||||
MYSRC= /devel/linux
|
||||
|
||||
tar xvfz linux-3.19.tar.gz
|
||||
mv linux-3.19 linux-3.19-vanilla
|
||||
diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
|
||||
linux-3.19-vanilla $MYSRC > /tmp/patch
|
||||
|
||||
``dontdiff`` is a list of files which are generated by the kernel during
|
||||
the build process, and should be ignored in any :manpage:`diff(1)`-generated
|
||||
patch.
|
||||
|
||||
Make sure your patch does not include any extra files which do not
|
||||
belong in a patch submission. Make sure to review your patch -after-
|
||||
generating it with :manpage:`diff(1)`, to ensure accuracy.
|
||||
|
||||
If your changes produce a lot of deltas, you need to split them into
|
||||
individual patches which modify things in logical stages; see
|
||||
:ref:`split_changes`. This will facilitate review by other kernel developers,
|
||||
very important if you want your patch accepted.
|
||||
|
||||
If you're using ``git``, ``git rebase -i`` can help you with this process. If
|
||||
you're not using ``git``, ``quilt`` <http://savannah.nongnu.org/projects/quilt>
|
||||
is another popular alternative.
|
||||
|
||||
.. _describe_changes:
|
||||
|
||||
2) Describe your changes
|
||||
------------------------
|
||||
|
||||
Describe your problem. Whether your patch is a one-line bug fix or
|
||||
5000 lines of a new feature, there must be an underlying problem that
|
||||
motivated you to do this work. Convince the reviewer that there is a
|
||||
problem worth fixing and that it makes sense for them to read past the
|
||||
first paragraph.
|
||||
|
||||
Describe user-visible impact. Straight up crashes and lockups are
|
||||
pretty convincing, but not all bugs are that blatant. Even if the
|
||||
problem was spotted during code review, describe the impact you think
|
||||
it can have on users. Keep in mind that the majority of Linux
|
||||
installations run kernels from secondary stable trees or
|
||||
vendor/product-specific trees that cherry-pick only specific patches
|
||||
from upstream, so include anything that could help route your change
|
||||
downstream: provoking circumstances, excerpts from dmesg, crash
|
||||
descriptions, performance regressions, latency spikes, lockups, etc.
|
||||
|
||||
Quantify optimizations and trade-offs. If you claim improvements in
|
||||
performance, memory consumption, stack footprint, or binary size,
|
||||
include numbers that back them up. But also describe non-obvious
|
||||
costs. Optimizations usually aren't free but trade-offs between CPU,
|
||||
memory, and readability; or, when it comes to heuristics, between
|
||||
different workloads. Describe the expected downsides of your
|
||||
optimization so that the reviewer can weigh costs against benefits.
|
||||
|
||||
Once the problem is established, describe what you are actually doing
|
||||
about it in technical detail. It's important to describe the change
|
||||
in plain English for the reviewer to verify that the code is behaving
|
||||
as you intend it to.
|
||||
|
||||
The maintainer will thank you if you write your patch description in a
|
||||
form which can be easily pulled into Linux's source code management
|
||||
system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`.
|
||||
|
||||
Solve only one problem per patch. If your description starts to get
|
||||
long, that's a sign that you probably need to split up your patch.
|
||||
See :ref:`split_changes`.
|
||||
|
||||
When you submit or resubmit a patch or patch series, include the
|
||||
complete patch description and justification for it. Don't just
|
||||
say that this is version N of the patch (series). Don't expect the
|
||||
subsystem maintainer to refer back to earlier patch versions or referenced
|
||||
URLs to find the patch description and put that into the patch.
|
||||
I.e., the patch (series) and its description should be self-contained.
|
||||
This benefits both the maintainers and reviewers. Some reviewers
|
||||
probably didn't even receive earlier versions of the patch.
|
||||
|
||||
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
|
||||
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
|
||||
to do frotz", as if you are giving orders to the codebase to change
|
||||
its behaviour.
|
||||
|
||||
If the patch fixes a logged bug entry, refer to that bug entry by
|
||||
number and URL. If the patch follows from a mailing list discussion,
|
||||
give a URL to the mailing list archive; use the https://lkml.kernel.org/
|
||||
redirector with a ``Message-Id``, to ensure that the links cannot become
|
||||
stale.
|
||||
|
||||
However, try to make your explanation understandable without external
|
||||
resources. In addition to giving a URL to a mailing list archive or
|
||||
bug, summarize the relevant points of the discussion that led to the
|
||||
patch as submitted.
|
||||
|
||||
If you want to refer to a specific commit, don't just refer to the
|
||||
SHA-1 ID of the commit. Please also include the oneline summary of
|
||||
the commit, to make it easier for reviewers to know what it is about.
|
||||
Example::
|
||||
|
||||
Commit e21d2170f36602ae2708 ("video: remove unnecessary
|
||||
platform_set_drvdata()") removed the unnecessary
|
||||
platform_set_drvdata(), but left the variable "dev" unused,
|
||||
delete it.
|
||||
|
||||
You should also be sure to use at least the first twelve characters of the
|
||||
SHA-1 ID. The kernel repository holds a *lot* of objects, making
|
||||
collisions with shorter IDs a real possibility. Bear in mind that, even if
|
||||
there is no collision with your six-character ID now, that condition may
|
||||
change five years from now.
|
||||
|
||||
If your patch fixes a bug in a specific commit, e.g. you found an issue using
|
||||
``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
|
||||
the SHA-1 ID, and the one line summary. For example::
|
||||
|
||||
Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
|
||||
|
||||
The following ``git config`` settings can be used to add a pretty format for
|
||||
outputting the above style in the ``git log`` or ``git show`` commands::
|
||||
|
||||
[core]
|
||||
abbrev = 12
|
||||
[pretty]
|
||||
fixes = Fixes: %h (\"%s\")
|
||||
|
||||
.. _split_changes:
|
||||
|
||||
3) Separate your changes
|
||||
------------------------
|
||||
|
||||
Separate each **logical change** into a separate patch.
|
||||
|
||||
For example, if your changes include both bug fixes and performance
|
||||
enhancements for a single driver, separate those changes into two
|
||||
or more patches. If your changes include an API update, and a new
|
||||
driver which uses that new API, separate those into two patches.
|
||||
|
||||
On the other hand, if you make a single change to numerous files,
|
||||
group those changes into a single patch. Thus a single logical change
|
||||
is contained within a single patch.
|
||||
|
||||
The point to remember is that each patch should make an easily understood
|
||||
change that can be verified by reviewers. Each patch should be justifiable
|
||||
on its own merits.
|
||||
|
||||
If one patch depends on another patch in order for a change to be
|
||||
complete, that is OK. Simply note **"this patch depends on patch X"**
|
||||
in your patch description.
|
||||
|
||||
When dividing your change into a series of patches, take special care to
|
||||
ensure that the kernel builds and runs properly after each patch in the
|
||||
series. Developers using ``git bisect`` to track down a problem can end up
|
||||
splitting your patch series at any point; they will not thank you if you
|
||||
introduce bugs in the middle.
|
||||
|
||||
If you cannot condense your patch set into a smaller set of patches,
|
||||
then only post say 15 or so at a time and wait for review and integration.
|
||||
|
||||
|
||||
|
||||
4) Style-check your changes
|
||||
---------------------------
|
||||
|
||||
Check your patch for basic style violations, details of which can be
|
||||
found in
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`.
|
||||
Failure to do so simply wastes
|
||||
the reviewers time and will get your patch rejected, probably
|
||||
without even being read.
|
||||
|
||||
One significant exception is when moving code from one file to
|
||||
another -- in this case you should not modify the moved code at all in
|
||||
the same patch which moves it. This clearly delineates the act of
|
||||
moving the code and your changes. This greatly aids review of the
|
||||
actual differences and allows tools to better track the history of
|
||||
the code itself.
|
||||
|
||||
Check your patches with the patch style checker prior to submission
|
||||
(scripts/checkpatch.pl). Note, though, that the style checker should be
|
||||
viewed as a guide, not as a replacement for human judgment. If your code
|
||||
looks better with a violation then its probably best left alone.
|
||||
|
||||
The checker reports at three levels:
|
||||
- ERROR: things that are very likely to be wrong
|
||||
- WARNING: things requiring careful review
|
||||
- CHECK: things requiring thought
|
||||
|
||||
You should be able to justify all violations that remain in your
|
||||
patch.
|
||||
|
||||
|
||||
5) Select the recipients for your patch
|
||||
---------------------------------------
|
||||
|
||||
You should always copy the appropriate subsystem maintainer(s) on any patch
|
||||
to code that they maintain; look through the MAINTAINERS file and the
|
||||
source code revision history to see who those maintainers are. The
|
||||
script scripts/get_maintainer.pl can be very useful at this step. If you
|
||||
cannot find a maintainer for the subsystem you are working on, Andrew
|
||||
Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
|
||||
|
||||
You should also normally choose at least one mailing list to receive a copy
|
||||
of your patch set. linux-kernel@vger.kernel.org functions as a list of
|
||||
last resort, but the volume on that list has caused a number of developers
|
||||
to tune it out. Look in the MAINTAINERS file for a subsystem-specific
|
||||
list; your patch will probably get more attention there. Please do not
|
||||
spam unrelated lists, though.
|
||||
|
||||
Many kernel-related lists are hosted on vger.kernel.org; you can find a
|
||||
list of them at http://vger.kernel.org/vger-lists.html. There are
|
||||
kernel-related lists hosted elsewhere as well, though.
|
||||
|
||||
Do not send more than 15 patches at once to the vger mailing lists!!!
|
||||
|
||||
Linus Torvalds is the final arbiter of all changes accepted into the
|
||||
Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
|
||||
He gets a lot of e-mail, and, at this point, very few patches go through
|
||||
Linus directly, so typically you should do your best to -avoid-
|
||||
sending him e-mail.
|
||||
|
||||
If you have a patch that fixes an exploitable security bug, send that patch
|
||||
to security@kernel.org. For severe bugs, a short embargo may be considered
|
||||
to allow distributors to get the patch out to users; in such cases,
|
||||
obviously, the patch should not be sent to any public lists.
|
||||
|
||||
Patches that fix a severe bug in a released kernel should be directed
|
||||
toward the stable maintainers by putting a line like this::
|
||||
|
||||
Cc: stable@vger.kernel.org
|
||||
|
||||
into the sign-off area of your patch (note, NOT an email recipient). You
|
||||
should also read
|
||||
:ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>`
|
||||
in addition to this file.
|
||||
|
||||
Note, however, that some subsystem maintainers want to come to their own
|
||||
conclusions on which patches should go to the stable trees. The networking
|
||||
maintainer, in particular, would rather not see individual developers
|
||||
adding lines like the above to their patches.
|
||||
|
||||
If changes affect userland-kernel interfaces, please send the MAN-PAGES
|
||||
maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at
|
||||
least a notification of the change, so that some information makes its way
|
||||
into the manual pages. User-space API changes should also be copied to
|
||||
linux-api@vger.kernel.org.
|
||||
|
||||
For small patches you may want to CC the Trivial Patch Monkey
|
||||
trivial@kernel.org which collects "trivial" patches. Have a look
|
||||
into the MAINTAINERS file for its current manager.
|
||||
|
||||
Trivial patches must qualify for one of the following rules:
|
||||
|
||||
- Spelling fixes in documentation
|
||||
- Spelling fixes for errors which could break :manpage:`grep(1)`
|
||||
- Warning fixes (cluttering with useless warnings is bad)
|
||||
- Compilation fixes (only if they are actually correct)
|
||||
- Runtime fixes (only if they actually fix things)
|
||||
- Removing use of deprecated functions/macros
|
||||
- Contact detail and documentation fixes
|
||||
- Non-portable code replaced by portable code (even in arch-specific,
|
||||
since people copy, as long as it's trivial)
|
||||
- Any fix by the author/maintainer of the file (ie. patch monkey
|
||||
in re-transmission mode)
|
||||
|
||||
|
||||
|
||||
6) No MIME, no links, no compression, no attachments. Just plain text
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Linus and other kernel developers need to be able to read and comment
|
||||
on the changes you are submitting. It is important for a kernel
|
||||
developer to be able to "quote" your changes, using standard e-mail
|
||||
tools, so that they may comment on specific portions of your code.
|
||||
|
||||
For this reason, all patches should be submitted by e-mail "inline".
|
||||
|
||||
.. warning::
|
||||
|
||||
Be wary of your editor's word-wrap corrupting your patch,
|
||||
if you choose to cut-n-paste your patch.
|
||||
|
||||
Do not attach the patch as a MIME attachment, compressed or not.
|
||||
Many popular e-mail applications will not always transmit a MIME
|
||||
attachment as plain text, making it impossible to comment on your
|
||||
code. A MIME attachment also takes Linus a bit more time to process,
|
||||
decreasing the likelihood of your MIME-attached change being accepted.
|
||||
|
||||
Exception: If your mailer is mangling patches then someone may ask
|
||||
you to re-send them using MIME.
|
||||
|
||||
See :ref:`Documentation/email-clients.txt <email_clients>`
|
||||
for hints about configuring your e-mail client so that it sends your patches
|
||||
untouched.
|
||||
|
||||
7) E-mail size
|
||||
--------------
|
||||
|
||||
Large changes are not appropriate for mailing lists, and some
|
||||
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
|
||||
it is preferred that you store your patch on an Internet-accessible
|
||||
server, and provide instead a URL (link) pointing to your patch. But note
|
||||
that if your patch exceeds 300 kB, it almost certainly needs to be broken up
|
||||
anyway.
|
||||
|
||||
8) Respond to review comments
|
||||
-----------------------------
|
||||
|
||||
Your patch will almost certainly get comments from reviewers on ways in
|
||||
which the patch can be improved. You must respond to those comments;
|
||||
ignoring reviewers is a good way to get ignored in return. Review comments
|
||||
or questions that do not lead to a code change should almost certainly
|
||||
bring about a comment or changelog entry so that the next reviewer better
|
||||
understands what is going on.
|
||||
|
||||
Be sure to tell the reviewers what changes you are making and to thank them
|
||||
for their time. Code review is a tiring and time-consuming process, and
|
||||
reviewers sometimes get grumpy. Even in that case, though, respond
|
||||
politely and address the problems they have pointed out.
|
||||
|
||||
|
||||
9) Don't get discouraged - or impatient
|
||||
---------------------------------------
|
||||
|
||||
After you have submitted your change, be patient and wait. Reviewers are
|
||||
busy people and may not get to your patch right away.
|
||||
|
||||
Once upon a time, patches used to disappear into the void without comment,
|
||||
but the development process works more smoothly than that now. You should
|
||||
receive comments within a week or so; if that does not happen, make sure
|
||||
that you have sent your patches to the right place. Wait for a minimum of
|
||||
one week before resubmitting or pinging reviewers - possibly longer during
|
||||
busy times like merge windows.
|
||||
|
||||
|
||||
10) Include PATCH in the subject
|
||||
--------------------------------
|
||||
|
||||
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
|
||||
convention to prefix your subject line with [PATCH]. This lets Linus
|
||||
and other kernel developers more easily distinguish patches from other
|
||||
e-mail discussions.
|
||||
|
||||
|
||||
|
||||
11) Sign your work
|
||||
------------------
|
||||
|
||||
To improve tracking of who did what, especially with patches that can
|
||||
percolate to their final resting place in the kernel through several
|
||||
layers of maintainers, we've introduced a "sign-off" procedure on
|
||||
patches that are being emailed around.
|
||||
|
||||
The sign-off is a simple line at the end of the explanation for the
|
||||
patch, which certifies that you wrote it or otherwise have the right to
|
||||
pass it on as an open-source patch. The rules are pretty simple: if you
|
||||
can certify the below:
|
||||
|
||||
Developer's Certificate of Origin 1.1
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
By making a contribution to this project, I certify that:
|
||||
|
||||
(a) The contribution was created in whole or in part by me and I
|
||||
have the right to submit it under the open source license
|
||||
indicated in the file; or
|
||||
|
||||
(b) The contribution is based upon previous work that, to the best
|
||||
of my knowledge, is covered under an appropriate open source
|
||||
license and I have the right under that license to submit that
|
||||
work with modifications, whether created in whole or in part
|
||||
by me, under the same open source license (unless I am
|
||||
permitted to submit under a different license), as indicated
|
||||
in the file; or
|
||||
|
||||
(c) The contribution was provided directly to me by some other
|
||||
person who certified (a), (b) or (c) and I have not modified
|
||||
it.
|
||||
|
||||
(d) I understand and agree that this project and the contribution
|
||||
are public and that a record of the contribution (including all
|
||||
personal information I submit with it, including my sign-off) is
|
||||
maintained indefinitely and may be redistributed consistent with
|
||||
this project or the open source license(s) involved.
|
||||
|
||||
then you just add a line saying::
|
||||
|
||||
Signed-off-by: Random J Developer <random@developer.example.org>
|
||||
|
||||
using your real name (sorry, no pseudonyms or anonymous contributions.)
|
||||
|
||||
Some people also put extra tags at the end. They'll just be ignored for
|
||||
now, but you can do this to mark internal company procedures or just
|
||||
point out some special detail about the sign-off.
|
||||
|
||||
If you are a subsystem or branch maintainer, sometimes you need to slightly
|
||||
modify patches you receive in order to merge them, because the code is not
|
||||
exactly the same in your tree and the submitters'. If you stick strictly to
|
||||
rule (c), you should ask the submitter to rediff, but this is a totally
|
||||
counter-productive waste of time and energy. Rule (b) allows you to adjust
|
||||
the code, but then it is very impolite to change one submitter's code and
|
||||
make him endorse your bugs. To solve this problem, it is recommended that
|
||||
you add a line between the last Signed-off-by header and yours, indicating
|
||||
the nature of your changes. While there is nothing mandatory about this, it
|
||||
seems like prepending the description with your mail and/or name, all
|
||||
enclosed in square brackets, is noticeable enough to make it obvious that
|
||||
you are responsible for last-minute changes. Example::
|
||||
|
||||
Signed-off-by: Random J Developer <random@developer.example.org>
|
||||
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
|
||||
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
|
||||
|
||||
This practice is particularly helpful if you maintain a stable branch and
|
||||
want at the same time to credit the author, track changes, merge the fix,
|
||||
and protect the submitter from complaints. Note that under no circumstances
|
||||
can you change the author's identity (the From header), as it is the one
|
||||
which appears in the changelog.
|
||||
|
||||
Special note to back-porters: It seems to be a common and useful practice
|
||||
to insert an indication of the origin of a patch at the top of the commit
|
||||
message (just after the subject line) to facilitate tracking. For instance,
|
||||
here's what we see in a 3.x-stable release::
|
||||
|
||||
Date: Tue Oct 7 07:26:38 2014 -0400
|
||||
|
||||
libata: Un-break ATA blacklist
|
||||
|
||||
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
|
||||
|
||||
And here's what might appear in an older kernel once a patch is backported::
|
||||
|
||||
Date: Tue May 13 22:12:27 2008 +0200
|
||||
|
||||
wireless, airo: waitbusy() won't delay
|
||||
|
||||
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
|
||||
|
||||
Whatever the format, this information provides a valuable help to people
|
||||
tracking your trees, and to people trying to troubleshoot bugs in your
|
||||
tree.
|
||||
|
||||
|
||||
12) When to use Acked-by: and Cc:
|
||||
---------------------------------
|
||||
|
||||
The Signed-off-by: tag indicates that the signer was involved in the
|
||||
development of the patch, or that he/she was in the patch's delivery path.
|
||||
|
||||
If a person was not directly involved in the preparation or handling of a
|
||||
patch but wishes to signify and record their approval of it then they can
|
||||
ask to have an Acked-by: line added to the patch's changelog.
|
||||
|
||||
Acked-by: is often used by the maintainer of the affected code when that
|
||||
maintainer neither contributed to nor forwarded the patch.
|
||||
|
||||
Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
|
||||
has at least reviewed the patch and has indicated acceptance. Hence patch
|
||||
mergers will sometimes manually convert an acker's "yep, looks good to me"
|
||||
into an Acked-by: (but note that it is usually better to ask for an
|
||||
explicit ack).
|
||||
|
||||
Acked-by: does not necessarily indicate acknowledgement of the entire patch.
|
||||
For example, if a patch affects multiple subsystems and has an Acked-by: from
|
||||
one subsystem maintainer then this usually indicates acknowledgement of just
|
||||
the part which affects that maintainer's code. Judgement should be used here.
|
||||
When in doubt people should refer to the original discussion in the mailing
|
||||
list archives.
|
||||
|
||||
If a person has had the opportunity to comment on a patch, but has not
|
||||
provided such comments, you may optionally add a ``Cc:`` tag to the patch.
|
||||
This is the only tag which might be added without an explicit action by the
|
||||
person it names - but it should indicate that this person was copied on the
|
||||
patch. This tag documents that potentially interested parties
|
||||
have been included in the discussion.
|
||||
|
||||
|
||||
13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
The Reported-by tag gives credit to people who find bugs and report them and it
|
||||
hopefully inspires them to help us again in the future. Please note that if
|
||||
the bug was reported in private, then ask for permission first before using the
|
||||
Reported-by tag.
|
||||
|
||||
A Tested-by: tag indicates that the patch has been successfully tested (in
|
||||
some environment) by the person named. This tag informs maintainers that
|
||||
some testing has been performed, provides a means to locate testers for
|
||||
future patches, and ensures credit for the testers.
|
||||
|
||||
Reviewed-by:, instead, indicates that the patch has been reviewed and found
|
||||
acceptable according to the Reviewer's Statement:
|
||||
|
||||
Reviewer's statement of oversight
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
By offering my Reviewed-by: tag, I state that:
|
||||
|
||||
(a) I have carried out a technical review of this patch to
|
||||
evaluate its appropriateness and readiness for inclusion into
|
||||
the mainline kernel.
|
||||
|
||||
(b) Any problems, concerns, or questions relating to the patch
|
||||
have been communicated back to the submitter. I am satisfied
|
||||
with the submitter's response to my comments.
|
||||
|
||||
(c) While there may be things that could be improved with this
|
||||
submission, I believe that it is, at this time, (1) a
|
||||
worthwhile modification to the kernel, and (2) free of known
|
||||
issues which would argue against its inclusion.
|
||||
|
||||
(d) While I have reviewed the patch and believe it to be sound, I
|
||||
do not (unless explicitly stated elsewhere) make any
|
||||
warranties or guarantees that it will achieve its stated
|
||||
purpose or function properly in any given situation.
|
||||
|
||||
A Reviewed-by tag is a statement of opinion that the patch is an
|
||||
appropriate modification of the kernel without any remaining serious
|
||||
technical issues. Any interested reviewer (who has done the work) can
|
||||
offer a Reviewed-by tag for a patch. This tag serves to give credit to
|
||||
reviewers and to inform maintainers of the degree of review which has been
|
||||
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
|
||||
understand the subject area and to perform thorough reviews, will normally
|
||||
increase the likelihood of your patch getting into the kernel.
|
||||
|
||||
A Suggested-by: tag indicates that the patch idea is suggested by the person
|
||||
named and ensures credit to the person for the idea. Please note that this
|
||||
tag should not be added without the reporter's permission, especially if the
|
||||
idea was not posted in a public forum. That said, if we diligently credit our
|
||||
idea reporters, they will, hopefully, be inspired to help us again in the
|
||||
future.
|
||||
|
||||
A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
|
||||
is used to make it easy to determine where a bug originated, which can help
|
||||
review a bug fix. This tag also assists the stable kernel team in determining
|
||||
which stable kernel versions should receive your fix. This is the preferred
|
||||
method for indicating a bug fixed by the patch. See :ref:`describe_changes`
|
||||
for more details.
|
||||
|
||||
|
||||
14) The canonical patch format
|
||||
------------------------------
|
||||
|
||||
This section describes how the patch itself should be formatted. Note
|
||||
that, if you have your patches stored in a ``git`` repository, proper patch
|
||||
formatting can be had with ``git format-patch``. The tools cannot create
|
||||
the necessary text, though, so read the instructions below anyway.
|
||||
|
||||
The canonical patch subject line is::
|
||||
|
||||
Subject: [PATCH 001/123] subsystem: summary phrase
|
||||
|
||||
The canonical patch message body contains the following:
|
||||
|
||||
- A ``from`` line specifying the patch author (only needed if the person
|
||||
sending the patch is not the author).
|
||||
|
||||
- An empty line.
|
||||
|
||||
- The body of the explanation, line wrapped at 75 columns, which will
|
||||
be copied to the permanent changelog to describe this patch.
|
||||
|
||||
- The ``Signed-off-by:`` lines, described above, which will
|
||||
also go in the changelog.
|
||||
|
||||
- A marker line containing simply ``---``.
|
||||
|
||||
- Any additional comments not suitable for the changelog.
|
||||
|
||||
- The actual patch (``diff`` output).
|
||||
|
||||
The Subject line format makes it very easy to sort the emails
|
||||
alphabetically by subject line - pretty much any email reader will
|
||||
support that - since because the sequence number is zero-padded,
|
||||
the numerical and alphabetic sort is the same.
|
||||
|
||||
The ``subsystem`` in the email's Subject should identify which
|
||||
area or subsystem of the kernel is being patched.
|
||||
|
||||
The ``summary phrase`` in the email's Subject should concisely
|
||||
describe the patch which that email contains. The ``summary
|
||||
phrase`` should not be a filename. Do not use the same ``summary
|
||||
phrase`` for every patch in a whole patch series (where a ``patch
|
||||
series`` is an ordered sequence of multiple, related patches).
|
||||
|
||||
Bear in mind that the ``summary phrase`` of your email becomes a
|
||||
globally-unique identifier for that patch. It propagates all the way
|
||||
into the ``git`` changelog. The ``summary phrase`` may later be used in
|
||||
developer discussions which refer to the patch. People will want to
|
||||
google for the ``summary phrase`` to read discussion regarding that
|
||||
patch. It will also be the only thing that people may quickly see
|
||||
when, two or three months later, they are going through perhaps
|
||||
thousands of patches using tools such as ``gitk`` or ``git log
|
||||
--oneline``.
|
||||
|
||||
For these reasons, the ``summary`` must be no more than 70-75
|
||||
characters, and it must describe both what the patch changes, as well
|
||||
as why the patch might be necessary. It is challenging to be both
|
||||
succinct and descriptive, but that is what a well-written summary
|
||||
should do.
|
||||
|
||||
The ``summary phrase`` may be prefixed by tags enclosed in square
|
||||
brackets: "Subject: [PATCH <tag>...] <summary phrase>". The tags are
|
||||
not considered part of the summary phrase, but describe how the patch
|
||||
should be treated. Common tags might include a version descriptor if
|
||||
the multiple versions of the patch have been sent out in response to
|
||||
comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
|
||||
comments. If there are four patches in a patch series the individual
|
||||
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
|
||||
that developers understand the order in which the patches should be
|
||||
applied and that they have reviewed or applied all of the patches in
|
||||
the patch series.
|
||||
|
||||
A couple of example Subjects::
|
||||
|
||||
Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching
|
||||
Subject: [PATCH v2 01/27] x86: fix eflags tracking
|
||||
|
||||
The ``from`` line must be the very first line in the message body,
|
||||
and has the form:
|
||||
|
||||
From: Original Author <author@example.com>
|
||||
|
||||
The ``from`` line specifies who will be credited as the author of the
|
||||
patch in the permanent changelog. If the ``from`` line is missing,
|
||||
then the ``From:`` line from the email header will be used to determine
|
||||
the patch author in the changelog.
|
||||
|
||||
The explanation body will be committed to the permanent source
|
||||
changelog, so should make sense to a competent reader who has long
|
||||
since forgotten the immediate details of the discussion that might
|
||||
have led to this patch. Including symptoms of the failure which the
|
||||
patch addresses (kernel log messages, oops messages, etc.) is
|
||||
especially useful for people who might be searching the commit logs
|
||||
looking for the applicable patch. If a patch fixes a compile failure,
|
||||
it may not be necessary to include _all_ of the compile failures; just
|
||||
enough that it is likely that someone searching for the patch can find
|
||||
it. As in the ``summary phrase``, it is important to be both succinct as
|
||||
well as descriptive.
|
||||
|
||||
The ``---`` marker line serves the essential purpose of marking for patch
|
||||
handling tools where the changelog message ends.
|
||||
|
||||
One good use for the additional comments after the ``---`` marker is for
|
||||
a ``diffstat``, to show what files have changed, and the number of
|
||||
inserted and deleted lines per file. A ``diffstat`` is especially useful
|
||||
on bigger patches. Other comments relevant only to the moment or the
|
||||
maintainer, not suitable for the permanent changelog, should also go
|
||||
here. A good example of such comments might be ``patch changelogs``
|
||||
which describe what has changed between the v1 and v2 version of the
|
||||
patch.
|
||||
|
||||
If you are going to include a ``diffstat`` after the ``---`` marker, please
|
||||
use ``diffstat`` options ``-p 1 -w 70`` so that filenames are listed from
|
||||
the top of the kernel source tree and don't use too much horizontal
|
||||
space (easily fit in 80 columns, maybe with some indentation). (``git``
|
||||
generates appropriate diffstats by default.)
|
||||
|
||||
See more details on the proper patch format in the following
|
||||
references.
|
||||
|
||||
.. _explicit_in_reply_to:
|
||||
|
||||
15) Explicit In-Reply-To headers
|
||||
--------------------------------
|
||||
|
||||
It can be helpful to manually add In-Reply-To: headers to a patch
|
||||
(e.g., when using ``git send-email``) to associate the patch with
|
||||
previous relevant discussion, e.g. to link a bug fix to the email with
|
||||
the bug report. However, for a multi-patch series, it is generally
|
||||
best to avoid using In-Reply-To: to link to older versions of the
|
||||
series. This way multiple versions of the patch don't become an
|
||||
unmanageable forest of references in email clients. If a link is
|
||||
helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
|
||||
the cover email text) to link to an earlier version of the patch series.
|
||||
|
||||
|
||||
16) Sending ``git pull`` requests
|
||||
---------------------------------
|
||||
|
||||
If you have a series of patches, it may be most convenient to have the
|
||||
maintainer pull them directly into the subsystem repository with a
|
||||
``git pull`` operation. Note, however, that pulling patches from a developer
|
||||
requires a higher degree of trust than taking patches from a mailing list.
|
||||
As a result, many subsystem maintainers are reluctant to take pull
|
||||
requests, especially from new, unknown developers. If in doubt you can use
|
||||
the pull request as the cover letter for a normal posting of the patch
|
||||
series, giving the maintainer the option of using either.
|
||||
|
||||
A pull request should have [GIT] or [PULL] in the subject line. The
|
||||
request itself should include the repository name and the branch of
|
||||
interest on a single line; it should look something like::
|
||||
|
||||
Please pull from
|
||||
|
||||
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
|
||||
|
||||
to get these changes:
|
||||
|
||||
A pull request should also include an overall message saying what will be
|
||||
included in the request, a ``git shortlog`` listing of the patches
|
||||
themselves, and a ``diffstat`` showing the overall effect of the patch series.
|
||||
The easiest way to get all this information together is, of course, to let
|
||||
``git`` do it for you with the ``git request-pull`` command.
|
||||
|
||||
Some maintainers (including Linus) want to see pull requests from signed
|
||||
commits; that increases their confidence that the request actually came
|
||||
from you. Linus, in particular, will not pull from public hosting sites
|
||||
like GitHub in the absence of a signed tag.
|
||||
|
||||
The first step toward creating such tags is to make a GNUPG key and get it
|
||||
signed by one or more core kernel developers. This step can be hard for
|
||||
new developers, but there is no way around it. Attending conferences can
|
||||
be a good way to find developers who can sign your key.
|
||||
|
||||
Once you have prepared a patch series in ``git`` that you wish to have somebody
|
||||
pull, create a signed tag with ``git tag -s``. This will create a new tag
|
||||
identifying the last commit in the series and containing a signature
|
||||
created with your private key. You will also have the opportunity to add a
|
||||
changelog-style message to the tag; this is an ideal place to describe the
|
||||
effects of the pull request as a whole.
|
||||
|
||||
If the tree the maintainer will be pulling from is not the repository you
|
||||
are working from, don't forget to push the signed tag explicitly to the
|
||||
public tree.
|
||||
|
||||
When generating your pull request, use the signed tag as the target. A
|
||||
command like this will do the trick::
|
||||
|
||||
git request-pull master git://my.public.tree/linux.git my-signed-tag
|
||||
|
||||
|
||||
REFERENCES
|
||||
**********
|
||||
|
||||
Andrew Morton, "The perfect patch" (tpp).
|
||||
<http://www.ozlabs.org/~akpm/stuff/tpp.txt>
|
||||
|
||||
Jeff Garzik, "Linux kernel patch submission format".
|
||||
<http://linux.yyz.us/patch-format.html>
|
||||
|
||||
Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
|
||||
<http://www.kroah.com/log/linux/maintainer.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-02.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-03.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-04.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-05.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-06.html>
|
||||
|
||||
NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
|
||||
<https://lkml.org/lkml/2005/7/11/336>
|
||||
|
||||
Kernel Documentation/CodingStyle:
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`
|
||||
|
||||
Linus Torvalds's mail on the canonical patch format:
|
||||
<http://lkml.org/lkml/2005/4/7/183>
|
||||
|
||||
Andi Kleen, "On submitting kernel patches"
|
||||
Some strategies to get difficult or controversial changes in.
|
||||
|
||||
http://halobates.de/on-submitting-patches.pdf
|
||||
|
||||
This file has moved to process/submitting-patches.rst
|
||||
|
@ -1,39 +0,0 @@
|
||||
Software cursor for VGA by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
|
||||
======================= and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
||||
|
||||
Linux now has some ability to manipulate cursor appearance. Normally, you
|
||||
can set the size of hardware cursor (and also work around some ugly bugs in
|
||||
those miserable Trident cards--see #define TRIDENT_GLITCH in drivers/video/
|
||||
vgacon.c). You can now play a few new tricks: you can make your cursor look
|
||||
like a non-blinking red block, make it inverse background of the character it's
|
||||
over or to highlight that character and still choose whether the original
|
||||
hardware cursor should remain visible or not. There may be other things I have
|
||||
never thought of.
|
||||
|
||||
The cursor appearance is controlled by a "<ESC>[?1;2;3c" escape sequence
|
||||
where 1, 2 and 3 are parameters described below. If you omit any of them,
|
||||
they will default to zeroes.
|
||||
|
||||
Parameter 1 specifies cursor size (0=default, 1=invisible, 2=underline, ...,
|
||||
8=full block) + 16 if you want the software cursor to be applied + 32 if you
|
||||
want to always change the background color + 64 if you dislike having the
|
||||
background the same as the foreground. Highlights are ignored for the last two
|
||||
flags.
|
||||
|
||||
The second parameter selects character attribute bits you want to change
|
||||
(by simply XORing them with the value of this parameter). On standard VGA,
|
||||
the high four bits specify background and the low four the foreground. In both
|
||||
groups, low three bits set color (as in normal color codes used by the console)
|
||||
and the most significant one turns on highlight (or sometimes blinking--it
|
||||
depends on the configuration of your VGA).
|
||||
|
||||
The third parameter consists of character attribute bits you want to set.
|
||||
Bit setting takes place before bit toggling, so you can simply clear a bit by
|
||||
including it in both the set mask and the toggle mask.
|
||||
|
||||
Examples:
|
||||
=========
|
||||
|
||||
To get normal blinking underline, use: echo -e '\033[?2c'
|
||||
To get blinking block, use: echo -e '\033[?6c'
|
||||
To get red non-blinking block, use: echo -e '\033[?17;0;64c'
|
@ -101,6 +101,6 @@ received a notification, it will set the backlight level accordingly. This does
|
||||
not affect the sending of event to user space, they are always sent to user
|
||||
space regardless of whether or not the video module controls the backlight level
|
||||
directly. This behaviour can be controlled through the brightness_switch_enabled
|
||||
module parameter as documented in kernel-parameters.txt. It is recommended to
|
||||
module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
|
||||
disable this behaviour once a GUI environment starts up and wants to have full
|
||||
control of the backlight level.
|
||||
|
411
Documentation/admin-guide/README.rst
Normal file
411
Documentation/admin-guide/README.rst
Normal file
@ -0,0 +1,411 @@
|
||||
Linux kernel release 4.x <http://kernel.org/>
|
||||
=============================================
|
||||
|
||||
These are the release notes for Linux version 4. Read them carefully,
|
||||
as they tell you what this is all about, explain how to install the
|
||||
kernel, and what to do if something goes wrong.
|
||||
|
||||
What is Linux?
|
||||
--------------
|
||||
|
||||
Linux is a clone of the operating system Unix, written from scratch by
|
||||
Linus Torvalds with assistance from a loosely-knit team of hackers across
|
||||
the Net. It aims towards POSIX and Single UNIX Specification compliance.
|
||||
|
||||
It has all the features you would expect in a modern fully-fledged Unix,
|
||||
including true multitasking, virtual memory, shared libraries, demand
|
||||
loading, shared copy-on-write executables, proper memory management,
|
||||
and multistack networking including IPv4 and IPv6.
|
||||
|
||||
It is distributed under the GNU General Public License - see the
|
||||
accompanying COPYING file for more details.
|
||||
|
||||
On what hardware does it run?
|
||||
-----------------------------
|
||||
|
||||
Although originally developed first for 32-bit x86-based PCs (386 or higher),
|
||||
today Linux also runs on (at least) the Compaq Alpha AXP, Sun SPARC and
|
||||
UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, Cell,
|
||||
IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64, AXIS CRIS,
|
||||
Xtensa, Tilera TILE, AVR32, ARC and Renesas M32R architectures.
|
||||
|
||||
Linux is easily portable to most general-purpose 32- or 64-bit architectures
|
||||
as long as they have a paged memory management unit (PMMU) and a port of the
|
||||
GNU C compiler (gcc) (part of The GNU Compiler Collection, GCC). Linux has
|
||||
also been ported to a number of architectures without a PMMU, although
|
||||
functionality is then obviously somewhat limited.
|
||||
Linux has also been ported to itself. You can now run the kernel as a
|
||||
userspace application - this is called UserMode Linux (UML).
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
- There is a lot of documentation available both in electronic form on
|
||||
the Internet and in books, both Linux-specific and pertaining to
|
||||
general UNIX questions. I'd recommend looking into the documentation
|
||||
subdirectories on any Linux FTP site for the LDP (Linux Documentation
|
||||
Project) books. This README is not meant to be documentation on the
|
||||
system: there are much better sources available.
|
||||
|
||||
- There are various README files in the Documentation/ subdirectory:
|
||||
these typically contain kernel-specific installation notes for some
|
||||
drivers for example. See Documentation/00-INDEX for a list of what
|
||||
is contained in each file. Please read the
|
||||
:ref:`Documentation/process/changes.rst <changes>` file, as it
|
||||
contains information about the problems, which may result by upgrading
|
||||
your kernel.
|
||||
|
||||
- The Documentation/DocBook/ subdirectory contains several guides for
|
||||
kernel developers and users. These guides can be rendered in a
|
||||
number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
|
||||
After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
|
||||
or ``make mandocs`` will render the documentation in the requested format.
|
||||
|
||||
Installing the kernel source
|
||||
----------------------------
|
||||
|
||||
- If you install the full sources, put the kernel tarball in a
|
||||
directory where you have permissions (e.g. your home directory) and
|
||||
unpack it::
|
||||
|
||||
xz -cd linux-4.X.tar.xz | tar xvf -
|
||||
|
||||
Replace "X" with the version number of the latest kernel.
|
||||
|
||||
Do NOT use the /usr/src/linux area! This area has a (usually
|
||||
incomplete) set of kernel headers that are used by the library header
|
||||
files. They should match the library, and not get messed up by
|
||||
whatever the kernel-du-jour happens to be.
|
||||
|
||||
- You can also upgrade between 4.x releases by patching. Patches are
|
||||
distributed in the xz format. To install by patching, get all the
|
||||
newer patch files, enter the top level directory of the kernel source
|
||||
(linux-4.X) and execute::
|
||||
|
||||
xz -cd ../patch-4.x.xz | patch -p1
|
||||
|
||||
Replace "x" for all versions bigger than the version "X" of your current
|
||||
source tree, **in_order**, and you should be ok. You may want to remove
|
||||
the backup files (some-file-name~ or some-file-name.orig), and make sure
|
||||
that there are no failed patches (some-file-name# or some-file-name.rej).
|
||||
If there are, either you or I have made a mistake.
|
||||
|
||||
Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
|
||||
(also known as the -stable kernels) are not incremental but instead apply
|
||||
directly to the base 4.x kernel. For example, if your base kernel is 4.0
|
||||
and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
|
||||
and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
|
||||
want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
|
||||
patch -R) **before** applying the 4.0.3 patch. You can read more on this in
|
||||
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
|
||||
|
||||
Alternatively, the script patch-kernel can be used to automate this
|
||||
process. It determines the current kernel version and applies any
|
||||
patches found::
|
||||
|
||||
linux/scripts/patch-kernel linux
|
||||
|
||||
The first argument in the command above is the location of the
|
||||
kernel source. Patches are applied from the current directory, but
|
||||
an alternative directory can be specified as the second argument.
|
||||
|
||||
- Make sure you have no stale .o files and dependencies lying around::
|
||||
|
||||
cd linux
|
||||
make mrproper
|
||||
|
||||
You should now have the sources correctly installed.
|
||||
|
||||
Software requirements
|
||||
---------------------
|
||||
|
||||
Compiling and running the 4.x kernels requires up-to-date
|
||||
versions of various software packages. Consult
|
||||
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
|
||||
required and how to get updates for these packages. Beware that using
|
||||
excessively old versions of these packages can cause indirect
|
||||
errors that are very difficult to track down, so don't assume that
|
||||
you can just update packages when obvious problems arise during
|
||||
build or operation.
|
||||
|
||||
Build directory for the kernel
|
||||
------------------------------
|
||||
|
||||
When compiling the kernel, all output files will per default be
|
||||
stored together with the kernel source code.
|
||||
Using the option ``make O=output/dir`` allows you to specify an alternate
|
||||
place for the output files (including .config).
|
||||
Example::
|
||||
|
||||
kernel source code: /usr/src/linux-4.X
|
||||
build directory: /home/name/build/kernel
|
||||
|
||||
To configure and build the kernel, use::
|
||||
|
||||
cd /usr/src/linux-4.X
|
||||
make O=/home/name/build/kernel menuconfig
|
||||
make O=/home/name/build/kernel
|
||||
sudo make O=/home/name/build/kernel modules_install install
|
||||
|
||||
Please note: If the ``O=output/dir`` option is used, then it must be
|
||||
used for all invocations of make.
|
||||
|
||||
Configuring the kernel
|
||||
----------------------
|
||||
|
||||
Do not skip this step even if you are only upgrading one minor
|
||||
version. New configuration options are added in each release, and
|
||||
odd problems will turn up if the configuration files are not set up
|
||||
as expected. If you want to carry your existing configuration to a
|
||||
new version with minimal work, use ``make oldconfig``, which will
|
||||
only ask you for the answers to new questions.
|
||||
|
||||
- Alternative configuration commands are::
|
||||
|
||||
"make config" Plain text interface.
|
||||
|
||||
"make menuconfig" Text based color menus, radiolists & dialogs.
|
||||
|
||||
"make nconfig" Enhanced text based color menus.
|
||||
|
||||
"make xconfig" Qt based configuration tool.
|
||||
|
||||
"make gconfig" GTK+ based configuration tool.
|
||||
|
||||
"make oldconfig" Default all questions based on the contents of
|
||||
your existing ./.config file and asking about
|
||||
new config symbols.
|
||||
|
||||
"make silentoldconfig"
|
||||
Like above, but avoids cluttering the screen
|
||||
with questions already answered.
|
||||
Additionally updates the dependencies.
|
||||
|
||||
"make olddefconfig"
|
||||
Like above, but sets new symbols to their default
|
||||
values without prompting.
|
||||
|
||||
"make defconfig" Create a ./.config file by using the default
|
||||
symbol values from either arch/$ARCH/defconfig
|
||||
or arch/$ARCH/configs/${PLATFORM}_defconfig,
|
||||
depending on the architecture.
|
||||
|
||||
"make ${PLATFORM}_defconfig"
|
||||
Create a ./.config file by using the default
|
||||
symbol values from
|
||||
arch/$ARCH/configs/${PLATFORM}_defconfig.
|
||||
Use "make help" to get a list of all available
|
||||
platforms of your architecture.
|
||||
|
||||
"make allyesconfig"
|
||||
Create a ./.config file by setting symbol
|
||||
values to 'y' as much as possible.
|
||||
|
||||
"make allmodconfig"
|
||||
Create a ./.config file by setting symbol
|
||||
values to 'm' as much as possible.
|
||||
|
||||
"make allnoconfig" Create a ./.config file by setting symbol
|
||||
values to 'n' as much as possible.
|
||||
|
||||
"make randconfig" Create a ./.config file by setting symbol
|
||||
values to random values.
|
||||
|
||||
"make localmodconfig" Create a config based on current config and
|
||||
loaded modules (lsmod). Disables any module
|
||||
option that is not needed for the loaded modules.
|
||||
|
||||
To create a localmodconfig for another machine,
|
||||
store the lsmod of that machine into a file
|
||||
and pass it in as a LSMOD parameter.
|
||||
|
||||
target$ lsmod > /tmp/mylsmod
|
||||
target$ scp /tmp/mylsmod host:/tmp
|
||||
|
||||
host$ make LSMOD=/tmp/mylsmod localmodconfig
|
||||
|
||||
The above also works when cross compiling.
|
||||
|
||||
"make localyesconfig" Similar to localmodconfig, except it will convert
|
||||
all module options to built in (=y) options.
|
||||
|
||||
You can find more information on using the Linux kernel config tools
|
||||
in Documentation/kbuild/kconfig.txt.
|
||||
|
||||
- NOTES on ``make config``:
|
||||
|
||||
- Having unnecessary drivers will make the kernel bigger, and can
|
||||
under some circumstances lead to problems: probing for a
|
||||
nonexistent controller card may confuse your other controllers
|
||||
|
||||
- A kernel with math-emulation compiled in will still use the
|
||||
coprocessor if one is present: the math emulation will just
|
||||
never get used in that case. The kernel will be slightly larger,
|
||||
but will work on different machines regardless of whether they
|
||||
have a math coprocessor or not.
|
||||
|
||||
- The "kernel hacking" configuration details usually result in a
|
||||
bigger or slower kernel (or both), and can even make the kernel
|
||||
less stable by configuring some routines to actively try to
|
||||
break bad code to find kernel problems (kmalloc()). Thus you
|
||||
should probably answer 'n' to the questions for "development",
|
||||
"experimental", or "debugging" features.
|
||||
|
||||
Compiling the kernel
|
||||
--------------------
|
||||
|
||||
- Make sure you have at least gcc 3.2 available.
|
||||
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
|
||||
|
||||
Please note that you can still run a.out user programs with this kernel.
|
||||
|
||||
- Do a ``make`` to create a compressed kernel image. It is also
|
||||
possible to do ``make install`` if you have lilo installed to suit the
|
||||
kernel makefiles, but you may want to check your particular lilo setup first.
|
||||
|
||||
To do the actual install, you have to be root, but none of the normal
|
||||
build should require that. Don't take the name of root in vain.
|
||||
|
||||
- If you configured any of the parts of the kernel as ``modules``, you
|
||||
will also have to do ``make modules_install``.
|
||||
|
||||
- Verbose kernel compile/build output:
|
||||
|
||||
Normally, the kernel build system runs in a fairly quiet mode (but not
|
||||
totally silent). However, sometimes you or other kernel developers need
|
||||
to see compile, link, or other commands exactly as they are executed.
|
||||
For this, use "verbose" build mode. This is done by passing
|
||||
``V=1`` to the ``make`` command, e.g.::
|
||||
|
||||
make V=1 all
|
||||
|
||||
To have the build system also tell the reason for the rebuild of each
|
||||
target, use ``V=2``. The default is ``V=0``.
|
||||
|
||||
- Keep a backup kernel handy in case something goes wrong. This is
|
||||
especially true for the development releases, since each new release
|
||||
contains new code which has not been debugged. Make sure you keep a
|
||||
backup of the modules corresponding to that kernel, as well. If you
|
||||
are installing a new kernel with the same version number as your
|
||||
working kernel, make a backup of your modules directory before you
|
||||
do a ``make modules_install``.
|
||||
|
||||
Alternatively, before compiling, use the kernel config option
|
||||
"LOCALVERSION" to append a unique suffix to the regular kernel version.
|
||||
LOCALVERSION can be set in the "General Setup" menu.
|
||||
|
||||
- In order to boot your new kernel, you'll need to copy the kernel
|
||||
image (e.g. .../linux/arch/x86/boot/bzImage after compilation)
|
||||
to the place where your regular bootable kernel is found.
|
||||
|
||||
- Booting a kernel directly from a floppy without the assistance of a
|
||||
bootloader such as LILO, is no longer supported.
|
||||
|
||||
If you boot Linux from the hard drive, chances are you use LILO, which
|
||||
uses the kernel image as specified in the file /etc/lilo.conf. The
|
||||
kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or
|
||||
/boot/bzImage. To use the new kernel, save a copy of the old image
|
||||
and copy the new image over the old one. Then, you MUST RERUN LILO
|
||||
to update the loading map! If you don't, you won't be able to boot
|
||||
the new kernel image.
|
||||
|
||||
Reinstalling LILO is usually a matter of running /sbin/lilo.
|
||||
You may wish to edit /etc/lilo.conf to specify an entry for your
|
||||
old kernel image (say, /vmlinux.old) in case the new one does not
|
||||
work. See the LILO docs for more information.
|
||||
|
||||
After reinstalling LILO, you should be all set. Shutdown the system,
|
||||
reboot, and enjoy!
|
||||
|
||||
If you ever need to change the default root device, video mode,
|
||||
ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
|
||||
alternatively the LILO boot options when appropriate). No need to
|
||||
recompile the kernel to change these parameters.
|
||||
|
||||
- Reboot with the new kernel and enjoy.
|
||||
|
||||
If something goes wrong
|
||||
-----------------------
|
||||
|
||||
- If you have problems that seem to be due to kernel bugs, please check
|
||||
the file MAINTAINERS to see if there is a particular person associated
|
||||
with the part of the kernel that you are having trouble with. If there
|
||||
isn't anyone listed there, then the second best thing is to mail
|
||||
them to me (torvalds@linux-foundation.org), and possibly to any other
|
||||
relevant mailing-list or to the newsgroup.
|
||||
|
||||
- In all bug-reports, *please* tell what kernel you are talking about,
|
||||
how to duplicate the problem, and what your setup is (use your common
|
||||
sense). If the problem is new, tell me so, and if the problem is
|
||||
old, please try to tell me when you first noticed it.
|
||||
|
||||
- If the bug results in a message like::
|
||||
|
||||
unable to handle kernel paging request at address C0000010
|
||||
Oops: 0002
|
||||
EIP: 0010:XXXXXXXX
|
||||
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
|
||||
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
|
||||
ds: xxxx es: xxxx fs: xxxx gs: xxxx
|
||||
Pid: xx, process nr: xx
|
||||
xx xx xx xx xx xx xx xx xx xx
|
||||
|
||||
or similar kernel debugging information on your screen or in your
|
||||
system log, please duplicate it *exactly*. The dump may look
|
||||
incomprehensible to you, but it does contain information that may
|
||||
help debugging the problem. The text above the dump is also
|
||||
important: it tells something about why the kernel dumped code (in
|
||||
the above example, it's due to a bad kernel pointer). More information
|
||||
on making sense of the dump is in Documentation/admin-guide/oops-tracing.rst
|
||||
|
||||
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
|
||||
as is, otherwise you will have to use the ``ksymoops`` program to make
|
||||
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
|
||||
This utility can be downloaded from
|
||||
ftp://ftp.<country>.kernel.org/pub/linux/utils/kernel/ksymoops/ .
|
||||
Alternatively, you can do the dump lookup by hand:
|
||||
|
||||
- In debugging dumps like the above, it helps enormously if you can
|
||||
look up what the EIP value means. The hex value as such doesn't help
|
||||
me or anybody else very much: it will depend on your particular
|
||||
kernel setup. What you should do is take the hex value from the EIP
|
||||
line (ignore the ``0010:``), and look it up in the kernel namelist to
|
||||
see which kernel function contains the offending address.
|
||||
|
||||
To find out the kernel function name, you'll need to find the system
|
||||
binary associated with the kernel that exhibited the symptom. This is
|
||||
the file 'linux/vmlinux'. To extract the namelist and match it against
|
||||
the EIP from the kernel crash, do::
|
||||
|
||||
nm vmlinux | sort | less
|
||||
|
||||
This will give you a list of kernel addresses sorted in ascending
|
||||
order, from which it is simple to find the function that contains the
|
||||
offending address. Note that the address given by the kernel
|
||||
debugging messages will not necessarily match exactly with the
|
||||
function addresses (in fact, that is very unlikely), so you can't
|
||||
just 'grep' the list: the list will, however, give you the starting
|
||||
point of each kernel function, so by looking for the function that
|
||||
has a starting address lower than the one you are searching for but
|
||||
is followed by a function with a higher address you will find the one
|
||||
you want. In fact, it may be a good idea to include a bit of
|
||||
"context" in your problem report, giving a few lines around the
|
||||
interesting one.
|
||||
|
||||
If you for some reason cannot do the above (you have a pre-compiled
|
||||
kernel image or similar), telling me as much about your setup as
|
||||
possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst <reportingbugs>`
|
||||
document for details.
|
||||
|
||||
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
|
||||
cannot change values or set break points.) To do this, first compile the
|
||||
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
|
||||
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
|
||||
|
||||
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
|
||||
You can now use all the usual gdb commands. The command to look up the
|
||||
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
|
||||
with the EIP value.)
|
||||
|
||||
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
|
||||
disregards the starting offset for which the kernel is compiled.
|
151
Documentation/admin-guide/binfmt-misc.rst
Normal file
151
Documentation/admin-guide/binfmt-misc.rst
Normal file
@ -0,0 +1,151 @@
|
||||
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
|
||||
=====================================================================
|
||||
|
||||
This Kernel feature allows you to invoke almost (for restrictions see below)
|
||||
every program by simply typing its name in the shell.
|
||||
This includes for example compiled Java(TM), Python or Emacs programs.
|
||||
|
||||
To achieve this you must tell binfmt_misc which interpreter has to be invoked
|
||||
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
|
||||
at the beginning of the file with a magic byte sequence (masking out specified
|
||||
bits) you have supplied. Binfmt_misc can also recognise a filename extension
|
||||
aka ``.com`` or ``.exe``.
|
||||
|
||||
First you must mount binfmt_misc::
|
||||
|
||||
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
|
||||
|
||||
To actually register a new binary type, you have to set up a string looking like
|
||||
``:name:type:offset:magic:mask:interpreter:flags`` (where you can choose the
|
||||
``:`` upon your needs) and echo it to ``/proc/sys/fs/binfmt_misc/register``.
|
||||
|
||||
Here is what the fields mean:
|
||||
|
||||
- ``name``
|
||||
is an identifier string. A new /proc file will be created with this
|
||||
``name below /proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for
|
||||
obvious reasons.
|
||||
- ``type``
|
||||
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
|
||||
- ``offset``
|
||||
is the offset of the magic/mask in the file, counted in bytes. This
|
||||
defaults to 0 if you omit it (i.e. you write ``:name:type::magic...``).
|
||||
Ignored when using filename extension matching.
|
||||
- ``magic``
|
||||
is the byte sequence binfmt_misc is matching for. The magic string
|
||||
may contain hex-encoded characters like ``\x0a`` or ``\xA4``. Note that you
|
||||
must escape any NUL bytes; parsing halts at the first one. In a shell
|
||||
environment you might have to write ``\\x0a`` to prevent the shell from
|
||||
eating your ``\``.
|
||||
If you chose filename extension matching, this is the extension to be
|
||||
recognised (without the ``.``, the ``\x0a`` specials are not allowed).
|
||||
Extension matching is case sensitive, and slashes ``/`` are not allowed!
|
||||
- ``mask``
|
||||
is an (optional, defaults to all 0xff) mask. You can mask out some
|
||||
bits from matching by supplying a string like magic and as long as magic.
|
||||
The mask is anded with the byte sequence of the file. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. Ignored when using
|
||||
filename extension matching.
|
||||
- ``interpreter``
|
||||
is the program that should be invoked with the binary as first
|
||||
argument (specify the full path)
|
||||
- ``flags``
|
||||
is an optional field that controls several aspects of the invocation
|
||||
of the interpreter. It is a string of capital letters, each controls a
|
||||
certain aspect. The following flags are supported:
|
||||
|
||||
``P`` - preserve-argv[0]
|
||||
Legacy behavior of binfmt_misc is to overwrite
|
||||
the original argv[0] with the full path to the binary. When this
|
||||
flag is included, binfmt_misc will add an argument to the argument
|
||||
vector for this purpose, thus preserving the original ``argv[0]``.
|
||||
e.g. If your interp is set to ``/bin/foo`` and you run ``blah``
|
||||
(which is in ``/usr/local/bin``), then the kernel will execute
|
||||
``/bin/foo`` with ``argv[]`` set to ``["/bin/foo", "/usr/local/bin/blah", "blah"]``. The interp has to be aware of this so it can
|
||||
execute ``/usr/local/bin/blah``
|
||||
with ``argv[]`` set to ``["blah"]``.
|
||||
``O`` - open-binary
|
||||
Legacy behavior of binfmt_misc is to pass the full path
|
||||
of the binary to the interpreter as an argument. When this flag is
|
||||
included, binfmt_misc will open the file for reading and pass its
|
||||
descriptor as an argument, instead of the full path, thus allowing
|
||||
the interpreter to execute non-readable binaries. This feature
|
||||
should be used with care - the interpreter has to be trusted not to
|
||||
emit the contents of the non-readable binary.
|
||||
``C`` - credentials
|
||||
Currently, the behavior of binfmt_misc is to calculate
|
||||
the credentials and security token of the new process according to
|
||||
the interpreter. When this flag is included, these attributes are
|
||||
calculated according to the binary. It also implies the ``O`` flag.
|
||||
This feature should be used with care as the interpreter
|
||||
will run with root permissions when a setuid binary owned by root
|
||||
is run with binfmt_misc.
|
||||
``F`` - fix binary
|
||||
The usual behaviour of binfmt_misc is to spawn the
|
||||
binary lazily when the misc format file is invoked. However,
|
||||
this doesn``t work very well in the face of mount namespaces and
|
||||
changeroots, so the ``F`` mode opens the binary as soon as the
|
||||
emulation is installed and uses the opened image to spawn the
|
||||
emulator, meaning it is always available once installed,
|
||||
regardless of how the environment changes.
|
||||
|
||||
|
||||
There are some restrictions:
|
||||
|
||||
- the whole register string may not exceed 1920 characters
|
||||
- the magic must reside in the first 128 bytes of the file, i.e.
|
||||
offset+size(magic) has to be less than 128
|
||||
- the interpreter string may not exceed 127 characters
|
||||
|
||||
To use binfmt_misc you have to mount it first. You can mount it with
|
||||
``mount -t binfmt_misc none /proc/sys/fs/binfmt_misc`` command, or you can add
|
||||
a line ``none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0`` to your
|
||||
``/etc/fstab`` so it auto mounts on boot.
|
||||
|
||||
You may want to add the binary formats in one of your ``/etc/rc`` scripts during
|
||||
boot-up. Read the manual of your init program to figure out how to do this
|
||||
right.
|
||||
|
||||
Think about the order of adding entries! Later added entries are matched first!
|
||||
|
||||
|
||||
A few examples (assumed you are in ``/proc/sys/fs/binfmt_misc``):
|
||||
|
||||
- enable support for em86 (like binfmt_em86, for Alpha AXP only)::
|
||||
|
||||
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
|
||||
- enable support for packed DOS applications (pre-configured dosemu hdimages)::
|
||||
|
||||
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
|
||||
|
||||
- enable support for Windows executables using wine::
|
||||
|
||||
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
|
||||
|
||||
For java support see Documentation/admin-guide/java.rst
|
||||
|
||||
|
||||
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
|
||||
or 1 (to enable) to ``/proc/sys/fs/binfmt_misc/status`` or
|
||||
``/proc/.../the_name``.
|
||||
Catting the file tells you the current status of ``binfmt_misc/the_entry``.
|
||||
|
||||
You can remove one entry or all entries by echoing -1 to ``/proc/.../the_name``
|
||||
or ``/proc/sys/fs/binfmt_misc/status``.
|
||||
|
||||
|
||||
Hints
|
||||
-----
|
||||
|
||||
If you want to pass special arguments to your interpreter, you can
|
||||
write a wrapper script for it. See Documentation/admin-guide/java.rst for an
|
||||
example.
|
||||
|
||||
Your interpreter should NOT look in the PATH for the filename; the kernel
|
||||
passes it the full filename (or the file descriptor) to use. Using ``$PATH`` can
|
||||
cause unexpected behaviour and can be a security hazard.
|
||||
|
||||
|
||||
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>
|
38
Documentation/admin-guide/braille-console.rst
Normal file
38
Documentation/admin-guide/braille-console.rst
Normal file
@ -0,0 +1,38 @@
|
||||
Linux Braille Console
|
||||
=====================
|
||||
|
||||
To get early boot messages on a braille device (before userspace screen
|
||||
readers can start), you first need to compile the support for the usual serial
|
||||
console (see :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), and
|
||||
for braille device
|
||||
(in :menuselection:`Device Drivers --> Accessibility support --> Console on braille device`).
|
||||
|
||||
Then you need to specify a ``console=brl``, option on the kernel command line, the
|
||||
format is::
|
||||
|
||||
console=brl,serial_options...
|
||||
|
||||
where ``serial_options...`` are the same as described in
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
|
||||
|
||||
So for instance you can use ``console=brl,ttyS0`` if the braille device is connected to the first serial port, and ``console=brl,ttyS0,115200`` to
|
||||
override the baud rate to 115200, etc.
|
||||
|
||||
By default, the braille device will just show the last kernel message (console
|
||||
mode). To review previous messages, press the Insert key to switch to the VT
|
||||
review mode. In review mode, the arrow keys permit to browse in the VT content,
|
||||
:kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go at the top/bottom of the screen, and
|
||||
the :kbd:`HOME` key goes back
|
||||
to the cursor, hence providing very basic screen reviewing facility.
|
||||
|
||||
Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel
|
||||
parameter.
|
||||
|
||||
For simplicity, only one braille console can be enabled, other uses of
|
||||
``console=brl,...`` will be discarded. Also note that it does not interfere with
|
||||
the console selection mechanism described in
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
|
||||
|
||||
For now, only the VisioBraille device is supported.
|
||||
|
||||
Samuel Thibault <samuel.thibault@ens-lyon.org>
|
76
Documentation/admin-guide/bug-bisect.rst
Normal file
76
Documentation/admin-guide/bug-bisect.rst
Normal file
@ -0,0 +1,76 @@
|
||||
Bisecting a bug
|
||||
+++++++++++++++
|
||||
|
||||
Last updated: 28 October 2016
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Always try the latest kernel from kernel.org and build from source. If you are
|
||||
not confident in doing that please report the bug to your distribution vendor
|
||||
instead of to a kernel developer.
|
||||
|
||||
Finding bugs is not always easy. Have a go though. If you can't find it don't
|
||||
give up. Report as much as you have found to the relevant maintainer. See
|
||||
MAINTAINERS for who that is for the subsystem you have worked on.
|
||||
|
||||
Before you submit a bug report read
|
||||
:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
|
||||
|
||||
Devices not appearing
|
||||
=====================
|
||||
|
||||
Often this is caused by udev/systemd. Check that first before blaming it
|
||||
on the kernel.
|
||||
|
||||
Finding patch that caused a bug
|
||||
===============================
|
||||
|
||||
Using the provided tools with ``git`` makes finding bugs easy provided the bug
|
||||
is reproducible.
|
||||
|
||||
Steps to do it:
|
||||
|
||||
- build the Kernel from its git source
|
||||
- start bisect with [#f1]_::
|
||||
|
||||
$ git bisect start
|
||||
|
||||
- mark the broken changeset with::
|
||||
|
||||
$ git bisect bad [commit]
|
||||
|
||||
- mark a changeset where the code is known to work with::
|
||||
|
||||
$ git bisect good [commit]
|
||||
|
||||
- rebuild the Kernel and test
|
||||
- interact with git bisect by using either::
|
||||
|
||||
$ git bisect good
|
||||
|
||||
or::
|
||||
|
||||
$ git bisect bad
|
||||
|
||||
depending if the bug happened on the changeset you're testing
|
||||
- After some interactions, git bisect will give you the changeset that
|
||||
likely caused the bug.
|
||||
|
||||
- For example, if you know that the current version is bad, and version
|
||||
4.8 is good, you could do::
|
||||
|
||||
$ git bisect start
|
||||
$ git bisect bad # Current version is bad
|
||||
$ git bisect good v4.8
|
||||
|
||||
|
||||
.. [#f1] You can, optionally, provide both good and bad arguments at git
|
||||
start with ``git bisect start [BAD] [GOOD]``
|
||||
|
||||
For further references, please read:
|
||||
|
||||
- The man page for ``git-bisect``
|
||||
- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
|
||||
- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
|
||||
- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_
|
369
Documentation/admin-guide/bug-hunting.rst
Normal file
369
Documentation/admin-guide/bug-hunting.rst
Normal file
@ -0,0 +1,369 @@
|
||||
Bug hunting
|
||||
===========
|
||||
|
||||
Kernel bug reports often come with a stack dump like the one below::
|
||||
|
||||
------------[ cut here ]------------
|
||||
WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
|
||||
Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
|
||||
CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
|
||||
Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
|
||||
00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
|
||||
c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
|
||||
f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
|
||||
Call Trace:
|
||||
[<c12ba080>] ? dump_stack+0x44/0x64
|
||||
[<c103ed6a>] ? __warn+0xfa/0x120
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<c103ee33>] ? warn_slowpath_null+0x23/0x30
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
|
||||
[<c109f617>] ? symbol_put_addr+0x27/0x50
|
||||
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
|
||||
[<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
|
||||
[<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
|
||||
[<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
|
||||
[<c13d2882>] ? usb_unbind_interface+0x62/0x250
|
||||
[<c136b514>] ? __pm_runtime_idle+0x44/0x70
|
||||
[<c13620d8>] ? __device_release_driver+0x78/0x120
|
||||
[<c1362907>] ? driver_detach+0x87/0x90
|
||||
[<c1361c48>] ? bus_remove_driver+0x38/0x90
|
||||
[<c13d1c18>] ? usb_deregister+0x58/0xb0
|
||||
[<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
|
||||
[<c1055654>] ? task_work_run+0x64/0x80
|
||||
[<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
|
||||
[<c10013f0>] ? do_fast_syscall_32+0x80/0x130
|
||||
[<c1549f43>] ? sysenter_past_esp+0x40/0x6a
|
||||
---[ end trace 6ebc60ef3981792f ]---
|
||||
|
||||
Such stack traces provide enough information to identify the line inside the
|
||||
Kernel's source code where the bug happened. Depending on the severity of
|
||||
the issue, it may also contain the word **Oops**, as on this one::
|
||||
|
||||
BUG: unable to handle kernel NULL pointer dereference at (null)
|
||||
IP: [<c06969d4>] iret_exc+0x7d0/0xa59
|
||||
*pdpt = 000000002258a001 *pde = 0000000000000000
|
||||
Oops: 0002 [#1] PREEMPT SMP
|
||||
...
|
||||
|
||||
Despite being an **Oops** or some other sort of stack trace, the offended
|
||||
line is usually required to identify and handle the bug. Along this chapter,
|
||||
we'll refer to "Oops" for all kinds of stack traces that need to be analized.
|
||||
|
||||
.. note::
|
||||
|
||||
``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
|
||||
format (from ``dmesg``, etc). Ignore any references in this or other docs to
|
||||
"decoding the Oops" or "running it through ksymoops".
|
||||
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
|
||||
people will just tell you to repost it.
|
||||
|
||||
Where is the Oops message is located?
|
||||
-------------------------------------
|
||||
|
||||
Normally the Oops text is read from the kernel buffers by klogd and
|
||||
handed to ``syslogd`` which writes it to a syslog file, typically
|
||||
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
|
||||
systemd, it may also be stored by the ``journald`` daemon, and accessed
|
||||
by running ``journalctl`` command.
|
||||
|
||||
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
|
||||
read the data from the kernel buffers and save it. Or you can
|
||||
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
|
||||
``kmsg`` is a "never ending file".
|
||||
|
||||
If the machine has crashed so badly that you cannot enter commands or
|
||||
the disk is not available then you have three options:
|
||||
|
||||
(1) Hand copy the text from the screen and type it in after the machine
|
||||
has restarted. Messy but it is the only option if you have not
|
||||
planned for a crash. Alternatively, you can take a picture of
|
||||
the screen with a digital camera - not nice, but better than
|
||||
nothing. If the messages scroll off the top of the console, you
|
||||
may find that booting with a higher resolution (eg, ``vga=791``)
|
||||
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
|
||||
so won't help for 'early' oopses)
|
||||
|
||||
(2) Boot with a serial console (see
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
|
||||
run a null modem to a second machine and capture the output there
|
||||
using your favourite communication program. Minicom works well.
|
||||
|
||||
(3) Use Kdump (see Documentation/kdump/kdump.txt),
|
||||
extract the kernel ring buffer from old memory with using dmesg
|
||||
gdbmacro in Documentation/kdump/gdbmacros.txt.
|
||||
|
||||
Finding the bug's location
|
||||
--------------------------
|
||||
|
||||
Reporting a bug works best if you point the location of the bug at the
|
||||
Kernel source file. There are two methods for doing that. Usually, using
|
||||
``gdb`` is easier, but the Kernel should be pre-compiled with debug info.
|
||||
|
||||
gdb
|
||||
^^^
|
||||
|
||||
The GNU debug (``gdb``) is the best way to figure out the exact file and line
|
||||
number of the OOPS from the ``vmlinux`` file.
|
||||
|
||||
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
|
||||
This can be set by running::
|
||||
|
||||
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
|
||||
|
||||
On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
|
||||
EIP value from the OOPS::
|
||||
|
||||
EIP: 0060:[<c021e50e>] Not tainted VLI
|
||||
|
||||
And use GDB to translate that to human-readable form::
|
||||
|
||||
$ gdb vmlinux
|
||||
(gdb) l *0xc021e50e
|
||||
|
||||
If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
|
||||
offset from the OOPS::
|
||||
|
||||
EIP is at vt_ioctl+0xda8/0x1482
|
||||
|
||||
And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
|
||||
|
||||
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
|
||||
$ make vmlinux
|
||||
$ gdb vmlinux
|
||||
(gdb) l *vt_ioctl+0xda8
|
||||
0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
|
||||
288 {
|
||||
289 struct vc_data *vc = NULL;
|
||||
290 int ret = 0;
|
||||
291
|
||||
292 console_lock();
|
||||
293 if (VT_BUSY(vc_num))
|
||||
294 ret = -EBUSY;
|
||||
295 else if (vc_num)
|
||||
296 vc = vc_deallocate(vc_num);
|
||||
297 console_unlock();
|
||||
|
||||
or, if you want to be more verbose::
|
||||
|
||||
(gdb) p vt_ioctl
|
||||
$1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
|
||||
(gdb) l *0xae0+0xda8
|
||||
|
||||
You could, instead, use the object file::
|
||||
|
||||
$ make drivers/tty/
|
||||
$ gdb drivers/tty/vt/vt_ioctl.o
|
||||
(gdb) l *vt_ioctl+0xda8
|
||||
|
||||
If you have a call trace, such as::
|
||||
|
||||
Call Trace:
|
||||
[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
|
||||
[<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
|
||||
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
||||
...
|
||||
|
||||
this shows the problem likely in the :jbd: module. You can load that module
|
||||
in gdb and list the relevant code::
|
||||
|
||||
$ gdb fs/jbd/jbd.ko
|
||||
(gdb) l *log_wait_commit+0xa3
|
||||
|
||||
.. note::
|
||||
|
||||
You can also do the same for any function call at the stack trace,
|
||||
like this one::
|
||||
|
||||
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
|
||||
|
||||
The position where the above call happened can be seen with::
|
||||
|
||||
$ gdb drivers/media/usb/dvb-usb/dvb-usb.o
|
||||
(gdb) l *dvb_usb_adapter_frontend_exit+0x3a
|
||||
|
||||
objdump
|
||||
^^^^^^^
|
||||
|
||||
To debug a kernel, use objdump and look for the hex offset from the crash
|
||||
output to find the valid line of code/assembler. Without debug symbols, you
|
||||
will see the assembler code for the routine shown, but if your kernel has
|
||||
debug symbols the C code will also be available. (Debug symbols can be enabled
|
||||
in the kernel hacking menu of the menu configuration.) For example::
|
||||
|
||||
$ objdump -r -S -l --disassemble net/dccp/ipv4.o
|
||||
|
||||
.. note::
|
||||
|
||||
You need to be at the top level of the kernel tree for this to pick up
|
||||
your C files.
|
||||
|
||||
If you don't have access to the code you can also debug on some crash dumps
|
||||
e.g. crash dump output as shown by Dave Miller::
|
||||
|
||||
EIP is at +0x14/0x4c0
|
||||
...
|
||||
Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
|
||||
00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
|
||||
<8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
|
||||
|
||||
Put the bytes into a "foo.s" file like this:
|
||||
|
||||
.text
|
||||
.globl foo
|
||||
foo:
|
||||
.byte .... /* bytes from Code: part of OOPS dump */
|
||||
|
||||
Compile it with "gcc -c -o foo.o foo.s" then look at the output of
|
||||
"objdump --disassemble foo.o".
|
||||
|
||||
Output:
|
||||
|
||||
ip_queue_xmit:
|
||||
push %ebp
|
||||
push %edi
|
||||
push %esi
|
||||
push %ebx
|
||||
sub $0xbc, %esp
|
||||
mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
|
||||
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
||||
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
||||
|
||||
Reporting the bug
|
||||
-----------------
|
||||
|
||||
Once you find where the bug happened, by inspecting its location,
|
||||
you could either try to fix it yourself or report it upstream.
|
||||
|
||||
In order to report it upstream, you should identify the mailing list
|
||||
used for the development of the affected code. This can be done by using
|
||||
the ``get_maintainer.pl`` script.
|
||||
|
||||
For example, if you find a bug at the gspca's conex.c file, you can get
|
||||
their maintainers with::
|
||||
|
||||
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
|
||||
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
|
||||
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
|
||||
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
|
||||
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
|
||||
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
|
||||
linux-kernel@vger.kernel.org (open list)
|
||||
|
||||
Please notice that it will point to:
|
||||
|
||||
- The last developers that touched on the source code. On the above example,
|
||||
Tejun and Bhaktipriya (in this specific case, none really envolved on the
|
||||
development of this file);
|
||||
- The driver maintainer (Hans Verkuil);
|
||||
- The subsystem maintainer (Mauro Carvalho Chehab)
|
||||
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
|
||||
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
|
||||
|
||||
Usually, the fastest way to have your bug fixed is to report it to mailing
|
||||
list used for the development of the code (linux-media ML) copying the driver maintainer (Hans).
|
||||
|
||||
If you are totally stumped as to whom to send the report, and
|
||||
``get_maintainer.pl`` didn't provide you anything useful, send it to
|
||||
linux-kernel@vger.kernel.org.
|
||||
|
||||
Thanks for your help in making Linux as stable as humanly possible.
|
||||
|
||||
Fixing the bug
|
||||
--------------
|
||||
|
||||
If you know programming, you could help us by not only reporting the bug,
|
||||
but also providing us with a solution. After all open source is about
|
||||
sharing what you do and don't you want to be recognised for your genius?
|
||||
|
||||
If you decide to take this way, once you have worked out a fix please submit
|
||||
it upstream.
|
||||
|
||||
Please do read
|
||||
ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
|
||||
to help your code get accepted.
|
||||
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
|
||||
Notes on Oops tracing with ``klogd``
|
||||
------------------------------------
|
||||
|
||||
In order to help Linus and the other kernel developers there has been
|
||||
substantial support incorporated into ``klogd`` for processing protection
|
||||
faults. In order to have full support for address resolution at least
|
||||
version 1.3-pl3 of the ``sysklogd`` package should be used.
|
||||
|
||||
When a protection fault occurs the ``klogd`` daemon automatically
|
||||
translates important addresses in the kernel log messages to their
|
||||
symbolic equivalents. This translated kernel message is then
|
||||
forwarded through whatever reporting mechanism ``klogd`` is using. The
|
||||
protection fault message can be simply cut out of the message files
|
||||
and forwarded to the kernel developers.
|
||||
|
||||
Two types of address resolution are performed by ``klogd``. The first is
|
||||
static translation and the second is dynamic translation. Static
|
||||
translation uses the System.map file in much the same manner that
|
||||
ksymoops does. In order to do static translation the ``klogd`` daemon
|
||||
must be able to find a system map file at daemon initialization time.
|
||||
See the klogd man page for information on how ``klogd`` searches for map
|
||||
files.
|
||||
|
||||
Dynamic address translation is important when kernel loadable modules
|
||||
are being used. Since memory for kernel modules is allocated from the
|
||||
kernel's dynamic memory pools there are no fixed locations for either
|
||||
the start of the module or for functions and symbols in the module.
|
||||
|
||||
The kernel supports system calls which allow a program to determine
|
||||
which modules are loaded and their location in memory. Using these
|
||||
system calls the klogd daemon builds a symbol table which can be used
|
||||
to debug a protection fault which occurs in a loadable kernel module.
|
||||
|
||||
At the very minimum klogd will provide the name of the module which
|
||||
generated the protection fault. There may be additional symbolic
|
||||
information available if the developer of the loadable module chose to
|
||||
export symbol information from the module.
|
||||
|
||||
Since the kernel module environment can be dynamic there must be a
|
||||
mechanism for notifying the ``klogd`` daemon when a change in module
|
||||
environment occurs. There are command line options available which
|
||||
allow klogd to signal the currently executing daemon that symbol
|
||||
information should be refreshed. See the ``klogd`` manual page for more
|
||||
information.
|
||||
|
||||
A patch is included with the sysklogd distribution which modifies the
|
||||
``modules-2.0.0`` package to automatically signal klogd whenever a module
|
||||
is loaded or unloaded. Applying this patch provides essentially
|
||||
seamless support for debugging protection faults which occur with
|
||||
kernel loadable modules.
|
||||
|
||||
The following is an example of a protection fault in a loadable module
|
||||
processed by ``klogd``::
|
||||
|
||||
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
|
||||
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
|
||||
Aug 29 09:51:01 blizard kernel: *pde = 00000000
|
||||
Aug 29 09:51:01 blizard kernel: Oops: 0002
|
||||
Aug 29 09:51:01 blizard kernel: CPU: 0
|
||||
Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
|
||||
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
|
||||
Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
|
||||
Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
|
||||
Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
|
||||
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
|
||||
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
|
||||
Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
|
||||
Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
|
||||
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
|
||||
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
Dr. G.W. Wettstein Oncology Research Div. Computing Facility
|
||||
Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
|
||||
820 4th St. N.
|
||||
Fargo, ND 58122
|
||||
Phone: 701-234-7556
|
10
Documentation/admin-guide/conf.py
Normal file
10
Documentation/admin-guide/conf.py
Normal file
@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = 'Linux Kernel User Documentation'
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'linux-user.tex', 'Linux Kernel User Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
268
Documentation/admin-guide/devices.rst
Normal file
268
Documentation/admin-guide/devices.rst
Normal file
@ -0,0 +1,268 @@
|
||||
|
||||
Linux allocated devices (4.x+ version)
|
||||
======================================
|
||||
|
||||
This list is the Linux Device List, the official registry of allocated
|
||||
device numbers and ``/dev`` directory nodes for the Linux operating
|
||||
system.
|
||||
|
||||
The LaTeX version of this document is no longer maintained, nor is
|
||||
the document that used to reside at lanana.org. This version in the
|
||||
mainline Linux kernel is the master document. Updates shall be sent
|
||||
as patches to the kernel maintainers (see the
|
||||
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
|
||||
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
|
||||
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
|
||||
to involve for character and block devices.
|
||||
|
||||
This document is included by reference into the Filesystem Hierarchy
|
||||
Standard (FHS). The FHS is available from http://www.pathname.com/fhs/.
|
||||
|
||||
Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
|
||||
platform only. Allocations marked (68k/Atari) apply to Linux/68k on
|
||||
the Atari platform only.
|
||||
|
||||
This document is in the public domain. The authors requests, however,
|
||||
that semantically altered versions are not distributed without
|
||||
permission of the authors, assuming the authors can be contacted without
|
||||
an unreasonable effort.
|
||||
|
||||
|
||||
.. attention::
|
||||
|
||||
DEVICE DRIVERS AUTHORS PLEASE READ THIS
|
||||
|
||||
Linux now has extensive support for dynamic allocation of device numbering
|
||||
and can use ``sysfs`` and ``udev`` (``systemd``) to handle the naming needs.
|
||||
There are still some exceptions in the serial and boot device area. Before
|
||||
asking for a device number make sure you actually need one.
|
||||
|
||||
To have a major number allocated, or a minor number in situations
|
||||
where that applies (e.g. busmice), please submit a patch and send to
|
||||
the authors as indicated above.
|
||||
|
||||
Keep the description of the device *in the same format
|
||||
as this list*. The reason for this is that it is the only way we have
|
||||
found to ensure we have all the requisite information to publish your
|
||||
device and avoid conflicts.
|
||||
|
||||
Finally, sometimes we have to play "namespace police." Please don't be
|
||||
offended. We often get submissions for ``/dev`` names that would be bound
|
||||
to cause conflicts down the road. We are trying to avoid getting in a
|
||||
situation where we would have to suffer an incompatible forward
|
||||
change. Therefore, please consult with us **before** you make your
|
||||
device names and numbers in any way public, at least to the point
|
||||
where it would be at all difficult to get them changed.
|
||||
|
||||
Your cooperation is appreciated.
|
||||
|
||||
.. include:: devices.txt
|
||||
:literal:
|
||||
|
||||
Additional ``/dev/`` directory entries
|
||||
--------------------------------------
|
||||
|
||||
This section details additional entries that should or may exist in
|
||||
the /dev directory. It is preferred that symbolic links use the same
|
||||
form (absolute or relative) as is indicated here. Links are
|
||||
classified as "hard" or "symbolic" depending on the preferred type of
|
||||
link; if possible, the indicated type of link should be used.
|
||||
|
||||
Compulsory links
|
||||
++++++++++++++++
|
||||
|
||||
These links should exist on all systems:
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/fd /proc/self/fd symbolic File descriptors
|
||||
/dev/stdin fd/0 symbolic stdin file descriptor
|
||||
/dev/stdout fd/1 symbolic stdout file descriptor
|
||||
/dev/stderr fd/2 symbolic stderr file descriptor
|
||||
/dev/nfsd socksys symbolic Required by iBCS-2
|
||||
/dev/X0R null symbolic Required by iBCS-2
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
Note: ``/dev/X0R`` is <letter X>-<digit 0>-<letter R>.
|
||||
|
||||
Recommended links
|
||||
+++++++++++++++++
|
||||
|
||||
It is recommended that these links exist on all systems:
|
||||
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/core /proc/kcore symbolic Backward compatibility
|
||||
/dev/ramdisk ram0 symbolic Backward compatibility
|
||||
/dev/ftape qft0 symbolic Backward compatibility
|
||||
/dev/bttv0 video0 symbolic Backward compatibility
|
||||
/dev/radio radio0 symbolic Backward compatibility
|
||||
/dev/i2o* /dev/i2o/* symbolic Backward compatibility
|
||||
/dev/scd? sr? hard Alternate SCSI CD-ROM name
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
Locally defined links
|
||||
+++++++++++++++++++++
|
||||
|
||||
The following links may be established locally to conform to the
|
||||
configuration of the system. This is merely a tabulation of existing
|
||||
practice, and does not constitute a recommendation. However, if they
|
||||
exist, they should have the following uses.
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/mouse mouse port symbolic Current mouse device
|
||||
/dev/tape tape device symbolic Current tape device
|
||||
/dev/cdrom CD-ROM device symbolic Current CD-ROM device
|
||||
/dev/cdwriter CD-writer symbolic Current CD-writer device
|
||||
/dev/scanner scanner symbolic Current scanner device
|
||||
/dev/modem modem port symbolic Current dialout device
|
||||
/dev/root root device symbolic Current root filesystem
|
||||
/dev/swap swap device symbolic Current swap device
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
``/dev/modem`` should not be used for a modem which supports dialin as
|
||||
well as dialout, as it tends to cause lock file problems. If it
|
||||
exists, ``/dev/modem`` should point to the appropriate primary TTY device
|
||||
(the use of the alternate callout devices is deprecated).
|
||||
|
||||
For SCSI devices, ``/dev/tape`` and ``/dev/cdrom`` should point to the
|
||||
*cooked* devices (``/dev/st*`` and ``/dev/sr*``, respectively), whereas
|
||||
``/dev/cdwriter`` and /dev/scanner should point to the appropriate generic
|
||||
SCSI devices (/dev/sg*).
|
||||
|
||||
``/dev/mouse`` may point to a primary serial TTY device, a hardware mouse
|
||||
device, or a socket for a mouse driver program (e.g. ``/dev/gpmdata``).
|
||||
|
||||
Sockets and pipes
|
||||
+++++++++++++++++
|
||||
|
||||
Non-transient sockets and named pipes may exist in /dev. Common entries are:
|
||||
|
||||
=============== =============== ===============================================
|
||||
/dev/printer socket lpd local socket
|
||||
/dev/log socket syslog local socket
|
||||
/dev/gpmdata socket gpm mouse multiplexer
|
||||
=============== =============== ===============================================
|
||||
|
||||
Mount points
|
||||
++++++++++++
|
||||
|
||||
The following names are reserved for mounting special filesystems
|
||||
under /dev. These special filesystems provide kernel interfaces that
|
||||
cannot be provided with standard device nodes.
|
||||
|
||||
=============== =============== ===============================================
|
||||
/dev/pts devpts PTY slave filesystem
|
||||
/dev/shm tmpfs POSIX shared memory maintenance access
|
||||
=============== =============== ===============================================
|
||||
|
||||
Terminal devices
|
||||
----------------
|
||||
|
||||
Terminal, or TTY devices are a special class of character devices. A
|
||||
terminal device is any device that could act as a controlling terminal
|
||||
for a session; this includes virtual consoles, serial ports, and
|
||||
pseudoterminals (PTYs).
|
||||
|
||||
All terminal devices share a common set of capabilities known as line
|
||||
disciplines; these include the common terminal line discipline as well
|
||||
as SLIP and PPP modes.
|
||||
|
||||
All terminal devices are named similarly; this section explains the
|
||||
naming and use of the various types of TTYs. Note that the naming
|
||||
conventions include several historical warts; some of these are
|
||||
Linux-specific, some were inherited from other systems, and some
|
||||
reflect Linux outgrowing a borrowed convention.
|
||||
|
||||
A hash mark (``#``) in a device name is used here to indicate a decimal
|
||||
number without leading zeroes.
|
||||
|
||||
Virtual consoles and the console device
|
||||
+++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Virtual consoles are full-screen terminal displays on the system video
|
||||
monitor. Virtual consoles are named ``/dev/tty#``, with numbering
|
||||
starting at ``/dev/tty1``; ``/dev/tty0`` is the current virtual console.
|
||||
``/dev/tty0`` is the device that should be used to access the system video
|
||||
card on those architectures for which the frame buffer devices
|
||||
(``/dev/fb*``) are not applicable. Do not use ``/dev/console``
|
||||
for this purpose.
|
||||
|
||||
The console device, ``/dev/console``, is the device to which system
|
||||
messages should be sent, and on which logins should be permitted in
|
||||
single-user mode. Starting with Linux 2.1.71, ``/dev/console`` is managed
|
||||
by the kernel; for previous versions it should be a symbolic link to
|
||||
either ``/dev/tty0``, a specific virtual console such as ``/dev/tty1``, or to
|
||||
a serial port primary (``tty*``, not ``cu*``) device, depending on the
|
||||
configuration of the system.
|
||||
|
||||
Serial ports
|
||||
++++++++++++
|
||||
|
||||
Serial ports are RS-232 serial ports and any device which simulates
|
||||
one, either in hardware (such as internal modems) or in software (such
|
||||
as the ISDN driver.) Under Linux, each serial ports has two device
|
||||
names, the primary or callin device and the alternate or callout one.
|
||||
Each kind of device is indicated by a different letter. For any
|
||||
letter X, the names of the devices are ``/dev/ttyX#`` and ``/dev/cux#``,
|
||||
respectively; for historical reasons, ``/dev/ttyS#`` and ``/dev/ttyC#``
|
||||
correspond to ``/dev/cua#`` and ``/dev/cub#``. In the future, it should be
|
||||
expected that multiple letters will be used; all letters will be upper
|
||||
case for the "tty" device (e.g. ``/dev/ttyDP#``) and lower case for the
|
||||
"cu" device (e.g. ``/dev/cudp#``).
|
||||
|
||||
The names ``/dev/ttyQ#`` and ``/dev/cuq#`` are reserved for local use.
|
||||
|
||||
The alternate devices provide for kernel-based exclusion and somewhat
|
||||
different defaults than the primary devices. Their main purpose is to
|
||||
allow the use of serial ports with programs with no inherent or broken
|
||||
support for serial ports. Their use is deprecated, and they may be
|
||||
removed from a future version of Linux.
|
||||
|
||||
Arbitration of serial ports is provided by the use of lock files with
|
||||
the names ``/var/lock/LCK..ttyX#``. The contents of the lock file should
|
||||
be the PID of the locking process as an ASCII number.
|
||||
|
||||
It is common practice to install links such as /dev/modem
|
||||
which point to serial ports. In order to ensure proper locking in the
|
||||
presence of these links, it is recommended that software chase
|
||||
symlinks and lock all possible names; additionally, it is recommended
|
||||
that a lock file be installed with the corresponding alternate
|
||||
device. In order to avoid deadlocks, it is recommended that the locks
|
||||
are acquired in the following order, and released in the reverse:
|
||||
|
||||
1. The symbolic link name, if any (``/var/lock/LCK..modem``)
|
||||
2. The "tty" name (``/var/lock/LCK..ttyS2``)
|
||||
3. The alternate device name (``/var/lock/LCK..cua2``)
|
||||
|
||||
In the case of nested symbolic links, the lock files should be
|
||||
installed in the order the symlinks are resolved.
|
||||
|
||||
Under no circumstances should an application hold a lock while waiting
|
||||
for another to be released. In addition, applications which attempt
|
||||
to create lock files for the corresponding alternate device names
|
||||
should take into account the possibility of being used on a non-serial
|
||||
port TTY, for which no alternate device would exist.
|
||||
|
||||
Pseudoterminals (PTYs)
|
||||
++++++++++++++++++++++
|
||||
|
||||
Pseudoterminals, or PTYs, are used to create login sessions or provide
|
||||
other capabilities requiring a TTY line discipline (including SLIP or
|
||||
PPP capability) to arbitrary data-generation processes. Each PTY has
|
||||
a master side, named ``/dev/pty[p-za-e][0-9a-f]``, and a slave side, named
|
||||
``/dev/tty[p-za-e][0-9a-f]``. The kernel arbitrates the use of PTYs by
|
||||
allowing each master side to be opened only once.
|
||||
|
||||
Once the master side has been opened, the corresponding slave device
|
||||
can be used in the same manner as any TTY device. The master and
|
||||
slave devices are connected by the kernel, generating the equivalent
|
||||
of a bidirectional pipe with TTY capabilities.
|
||||
|
||||
Recent versions of the Linux kernels and GNU libc contain support for
|
||||
the System V/Unix98 naming scheme for PTYs, which assigns a common
|
||||
device, ``/dev/ptmx``, to all the masters (opening it will automatically
|
||||
give you a previously unassigned PTY) and a subdirectory, ``/dev/pts``,
|
||||
for the slaves; the slaves are named with decimal integers (``/dev/pts/#``
|
||||
in our notation). This removes the problem of exhausting the
|
||||
namespace and enables the kernel to automatically create the device
|
||||
nodes for the slaves on demand using the "devpts" filesystem.
|
File diff suppressed because it is too large
Load Diff
353
Documentation/admin-guide/dynamic-debug-howto.rst
Normal file
353
Documentation/admin-guide/dynamic-debug-howto.rst
Normal file
@ -0,0 +1,353 @@
|
||||
Dynamic debug
|
||||
+++++++++++++
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes how to use the dynamic debug (dyndbg) feature.
|
||||
|
||||
Dynamic debug is designed to allow you to dynamically enable/disable
|
||||
kernel code to obtain additional kernel information. Currently, if
|
||||
``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
|
||||
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
|
||||
enabled per-callsite.
|
||||
|
||||
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
|
||||
shortcut for ``print_hex_dump(KERN_DEBUG)``.
|
||||
|
||||
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
|
||||
its ``prefix_str`` argument, if it is constant string; or ``hexdump``
|
||||
in case ``prefix_str`` is build dynamically.
|
||||
|
||||
Dynamic debug has even more useful features:
|
||||
|
||||
* Simple query language allows turning on and off debugging
|
||||
statements by matching any combination of 0 or 1 of:
|
||||
|
||||
- source filename
|
||||
- function name
|
||||
- line number (including ranges of line numbers)
|
||||
- module name
|
||||
- format string
|
||||
|
||||
* Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
|
||||
which can be read to display the complete list of known debug
|
||||
statements, to help guide you
|
||||
|
||||
Controlling dynamic debug Behaviour
|
||||
===================================
|
||||
|
||||
The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a
|
||||
control file in the 'debugfs' filesystem. Thus, you must first mount
|
||||
the debugfs filesystem, in order to make use of this feature.
|
||||
Subsequently, we refer to the control file as:
|
||||
``<debugfs>/dynamic_debug/control``. For example, if you want to enable
|
||||
printing from source file ``svcsock.c``, line 1603 you simply do::
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
If you make a mistake with the syntax, the write will fail thus::
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
-bash: echo: write error: Invalid argument
|
||||
|
||||
Viewing Dynamic Debug Behaviour
|
||||
===============================
|
||||
|
||||
You can view the currently configured behaviour of all the debug
|
||||
statements via::
|
||||
|
||||
nullarbor:~ # cat <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
|
||||
...
|
||||
|
||||
|
||||
You can also apply standard Unix text manipulation filters to this
|
||||
data, e.g.::
|
||||
|
||||
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
|
||||
62
|
||||
|
||||
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
|
||||
42
|
||||
|
||||
The third column shows the currently enabled flags for each debug
|
||||
statement callsite (see below for definitions of the flags). The
|
||||
default value, with no flags enabled, is ``=_``. So you can view all
|
||||
the debug statement callsites with any non-default flags::
|
||||
|
||||
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
|
||||
|
||||
Command Language Reference
|
||||
==========================
|
||||
|
||||
At the lexical level, a command comprises a sequence of words separated
|
||||
by spaces or tabs. So these are all equivalent::
|
||||
|
||||
nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
Command submissions are bounded by a write() system call.
|
||||
Multiple commands can be written together, separated by ``;`` or ``\n``::
|
||||
|
||||
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
|
||||
> <debugfs>/dynamic_debug/control
|
||||
|
||||
If your query set is big, you can batch them too::
|
||||
|
||||
~# cat query-batch-file > <debugfs>/dynamic_debug/control
|
||||
|
||||
A another way is to use wildcard. The match rule support ``*`` (matches
|
||||
zero or more characters) and ``?`` (matches exactly one character).For
|
||||
example, you can match all usb drivers::
|
||||
|
||||
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
At the syntactical level, a command comprises a sequence of match
|
||||
specifications, followed by a flags change specification::
|
||||
|
||||
command ::= match-spec* flags-spec
|
||||
|
||||
The match-spec's are used to choose a subset of the known pr_debug()
|
||||
callsites to which to apply the flags-spec. Think of them as a query
|
||||
with implicit ANDs between each pair. Note that an empty list of
|
||||
match-specs will select all debug statement callsites.
|
||||
|
||||
A match specification comprises a keyword, which controls the
|
||||
attribute of the callsite to be compared, and a value to compare
|
||||
against. Possible keywords are:::
|
||||
|
||||
match-spec ::= 'func' string |
|
||||
'file' string |
|
||||
'module' string |
|
||||
'format' string |
|
||||
'line' line-range
|
||||
|
||||
line-range ::= lineno |
|
||||
'-'lineno |
|
||||
lineno'-' |
|
||||
lineno'-'lineno
|
||||
|
||||
lineno ::= unsigned-int
|
||||
|
||||
.. note::
|
||||
|
||||
``line-range`` cannot contain space, e.g.
|
||||
"1-30" is valid range but "1 - 30" is not.
|
||||
|
||||
|
||||
The meanings of each keyword are:
|
||||
|
||||
func
|
||||
The given string is compared against the function name
|
||||
of each callsite. Example::
|
||||
|
||||
func svc_tcp_accept
|
||||
|
||||
file
|
||||
The given string is compared against either the full pathname, the
|
||||
src-root relative pathname, or the basename of the source file of
|
||||
each callsite. Examples::
|
||||
|
||||
file svcsock.c
|
||||
file kernel/freezer.c
|
||||
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
|
||||
|
||||
module
|
||||
The given string is compared against the module name
|
||||
of each callsite. The module name is the string as
|
||||
seen in ``lsmod``, i.e. without the directory or the ``.ko``
|
||||
suffix and with ``-`` changed to ``_``. Examples::
|
||||
|
||||
module sunrpc
|
||||
module nfsd
|
||||
|
||||
format
|
||||
The given string is searched for in the dynamic debug format
|
||||
string. Note that the string does not need to match the
|
||||
entire format, only some part. Whitespace and other
|
||||
special characters can be escaped using C octal character
|
||||
escape ``\ooo`` notation, e.g. the space character is ``\040``.
|
||||
Alternatively, the string can be enclosed in double quote
|
||||
characters (``"``) or single quote characters (``'``).
|
||||
Examples::
|
||||
|
||||
format svcrdma: // many of the NFS/RDMA server pr_debugs
|
||||
format readahead // some pr_debugs in the readahead cache
|
||||
format nfsd:\040SETATTR // one way to match a format with whitespace
|
||||
format "nfsd: SETATTR" // a neater way to match a format with whitespace
|
||||
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
|
||||
|
||||
line
|
||||
The given line number or range of line numbers is compared
|
||||
against the line number of each ``pr_debug()`` callsite. A single
|
||||
line number matches the callsite line number exactly. A
|
||||
range of line numbers matches any callsite between the first
|
||||
and last line number inclusive. An empty first number means
|
||||
the first line in the file, an empty line number means the
|
||||
last number in the file. Examples::
|
||||
|
||||
line 1603 // exactly line 1603
|
||||
line 1600-1605 // the six lines from line 1600 to line 1605
|
||||
line -1605 // the 1605 lines from line 1 to line 1605
|
||||
line 1600- // all lines from line 1600 to the end of the file
|
||||
|
||||
The flags specification comprises a change operation followed
|
||||
by one or more flag characters. The change operation is one
|
||||
of the characters::
|
||||
|
||||
- remove the given flags
|
||||
+ add the given flags
|
||||
= set the flags to the given flags
|
||||
|
||||
The flags are::
|
||||
|
||||
p enables the pr_debug() callsite.
|
||||
f Include the function name in the printed message
|
||||
l Include line number in the printed message
|
||||
m Include module name in the printed message
|
||||
t Include thread ID in messages not generated from interrupt context
|
||||
_ No flags are set. (Or'd with others on input)
|
||||
|
||||
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag
|
||||
have meaning, other flags ignored.
|
||||
|
||||
For display, the flags are preceded by ``=``
|
||||
(mnemonic: what the flags are currently equal to).
|
||||
|
||||
Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
|
||||
To clear all flags at once, use ``=_`` or ``-flmpt``.
|
||||
|
||||
|
||||
Debug messages during Boot Process
|
||||
==================================
|
||||
|
||||
To activate debug messages for core code and built-in modules during
|
||||
the boot process, even before userspace and debugfs exists, use
|
||||
``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"``
|
||||
(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows
|
||||
the syntax described above, but must not exceed 1023 characters. Your
|
||||
bootloader may impose lower limits.
|
||||
|
||||
These ``dyndbg`` params are processed just after the ddebug tables are
|
||||
processed, as part of the arch_initcall. Thus you can enable debug
|
||||
messages in all code run after this arch_initcall via this boot
|
||||
parameter.
|
||||
|
||||
On an x86 system for example ACPI enablement is a subsys_initcall and::
|
||||
|
||||
dyndbg="file ec.c +p"
|
||||
|
||||
will show early Embedded Controller transactions during ACPI setup if
|
||||
your machine (typically a laptop) has an Embedded Controller.
|
||||
PCI (or other devices) initialization also is a hot candidate for using
|
||||
this boot parameter for debugging purposes.
|
||||
|
||||
If ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
|
||||
boot time, without effect, but will be reprocessed when module is
|
||||
loaded later. ``dyndbg_query=`` and bare ``dyndbg=`` are only processed at
|
||||
boot.
|
||||
|
||||
|
||||
Debug Messages at Module Initialization Time
|
||||
============================================
|
||||
|
||||
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
|
||||
``foo.params``, strips ``foo.``, and passes them to the kernel along with
|
||||
params given in modprobe args or ``/etc/modprob.d/*.conf`` files,
|
||||
in the following order:
|
||||
|
||||
1. parameters given via ``/etc/modprobe.d/*.conf``::
|
||||
|
||||
options foo dyndbg=+pt
|
||||
options foo dyndbg # defaults to +p
|
||||
|
||||
2. ``foo.dyndbg`` as given in boot args, ``foo.`` is stripped and passed::
|
||||
|
||||
foo.dyndbg=" func bar +p; func buz +mp"
|
||||
|
||||
3. args to modprobe::
|
||||
|
||||
modprobe foo dyndbg==pmf # override previous settings
|
||||
|
||||
These ``dyndbg`` queries are applied in order, with last having final say.
|
||||
This allows boot args to override or modify those from ``/etc/modprobe.d``
|
||||
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
|
||||
modprobe args to override both.
|
||||
|
||||
In the ``foo.dyndbg="QUERY"`` form, the query must exclude ``module foo``.
|
||||
``foo`` is extracted from the param-name, and applied to each query in
|
||||
``QUERY``, and only 1 match-spec of each type is allowed.
|
||||
|
||||
The ``dyndbg`` option is a "fake" module parameter, which means:
|
||||
|
||||
- modules do not need to define it explicitly
|
||||
- every module gets it tacitly, whether they use pr_debug or not
|
||||
- it doesn't appear in ``/sys/module/$module/parameters/``
|
||||
To see it, grep the control file, or inspect ``/proc/cmdline.``
|
||||
|
||||
For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
|
||||
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
|
||||
the sysfs interface if the debug messages are no longer needed::
|
||||
|
||||
echo "module module_name -p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
::
|
||||
|
||||
// enable the message at line 1603 of file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in the NFS server module
|
||||
nullarbor:~ # echo -n 'module nfsd +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// disable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process -p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
|
||||
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages in files of which the paths include string "usb"
|
||||
nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all messages
|
||||
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// add module, function to all enabled messages
|
||||
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// boot-args example, with newlines and comments for readability
|
||||
Kernel command line: ...
|
||||
// see whats going on in dyndbg=value processing
|
||||
dynamic_debug.verbose=1
|
||||
// enable pr_debugs in 2 builtins, #cmt is stripped
|
||||
dyndbg="module params +p #cmt ; module sys +p"
|
||||
// enable pr_debugs in 2 functions in a module loaded later
|
||||
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
|
68
Documentation/admin-guide/index.rst
Normal file
68
Documentation/admin-guide/index.rst
Normal file
@ -0,0 +1,68 @@
|
||||
The Linux kernel user's and administrator's guide
|
||||
=================================================
|
||||
|
||||
The following is a collection of user-oriented documents that have been
|
||||
added to the kernel over time. There is, as yet, little overall order or
|
||||
organization here — this material was not written to be a single, coherent
|
||||
document! With luck things will improve quickly over time.
|
||||
|
||||
This initial section contains overall information, including the README
|
||||
file describing the kernel as a whole, documentation on kernel parameters,
|
||||
etc.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
README
|
||||
kernel-parameters
|
||||
devices
|
||||
|
||||
Here is a set of documents aimed at users who are trying to track down
|
||||
problems and bugs in particular.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
reporting-bugs
|
||||
security-bugs
|
||||
bug-hunting
|
||||
bug-bisect
|
||||
tainted-kernels
|
||||
ramoops
|
||||
dynamic-debug-howto
|
||||
init
|
||||
|
||||
This is the beginning of a section with information of interest to
|
||||
application developers. Documents covering various aspects of the kernel
|
||||
ABI will be found here.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
sysfs-rules
|
||||
|
||||
The rest of this manual consists of various unordered guides on how to
|
||||
configure specific aspects of kernel behavior to your liking.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
initrd
|
||||
serial-console
|
||||
braille-console
|
||||
parport
|
||||
md
|
||||
module-signing
|
||||
sysrq
|
||||
unicode
|
||||
vga-softcursor
|
||||
binfmt-misc
|
||||
mono
|
||||
java
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@ -5,6 +5,7 @@ OK, so you've got this pretty unintuitive message (currently located
|
||||
in init/main.c) and are wondering what the H*** went wrong.
|
||||
Some high-level reasons for failure (listed roughly in order of execution)
|
||||
to load the init binary are:
|
||||
|
||||
A) Unable to mount root FS
|
||||
B) init binary doesn't exist on rootfs
|
||||
C) broken console device
|
||||
@ -12,37 +13,39 @@ D) binary exists but dependencies not available
|
||||
E) binary cannot be loaded
|
||||
|
||||
Detailed explanations:
|
||||
0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
|
||||
|
||||
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
|
||||
to get more detailed kernel messages.
|
||||
A) make sure you have the correct root FS type
|
||||
(and root= kernel parameter points to the correct partition),
|
||||
B) make sure you have the correct root FS type
|
||||
(and ``root=`` kernel parameter points to the correct partition),
|
||||
required drivers such as storage hardware (such as SCSI or USB!)
|
||||
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
|
||||
to be pre-loaded by an initrd)
|
||||
C) Possibly a conflict in console= setup --> initial console unavailable.
|
||||
C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
|
||||
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
|
||||
missing interrupt-based configuration).
|
||||
Try using a different console= device or e.g. netconsole= .
|
||||
Try using a different ``console= device`` or e.g. ``netconsole=``.
|
||||
D) e.g. required library dependencies of the init binary such as
|
||||
/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
|
||||
to find out which libraries are required.
|
||||
``/lib/ld-linux.so.2`` missing or broken. Use
|
||||
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
|
||||
E) make sure the binary's architecture matches your hardware.
|
||||
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
|
||||
In case you tried loading a non-binary file here (shell script?),
|
||||
you should make sure that the script specifies an interpreter in its shebang
|
||||
header line (#!/...) that is fully working (including its library
|
||||
header line (``#!/...``) that is fully working (including its library
|
||||
dependencies). And before tackling scripts, better first test a simple
|
||||
non-script binary such as /bin/sh and confirm its successful execution.
|
||||
To find out more, add code to init/main.c to display kernel_execve()s
|
||||
non-script binary such as ``/bin/sh`` and confirm its successful execution.
|
||||
To find out more, add code ``to init/main.c`` to display kernel_execve()s
|
||||
return values.
|
||||
|
||||
Please extend this explanation whenever you find new failure causes
|
||||
(after all loading the init binary is a CRITICAL and hard transition step
|
||||
which needs to be made as painless as possible), then submit patch to LKML.
|
||||
Further TODOs:
|
||||
- Implement the various run_init_process() invocations via a struct array
|
||||
which can then store the kernel_execve() result value and on failure
|
||||
log it all by iterating over _all_ results (very important usability fix).
|
||||
|
||||
- Implement the various ``run_init_process()`` invocations via a struct array
|
||||
which can then store the ``kernel_execve()`` result value and on failure
|
||||
log it all by iterating over **all** results (very important usability fix).
|
||||
- try to make the implementation itself more helpful in general,
|
||||
e.g. by providing additional error messages at affected places.
|
||||
|
@ -2,7 +2,7 @@ Using the initial RAM disk (initrd)
|
||||
===================================
|
||||
|
||||
Written 1996,2000 by Werner Almesberger <werner.almesberger@epfl.ch> and
|
||||
Hans Lermen <lermen@fgan.de>
|
||||
Hans Lermen <lermen@fgan.de>
|
||||
|
||||
|
||||
initrd provides the capability to load a RAM disk by the boot loader.
|
||||
@ -16,7 +16,7 @@ where the kernel comes up with a minimum set of compiled-in drivers, and
|
||||
where additional modules are loaded from initrd.
|
||||
|
||||
This document gives a brief overview of the use of initrd. A more detailed
|
||||
discussion of the boot process can be found in [1].
|
||||
discussion of the boot process can be found in [#f1]_.
|
||||
|
||||
|
||||
Operation
|
||||
@ -27,10 +27,10 @@ When using initrd, the system typically boots as follows:
|
||||
1) the boot loader loads the kernel and the initial RAM disk
|
||||
2) the kernel converts initrd into a "normal" RAM disk and
|
||||
frees the memory used by initrd
|
||||
3) if the root device is not /dev/ram0, the old (deprecated)
|
||||
3) if the root device is not ``/dev/ram0``, the old (deprecated)
|
||||
change_root procedure is followed. see the "Obsolete root change
|
||||
mechanism" section below.
|
||||
4) root device is mounted. if it is /dev/ram0, the initrd image is
|
||||
4) root device is mounted. if it is ``/dev/ram0``, the initrd image is
|
||||
then mounted as root
|
||||
5) /sbin/init is executed (this can be any valid executable, including
|
||||
shell scripts; it is run with uid 0 and can do basically everything
|
||||
@ -38,7 +38,7 @@ When using initrd, the system typically boots as follows:
|
||||
6) init mounts the "real" root file system
|
||||
7) init places the root file system at the root directory using the
|
||||
pivot_root system call
|
||||
8) init execs the /sbin/init on the new root filesystem, performing
|
||||
8) init execs the ``/sbin/init`` on the new root filesystem, performing
|
||||
the usual boot sequence
|
||||
9) the initrd file system is removed
|
||||
|
||||
@ -51,7 +51,7 @@ be accessible.
|
||||
Boot command-line options
|
||||
-------------------------
|
||||
|
||||
initrd adds the following new options:
|
||||
initrd adds the following new options::
|
||||
|
||||
initrd=<path> (e.g. LOADLIN)
|
||||
|
||||
@ -83,36 +83,36 @@ Recent kernels have support for populating a ramdisk from a compressed cpio
|
||||
archive. On such systems, the creation of a ramdisk image doesn't need to
|
||||
involve special block devices or loopbacks; you merely create a directory on
|
||||
disk with the desired initrd content, cd to that directory, and run (as an
|
||||
example):
|
||||
example)::
|
||||
|
||||
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
|
||||
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
|
||||
|
||||
Examining the contents of an existing image file is just as simple:
|
||||
Examining the contents of an existing image file is just as simple::
|
||||
|
||||
mkdir /tmp/imagefile
|
||||
cd /tmp/imagefile
|
||||
gzip -cd /boot/imagefile.img | cpio -imd --quiet
|
||||
mkdir /tmp/imagefile
|
||||
cd /tmp/imagefile
|
||||
gzip -cd /boot/imagefile.img | cpio -imd --quiet
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
First, a directory for the initrd file system has to be created on the
|
||||
"normal" root file system, e.g.
|
||||
"normal" root file system, e.g.::
|
||||
|
||||
# mkdir /initrd
|
||||
# mkdir /initrd
|
||||
|
||||
The name is not relevant. More details can be found on the pivot_root(2)
|
||||
man page.
|
||||
The name is not relevant. More details can be found on the
|
||||
:manpage:`pivot_root(2)` man page.
|
||||
|
||||
If the root file system is created during the boot procedure (i.e. if
|
||||
you're building an install floppy), the root file system creation
|
||||
procedure should create the /initrd directory.
|
||||
procedure should create the ``/initrd`` directory.
|
||||
|
||||
If initrd will not be mounted in some cases, its content is still
|
||||
accessible if the following device has been created:
|
||||
accessible if the following device has been created::
|
||||
|
||||
# mknod /dev/initrd b 1 250
|
||||
# chmod 400 /dev/initrd
|
||||
# mknod /dev/initrd b 1 250
|
||||
# chmod 400 /dev/initrd
|
||||
|
||||
Second, the kernel has to be compiled with RAM disk support and with
|
||||
support for the initial RAM disk enabled. Also, at least all components
|
||||
@ -131,60 +131,76 @@ kernels, at least three types of devices are suitable for that:
|
||||
We'll describe the loopback device method:
|
||||
|
||||
1) make sure loopback block devices are configured into the kernel
|
||||
2) create an empty file system of the appropriate size, e.g.
|
||||
# dd if=/dev/zero of=initrd bs=300k count=1
|
||||
# mke2fs -F -m0 initrd
|
||||
2) create an empty file system of the appropriate size, e.g.::
|
||||
|
||||
# dd if=/dev/zero of=initrd bs=300k count=1
|
||||
# mke2fs -F -m0 initrd
|
||||
|
||||
(if space is critical, you may want to use the Minix FS instead of Ext2)
|
||||
3) mount the file system, e.g.
|
||||
# mount -t ext2 -o loop initrd /mnt
|
||||
4) create the console device:
|
||||
3) mount the file system, e.g.::
|
||||
|
||||
# mount -t ext2 -o loop initrd /mnt
|
||||
|
||||
4) create the console device::
|
||||
|
||||
# mkdir /mnt/dev
|
||||
# mknod /mnt/dev/console c 5 1
|
||||
|
||||
5) copy all the files that are needed to properly use the initrd
|
||||
environment. Don't forget the most important file, /sbin/init
|
||||
Note that /sbin/init's permissions must include "x" (execute).
|
||||
environment. Don't forget the most important file, ``/sbin/init``
|
||||
|
||||
.. note:: ``/sbin/init`` permissions must include "x" (execute).
|
||||
|
||||
6) correct operation the initrd environment can frequently be tested
|
||||
even without rebooting with the command
|
||||
# chroot /mnt /sbin/init
|
||||
even without rebooting with the command::
|
||||
|
||||
# chroot /mnt /sbin/init
|
||||
|
||||
This is of course limited to initrds that do not interfere with the
|
||||
general system state (e.g. by reconfiguring network interfaces,
|
||||
overwriting mounted devices, trying to start already running demons,
|
||||
etc. Note however that it is usually possible to use pivot_root in
|
||||
such a chroot'ed initrd environment.)
|
||||
7) unmount the file system
|
||||
# umount /mnt
|
||||
7) unmount the file system::
|
||||
|
||||
# umount /mnt
|
||||
|
||||
8) the initrd is now in the file "initrd". Optionally, it can now be
|
||||
compressed
|
||||
# gzip -9 initrd
|
||||
compressed::
|
||||
|
||||
# gzip -9 initrd
|
||||
|
||||
For experimenting with initrd, you may want to take a rescue floppy and
|
||||
only add a symbolic link from /sbin/init to /bin/sh. Alternatively, you
|
||||
can try the experimental newlib environment [2] to create a small
|
||||
only add a symbolic link from ``/sbin/init`` to ``/bin/sh``. Alternatively, you
|
||||
can try the experimental newlib environment [#f2]_ to create a small
|
||||
initrd.
|
||||
|
||||
Finally, you have to boot the kernel and load initrd. Almost all Linux
|
||||
boot loaders support initrd. Since the boot process is still compatible
|
||||
with an older mechanism, the following boot command line parameters
|
||||
have to be given:
|
||||
have to be given::
|
||||
|
||||
root=/dev/ram0 rw
|
||||
|
||||
(rw is only necessary if writing to the initrd file system.)
|
||||
|
||||
With LOADLIN, you simply execute
|
||||
With LOADLIN, you simply execute::
|
||||
|
||||
LOADLIN <kernel> initrd=<disk_image>
|
||||
e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
|
||||
|
||||
With LILO, you add the option INITRD=<path> to either the global section
|
||||
or to the section of the respective kernel in /etc/lilo.conf, and pass
|
||||
the options using APPEND, e.g.
|
||||
e.g.::
|
||||
|
||||
LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
|
||||
|
||||
With LILO, you add the option ``INITRD=<path>`` to either the global section
|
||||
or to the section of the respective kernel in ``/etc/lilo.conf``, and pass
|
||||
the options using APPEND, e.g.::
|
||||
|
||||
image = /bzImage
|
||||
initrd = /boot/initrd.gz
|
||||
append = "root=/dev/ram0 rw"
|
||||
|
||||
and run /sbin/lilo
|
||||
and run ``/sbin/lilo``
|
||||
|
||||
For other boot loaders, please refer to the respective documentation.
|
||||
|
||||
@ -204,33 +220,33 @@ The procedure involves the following steps:
|
||||
- unmounting the initrd file system and de-allocating the RAM disk
|
||||
|
||||
Mounting the new root file system is easy: it just needs to be mounted on
|
||||
a directory under the current root. Example:
|
||||
a directory under the current root. Example::
|
||||
|
||||
# mkdir /new-root
|
||||
# mount -o ro /dev/hda1 /new-root
|
||||
# mkdir /new-root
|
||||
# mount -o ro /dev/hda1 /new-root
|
||||
|
||||
The root change is accomplished with the pivot_root system call, which
|
||||
is also available via the pivot_root utility (see pivot_root(8) man
|
||||
page; pivot_root is distributed with util-linux version 2.10h or higher
|
||||
[3]). pivot_root moves the current root to a directory under the new
|
||||
is also available via the ``pivot_root`` utility (see :manpage:`pivot_root(8)`
|
||||
man page; ``pivot_root`` is distributed with util-linux version 2.10h or higher
|
||||
[#f3]_). ``pivot_root`` moves the current root to a directory under the new
|
||||
root, and puts the new root at its place. The directory for the old root
|
||||
must exist before calling pivot_root. Example:
|
||||
must exist before calling ``pivot_root``. Example::
|
||||
|
||||
# cd /new-root
|
||||
# mkdir initrd
|
||||
# pivot_root . initrd
|
||||
# cd /new-root
|
||||
# mkdir initrd
|
||||
# pivot_root . initrd
|
||||
|
||||
Now, the init process may still access the old root via its
|
||||
executable, shared libraries, standard input/output/error, and its
|
||||
current root directory. All these references are dropped by the
|
||||
following command:
|
||||
following command::
|
||||
|
||||
# exec chroot . what-follows <dev/console >dev/console 2>&1
|
||||
# exec chroot . what-follows <dev/console >dev/console 2>&1
|
||||
|
||||
Where what-follows is a program under the new root, e.g. /sbin/init
|
||||
Where what-follows is a program under the new root, e.g. ``/sbin/init``
|
||||
If the new root file system will be used with udev and has no valid
|
||||
/dev directory, udev must be initialized before invoking chroot in order
|
||||
to provide /dev/console.
|
||||
``/dev`` directory, udev must be initialized before invoking chroot in order
|
||||
to provide ``/dev/console``.
|
||||
|
||||
Note: implementation details of pivot_root may change with time. In order
|
||||
to ensure compatibility, the following points should be observed:
|
||||
@ -244,13 +260,13 @@ to ensure compatibility, the following points should be observed:
|
||||
- use relative paths for dev/console in the exec command
|
||||
|
||||
Now, the initrd can be unmounted and the memory allocated by the RAM
|
||||
disk can be freed:
|
||||
disk can be freed::
|
||||
|
||||
# umount /initrd
|
||||
# blockdev --flushbufs /dev/ram0
|
||||
# umount /initrd
|
||||
# blockdev --flushbufs /dev/ram0
|
||||
|
||||
It is also possible to use initrd with an NFS-mounted root, see the
|
||||
pivot_root(8) man page for details.
|
||||
:manpage:`pivot_root(8)` man page for details.
|
||||
|
||||
|
||||
Usage scenarios
|
||||
@ -263,21 +279,21 @@ as follows:
|
||||
1) system boots from floppy or other media with a minimal kernel
|
||||
(e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and
|
||||
loads initrd
|
||||
2) /sbin/init determines what is needed to (1) mount the "real" root FS
|
||||
2) ``/sbin/init`` determines what is needed to (1) mount the "real" root FS
|
||||
(i.e. device type, device drivers, file system) and (2) the
|
||||
distribution media (e.g. CD-ROM, network, tape, ...). This can be
|
||||
done by asking the user, by auto-probing, or by using a hybrid
|
||||
approach.
|
||||
3) /sbin/init loads the necessary kernel modules
|
||||
4) /sbin/init creates and populates the root file system (this doesn't
|
||||
3) ``/sbin/init`` loads the necessary kernel modules
|
||||
4) ``/sbin/init`` creates and populates the root file system (this doesn't
|
||||
have to be a very usable system yet)
|
||||
5) /sbin/init invokes pivot_root to change the root file system and
|
||||
5) ``/sbin/init`` invokes ``pivot_root`` to change the root file system and
|
||||
execs - via chroot - a program that continues the installation
|
||||
6) the boot loader is installed
|
||||
7) the boot loader is configured to load an initrd with the set of
|
||||
modules that was used to bring up the system (e.g. /initrd can be
|
||||
modules that was used to bring up the system (e.g. ``/initrd`` can be
|
||||
modified, then unmounted, and finally, the image is written from
|
||||
/dev/ram0 or /dev/rd/0 to a file)
|
||||
``/dev/ram0`` or ``/dev/rd/0`` to a file)
|
||||
8) now the system is bootable and additional installation tasks can be
|
||||
performed
|
||||
|
||||
@ -290,7 +306,7 @@ different hardware configurations in a single administrative domain. In
|
||||
such cases, it is desirable to generate only a small set of kernels
|
||||
(ideally only one) and to keep the system-specific part of configuration
|
||||
information as small as possible. In this case, a common initrd could be
|
||||
generated with all the necessary modules. Then, only /sbin/init or a file
|
||||
generated with all the necessary modules. Then, only ``/sbin/init`` or a file
|
||||
read by it would have to be different.
|
||||
|
||||
A third scenario is more convenient recovery disks, because information
|
||||
@ -301,9 +317,9 @@ auto-detection).
|
||||
|
||||
Last not least, CD-ROM distributors may use it for better installation
|
||||
from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk
|
||||
via initrd from CD; or by booting via a loader like LOADLIN or directly
|
||||
via initrd from CD; or by booting via a loader like ``LOADLIN`` or directly
|
||||
from the CD-ROM, and loading the RAM disk from CD without need of
|
||||
floppies.
|
||||
floppies.
|
||||
|
||||
|
||||
Obsolete root change mechanism
|
||||
@ -316,51 +332,52 @@ continued availability.
|
||||
It works by mounting the "real" root device (i.e. the one set with rdev
|
||||
in the kernel image or with root=... at the boot command line) as the
|
||||
root file system when linuxrc exits. The initrd file system is then
|
||||
unmounted, or, if it is still busy, moved to a directory /initrd, if
|
||||
unmounted, or, if it is still busy, moved to a directory ``/initrd``, if
|
||||
such a directory exists on the new root file system.
|
||||
|
||||
In order to use this mechanism, you do not have to specify the boot
|
||||
command options root, init, or rw. (If specified, they will affect
|
||||
the real root file system, not the initrd environment.)
|
||||
|
||||
|
||||
If /proc is mounted, the "real" root device can be changed from within
|
||||
linuxrc by writing the number of the new root FS device to the special
|
||||
file /proc/sys/kernel/real-root-dev, e.g.
|
||||
file /proc/sys/kernel/real-root-dev, e.g.::
|
||||
|
||||
# echo 0x301 >/proc/sys/kernel/real-root-dev
|
||||
|
||||
Note that the mechanism is incompatible with NFS and similar file
|
||||
systems.
|
||||
|
||||
This old, deprecated mechanism is commonly called "change_root", while
|
||||
the new, supported mechanism is called "pivot_root".
|
||||
This old, deprecated mechanism is commonly called ``change_root``, while
|
||||
the new, supported mechanism is called ``pivot_root``.
|
||||
|
||||
|
||||
Mixed change_root and pivot_root mechanism
|
||||
------------------------------------------
|
||||
|
||||
In case you did not want to use root=/dev/ram0 to trigger the pivot_root
|
||||
mechanism, you may create both /linuxrc and /sbin/init in your initrd image.
|
||||
In case you did not want to use ``root=/dev/ram0`` to trigger the pivot_root
|
||||
mechanism, you may create both ``/linuxrc`` and ``/sbin/init`` in your initrd
|
||||
image.
|
||||
|
||||
/linuxrc would contain only the following:
|
||||
``/linuxrc`` would contain only the following::
|
||||
|
||||
#! /bin/sh
|
||||
mount -n -t proc proc /proc
|
||||
echo 0x0100 >/proc/sys/kernel/real-root-dev
|
||||
umount -n /proc
|
||||
#! /bin/sh
|
||||
mount -n -t proc proc /proc
|
||||
echo 0x0100 >/proc/sys/kernel/real-root-dev
|
||||
umount -n /proc
|
||||
|
||||
Once linuxrc exited, the kernel would mount again your initrd as root,
|
||||
this time executing /sbin/init. Again, it would be the duty of this init
|
||||
to build the right environment (maybe using the root= device passed on
|
||||
the cmdline) before the final execution of the real /sbin/init.
|
||||
this time executing ``/sbin/init``. Again, it would be the duty of this init
|
||||
to build the right environment (maybe using the ``root= device`` passed on
|
||||
the cmdline) before the final execution of the real ``/sbin/init``.
|
||||
|
||||
|
||||
Resources
|
||||
---------
|
||||
|
||||
[1] Almesberger, Werner; "Booting Linux: The History and the Future"
|
||||
.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future"
|
||||
http://www.almesberger.net/cv/papers/ols2k-9.ps.gz
|
||||
[2] newlib package (experimental), with initrd example
|
||||
http://sources.redhat.com/newlib/
|
||||
[3] util-linux: Miscellaneous utilities for Linux
|
||||
http://www.kernel.org/pub/linux/utils/util-linux/
|
||||
.. [#f2] newlib package (experimental), with initrd example
|
||||
https://www.sourceware.org/newlib/
|
||||
.. [#f3] util-linux: Miscellaneous utilities for Linux
|
||||
https://www.kernel.org/pub/linux/utils/util-linux/
|
@ -1,5 +1,5 @@
|
||||
Java(tm) Binary Kernel Support for Linux v1.03
|
||||
----------------------------------------------
|
||||
Java(tm) Binary Kernel Support for Linux v1.03
|
||||
----------------------------------------------
|
||||
|
||||
Linux beats them ALL! While all other OS's are TALKING about direct
|
||||
support of Java Binaries in the OS, Linux is doing it!
|
||||
@ -19,70 +19,82 @@ other program after you have done the following:
|
||||
as the application itself).
|
||||
|
||||
2) You have to compile BINFMT_MISC either as a module or into
|
||||
the kernel (CONFIG_BINFMT_MISC) and set it up properly.
|
||||
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
|
||||
If you choose to compile it as a module, you will have
|
||||
to insert it manually with modprobe/insmod, as kmod
|
||||
cannot easily be supported with binfmt_misc.
|
||||
cannot easily be supported with binfmt_misc.
|
||||
Read the file 'binfmt_misc.txt' in this directory to know
|
||||
more about the configuration process.
|
||||
|
||||
3) Add the following configuration items to binfmt_misc
|
||||
(you should really have read binfmt_misc.txt now):
|
||||
support for Java applications:
|
||||
(you should really have read ``binfmt_misc.txt`` now):
|
||||
support for Java applications::
|
||||
|
||||
':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:'
|
||||
support for executable Jar files:
|
||||
|
||||
support for executable Jar files::
|
||||
|
||||
':ExecutableJAR:E::jar::/usr/local/bin/jarwrapper:'
|
||||
support for Java Applets:
|
||||
|
||||
support for Java Applets::
|
||||
|
||||
':Applet:E::html::/usr/bin/appletviewer:'
|
||||
or the following, if you want to be more selective:
|
||||
|
||||
or the following, if you want to be more selective::
|
||||
|
||||
':Applet:M::<!--applet::/usr/bin/appletviewer:'
|
||||
|
||||
Of course you have to fix the path names. The path/file names given in this
|
||||
document match the Debian 2.1 system. (i.e. jdk installed in /usr,
|
||||
custom wrappers from this document in /usr/local)
|
||||
document match the Debian 2.1 system. (i.e. jdk installed in ``/usr``,
|
||||
custom wrappers from this document in ``/usr/local``)
|
||||
|
||||
Note, that for the more selective applet support you have to modify
|
||||
existing html-files to contain <!--applet--> in the first line
|
||||
('<' has to be the first character!) to let this work!
|
||||
existing html-files to contain ``<!--applet-->`` in the first line
|
||||
(``<`` has to be the first character!) to let this work!
|
||||
|
||||
For the compiled Java programs you need a wrapper script like the
|
||||
following (this is because Java is broken in case of the filename
|
||||
handling), again fix the path names, both in the script and in the
|
||||
above given configuration string.
|
||||
|
||||
You, too, need the little program after the script. Compile like
|
||||
gcc -O2 -o javaclassname javaclassname.c
|
||||
and stick it to /usr/local/bin.
|
||||
You, too, need the little program after the script. Compile like::
|
||||
|
||||
gcc -O2 -o javaclassname javaclassname.c
|
||||
|
||||
and stick it to ``/usr/local/bin``.
|
||||
|
||||
Both the javawrapper shellscript and the javaclassname program
|
||||
were supplied by Colin J. Watson <cjw44@cam.ac.uk>.
|
||||
|
||||
====================== Cut here ===================
|
||||
#!/bin/bash
|
||||
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
|
||||
Javawrapper shell script:
|
||||
|
||||
if [ -z "$1" ]; then
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/bash
|
||||
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
|
||||
|
||||
if [ -z "$1" ]; then
|
||||
exec 1>&2
|
||||
echo Usage: $0 class-file
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
CLASS=$1
|
||||
FQCLASS=`/usr/local/bin/javaclassname $1`
|
||||
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
|
||||
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
|
||||
CLASS=$1
|
||||
FQCLASS=`/usr/local/bin/javaclassname $1`
|
||||
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
|
||||
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
|
||||
|
||||
# for example:
|
||||
# CLASS=Test.class
|
||||
# FQCLASS=foo.bar.Test
|
||||
# FQCLASSN=Test
|
||||
# FQCLASSP=foo/bar
|
||||
# for example:
|
||||
# CLASS=Test.class
|
||||
# FQCLASS=foo.bar.Test
|
||||
# FQCLASSN=Test
|
||||
# FQCLASSP=foo/bar
|
||||
|
||||
unset CLASSBASE
|
||||
unset CLASSBASE
|
||||
|
||||
declare -i LINKLEVEL=0
|
||||
declare -i LINKLEVEL=0
|
||||
|
||||
while :; do
|
||||
while :; do
|
||||
if [ "`basename $CLASS .class`" == "$FQCLASSN" ]; then
|
||||
# See if this directory works straight off
|
||||
cd -L `dirname $CLASS`
|
||||
@ -119,9 +131,9 @@ while :; do
|
||||
exit 1
|
||||
fi
|
||||
CLASS=`ls --color=no -l $CLASS | sed -e 's/^.* \([^ ]*\)$/\1/'`
|
||||
done
|
||||
done
|
||||
|
||||
if [ -z "$CLASSBASE" ]; then
|
||||
if [ -z "$CLASSBASE" ]; then
|
||||
if [ -z "$FQCLASSP" ]; then
|
||||
GOODNAME=$FQCLASSN.class
|
||||
else
|
||||
@ -131,96 +143,97 @@ if [ -z "$CLASSBASE" ]; then
|
||||
echo $0:
|
||||
echo " $FQCLASS should be in a file called $GOODNAME"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
|
||||
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
|
||||
# class is not in CLASSPATH, so prepend dir of class to CLASSPATH
|
||||
if [ -z "${CLASSPATH}" ] ; then
|
||||
export CLASSPATH=$CLASSBASE
|
||||
else
|
||||
export CLASSPATH=$CLASSBASE:$CLASSPATH
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
shift
|
||||
/usr/bin/java $FQCLASS "$@"
|
||||
====================== Cut here ===================
|
||||
shift
|
||||
/usr/bin/java $FQCLASS "$@"
|
||||
|
||||
javaclassname.c:
|
||||
|
||||
====================== Cut here ===================
|
||||
/* javaclassname.c
|
||||
*
|
||||
* Extracts the class name from a Java class file; intended for use in a Java
|
||||
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
|
||||
*
|
||||
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify
|
||||
* it under the terms of the GNU General Public License as published by
|
||||
* the Free Software Foundation; either version 2 of the License, or
|
||||
* (at your option) any later version.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
* GNU General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License
|
||||
* along with this program; if not, write to the Free Software
|
||||
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
*/
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <stdarg.h>
|
||||
#include <sys/types.h>
|
||||
/* javaclassname.c
|
||||
*
|
||||
* Extracts the class name from a Java class file; intended for use in a Java
|
||||
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
|
||||
*
|
||||
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify
|
||||
* it under the terms of the GNU General Public License as published by
|
||||
* the Free Software Foundation; either version 2 of the License, or
|
||||
* (at your option) any later version.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
* GNU General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License
|
||||
* along with this program; if not, write to the Free Software
|
||||
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
*/
|
||||
|
||||
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <stdarg.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
#define CP_UTF8 1
|
||||
#define CP_INTEGER 3
|
||||
#define CP_FLOAT 4
|
||||
#define CP_LONG 5
|
||||
#define CP_DOUBLE 6
|
||||
#define CP_CLASS 7
|
||||
#define CP_STRING 8
|
||||
#define CP_FIELDREF 9
|
||||
#define CP_METHODREF 10
|
||||
#define CP_INTERFACEMETHODREF 11
|
||||
#define CP_NAMEANDTYPE 12
|
||||
#define CP_METHODHANDLE 15
|
||||
#define CP_METHODTYPE 16
|
||||
#define CP_INVOKEDYNAMIC 18
|
||||
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
|
||||
|
||||
/* Define some commonly used error messages */
|
||||
#define CP_UTF8 1
|
||||
#define CP_INTEGER 3
|
||||
#define CP_FLOAT 4
|
||||
#define CP_LONG 5
|
||||
#define CP_DOUBLE 6
|
||||
#define CP_CLASS 7
|
||||
#define CP_STRING 8
|
||||
#define CP_FIELDREF 9
|
||||
#define CP_METHODREF 10
|
||||
#define CP_INTERFACEMETHODREF 11
|
||||
#define CP_NAMEANDTYPE 12
|
||||
#define CP_METHODHANDLE 15
|
||||
#define CP_METHODTYPE 16
|
||||
#define CP_INVOKEDYNAMIC 18
|
||||
|
||||
#define seek_error() error("%s: Cannot seek\n", program)
|
||||
#define corrupt_error() error("%s: Class file corrupt\n", program)
|
||||
#define eof_error() error("%s: Unexpected end of file\n", program)
|
||||
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program);
|
||||
/* Define some commonly used error messages */
|
||||
|
||||
char *program;
|
||||
#define seek_error() error("%s: Cannot seek\n", program)
|
||||
#define corrupt_error() error("%s: Class file corrupt\n", program)
|
||||
#define eof_error() error("%s: Unexpected end of file\n", program)
|
||||
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program);
|
||||
|
||||
long *pool;
|
||||
char *program;
|
||||
|
||||
u_int8_t read_8(FILE *classfile);
|
||||
u_int16_t read_16(FILE *classfile);
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur);
|
||||
void error(const char *format, ...);
|
||||
int main(int argc, char **argv);
|
||||
long *pool;
|
||||
|
||||
/* Reads in an unsigned 8-bit integer. */
|
||||
u_int8_t read_8(FILE *classfile)
|
||||
{
|
||||
u_int8_t read_8(FILE *classfile);
|
||||
u_int16_t read_16(FILE *classfile);
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur);
|
||||
void error(const char *format, ...);
|
||||
int main(int argc, char **argv);
|
||||
|
||||
/* Reads in an unsigned 8-bit integer. */
|
||||
u_int8_t read_8(FILE *classfile)
|
||||
{
|
||||
int b = fgetc(classfile);
|
||||
if(b == EOF)
|
||||
eof_error();
|
||||
return (u_int8_t)b;
|
||||
}
|
||||
}
|
||||
|
||||
/* Reads in an unsigned 16-bit integer. */
|
||||
u_int16_t read_16(FILE *classfile)
|
||||
{
|
||||
/* Reads in an unsigned 16-bit integer. */
|
||||
u_int16_t read_16(FILE *classfile)
|
||||
{
|
||||
int b1, b2;
|
||||
b1 = fgetc(classfile);
|
||||
if(b1 == EOF)
|
||||
@ -229,11 +242,11 @@ u_int16_t read_16(FILE *classfile)
|
||||
if(b2 == EOF)
|
||||
eof_error();
|
||||
return (u_int16_t)((b1 << 8) | b2);
|
||||
}
|
||||
}
|
||||
|
||||
/* Reads in a value from the constant pool. */
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur)
|
||||
{
|
||||
/* Reads in a value from the constant pool. */
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur)
|
||||
{
|
||||
u_int16_t len;
|
||||
int seekerr = 1;
|
||||
pool[*cur] = ftell(classfile);
|
||||
@ -270,19 +283,19 @@ void skip_constant(FILE *classfile, u_int16_t *cur)
|
||||
}
|
||||
if(seekerr)
|
||||
seek_error();
|
||||
}
|
||||
}
|
||||
|
||||
void error(const char *format, ...)
|
||||
{
|
||||
void error(const char *format, ...)
|
||||
{
|
||||
va_list ap;
|
||||
va_start(ap, format);
|
||||
vfprintf(stderr, format, ap);
|
||||
va_end(ap);
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
FILE *classfile;
|
||||
u_int16_t cp_count, i, this_class, classinfo_ptr;
|
||||
u_int8_t length;
|
||||
@ -349,19 +362,19 @@ int main(int argc, char **argv)
|
||||
free(pool);
|
||||
fclose(classfile);
|
||||
return 0;
|
||||
}
|
||||
====================== Cut here ===================
|
||||
}
|
||||
|
||||
jarwrapper::
|
||||
|
||||
#!/bin/bash
|
||||
# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar
|
||||
|
||||
java -jar $1
|
||||
|
||||
|
||||
====================== Cut here ===================
|
||||
#!/bin/bash
|
||||
# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar
|
||||
Now simply ``chmod +x`` the ``.class``, ``.jar`` and/or ``.html`` files you
|
||||
want to execute.
|
||||
|
||||
java -jar $1
|
||||
====================== Cut here ===================
|
||||
|
||||
|
||||
Now simply chmod +x the .class, .jar and/or .html files you want to execute.
|
||||
To add a Java program to your path best put a symbolic link to the main
|
||||
.class file into /usr/bin (or another place you like) omitting the .class
|
||||
extension. The directory containing the original .class file will be
|
||||
@ -371,29 +384,36 @@ added to your CLASSPATH during execution.
|
||||
To test your new setup, enter in the following simple Java app, and name
|
||||
it "HelloWorld.java":
|
||||
|
||||
.. code-block:: java
|
||||
|
||||
class HelloWorld {
|
||||
public static void main(String args[]) {
|
||||
System.out.println("Hello World!");
|
||||
}
|
||||
}
|
||||
|
||||
Now compile the application with:
|
||||
Now compile the application with::
|
||||
|
||||
javac HelloWorld.java
|
||||
|
||||
Set the executable permissions of the binary file, with:
|
||||
Set the executable permissions of the binary file, with::
|
||||
|
||||
chmod 755 HelloWorld.class
|
||||
|
||||
And then execute it:
|
||||
And then execute it::
|
||||
|
||||
./HelloWorld.class
|
||||
|
||||
|
||||
To execute Java Jar files, simple chmod the *.jar files to include
|
||||
the execution bit, then just do
|
||||
To execute Java Jar files, simple chmod the ``*.jar`` files to include
|
||||
the execution bit, then just do::
|
||||
|
||||
./Application.jar
|
||||
|
||||
|
||||
To execute Java Applets, simple chmod the *.html files to include
|
||||
the execution bit, then just do
|
||||
To execute Java Applets, simple chmod the ``*.html`` files to include
|
||||
the execution bit, then just do::
|
||||
|
||||
./Applet.html
|
||||
|
||||
|
||||
@ -401,4 +421,3 @@ originally by Brian A. Lantz, brian@lantz.com
|
||||
heavily edited for binfmt_misc by Richard Günther
|
||||
new scripts by Colin J. Watson <cjw44@cam.ac.uk>
|
||||
added executable Jar file support by Kurt Huwig <kurt@iku-netz.de>
|
||||
|
209
Documentation/admin-guide/kernel-parameters.rst
Normal file
209
Documentation/admin-guide/kernel-parameters.rst
Normal file
@ -0,0 +1,209 @@
|
||||
The kernel's command-line parameters
|
||||
====================================
|
||||
|
||||
The following is a consolidated list of the kernel parameters as
|
||||
implemented by the __setup(), core_param() and module_param() macros
|
||||
and sorted into English Dictionary order (defined as ignoring all
|
||||
punctuation and sorting digits before letters in a case insensitive
|
||||
manner), and with descriptions where known.
|
||||
|
||||
The kernel parses parameters from the kernel command line up to "--";
|
||||
if it doesn't recognize a parameter and it doesn't contain a '.', the
|
||||
parameter gets passed to init: parameters with '=' go into init's
|
||||
environment, others are passed as command line arguments to init.
|
||||
Everything after "--" is passed as an argument to init.
|
||||
|
||||
Module parameters can be specified in two ways: via the kernel command
|
||||
line with a module name prefix, or via modprobe, e.g.::
|
||||
|
||||
(kernel command line) usbcore.blinkenlights=1
|
||||
(modprobe command line) modprobe usbcore blinkenlights=1
|
||||
|
||||
Parameters for modules which are built into the kernel need to be
|
||||
specified on the kernel command line. modprobe looks through the
|
||||
kernel command line (/proc/cmdline) and collects module parameters
|
||||
when it loads a module, so the kernel command line can be used for
|
||||
loadable modules too.
|
||||
|
||||
Hyphens (dashes) and underscores are equivalent in parameter names, so::
|
||||
|
||||
log_buf_len=1M print-fatal-signals=1
|
||||
|
||||
can also be entered as::
|
||||
|
||||
log-buf-len=1M print_fatal_signals=1
|
||||
|
||||
Double-quotes can be used to protect spaces in values, e.g.::
|
||||
|
||||
param="spaces in here"
|
||||
|
||||
cpu lists:
|
||||
----------
|
||||
|
||||
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
|
||||
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
|
||||
|
||||
<cpu number>,...,<cpu number>
|
||||
|
||||
or
|
||||
|
||||
<cpu number>-<cpu number>
|
||||
(must be a positive range in ascending order)
|
||||
|
||||
or a mixture
|
||||
|
||||
<cpu number>,...,<cpu number>-<cpu number>
|
||||
|
||||
Note that for the special case of a range one can split the range into equal
|
||||
sized groups and for each group use some amount from the beginning of that
|
||||
group:
|
||||
|
||||
<cpu number>-cpu number>:<used size>/<group size>
|
||||
|
||||
For example one can add to the command line following parameter:
|
||||
|
||||
isolcpus=1,2,10-20,100-2000:2/25
|
||||
|
||||
where the final item represents CPUs 100,101,125,126,150,151,...
|
||||
|
||||
|
||||
|
||||
This document may not be entirely up to date and comprehensive. The command
|
||||
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
|
||||
module. Loadable modules, after being loaded into the running kernel, also
|
||||
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
|
||||
parameters may be changed at runtime by the command
|
||||
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
|
||||
|
||||
The parameters listed below are only valid if certain kernel build options were
|
||||
enabled and if respective hardware is present. The text in square brackets at
|
||||
the beginning of each description states the restrictions within which a
|
||||
parameter is applicable::
|
||||
|
||||
ACPI ACPI support is enabled.
|
||||
AGP AGP (Accelerated Graphics Port) is enabled.
|
||||
ALSA ALSA sound support is enabled.
|
||||
APIC APIC support is enabled.
|
||||
APM Advanced Power Management support is enabled.
|
||||
ARM ARM architecture is enabled.
|
||||
AVR32 AVR32 architecture is enabled.
|
||||
AX25 Appropriate AX.25 support is enabled.
|
||||
BLACKFIN Blackfin architecture is enabled.
|
||||
CLK Common clock infrastructure is enabled.
|
||||
CMA Contiguous Memory Area support is enabled.
|
||||
DRM Direct Rendering Management support is enabled.
|
||||
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
|
||||
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
EIDE EIDE/ATAPI support is enabled.
|
||||
EVM Extended Verification Module
|
||||
FB The frame buffer device is enabled.
|
||||
FTRACE Function tracing enabled.
|
||||
GCOV GCOV profiling is enabled.
|
||||
HW Appropriate hardware is enabled.
|
||||
IA-64 IA-64 architecture is enabled.
|
||||
IMA Integrity measurement architecture is enabled.
|
||||
IOSCHED More than one I/O scheduler is enabled.
|
||||
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
|
||||
IPV6 IPv6 support is enabled.
|
||||
ISAPNP ISA PnP code is enabled.
|
||||
ISDN Appropriate ISDN support is enabled.
|
||||
JOY Appropriate joystick support is enabled.
|
||||
KGDB Kernel debugger support is enabled.
|
||||
KVM Kernel Virtual Machine support is enabled.
|
||||
LIBATA Libata driver is enabled
|
||||
LP Printer support is enabled.
|
||||
LOOP Loopback device support is enabled.
|
||||
M68k M68k architecture is enabled.
|
||||
These options have more detailed description inside of
|
||||
Documentation/m68k/kernel-options.txt.
|
||||
MDA MDA console support is enabled.
|
||||
MIPS MIPS architecture is enabled.
|
||||
MOUSE Appropriate mouse support is enabled.
|
||||
MSI Message Signaled Interrupts (PCI).
|
||||
MTD MTD (Memory Technology Device) support is enabled.
|
||||
NET Appropriate network support is enabled.
|
||||
NUMA NUMA support is enabled.
|
||||
NFS Appropriate NFS support is enabled.
|
||||
OSS OSS sound support is enabled.
|
||||
PV_OPS A paravirtualized kernel is enabled.
|
||||
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
|
||||
PARISC The PA-RISC architecture is enabled.
|
||||
PCI PCI bus support is enabled.
|
||||
PCIE PCI Express support is enabled.
|
||||
PCMCIA The PCMCIA subsystem is enabled.
|
||||
PNP Plug & Play support is enabled.
|
||||
PPC PowerPC architecture is enabled.
|
||||
PPT Parallel port support is enabled.
|
||||
PS2 Appropriate PS/2 support is enabled.
|
||||
RAM RAM disk support is enabled.
|
||||
S390 S390 architecture is enabled.
|
||||
SCSI Appropriate SCSI support is enabled.
|
||||
A lot of drivers have their options described inside
|
||||
the Documentation/scsi/ sub-directory.
|
||||
SECURITY Different security models are enabled.
|
||||
SELINUX SELinux support is enabled.
|
||||
APPARMOR AppArmor support is enabled.
|
||||
SERIAL Serial support is enabled.
|
||||
SH SuperH architecture is enabled.
|
||||
SMP The kernel is an SMP kernel.
|
||||
SPARC Sparc architecture is enabled.
|
||||
SWSUSP Software suspend (hibernation) is enabled.
|
||||
SUSPEND System suspend states are enabled.
|
||||
TPM TPM drivers are enabled.
|
||||
TS Appropriate touchscreen support is enabled.
|
||||
UMS USB Mass Storage support is enabled.
|
||||
USB USB support is enabled.
|
||||
USBHID USB Human Interface Device support is enabled.
|
||||
V4L Video For Linux support is enabled.
|
||||
VMMIO Driver for memory mapped virtio devices is enabled.
|
||||
VGA The VGA console has been enabled.
|
||||
VT Virtual terminal support is enabled.
|
||||
WDT Watchdog support is enabled.
|
||||
XT IBM PC/XT MFM hard disk support is enabled.
|
||||
X86-32 X86-32, aka i386 architecture is enabled.
|
||||
X86-64 X86-64 architecture is enabled.
|
||||
More X86-64 boot options can be found in
|
||||
Documentation/x86/x86_64/boot-options.txt .
|
||||
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
|
||||
X86_UV SGI UV support is enabled.
|
||||
XEN Xen support is enabled
|
||||
|
||||
In addition, the following text indicates that the option::
|
||||
|
||||
BUGS= Relates to possible processor bugs on the said processor.
|
||||
KNL Is a kernel start-up parameter.
|
||||
BOOT Is a boot loader parameter.
|
||||
|
||||
Parameters denoted with BOOT are actually interpreted by the boot
|
||||
loader, and have no meaning to the kernel directly.
|
||||
Do not modify the syntax of boot loader parameters without extreme
|
||||
need or coordination with <Documentation/x86/boot.txt>.
|
||||
|
||||
There are also arch-specific kernel-parameters not documented here.
|
||||
See for example <Documentation/x86/x86_64/boot-options.txt>.
|
||||
|
||||
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
|
||||
a trailing = on the name of any parameter states that that parameter will
|
||||
be entered as an environment variable, whereas its absence indicates that
|
||||
it will appear as a kernel argument readable via /proc/cmdline by programs
|
||||
running once the system is up.
|
||||
|
||||
The number of kernel parameters is not limited, but the length of the
|
||||
complete command line (parameters including spaces etc.) is limited to
|
||||
a fixed number of characters. This limit depends on the architecture
|
||||
and is between 256 and 4096 characters. It is defined in the file
|
||||
./include/asm/setup.h as COMMAND_LINE_SIZE.
|
||||
|
||||
Finally, the [KMG] suffix is commonly described after a number of kernel
|
||||
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
|
||||
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
|
||||
bytes respectively. Such letter suffixes can also be entirely omitted:
|
||||
|
||||
.. include:: kernel-parameters.txt
|
||||
:literal:
|
||||
|
||||
Todo
|
||||
----
|
||||
|
||||
Add more DRM drivers.
|
@ -1,202 +1,3 @@
|
||||
Kernel Parameters
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following is a consolidated list of the kernel parameters as
|
||||
implemented by the __setup(), core_param() and module_param() macros
|
||||
and sorted into English Dictionary order (defined as ignoring all
|
||||
punctuation and sorting digits before letters in a case insensitive
|
||||
manner), and with descriptions where known.
|
||||
|
||||
The kernel parses parameters from the kernel command line up to "--";
|
||||
if it doesn't recognize a parameter and it doesn't contain a '.', the
|
||||
parameter gets passed to init: parameters with '=' go into init's
|
||||
environment, others are passed as command line arguments to init.
|
||||
Everything after "--" is passed as an argument to init.
|
||||
|
||||
Module parameters can be specified in two ways: via the kernel command
|
||||
line with a module name prefix, or via modprobe, e.g.:
|
||||
|
||||
(kernel command line) usbcore.blinkenlights=1
|
||||
(modprobe command line) modprobe usbcore blinkenlights=1
|
||||
|
||||
Parameters for modules which are built into the kernel need to be
|
||||
specified on the kernel command line. modprobe looks through the
|
||||
kernel command line (/proc/cmdline) and collects module parameters
|
||||
when it loads a module, so the kernel command line can be used for
|
||||
loadable modules too.
|
||||
|
||||
Hyphens (dashes) and underscores are equivalent in parameter names, so
|
||||
log_buf_len=1M print-fatal-signals=1
|
||||
can also be entered as
|
||||
log-buf-len=1M print_fatal_signals=1
|
||||
|
||||
Double-quotes can be used to protect spaces in values, e.g.:
|
||||
param="spaces in here"
|
||||
|
||||
cpu lists:
|
||||
----------
|
||||
|
||||
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
|
||||
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
|
||||
|
||||
<cpu number>,...,<cpu number>
|
||||
|
||||
or
|
||||
|
||||
<cpu number>-<cpu number>
|
||||
(must be a positive range in ascending order)
|
||||
|
||||
or a mixture
|
||||
|
||||
<cpu number>,...,<cpu number>-<cpu number>
|
||||
|
||||
Note that for the special case of a range one can split the range into equal
|
||||
sized groups and for each group use some amount from the beginning of that
|
||||
group:
|
||||
|
||||
<cpu number>-cpu number>:<used size>/<group size>
|
||||
|
||||
For example one can add to the command line following parameter:
|
||||
|
||||
isolcpus=1,2,10-20,100-2000:2/25
|
||||
|
||||
where the final item represents CPUs 100,101,125,126,150,151,...
|
||||
|
||||
|
||||
|
||||
This document may not be entirely up to date and comprehensive. The command
|
||||
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
|
||||
module. Loadable modules, after being loaded into the running kernel, also
|
||||
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
|
||||
parameters may be changed at runtime by the command
|
||||
"echo -n ${value} > /sys/module/${modulename}/parameters/${parm}".
|
||||
|
||||
The parameters listed below are only valid if certain kernel build options were
|
||||
enabled and if respective hardware is present. The text in square brackets at
|
||||
the beginning of each description states the restrictions within which a
|
||||
parameter is applicable:
|
||||
|
||||
ACPI ACPI support is enabled.
|
||||
AGP AGP (Accelerated Graphics Port) is enabled.
|
||||
ALSA ALSA sound support is enabled.
|
||||
APIC APIC support is enabled.
|
||||
APM Advanced Power Management support is enabled.
|
||||
ARM ARM architecture is enabled.
|
||||
AVR32 AVR32 architecture is enabled.
|
||||
AX25 Appropriate AX.25 support is enabled.
|
||||
BLACKFIN Blackfin architecture is enabled.
|
||||
CLK Common clock infrastructure is enabled.
|
||||
CMA Contiguous Memory Area support is enabled.
|
||||
DRM Direct Rendering Management support is enabled.
|
||||
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
|
||||
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
EIDE EIDE/ATAPI support is enabled.
|
||||
EVM Extended Verification Module
|
||||
FB The frame buffer device is enabled.
|
||||
FTRACE Function tracing enabled.
|
||||
GCOV GCOV profiling is enabled.
|
||||
HW Appropriate hardware is enabled.
|
||||
IA-64 IA-64 architecture is enabled.
|
||||
IMA Integrity measurement architecture is enabled.
|
||||
IOSCHED More than one I/O scheduler is enabled.
|
||||
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
|
||||
IPV6 IPv6 support is enabled.
|
||||
ISAPNP ISA PnP code is enabled.
|
||||
ISDN Appropriate ISDN support is enabled.
|
||||
JOY Appropriate joystick support is enabled.
|
||||
KGDB Kernel debugger support is enabled.
|
||||
KVM Kernel Virtual Machine support is enabled.
|
||||
LIBATA Libata driver is enabled
|
||||
LP Printer support is enabled.
|
||||
LOOP Loopback device support is enabled.
|
||||
M68k M68k architecture is enabled.
|
||||
These options have more detailed description inside of
|
||||
Documentation/m68k/kernel-options.txt.
|
||||
MDA MDA console support is enabled.
|
||||
MIPS MIPS architecture is enabled.
|
||||
MOUSE Appropriate mouse support is enabled.
|
||||
MSI Message Signaled Interrupts (PCI).
|
||||
MTD MTD (Memory Technology Device) support is enabled.
|
||||
NET Appropriate network support is enabled.
|
||||
NUMA NUMA support is enabled.
|
||||
NFS Appropriate NFS support is enabled.
|
||||
OSS OSS sound support is enabled.
|
||||
PV_OPS A paravirtualized kernel is enabled.
|
||||
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
|
||||
PARISC The PA-RISC architecture is enabled.
|
||||
PCI PCI bus support is enabled.
|
||||
PCIE PCI Express support is enabled.
|
||||
PCMCIA The PCMCIA subsystem is enabled.
|
||||
PNP Plug & Play support is enabled.
|
||||
PPC PowerPC architecture is enabled.
|
||||
PPT Parallel port support is enabled.
|
||||
PS2 Appropriate PS/2 support is enabled.
|
||||
RAM RAM disk support is enabled.
|
||||
S390 S390 architecture is enabled.
|
||||
SCSI Appropriate SCSI support is enabled.
|
||||
A lot of drivers have their options described inside
|
||||
the Documentation/scsi/ sub-directory.
|
||||
SECURITY Different security models are enabled.
|
||||
SELINUX SELinux support is enabled.
|
||||
APPARMOR AppArmor support is enabled.
|
||||
SERIAL Serial support is enabled.
|
||||
SH SuperH architecture is enabled.
|
||||
SMP The kernel is an SMP kernel.
|
||||
SPARC Sparc architecture is enabled.
|
||||
SWSUSP Software suspend (hibernation) is enabled.
|
||||
SUSPEND System suspend states are enabled.
|
||||
TPM TPM drivers are enabled.
|
||||
TS Appropriate touchscreen support is enabled.
|
||||
UMS USB Mass Storage support is enabled.
|
||||
USB USB support is enabled.
|
||||
USBHID USB Human Interface Device support is enabled.
|
||||
V4L Video For Linux support is enabled.
|
||||
VMMIO Driver for memory mapped virtio devices is enabled.
|
||||
VGA The VGA console has been enabled.
|
||||
VT Virtual terminal support is enabled.
|
||||
WDT Watchdog support is enabled.
|
||||
XT IBM PC/XT MFM hard disk support is enabled.
|
||||
X86-32 X86-32, aka i386 architecture is enabled.
|
||||
X86-64 X86-64 architecture is enabled.
|
||||
More X86-64 boot options can be found in
|
||||
Documentation/x86/x86_64/boot-options.txt .
|
||||
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
|
||||
X86_UV SGI UV support is enabled.
|
||||
XEN Xen support is enabled
|
||||
|
||||
In addition, the following text indicates that the option:
|
||||
|
||||
BUGS= Relates to possible processor bugs on the said processor.
|
||||
KNL Is a kernel start-up parameter.
|
||||
BOOT Is a boot loader parameter.
|
||||
|
||||
Parameters denoted with BOOT are actually interpreted by the boot
|
||||
loader, and have no meaning to the kernel directly.
|
||||
Do not modify the syntax of boot loader parameters without extreme
|
||||
need or coordination with <Documentation/x86/boot.txt>.
|
||||
|
||||
There are also arch-specific kernel-parameters not documented here.
|
||||
See for example <Documentation/x86/x86_64/boot-options.txt>.
|
||||
|
||||
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
|
||||
a trailing = on the name of any parameter states that that parameter will
|
||||
be entered as an environment variable, whereas its absence indicates that
|
||||
it will appear as a kernel argument readable via /proc/cmdline by programs
|
||||
running once the system is up.
|
||||
|
||||
The number of kernel parameters is not limited, but the length of the
|
||||
complete command line (parameters including spaces etc.) is limited to
|
||||
a fixed number of characters. This limit depends on the architecture
|
||||
and is between 256 and 4096 characters. It is defined in the file
|
||||
./include/asm/setup.h as COMMAND_LINE_SIZE.
|
||||
|
||||
Finally, the [KMG] suffix is commonly described after a number of kernel
|
||||
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
|
||||
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
|
||||
bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
|
||||
|
||||
acpi= [HW,ACPI,X86,ARM64]
|
||||
Advanced Configuration and Power Interface
|
||||
Format: { force | on | off | strict | noirq | rsdt |
|
||||
@ -811,7 +612,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
bits, and "f" is flow control ("r" for RTS or
|
||||
omit it). Default is "9600n8".
|
||||
|
||||
See Documentation/serial-console.txt for more
|
||||
See Documentation/admin-guide/serial-console.rst for more
|
||||
information. See
|
||||
Documentation/networking/netconsole.txt for an
|
||||
alternative.
|
||||
@ -2235,7 +2036,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt
|
||||
|
||||
md= [HW] RAID subsystems devices and level
|
||||
See Documentation/md.txt.
|
||||
See Documentation/admin-guide/md.rst.
|
||||
|
||||
mdacon= [MDA]
|
||||
Format: <first>,<last>
|
||||
@ -2545,7 +2346,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
will be sent.
|
||||
The default is to send the implementation identification
|
||||
information.
|
||||
|
||||
|
||||
nfs.recover_lost_locks =
|
||||
[NFSv4] Attempt to recover locks that were lost due
|
||||
to a lease timeout on the server. Please note that
|
||||
@ -3235,6 +3036,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
may be specified.
|
||||
Format: <port>,<port>....
|
||||
|
||||
powersave=off [PPC] This option disables power saving features.
|
||||
It specifically disables cpuidle and sets the
|
||||
platform machine description specific power_save
|
||||
function to NULL. On Idle the CPU just reduces
|
||||
execution priority.
|
||||
|
||||
ppc_strict_facility_enable
|
||||
[PPC] This option catches any kernel floating point,
|
||||
Altivec, VSX and SPE outside of regions specifically
|
||||
@ -3318,7 +3125,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
r128= [HW,DRM]
|
||||
|
||||
raid= [HW,RAID]
|
||||
See Documentation/md.txt.
|
||||
See Documentation/admin-guide/md.rst.
|
||||
|
||||
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
|
||||
See Documentation/blockdev/ramdisk.txt.
|
||||
@ -4197,7 +4004,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
See also Documentation/input/joystick-parport.txt
|
||||
|
||||
udbg-immortal [PPC] When debugging early kernel crashes that
|
||||
happen after console_init() and before a proper
|
||||
happen after console_init() and before a proper
|
||||
console driver takes over, this boot options might
|
||||
help "seeing" what's going on.
|
||||
|
||||
@ -4564,9 +4371,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
|
||||
xirc2ps_cs= [NET,PCMCIA]
|
||||
Format:
|
||||
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
|
||||
|
||||
______________________________________________________________________
|
||||
|
||||
TODO:
|
||||
|
||||
Add more DRM drivers.
|
@ -1,42 +1,77 @@
|
||||
Tools that manage md devices can be found at
|
||||
http://www.kernel.org/pub/linux/utils/raid/
|
||||
|
||||
RAID arrays
|
||||
===========
|
||||
|
||||
Boot time assembly of RAID arrays
|
||||
---------------------------------
|
||||
|
||||
Tools that manage md devices can be found at
|
||||
http://www.kernel.org/pub/linux/utils/raid/
|
||||
|
||||
|
||||
You can boot with your md device with the following kernel command
|
||||
lines:
|
||||
|
||||
for old raid arrays without persistent superblocks:
|
||||
for old raid arrays without persistent superblocks::
|
||||
|
||||
md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn
|
||||
|
||||
for raid arrays with persistent superblocks
|
||||
for raid arrays with persistent superblocks::
|
||||
|
||||
md=<md device no.>,dev0,dev1,...,devn
|
||||
or, to assemble a partitionable array:
|
||||
|
||||
or, to assemble a partitionable array::
|
||||
|
||||
md=d<md device no.>,dev0,dev1,...,devn
|
||||
|
||||
md device no. = the number of the md device ...
|
||||
0 means md0,
|
||||
1 md1,
|
||||
2 md2,
|
||||
3 md3,
|
||||
4 md4
|
||||
|
||||
raid level = -1 linear mode
|
||||
0 striped mode
|
||||
other modes are only supported with persistent super blocks
|
||||
``md device no.``
|
||||
+++++++++++++++++
|
||||
|
||||
chunk size factor = (raid-0 and raid-1 only)
|
||||
Set the chunk size as 4k << n.
|
||||
|
||||
fault level = totally ignored
|
||||
|
||||
dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1
|
||||
|
||||
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:
|
||||
The number of the md device
|
||||
|
||||
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
|
||||
================= =========
|
||||
``md device no.`` device
|
||||
================= =========
|
||||
0 md0
|
||||
1 md1
|
||||
2 md2
|
||||
3 md3
|
||||
4 md4
|
||||
================= =========
|
||||
|
||||
``raid level``
|
||||
++++++++++++++
|
||||
|
||||
level of the RAID array
|
||||
|
||||
=============== =============
|
||||
``raid level`` level
|
||||
=============== =============
|
||||
-1 linear mode
|
||||
0 striped mode
|
||||
=============== =============
|
||||
|
||||
other modes are only supported with persistent super blocks
|
||||
|
||||
``chunk size factor``
|
||||
+++++++++++++++++++++
|
||||
|
||||
(raid-0 and raid-1 only)
|
||||
|
||||
Set the chunk size as 4k << n.
|
||||
|
||||
``fault level``
|
||||
+++++++++++++++
|
||||
|
||||
Totally ignored
|
||||
|
||||
``dev0`` to ``devn``
|
||||
++++++++++++++++++++
|
||||
|
||||
e.g. ``/dev/hda1``, ``/dev/hdc1``, ``/dev/sda1``, ``/dev/sdb1``
|
||||
|
||||
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this::
|
||||
|
||||
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
|
||||
|
||||
|
||||
Boot time autodetection of RAID arrays
|
||||
@ -45,10 +80,10 @@ Boot time autodetection of RAID arrays
|
||||
When md is compiled into the kernel (not as module), partitions of
|
||||
type 0xfd are scanned and automatically assembled into RAID arrays.
|
||||
This autodetection may be suppressed with the kernel parameter
|
||||
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
|
||||
``raid=noautodetect``. As of kernel 2.6.9, only drives with a type 0
|
||||
superblock can be autodetected and run at boot time.
|
||||
|
||||
The kernel parameter "raid=partitionable" (or "raid=part") means
|
||||
The kernel parameter ``raid=partitionable`` (or ``raid=part``) means
|
||||
that all auto-detected arrays are assembled as partitionable.
|
||||
|
||||
Boot time assembly of degraded/dirty arrays
|
||||
@ -56,22 +91,23 @@ Boot time assembly of degraded/dirty arrays
|
||||
|
||||
If a raid5 or raid6 array is both dirty and degraded, it could have
|
||||
undetectable data corruption. This is because the fact that it is
|
||||
'dirty' means that the parity cannot be trusted, and the fact that it
|
||||
``dirty`` means that the parity cannot be trusted, and the fact that it
|
||||
is degraded means that some datablocks are missing and cannot reliably
|
||||
be reconstructed (due to no parity).
|
||||
|
||||
For this reason, md will normally refuse to start such an array. This
|
||||
requires the sysadmin to take action to explicitly start the array
|
||||
despite possible corruption. This is normally done with
|
||||
despite possible corruption. This is normally done with::
|
||||
|
||||
mdadm --assemble --force ....
|
||||
|
||||
This option is not really available if the array has the root
|
||||
filesystem on it. In order to support this booting from such an
|
||||
array, md supports a module parameter "start_dirty_degraded" which,
|
||||
array, md supports a module parameter ``start_dirty_degraded`` which,
|
||||
when set to 1, bypassed the checks and will allows dirty degraded
|
||||
arrays to be started.
|
||||
|
||||
So, to boot with a root filesystem of a dirty degraded raid[56], use
|
||||
So, to boot with a root filesystem of a dirty degraded raid 5 or 6, use::
|
||||
|
||||
md-mod.start_dirty_degraded=1
|
||||
|
||||
@ -80,30 +116,30 @@ Superblock formats
|
||||
------------------
|
||||
|
||||
The md driver can support a variety of different superblock formats.
|
||||
Currently, it supports superblock formats "0.90.0" and the "md-1" format
|
||||
Currently, it supports superblock formats ``0.90.0`` and the ``md-1`` format
|
||||
introduced in the 2.5 development series.
|
||||
|
||||
The kernel will autodetect which format superblock is being used.
|
||||
|
||||
Superblock format '0' is treated differently to others for legacy
|
||||
Superblock format ``0`` is treated differently to others for legacy
|
||||
reasons - it is the original superblock format.
|
||||
|
||||
|
||||
General Rules - apply for all superblock formats
|
||||
------------------------------------------------
|
||||
|
||||
An array is 'created' by writing appropriate superblocks to all
|
||||
An array is ``created`` by writing appropriate superblocks to all
|
||||
devices.
|
||||
|
||||
It is 'assembled' by associating each of these devices with an
|
||||
It is ``assembled`` by associating each of these devices with an
|
||||
particular md virtual device. Once it is completely assembled, it can
|
||||
be accessed.
|
||||
|
||||
An array should be created by a user-space tool. This will write
|
||||
superblocks to all devices. It will usually mark the array as
|
||||
'unclean', or with some devices missing so that the kernel md driver
|
||||
can create appropriate redundancy (copying in raid1, parity
|
||||
calculation in raid4/5).
|
||||
``unclean``, or with some devices missing so that the kernel md driver
|
||||
can create appropriate redundancy (copying in raid 1, parity
|
||||
calculation in raid 4/5).
|
||||
|
||||
When an array is assembled, it is first initialized with the
|
||||
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
|
||||
@ -126,13 +162,12 @@ Devices that have failed or are not yet active can be detached from an
|
||||
array using HOT_REMOVE_DISK.
|
||||
|
||||
|
||||
Specific Rules that apply to format-0 super block arrays, and
|
||||
arrays with no superblock (non-persistent).
|
||||
-------------------------------------------------------------
|
||||
Specific Rules that apply to format-0 super block arrays, and arrays with no superblock (non-persistent)
|
||||
--------------------------------------------------------------------------------------------------------
|
||||
|
||||
An array can be 'created' by describing the array (level, chunksize
|
||||
etc) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and
|
||||
raid_disks != 0.
|
||||
An array can be ``created`` by describing the array (level, chunksize
|
||||
etc) in a SET_ARRAY_INFO ioctl. This must have ``major_version==0`` and
|
||||
``raid_disks != 0``.
|
||||
|
||||
Then uninitialized devices can be added with ADD_NEW_DISK. The
|
||||
structure passed to ADD_NEW_DISK must specify the state of the device
|
||||
@ -142,24 +177,26 @@ Once started with RUN_ARRAY, uninitialized spares can be added with
|
||||
HOT_ADD_DISK.
|
||||
|
||||
|
||||
|
||||
MD devices in sysfs
|
||||
-------------------
|
||||
md devices appear in sysfs (/sys) as regular block devices,
|
||||
e.g.
|
||||
|
||||
md devices appear in sysfs (``/sys``) as regular block devices,
|
||||
e.g.::
|
||||
|
||||
/sys/block/md0
|
||||
|
||||
Each 'md' device will contain a subdirectory called 'md' which
|
||||
Each ``md`` device will contain a subdirectory called ``md`` which
|
||||
contains further md-specific information about the device.
|
||||
|
||||
All md devices contain:
|
||||
|
||||
level
|
||||
a text file indicating the 'raid level'. e.g. raid0, raid1,
|
||||
a text file indicating the ``raid level``. e.g. raid0, raid1,
|
||||
raid5, linear, multipath, faulty.
|
||||
If no raid level has been set yet (array is still being
|
||||
assembled), the value will reflect whatever has been written
|
||||
to it, which may be a name like the above, or may be a number
|
||||
such as '0', '5', etc.
|
||||
such as ``0``, ``5``, etc.
|
||||
|
||||
raid_disks
|
||||
a text file with a simple number indicating the number of devices
|
||||
@ -172,10 +209,10 @@ All md devices contain:
|
||||
A change to this attribute will not be permitted if it would
|
||||
reduce the size of the array. To reduce the number of drives
|
||||
in an e.g. raid5, the array size must first be reduced by
|
||||
setting the 'array_size' attribute.
|
||||
setting the ``array_size`` attribute.
|
||||
|
||||
chunk_size
|
||||
This is the size in bytes for 'chunks' and is only relevant to
|
||||
This is the size in bytes for ``chunks`` and is only relevant to
|
||||
raid levels that involve striping (0,4,5,6,10). The address space
|
||||
of the array is conceptually divided into chunks and consecutive
|
||||
chunks are striped onto neighbouring devices.
|
||||
@ -183,7 +220,7 @@ All md devices contain:
|
||||
of 2. This can only be set while assembling an array
|
||||
|
||||
layout
|
||||
The "layout" for the array for the particular level. This is
|
||||
The ``layout`` for the array for the particular level. This is
|
||||
simply a number that is interpretted differently by different
|
||||
levels. It can be written while assembling an array.
|
||||
|
||||
@ -193,22 +230,24 @@ All md devices contain:
|
||||
devices. Writing a number (in Kilobytes) which is less than
|
||||
the available size will set the size. Any reconfiguration of the
|
||||
array (e.g. adding devices) will not cause the size to change.
|
||||
Writing the word 'default' will cause the effective size of the
|
||||
Writing the word ``default`` will cause the effective size of the
|
||||
array to be whatever size is actually available based on
|
||||
'level', 'chunk_size' and 'component_size'.
|
||||
``level``, ``chunk_size`` and ``component_size``.
|
||||
|
||||
This can be used to reduce the size of the array before reducing
|
||||
the number of devices in a raid4/5/6, or to support external
|
||||
metadata formats which mandate such clipping.
|
||||
|
||||
reshape_position
|
||||
This is either "none" or a sector number within the devices of
|
||||
the array where "reshape" is up to. If this is set, the three
|
||||
This is either ``none`` or a sector number within the devices of
|
||||
the array where ``reshape`` is up to. If this is set, the three
|
||||
attributes mentioned above (raid_disks, chunk_size, layout) can
|
||||
potentially have 2 values, an old and a new value. If these
|
||||
values differ, reading the attribute returns
|
||||
values differ, reading the attribute returns::
|
||||
|
||||
new (old)
|
||||
and writing will effect the 'new' value, leaving the 'old'
|
||||
|
||||
and writing will effect the ``new`` value, leaving the ``old``
|
||||
unchanged.
|
||||
|
||||
component_size
|
||||
@ -223,9 +262,9 @@ All md devices contain:
|
||||
metadata_version
|
||||
This indicates the format that is being used to record metadata
|
||||
about the array. It can be 0.90 (traditional format), 1.0, 1.1,
|
||||
1.2 (newer format in varying locations) or "none" indicating that
|
||||
1.2 (newer format in varying locations) or ``none`` indicating that
|
||||
the kernel isn't managing metadata at all.
|
||||
Alternately it can be "external:" followed by a string which
|
||||
Alternately it can be ``external:`` followed by a string which
|
||||
is set by user-space. This indicates that metadata is managed
|
||||
by a user-space program. Any device failure or other event that
|
||||
requires a metadata update will cause array activity to be
|
||||
@ -233,9 +272,9 @@ All md devices contain:
|
||||
|
||||
resync_start
|
||||
The point at which resync should start. If no resync is needed,
|
||||
this will be a very large number (or 'none' since 2.6.30-rc1). At
|
||||
this will be a very large number (or ``none`` since 2.6.30-rc1). At
|
||||
array creation it will default to 0, though starting the array as
|
||||
'clean' will set it much larger.
|
||||
``clean`` will set it much larger.
|
||||
|
||||
new_dev
|
||||
This file can be written but not read. The value written should
|
||||
@ -246,10 +285,10 @@ All md devices contain:
|
||||
|
||||
safe_mode_delay
|
||||
When an md array has seen no write requests for a certain period
|
||||
of time, it will be marked as 'clean'. When another write
|
||||
request arrives, the array is marked as 'dirty' before the write
|
||||
commences. This is known as 'safe_mode'.
|
||||
The 'certain period' is controlled by this file which stores the
|
||||
of time, it will be marked as ``clean``. When another write
|
||||
request arrives, the array is marked as ``dirty`` before the write
|
||||
commences. This is known as ``safe_mode``.
|
||||
The ``certain period`` is controlled by this file which stores the
|
||||
period as a number of seconds. The default is 200msec (0.200).
|
||||
Writing a value of 0 disables safemode.
|
||||
|
||||
@ -260,38 +299,50 @@ All md devices contain:
|
||||
cannot be explicitly set, and some transitions are not allowed.
|
||||
|
||||
Select/poll works on this file. All changes except between
|
||||
active_idle and active (which can be frequent and are not
|
||||
very interesting) are notified. active->active_idle is
|
||||
reported if the metadata is externally managed.
|
||||
Active_idle and active (which can be frequent and are not
|
||||
very interesting) are notified. active->active_idle is
|
||||
reported if the metadata is externally managed.
|
||||
|
||||
clear
|
||||
No devices, no size, no level
|
||||
|
||||
Writing is equivalent to STOP_ARRAY ioctl
|
||||
|
||||
inactive
|
||||
May have some settings, but array is not active
|
||||
all IO results in error
|
||||
all IO results in error
|
||||
|
||||
When written, doesn't tear down array, but just stops it
|
||||
|
||||
suspended (not supported yet)
|
||||
All IO requests will block. The array can be reconfigured.
|
||||
|
||||
Writing this, if accepted, will block until array is quiessent
|
||||
|
||||
readonly
|
||||
no resync can happen. no superblocks get written.
|
||||
write requests fail
|
||||
read-auto
|
||||
like readonly, but behaves like 'clean' on a write request.
|
||||
|
||||
clean - no pending writes, but otherwise active.
|
||||
Write requests fail
|
||||
|
||||
read-auto
|
||||
like readonly, but behaves like ``clean`` on a write request.
|
||||
|
||||
clean
|
||||
no pending writes, but otherwise active.
|
||||
|
||||
When written to inactive array, starts without resync
|
||||
|
||||
If a write request arrives then
|
||||
if metadata is known, mark 'dirty' and switch to 'active'.
|
||||
if not known, block and switch to write-pending
|
||||
if metadata is known, mark ``dirty`` and switch to ``active``.
|
||||
if not known, block and switch to write-pending
|
||||
|
||||
If written to an active array that has pending writes, then fails.
|
||||
active
|
||||
fully active: IO and resync can be happening.
|
||||
When written to inactive array, starts with resync
|
||||
|
||||
write-pending
|
||||
clean, but writes are blocked waiting for 'active' to be written.
|
||||
clean, but writes are blocked waiting for ``active`` to be written.
|
||||
|
||||
active-idle
|
||||
like active, but no writes have been seen for a while (safe_mode_delay).
|
||||
@ -299,57 +350,71 @@ All md devices contain:
|
||||
bitmap/location
|
||||
This indicates where the write-intent bitmap for the array is
|
||||
stored.
|
||||
It can be one of "none", "file" or "[+-]N".
|
||||
"file" may later be extended to "file:/file/name"
|
||||
"[+-]N" means that many sectors from the start of the metadata.
|
||||
This is replicated on all devices. For arrays with externally
|
||||
managed metadata, the offset is from the beginning of the
|
||||
device.
|
||||
|
||||
It can be one of ``none``, ``file`` or ``[+-]N``.
|
||||
``file`` may later be extended to ``file:/file/name``
|
||||
``[+-]N`` means that many sectors from the start of the metadata.
|
||||
|
||||
This is replicated on all devices. For arrays with externally
|
||||
managed metadata, the offset is from the beginning of the
|
||||
device.
|
||||
|
||||
bitmap/chunksize
|
||||
The size, in bytes, of the chunk which will be represented by a
|
||||
single bit. For RAID456, it is a portion of an individual
|
||||
device. For RAID10, it is a portion of the array. For RAID1, it
|
||||
is both (they come to the same thing).
|
||||
|
||||
bitmap/time_base
|
||||
The time, in seconds, between looking for bits in the bitmap to
|
||||
be cleared. In the current implementation, a bit will be cleared
|
||||
between 2 and 3 times "time_base" after all the covered blocks
|
||||
between 2 and 3 times ``time_base`` after all the covered blocks
|
||||
are known to be in-sync.
|
||||
|
||||
bitmap/backlog
|
||||
When write-mostly devices are active in a RAID1, write requests
|
||||
to those devices proceed in the background - the filesystem (or
|
||||
other user of the device) does not have to wait for them.
|
||||
'backlog' sets a limit on the number of concurrent background
|
||||
``backlog`` sets a limit on the number of concurrent background
|
||||
writes. If there are more than this, new writes will by
|
||||
synchronous.
|
||||
|
||||
bitmap/metadata
|
||||
This can be either 'internal' or 'external'.
|
||||
'internal' is the default and means the metadata for the bitmap
|
||||
is stored in the first 256 bytes of the allocated space and is
|
||||
managed by the md module.
|
||||
'external' means that bitmap metadata is managed externally to
|
||||
the kernel (i.e. by some userspace program)
|
||||
This can be either ``internal`` or ``external``.
|
||||
|
||||
``internal``
|
||||
is the default and means the metadata for the bitmap
|
||||
is stored in the first 256 bytes of the allocated space and is
|
||||
managed by the md module.
|
||||
|
||||
``external``
|
||||
means that bitmap metadata is managed externally to
|
||||
the kernel (i.e. by some userspace program)
|
||||
|
||||
bitmap/can_clear
|
||||
This is either 'true' or 'false'. If 'true', then bits in the
|
||||
This is either ``true`` or ``false``. If ``true``, then bits in the
|
||||
bitmap will be cleared when the corresponding blocks are thought
|
||||
to be in-sync. If 'false', bits will never be cleared.
|
||||
This is automatically set to 'false' if a write happens on a
|
||||
to be in-sync. If ``false``, bits will never be cleared.
|
||||
This is automatically set to ``false`` if a write happens on a
|
||||
degraded array, or if the array becomes degraded during a write.
|
||||
When metadata is managed externally, it should be set to true
|
||||
once the array becomes non-degraded, and this fact has been
|
||||
recorded in the metadata.
|
||||
|
||||
|
||||
|
||||
|
||||
As component devices are added to an md array, they appear in the 'md'
|
||||
directory as new directories named
|
||||
|
||||
|
||||
|
||||
As component devices are added to an md array, they appear in the ``md``
|
||||
directory as new directories named::
|
||||
|
||||
dev-XXX
|
||||
where XXX is a name that the kernel knows for the device, e.g. hdb1.
|
||||
|
||||
where ``XXX`` is a name that the kernel knows for the device, e.g. hdb1.
|
||||
Each directory contains:
|
||||
|
||||
block
|
||||
a symlink to the block device in /sys/block, e.g.
|
||||
a symlink to the block device in /sys/block, e.g.::
|
||||
|
||||
/sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1
|
||||
|
||||
super
|
||||
@ -358,51 +423,83 @@ Each directory contains:
|
||||
|
||||
state
|
||||
A file recording the current state of the device in the array
|
||||
which can be a comma separated list of
|
||||
faulty - device has been kicked from active use due to
|
||||
a detected fault, or it has unacknowledged bad
|
||||
blocks
|
||||
in_sync - device is a fully in-sync member of the array
|
||||
writemostly - device will only be subject to read
|
||||
requests if there are no other options.
|
||||
This applies only to raid1 arrays.
|
||||
blocked - device has failed, and the failure hasn't been
|
||||
acknowledged yet by the metadata handler.
|
||||
Writes that would write to this device if
|
||||
it were not faulty are blocked.
|
||||
spare - device is working, but not a full member.
|
||||
This includes spares that are in the process
|
||||
of being recovered to
|
||||
write_error - device has ever seen a write error.
|
||||
want_replacement - device is (mostly) working but probably
|
||||
should be replaced, either due to errors or
|
||||
due to user request.
|
||||
replacement - device is a replacement for another active
|
||||
device with same raid_disk.
|
||||
which can be a comma separated list of:
|
||||
|
||||
faulty
|
||||
device has been kicked from active use due to
|
||||
a detected fault, or it has unacknowledged bad
|
||||
blocks
|
||||
|
||||
in_sync
|
||||
device is a fully in-sync member of the array
|
||||
|
||||
writemostly
|
||||
device will only be subject to read
|
||||
requests if there are no other options.
|
||||
|
||||
This applies only to raid1 arrays.
|
||||
|
||||
blocked
|
||||
device has failed, and the failure hasn't been
|
||||
acknowledged yet by the metadata handler.
|
||||
|
||||
Writes that would write to this device if
|
||||
it were not faulty are blocked.
|
||||
|
||||
spare
|
||||
device is working, but not a full member.
|
||||
|
||||
This includes spares that are in the process
|
||||
of being recovered to
|
||||
|
||||
write_error
|
||||
device has ever seen a write error.
|
||||
|
||||
want_replacement
|
||||
device is (mostly) working but probably
|
||||
should be replaced, either due to errors or
|
||||
due to user request.
|
||||
|
||||
replacement
|
||||
device is a replacement for another active
|
||||
device with same raid_disk.
|
||||
|
||||
|
||||
This list may grow in future.
|
||||
|
||||
This can be written to.
|
||||
Writing "faulty" simulates a failure on the device.
|
||||
Writing "remove" removes the device from the array.
|
||||
Writing "writemostly" sets the writemostly flag.
|
||||
Writing "-writemostly" clears the writemostly flag.
|
||||
Writing "blocked" sets the "blocked" flag.
|
||||
Writing "-blocked" clears the "blocked" flags and allows writes
|
||||
to complete and possibly simulates an error.
|
||||
Writing "in_sync" sets the in_sync flag.
|
||||
Writing "write_error" sets writeerrorseen flag.
|
||||
Writing "-write_error" clears writeerrorseen flag.
|
||||
Writing "want_replacement" is allowed at any time except to a
|
||||
replacement device or a spare. It sets the flag.
|
||||
Writing "-want_replacement" is allowed at any time. It clears
|
||||
the flag.
|
||||
Writing "replacement" or "-replacement" is only allowed before
|
||||
starting the array. It sets or clears the flag.
|
||||
|
||||
Writing ``faulty`` simulates a failure on the device.
|
||||
|
||||
Writing ``remove`` removes the device from the array.
|
||||
|
||||
Writing ``writemostly`` sets the writemostly flag.
|
||||
|
||||
Writing ``-writemostly`` clears the writemostly flag.
|
||||
|
||||
Writing ``blocked`` sets the ``blocked`` flag.
|
||||
|
||||
Writing ``-blocked`` clears the ``blocked`` flags and allows writes
|
||||
to complete and possibly simulates an error.
|
||||
|
||||
Writing ``in_sync`` sets the in_sync flag.
|
||||
|
||||
Writing ``write_error`` sets writeerrorseen flag.
|
||||
|
||||
Writing ``-write_error`` clears writeerrorseen flag.
|
||||
|
||||
Writing ``want_replacement`` is allowed at any time except to a
|
||||
replacement device or a spare. It sets the flag.
|
||||
|
||||
Writing ``-want_replacement`` is allowed at any time. It clears
|
||||
the flag.
|
||||
|
||||
Writing ``replacement`` or ``-replacement`` is only allowed before
|
||||
starting the array. It sets or clears the flag.
|
||||
|
||||
|
||||
This file responds to select/poll. Any change to 'faulty'
|
||||
or 'blocked' causes an event.
|
||||
This file responds to select/poll. Any change to ``faulty``
|
||||
or ``blocked`` causes an event.
|
||||
|
||||
errors
|
||||
An approximate count of read errors that have been detected on
|
||||
@ -417,9 +514,9 @@ Each directory contains:
|
||||
|
||||
slot
|
||||
This gives the role that the device has in the array. It will
|
||||
either be 'none' if the device is not active in the array
|
||||
either be ``none`` if the device is not active in the array
|
||||
(i.e. is a spare or has failed) or an integer less than the
|
||||
'raid_disks' number for the array indicating which position
|
||||
``raid_disks`` number for the array indicating which position
|
||||
it currently fills. This can only be set while assembling an
|
||||
array. A device for which this is set is assumed to be working.
|
||||
|
||||
@ -437,7 +534,7 @@ Each directory contains:
|
||||
written, it will be rejected.
|
||||
|
||||
recovery_start
|
||||
When the device is not 'in_sync', this records the number of
|
||||
When the device is not ``in_sync``, this records the number of
|
||||
sectors from the start of the device which are known to be
|
||||
correct. This is normally zero, but during a recovery
|
||||
operation it will steadily increase, and if the recovery is
|
||||
@ -447,21 +544,21 @@ Each directory contains:
|
||||
|
||||
This can be set whenever the device is not an active member of
|
||||
the array, either before the array is activated, or before
|
||||
the 'slot' is set.
|
||||
the ``slot`` is set.
|
||||
|
||||
Setting this to ``none`` is equivalent to setting ``in_sync``.
|
||||
Setting to any other value also clears the ``in_sync`` flag.
|
||||
|
||||
Setting this to 'none' is equivalent to setting 'in_sync'.
|
||||
Setting to any other value also clears the 'in_sync' flag.
|
||||
|
||||
bad_blocks
|
||||
This gives the list of all known bad blocks in the form of
|
||||
start address and length (in sectors respectively). If output
|
||||
is too big to fit in a page, it will be truncated. Writing
|
||||
"sector length" to this file adds new acknowledged (i.e.
|
||||
``sector length`` to this file adds new acknowledged (i.e.
|
||||
recorded to disk safely) bad blocks.
|
||||
|
||||
unacknowledged_bad_blocks
|
||||
This gives the list of known-but-not-yet-saved-to-disk bad
|
||||
blocks in the same form of 'bad_blocks'. If output is too big
|
||||
blocks in the same form of ``bad_blocks``. If output is too big
|
||||
to fit in a page, it will be truncated. Writing to this file
|
||||
adds bad blocks without acknowledging them. This is largely
|
||||
for testing.
|
||||
@ -469,16 +566,18 @@ Each directory contains:
|
||||
|
||||
|
||||
An active md device will also contain an entry for each active device
|
||||
in the array. These are named
|
||||
in the array. These are named::
|
||||
|
||||
rdNN
|
||||
|
||||
where 'NN' is the position in the array, starting from 0.
|
||||
where ``NN`` is the position in the array, starting from 0.
|
||||
So for a 3 drive array there will be rd0, rd1, rd2.
|
||||
These are symbolic links to the appropriate 'dev-XXX' entry.
|
||||
Thus, for example,
|
||||
These are symbolic links to the appropriate ``dev-XXX`` entry.
|
||||
Thus, for example::
|
||||
|
||||
cat /sys/block/md*/md/rd*/state
|
||||
will show 'in_sync' on every line.
|
||||
|
||||
will show ``in_sync`` on every line.
|
||||
|
||||
|
||||
|
||||
@ -488,50 +587,62 @@ also have
|
||||
sync_action
|
||||
a text file that can be used to monitor and control the rebuild
|
||||
process. It contains one word which can be one of:
|
||||
resync - redundancy is being recalculated after unclean
|
||||
shutdown or creation
|
||||
recover - a hot spare is being built to replace a
|
||||
failed/missing device
|
||||
idle - nothing is happening
|
||||
check - A full check of redundancy was requested and is
|
||||
happening. This reads all blocks and checks
|
||||
them. A repair may also happen for some raid
|
||||
levels.
|
||||
repair - A full check and repair is happening. This is
|
||||
similar to 'resync', but was requested by the
|
||||
user, and the write-intent bitmap is NOT used to
|
||||
optimise the process.
|
||||
|
||||
resync
|
||||
redundancy is being recalculated after unclean
|
||||
shutdown or creation
|
||||
|
||||
recover
|
||||
a hot spare is being built to replace a
|
||||
failed/missing device
|
||||
|
||||
idle
|
||||
nothing is happening
|
||||
check
|
||||
A full check of redundancy was requested and is
|
||||
happening. This reads all blocks and checks
|
||||
them. A repair may also happen for some raid
|
||||
levels.
|
||||
|
||||
repair
|
||||
A full check and repair is happening. This is
|
||||
similar to ``resync``, but was requested by the
|
||||
user, and the write-intent bitmap is NOT used to
|
||||
optimise the process.
|
||||
|
||||
This file is writable, and each of the strings that could be
|
||||
read are meaningful for writing.
|
||||
|
||||
'idle' will stop an active resync/recovery etc. There is no
|
||||
guarantee that another resync/recovery may not be automatically
|
||||
started again, though some event will be needed to trigger
|
||||
this.
|
||||
'resync' or 'recovery' can be used to restart the
|
||||
corresponding operation if it was stopped with 'idle'.
|
||||
'check' and 'repair' will start the appropriate process
|
||||
providing the current state is 'idle'.
|
||||
``idle`` will stop an active resync/recovery etc. There is no
|
||||
guarantee that another resync/recovery may not be automatically
|
||||
started again, though some event will be needed to trigger
|
||||
this.
|
||||
|
||||
``resync`` or ``recovery`` can be used to restart the
|
||||
corresponding operation if it was stopped with ``idle``.
|
||||
|
||||
``check`` and ``repair`` will start the appropriate process
|
||||
providing the current state is ``idle``.
|
||||
|
||||
This file responds to select/poll. Any important change in the value
|
||||
triggers a poll event. Sometimes the value will briefly be
|
||||
"recover" if a recovery seems to be needed, but cannot be
|
||||
achieved. In that case, the transition to "recover" isn't
|
||||
``recover`` if a recovery seems to be needed, but cannot be
|
||||
achieved. In that case, the transition to ``recover`` isn't
|
||||
notified, but the transition away is.
|
||||
|
||||
degraded
|
||||
This contains a count of the number of devices by which the
|
||||
arrays is degraded. So an optimal array will show '0'. A
|
||||
single failed/missing drive will show '1', etc.
|
||||
arrays is degraded. So an optimal array will show ``0``. A
|
||||
single failed/missing drive will show ``1``, etc.
|
||||
|
||||
This file responds to select/poll, any increase or decrease
|
||||
in the count of missing devices will trigger an event.
|
||||
|
||||
mismatch_count
|
||||
When performing 'check' and 'repair', and possibly when
|
||||
performing 'resync', md will count the number of errors that are
|
||||
found. The count in 'mismatch_cnt' is the number of sectors
|
||||
that were re-written, or (for 'check') would have been
|
||||
When performing ``check`` and ``repair``, and possibly when
|
||||
performing ``resync``, md will count the number of errors that are
|
||||
found. The count in ``mismatch_cnt`` is the number of sectors
|
||||
that were re-written, or (for ``check``) would have been
|
||||
re-written. As most raid levels work in units of pages rather
|
||||
than sectors, this may be larger than the number of actual errors
|
||||
by a factor of the number of sectors in a page.
|
||||
@ -542,27 +653,30 @@ also have
|
||||
would need to check the corresponding blocks. Either individual
|
||||
numbers or start-end pairs can be written. Multiple numbers
|
||||
can be separated by a space.
|
||||
Note that the numbers are 'bit' numbers, not 'block' numbers.
|
||||
|
||||
Note that the numbers are ``bit`` numbers, not ``block`` numbers.
|
||||
They should be scaled by the bitmap_chunksize.
|
||||
|
||||
sync_speed_min
|
||||
sync_speed_max
|
||||
This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
|
||||
sync_speed_min, sync_speed_max
|
||||
This are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
|
||||
however they only apply to the particular array.
|
||||
If no value has been written to these, or if the word 'system'
|
||||
|
||||
If no value has been written to these, or if the word ``system``
|
||||
is written, then the system-wide value is used. If a value,
|
||||
in kibibytes-per-second is written, then it is used.
|
||||
|
||||
When the files are read, they show the currently active value
|
||||
followed by "(local)" or "(system)" depending on whether it is
|
||||
followed by ``(local)`` or ``(system)`` depending on whether it is
|
||||
a locally set or system-wide value.
|
||||
|
||||
sync_completed
|
||||
This shows the number of sectors that have been completed of
|
||||
whatever the current sync_action is, followed by the number of
|
||||
sectors in total that could need to be processed. The two
|
||||
numbers are separated by a '/' thus effectively showing one
|
||||
numbers are separated by a ``/`` thus effectively showing one
|
||||
value, a fraction of the process that is complete.
|
||||
A 'select' on this attribute will return when resync completes,
|
||||
|
||||
A ``select`` on this attribute will return when resync completes,
|
||||
when it reaches the current sync_max (below) and possibly at
|
||||
other times.
|
||||
|
||||
@ -570,26 +684,24 @@ also have
|
||||
This shows the current actual speed, in K/sec, of the current
|
||||
sync_action. It is averaged over the last 30 seconds.
|
||||
|
||||
suspend_lo
|
||||
suspend_hi
|
||||
suspend_lo, suspend_hi
|
||||
The two values, given as numbers of sectors, indicate a range
|
||||
within the array where IO will be blocked. This is currently
|
||||
only supported for raid4/5/6.
|
||||
|
||||
sync_min
|
||||
sync_max
|
||||
sync_min, sync_max
|
||||
The two values, given as numbers of sectors, indicate a range
|
||||
within the array where 'check'/'repair' will operate. Must be
|
||||
a multiple of chunk_size. When it reaches "sync_max" it will
|
||||
within the array where ``check``/``repair`` will operate. Must be
|
||||
a multiple of chunk_size. When it reaches ``sync_max`` it will
|
||||
pause, rather than complete.
|
||||
You can use 'select' or 'poll' on "sync_completed" to wait for
|
||||
You can use ``select`` or ``poll`` on ``sync_completed`` to wait for
|
||||
that number to reach sync_max. Then you can either increase
|
||||
"sync_max", or can write 'idle' to "sync_action".
|
||||
``sync_max``, or can write ``idle`` to ``sync_action``.
|
||||
|
||||
The value of 'max' for "sync_max" effectively disables the limit.
|
||||
The value of ``max`` for ``sync_max`` effectively disables the limit.
|
||||
When a resync is active, the value can only ever be increased,
|
||||
never decreased.
|
||||
The value of '0' is the minimum for "sync_min".
|
||||
The value of ``0`` is the minimum for ``sync_min``.
|
||||
|
||||
|
||||
|
||||
@ -598,13 +710,15 @@ personality module that manages it.
|
||||
These are specific to the implementation of the module and could
|
||||
change substantially if the implementation changes.
|
||||
|
||||
These currently include
|
||||
These currently include:
|
||||
|
||||
stripe_cache_size (currently raid5 only)
|
||||
number of entries in the stripe cache. This is writable, but
|
||||
there are upper and lower limits (32768, 17). Default is 256.
|
||||
|
||||
strip_cache_active (currently raid5 only)
|
||||
number of active entries in the stripe cache
|
||||
|
||||
preread_bypass_threshold (currently raid5 only)
|
||||
number of times a stripe requiring preread will be bypassed by
|
||||
a stripe that does not require preread. For fairness defaults
|
@ -1,22 +1,21 @@
|
||||
==============================
|
||||
KERNEL MODULE SIGNING FACILITY
|
||||
==============================
|
||||
Kernel module signing facility
|
||||
------------------------------
|
||||
|
||||
CONTENTS
|
||||
|
||||
- Overview.
|
||||
- Configuring module signing.
|
||||
- Generating signing keys.
|
||||
- Public keys in the kernel.
|
||||
- Manually signing modules.
|
||||
- Signed modules and stripping.
|
||||
- Loading signed modules.
|
||||
- Non-valid signatures and unsigned modules.
|
||||
- Administering/protecting the private key.
|
||||
.. CONTENTS
|
||||
..
|
||||
.. - Overview.
|
||||
.. - Configuring module signing.
|
||||
.. - Generating signing keys.
|
||||
.. - Public keys in the kernel.
|
||||
.. - Manually signing modules.
|
||||
.. - Signed modules and stripping.
|
||||
.. - Loading signed modules.
|
||||
.. - Non-valid signatures and unsigned modules.
|
||||
.. - Administering/protecting the private key.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
Overview
|
||||
========
|
||||
|
||||
The kernel module signing facility cryptographically signs modules during
|
||||
@ -36,17 +35,19 @@ SHA-512 (the algorithm is selected by data in the signature).
|
||||
|
||||
|
||||
==========================
|
||||
CONFIGURING MODULE SIGNING
|
||||
Configuring module signing
|
||||
==========================
|
||||
|
||||
The module signing facility is enabled by going to the "Enable Loadable Module
|
||||
Support" section of the kernel configuration and turning on
|
||||
The module signing facility is enabled by going to the
|
||||
:menuselection:`Enable Loadable Module Support` section of
|
||||
the kernel configuration and turning on::
|
||||
|
||||
CONFIG_MODULE_SIG "Module signature verification"
|
||||
|
||||
This has a number of options available:
|
||||
|
||||
(1) "Require modules to be validly signed" (CONFIG_MODULE_SIG_FORCE)
|
||||
(1) :menuselection:`Require modules to be validly signed`
|
||||
(``CONFIG_MODULE_SIG_FORCE``)
|
||||
|
||||
This specifies how the kernel should deal with a module that has a
|
||||
signature for which the key is not known or a module that is unsigned.
|
||||
@ -64,35 +65,39 @@ This has a number of options available:
|
||||
cannot be parsed, it will be rejected out of hand.
|
||||
|
||||
|
||||
(2) "Automatically sign all modules" (CONFIG_MODULE_SIG_ALL)
|
||||
(2) :menuselection:`Automatically sign all modules`
|
||||
(``CONFIG_MODULE_SIG_ALL``)
|
||||
|
||||
If this is on then modules will be automatically signed during the
|
||||
modules_install phase of a build. If this is off, then the modules must
|
||||
be signed manually using:
|
||||
be signed manually using::
|
||||
|
||||
scripts/sign-file
|
||||
|
||||
|
||||
(3) "Which hash algorithm should modules be signed with?"
|
||||
(3) :menuselection:`Which hash algorithm should modules be signed with?`
|
||||
|
||||
This presents a choice of which hash algorithm the installation phase will
|
||||
sign the modules with:
|
||||
|
||||
CONFIG_MODULE_SIG_SHA1 "Sign modules with SHA-1"
|
||||
CONFIG_MODULE_SIG_SHA224 "Sign modules with SHA-224"
|
||||
CONFIG_MODULE_SIG_SHA256 "Sign modules with SHA-256"
|
||||
CONFIG_MODULE_SIG_SHA384 "Sign modules with SHA-384"
|
||||
CONFIG_MODULE_SIG_SHA512 "Sign modules with SHA-512"
|
||||
=============================== ==========================================
|
||||
``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1`
|
||||
``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224`
|
||||
``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
|
||||
``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
|
||||
``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
|
||||
=============================== ==========================================
|
||||
|
||||
The algorithm selected here will also be built into the kernel (rather
|
||||
than being a module) so that modules signed with that algorithm can have
|
||||
their signatures checked without causing a dependency loop.
|
||||
|
||||
|
||||
(4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
|
||||
(4) :menuselection:`File name or PKCS#11 URI of module signing key`
|
||||
(``CONFIG_MODULE_SIG_KEY``)
|
||||
|
||||
Setting this option to something other than its default of
|
||||
"certs/signing_key.pem" will disable the autogeneration of signing keys
|
||||
``certs/signing_key.pem`` will disable the autogeneration of signing keys
|
||||
and allow the kernel modules to be signed with a key of your choosing.
|
||||
The string provided should identify a file containing both a private key
|
||||
and its corresponding X.509 certificate in PEM form, or — on systems where
|
||||
@ -102,10 +107,11 @@ This has a number of options available:
|
||||
|
||||
If the PEM file containing the private key is encrypted, or if the
|
||||
PKCS#11 token requries a PIN, this can be provided at build time by
|
||||
means of the KBUILD_SIGN_PIN variable.
|
||||
means of the ``KBUILD_SIGN_PIN`` variable.
|
||||
|
||||
|
||||
(5) "Additional X.509 keys for default system keyring" (CONFIG_SYSTEM_TRUSTED_KEYS)
|
||||
(5) :menuselection:`Additional X.509 keys for default system keyring`
|
||||
(``CONFIG_SYSTEM_TRUSTED_KEYS``)
|
||||
|
||||
This option can be set to the filename of a PEM-encoded file containing
|
||||
additional certificates which will be included in the system keyring by
|
||||
@ -116,7 +122,7 @@ packages to the kernel build processes for the tool that does the signing.
|
||||
|
||||
|
||||
=======================
|
||||
GENERATING SIGNING KEYS
|
||||
Generating signing keys
|
||||
=======================
|
||||
|
||||
Cryptographic keypairs are required to generate and check signatures. A
|
||||
@ -126,14 +132,14 @@ it can be deleted or stored securely. The public key gets built into the
|
||||
kernel so that it can be used to check the signatures as the modules are
|
||||
loaded.
|
||||
|
||||
Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its
|
||||
Under normal conditions, when ``CONFIG_MODULE_SIG_KEY`` is unchanged from its
|
||||
default, the kernel build will automatically generate a new keypair using
|
||||
openssl if one does not exist in the file:
|
||||
openssl if one does not exist in the file::
|
||||
|
||||
certs/signing_key.pem
|
||||
|
||||
during the building of vmlinux (the public part of the key needs to be built
|
||||
into vmlinux) using parameters in the:
|
||||
into vmlinux) using parameters in the::
|
||||
|
||||
certs/x509.genkey
|
||||
|
||||
@ -142,14 +148,14 @@ file (which is also generated if it does not already exist).
|
||||
It is strongly recommended that you provide your own x509.genkey file.
|
||||
|
||||
Most notably, in the x509.genkey file, the req_distinguished_name section
|
||||
should be altered from the default:
|
||||
should be altered from the default::
|
||||
|
||||
[ req_distinguished_name ]
|
||||
#O = Unspecified company
|
||||
CN = Build time autogenerated kernel key
|
||||
#emailAddress = unspecified.user@unspecified.company
|
||||
|
||||
The generated RSA key size can also be set with:
|
||||
The generated RSA key size can also be set with::
|
||||
|
||||
[ req ]
|
||||
default_bits = 4096
|
||||
@ -158,23 +164,23 @@ The generated RSA key size can also be set with:
|
||||
It is also possible to manually generate the key private/public files using the
|
||||
x509.genkey key generation configuration file in the root node of the Linux
|
||||
kernel sources tree and the openssl command. The following is an example to
|
||||
generate the public/private key files:
|
||||
generate the public/private key files::
|
||||
|
||||
openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
|
||||
-config x509.genkey -outform PEM -out kernel_key.pem \
|
||||
-keyout kernel_key.pem
|
||||
|
||||
The full pathname for the resulting kernel_key.pem file can then be specified
|
||||
in the CONFIG_MODULE_SIG_KEY option, and the certificate and key therein will
|
||||
in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key therein will
|
||||
be used instead of an autogenerated keypair.
|
||||
|
||||
|
||||
=========================
|
||||
PUBLIC KEYS IN THE KERNEL
|
||||
Public keys in the kernel
|
||||
=========================
|
||||
|
||||
The kernel contains a ring of public keys that can be viewed by root. They're
|
||||
in a keyring called ".system_keyring" that can be seen by:
|
||||
in a keyring called ".system_keyring" that can be seen by::
|
||||
|
||||
[root@deneb ~]# cat /proc/keys
|
||||
...
|
||||
@ -184,27 +190,27 @@ in a keyring called ".system_keyring" that can be seen by:
|
||||
|
||||
Beyond the public key generated specifically for module signing, additional
|
||||
trusted certificates can be provided in a PEM-encoded file referenced by the
|
||||
CONFIG_SYSTEM_TRUSTED_KEYS configuration option.
|
||||
``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option.
|
||||
|
||||
Further, the architecture code may take public keys from a hardware store and
|
||||
add those in also (e.g. from the UEFI key database).
|
||||
|
||||
Finally, it is possible to add additional public keys by doing:
|
||||
Finally, it is possible to add additional public keys by doing::
|
||||
|
||||
keyctl padd asymmetric "" [.system_keyring-ID] <[key-file]
|
||||
|
||||
e.g.:
|
||||
e.g.::
|
||||
|
||||
keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
|
||||
|
||||
Note, however, that the kernel will only permit keys to be added to
|
||||
.system_keyring _if_ the new key's X.509 wrapper is validly signed by a key
|
||||
``.system_keyring _if_`` the new key's X.509 wrapper is validly signed by a key
|
||||
that is already resident in the .system_keyring at the time the key was added.
|
||||
|
||||
|
||||
=========================
|
||||
MANUALLY SIGNING MODULES
|
||||
=========================
|
||||
========================
|
||||
Manually signing modules
|
||||
========================
|
||||
|
||||
To manually sign a module, use the scripts/sign-file tool available in
|
||||
the Linux kernel source tree. The script requires 4 arguments:
|
||||
@ -214,7 +220,7 @@ the Linux kernel source tree. The script requires 4 arguments:
|
||||
3. The public key filename
|
||||
4. The kernel module to be signed
|
||||
|
||||
The following is an example to sign a kernel module:
|
||||
The following is an example to sign a kernel module::
|
||||
|
||||
scripts/sign-file sha512 kernel-signkey.priv \
|
||||
kernel-signkey.x509 module.ko
|
||||
@ -228,11 +234,11 @@ $KBUILD_SIGN_PIN environment variable.
|
||||
|
||||
|
||||
============================
|
||||
SIGNED MODULES AND STRIPPING
|
||||
Signed modules and stripping
|
||||
============================
|
||||
|
||||
A signed module has a digital signature simply appended at the end. The string
|
||||
"~Module signature appended~." at the end of the module's file confirms that a
|
||||
``~Module signature appended~.`` at the end of the module's file confirms that a
|
||||
signature is present but it does not confirm that the signature is valid!
|
||||
|
||||
Signed modules are BRITTLE as the signature is outside of the defined ELF
|
||||
@ -242,19 +248,19 @@ debug information present at the time of signing.
|
||||
|
||||
|
||||
======================
|
||||
LOADING SIGNED MODULES
|
||||
Loading signed modules
|
||||
======================
|
||||
|
||||
Modules are loaded with insmod, modprobe, init_module() or finit_module(),
|
||||
exactly as for unsigned modules as no processing is done in userspace. The
|
||||
signature checking is all done within the kernel.
|
||||
Modules are loaded with insmod, modprobe, ``init_module()`` or
|
||||
``finit_module()``, exactly as for unsigned modules as no processing is
|
||||
done in userspace. The signature checking is all done within the kernel.
|
||||
|
||||
|
||||
=========================================
|
||||
NON-VALID SIGNATURES AND UNSIGNED MODULES
|
||||
Non-valid signatures and unsigned modules
|
||||
=========================================
|
||||
|
||||
If CONFIG_MODULE_SIG_FORCE is enabled or module.sig_enforce=1 is supplied on
|
||||
If ``CONFIG_MODULE_SIG_FORCE`` is enabled or module.sig_enforce=1 is supplied on
|
||||
the kernel command line, the kernel will only load validly signed modules
|
||||
for which it has a public key. Otherwise, it will also load modules that are
|
||||
unsigned. Any module for which the kernel has a key, but which proves to have
|
||||
@ -264,7 +270,7 @@ Any module that has an unparseable signature will be rejected.
|
||||
|
||||
|
||||
=========================================
|
||||
ADMINISTERING/PROTECTING THE PRIVATE KEY
|
||||
Administering/protecting the private key
|
||||
=========================================
|
||||
|
||||
Since the private key is used to sign modules, viruses and malware could use
|
||||
@ -275,5 +281,5 @@ in the root node of the kernel source tree.
|
||||
If you use the same private key to sign modules for multiple kernel
|
||||
configurations, you must ensure that the module version information is
|
||||
sufficient to prevent loading a module into a different kernel. Either
|
||||
set CONFIG_MODVERSIONS=y or ensure that each configuration has a different
|
||||
kernel release string by changing EXTRAVERSION or CONFIG_LOCALVERSION.
|
||||
set ``CONFIG_MODVERSIONS=y`` or ensure that each configuration has a different
|
||||
kernel release string by changing ``EXTRAVERSION`` or ``CONFIG_LOCALVERSION``.
|
@ -1,5 +1,5 @@
|
||||
Mono(tm) Binary Kernel Support for Linux
|
||||
-----------------------------------------
|
||||
Mono(tm) Binary Kernel Support for Linux
|
||||
-----------------------------------------
|
||||
|
||||
To configure Linux to automatically execute Mono-based .NET binaries
|
||||
(in the form of .exe files) without the need to use the mono CLR
|
||||
@ -19,22 +19,24 @@ other program after you have done the following:
|
||||
http://www.go-mono.com/compiling.html
|
||||
|
||||
Once the Mono CLR support has been installed, just check that
|
||||
/usr/bin/mono (which could be located elsewhere, for example
|
||||
/usr/local/bin/mono) is working.
|
||||
``/usr/bin/mono`` (which could be located elsewhere, for example
|
||||
``/usr/local/bin/mono``) is working.
|
||||
|
||||
2) You have to compile BINFMT_MISC either as a module or into
|
||||
the kernel (CONFIG_BINFMT_MISC) and set it up properly.
|
||||
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
|
||||
If you choose to compile it as a module, you will have
|
||||
to insert it manually with modprobe/insmod, as kmod
|
||||
cannot be easily supported with binfmt_misc.
|
||||
Read the file 'binfmt_misc.txt' in this directory to know
|
||||
cannot be easily supported with binfmt_misc.
|
||||
Read the file ``binfmt_misc.txt`` in this directory to know
|
||||
more about the configuration process.
|
||||
|
||||
3) Add the following entries to /etc/rc.local or similar script
|
||||
3) Add the following entries to ``/etc/rc.local`` or similar script
|
||||
to be run at system startup:
|
||||
|
||||
# Insert BINFMT_MISC module into the kernel
|
||||
if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
.. code-block:: sh
|
||||
|
||||
# Insert BINFMT_MISC module into the kernel
|
||||
if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
/sbin/modprobe binfmt_misc
|
||||
# Some distributions, like Fedora Core, perform
|
||||
# the following command automatically when the
|
||||
@ -43,24 +45,26 @@ if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
# Thus, it is possible that the following line
|
||||
# is not needed at all.
|
||||
mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
|
||||
fi
|
||||
fi
|
||||
|
||||
# Register support for .NET CLR binaries
|
||||
if [ -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
# Register support for .NET CLR binaries
|
||||
if [ -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
# Replace /usr/bin/mono with the correct pathname to
|
||||
# the Mono CLR runtime (usually /usr/local/bin/mono
|
||||
# when compiling from sources or CVS).
|
||||
echo ':CLR:M::MZ::/usr/bin/mono:' > /proc/sys/fs/binfmt_misc/register
|
||||
else
|
||||
else
|
||||
echo "No binfmt_misc support"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
4) Check that .exe binaries can be ran without the need of a
|
||||
wrapper script, simply by launching the .exe file directly
|
||||
from a command prompt, for example:
|
||||
4) Check that ``.exe`` binaries can be ran without the need of a
|
||||
wrapper script, simply by launching the ``.exe`` file directly
|
||||
from a command prompt, for example::
|
||||
|
||||
/usr/bin/xsd.exe
|
||||
|
||||
NOTE: If this fails with a permission denied error, check
|
||||
that the .exe file has execute permissions.
|
||||
.. note::
|
||||
|
||||
If this fails with a permission denied error, check
|
||||
that the ``.exe`` file has execute permissions.
|
286
Documentation/admin-guide/parport.rst
Normal file
286
Documentation/admin-guide/parport.rst
Normal file
@ -0,0 +1,286 @@
|
||||
Parport
|
||||
+++++++
|
||||
|
||||
The ``parport`` code provides parallel-port support under Linux. This
|
||||
includes the ability to share one port between multiple device
|
||||
drivers.
|
||||
|
||||
You can pass parameters to the ``parport`` code to override its automatic
|
||||
detection of your hardware. This is particularly useful if you want
|
||||
to use IRQs, since in general these can't be autoprobed successfully.
|
||||
By default IRQs are not used even if they **can** be probed. This is
|
||||
because there are a lot of people using the same IRQ for their
|
||||
parallel port and a sound card or network card.
|
||||
|
||||
The ``parport`` code is split into two parts: generic (which deals with
|
||||
port-sharing) and architecture-dependent (which deals with actually
|
||||
using the port).
|
||||
|
||||
|
||||
Parport as modules
|
||||
==================
|
||||
|
||||
If you load the `parport`` code as a module, say::
|
||||
|
||||
# insmod parport
|
||||
|
||||
to load the generic ``parport`` code. You then must load the
|
||||
architecture-dependent code with (for example)::
|
||||
|
||||
# insmod parport_pc io=0x3bc,0x378,0x278 irq=none,7,auto
|
||||
|
||||
to tell the ``parport`` code that you want three PC-style ports, one at
|
||||
0x3bc with no IRQ, one at 0x378 using IRQ 7, and one at 0x278 with an
|
||||
auto-detected IRQ. Currently, PC-style (``parport_pc``), Sun ``bpp``,
|
||||
Amiga, Atari, and MFC3 hardware is supported.
|
||||
|
||||
PCI parallel I/O card support comes from ``parport_pc``. Base I/O
|
||||
addresses should not be specified for supported PCI cards since they
|
||||
are automatically detected.
|
||||
|
||||
|
||||
modprobe
|
||||
--------
|
||||
|
||||
If you use modprobe , you will find it useful to add lines as below to a
|
||||
configuration file in /etc/modprobe.d/ directory::
|
||||
|
||||
alias parport_lowlevel parport_pc
|
||||
options parport_pc io=0x378,0x278 irq=7,auto
|
||||
|
||||
modprobe will load ``parport_pc`` (with the options ``io=0x378,0x278 irq=7,auto``)
|
||||
whenever a parallel port device driver (such as ``lp``) is loaded.
|
||||
|
||||
Note that these are example lines only! You shouldn't in general need
|
||||
to specify any options to ``parport_pc`` in order to be able to use a
|
||||
parallel port.
|
||||
|
||||
|
||||
Parport probe [optional]
|
||||
------------------------
|
||||
|
||||
In 2.2 kernels there was a module called ``parport_probe``, which was used
|
||||
for collecting IEEE 1284 device ID information. This has now been
|
||||
enhanced and now lives with the IEEE 1284 support. When a parallel
|
||||
port is detected, the devices that are connected to it are analysed,
|
||||
and information is logged like this::
|
||||
|
||||
parport0: Printer, BJC-210 (Canon)
|
||||
|
||||
The probe information is available from files in ``/proc/sys/dev/parport/``.
|
||||
|
||||
|
||||
Parport linked into the kernel statically
|
||||
=========================================
|
||||
|
||||
If you compile the ``parport`` code into the kernel, then you can use
|
||||
kernel boot parameters to get the same effect. Add something like the
|
||||
following to your LILO command line::
|
||||
|
||||
parport=0x3bc parport=0x378,7 parport=0x278,auto,nofifo
|
||||
|
||||
You can have many ``parport=...`` statements, one for each port you want
|
||||
to add. Adding ``parport=0`` to the kernel command-line will disable
|
||||
parport support entirely. Adding ``parport=auto`` to the kernel
|
||||
command-line will make ``parport`` use any IRQ lines or DMA channels that
|
||||
it auto-detects.
|
||||
|
||||
|
||||
Files in /proc
|
||||
==============
|
||||
|
||||
If you have configured the ``/proc`` filesystem into your kernel, you will
|
||||
see a new directory entry: ``/proc/sys/dev/parport``. In there will be a
|
||||
directory entry for each parallel port for which parport is
|
||||
configured. In each of those directories are a collection of files
|
||||
describing that parallel port.
|
||||
|
||||
The ``/proc/sys/dev/parport`` directory tree looks like::
|
||||
|
||||
parport
|
||||
|-- default
|
||||
| |-- spintime
|
||||
| `-- timeslice
|
||||
|-- parport0
|
||||
| |-- autoprobe
|
||||
| |-- autoprobe0
|
||||
| |-- autoprobe1
|
||||
| |-- autoprobe2
|
||||
| |-- autoprobe3
|
||||
| |-- devices
|
||||
| | |-- active
|
||||
| | `-- lp
|
||||
| | `-- timeslice
|
||||
| |-- base-addr
|
||||
| |-- irq
|
||||
| |-- dma
|
||||
| |-- modes
|
||||
| `-- spintime
|
||||
`-- parport1
|
||||
|-- autoprobe
|
||||
|-- autoprobe0
|
||||
|-- autoprobe1
|
||||
|-- autoprobe2
|
||||
|-- autoprobe3
|
||||
|-- devices
|
||||
| |-- active
|
||||
| `-- ppa
|
||||
| `-- timeslice
|
||||
|-- base-addr
|
||||
|-- irq
|
||||
|-- dma
|
||||
|-- modes
|
||||
`-- spintime
|
||||
|
||||
.. tabularcolumns:: |p{4.0cm}|p{13.5cm}|
|
||||
|
||||
======================= =======================================================
|
||||
File Contents
|
||||
======================= =======================================================
|
||||
``devices/active`` A list of the device drivers using that port. A "+"
|
||||
will appear by the name of the device currently using
|
||||
the port (it might not appear against any). The
|
||||
string "none" means that there are no device drivers
|
||||
using that port.
|
||||
|
||||
``base-addr`` Parallel port's base address, or addresses if the port
|
||||
has more than one in which case they are separated
|
||||
with tabs. These values might not have any sensible
|
||||
meaning for some ports.
|
||||
|
||||
``irq`` Parallel port's IRQ, or -1 if none is being used.
|
||||
|
||||
``dma`` Parallel port's DMA channel, or -1 if none is being
|
||||
used.
|
||||
|
||||
``modes`` Parallel port's hardware modes, comma-separated,
|
||||
meaning:
|
||||
|
||||
- PCSPP
|
||||
PC-style SPP registers are available.
|
||||
|
||||
- TRISTATE
|
||||
Port is bidirectional.
|
||||
|
||||
- COMPAT
|
||||
Hardware acceleration for printers is
|
||||
available and will be used.
|
||||
|
||||
- EPP
|
||||
Hardware acceleration for EPP protocol
|
||||
is available and will be used.
|
||||
|
||||
- ECP
|
||||
Hardware acceleration for ECP protocol
|
||||
is available and will be used.
|
||||
|
||||
- DMA
|
||||
DMA is available and will be used.
|
||||
|
||||
Note that the current implementation will only take
|
||||
advantage of COMPAT and ECP modes if it has an IRQ
|
||||
line to use.
|
||||
|
||||
``autoprobe`` Any IEEE-1284 device ID information that has been
|
||||
acquired from the (non-IEEE 1284.3) device.
|
||||
|
||||
``autoprobe[0-3]`` IEEE 1284 device ID information retrieved from
|
||||
daisy-chain devices that conform to IEEE 1284.3.
|
||||
|
||||
``spintime`` The number of microseconds to busy-loop while waiting
|
||||
for the peripheral to respond. You might find that
|
||||
adjusting this improves performance, depending on your
|
||||
peripherals. This is a port-wide setting, i.e. it
|
||||
applies to all devices on a particular port.
|
||||
|
||||
``timeslice`` The number of milliseconds that a device driver is
|
||||
allowed to keep a port claimed for. This is advisory,
|
||||
and driver can ignore it if it must.
|
||||
|
||||
``default/*`` The defaults for spintime and timeslice. When a new
|
||||
port is registered, it picks up the default spintime.
|
||||
When a new device is registered, it picks up the
|
||||
default timeslice.
|
||||
======================= =======================================================
|
||||
|
||||
Device drivers
|
||||
==============
|
||||
|
||||
Once the parport code is initialised, you can attach device drivers to
|
||||
specific ports. Normally this happens automatically; if the lp driver
|
||||
is loaded it will create one lp device for each port found. You can
|
||||
override this, though, by using parameters either when you load the lp
|
||||
driver::
|
||||
|
||||
# insmod lp parport=0,2
|
||||
|
||||
or on the LILO command line::
|
||||
|
||||
lp=parport0 lp=parport2
|
||||
|
||||
Both the above examples would inform lp that you want ``/dev/lp0`` to be
|
||||
the first parallel port, and /dev/lp1 to be the **third** parallel port,
|
||||
with no lp device associated with the second port (parport1). Note
|
||||
that this is different to the way older kernels worked; there used to
|
||||
be a static association between the I/O port address and the device
|
||||
name, so ``/dev/lp0`` was always the port at 0x3bc. This is no longer the
|
||||
case - if you only have one port, it will default to being ``/dev/lp0``,
|
||||
regardless of base address.
|
||||
|
||||
Also:
|
||||
|
||||
* If you selected the IEEE 1284 support at compile time, you can say
|
||||
``lp=auto`` on the kernel command line, and lp will create devices
|
||||
only for those ports that seem to have printers attached.
|
||||
|
||||
* If you give PLIP the ``timid`` parameter, either with ``plip=timid`` on
|
||||
the command line, or with ``insmod plip timid=1`` when using modules,
|
||||
it will avoid any ports that seem to be in use by other devices.
|
||||
|
||||
* IRQ autoprobing works only for a few port types at the moment.
|
||||
|
||||
Reporting printer problems with parport
|
||||
=======================================
|
||||
|
||||
If you are having problems printing, please go through these steps to
|
||||
try to narrow down where the problem area is.
|
||||
|
||||
When reporting problems with parport, really you need to give all of
|
||||
the messages that ``parport_pc`` spits out when it initialises. There are
|
||||
several code paths:
|
||||
|
||||
- polling
|
||||
- interrupt-driven, protocol in software
|
||||
- interrupt-driven, protocol in hardware using PIO
|
||||
- interrupt-driven, protocol in hardware using DMA
|
||||
|
||||
The kernel messages that ``parport_pc`` logs give an indication of which
|
||||
code path is being used. (They could be a lot better actually..)
|
||||
|
||||
For normal printer protocol, having IEEE 1284 modes enabled or not
|
||||
should not make a difference.
|
||||
|
||||
To turn off the 'protocol in hardware' code paths, disable
|
||||
``CONFIG_PARPORT_PC_FIFO``. Note that when they are enabled they are not
|
||||
necessarily **used**; it depends on whether the hardware is available,
|
||||
enabled by the BIOS, and detected by the driver.
|
||||
|
||||
So, to start with, disable ``CONFIG_PARPORT_PC_FIFO``, and load ``parport_pc``
|
||||
with ``irq=none``. See if printing works then. It really should,
|
||||
because this is the simplest code path.
|
||||
|
||||
If that works fine, try with ``io=0x378 irq=7`` (adjust for your
|
||||
hardware), to make it use interrupt-driven in-software protocol.
|
||||
|
||||
If **that** works fine, then one of the hardware modes isn't working
|
||||
right. Enable ``CONFIG_FIFO`` (no, it isn't a module option,
|
||||
and yes, it should be), set the port to ECP mode in the BIOS and note
|
||||
the DMA channel, and try with::
|
||||
|
||||
io=0x378 irq=7 dma=none (for PIO)
|
||||
io=0x378 irq=7 dma=3 (for DMA)
|
||||
|
||||
----------
|
||||
|
||||
philb@gnu.org
|
||||
tim@cyberelk.net
|
@ -5,34 +5,37 @@ Sergiu Iordache <sergiu@chromium.org>
|
||||
|
||||
Updated: 17 November 2011
|
||||
|
||||
0. Introduction
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Ramoops is an oops/panic logger that writes its logs to RAM before the system
|
||||
crashes. It works by logging oopses and panics in a circular buffer. Ramoops
|
||||
needs a system with persistent RAM so that the content of that area can
|
||||
survive after a restart.
|
||||
|
||||
1. Ramoops concepts
|
||||
Ramoops concepts
|
||||
----------------
|
||||
|
||||
Ramoops uses a predefined memory area to store the dump. The start and size
|
||||
and type of the memory area are set using three variables:
|
||||
* "mem_address" for the start
|
||||
* "mem_size" for the size. The memory size will be rounded down to a
|
||||
power of two.
|
||||
* "mem_type" to specifiy if the memory type (default is pgprot_writecombine).
|
||||
|
||||
Typically the default value of mem_type=0 should be used as that sets the pstore
|
||||
mapping to pgprot_writecombine. Setting mem_type=1 attempts to use
|
||||
pgprot_noncached, which only works on some platforms. This is because pstore
|
||||
* ``mem_address`` for the start
|
||||
* ``mem_size`` for the size. The memory size will be rounded down to a
|
||||
power of two.
|
||||
* ``mem_type`` to specifiy if the memory type (default is pgprot_writecombine).
|
||||
|
||||
Typically the default value of ``mem_type=0`` should be used as that sets the pstore
|
||||
mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
|
||||
``pgprot_noncached``, which only works on some platforms. This is because pstore
|
||||
depends on atomic operations. At least on ARM, pgprot_noncached causes the
|
||||
memory to be mapped strongly ordered, and atomic operations on strongly ordered
|
||||
memory are implementation defined, and won't work on many ARMs such as omaps.
|
||||
|
||||
The memory area is divided into "record_size" chunks (also rounded down to
|
||||
power of two) and each oops/panic writes a "record_size" chunk of
|
||||
The memory area is divided into ``record_size`` chunks (also rounded down to
|
||||
power of two) and each oops/panic writes a ``record_size`` chunk of
|
||||
information.
|
||||
|
||||
Dumping both oopses and panics can be done by setting 1 in the "dump_oops"
|
||||
Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
|
||||
variable while setting 0 in that variable dumps only the panics.
|
||||
|
||||
The module uses a counter to record multiple dumps but the counter gets reset
|
||||
@ -43,7 +46,8 @@ This might be useful when a hardware reset was used to bring the machine back
|
||||
to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat
|
||||
corrupt, but usually it is restorable.
|
||||
|
||||
2. Setting the parameters
|
||||
Setting the parameters
|
||||
----------------------
|
||||
|
||||
Setting the ramoops parameters can be done in several different manners:
|
||||
|
||||
@ -52,12 +56,13 @@ Setting the ramoops parameters can be done in several different manners:
|
||||
boot and then use the reserved memory for ramoops. For example, assuming a
|
||||
machine with > 128 MB of memory, the following kernel command line will tell
|
||||
the kernel to use only the first 128 MB of memory, and place ECC-protected
|
||||
ramoops region at 128 MB boundary:
|
||||
"mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1"
|
||||
ramoops region at 128 MB boundary::
|
||||
|
||||
mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1
|
||||
|
||||
B. Use Device Tree bindings, as described in
|
||||
Documentation/device-tree/bindings/reserved-memory/ramoops.txt.
|
||||
For example:
|
||||
``Documentation/device-tree/bindings/reserved-memory/admin-guide/ramoops.rst``.
|
||||
For example::
|
||||
|
||||
reserved-memory {
|
||||
#address-cells = <2>;
|
||||
@ -75,58 +80,63 @@ Setting the ramoops parameters can be done in several different manners:
|
||||
C. Use a platform device and set the platform data. The parameters can then
|
||||
be set through that platform data. An example of doing that is:
|
||||
|
||||
#include <linux/pstore_ram.h>
|
||||
[...]
|
||||
.. code-block:: c
|
||||
|
||||
static struct ramoops_platform_data ramoops_data = {
|
||||
#include <linux/pstore_ram.h>
|
||||
[...]
|
||||
|
||||
static struct ramoops_platform_data ramoops_data = {
|
||||
.mem_size = <...>,
|
||||
.mem_address = <...>,
|
||||
.mem_type = <...>,
|
||||
.record_size = <...>,
|
||||
.dump_oops = <...>,
|
||||
.ecc = <...>,
|
||||
};
|
||||
};
|
||||
|
||||
static struct platform_device ramoops_dev = {
|
||||
static struct platform_device ramoops_dev = {
|
||||
.name = "ramoops",
|
||||
.dev = {
|
||||
.platform_data = &ramoops_data,
|
||||
},
|
||||
};
|
||||
};
|
||||
|
||||
[... inside a function ...]
|
||||
int ret;
|
||||
[... inside a function ...]
|
||||
int ret;
|
||||
|
||||
ret = platform_device_register(&ramoops_dev);
|
||||
if (ret) {
|
||||
ret = platform_device_register(&ramoops_dev);
|
||||
if (ret) {
|
||||
printk(KERN_ERR "unable to register platform device\n");
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
||||
You can specify either RAM memory or peripheral devices' memory. However, when
|
||||
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
|
||||
very early in the architecture code, e.g.:
|
||||
very early in the architecture code, e.g.::
|
||||
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/memblock.h>
|
||||
|
||||
memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
|
||||
memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
|
||||
|
||||
3. Dump format
|
||||
Dump format
|
||||
-----------
|
||||
|
||||
The data dump begins with a header, currently defined as "====" followed by a
|
||||
The data dump begins with a header, currently defined as ``====`` followed by a
|
||||
timestamp and a new line. The dump then continues with the actual data.
|
||||
|
||||
4. Reading the data
|
||||
Reading the data
|
||||
----------------
|
||||
|
||||
The dump data can be read from the pstore filesystem. The format for these
|
||||
files is "dmesg-ramoops-N", where N is the record number in memory. To delete
|
||||
files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete
|
||||
a stored record from RAM, simply unlink the respective pstore file.
|
||||
|
||||
5. Persistent function tracing
|
||||
Persistent function tracing
|
||||
---------------------------
|
||||
|
||||
Persistent function tracing might be useful for debugging software or hardware
|
||||
related hangs. The functions call chain log is stored in a "ftrace-ramoops"
|
||||
file. Here is an example of usage:
|
||||
related hangs. The functions call chain log is stored in a ``ftrace-ramoops``
|
||||
file. Here is an example of usage::
|
||||
|
||||
# mount -t debugfs debugfs /sys/kernel/debug/
|
||||
# echo 1 > /sys/kernel/debug/pstore/record_ftrace
|
@ -1,3 +1,8 @@
|
||||
.. _reportingbugs:
|
||||
|
||||
Reporting bugs
|
||||
++++++++++++++
|
||||
|
||||
Background
|
||||
==========
|
||||
|
||||
@ -50,12 +55,13 @@ maintainer replies to you, make sure to 'Reply-all' in order to keep the
|
||||
public mailing list(s) in the email thread.
|
||||
|
||||
If you know which driver is causing issues, you can pass one of the driver
|
||||
files to the get_maintainer.pl script:
|
||||
files to the get_maintainer.pl script::
|
||||
|
||||
perl scripts/get_maintainer.pl -f <filename>
|
||||
|
||||
If it is a security bug, please copy the Security Contact listed in the
|
||||
MAINTAINERS file. They can help coordinate bugfix and disclosure. See
|
||||
Documentation/SecurityBugs for more information.
|
||||
:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` for more information.
|
||||
|
||||
If you can't figure out which subsystem caused the issue, you should file
|
||||
a bug in kernel.org bugzilla and send email to
|
||||
@ -69,8 +75,9 @@ Tips for reporting bugs
|
||||
|
||||
If you haven't reported a bug before, please read:
|
||||
|
||||
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
|
||||
http://www.catb.org/esr/faqs/smart-questions.html
|
||||
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
|
||||
|
||||
http://www.catb.org/esr/faqs/smart-questions.html
|
||||
|
||||
It's REALLY important to report bugs that seem unrelated as separate email
|
||||
threads or separate bugzilla entries. If you report several unrelated
|
||||
@ -87,7 +94,7 @@ step-by-step instructions for how a user can trigger the bug.
|
||||
|
||||
If the failure includes an "OOPS:", take a picture of the screen, capture
|
||||
a netconsole trace, or type the message from your screen into the bug
|
||||
report. Please read "Documentation/oops-tracing.txt" before posting your
|
||||
report. Please read "Documentation/admin-guide/oops-tracing.rst" before posting your
|
||||
bug report. This explains what you should do with the "Oops" information
|
||||
to make it useful to the recipient.
|
||||
|
||||
@ -99,34 +106,34 @@ relevant to your bug, feel free to exclude it.
|
||||
|
||||
First run the ver_linux script included as scripts/ver_linux, which
|
||||
reports the version of some important subsystems. Run this script with
|
||||
the command "sh scripts/ver_linux".
|
||||
the command ``awk -f scripts/ver_linux``.
|
||||
|
||||
Use that information to fill in all fields of the bug report form, and
|
||||
post it to the mailing list with a subject of "PROBLEM: <one line
|
||||
summary from [1.]>" for easy identification by the developers.
|
||||
summary from [1.]>" for easy identification by the developers::
|
||||
|
||||
[1.] One line summary of the problem:
|
||||
[2.] Full description of the problem/report:
|
||||
[3.] Keywords (i.e., modules, networking, kernel):
|
||||
[4.] Kernel information
|
||||
[4.1.] Kernel version (from /proc/version):
|
||||
[4.2.] Kernel .config file:
|
||||
[5.] Most recent kernel version which did not have the bug:
|
||||
[6.] Output of Oops.. message (if applicable) with symbolic information
|
||||
resolved (see Documentation/oops-tracing.txt)
|
||||
[7.] A small shell script or example program which triggers the
|
||||
problem (if possible)
|
||||
[8.] Environment
|
||||
[8.1.] Software (add the output of the ver_linux script here)
|
||||
[8.2.] Processor information (from /proc/cpuinfo):
|
||||
[8.3.] Module information (from /proc/modules):
|
||||
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
|
||||
[8.5.] PCI information ('lspci -vvv' as root)
|
||||
[8.6.] SCSI information (from /proc/scsi/scsi)
|
||||
[8.7.] Other information that might be relevant to the problem
|
||||
(please look in /proc and include all information that you
|
||||
think to be relevant):
|
||||
[X.] Other notes, patches, fixes, workarounds:
|
||||
[1.] One line summary of the problem:
|
||||
[2.] Full description of the problem/report:
|
||||
[3.] Keywords (i.e., modules, networking, kernel):
|
||||
[4.] Kernel information
|
||||
[4.1.] Kernel version (from /proc/version):
|
||||
[4.2.] Kernel .config file:
|
||||
[5.] Most recent kernel version which did not have the bug:
|
||||
[6.] Output of Oops.. message (if applicable) with symbolic information
|
||||
resolved (see Documentation/admin-guide/oops-tracing.rst)
|
||||
[7.] A small shell script or example program which triggers the
|
||||
problem (if possible)
|
||||
[8.] Environment
|
||||
[8.1.] Software (add the output of the ver_linux script here)
|
||||
[8.2.] Processor information (from /proc/cpuinfo):
|
||||
[8.3.] Module information (from /proc/modules):
|
||||
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
|
||||
[8.5.] PCI information ('lspci -vvv' as root)
|
||||
[8.6.] SCSI information (from /proc/scsi/scsi)
|
||||
[8.7.] Other information that might be relevant to the problem
|
||||
(please look in /proc and include all information that you
|
||||
think to be relevant):
|
||||
[X.] Other notes, patches, fixes, workarounds:
|
||||
|
||||
|
||||
Follow up
|
||||
@ -153,7 +160,8 @@ Expectations for kernel maintainers
|
||||
Linux kernel maintainers are busy, overworked human beings. Some times
|
||||
they may not be able to address your bug in a day, a week, or two weeks.
|
||||
If they don't answer your email, they may be on vacation, or at a Linux
|
||||
conference. Check the conference schedule at LWN.net for more info:
|
||||
conference. Check the conference schedule at https://LWN.net for more info:
|
||||
|
||||
https://lwn.net/Calendar/
|
||||
|
||||
In general, kernel maintainers take 1 to 5 business days to respond to
|
@ -8,8 +8,8 @@ like to know when a security bug is found so that it can be fixed and
|
||||
disclosed as quickly as possible. Please report security bugs to the
|
||||
Linux kernel security team.
|
||||
|
||||
1) Contact
|
||||
----------
|
||||
Contact
|
||||
-------
|
||||
|
||||
The Linux kernel security team can be contacted by email at
|
||||
<security@kernel.org>. This is a private list of security officers
|
||||
@ -19,12 +19,12 @@ area maintainers to understand and fix the security vulnerability.
|
||||
|
||||
As it is with any bug, the more information provided the easier it
|
||||
will be to diagnose and fix. Please review the procedure outlined in
|
||||
REPORTING-BUGS if you are unclear about what information is helpful.
|
||||
admin-guide/reporting-bugs.rst if you are unclear about what information is helpful.
|
||||
Any exploit code is very helpful and will not be released without
|
||||
consent from the reporter unless it has already been made public.
|
||||
|
||||
2) Disclosure
|
||||
-------------
|
||||
Disclosure
|
||||
----------
|
||||
|
||||
The goal of the Linux kernel security team is to work with the
|
||||
bug submitter to bug resolution as well as disclosure. We prefer
|
||||
@ -39,8 +39,8 @@ disclosure is from immediate (esp. if it's already publicly known)
|
||||
to a few weeks. As a basic default policy, we expect report date to
|
||||
disclosure date to be on the order of 7 days.
|
||||
|
||||
3) Non-disclosure agreements
|
||||
----------------------------
|
||||
Non-disclosure agreements
|
||||
-------------------------
|
||||
|
||||
The Linux kernel security team is not a formal body and therefore unable
|
||||
to enter any non-disclosure agreements.
|
@ -1,15 +1,21 @@
|
||||
Linux Serial Console
|
||||
.. _serial_console:
|
||||
|
||||
Linux Serial Console
|
||||
====================
|
||||
|
||||
To use a serial port as console you need to compile the support into your
|
||||
kernel - by default it is not compiled in. For PC style serial ports
|
||||
it's the config option next to "Standard/generic (dumb) serial support".
|
||||
it's the config option next to menu option:
|
||||
|
||||
:menuselection:`Character devices --> Serial drivers --> 8250/16550 and compatible serial support --> Console on 8250/16550 and compatible serial port`
|
||||
|
||||
You must compile serial support into the kernel and not as a module.
|
||||
|
||||
It is possible to specify multiple devices for console output. You can
|
||||
define a new kernel command line option to select which device(s) to
|
||||
use for console output.
|
||||
|
||||
The format of this option is:
|
||||
The format of this option is::
|
||||
|
||||
console=device,options
|
||||
|
||||
@ -28,11 +34,11 @@ The format of this option is:
|
||||
|
||||
You can specify multiple console= options on the kernel command line.
|
||||
Output will appear on all of them. The last device will be used when
|
||||
you open /dev/console. So, for example:
|
||||
you open ``/dev/console``. So, for example::
|
||||
|
||||
console=ttyS1,9600 console=tty0
|
||||
|
||||
defines that opening /dev/console will get you the current foreground
|
||||
defines that opening ``/dev/console`` will get you the current foreground
|
||||
virtual console, and kernel messages will appear on both the VGA
|
||||
console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
|
||||
|
||||
@ -44,61 +50,61 @@ first looks for a VGA card and then for a serial port. So if you don't
|
||||
have a VGA card in your system the first serial port will automatically
|
||||
become the console.
|
||||
|
||||
You will need to create a new device to use /dev/console. The official
|
||||
/dev/console is now character device 5,1.
|
||||
You will need to create a new device to use ``/dev/console``. The official
|
||||
``/dev/console`` is now character device 5,1.
|
||||
|
||||
(You can also use a network device as a console. See
|
||||
Documentation/networking/netconsole.txt for information on that.)
|
||||
``Documentation/networking/netconsole.txt`` for information on that.)
|
||||
|
||||
Here's an example that will use /dev/ttyS1 (COM2) as the console.
|
||||
Here's an example that will use ``/dev/ttyS1`` (COM2) as the console.
|
||||
Replace the sample values as needed.
|
||||
|
||||
1. Create /dev/console (real console) and /dev/tty0 (master virtual
|
||||
console):
|
||||
1. Create ``/dev/console`` (real console) and ``/dev/tty0`` (master virtual
|
||||
console)::
|
||||
|
||||
cd /dev
|
||||
rm -f console tty0
|
||||
mknod -m 622 console c 5 1
|
||||
mknod -m 622 tty0 c 4 0
|
||||
cd /dev
|
||||
rm -f console tty0
|
||||
mknod -m 622 console c 5 1
|
||||
mknod -m 622 tty0 c 4 0
|
||||
|
||||
2. LILO can also take input from a serial device. This is a very
|
||||
useful option. To tell LILO to use the serial port:
|
||||
In lilo.conf (global section):
|
||||
In lilo.conf (global section)::
|
||||
|
||||
serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits)
|
||||
serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits)
|
||||
|
||||
3. Adjust to kernel flags for the new kernel,
|
||||
again in lilo.conf (kernel section)
|
||||
again in lilo.conf (kernel section)::
|
||||
|
||||
append = "console=ttyS1,9600"
|
||||
append = "console=ttyS1,9600"
|
||||
|
||||
4. Make sure a getty runs on the serial port so that you can login to
|
||||
it once the system is done booting. This is done by adding a line
|
||||
like this to /etc/inittab (exact syntax depends on your getty):
|
||||
like this to ``/etc/inittab`` (exact syntax depends on your getty)::
|
||||
|
||||
S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
|
||||
S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
|
||||
|
||||
5. Init and /etc/ioctl.save
|
||||
5. Init and ``/etc/ioctl.save``
|
||||
|
||||
Sysvinit remembers its stty settings in a file in /etc, called
|
||||
`/etc/ioctl.save'. REMOVE THIS FILE before using the serial
|
||||
Sysvinit remembers its stty settings in a file in ``/etc``, called
|
||||
``/etc/ioctl.save``. REMOVE THIS FILE before using the serial
|
||||
console for the first time, because otherwise init will probably
|
||||
set the baudrate to 38400 (baudrate of the virtual console).
|
||||
|
||||
6. /dev/console and X
|
||||
6. ``/dev/console`` and X
|
||||
Programs that want to do something with the virtual console usually
|
||||
open /dev/console. If you have created the new /dev/console device,
|
||||
open ``/dev/console``. If you have created the new ``/dev/console`` device,
|
||||
and your console is NOT the virtual console some programs will fail.
|
||||
Those are programs that want to access the VT interface, and use
|
||||
/dev/console instead of /dev/tty0. Some of those programs are:
|
||||
``/dev/console instead of /dev/tty0``. Some of those programs are::
|
||||
|
||||
Xfree86, svgalib, gpm, SVGATextMode
|
||||
Xfree86, svgalib, gpm, SVGATextMode
|
||||
|
||||
It should be fixed in modern versions of these programs though.
|
||||
|
||||
Note that if you boot without a console= option (or with
|
||||
console=/dev/tty0), /dev/console is the same as /dev/tty0. In that
|
||||
case everything will still work.
|
||||
Note that if you boot without a ``console=`` option (or with
|
||||
``console=/dev/tty0``), ``/dev/console`` is the same as ``/dev/tty0``.
|
||||
In that case everything will still work.
|
||||
|
||||
7. Thanks
|
||||
|
192
Documentation/admin-guide/sysfs-rules.rst
Normal file
192
Documentation/admin-guide/sysfs-rules.rst
Normal file
@ -0,0 +1,192 @@
|
||||
Rules on how to access information in sysfs
|
||||
===========================================
|
||||
|
||||
The kernel-exported sysfs exports internal kernel implementation details
|
||||
and depends on internal kernel structures and layout. It is agreed upon
|
||||
by the kernel developers that the Linux kernel does not provide a stable
|
||||
internal API. Therefore, there are aspects of the sysfs interface that
|
||||
may not be stable across kernel releases.
|
||||
|
||||
To minimize the risk of breaking users of sysfs, which are in most cases
|
||||
low-level userspace applications, with a new kernel release, the users
|
||||
of sysfs must follow some rules to use an as-abstract-as-possible way to
|
||||
access this filesystem. The current udev and HAL programs already
|
||||
implement this and users are encouraged to plug, if possible, into the
|
||||
abstractions these programs provide instead of accessing sysfs directly.
|
||||
|
||||
But if you really do want or need to access sysfs directly, please follow
|
||||
the following rules and then your programs should work with future
|
||||
versions of the sysfs interface.
|
||||
|
||||
- Do not use libsysfs
|
||||
It makes assumptions about sysfs which are not true. Its API does not
|
||||
offer any abstraction, it exposes all the kernel driver-core
|
||||
implementation details in its own API. Therefore it is not better than
|
||||
reading directories and opening the files yourself.
|
||||
Also, it is not actively maintained, in the sense of reflecting the
|
||||
current kernel development. The goal of providing a stable interface
|
||||
to sysfs has failed; it causes more problems than it solves. It
|
||||
violates many of the rules in this document.
|
||||
|
||||
- sysfs is always at ``/sys``
|
||||
Parsing ``/proc/mounts`` is a waste of time. Other mount points are a
|
||||
system configuration bug you should not try to solve. For test cases,
|
||||
possibly support a ``SYSFS_PATH`` environment variable to overwrite the
|
||||
application's behavior, but never try to search for sysfs. Never try
|
||||
to mount it, if you are not an early boot script.
|
||||
|
||||
- devices are only "devices"
|
||||
There is no such thing like class-, bus-, physical devices,
|
||||
interfaces, and such that you can rely on in userspace. Everything is
|
||||
just simply a "device". Class-, bus-, physical, ... types are just
|
||||
kernel implementation details which should not be expected by
|
||||
applications that look for devices in sysfs.
|
||||
|
||||
The properties of a device are:
|
||||
|
||||
- devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
|
||||
|
||||
- identical to the DEVPATH value in the event sent from the kernel
|
||||
at device creation and removal
|
||||
- the unique key to the device at that point in time
|
||||
- the kernel's path to the device directory without the leading
|
||||
``/sys``, and always starting with a slash
|
||||
- all elements of a devpath must be real directories. Symlinks
|
||||
pointing to /sys/devices must always be resolved to their real
|
||||
target and the target path must be used to access the device.
|
||||
That way the devpath to the device matches the devpath of the
|
||||
kernel used at event time.
|
||||
- using or exposing symlink values as elements in a devpath string
|
||||
is a bug in the application
|
||||
|
||||
- kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
|
||||
|
||||
- a directory name, identical to the last element of the devpath
|
||||
- applications need to handle spaces and characters like ``!`` in
|
||||
the name
|
||||
|
||||
- subsystem (``block``, ``tty``, ``pci``, ...)
|
||||
|
||||
- simple string, never a path or a link
|
||||
- retrieved by reading the "subsystem"-link and using only the
|
||||
last element of the target path
|
||||
|
||||
- driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
|
||||
|
||||
- a simple string, which may contain spaces, never a path or a
|
||||
link
|
||||
- it is retrieved by reading the "driver"-link and using only the
|
||||
last element of the target path
|
||||
- devices which do not have "driver"-link just do not have a
|
||||
driver; copying the driver value in a child device context is a
|
||||
bug in the application
|
||||
|
||||
- attributes
|
||||
|
||||
- the files in the device directory or files below subdirectories
|
||||
of the same device directory
|
||||
- accessing attributes reached by a symlink pointing to another device,
|
||||
like the "device"-link, is a bug in the application
|
||||
|
||||
Everything else is just a kernel driver-core implementation detail
|
||||
that should not be assumed to be stable across kernel releases.
|
||||
|
||||
- Properties of parent devices never belong into a child device.
|
||||
Always look at the parent devices themselves for determining device
|
||||
context properties. If the device ``eth0`` or ``sda`` does not have a
|
||||
"driver"-link, then this device does not have a driver. Its value is empty.
|
||||
Never copy any property of the parent-device into a child-device. Parent
|
||||
device properties may change dynamically without any notice to the
|
||||
child device.
|
||||
|
||||
- Hierarchy in a single device tree
|
||||
There is only one valid place in sysfs where hierarchy can be examined
|
||||
and this is below: ``/sys/devices.``
|
||||
It is planned that all device directories will end up in the tree
|
||||
below this directory.
|
||||
|
||||
- Classification by subsystem
|
||||
There are currently three places for classification of devices:
|
||||
``/sys/block,`` ``/sys/class`` and ``/sys/bus.`` It is planned that these will
|
||||
not contain any device directories themselves, but only flat lists of
|
||||
symlinks pointing to the unified ``/sys/devices`` tree.
|
||||
All three places have completely different rules on how to access
|
||||
device information. It is planned to merge all three
|
||||
classification directories into one place at ``/sys/subsystem``,
|
||||
following the layout of the bus directories. All buses and
|
||||
classes, including the converted block subsystem, will show up
|
||||
there.
|
||||
The devices belonging to a subsystem will create a symlink in the
|
||||
"devices" directory at ``/sys/subsystem/<name>/devices``,
|
||||
|
||||
If ``/sys/subsystem`` exists, ``/sys/bus``, ``/sys/class`` and ``/sys/block``
|
||||
can be ignored. If it does not exist, you always have to scan all three
|
||||
places, as the kernel is free to move a subsystem from one place to
|
||||
the other, as long as the devices are still reachable by the same
|
||||
subsystem name.
|
||||
|
||||
Assuming ``/sys/class/<subsystem>`` and ``/sys/bus/<subsystem>``, or
|
||||
``/sys/block`` and ``/sys/class/block`` are not interchangeable is a bug in
|
||||
the application.
|
||||
|
||||
- Block
|
||||
The converted block subsystem at ``/sys/class/block`` or
|
||||
``/sys/subsystem/block`` will contain the links for disks and partitions
|
||||
at the same level, never in a hierarchy. Assuming the block subsystem to
|
||||
contain only disks and not partition devices in the same flat list is
|
||||
a bug in the application.
|
||||
|
||||
- "device"-link and <subsystem>:<kernel name>-links
|
||||
Never depend on the "device"-link. The "device"-link is a workaround
|
||||
for the old layout, where class devices are not created in
|
||||
``/sys/devices/`` like the bus devices. If the link-resolving of a
|
||||
device directory does not end in ``/sys/devices/``, you can use the
|
||||
"device"-link to find the parent devices in ``/sys/devices/``, That is the
|
||||
single valid use of the "device"-link; it must never appear in any
|
||||
path as an element. Assuming the existence of the "device"-link for
|
||||
a device in ``/sys/devices/`` is a bug in the application.
|
||||
Accessing ``/sys/class/net/eth0/device`` is a bug in the application.
|
||||
|
||||
Never depend on the class-specific links back to the ``/sys/class``
|
||||
directory. These links are also a workaround for the design mistake
|
||||
that class devices are not created in ``/sys/devices.`` If a device
|
||||
directory does not contain directories for child devices, these links
|
||||
may be used to find the child devices in ``/sys/class.`` That is the single
|
||||
valid use of these links; they must never appear in any path as an
|
||||
element. Assuming the existence of these links for devices which are
|
||||
real child device directories in the ``/sys/devices`` tree is a bug in
|
||||
the application.
|
||||
|
||||
It is planned to remove all these links when all class device
|
||||
directories live in ``/sys/devices.``
|
||||
|
||||
- Position of devices along device chain can change.
|
||||
Never depend on a specific parent device position in the devpath,
|
||||
or the chain of parent devices. The kernel is free to insert devices into
|
||||
the chain. You must always request the parent device you are looking for
|
||||
by its subsystem value. You need to walk up the chain until you find
|
||||
the device that matches the expected subsystem. Depending on a specific
|
||||
position of a parent device or exposing relative paths using ``../`` to
|
||||
access the chain of parents is a bug in the application.
|
||||
|
||||
- When reading and writing sysfs device attribute files, avoid dependency
|
||||
on specific error codes wherever possible. This minimizes coupling to
|
||||
the error handling implementation within the kernel.
|
||||
|
||||
In general, failures to read or write sysfs device attributes shall
|
||||
propagate errors wherever possible. Common errors include, but are not
|
||||
limited to:
|
||||
|
||||
``-EIO``: The read or store operation is not supported, typically
|
||||
returned by the sysfs system itself if the read or store pointer
|
||||
is ``NULL``.
|
||||
|
||||
``-ENXIO``: The read or store operation failed
|
||||
|
||||
Error codes will not be changed without good reason, and should a change
|
||||
to error codes result in user-space breakage, it will be fixed, or the
|
||||
the offending change will be reverted.
|
||||
|
||||
Userspace applications can, however, expect the format and contents of
|
||||
the attribute files to remain consistent in the absence of a version
|
||||
attribute change in the context of a given attribute.
|
289
Documentation/admin-guide/sysrq.rst
Normal file
289
Documentation/admin-guide/sysrq.rst
Normal file
@ -0,0 +1,289 @@
|
||||
Linux Magic System Request Key Hacks
|
||||
====================================
|
||||
|
||||
Documentation for sysrq.c
|
||||
|
||||
What is the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
It is a 'magical' key combo you can hit which the kernel will respond to
|
||||
regardless of whatever else it is doing, unless it is completely locked up.
|
||||
|
||||
How do I enable the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when
|
||||
configuring the kernel. When running a kernel with SysRq compiled in,
|
||||
/proc/sys/kernel/sysrq controls the functions allowed to be invoked via
|
||||
the SysRq key. The default value in this file is set by the
|
||||
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE config symbol, which itself defaults
|
||||
to 1. Here is the list of possible values in /proc/sys/kernel/sysrq:
|
||||
|
||||
- 0 - disable sysrq completely
|
||||
- 1 - enable all functions of sysrq
|
||||
- >1 - bitmask of allowed sysrq functions (see below for detailed function
|
||||
description)::
|
||||
|
||||
2 = 0x2 - enable control of console logging level
|
||||
4 = 0x4 - enable control of keyboard (SAK, unraw)
|
||||
8 = 0x8 - enable debugging dumps of processes etc.
|
||||
16 = 0x10 - enable sync command
|
||||
32 = 0x20 - enable remount read-only
|
||||
64 = 0x40 - enable signalling of processes (term, kill, oom-kill)
|
||||
128 = 0x80 - allow reboot/poweroff
|
||||
256 = 0x100 - allow nicing of all RT tasks
|
||||
|
||||
You can set the value in the file by the following command::
|
||||
|
||||
echo "number" >/proc/sys/kernel/sysrq
|
||||
|
||||
The number may be written here either as decimal or as hexadecimal
|
||||
with the 0x prefix. CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE must always be
|
||||
written in hexadecimal.
|
||||
|
||||
Note that the value of ``/proc/sys/kernel/sysrq`` influences only the invocation
|
||||
via a keyboard. Invocation of any operation via ``/proc/sysrq-trigger`` is
|
||||
always allowed (by a user with admin privileges).
|
||||
|
||||
How do I use the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
On x86 - You press the key combo :kbd:`ALT-SysRq-<command key>`.
|
||||
|
||||
.. note::
|
||||
Some
|
||||
keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
|
||||
also known as the 'Print Screen' key. Also some keyboards cannot
|
||||
handle so many keys being pressed at the same time, so you might
|
||||
have better luck with press :kbd:`Alt`, press :kbd:`SysRq`,
|
||||
release :kbd:`SysRq`, press :kbd:`<command key>`, release everything.
|
||||
|
||||
On SPARC - You press :kbd:`ALT-STOP-<command key>`, I believe.
|
||||
|
||||
On the serial console (PC style standard serial ports only)
|
||||
You send a ``BREAK``, then within 5 seconds a command key. Sending
|
||||
``BREAK`` twice is interpreted as a normal BREAK.
|
||||
|
||||
On PowerPC
|
||||
Press :kbd:`ALT - Print Screen` (or :kbd:`F13`) - :kbd:`<command key>`,
|
||||
:kbd:`Print Screen` (or :kbd:`F13`) - :kbd:`<command key>` may suffice.
|
||||
|
||||
On other
|
||||
If you know of the key combos for other architectures, please
|
||||
let me know so I can add them to this section.
|
||||
|
||||
On all
|
||||
write a character to /proc/sysrq-trigger. e.g.::
|
||||
|
||||
echo t > /proc/sysrq-trigger
|
||||
|
||||
What are the 'command' keys?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
=========== ===================================================================
|
||||
Command Function
|
||||
=========== ===================================================================
|
||||
``b`` Will immediately reboot the system without syncing or unmounting
|
||||
your disks.
|
||||
|
||||
``c`` Will perform a system crash by a NULL pointer dereference.
|
||||
A crashdump will be taken if configured.
|
||||
|
||||
``d`` Shows all locks that are held.
|
||||
|
||||
``e`` Send a SIGTERM to all processes, except for init.
|
||||
|
||||
``f`` Will call the oom killer to kill a memory hog process, but do not
|
||||
panic if nothing can be killed.
|
||||
|
||||
``g`` Used by kgdb (kernel debugger)
|
||||
|
||||
``h`` Will display help (actually any other key than those listed
|
||||
here will display help. but ``h`` is easy to remember :-)
|
||||
|
||||
``i`` Send a SIGKILL to all processes, except for init.
|
||||
|
||||
``j`` Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
|
||||
|
||||
``k`` Secure Access Key (SAK) Kills all programs on the current virtual
|
||||
console. NOTE: See important comments below in SAK section.
|
||||
|
||||
``l`` Shows a stack backtrace for all active CPUs.
|
||||
|
||||
``m`` Will dump current memory info to your console.
|
||||
|
||||
``n`` Used to make RT tasks nice-able
|
||||
|
||||
``o`` Will shut your system off (if configured and supported).
|
||||
|
||||
``p`` Will dump the current registers and flags to your console.
|
||||
|
||||
``q`` Will dump per CPU lists of all armed hrtimers (but NOT regular
|
||||
timer_list timers) and detailed information about all
|
||||
clockevent devices.
|
||||
|
||||
``r`` Turns off keyboard raw mode and sets it to XLATE.
|
||||
|
||||
``s`` Will attempt to sync all mounted filesystems.
|
||||
|
||||
``t`` Will dump a list of current tasks and their information to your
|
||||
console.
|
||||
|
||||
``u`` Will attempt to remount all mounted filesystems read-only.
|
||||
|
||||
``v`` Forcefully restores framebuffer console
|
||||
``v`` Causes ETM buffer dump [ARM-specific]
|
||||
|
||||
``w`` Dumps tasks that are in uninterruptable (blocked) state.
|
||||
|
||||
``x`` Used by xmon interface on ppc/powerpc platforms.
|
||||
Show global PMU Registers on sparc64.
|
||||
Dump all TLB entries on MIPS.
|
||||
|
||||
``y`` Show global CPU Registers [SPARC-64 specific]
|
||||
|
||||
``z`` Dump the ftrace buffer
|
||||
|
||||
``0``-``9`` Sets the console log level, controlling which kernel messages
|
||||
will be printed to your console. (``0``, for example would make
|
||||
it so that only emergency messages like PANICs or OOPSes would
|
||||
make it to your console.)
|
||||
=========== ===================================================================
|
||||
|
||||
Okay, so what can I use them for?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Well, unraw(r) is very handy when your X server or a svgalib program crashes.
|
||||
|
||||
sak(k) (Secure Access Key) is useful when you want to be sure there is no
|
||||
trojan program running at console which could grab your password
|
||||
when you would try to login. It will kill all programs on given console,
|
||||
thus letting you make sure that the login prompt you see is actually
|
||||
the one from init, not some trojan program.
|
||||
|
||||
.. important::
|
||||
|
||||
In its true form it is not a true SAK like the one in a
|
||||
c2 compliant system, and it should not be mistaken as
|
||||
such.
|
||||
|
||||
It seems others find it useful as (System Attention Key) which is
|
||||
useful when you want to exit a program that will not let you switch consoles.
|
||||
(For example, X or a svgalib program.)
|
||||
|
||||
``reboot(b)`` is good when you're unable to shut down. But you should also
|
||||
``sync(s)`` and ``umount(u)`` first.
|
||||
|
||||
``crash(c)`` can be used to manually trigger a crashdump when the system is hung.
|
||||
Note that this just triggers a crash if there is no dump mechanism available.
|
||||
|
||||
``sync(s)`` is great when your system is locked up, it allows you to sync your
|
||||
disks and will certainly lessen the chance of data loss and fscking. Note
|
||||
that the sync hasn't taken place until you see the "OK" and "Done" appear
|
||||
on the screen. (If the kernel is really in strife, you may not ever get the
|
||||
OK or Done message...)
|
||||
|
||||
``umount(u)`` is basically useful in the same ways as ``sync(s)``. I generally
|
||||
``sync(s)``, ``umount(u)``, then ``reboot(b)`` when my system locks. It's saved
|
||||
me many a fsck. Again, the unmount (remount read-only) hasn't taken place until
|
||||
you see the "OK" and "Done" message appear on the screen.
|
||||
|
||||
The loglevels ``0``-``9`` are useful when your console is being flooded with
|
||||
kernel messages you do not want to see. Selecting ``0`` will prevent all but
|
||||
the most urgent kernel messages from reaching your console. (They will
|
||||
still be logged if syslogd/klogd are alive, though.)
|
||||
|
||||
``term(e)`` and ``kill(i)`` are useful if you have some sort of runaway process
|
||||
you are unable to kill any other way, especially if it's spawning other
|
||||
processes.
|
||||
|
||||
"just thaw ``it(j)``" is useful if your system becomes unresponsive due to a
|
||||
frozen (probably root) filesystem via the FIFREEZE ioctl.
|
||||
|
||||
Sometimes SysRq seems to get 'stuck' after using it, what can I do?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
That happens to me, also. I've found that tapping shift, alt, and control
|
||||
on both sides of the keyboard, and hitting an invalid sysrq sequence again
|
||||
will fix the problem. (i.e., something like :kbd:`alt-sysrq-z`). Switching to
|
||||
another virtual console (:kbd:`ALT+Fn`) and then back again should also help.
|
||||
|
||||
I hit SysRq, but nothing seems to happen, what's wrong?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
There are some keyboards that produce a different keycode for SysRq than the
|
||||
pre-defined value of 99 (see ``KEY_SYSRQ`` in ``include/linux/input.h``), or
|
||||
which don't have a SysRq key at all. In these cases, run ``showkey -s`` to find
|
||||
an appropriate scancode sequence, and use ``setkeycodes <sequence> 99`` to map
|
||||
this sequence to the usual SysRq code (e.g., ``setkeycodes e05b 99``). It's
|
||||
probably best to put this command in a boot script. Oh, and by the way, you
|
||||
exit ``showkey`` by not typing anything for ten seconds.
|
||||
|
||||
I want to add SysRQ key events to a module, how does it work?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In order to register a basic function with the table, you must first include
|
||||
the header ``include/linux/sysrq.h``, this will define everything else you need.
|
||||
Next, you must create a ``sysrq_key_op`` struct, and populate it with A) the key
|
||||
handler function you will use, B) a help_msg string, that will print when SysRQ
|
||||
prints help, and C) an action_msg string, that will print right before your
|
||||
handler is called. Your handler must conform to the prototype in 'sysrq.h'.
|
||||
|
||||
After the ``sysrq_key_op`` is created, you can call the kernel function
|
||||
``register_sysrq_key(int key, struct sysrq_key_op *op_p);`` this will
|
||||
register the operation pointed to by ``op_p`` at table key 'key',
|
||||
if that slot in the table is blank. At module unload time, you must call
|
||||
the function ``unregister_sysrq_key(int key, struct sysrq_key_op *op_p)``, which
|
||||
will remove the key op pointed to by 'op_p' from the key 'key', if and only if
|
||||
it is currently registered in that slot. This is in case the slot has been
|
||||
overwritten since you registered it.
|
||||
|
||||
The Magic SysRQ system works by registering key operations against a key op
|
||||
lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has
|
||||
a number of operations registered into it at compile time, but is mutable,
|
||||
and 2 functions are exported for interface to it::
|
||||
|
||||
register_sysrq_key and unregister_sysrq_key.
|
||||
|
||||
Of course, never ever leave an invalid pointer in the table. I.e., when
|
||||
your module that called register_sysrq_key() exits, it must call
|
||||
unregister_sysrq_key() to clean up the sysrq key table entry that it used.
|
||||
Null pointers in the table are always safe. :)
|
||||
|
||||
If for some reason you feel the need to call the handle_sysrq function from
|
||||
within a function called by handle_sysrq, you must be aware that you are in
|
||||
a lock (you are also in an interrupt handler, which means don't sleep!), so
|
||||
you must call ``__handle_sysrq_nolock`` instead.
|
||||
|
||||
When I hit a SysRq key combination only the header appears on the console?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Sysrq output is subject to the same console loglevel control as all
|
||||
other console output. This means that if the kernel was booted 'quiet'
|
||||
as is common on distro kernels the output may not appear on the actual
|
||||
console, even though it will appear in the dmesg buffer, and be accessible
|
||||
via the dmesg command and to the consumers of ``/proc/kmsg``. As a specific
|
||||
exception the header line from the sysrq command is passed to all console
|
||||
consumers as if the current loglevel was maximum. If only the header
|
||||
is emitted it is almost certain that the kernel loglevel is too low.
|
||||
Should you require the output on the console channel then you will need
|
||||
to temporarily up the console loglevel using :kbd:`alt-sysrq-8` or::
|
||||
|
||||
echo 8 > /proc/sysrq-trigger
|
||||
|
||||
Remember to return the loglevel to normal after triggering the sysrq
|
||||
command you are interested in.
|
||||
|
||||
I have more questions, who can I ask?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Just ask them on the linux-kernel mailing list:
|
||||
linux-kernel@vger.kernel.org
|
||||
|
||||
Credits
|
||||
~~~~~~~
|
||||
|
||||
Written by Mydraal <vulpyne@vulpyne.net>
|
||||
Updated by Adam Sulmicki <adam@cfar.umd.edu>
|
||||
Updated by Jeremy M. Dolan <jmd@turbogeek.org> 2001/01/28 10:15:59
|
||||
Added to by Crutcher Dunnavant <crutcher+kernel@datastacks.com>
|
59
Documentation/admin-guide/tainted-kernels.rst
Normal file
59
Documentation/admin-guide/tainted-kernels.rst
Normal file
@ -0,0 +1,59 @@
|
||||
Tainted kernels
|
||||
---------------
|
||||
|
||||
Some oops reports contain the string **'Tainted: '** after the program
|
||||
counter. This indicates that the kernel has been tainted by some
|
||||
mechanism. The string is followed by a series of position-sensitive
|
||||
characters, each representing a particular tainted value.
|
||||
|
||||
1) 'G' if all modules loaded have a GPL or compatible license, 'P' if
|
||||
any proprietary module has been loaded. Modules without a
|
||||
MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
|
||||
insmod as GPL compatible are assumed to be proprietary.
|
||||
|
||||
2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
|
||||
modules were loaded normally.
|
||||
|
||||
3) ``S`` if the oops occurred on an SMP kernel running on hardware that
|
||||
hasn't been certified as safe to run multiprocessor.
|
||||
Currently this occurs only on various Athlons that are not
|
||||
SMP capable.
|
||||
|
||||
4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
|
||||
modules were unloaded normally.
|
||||
|
||||
5) ``M`` if any processor has reported a Machine Check Exception,
|
||||
``' '`` if no Machine Check Exceptions have occurred.
|
||||
|
||||
6) ``B`` if a page-release function has found a bad page reference or
|
||||
some unexpected page flags.
|
||||
|
||||
7) ``U`` if a user or user application specifically requested that the
|
||||
Tainted flag be set, ``' '`` otherwise.
|
||||
|
||||
8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG.
|
||||
|
||||
9) ``A`` if the ACPI table has been overridden.
|
||||
|
||||
10) ``W`` if a warning has previously been issued by the kernel.
|
||||
(Though some warnings may set more specific taint flags.)
|
||||
|
||||
11) ``C`` if a staging driver has been loaded.
|
||||
|
||||
12) ``I`` if the kernel is working around a severe bug in the platform
|
||||
firmware (BIOS or similar).
|
||||
|
||||
13) ``O`` if an externally-built ("out-of-tree") module has been loaded.
|
||||
|
||||
14) ``E`` if an unsigned module has been loaded in a kernel supporting
|
||||
module signature.
|
||||
|
||||
15) ``L`` if a soft lockup has previously occurred on the system.
|
||||
|
||||
16) ``K`` if the kernel has been live patched.
|
||||
|
||||
The primary reason for the **'Tainted: '** string is to tell kernel
|
||||
debuggers if this is a clean kernel or if anything unusual has
|
||||
occurred. Tainting is permanent: even if an offending module is
|
||||
unloaded, the tainted value remains to indicate that the kernel is not
|
||||
trustworthy.
|
@ -1,12 +1,16 @@
|
||||
Unicode support
|
||||
===============
|
||||
|
||||
Last update: 2005-01-17, version 1.4
|
||||
|
||||
This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
|
||||
of the Linux Assigned Names And Numbers Authority (LANANA) project.
|
||||
The current version can be found at:
|
||||
|
||||
http://www.lanana.org/docs/unicode/unicode.txt
|
||||
http://www.lanana.org/docs/unicode/admin-guide/unicode.rst
|
||||
|
||||
------------------------
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The Linux kernel code has been rewritten to use Unicode to map
|
||||
characters to fonts. By downloading a single Unicode-to-font table,
|
||||
@ -16,12 +20,14 @@ the font as indicated.
|
||||
This changes the semantics of the eight-bit character tables subtly.
|
||||
The four character tables are now:
|
||||
|
||||
=============== =============================== ================
|
||||
Map symbol Map name Escape code (G0)
|
||||
|
||||
=============== =============================== ================
|
||||
LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B
|
||||
GRAF_MAP DEC VT100 pseudographics ESC ( 0
|
||||
IBMPC_MAP IBM code page 437 ESC ( U
|
||||
USER_MAP User defined ESC ( K
|
||||
=============== =============================== ================
|
||||
|
||||
In particular, ESC ( U is no longer "straight to font", since the font
|
||||
might be completely different than the IBM character set. This
|
||||
@ -55,10 +61,12 @@ In addition, the following characters not present in Unicode 1.1.4
|
||||
have been defined; these are used by the DEC VT graphics map. [v1.2]
|
||||
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.
|
||||
|
||||
====== ======================================
|
||||
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
|
||||
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
|
||||
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
|
||||
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
|
||||
====== ======================================
|
||||
|
||||
The DEC VT220 uses a 6x10 character matrix, and these characters form
|
||||
a smooth progression in the DEC VT graphics character set. I have
|
||||
@ -74,10 +82,12 @@ keyboard symbols that are unlikely to ever be added to Unicode proper
|
||||
since they are horribly vendor-specific. This, of course, is an
|
||||
excellent example of horrible design.
|
||||
|
||||
====== ======================================
|
||||
U+F810 KEYBOARD SYMBOL FLYING FLAG
|
||||
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
|
||||
U+F812 KEYBOARD SYMBOL OPEN APPLE
|
||||
U+F813 KEYBOARD SYMBOL SOLID APPLE
|
||||
====== ======================================
|
||||
|
||||
Klingon language support
|
||||
------------------------
|
||||
@ -99,8 +109,10 @@ of the dingbats/symbols/forms type and this is a language, I have
|
||||
located it at the end, on a 16-cell boundary in keeping with standard
|
||||
Unicode practice.
|
||||
|
||||
NOTE: This range is now officially managed by the ConScript Unicode
|
||||
Registry. The normative reference is at:
|
||||
.. note::
|
||||
|
||||
This range is now officially managed by the ConScript Unicode
|
||||
Registry. The normative reference is at:
|
||||
|
||||
http://www.evertype.com/standards/csur/klingon.html
|
||||
|
||||
@ -112,6 +124,7 @@ However, since the set of symbols appear to be consistent throughout,
|
||||
with only the actual shapes being different, in keeping with standard
|
||||
Unicode practice these differences are considered font variants.
|
||||
|
||||
====== =======================================================
|
||||
U+F8D0 KLINGON LETTER A
|
||||
U+F8D1 KLINGON LETTER B
|
||||
U+F8D2 KLINGON LETTER CH
|
||||
@ -155,6 +168,7 @@ U+F8F9 KLINGON DIGIT NINE
|
||||
U+F8FD KLINGON COMMA
|
||||
U+F8FE KLINGON FULL STOP
|
||||
U+F8FF KLINGON SYMBOL FOR EMPIRE
|
||||
====== =======================================================
|
||||
|
||||
Other Fictional and Artificial Scripts
|
||||
--------------------------------------
|
66
Documentation/admin-guide/vga-softcursor.rst
Normal file
66
Documentation/admin-guide/vga-softcursor.rst
Normal file
@ -0,0 +1,66 @@
|
||||
Software cursor for VGA
|
||||
=======================
|
||||
|
||||
by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
|
||||
and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
||||
|
||||
Linux now has some ability to manipulate cursor appearance. Normally, you
|
||||
can set the size of hardware cursor (and also work around some ugly bugs in
|
||||
those miserable Trident cards [#f1]_. You can now play a few new tricks:
|
||||
you can make your cursor look
|
||||
|
||||
like a non-blinking red block, make it inverse background of the character it's
|
||||
over or to highlight that character and still choose whether the original
|
||||
hardware cursor should remain visible or not. There may be other things I have
|
||||
never thought of.
|
||||
|
||||
The cursor appearance is controlled by a ``<ESC>[?1;2;3c`` escape sequence
|
||||
where 1, 2 and 3 are parameters described below. If you omit any of them,
|
||||
they will default to zeroes.
|
||||
|
||||
first Parameter
|
||||
specifies cursor size::
|
||||
|
||||
0=default
|
||||
1=invisible
|
||||
2=underline,
|
||||
...
|
||||
8=full block
|
||||
+ 16 if you want the software cursor to be applied
|
||||
+ 32 if you want to always change the background color
|
||||
+ 64 if you dislike having the background the same as the
|
||||
foreground.
|
||||
|
||||
Highlights are ignored for the last two flags.
|
||||
|
||||
second parameter
|
||||
selects character attribute bits you want to change
|
||||
(by simply XORing them with the value of this parameter). On standard
|
||||
VGA, the high four bits specify background and the low four the
|
||||
foreground. In both groups, low three bits set color (as in normal
|
||||
color codes used by the console) and the most significant one turns
|
||||
on highlight (or sometimes blinking -- it depends on the configuration
|
||||
of your VGA).
|
||||
|
||||
third parameter
|
||||
consists of character attribute bits you want to set.
|
||||
|
||||
Bit setting takes place before bit toggling, so you can simply clear a
|
||||
bit by including it in both the set mask and the toggle mask.
|
||||
|
||||
.. [#f1] see ``#define TRIDENT_GLITCH`` in ``drivers/video/vgacon.c``.
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
To get normal blinking underline, use::
|
||||
|
||||
echo -e '\033[?2c'
|
||||
|
||||
To get blinking block, use::
|
||||
|
||||
echo -e '\033[?6c'
|
||||
|
||||
To get red non-blinking block, use::
|
||||
|
||||
echo -e '\033[?17;0;64c'
|
@ -51,7 +51,7 @@ As an alternative, the boot loader can pass the relevant 'console='
|
||||
option to the kernel via the tagged lists specifying the port, and
|
||||
serial format options as described in
|
||||
|
||||
Documentation/kernel-parameters.txt.
|
||||
Documentation/admin-guide/kernel-parameters.rst.
|
||||
|
||||
|
||||
3. Detect the machine type
|
||||
|
@ -1,574 +0,0 @@
|
||||
========================================
|
||||
GENERIC ASSOCIATIVE ARRAY IMPLEMENTATION
|
||||
========================================
|
||||
|
||||
Contents:
|
||||
|
||||
- Overview.
|
||||
|
||||
- The public API.
|
||||
- Edit script.
|
||||
- Operations table.
|
||||
- Manipulation functions.
|
||||
- Access functions.
|
||||
- Index key form.
|
||||
|
||||
- Internal workings.
|
||||
- Basic internal tree layout.
|
||||
- Shortcuts.
|
||||
- Splitting and collapsing nodes.
|
||||
- Non-recursive iteration.
|
||||
- Simultaneous alteration and iteration.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
========
|
||||
|
||||
This associative array implementation is an object container with the following
|
||||
properties:
|
||||
|
||||
(1) Objects are opaque pointers. The implementation does not care where they
|
||||
point (if anywhere) or what they point to (if anything).
|
||||
|
||||
[!] NOTE: Pointers to objects _must_ be zero in the least significant bit.
|
||||
|
||||
(2) Objects do not need to contain linkage blocks for use by the array. This
|
||||
permits an object to be located in multiple arrays simultaneously.
|
||||
Rather, the array is made up of metadata blocks that point to objects.
|
||||
|
||||
(3) Objects require index keys to locate them within the array.
|
||||
|
||||
(4) Index keys must be unique. Inserting an object with the same key as one
|
||||
already in the array will replace the old object.
|
||||
|
||||
(5) Index keys can be of any length and can be of different lengths.
|
||||
|
||||
(6) Index keys should encode the length early on, before any variation due to
|
||||
length is seen.
|
||||
|
||||
(7) Index keys can include a hash to scatter objects throughout the array.
|
||||
|
||||
(8) The array can iterated over. The objects will not necessarily come out in
|
||||
key order.
|
||||
|
||||
(9) The array can be iterated over whilst it is being modified, provided the
|
||||
RCU readlock is being held by the iterator. Note, however, under these
|
||||
circumstances, some objects may be seen more than once. If this is a
|
||||
problem, the iterator should lock against modification. Objects will not
|
||||
be missed, however, unless deleted.
|
||||
|
||||
(10) Objects in the array can be looked up by means of their index key.
|
||||
|
||||
(11) Objects can be looked up whilst the array is being modified, provided the
|
||||
RCU readlock is being held by the thread doing the look up.
|
||||
|
||||
The implementation uses a tree of 16-pointer nodes internally that are indexed
|
||||
on each level by nibbles from the index key in the same manner as in a radix
|
||||
tree. To improve memory efficiency, shortcuts can be emplaced to skip over
|
||||
what would otherwise be a series of single-occupancy nodes. Further, nodes
|
||||
pack leaf object pointers into spare space in the node rather than making an
|
||||
extra branch until as such time an object needs to be added to a full node.
|
||||
|
||||
|
||||
==============
|
||||
THE PUBLIC API
|
||||
==============
|
||||
|
||||
The public API can be found in <linux/assoc_array.h>. The associative array is
|
||||
rooted on the following structure:
|
||||
|
||||
struct assoc_array {
|
||||
...
|
||||
};
|
||||
|
||||
The code is selected by enabling CONFIG_ASSOCIATIVE_ARRAY.
|
||||
|
||||
|
||||
EDIT SCRIPT
|
||||
-----------
|
||||
|
||||
The insertion and deletion functions produce an 'edit script' that can later be
|
||||
applied to effect the changes without risking ENOMEM. This retains the
|
||||
preallocated metadata blocks that will be installed in the internal tree and
|
||||
keeps track of the metadata blocks that will be removed from the tree when the
|
||||
script is applied.
|
||||
|
||||
This is also used to keep track of dead blocks and dead objects after the
|
||||
script has been applied so that they can be freed later. The freeing is done
|
||||
after an RCU grace period has passed - thus allowing access functions to
|
||||
proceed under the RCU read lock.
|
||||
|
||||
The script appears as outside of the API as a pointer of the type:
|
||||
|
||||
struct assoc_array_edit;
|
||||
|
||||
There are two functions for dealing with the script:
|
||||
|
||||
(1) Apply an edit script.
|
||||
|
||||
void assoc_array_apply_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This will perform the edit functions, interpolating various write barriers
|
||||
to permit accesses under the RCU read lock to continue. The edit script
|
||||
will then be passed to call_rcu() to free it and any dead stuff it points
|
||||
to.
|
||||
|
||||
(2) Cancel an edit script.
|
||||
|
||||
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This frees the edit script and all preallocated memory immediately. If
|
||||
this was for insertion, the new object is _not_ released by this function,
|
||||
but must rather be released by the caller.
|
||||
|
||||
These functions are guaranteed not to fail.
|
||||
|
||||
|
||||
OPERATIONS TABLE
|
||||
----------------
|
||||
|
||||
Various functions take a table of operations:
|
||||
|
||||
struct assoc_array_ops {
|
||||
...
|
||||
};
|
||||
|
||||
This points to a number of methods, all of which need to be provided:
|
||||
|
||||
(1) Get a chunk of index key from caller data:
|
||||
|
||||
unsigned long (*get_key_chunk)(const void *index_key, int level);
|
||||
|
||||
This should return a chunk of caller-supplied index key starting at the
|
||||
*bit* position given by the level argument. The level argument will be a
|
||||
multiple of ASSOC_ARRAY_KEY_CHUNK_SIZE and the function should return
|
||||
ASSOC_ARRAY_KEY_CHUNK_SIZE bits. No error is possible.
|
||||
|
||||
|
||||
(2) Get a chunk of an object's index key.
|
||||
|
||||
unsigned long (*get_object_key_chunk)(const void *object, int level);
|
||||
|
||||
As the previous function, but gets its data from an object in the array
|
||||
rather than from a caller-supplied index key.
|
||||
|
||||
|
||||
(3) See if this is the object we're looking for.
|
||||
|
||||
bool (*compare_object)(const void *object, const void *index_key);
|
||||
|
||||
Compare the object against an index key and return true if it matches and
|
||||
false if it doesn't.
|
||||
|
||||
|
||||
(4) Diff the index keys of two objects.
|
||||
|
||||
int (*diff_objects)(const void *object, const void *index_key);
|
||||
|
||||
Return the bit position at which the index key of the specified object
|
||||
differs from the given index key or -1 if they are the same.
|
||||
|
||||
|
||||
(5) Free an object.
|
||||
|
||||
void (*free_object)(void *object);
|
||||
|
||||
Free the specified object. Note that this may be called an RCU grace
|
||||
period after assoc_array_apply_edit() was called, so synchronize_rcu() may
|
||||
be necessary on module unloading.
|
||||
|
||||
|
||||
MANIPULATION FUNCTIONS
|
||||
----------------------
|
||||
|
||||
There are a number of functions for manipulating an associative array:
|
||||
|
||||
(1) Initialise an associative array.
|
||||
|
||||
void assoc_array_init(struct assoc_array *array);
|
||||
|
||||
This initialises the base structure for an associative array. It can't
|
||||
fail.
|
||||
|
||||
|
||||
(2) Insert/replace an object in an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_insert(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key,
|
||||
void *object);
|
||||
|
||||
This inserts the given object into the array. Note that the least
|
||||
significant bit of the pointer must be zero as it's used to type-mark
|
||||
pointers internally.
|
||||
|
||||
If an object already exists for that key then it will be replaced with the
|
||||
new object and the old one will be freed automatically.
|
||||
|
||||
The index_key argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(3) Delete an object from an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_delete(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This deletes an object that matches the specified data from the array.
|
||||
|
||||
The index_key argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error. NULL will be returned if the specified object is
|
||||
not found within the array.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(4) Delete all objects from an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_clear(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This deletes all the objects from an associative array and leaves it
|
||||
completely empty.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(5) Destroy an associative array, deleting all objects.
|
||||
|
||||
void assoc_array_destroy(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This destroys the contents of the associative array and leaves it
|
||||
completely empty. It is not permitted for another thread to be traversing
|
||||
the array under the RCU read lock at the same time as this function is
|
||||
destroying it as no RCU deferral is performed on memory release -
|
||||
something that would require memory to be allocated.
|
||||
|
||||
The caller should lock exclusively against other modifiers and accessors
|
||||
of the array.
|
||||
|
||||
|
||||
(6) Garbage collect an associative array.
|
||||
|
||||
int assoc_array_gc(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
bool (*iterator)(void *object, void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This iterates over the objects in an associative array and passes each one
|
||||
to iterator(). If iterator() returns true, the object is kept. If it
|
||||
returns false, the object will be freed. If the iterator() function
|
||||
returns true, it must perform any appropriate refcount incrementing on the
|
||||
object before returning.
|
||||
|
||||
The internal tree will be packed down if possible as part of the iteration
|
||||
to reduce the number of nodes in it.
|
||||
|
||||
The iterator_data is passed directly to iterator() and is otherwise
|
||||
ignored by the function.
|
||||
|
||||
The function will return 0 if successful and -ENOMEM if there wasn't
|
||||
enough memory.
|
||||
|
||||
It is possible for other threads to iterate over or search the array under
|
||||
the RCU read lock whilst this function is in progress. The caller should
|
||||
lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
ACCESS FUNCTIONS
|
||||
----------------
|
||||
|
||||
There are two functions for accessing an associative array:
|
||||
|
||||
(1) Iterate over all the objects in an associative array.
|
||||
|
||||
int assoc_array_iterate(const struct assoc_array *array,
|
||||
int (*iterator)(const void *object,
|
||||
void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This passes each object in the array to the iterator callback function.
|
||||
iterator_data is private data for that function.
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held. Under such circumstances,
|
||||
it is possible for the iteration function to see some objects twice. If
|
||||
this is a problem, then modification should be locked against. The
|
||||
iteration algorithm should not, however, miss any objects.
|
||||
|
||||
The function will return 0 if no objects were in the array or else it will
|
||||
return the result of the last iterator function called. Iteration stops
|
||||
immediately if any call to the iteration function results in a non-zero
|
||||
return.
|
||||
|
||||
|
||||
(2) Find an object in an associative array.
|
||||
|
||||
void *assoc_array_find(const struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This walks through the array's internal tree directly to the object
|
||||
specified by the index key..
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held.
|
||||
|
||||
The function will return the object if found (and set *_type to the object
|
||||
type) or will return NULL if the object was not found.
|
||||
|
||||
|
||||
INDEX KEY FORM
|
||||
--------------
|
||||
|
||||
The index key can be of any form, but since the algorithms aren't told how long
|
||||
the key is, it is strongly recommended that the index key includes its length
|
||||
very early on before any variation due to the length would have an effect on
|
||||
comparisons.
|
||||
|
||||
This will cause leaves with different length keys to scatter away from each
|
||||
other - and those with the same length keys to cluster together.
|
||||
|
||||
It is also recommended that the index key begin with a hash of the rest of the
|
||||
key to maximise scattering throughout keyspace.
|
||||
|
||||
The better the scattering, the wider and lower the internal tree will be.
|
||||
|
||||
Poor scattering isn't too much of a problem as there are shortcuts and nodes
|
||||
can contain mixtures of leaves and metadata pointers.
|
||||
|
||||
The index key is read in chunks of machine word. Each chunk is subdivided into
|
||||
one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
|
||||
on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is
|
||||
unlikely that more than one word of any particular index key will have to be
|
||||
used.
|
||||
|
||||
|
||||
=================
|
||||
INTERNAL WORKINGS
|
||||
=================
|
||||
|
||||
The associative array data structure has an internal tree. This tree is
|
||||
constructed of two types of metadata blocks: nodes and shortcuts.
|
||||
|
||||
A node is an array of slots. Each slot can contain one of four things:
|
||||
|
||||
(*) A NULL pointer, indicating that the slot is empty.
|
||||
|
||||
(*) A pointer to an object (a leaf).
|
||||
|
||||
(*) A pointer to a node at the next level.
|
||||
|
||||
(*) A pointer to a shortcut.
|
||||
|
||||
|
||||
BASIC INTERNAL TREE LAYOUT
|
||||
--------------------------
|
||||
|
||||
Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index
|
||||
key space is strictly subdivided by the nodes in the tree and nodes occur on
|
||||
fixed levels. For example:
|
||||
|
||||
Level: 0 1 2 3
|
||||
=============== =============== =============== ===============
|
||||
NODE D
|
||||
NODE B NODE C +------>+---+
|
||||
+------>+---+ +------>+---+ | | 0 |
|
||||
NODE A | | 0 | | | 0 | | +---+
|
||||
+---+ | +---+ | +---+ | : :
|
||||
| 0 | | : : | : : | +---+
|
||||
+---+ | +---+ | +---+ | | f |
|
||||
| 1 |---+ | 3 |---+ | 7 |---+ +---+
|
||||
+---+ +---+ +---+
|
||||
: : : : | 8 |---+
|
||||
+---+ +---+ +---+ | NODE E
|
||||
| e |---+ | f | : : +------>+---+
|
||||
+---+ | +---+ +---+ | 0 |
|
||||
| f | | | f | +---+
|
||||
+---+ | +---+ : :
|
||||
| NODE F +---+
|
||||
+------>+---+ | f |
|
||||
| 0 | NODE G +---+
|
||||
+---+ +------>+---+
|
||||
: : | | 0 |
|
||||
+---+ | +---+
|
||||
| 6 |---+ : :
|
||||
+---+ +---+
|
||||
: : | f |
|
||||
+---+ +---+
|
||||
| f |
|
||||
+---+
|
||||
|
||||
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
|
||||
Assuming no other meta data nodes in the tree, the key space is divided thusly:
|
||||
|
||||
KEY PREFIX NODE
|
||||
========== ====
|
||||
137* D
|
||||
138* E
|
||||
13[0-69-f]* C
|
||||
1[0-24-f]* B
|
||||
e6* G
|
||||
e[0-57-f]* F
|
||||
[02-df]* A
|
||||
|
||||
So, for instance, keys with the following example index keys will be found in
|
||||
the appropriate nodes:
|
||||
|
||||
INDEX KEY PREFIX NODE
|
||||
=============== ======= ====
|
||||
13694892892489 13 C
|
||||
13795289025897 137 D
|
||||
13889dde88793 138 E
|
||||
138bbb89003093 138 E
|
||||
1394879524789 12 C
|
||||
1458952489 1 B
|
||||
9431809de993ba - A
|
||||
b4542910809cd - A
|
||||
e5284310def98 e F
|
||||
e68428974237 e6 G
|
||||
e7fffcbd443 e F
|
||||
f3842239082 - A
|
||||
|
||||
To save memory, if a node can hold all the leaves in its portion of keyspace,
|
||||
then the node will have all those leaves in it and will not have any metadata
|
||||
pointers - even if some of those leaves would like to be in the same slot.
|
||||
|
||||
A node can contain a heterogeneous mix of leaves and metadata pointers.
|
||||
Metadata pointers must be in the slots that match their subdivisions of key
|
||||
space. The leaves can be in any slot not occupied by a metadata pointer. It
|
||||
is guaranteed that none of the leaves in a node will match a slot occupied by a
|
||||
metadata pointer. If the metadata pointer is there, any leaf whose key matches
|
||||
the metadata key prefix must be in the subtree that the metadata pointer points
|
||||
to.
|
||||
|
||||
In the above example list of index keys, node A will contain:
|
||||
|
||||
SLOT CONTENT INDEX KEY (PREFIX)
|
||||
==== =============== ==================
|
||||
1 PTR TO NODE B 1*
|
||||
any LEAF 9431809de993ba
|
||||
any LEAF b4542910809cd
|
||||
e PTR TO NODE F e*
|
||||
any LEAF f3842239082
|
||||
|
||||
and node B:
|
||||
|
||||
3 PTR TO NODE C 13*
|
||||
any LEAF 1458952489
|
||||
|
||||
|
||||
SHORTCUTS
|
||||
---------
|
||||
|
||||
Shortcuts are metadata records that jump over a piece of keyspace. A shortcut
|
||||
is a replacement for a series of single-occupancy nodes ascending through the
|
||||
levels. Shortcuts exist to save memory and to speed up traversal.
|
||||
|
||||
It is possible for the root of the tree to be a shortcut - say, for example,
|
||||
the tree contains at least 17 nodes all with key prefix '1111'. The insertion
|
||||
algorithm will insert a shortcut to skip over the '1111' keyspace in a single
|
||||
bound and get to the fourth level where these actually become different.
|
||||
|
||||
|
||||
SPLITTING AND COLLAPSING NODES
|
||||
------------------------------
|
||||
|
||||
Each node has a maximum capacity of 16 leaves and metadata pointers. If the
|
||||
insertion algorithm finds that it is trying to insert a 17th object into a
|
||||
node, that node will be split such that at least two leaves that have a common
|
||||
key segment at that level end up in a separate node rooted on that slot for
|
||||
that common key segment.
|
||||
|
||||
If the leaves in a full node and the leaf that is being inserted are
|
||||
sufficiently similar, then a shortcut will be inserted into the tree.
|
||||
|
||||
When the number of objects in the subtree rooted at a node falls to 16 or
|
||||
fewer, then the subtree will be collapsed down to a single node - and this will
|
||||
ripple towards the root if possible.
|
||||
|
||||
|
||||
NON-RECURSIVE ITERATION
|
||||
-----------------------
|
||||
|
||||
Each node and shortcut contains a back pointer to its parent and the number of
|
||||
slot in that parent that points to it. None-recursive iteration uses these to
|
||||
proceed rootwards through the tree, going to the parent node, slot N + 1 to
|
||||
make sure progress is made without the need for a stack.
|
||||
|
||||
The backpointers, however, make simultaneous alteration and iteration tricky.
|
||||
|
||||
|
||||
SIMULTANEOUS ALTERATION AND ITERATION
|
||||
-------------------------------------
|
||||
|
||||
There are a number of cases to consider:
|
||||
|
||||
(1) Simple insert/replace. This involves simply replacing a NULL or old
|
||||
matching leaf pointer with the pointer to the new leaf after a barrier.
|
||||
The metadata blocks don't change otherwise. An old leaf won't be freed
|
||||
until after the RCU grace period.
|
||||
|
||||
(2) Simple delete. This involves just clearing an old matching leaf. The
|
||||
metadata blocks don't change otherwise. The old leaf won't be freed until
|
||||
after the RCU grace period.
|
||||
|
||||
(3) Insertion replacing part of a subtree that we haven't yet entered. This
|
||||
may involve replacement of part of that subtree - but that won't affect
|
||||
the iteration as we won't have reached the pointer to it yet and the
|
||||
ancestry blocks are not replaced (the layout of those does not change).
|
||||
|
||||
(4) Insertion replacing nodes that we're actively processing. This isn't a
|
||||
problem as we've passed the anchoring pointer and won't switch onto the
|
||||
new layout until we follow the back pointers - at which point we've
|
||||
already examined the leaves in the replaced node (we iterate over all the
|
||||
leaves in a node before following any of its metadata pointers).
|
||||
|
||||
We might, however, re-see some leaves that have been split out into a new
|
||||
branch that's in a slot further along than we were at.
|
||||
|
||||
(5) Insertion replacing nodes that we're processing a dependent branch of.
|
||||
This won't affect us until we follow the back pointers. Similar to (4).
|
||||
|
||||
(6) Deletion collapsing a branch under us. This doesn't affect us because the
|
||||
back pointers will get us back to the parent of the new node before we
|
||||
could see the new node. The entire collapsed subtree is thrown away
|
||||
unchanged - and will still be rooted on the same slot, so we shouldn't
|
||||
process it a second time as we'll go back to slot + 1.
|
||||
|
||||
Note:
|
||||
|
||||
(*) Under some circumstances, we need to simultaneously change the parent
|
||||
pointer and the parent slot pointer on a node (say, for example, we
|
||||
inserted another node before it and moved it up a level). We cannot do
|
||||
this without locking against a read - so we have to replace that node too.
|
||||
|
||||
However, when we're changing a shortcut into a node this isn't a problem
|
||||
as shortcuts only have one slot and so the parent slot number isn't used
|
||||
when traversing backwards over one. This means that it's okay to change
|
||||
the slot number first - provided suitable barriers are used to make sure
|
||||
the parent slot number is read after the back pointer.
|
||||
|
||||
Obsolete blocks and leaves are freed up after an RCU grace period has passed,
|
||||
so as long as anyone doing walking or iteration holds the RCU read lock, the
|
||||
old superstructure should not go away on them.
|
@ -1,45 +0,0 @@
|
||||
March 2008
|
||||
Jan-Simon Moeller, dl9pf@gmx.de
|
||||
|
||||
|
||||
How to deal with bad memory e.g. reported by memtest86+ ?
|
||||
#########################################################
|
||||
|
||||
There are three possibilities I know of:
|
||||
|
||||
1) Reinsert/swap the memory modules
|
||||
|
||||
2) Buy new modules (best!) or try to exchange the memory
|
||||
if you have spare-parts
|
||||
|
||||
3) Use BadRAM or memmap
|
||||
|
||||
This Howto is about number 3) .
|
||||
|
||||
|
||||
BadRAM
|
||||
######
|
||||
BadRAM is the actively developed and available as kernel-patch
|
||||
here: http://rick.vanrein.org/linux/badram/
|
||||
|
||||
For more details see the BadRAM documentation.
|
||||
|
||||
memmap
|
||||
######
|
||||
|
||||
memmap is already in the kernel and usable as kernel-parameter at
|
||||
boot-time. Its syntax is slightly strange and you may need to
|
||||
calculate the values by yourself!
|
||||
|
||||
Syntax to exclude a memory area (see kernel-parameters.txt for details):
|
||||
memmap=<size>$<address>
|
||||
|
||||
Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
|
||||
some others. All had 0x1869xxxx in common, so I chose a pattern of
|
||||
0x18690000,0xffff0000.
|
||||
|
||||
With the numbers of the example above:
|
||||
memmap=64K$0x18690000
|
||||
or
|
||||
memmap=0x10000$0x18690000
|
||||
|
@ -1,56 +0,0 @@
|
||||
These instructions are deliberately very basic. If you want something clever,
|
||||
go read the real docs ;-) Please don't add more stuff, but feel free to
|
||||
correct my mistakes ;-) (mbligh@aracnet.com)
|
||||
Thanks to John Levon, Dave Hansen, et al. for help writing this.
|
||||
|
||||
<test> is the thing you're trying to measure.
|
||||
Make sure you have the correct System.map / vmlinux referenced!
|
||||
|
||||
It is probably easiest to use "make install" for linux and hack
|
||||
/sbin/installkernel to copy vmlinux to /boot, in addition to vmlinuz,
|
||||
config, System.map, which are usually installed by default.
|
||||
|
||||
Readprofile
|
||||
-----------
|
||||
A recent readprofile command is needed for 2.6, such as found in util-linux
|
||||
2.12a, which can be downloaded from:
|
||||
|
||||
http://www.kernel.org/pub/linux/utils/util-linux/
|
||||
|
||||
Most distributions will ship it already.
|
||||
|
||||
Add "profile=2" to the kernel command line.
|
||||
|
||||
clear readprofile -r
|
||||
<test>
|
||||
dump output readprofile -m /boot/System.map > captured_profile
|
||||
|
||||
Oprofile
|
||||
--------
|
||||
|
||||
Get the source (see Changes for required version) from
|
||||
http://oprofile.sourceforge.net/ and add "idle=poll" to the kernel command
|
||||
line.
|
||||
|
||||
Configure with CONFIG_PROFILING=y and CONFIG_OPROFILE=y & reboot on new kernel
|
||||
|
||||
./configure --with-kernel-support
|
||||
make install
|
||||
|
||||
For superior results, be sure to enable the local APIC. If opreport sees
|
||||
a 0Hz CPU, APIC was not on. Be aware that idle=poll may mean a performance
|
||||
penalty.
|
||||
|
||||
One time setup:
|
||||
opcontrol --setup --vmlinux=/boot/vmlinux
|
||||
|
||||
clear opcontrol --reset
|
||||
start opcontrol --start
|
||||
<test>
|
||||
stop opcontrol --stop
|
||||
dump output opreport > output_file
|
||||
|
||||
To only report on the kernel, run opreport -l /boot/vmlinux > output_file
|
||||
|
||||
A reset is needed to clear old statistics, which survive a reboot.
|
||||
|
@ -1,131 +0,0 @@
|
||||
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
|
||||
=====================================================================
|
||||
|
||||
This Kernel feature allows you to invoke almost (for restrictions see below)
|
||||
every program by simply typing its name in the shell.
|
||||
This includes for example compiled Java(TM), Python or Emacs programs.
|
||||
|
||||
To achieve this you must tell binfmt_misc which interpreter has to be invoked
|
||||
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
|
||||
at the beginning of the file with a magic byte sequence (masking out specified
|
||||
bits) you have supplied. Binfmt_misc can also recognise a filename extension
|
||||
aka '.com' or '.exe'.
|
||||
|
||||
First you must mount binfmt_misc:
|
||||
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
|
||||
|
||||
To actually register a new binary type, you have to set up a string looking like
|
||||
:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':'
|
||||
upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register.
|
||||
|
||||
Here is what the fields mean:
|
||||
- 'name' is an identifier string. A new /proc file will be created with this
|
||||
name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious
|
||||
reasons.
|
||||
- 'type' is the type of recognition. Give 'M' for magic and 'E' for extension.
|
||||
- 'offset' is the offset of the magic/mask in the file, counted in bytes. This
|
||||
defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored
|
||||
when using filename extension matching.
|
||||
- 'magic' is the byte sequence binfmt_misc is matching for. The magic string
|
||||
may contain hex-encoded characters like \x0a or \xA4. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. In a shell environment
|
||||
you might have to write \\x0a to prevent the shell from eating your \.
|
||||
If you chose filename extension matching, this is the extension to be
|
||||
recognised (without the '.', the \x0a specials are not allowed). Extension
|
||||
matching is case sensitive, and slashes '/' are not allowed!
|
||||
- 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some
|
||||
bits from matching by supplying a string like magic and as long as magic.
|
||||
The mask is anded with the byte sequence of the file. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. Ignored when using
|
||||
filename extension matching.
|
||||
- 'interpreter' is the program that should be invoked with the binary as first
|
||||
argument (specify the full path)
|
||||
- 'flags' is an optional field that controls several aspects of the invocation
|
||||
of the interpreter. It is a string of capital letters, each controls a
|
||||
certain aspect. The following flags are supported -
|
||||
'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite
|
||||
the original argv[0] with the full path to the binary. When this
|
||||
flag is included, binfmt_misc will add an argument to the argument
|
||||
vector for this purpose, thus preserving the original argv[0].
|
||||
e.g. If your interp is set to /bin/foo and you run `blah` (which is
|
||||
in /usr/local/bin), then the kernel will execute /bin/foo with
|
||||
argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The
|
||||
interp has to be aware of this so it can execute /usr/local/bin/blah
|
||||
with argv[] set to ["blah"].
|
||||
'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path
|
||||
of the binary to the interpreter as an argument. When this flag is
|
||||
included, binfmt_misc will open the file for reading and pass its
|
||||
descriptor as an argument, instead of the full path, thus allowing
|
||||
the interpreter to execute non-readable binaries. This feature
|
||||
should be used with care - the interpreter has to be trusted not to
|
||||
emit the contents of the non-readable binary.
|
||||
'C' - credentials. Currently, the behavior of binfmt_misc is to calculate
|
||||
the credentials and security token of the new process according to
|
||||
the interpreter. When this flag is included, these attributes are
|
||||
calculated according to the binary. It also implies the 'O' flag.
|
||||
This feature should be used with care as the interpreter
|
||||
will run with root permissions when a setuid binary owned by root
|
||||
is run with binfmt_misc.
|
||||
'F' - fix binary. The usual behaviour of binfmt_misc is to spawn the
|
||||
binary lazily when the misc format file is invoked. However,
|
||||
this doesn't work very well in the face of mount namespaces and
|
||||
changeroots, so the F mode opens the binary as soon as the
|
||||
emulation is installed and uses the opened image to spawn the
|
||||
emulator, meaning it is always available once installed,
|
||||
regardless of how the environment changes.
|
||||
|
||||
|
||||
There are some restrictions:
|
||||
- the whole register string may not exceed 1920 characters
|
||||
- the magic must reside in the first 128 bytes of the file, i.e.
|
||||
offset+size(magic) has to be less than 128
|
||||
- the interpreter string may not exceed 127 characters
|
||||
|
||||
To use binfmt_misc you have to mount it first. You can mount it with
|
||||
"mount -t binfmt_misc none /proc/sys/fs/binfmt_misc" command, or you can add
|
||||
a line "none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0" to your
|
||||
/etc/fstab so it auto mounts on boot.
|
||||
|
||||
You may want to add the binary formats in one of your /etc/rc scripts during
|
||||
boot-up. Read the manual of your init program to figure out how to do this
|
||||
right.
|
||||
|
||||
Think about the order of adding entries! Later added entries are matched first!
|
||||
|
||||
|
||||
A few examples (assumed you are in /proc/sys/fs/binfmt_misc):
|
||||
|
||||
- enable support for em86 (like binfmt_em86, for Alpha AXP only):
|
||||
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
|
||||
- enable support for packed DOS applications (pre-configured dosemu hdimages):
|
||||
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
|
||||
|
||||
- enable support for Windows executables using wine:
|
||||
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
|
||||
|
||||
For java support see Documentation/java.txt
|
||||
|
||||
|
||||
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
|
||||
or 1 (to enable) to /proc/sys/fs/binfmt_misc/status or /proc/.../the_name.
|
||||
Catting the file tells you the current status of binfmt_misc/the entry.
|
||||
|
||||
You can remove one entry or all entries by echoing -1 to /proc/.../the_name
|
||||
or /proc/sys/fs/binfmt_misc/status.
|
||||
|
||||
|
||||
HINTS:
|
||||
======
|
||||
|
||||
If you want to pass special arguments to your interpreter, you can
|
||||
write a wrapper script for it. See Documentation/java.txt for an
|
||||
example.
|
||||
|
||||
Your interpreter should NOT look in the PATH for the filename; the kernel
|
||||
passes it the full filename (or the file descriptor) to use. Using $PATH can
|
||||
cause unexpected behaviour and can be a security hazard.
|
||||
|
||||
|
||||
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>
|
@ -184,7 +184,7 @@ infrequently used and the primary purpose of Smart Array controllers is to
|
||||
act as a RAID controller for disk drives, so the vast majority of commands
|
||||
are allocated for disk devices. However, if you have more than a few tape
|
||||
drives attached to a smart array, the default number of commands may not be
|
||||
enought (for example, if you have 8 tape drives, you could only rewind 6
|
||||
enough (for example, if you have 8 tape drives, you could only rewind 6
|
||||
at one time with the default number of commands.) The cciss_tape_cmds module
|
||||
parameter allows more commands (up to 16 more) to be allocated for use by
|
||||
tape drives. For example:
|
||||
|
@ -14,7 +14,7 @@ Contents:
|
||||
|
||||
The RAM disk driver is a way to use main system memory as a block device. It
|
||||
is required for initrd, an initial filesystem used if you need to load modules
|
||||
in order to access the root filesystem (see Documentation/initrd.txt). It can
|
||||
in order to access the root filesystem (see Documentation/admin-guide/initrd.rst). It can
|
||||
also be used for a temporary filesystem for crypto work, since the contents
|
||||
are erased on reboot.
|
||||
|
||||
|
@ -1,34 +0,0 @@
|
||||
Linux Braille Console
|
||||
|
||||
To get early boot messages on a braille device (before userspace screen
|
||||
readers can start), you first need to compile the support for the usual serial
|
||||
console (see serial-console.txt), and for braille device (in Device Drivers -
|
||||
Accessibility).
|
||||
|
||||
Then you need to specify a console=brl, option on the kernel command line, the
|
||||
format is:
|
||||
|
||||
console=brl,serial_options...
|
||||
|
||||
where serial_options... are the same as described in serial-console.txt
|
||||
|
||||
So for instance you can use console=brl,ttyS0 if the braille device is connected
|
||||
to the first serial port, and console=brl,ttyS0,115200 to override the baud rate
|
||||
to 115200, etc.
|
||||
|
||||
By default, the braille device will just show the last kernel message (console
|
||||
mode). To review previous messages, press the Insert key to switch to the VT
|
||||
review mode. In review mode, the arrow keys permit to browse in the VT content,
|
||||
page up/down keys go at the top/bottom of the screen, and the home key goes back
|
||||
to the cursor, hence providing very basic screen reviewing facility.
|
||||
|
||||
Sound feedback can be obtained by adding the braille_console.sound=1 kernel
|
||||
parameter.
|
||||
|
||||
For simplicity, only one braille console can be enabled, other uses of
|
||||
console=brl,... will be discarded. Also note that it does not interfere with
|
||||
the console selection mechanism described in serial-console.txt
|
||||
|
||||
For now, only the VisioBraille device is supported.
|
||||
|
||||
Samuel Thibault <samuel.thibault@ens-lyon.org>
|
@ -8,7 +8,7 @@ cpuacct.txt
|
||||
- CPU Accounting Controller; account CPU usage for groups of tasks.
|
||||
cpusets.txt
|
||||
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
|
||||
devices.txt
|
||||
admin-guide/devices.rst
|
||||
- Device Whitelist Controller; description, interface and security.
|
||||
freezer-subsystem.txt
|
||||
- checkpointing; rationale to not use signals, interface.
|
||||
|
@ -161,7 +161,7 @@ The producer will look something like this:
|
||||
|
||||
unsigned long head = buffer->head;
|
||||
/* The spin_unlock() and next spin_lock() provide needed ordering. */
|
||||
unsigned long tail = ACCESS_ONCE(buffer->tail);
|
||||
unsigned long tail = READ_ONCE(buffer->tail);
|
||||
|
||||
if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
|
||||
/* insert one item into the buffer */
|
||||
@ -222,7 +222,7 @@ This will instruct the CPU to make sure the index is up to date before reading
|
||||
the new item, and then it shall make sure the CPU has finished reading the item
|
||||
before it writes the new tail pointer, which will erase the item.
|
||||
|
||||
Note the use of ACCESS_ONCE() and smp_load_acquire() to read the
|
||||
Note the use of READ_ONCE() and smp_load_acquire() to read the
|
||||
opposition index. This prevents the compiler from discarding and
|
||||
reloading its cached value - which some compilers will do across
|
||||
smp_read_barrier_depends(). This isn't strictly needed if you can
|
||||
|
@ -34,10 +34,10 @@ from load_config import loadConfig
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = ['kernel-doc', 'rstFlatTable', 'kernel_include', 'cdomain']
|
||||
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain']
|
||||
|
||||
# The name of the math extension changed on Sphinx 1.4
|
||||
if minor > 3:
|
||||
if major == 1 and minor > 3:
|
||||
extensions.append("sphinx.ext.imgmath")
|
||||
else:
|
||||
extensions.append("sphinx.ext.pngmath")
|
||||
@ -136,7 +136,7 @@ pygments_style = 'sphinx'
|
||||
todo_include_todos = False
|
||||
|
||||
primary_domain = 'C'
|
||||
highlight_language = 'guess'
|
||||
highlight_language = 'none'
|
||||
|
||||
# -- Options for HTML output ----------------------------------------------
|
||||
|
||||
@ -332,18 +332,32 @@ latex_elements = {
|
||||
'''
|
||||
}
|
||||
|
||||
# Fix reference escape troubles with Sphinx 1.4.x
|
||||
if major == 1 and minor > 3:
|
||||
latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title,
|
||||
# author, documentclass [howto, manual, or own class]).
|
||||
latex_documents = [
|
||||
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
|
||||
'The kernel development community', 'manual'),
|
||||
('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
('core-api/index', 'core-api.tex', 'The kernel core API manual',
|
||||
'The kernel development community', 'manual'),
|
||||
('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
|
||||
'The kernel development community', 'manual'),
|
||||
('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
('development-process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
|
||||
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
|
||||
'The kernel development community', 'manual'),
|
||||
('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
('security/index', 'security.tex', 'The kernel security subsystem manual',
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top of
|
||||
|
551
Documentation/core-api/assoc_array.rst
Normal file
551
Documentation/core-api/assoc_array.rst
Normal file
@ -0,0 +1,551 @@
|
||||
========================================
|
||||
Generic Associative Array Implementation
|
||||
========================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This associative array implementation is an object container with the following
|
||||
properties:
|
||||
|
||||
1. Objects are opaque pointers. The implementation does not care where they
|
||||
point (if anywhere) or what they point to (if anything).
|
||||
.. note:: Pointers to objects _must_ be zero in the least significant bit.
|
||||
|
||||
2. Objects do not need to contain linkage blocks for use by the array. This
|
||||
permits an object to be located in multiple arrays simultaneously.
|
||||
Rather, the array is made up of metadata blocks that point to objects.
|
||||
|
||||
3. Objects require index keys to locate them within the array.
|
||||
|
||||
4. Index keys must be unique. Inserting an object with the same key as one
|
||||
already in the array will replace the old object.
|
||||
|
||||
5. Index keys can be of any length and can be of different lengths.
|
||||
|
||||
6. Index keys should encode the length early on, before any variation due to
|
||||
length is seen.
|
||||
|
||||
7. Index keys can include a hash to scatter objects throughout the array.
|
||||
|
||||
8. The array can iterated over. The objects will not necessarily come out in
|
||||
key order.
|
||||
|
||||
9. The array can be iterated over whilst it is being modified, provided the
|
||||
RCU readlock is being held by the iterator. Note, however, under these
|
||||
circumstances, some objects may be seen more than once. If this is a
|
||||
problem, the iterator should lock against modification. Objects will not
|
||||
be missed, however, unless deleted.
|
||||
|
||||
10. Objects in the array can be looked up by means of their index key.
|
||||
|
||||
11. Objects can be looked up whilst the array is being modified, provided the
|
||||
RCU readlock is being held by the thread doing the look up.
|
||||
|
||||
The implementation uses a tree of 16-pointer nodes internally that are indexed
|
||||
on each level by nibbles from the index key in the same manner as in a radix
|
||||
tree. To improve memory efficiency, shortcuts can be emplaced to skip over
|
||||
what would otherwise be a series of single-occupancy nodes. Further, nodes
|
||||
pack leaf object pointers into spare space in the node rather than making an
|
||||
extra branch until as such time an object needs to be added to a full node.
|
||||
|
||||
|
||||
The Public API
|
||||
==============
|
||||
|
||||
The public API can be found in ``<linux/assoc_array.h>``. The associative
|
||||
array is rooted on the following structure::
|
||||
|
||||
struct assoc_array {
|
||||
...
|
||||
};
|
||||
|
||||
The code is selected by enabling ``CONFIG_ASSOCIATIVE_ARRAY`` with::
|
||||
|
||||
./script/config -e ASSOCIATIVE_ARRAY
|
||||
|
||||
|
||||
Edit Script
|
||||
-----------
|
||||
|
||||
The insertion and deletion functions produce an 'edit script' that can later be
|
||||
applied to effect the changes without risking ``ENOMEM``. This retains the
|
||||
preallocated metadata blocks that will be installed in the internal tree and
|
||||
keeps track of the metadata blocks that will be removed from the tree when the
|
||||
script is applied.
|
||||
|
||||
This is also used to keep track of dead blocks and dead objects after the
|
||||
script has been applied so that they can be freed later. The freeing is done
|
||||
after an RCU grace period has passed - thus allowing access functions to
|
||||
proceed under the RCU read lock.
|
||||
|
||||
The script appears as outside of the API as a pointer of the type::
|
||||
|
||||
struct assoc_array_edit;
|
||||
|
||||
There are two functions for dealing with the script:
|
||||
|
||||
1. Apply an edit script::
|
||||
|
||||
void assoc_array_apply_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This will perform the edit functions, interpolating various write barriers
|
||||
to permit accesses under the RCU read lock to continue. The edit script
|
||||
will then be passed to ``call_rcu()`` to free it and any dead stuff it points
|
||||
to.
|
||||
|
||||
2. Cancel an edit script::
|
||||
|
||||
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This frees the edit script and all preallocated memory immediately. If
|
||||
this was for insertion, the new object is _not_ released by this function,
|
||||
but must rather be released by the caller.
|
||||
|
||||
These functions are guaranteed not to fail.
|
||||
|
||||
|
||||
Operations Table
|
||||
----------------
|
||||
|
||||
Various functions take a table of operations::
|
||||
|
||||
struct assoc_array_ops {
|
||||
...
|
||||
};
|
||||
|
||||
This points to a number of methods, all of which need to be provided:
|
||||
|
||||
1. Get a chunk of index key from caller data::
|
||||
|
||||
unsigned long (*get_key_chunk)(const void *index_key, int level);
|
||||
|
||||
This should return a chunk of caller-supplied index key starting at the
|
||||
*bit* position given by the level argument. The level argument will be a
|
||||
multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
|
||||
``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible.
|
||||
|
||||
|
||||
2. Get a chunk of an object's index key::
|
||||
|
||||
unsigned long (*get_object_key_chunk)(const void *object, int level);
|
||||
|
||||
As the previous function, but gets its data from an object in the array
|
||||
rather than from a caller-supplied index key.
|
||||
|
||||
|
||||
3. See if this is the object we're looking for::
|
||||
|
||||
bool (*compare_object)(const void *object, const void *index_key);
|
||||
|
||||
Compare the object against an index key and return ``true`` if it matches and
|
||||
``false`` if it doesn't.
|
||||
|
||||
|
||||
4. Diff the index keys of two objects::
|
||||
|
||||
int (*diff_objects)(const void *object, const void *index_key);
|
||||
|
||||
Return the bit position at which the index key of the specified object
|
||||
differs from the given index key or -1 if they are the same.
|
||||
|
||||
|
||||
5. Free an object::
|
||||
|
||||
void (*free_object)(void *object);
|
||||
|
||||
Free the specified object. Note that this may be called an RCU grace period
|
||||
after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be
|
||||
necessary on module unloading.
|
||||
|
||||
|
||||
Manipulation Functions
|
||||
----------------------
|
||||
|
||||
There are a number of functions for manipulating an associative array:
|
||||
|
||||
1. Initialise an associative array::
|
||||
|
||||
void assoc_array_init(struct assoc_array *array);
|
||||
|
||||
This initialises the base structure for an associative array. It can't fail.
|
||||
|
||||
|
||||
2. Insert/replace an object in an associative array::
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_insert(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key,
|
||||
void *object);
|
||||
|
||||
This inserts the given object into the array. Note that the least
|
||||
significant bit of the pointer must be zero as it's used to type-mark
|
||||
pointers internally.
|
||||
|
||||
If an object already exists for that key then it will be replaced with the
|
||||
new object and the old one will be freed automatically.
|
||||
|
||||
The ``index_key`` argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
3. Delete an object from an associative array::
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_delete(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This deletes an object that matches the specified data from the array.
|
||||
|
||||
The ``index_key`` argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
|
||||
an out-of-memory error. ``NULL`` will be returned if the specified object is
|
||||
not found within the array.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
4. Delete all objects from an associative array::
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_clear(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This deletes all the objects from an associative array and leaves it
|
||||
completely empty.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. ``-ENOMEM`` is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
5. Destroy an associative array, deleting all objects::
|
||||
|
||||
void assoc_array_destroy(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This destroys the contents of the associative array and leaves it
|
||||
completely empty. It is not permitted for another thread to be traversing
|
||||
the array under the RCU read lock at the same time as this function is
|
||||
destroying it as no RCU deferral is performed on memory release -
|
||||
something that would require memory to be allocated.
|
||||
|
||||
The caller should lock exclusively against other modifiers and accessors
|
||||
of the array.
|
||||
|
||||
|
||||
6. Garbage collect an associative array::
|
||||
|
||||
int assoc_array_gc(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
bool (*iterator)(void *object, void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This iterates over the objects in an associative array and passes each one to
|
||||
``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it
|
||||
returns ``false``, the object will be freed. If the ``iterator()`` function
|
||||
returns ``true``, it must perform any appropriate refcount incrementing on the
|
||||
object before returning.
|
||||
|
||||
The internal tree will be packed down if possible as part of the iteration
|
||||
to reduce the number of nodes in it.
|
||||
|
||||
The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
|
||||
ignored by the function.
|
||||
|
||||
The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
|
||||
enough memory.
|
||||
|
||||
It is possible for other threads to iterate over or search the array under
|
||||
the RCU read lock whilst this function is in progress. The caller should
|
||||
lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
Access Functions
|
||||
----------------
|
||||
|
||||
There are two functions for accessing an associative array:
|
||||
|
||||
1. Iterate over all the objects in an associative array::
|
||||
|
||||
int assoc_array_iterate(const struct assoc_array *array,
|
||||
int (*iterator)(const void *object,
|
||||
void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This passes each object in the array to the iterator callback function.
|
||||
``iterator_data`` is private data for that function.
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held. Under such circumstances,
|
||||
it is possible for the iteration function to see some objects twice. If
|
||||
this is a problem, then modification should be locked against. The
|
||||
iteration algorithm should not, however, miss any objects.
|
||||
|
||||
The function will return ``0`` if no objects were in the array or else it will
|
||||
return the result of the last iterator function called. Iteration stops
|
||||
immediately if any call to the iteration function results in a non-zero
|
||||
return.
|
||||
|
||||
|
||||
2. Find an object in an associative array::
|
||||
|
||||
void *assoc_array_find(const struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This walks through the array's internal tree directly to the object
|
||||
specified by the index key..
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held.
|
||||
|
||||
The function will return the object if found (and set ``*_type`` to the object
|
||||
type) or will return ``NULL`` if the object was not found.
|
||||
|
||||
|
||||
Index Key Form
|
||||
--------------
|
||||
|
||||
The index key can be of any form, but since the algorithms aren't told how long
|
||||
the key is, it is strongly recommended that the index key includes its length
|
||||
very early on before any variation due to the length would have an effect on
|
||||
comparisons.
|
||||
|
||||
This will cause leaves with different length keys to scatter away from each
|
||||
other - and those with the same length keys to cluster together.
|
||||
|
||||
It is also recommended that the index key begin with a hash of the rest of the
|
||||
key to maximise scattering throughout keyspace.
|
||||
|
||||
The better the scattering, the wider and lower the internal tree will be.
|
||||
|
||||
Poor scattering isn't too much of a problem as there are shortcuts and nodes
|
||||
can contain mixtures of leaves and metadata pointers.
|
||||
|
||||
The index key is read in chunks of machine word. Each chunk is subdivided into
|
||||
one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
|
||||
on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is
|
||||
unlikely that more than one word of any particular index key will have to be
|
||||
used.
|
||||
|
||||
|
||||
Internal Workings
|
||||
=================
|
||||
|
||||
The associative array data structure has an internal tree. This tree is
|
||||
constructed of two types of metadata blocks: nodes and shortcuts.
|
||||
|
||||
A node is an array of slots. Each slot can contain one of four things:
|
||||
|
||||
* A NULL pointer, indicating that the slot is empty.
|
||||
* A pointer to an object (a leaf).
|
||||
* A pointer to a node at the next level.
|
||||
* A pointer to a shortcut.
|
||||
|
||||
|
||||
Basic Internal Tree Layout
|
||||
--------------------------
|
||||
|
||||
Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index
|
||||
key space is strictly subdivided by the nodes in the tree and nodes occur on
|
||||
fixed levels. For example::
|
||||
|
||||
Level: 0 1 2 3
|
||||
=============== =============== =============== ===============
|
||||
NODE D
|
||||
NODE B NODE C +------>+---+
|
||||
+------>+---+ +------>+---+ | | 0 |
|
||||
NODE A | | 0 | | | 0 | | +---+
|
||||
+---+ | +---+ | +---+ | : :
|
||||
| 0 | | : : | : : | +---+
|
||||
+---+ | +---+ | +---+ | | f |
|
||||
| 1 |---+ | 3 |---+ | 7 |---+ +---+
|
||||
+---+ +---+ +---+
|
||||
: : : : | 8 |---+
|
||||
+---+ +---+ +---+ | NODE E
|
||||
| e |---+ | f | : : +------>+---+
|
||||
+---+ | +---+ +---+ | 0 |
|
||||
| f | | | f | +---+
|
||||
+---+ | +---+ : :
|
||||
| NODE F +---+
|
||||
+------>+---+ | f |
|
||||
| 0 | NODE G +---+
|
||||
+---+ +------>+---+
|
||||
: : | | 0 |
|
||||
+---+ | +---+
|
||||
| 6 |---+ : :
|
||||
+---+ +---+
|
||||
: : | f |
|
||||
+---+ +---+
|
||||
| f |
|
||||
+---+
|
||||
|
||||
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
|
||||
Assuming no other meta data nodes in the tree, the key space is divided
|
||||
thusly::
|
||||
|
||||
KEY PREFIX NODE
|
||||
========== ====
|
||||
137* D
|
||||
138* E
|
||||
13[0-69-f]* C
|
||||
1[0-24-f]* B
|
||||
e6* G
|
||||
e[0-57-f]* F
|
||||
[02-df]* A
|
||||
|
||||
So, for instance, keys with the following example index keys will be found in
|
||||
the appropriate nodes::
|
||||
|
||||
INDEX KEY PREFIX NODE
|
||||
=============== ======= ====
|
||||
13694892892489 13 C
|
||||
13795289025897 137 D
|
||||
13889dde88793 138 E
|
||||
138bbb89003093 138 E
|
||||
1394879524789 12 C
|
||||
1458952489 1 B
|
||||
9431809de993ba - A
|
||||
b4542910809cd - A
|
||||
e5284310def98 e F
|
||||
e68428974237 e6 G
|
||||
e7fffcbd443 e F
|
||||
f3842239082 - A
|
||||
|
||||
To save memory, if a node can hold all the leaves in its portion of keyspace,
|
||||
then the node will have all those leaves in it and will not have any metadata
|
||||
pointers - even if some of those leaves would like to be in the same slot.
|
||||
|
||||
A node can contain a heterogeneous mix of leaves and metadata pointers.
|
||||
Metadata pointers must be in the slots that match their subdivisions of key
|
||||
space. The leaves can be in any slot not occupied by a metadata pointer. It
|
||||
is guaranteed that none of the leaves in a node will match a slot occupied by a
|
||||
metadata pointer. If the metadata pointer is there, any leaf whose key matches
|
||||
the metadata key prefix must be in the subtree that the metadata pointer points
|
||||
to.
|
||||
|
||||
In the above example list of index keys, node A will contain::
|
||||
|
||||
SLOT CONTENT INDEX KEY (PREFIX)
|
||||
==== =============== ==================
|
||||
1 PTR TO NODE B 1*
|
||||
any LEAF 9431809de993ba
|
||||
any LEAF b4542910809cd
|
||||
e PTR TO NODE F e*
|
||||
any LEAF f3842239082
|
||||
|
||||
and node B::
|
||||
|
||||
3 PTR TO NODE C 13*
|
||||
any LEAF 1458952489
|
||||
|
||||
|
||||
Shortcuts
|
||||
---------
|
||||
|
||||
Shortcuts are metadata records that jump over a piece of keyspace. A shortcut
|
||||
is a replacement for a series of single-occupancy nodes ascending through the
|
||||
levels. Shortcuts exist to save memory and to speed up traversal.
|
||||
|
||||
It is possible for the root of the tree to be a shortcut - say, for example,
|
||||
the tree contains at least 17 nodes all with key prefix ``1111``. The
|
||||
insertion algorithm will insert a shortcut to skip over the ``1111`` keyspace
|
||||
in a single bound and get to the fourth level where these actually become
|
||||
different.
|
||||
|
||||
|
||||
Splitting And Collapsing Nodes
|
||||
------------------------------
|
||||
|
||||
Each node has a maximum capacity of 16 leaves and metadata pointers. If the
|
||||
insertion algorithm finds that it is trying to insert a 17th object into a
|
||||
node, that node will be split such that at least two leaves that have a common
|
||||
key segment at that level end up in a separate node rooted on that slot for
|
||||
that common key segment.
|
||||
|
||||
If the leaves in a full node and the leaf that is being inserted are
|
||||
sufficiently similar, then a shortcut will be inserted into the tree.
|
||||
|
||||
When the number of objects in the subtree rooted at a node falls to 16 or
|
||||
fewer, then the subtree will be collapsed down to a single node - and this will
|
||||
ripple towards the root if possible.
|
||||
|
||||
|
||||
Non-Recursive Iteration
|
||||
-----------------------
|
||||
|
||||
Each node and shortcut contains a back pointer to its parent and the number of
|
||||
slot in that parent that points to it. None-recursive iteration uses these to
|
||||
proceed rootwards through the tree, going to the parent node, slot N + 1 to
|
||||
make sure progress is made without the need for a stack.
|
||||
|
||||
The backpointers, however, make simultaneous alteration and iteration tricky.
|
||||
|
||||
|
||||
Simultaneous Alteration And Iteration
|
||||
-------------------------------------
|
||||
|
||||
There are a number of cases to consider:
|
||||
|
||||
1. Simple insert/replace. This involves simply replacing a NULL or old
|
||||
matching leaf pointer with the pointer to the new leaf after a barrier.
|
||||
The metadata blocks don't change otherwise. An old leaf won't be freed
|
||||
until after the RCU grace period.
|
||||
|
||||
2. Simple delete. This involves just clearing an old matching leaf. The
|
||||
metadata blocks don't change otherwise. The old leaf won't be freed until
|
||||
after the RCU grace period.
|
||||
|
||||
3. Insertion replacing part of a subtree that we haven't yet entered. This
|
||||
may involve replacement of part of that subtree - but that won't affect
|
||||
the iteration as we won't have reached the pointer to it yet and the
|
||||
ancestry blocks are not replaced (the layout of those does not change).
|
||||
|
||||
4. Insertion replacing nodes that we're actively processing. This isn't a
|
||||
problem as we've passed the anchoring pointer and won't switch onto the
|
||||
new layout until we follow the back pointers - at which point we've
|
||||
already examined the leaves in the replaced node (we iterate over all the
|
||||
leaves in a node before following any of its metadata pointers).
|
||||
|
||||
We might, however, re-see some leaves that have been split out into a new
|
||||
branch that's in a slot further along than we were at.
|
||||
|
||||
5. Insertion replacing nodes that we're processing a dependent branch of.
|
||||
This won't affect us until we follow the back pointers. Similar to (4).
|
||||
|
||||
6. Deletion collapsing a branch under us. This doesn't affect us because the
|
||||
back pointers will get us back to the parent of the new node before we
|
||||
could see the new node. The entire collapsed subtree is thrown away
|
||||
unchanged - and will still be rooted on the same slot, so we shouldn't
|
||||
process it a second time as we'll go back to slot + 1.
|
||||
|
||||
.. note::
|
||||
|
||||
Under some circumstances, we need to simultaneously change the parent
|
||||
pointer and the parent slot pointer on a node (say, for example, we
|
||||
inserted another node before it and moved it up a level). We cannot do
|
||||
this without locking against a read - so we have to replace that node too.
|
||||
|
||||
However, when we're changing a shortcut into a node this isn't a problem
|
||||
as shortcuts only have one slot and so the parent slot number isn't used
|
||||
when traversing backwards over one. This means that it's okay to change
|
||||
the slot number first - provided suitable barriers are used to make sure
|
||||
the parent slot number is read after the back pointer.
|
||||
|
||||
Obsolete blocks and leaves are freed up after an RCU grace period has passed,
|
||||
so as long as anyone doing walking or iteration holds the RCU read lock, the
|
||||
old superstructure should not go away on them.
|
@ -1,36 +1,42 @@
|
||||
Semantics and Behavior of Atomic and
|
||||
Bitmask Operations
|
||||
=======================================================
|
||||
Semantics and Behavior of Atomic and Bitmask Operations
|
||||
=======================================================
|
||||
|
||||
David S. Miller
|
||||
:Author: David S. Miller
|
||||
|
||||
This document is intended to serve as a guide to Linux port
|
||||
This document is intended to serve as a guide to Linux port
|
||||
maintainers on how to implement atomic counter, bitops, and spinlock
|
||||
interfaces properly.
|
||||
|
||||
The atomic_t type should be defined as a signed integer and
|
||||
Atomic Type And Operations
|
||||
==========================
|
||||
|
||||
The atomic_t type should be defined as a signed integer and
|
||||
the atomic_long_t type as a signed long integer. Also, they should
|
||||
be made opaque such that any kind of cast to a normal C integer type
|
||||
will fail. Something like the following should suffice:
|
||||
will fail. Something like the following should suffice::
|
||||
|
||||
typedef struct { int counter; } atomic_t;
|
||||
typedef struct { long counter; } atomic_long_t;
|
||||
|
||||
Historically, counter has been declared volatile. This is now discouraged.
|
||||
See Documentation/volatile-considered-harmful.txt for the complete rationale.
|
||||
See :ref:`Documentation/process/volatile-considered-harmful.rst
|
||||
<volatile_considered_harmful>` for the complete rationale.
|
||||
|
||||
local_t is very similar to atomic_t. If the counter is per CPU and only
|
||||
updated by one CPU, local_t is probably more appropriate. Please see
|
||||
Documentation/local_ops.txt for the semantics of local_t.
|
||||
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
|
||||
local_t.
|
||||
|
||||
The first operations to implement for atomic_t's are the initializers and
|
||||
plain reads.
|
||||
plain reads. ::
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
#define atomic_set(v, i) ((v)->counter = (i))
|
||||
|
||||
The first macro is used in definitions, such as:
|
||||
The first macro is used in definitions, such as::
|
||||
|
||||
static atomic_t my_counter = ATOMIC_INIT(1);
|
||||
static atomic_t my_counter = ATOMIC_INIT(1);
|
||||
|
||||
The initializer is atomic in that the return values of the atomic operations
|
||||
are guaranteed to be correct reflecting the initialized value if the
|
||||
@ -38,10 +44,10 @@ initializer is used before runtime. If the initializer is used at runtime, a
|
||||
proper implicit or explicit read memory barrier is needed before reading the
|
||||
value with atomic_read from another thread.
|
||||
|
||||
As with all of the atomic_ interfaces, replace the leading "atomic_"
|
||||
with "atomic_long_" to operate on atomic_long_t.
|
||||
As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
|
||||
with ``atomic_long_`` to operate on atomic_long_t.
|
||||
|
||||
The second interface can be used at runtime, as in:
|
||||
The second interface can be used at runtime, as in::
|
||||
|
||||
struct foo { atomic_t counter; };
|
||||
...
|
||||
@ -59,7 +65,7 @@ been set with this operation or set with another operation. A proper implicit
|
||||
or explicit memory barrier is needed before the value set with the operation
|
||||
is guaranteed to be readable with atomic_read from another thread.
|
||||
|
||||
Next, we have:
|
||||
Next, we have::
|
||||
|
||||
#define atomic_read(v) ((v)->counter)
|
||||
|
||||
@ -73,36 +79,37 @@ initialization by any other thread is visible yet, so the user of the
|
||||
interface must take care of that with a proper implicit or explicit memory
|
||||
barrier.
|
||||
|
||||
*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
|
||||
.. warning::
|
||||
|
||||
Some architectures may choose to use the volatile keyword, barriers, or inline
|
||||
assembly to guarantee some degree of immediacy for atomic_read() and
|
||||
atomic_set(). This is not uniformly guaranteed, and may change in the future,
|
||||
so all users of atomic_t should treat atomic_read() and atomic_set() as simple
|
||||
C statements that may be reordered or optimized away entirely by the compiler
|
||||
or processor, and explicitly invoke the appropriate compiler and/or memory
|
||||
barrier for each use case. Failure to do so will result in code that may
|
||||
suddenly break when used with different architectures or compiler
|
||||
optimizations, or even changes in unrelated code which changes how the
|
||||
compiler optimizes the section accessing atomic_t variables.
|
||||
``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!
|
||||
|
||||
*** YOU HAVE BEEN WARNED! ***
|
||||
Some architectures may choose to use the volatile keyword, barriers, or
|
||||
inline assembly to guarantee some degree of immediacy for atomic_read()
|
||||
and atomic_set(). This is not uniformly guaranteed, and may change in
|
||||
the future, so all users of atomic_t should treat atomic_read() and
|
||||
atomic_set() as simple C statements that may be reordered or optimized
|
||||
away entirely by the compiler or processor, and explicitly invoke the
|
||||
appropriate compiler and/or memory barrier for each use case. Failure
|
||||
to do so will result in code that may suddenly break when used with
|
||||
different architectures or compiler optimizations, or even changes in
|
||||
unrelated code which changes how the compiler optimizes the section
|
||||
accessing atomic_t variables.
|
||||
|
||||
Properly aligned pointers, longs, ints, and chars (and unsigned
|
||||
equivalents) may be atomically loaded from and stored to in the same
|
||||
sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
|
||||
macro should be used to prevent the compiler from using optimizations
|
||||
that might otherwise optimize accesses out of existence on the one hand,
|
||||
or that might create unsolicited accesses on the other.
|
||||
sense as described for atomic_read() and atomic_set(). The READ_ONCE()
|
||||
and WRITE_ONCE() macros should be used to prevent the compiler from using
|
||||
optimizations that might otherwise optimize accesses out of existence on
|
||||
the one hand, or that might create unsolicited accesses on the other.
|
||||
|
||||
For example consider the following code:
|
||||
For example consider the following code::
|
||||
|
||||
while (a > 0)
|
||||
do_something();
|
||||
|
||||
If the compiler can prove that do_something() does not store to the
|
||||
variable a, then the compiler is within its rights transforming this to
|
||||
the following:
|
||||
the following::
|
||||
|
||||
tmp = a;
|
||||
if (a > 0)
|
||||
@ -110,14 +117,14 @@ the following:
|
||||
do_something();
|
||||
|
||||
If you don't want the compiler to do this (and you probably don't), then
|
||||
you should use something like the following:
|
||||
you should use something like the following::
|
||||
|
||||
while (ACCESS_ONCE(a) < 0)
|
||||
while (READ_ONCE(a) < 0)
|
||||
do_something();
|
||||
|
||||
Alternatively, you could place a barrier() call in the loop.
|
||||
|
||||
For another example, consider the following code:
|
||||
For another example, consider the following code::
|
||||
|
||||
tmp_a = a;
|
||||
do_something_with(tmp_a);
|
||||
@ -125,7 +132,7 @@ For another example, consider the following code:
|
||||
|
||||
If the compiler can prove that do_something_with() does not store to the
|
||||
variable a, then the compiler is within its rights to manufacture an
|
||||
additional load as follows:
|
||||
additional load as follows::
|
||||
|
||||
tmp_a = a;
|
||||
do_something_with(tmp_a);
|
||||
@ -139,15 +146,15 @@ The compiler would be likely to manufacture this additional load if
|
||||
do_something_with() was an inline function that made very heavy use
|
||||
of registers: reloading from variable a could save a flush to the
|
||||
stack and later reload. To prevent the compiler from attacking your
|
||||
code in this manner, write the following:
|
||||
code in this manner, write the following::
|
||||
|
||||
tmp_a = ACCESS_ONCE(a);
|
||||
tmp_a = READ_ONCE(a);
|
||||
do_something_with(tmp_a);
|
||||
do_something_else_with(tmp_a);
|
||||
|
||||
For a final example, consider the following code, assuming that the
|
||||
variable a is set at boot time before the second CPU is brought online
|
||||
and never changed later, so that memory barriers are not needed:
|
||||
and never changed later, so that memory barriers are not needed::
|
||||
|
||||
if (a)
|
||||
b = 9;
|
||||
@ -155,7 +162,7 @@ and never changed later, so that memory barriers are not needed:
|
||||
b = 42;
|
||||
|
||||
The compiler is within its rights to manufacture an additional store
|
||||
by transforming the above code into the following:
|
||||
by transforming the above code into the following::
|
||||
|
||||
b = 42;
|
||||
if (a)
|
||||
@ -163,20 +170,22 @@ by transforming the above code into the following:
|
||||
|
||||
This could come as a fatal surprise to other code running concurrently
|
||||
that expected b to never have the value 42 if a was zero. To prevent
|
||||
the compiler from doing this, write something like:
|
||||
the compiler from doing this, write something like::
|
||||
|
||||
if (a)
|
||||
ACCESS_ONCE(b) = 9;
|
||||
WRITE_ONCE(b, 9);
|
||||
else
|
||||
ACCESS_ONCE(b) = 42;
|
||||
WRITE_ONCE(b, 42);
|
||||
|
||||
Don't even -think- about doing this without proper use of memory barriers,
|
||||
locks, or atomic operations if variable a can change at runtime!
|
||||
|
||||
*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
|
||||
.. warning::
|
||||
|
||||
``READ_ONCE()`` OR ``WRITE_ONCE()`` DO NOT IMPLY A BARRIER!
|
||||
|
||||
Now, we move onto the atomic operation interfaces typically implemented with
|
||||
the help of assembly code.
|
||||
the help of assembly code. ::
|
||||
|
||||
void atomic_add(int i, atomic_t *v);
|
||||
void atomic_sub(int i, atomic_t *v);
|
||||
@ -192,7 +201,7 @@ One very important aspect of these two routines is that they DO NOT
|
||||
require any explicit memory barriers. They need only perform the
|
||||
atomic_t counter update in an SMP safe manner.
|
||||
|
||||
Next, we have:
|
||||
Next, we have::
|
||||
|
||||
int atomic_inc_return(atomic_t *v);
|
||||
int atomic_dec_return(atomic_t *v);
|
||||
@ -214,7 +223,7 @@ If the atomic instructions used in an implementation provide explicit
|
||||
memory barrier semantics which satisfy the above requirements, that is
|
||||
fine as well.
|
||||
|
||||
Let's move on:
|
||||
Let's move on::
|
||||
|
||||
int atomic_add_return(int i, atomic_t *v);
|
||||
int atomic_sub_return(int i, atomic_t *v);
|
||||
@ -224,7 +233,7 @@ explicit counter adjustment is given instead of the implicit "1".
|
||||
This means that like atomic_{inc,dec}_return(), the memory barrier
|
||||
semantics are required.
|
||||
|
||||
Next:
|
||||
Next::
|
||||
|
||||
int atomic_inc_and_test(atomic_t *v);
|
||||
int atomic_dec_and_test(atomic_t *v);
|
||||
@ -234,13 +243,13 @@ given atomic counter. They return a boolean indicating whether the
|
||||
resulting counter value was zero or not.
|
||||
|
||||
Again, these primitives provide explicit memory barrier semantics around
|
||||
the atomic operation.
|
||||
the atomic operation::
|
||||
|
||||
int atomic_sub_and_test(int i, atomic_t *v);
|
||||
|
||||
This is identical to atomic_dec_and_test() except that an explicit
|
||||
decrement is given instead of the implicit "1". This primitive must
|
||||
provide explicit memory barrier semantics around the operation.
|
||||
provide explicit memory barrier semantics around the operation::
|
||||
|
||||
int atomic_add_negative(int i, atomic_t *v);
|
||||
|
||||
@ -249,7 +258,7 @@ is return which indicates whether the resulting counter value is negative.
|
||||
This primitive must provide explicit memory barrier semantics around
|
||||
the operation.
|
||||
|
||||
Then:
|
||||
Then::
|
||||
|
||||
int atomic_xchg(atomic_t *v, int new);
|
||||
|
||||
@ -257,14 +266,14 @@ This performs an atomic exchange operation on the atomic variable v, setting
|
||||
the given new value. It returns the old value that the atomic variable v had
|
||||
just before the operation.
|
||||
|
||||
atomic_xchg must provide explicit memory barriers around the operation.
|
||||
atomic_xchg must provide explicit memory barriers around the operation. ::
|
||||
|
||||
int atomic_cmpxchg(atomic_t *v, int old, int new);
|
||||
|
||||
This performs an atomic compare exchange operation on the atomic value v,
|
||||
with the given old and new values. Like all atomic_xxx operations,
|
||||
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
|
||||
other accesses of *v are performed through atomic_xxx operations.
|
||||
other accesses of \*v are performed through atomic_xxx operations.
|
||||
|
||||
atomic_cmpxchg must provide explicit memory barriers around the operation,
|
||||
although if the comparison fails then no memory ordering guarantees are
|
||||
@ -273,7 +282,7 @@ required.
|
||||
The semantics for atomic_cmpxchg are the same as those defined for 'cas'
|
||||
below.
|
||||
|
||||
Finally:
|
||||
Finally::
|
||||
|
||||
int atomic_add_unless(atomic_t *v, int a, int u);
|
||||
|
||||
@ -289,12 +298,12 @@ atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
|
||||
|
||||
If a caller requires memory barrier semantics around an atomic_t
|
||||
operation which does not return a value, a set of interfaces are
|
||||
defined which accomplish this:
|
||||
defined which accomplish this::
|
||||
|
||||
void smp_mb__before_atomic(void);
|
||||
void smp_mb__after_atomic(void);
|
||||
|
||||
For example, smp_mb__before_atomic() can be used like so:
|
||||
For example, smp_mb__before_atomic() can be used like so::
|
||||
|
||||
obj->dead = 1;
|
||||
smp_mb__before_atomic();
|
||||
@ -315,67 +324,69 @@ atomic_t implementation above can have disastrous results. Here is
|
||||
an example, which follows a pattern occurring frequently in the Linux
|
||||
kernel. It is the use of atomic counters to implement reference
|
||||
counting, and it works such that once the counter falls to zero it can
|
||||
be guaranteed that no other entity can be accessing the object:
|
||||
be guaranteed that no other entity can be accessing the object::
|
||||
|
||||
static void obj_list_add(struct obj *obj, struct list_head *head)
|
||||
{
|
||||
obj->active = 1;
|
||||
list_add(&obj->list, head);
|
||||
}
|
||||
static void obj_list_add(struct obj *obj, struct list_head *head)
|
||||
{
|
||||
obj->active = 1;
|
||||
list_add(&obj->list, head);
|
||||
}
|
||||
|
||||
static void obj_list_del(struct obj *obj)
|
||||
{
|
||||
list_del(&obj->list);
|
||||
obj->active = 0;
|
||||
}
|
||||
static void obj_list_del(struct obj *obj)
|
||||
{
|
||||
list_del(&obj->list);
|
||||
obj->active = 0;
|
||||
}
|
||||
|
||||
static void obj_destroy(struct obj *obj)
|
||||
{
|
||||
BUG_ON(obj->active);
|
||||
kfree(obj);
|
||||
}
|
||||
static void obj_destroy(struct obj *obj)
|
||||
{
|
||||
BUG_ON(obj->active);
|
||||
kfree(obj);
|
||||
}
|
||||
|
||||
struct obj *obj_list_peek(struct list_head *head)
|
||||
{
|
||||
if (!list_empty(head)) {
|
||||
struct obj *obj_list_peek(struct list_head *head)
|
||||
{
|
||||
if (!list_empty(head)) {
|
||||
struct obj *obj;
|
||||
|
||||
obj = list_entry(head->next, struct obj, list);
|
||||
atomic_inc(&obj->refcnt);
|
||||
return obj;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
void obj_poke(void)
|
||||
{
|
||||
struct obj *obj;
|
||||
|
||||
obj = list_entry(head->next, struct obj, list);
|
||||
atomic_inc(&obj->refcnt);
|
||||
return obj;
|
||||
spin_lock(&global_list_lock);
|
||||
obj = obj_list_peek(&global_list);
|
||||
spin_unlock(&global_list_lock);
|
||||
|
||||
if (obj) {
|
||||
obj->ops->poke(obj);
|
||||
if (atomic_dec_and_test(&obj->refcnt))
|
||||
obj_destroy(obj);
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
void obj_poke(void)
|
||||
{
|
||||
struct obj *obj;
|
||||
void obj_timeout(struct obj *obj)
|
||||
{
|
||||
spin_lock(&global_list_lock);
|
||||
obj_list_del(obj);
|
||||
spin_unlock(&global_list_lock);
|
||||
|
||||
spin_lock(&global_list_lock);
|
||||
obj = obj_list_peek(&global_list);
|
||||
spin_unlock(&global_list_lock);
|
||||
|
||||
if (obj) {
|
||||
obj->ops->poke(obj);
|
||||
if (atomic_dec_and_test(&obj->refcnt))
|
||||
obj_destroy(obj);
|
||||
}
|
||||
}
|
||||
|
||||
void obj_timeout(struct obj *obj)
|
||||
{
|
||||
spin_lock(&global_list_lock);
|
||||
obj_list_del(obj);
|
||||
spin_unlock(&global_list_lock);
|
||||
.. note::
|
||||
|
||||
if (atomic_dec_and_test(&obj->refcnt))
|
||||
obj_destroy(obj);
|
||||
}
|
||||
|
||||
(This is a simplification of the ARP queue management in the
|
||||
generic neighbour discover code of the networking. Olaf Kirch
|
||||
found a bug wrt. memory barriers in kfree_skb() that exposed
|
||||
the atomic_t memory barrier requirements quite clearly.)
|
||||
This is a simplification of the ARP queue management in the generic
|
||||
neighbour discover code of the networking. Olaf Kirch found a bug wrt.
|
||||
memory barriers in kfree_skb() that exposed the atomic_t memory barrier
|
||||
requirements quite clearly.
|
||||
|
||||
Given the above scheme, it must be the case that the obj->active
|
||||
update done by the obj list deletion be visible to other processors
|
||||
@ -383,7 +394,7 @@ before the atomic counter decrement is performed.
|
||||
|
||||
Otherwise, the counter could fall to zero, yet obj->active would still
|
||||
be set, thus triggering the assertion in obj_destroy(). The error
|
||||
sequence looks like this:
|
||||
sequence looks like this::
|
||||
|
||||
cpu 0 cpu 1
|
||||
obj_poke() obj_timeout()
|
||||
@ -420,6 +431,10 @@ same scheme.
|
||||
Another note is that the atomic_t operations returning values are
|
||||
extremely slow on an old 386.
|
||||
|
||||
|
||||
Atomic Bitmask
|
||||
==============
|
||||
|
||||
We will now cover the atomic bitmask operations. You will find that
|
||||
their SMP and memory barrier semantics are similar in shape and scope
|
||||
to the atomic_t ops above.
|
||||
@ -427,7 +442,7 @@ to the atomic_t ops above.
|
||||
Native atomic bit operations are defined to operate on objects aligned
|
||||
to the size of an "unsigned long" C data type, and are least of that
|
||||
size. The endianness of the bits within each "unsigned long" are the
|
||||
native endianness of the cpu.
|
||||
native endianness of the cpu. ::
|
||||
|
||||
void set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
@ -437,7 +452,7 @@ These routines set, clear, and change, respectively, the bit number
|
||||
indicated by "nr" on the bit mask pointed to by "ADDR".
|
||||
|
||||
They must execute atomically, yet there are no implicit memory barrier
|
||||
semantics required of these interfaces.
|
||||
semantics required of these interfaces. ::
|
||||
|
||||
int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
@ -466,7 +481,7 @@ must provide explicit memory barrier semantics around their execution.
|
||||
All memory operations before the atomic bit operation call must be
|
||||
made visible globally before the atomic bit operation is made visible.
|
||||
Likewise, the atomic bit operation must be visible globally before any
|
||||
subsequent memory operation is made visible. For example:
|
||||
subsequent memory operation is made visible. For example::
|
||||
|
||||
obj->dead = 1;
|
||||
if (test_and_set_bit(0, &obj->flags))
|
||||
@ -479,7 +494,7 @@ done by test_and_set_bit() becomes visible. Likewise, the atomic
|
||||
memory operation done by test_and_set_bit() must become visible before
|
||||
"obj->killed = 1;" is visible.
|
||||
|
||||
Finally there is the basic operation:
|
||||
Finally there is the basic operation::
|
||||
|
||||
int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
|
||||
|
||||
@ -488,13 +503,13 @@ pointed to by "addr".
|
||||
|
||||
If explicit memory barriers are required around {set,clear}_bit() (which do
|
||||
not return a value, and thus does not need to provide memory barrier
|
||||
semantics), two interfaces are provided:
|
||||
semantics), two interfaces are provided::
|
||||
|
||||
void smp_mb__before_atomic(void);
|
||||
void smp_mb__after_atomic(void);
|
||||
|
||||
They are used as follows, and are akin to their atomic_t operation
|
||||
brothers:
|
||||
brothers::
|
||||
|
||||
/* All memory operations before this call will
|
||||
* be globally visible before the clear_bit().
|
||||
@ -511,7 +526,7 @@ There are two special bitops with lock barrier semantics (acquire/release,
|
||||
same as spinlocks). These operate in the same way as their non-_lock/unlock
|
||||
postfixed variants, except that they are to provide acquire/release semantics,
|
||||
respectively. This means they can be used for bit_spin_trylock and
|
||||
bit_spin_unlock type operations without specifying any more barriers.
|
||||
bit_spin_unlock type operations without specifying any more barriers. ::
|
||||
|
||||
int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
|
||||
void clear_bit_unlock(unsigned long nr, unsigned long *addr);
|
||||
@ -526,7 +541,7 @@ provided. They are used in contexts where some other higher-level SMP
|
||||
locking scheme is being used to protect the bitmask, and thus less
|
||||
expensive non-atomic operations may be used in the implementation.
|
||||
They have names similar to the above bitmask operation interfaces,
|
||||
except that two underscores are prefixed to the interface name.
|
||||
except that two underscores are prefixed to the interface name. ::
|
||||
|
||||
void __set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void __clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
@ -542,9 +557,11 @@ The routines xchg() and cmpxchg() must provide the same exact
|
||||
memory-barrier semantics as the atomic and bit operations returning
|
||||
values.
|
||||
|
||||
Note: If someone wants to use xchg(), cmpxchg() and their variants,
|
||||
linux/atomic.h should be included rather than asm/cmpxchg.h, unless
|
||||
the code is in arch/* and can take care of itself.
|
||||
.. note::
|
||||
|
||||
If someone wants to use xchg(), cmpxchg() and their variants,
|
||||
linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
|
||||
code is in arch/* and can take care of itself.
|
||||
|
||||
Spinlocks and rwlocks have memory barrier expectations as well.
|
||||
The rule to follow is simple:
|
||||
@ -558,7 +575,7 @@ The rule to follow is simple:
|
||||
|
||||
Which finally brings us to _atomic_dec_and_lock(). There is an
|
||||
architecture-neutral version implemented in lib/dec_and_lock.c,
|
||||
but most platforms will wish to optimize this in assembler.
|
||||
but most platforms will wish to optimize this in assembler. ::
|
||||
|
||||
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
|
||||
|
||||
@ -573,7 +590,7 @@ sure the spinlock operation is globally visible before any
|
||||
subsequent memory operation.
|
||||
|
||||
We can demonstrate this operation more clearly if we define
|
||||
an abstract atomic operation:
|
||||
an abstract atomic operation::
|
||||
|
||||
long cas(long *mem, long old, long new);
|
||||
|
||||
@ -584,48 +601,48 @@ an abstract atomic operation:
|
||||
3) Regardless, the current value at "mem" is returned.
|
||||
|
||||
As an example usage, here is what an atomic counter update
|
||||
might look like:
|
||||
might look like::
|
||||
|
||||
void example_atomic_inc(long *counter)
|
||||
{
|
||||
long old, new, ret;
|
||||
void example_atomic_inc(long *counter)
|
||||
{
|
||||
long old, new, ret;
|
||||
|
||||
while (1) {
|
||||
old = *counter;
|
||||
new = old + 1;
|
||||
while (1) {
|
||||
old = *counter;
|
||||
new = old + 1;
|
||||
|
||||
ret = cas(counter, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
|
||||
|
||||
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
|
||||
{
|
||||
long old, new, ret;
|
||||
int went_to_zero;
|
||||
|
||||
went_to_zero = 0;
|
||||
while (1) {
|
||||
old = atomic_read(atomic);
|
||||
new = old - 1;
|
||||
if (new == 0) {
|
||||
went_to_zero = 1;
|
||||
spin_lock(lock);
|
||||
}
|
||||
ret = cas(atomic, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
if (went_to_zero) {
|
||||
spin_unlock(lock);
|
||||
went_to_zero = 0;
|
||||
ret = cas(counter, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return went_to_zero;
|
||||
}
|
||||
Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::
|
||||
|
||||
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
|
||||
{
|
||||
long old, new, ret;
|
||||
int went_to_zero;
|
||||
|
||||
went_to_zero = 0;
|
||||
while (1) {
|
||||
old = atomic_read(atomic);
|
||||
new = old - 1;
|
||||
if (new == 0) {
|
||||
went_to_zero = 1;
|
||||
spin_lock(lock);
|
||||
}
|
||||
ret = cas(atomic, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
if (went_to_zero) {
|
||||
spin_unlock(lock);
|
||||
went_to_zero = 0;
|
||||
}
|
||||
}
|
||||
|
||||
return went_to_zero;
|
||||
}
|
||||
|
||||
Now, as far as memory barriers go, as long as spin_lock()
|
||||
strictly orders all subsequent memory operations (including
|
||||
@ -635,6 +652,7 @@ Said another way, _atomic_dec_and_lock() must guarantee that
|
||||
a counter dropping to zero is never made visible before the
|
||||
spinlock being acquired.
|
||||
|
||||
Note that this also means that for the case where the counter
|
||||
is not dropping to zero, there are no memory ordering
|
||||
requirements.
|
||||
.. note::
|
||||
|
||||
Note that this also means that for the case where the counter is not
|
||||
dropping to zero, there are no memory ordering requirements.
|
10
Documentation/core-api/conf.py
Normal file
10
Documentation/core-api/conf.py
Normal file
@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = "Core-API Documentation"
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'core-api.tex', project,
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
310
Documentation/core-api/debug-objects.rst
Normal file
310
Documentation/core-api/debug-objects.rst
Normal file
@ -0,0 +1,310 @@
|
||||
============================================
|
||||
The object-lifetime debugging infrastructure
|
||||
============================================
|
||||
|
||||
:Author: Thomas Gleixner
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
debugobjects is a generic infrastructure to track the life time of
|
||||
kernel objects and validate the operations on those.
|
||||
|
||||
debugobjects is useful to check for the following error patterns:
|
||||
|
||||
- Activation of uninitialized objects
|
||||
|
||||
- Initialization of active objects
|
||||
|
||||
- Usage of freed/destroyed objects
|
||||
|
||||
debugobjects is not changing the data structure of the real object so it
|
||||
can be compiled in with a minimal runtime impact and enabled on demand
|
||||
with a kernel command line option.
|
||||
|
||||
Howto use debugobjects
|
||||
======================
|
||||
|
||||
A kernel subsystem needs to provide a data structure which describes the
|
||||
object type and add calls into the debug code at appropriate places. The
|
||||
data structure to describe the object type needs at minimum the name of
|
||||
the object type. Optional functions can and should be provided to fixup
|
||||
detected problems so the kernel can continue to work and the debug
|
||||
information can be retrieved from a live system instead of hard core
|
||||
debugging with serial consoles and stack trace transcripts from the
|
||||
monitor.
|
||||
|
||||
The debug calls provided by debugobjects are:
|
||||
|
||||
- debug_object_init
|
||||
|
||||
- debug_object_init_on_stack
|
||||
|
||||
- debug_object_activate
|
||||
|
||||
- debug_object_deactivate
|
||||
|
||||
- debug_object_destroy
|
||||
|
||||
- debug_object_free
|
||||
|
||||
- debug_object_assert_init
|
||||
|
||||
Each of these functions takes the address of the real object and a
|
||||
pointer to the object type specific debug description structure.
|
||||
|
||||
Each detected error is reported in the statistics and a limited number
|
||||
of errors are printk'ed including a full stack trace.
|
||||
|
||||
The statistics are available via /sys/kernel/debug/debug_objects/stats.
|
||||
They provide information about the number of warnings and the number of
|
||||
successful fixups along with information about the usage of the internal
|
||||
tracking objects and the state of the internal tracking objects pool.
|
||||
|
||||
Debug functions
|
||||
===============
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_init
|
||||
|
||||
This function is called whenever the initialization function of a real
|
||||
object is called.
|
||||
|
||||
When the real object is already tracked by debugobjects it is checked,
|
||||
whether the object can be initialized. Initializing is not allowed for
|
||||
active and destroyed objects. When debugobjects detects an error, then
|
||||
it calls the fixup_init function of the object type description
|
||||
structure if provided by the caller. The fixup function can correct the
|
||||
problem before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to the
|
||||
subsystem.
|
||||
|
||||
When the real object is not yet tracked by debugobjects, debugobjects
|
||||
allocates a tracker object for the real object and sets the tracker
|
||||
object state to ODEBUG_STATE_INIT. It verifies that the object is not
|
||||
on the callers stack. If it is on the callers stack then a limited
|
||||
number of warnings including a full stack trace is printk'ed. The
|
||||
calling code must use debug_object_init_on_stack() and remove the
|
||||
object before leaving the function which allocated it. See next section.
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_init_on_stack
|
||||
|
||||
This function is called whenever the initialization function of a real
|
||||
object which resides on the stack is called.
|
||||
|
||||
When the real object is already tracked by debugobjects it is checked,
|
||||
whether the object can be initialized. Initializing is not allowed for
|
||||
active and destroyed objects. When debugobjects detects an error, then
|
||||
it calls the fixup_init function of the object type description
|
||||
structure if provided by the caller. The fixup function can correct the
|
||||
problem before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to the
|
||||
subsystem.
|
||||
|
||||
When the real object is not yet tracked by debugobjects debugobjects
|
||||
allocates a tracker object for the real object and sets the tracker
|
||||
object state to ODEBUG_STATE_INIT. It verifies that the object is on
|
||||
the callers stack.
|
||||
|
||||
An object which is on the stack must be removed from the tracker by
|
||||
calling debug_object_free() before the function which allocates the
|
||||
object returns. Otherwise we keep track of stale objects.
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_activate
|
||||
|
||||
This function is called whenever the activation function of a real
|
||||
object is called.
|
||||
|
||||
When the real object is already tracked by debugobjects it is checked,
|
||||
whether the object can be activated. Activating is not allowed for
|
||||
active and destroyed objects. When debugobjects detects an error, then
|
||||
it calls the fixup_activate function of the object type description
|
||||
structure if provided by the caller. The fixup function can correct the
|
||||
problem before the real activation of the object happens. E.g. it can
|
||||
deactivate an active object in order to prevent damage to the subsystem.
|
||||
|
||||
When the real object is not yet tracked by debugobjects then the
|
||||
fixup_activate function is called if available. This is necessary to
|
||||
allow the legitimate activation of statically allocated and initialized
|
||||
objects. The fixup function checks whether the object is valid and calls
|
||||
the debug_objects_init() function to initialize the tracking of this
|
||||
object.
|
||||
|
||||
When the activation is legitimate, then the state of the associated
|
||||
tracker object is set to ODEBUG_STATE_ACTIVE.
|
||||
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_deactivate
|
||||
|
||||
This function is called whenever the deactivation function of a real
|
||||
object is called.
|
||||
|
||||
When the real object is tracked by debugobjects it is checked, whether
|
||||
the object can be deactivated. Deactivating is not allowed for untracked
|
||||
or destroyed objects.
|
||||
|
||||
When the deactivation is legitimate, then the state of the associated
|
||||
tracker object is set to ODEBUG_STATE_INACTIVE.
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_destroy
|
||||
|
||||
This function is called to mark an object destroyed. This is useful to
|
||||
prevent the usage of invalid objects, which are still available in
|
||||
memory: either statically allocated objects or objects which are freed
|
||||
later.
|
||||
|
||||
When the real object is tracked by debugobjects it is checked, whether
|
||||
the object can be destroyed. Destruction is not allowed for active and
|
||||
destroyed objects. When debugobjects detects an error, then it calls the
|
||||
fixup_destroy function of the object type description structure if
|
||||
provided by the caller. The fixup function can correct the problem
|
||||
before the real destruction of the object happens. E.g. it can
|
||||
deactivate an active object in order to prevent damage to the subsystem.
|
||||
|
||||
When the destruction is legitimate, then the state of the associated
|
||||
tracker object is set to ODEBUG_STATE_DESTROYED.
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_free
|
||||
|
||||
This function is called before an object is freed.
|
||||
|
||||
When the real object is tracked by debugobjects it is checked, whether
|
||||
the object can be freed. Free is not allowed for active objects. When
|
||||
debugobjects detects an error, then it calls the fixup_free function of
|
||||
the object type description structure if provided by the caller. The
|
||||
fixup function can correct the problem before the real free of the
|
||||
object happens. E.g. it can deactivate an active object in order to
|
||||
prevent damage to the subsystem.
|
||||
|
||||
Note that debug_object_free removes the object from the tracker. Later
|
||||
usage of the object is detected by the other debug checks.
|
||||
|
||||
|
||||
.. kernel-doc:: lib/debugobjects.c
|
||||
:functions: debug_object_assert_init
|
||||
|
||||
This function is called to assert that an object has been initialized.
|
||||
|
||||
When the real object is not tracked by debugobjects, it calls
|
||||
fixup_assert_init of the object type description structure provided by
|
||||
the caller, with the hardcoded object state ODEBUG_NOT_AVAILABLE. The
|
||||
fixup function can correct the problem by calling debug_object_init
|
||||
and other specific initializing functions.
|
||||
|
||||
When the real object is already tracked by debugobjects it is ignored.
|
||||
|
||||
Fixup functions
|
||||
===============
|
||||
|
||||
Debug object type description structure
|
||||
---------------------------------------
|
||||
|
||||
.. kernel-doc:: include/linux/debugobjects.h
|
||||
:internal:
|
||||
|
||||
fixup_init
|
||||
-----------
|
||||
|
||||
This function is called from the debug code whenever a problem in
|
||||
debug_object_init is detected. The function takes the address of the
|
||||
object and the state which is currently recorded in the tracker.
|
||||
|
||||
Called from debug_object_init when the object state is:
|
||||
|
||||
- ODEBUG_STATE_ACTIVE
|
||||
|
||||
The function returns true when the fixup was successful, otherwise
|
||||
false. The return value is used to update the statistics.
|
||||
|
||||
Note, that the function needs to call the debug_object_init() function
|
||||
again, after the damage has been repaired in order to keep the state
|
||||
consistent.
|
||||
|
||||
fixup_activate
|
||||
---------------
|
||||
|
||||
This function is called from the debug code whenever a problem in
|
||||
debug_object_activate is detected.
|
||||
|
||||
Called from debug_object_activate when the object state is:
|
||||
|
||||
- ODEBUG_STATE_NOTAVAILABLE
|
||||
|
||||
- ODEBUG_STATE_ACTIVE
|
||||
|
||||
The function returns true when the fixup was successful, otherwise
|
||||
false. The return value is used to update the statistics.
|
||||
|
||||
Note that the function needs to call the debug_object_activate()
|
||||
function again after the damage has been repaired in order to keep the
|
||||
state consistent.
|
||||
|
||||
The activation of statically initialized objects is a special case. When
|
||||
debug_object_activate() has no tracked object for this object address
|
||||
then fixup_activate() is called with object state
|
||||
ODEBUG_STATE_NOTAVAILABLE. The fixup function needs to check whether
|
||||
this is a legitimate case of a statically initialized object or not. In
|
||||
case it is it calls debug_object_init() and debug_object_activate()
|
||||
to make the object known to the tracker and marked active. In this case
|
||||
the function should return false because this is not a real fixup.
|
||||
|
||||
fixup_destroy
|
||||
--------------
|
||||
|
||||
This function is called from the debug code whenever a problem in
|
||||
debug_object_destroy is detected.
|
||||
|
||||
Called from debug_object_destroy when the object state is:
|
||||
|
||||
- ODEBUG_STATE_ACTIVE
|
||||
|
||||
The function returns true when the fixup was successful, otherwise
|
||||
false. The return value is used to update the statistics.
|
||||
|
||||
fixup_free
|
||||
-----------
|
||||
|
||||
This function is called from the debug code whenever a problem in
|
||||
debug_object_free is detected. Further it can be called from the debug
|
||||
checks in kfree/vfree, when an active object is detected from the
|
||||
debug_check_no_obj_freed() sanity checks.
|
||||
|
||||
Called from debug_object_free() or debug_check_no_obj_freed() when
|
||||
the object state is:
|
||||
|
||||
- ODEBUG_STATE_ACTIVE
|
||||
|
||||
The function returns true when the fixup was successful, otherwise
|
||||
false. The return value is used to update the statistics.
|
||||
|
||||
fixup_assert_init
|
||||
-------------------
|
||||
|
||||
This function is called from the debug code whenever a problem in
|
||||
debug_object_assert_init is detected.
|
||||
|
||||
Called from debug_object_assert_init() with a hardcoded state
|
||||
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the debug
|
||||
bucket.
|
||||
|
||||
The function returns true when the fixup was successful, otherwise
|
||||
false. The return value is used to update the statistics.
|
||||
|
||||
Note, this function should make sure debug_object_init() is called
|
||||
before returning.
|
||||
|
||||
The handling of statically initialized objects is a special case. The
|
||||
fixup function should check if this is a legitimate case of a statically
|
||||
initialized object or not. In this case only debug_object_init()
|
||||
should be called to make the object known to the tracker. Then the
|
||||
function should return false because this is not a real fixup.
|
||||
|
||||
Known Bugs And Assumptions
|
||||
==========================
|
||||
|
||||
None (knock on wood).
|
33
Documentation/core-api/index.rst
Normal file
33
Documentation/core-api/index.rst
Normal file
@ -0,0 +1,33 @@
|
||||
======================
|
||||
Core API Documentation
|
||||
======================
|
||||
|
||||
This is the beginning of a manual for core kernel APIs. The conversion
|
||||
(and writing!) of documents for this manual is much appreciated!
|
||||
|
||||
Core utilities
|
||||
==============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
assoc_array
|
||||
atomic_ops
|
||||
local_ops
|
||||
workqueue
|
||||
|
||||
Interfaces for kernel debugging
|
||||
===============================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
debug-objects
|
||||
tracepoint
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
206
Documentation/core-api/local_ops.rst
Normal file
206
Documentation/core-api/local_ops.rst
Normal file
@ -0,0 +1,206 @@
|
||||
|
||||
.. _local_ops:
|
||||
|
||||
=================================================
|
||||
Semantics and Behavior of Local Atomic Operations
|
||||
=================================================
|
||||
|
||||
:Author: Mathieu Desnoyers
|
||||
|
||||
|
||||
This document explains the purpose of the local atomic operations, how
|
||||
to implement them for any given architecture and shows how they can be used
|
||||
properly. It also stresses on the precautions that must be taken when reading
|
||||
those local variables across CPUs when the order of memory writes matters.
|
||||
|
||||
.. note::
|
||||
|
||||
Note that ``local_t`` based operations are not recommended for general
|
||||
kernel use. Please use the ``this_cpu`` operations instead unless there is
|
||||
really a special purpose. Most uses of ``local_t`` in the kernel have been
|
||||
replaced by ``this_cpu`` operations. ``this_cpu`` operations combine the
|
||||
relocation with the ``local_t`` like semantics in a single instruction and
|
||||
yield more compact and faster executing code.
|
||||
|
||||
|
||||
Purpose of local atomic operations
|
||||
==================================
|
||||
|
||||
Local atomic operations are meant to provide fast and highly reentrant per CPU
|
||||
counters. They minimize the performance cost of standard atomic operations by
|
||||
removing the LOCK prefix and memory barriers normally required to synchronize
|
||||
across CPUs.
|
||||
|
||||
Having fast per CPU atomic counters is interesting in many cases: it does not
|
||||
require disabling interrupts to protect from interrupt handlers and it permits
|
||||
coherent counters in NMI handlers. It is especially useful for tracing purposes
|
||||
and for various performance monitoring counters.
|
||||
|
||||
Local atomic operations only guarantee variable modification atomicity wrt the
|
||||
CPU which owns the data. Therefore, care must taken to make sure that only one
|
||||
CPU writes to the ``local_t`` data. This is done by using per cpu data and
|
||||
making sure that we modify it from within a preemption safe context. It is
|
||||
however permitted to read ``local_t`` data from any CPU: it will then appear to
|
||||
be written out of order wrt other memory writes by the owner CPU.
|
||||
|
||||
|
||||
Implementation for a given architecture
|
||||
=======================================
|
||||
|
||||
It can be done by slightly modifying the standard atomic operations: only
|
||||
their UP variant must be kept. It typically means removing LOCK prefix (on
|
||||
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
|
||||
not have a different behavior between SMP and UP, including
|
||||
``asm-generic/local.h`` in your architecture's ``local.h`` is sufficient.
|
||||
|
||||
The ``local_t`` type is defined as an opaque ``signed long`` by embedding an
|
||||
``atomic_long_t`` inside a structure. This is made so a cast from this type to
|
||||
a ``long`` fails. The definition looks like::
|
||||
|
||||
typedef struct { atomic_long_t a; } local_t;
|
||||
|
||||
|
||||
Rules to follow when using local atomic operations
|
||||
==================================================
|
||||
|
||||
* Variables touched by local ops must be per cpu variables.
|
||||
* *Only* the CPU owner of these variables must write to them.
|
||||
* This CPU can use local ops from any context (process, irq, softirq, nmi, ...)
|
||||
to update its ``local_t`` variables.
|
||||
* Preemption (or interrupts) must be disabled when using local ops in
|
||||
process context to make sure the process won't be migrated to a
|
||||
different CPU between getting the per-cpu variable and doing the
|
||||
actual local op.
|
||||
* When using local ops in interrupt context, no special care must be
|
||||
taken on a mainline kernel, since they will run on the local CPU with
|
||||
preemption already disabled. I suggest, however, to explicitly
|
||||
disable preemption anyway to make sure it will still work correctly on
|
||||
-rt kernels.
|
||||
* Reading the local cpu variable will provide the current copy of the
|
||||
variable.
|
||||
* Reads of these variables can be done from any CPU, because updates to
|
||||
"``long``", aligned, variables are always atomic. Since no memory
|
||||
synchronization is done by the writer CPU, an outdated copy of the
|
||||
variable can be read when reading some *other* cpu's variables.
|
||||
|
||||
|
||||
How to use local atomic operations
|
||||
==================================
|
||||
|
||||
::
|
||||
|
||||
#include <linux/percpu.h>
|
||||
#include <asm/local.h>
|
||||
|
||||
static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
|
||||
|
||||
|
||||
Counting
|
||||
========
|
||||
|
||||
Counting is done on all the bits of a signed long.
|
||||
|
||||
In preemptible context, use ``get_cpu_var()`` and ``put_cpu_var()`` around
|
||||
local atomic operations: it makes sure that preemption is disabled around write
|
||||
access to the per cpu variable. For instance::
|
||||
|
||||
local_inc(&get_cpu_var(counters));
|
||||
put_cpu_var(counters);
|
||||
|
||||
If you are already in a preemption-safe context, you can use
|
||||
``this_cpu_ptr()`` instead::
|
||||
|
||||
local_inc(this_cpu_ptr(&counters));
|
||||
|
||||
|
||||
|
||||
Reading the counters
|
||||
====================
|
||||
|
||||
Those local counters can be read from foreign CPUs to sum the count. Note that
|
||||
the data seen by local_read across CPUs must be considered to be out of order
|
||||
relatively to other memory writes happening on the CPU that owns the data::
|
||||
|
||||
long sum = 0;
|
||||
for_each_online_cpu(cpu)
|
||||
sum += local_read(&per_cpu(counters, cpu));
|
||||
|
||||
If you want to use a remote local_read to synchronize access to a resource
|
||||
between CPUs, explicit ``smp_wmb()`` and ``smp_rmb()`` memory barriers must be used
|
||||
respectively on the writer and the reader CPUs. It would be the case if you use
|
||||
the ``local_t`` variable as a counter of bytes written in a buffer: there should
|
||||
be a ``smp_wmb()`` between the buffer write and the counter increment and also a
|
||||
``smp_rmb()`` between the counter read and the buffer read.
|
||||
|
||||
|
||||
Here is a sample module which implements a basic per cpu counter using
|
||||
``local.h``::
|
||||
|
||||
/* test-local.c
|
||||
*
|
||||
* Sample module for local.h usage.
|
||||
*/
|
||||
|
||||
|
||||
#include <asm/local.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/timer.h>
|
||||
|
||||
static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);
|
||||
|
||||
static struct timer_list test_timer;
|
||||
|
||||
/* IPI called on each CPU. */
|
||||
static void test_each(void *info)
|
||||
{
|
||||
/* Increment the counter from a non preemptible context */
|
||||
printk("Increment on cpu %d\n", smp_processor_id());
|
||||
local_inc(this_cpu_ptr(&counters));
|
||||
|
||||
/* This is what incrementing the variable would look like within a
|
||||
* preemptible context (it disables preemption) :
|
||||
*
|
||||
* local_inc(&get_cpu_var(counters));
|
||||
* put_cpu_var(counters);
|
||||
*/
|
||||
}
|
||||
|
||||
static void do_test_timer(unsigned long data)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
/* Increment the counters */
|
||||
on_each_cpu(test_each, NULL, 1);
|
||||
/* Read all the counters */
|
||||
printk("Counters read from CPU %d\n", smp_processor_id());
|
||||
for_each_online_cpu(cpu) {
|
||||
printk("Read : CPU %d, count %ld\n", cpu,
|
||||
local_read(&per_cpu(counters, cpu)));
|
||||
}
|
||||
del_timer(&test_timer);
|
||||
test_timer.expires = jiffies + 1000;
|
||||
add_timer(&test_timer);
|
||||
}
|
||||
|
||||
static int __init test_init(void)
|
||||
{
|
||||
/* initialize the timer that will increment the counter */
|
||||
init_timer(&test_timer);
|
||||
test_timer.function = do_test_timer;
|
||||
test_timer.expires = jiffies + 1;
|
||||
add_timer(&test_timer);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void __exit test_exit(void)
|
||||
{
|
||||
del_timer_sync(&test_timer);
|
||||
}
|
||||
|
||||
module_init(test_init);
|
||||
module_exit(test_exit);
|
||||
|
||||
MODULE_LICENSE("GPL");
|
||||
MODULE_AUTHOR("Mathieu Desnoyers");
|
||||
MODULE_DESCRIPTION("Local Atomic Ops");
|
55
Documentation/core-api/tracepoint.rst
Normal file
55
Documentation/core-api/tracepoint.rst
Normal file
@ -0,0 +1,55 @@
|
||||
===============================
|
||||
The Linux Kernel Tracepoint API
|
||||
===============================
|
||||
|
||||
:Author: Jason Baron
|
||||
:Author: William Cohen
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Tracepoints are static probe points that are located in strategic points
|
||||
throughout the kernel. 'Probes' register/unregister with tracepoints via
|
||||
a callback mechanism. The 'probes' are strictly typed functions that are
|
||||
passed a unique set of parameters defined by each tracepoint.
|
||||
|
||||
From this simple callback mechanism, 'probes' can be used to profile,
|
||||
debug, and understand kernel behavior. There are a number of tools that
|
||||
provide a framework for using 'probes'. These tools include Systemtap,
|
||||
ftrace, and LTTng.
|
||||
|
||||
Tracepoints are defined in a number of header files via various macros.
|
||||
Thus, the purpose of this document is to provide a clear accounting of
|
||||
the available tracepoints. The intention is to understand not only what
|
||||
tracepoints are available but also to understand where future
|
||||
tracepoints might be added.
|
||||
|
||||
The API presented has functions of the form:
|
||||
``trace_tracepointname(function parameters)``. These are the tracepoints
|
||||
callbacks that are found throughout the code. Registering and
|
||||
unregistering probes with these callback sites is covered in the
|
||||
``Documentation/trace/*`` directory.
|
||||
|
||||
IRQ
|
||||
===
|
||||
|
||||
.. kernel-doc:: include/trace/events/irq.h
|
||||
:internal:
|
||||
|
||||
SIGNAL
|
||||
======
|
||||
|
||||
.. kernel-doc:: include/trace/events/signal.h
|
||||
:internal:
|
||||
|
||||
Block IO
|
||||
========
|
||||
|
||||
.. kernel-doc:: include/trace/events/block.h
|
||||
:internal:
|
||||
|
||||
Workqueue
|
||||
=========
|
||||
|
||||
.. kernel-doc:: include/trace/events/workqueue.h
|
||||
:internal:
|
@ -1,21 +1,14 @@
|
||||
|
||||
====================================
|
||||
Concurrency Managed Workqueue (cmwq)
|
||||
====================================
|
||||
|
||||
September, 2010 Tejun Heo <tj@kernel.org>
|
||||
Florian Mickler <florian@mickler.org>
|
||||
|
||||
CONTENTS
|
||||
|
||||
1. Introduction
|
||||
2. Why cmwq?
|
||||
3. The Design
|
||||
4. Application Programming Interface (API)
|
||||
5. Example Execution Scenarios
|
||||
6. Guidelines
|
||||
7. Debugging
|
||||
:Date: September, 2010
|
||||
:Author: Tejun Heo <tj@kernel.org>
|
||||
:Author: Florian Mickler <florian@mickler.org>
|
||||
|
||||
|
||||
1. Introduction
|
||||
Introduction
|
||||
============
|
||||
|
||||
There are many cases where an asynchronous process execution context
|
||||
is needed and the workqueue (wq) API is the most commonly used
|
||||
@ -32,7 +25,8 @@ there is no work item left on the workqueue the worker becomes idle.
|
||||
When a new work item gets queued, the worker begins executing again.
|
||||
|
||||
|
||||
2. Why cmwq?
|
||||
Why cmwq?
|
||||
=========
|
||||
|
||||
In the original wq implementation, a multi threaded (MT) wq had one
|
||||
worker thread per CPU and a single threaded (ST) wq had one worker
|
||||
@ -71,7 +65,8 @@ focus on the following goals.
|
||||
the API users don't need to worry about such details.
|
||||
|
||||
|
||||
3. The Design
|
||||
The Design
|
||||
==========
|
||||
|
||||
In order to ease the asynchronous execution of functions a new
|
||||
abstraction, the work item, is introduced.
|
||||
@ -102,7 +97,7 @@ aspects of the way the work items are executed by setting flags on the
|
||||
workqueue they are putting the work item on. These flags include
|
||||
things like CPU locality, concurrency limits, priority and more. To
|
||||
get a detailed overview refer to the API description of
|
||||
alloc_workqueue() below.
|
||||
``alloc_workqueue()`` below.
|
||||
|
||||
When a work item is queued to a workqueue, the target worker-pool is
|
||||
determined according to the queue parameters and workqueue attributes
|
||||
@ -136,7 +131,7 @@ them.
|
||||
|
||||
For unbound workqueues, the number of backing pools is dynamic.
|
||||
Unbound workqueue can be assigned custom attributes using
|
||||
apply_workqueue_attrs() and workqueue will automatically create
|
||||
``apply_workqueue_attrs()`` and workqueue will automatically create
|
||||
backing worker pools matching the attributes. The responsibility of
|
||||
regulating concurrency level is on the users. There is also a flag to
|
||||
mark a bound wq to ignore the concurrency management. Please refer to
|
||||
@ -151,94 +146,95 @@ pressure. Else it is possible that the worker-pool deadlocks waiting
|
||||
for execution contexts to free up.
|
||||
|
||||
|
||||
4. Application Programming Interface (API)
|
||||
Application Programming Interface (API)
|
||||
=======================================
|
||||
|
||||
alloc_workqueue() allocates a wq. The original create_*workqueue()
|
||||
functions are deprecated and scheduled for removal. alloc_workqueue()
|
||||
takes three arguments - @name, @flags and @max_active. @name is the
|
||||
name of the wq and also used as the name of the rescuer thread if
|
||||
there is one.
|
||||
``alloc_workqueue()`` allocates a wq. The original
|
||||
``create_*workqueue()`` functions are deprecated and scheduled for
|
||||
removal. ``alloc_workqueue()`` takes three arguments - @``name``,
|
||||
``@flags`` and ``@max_active``. ``@name`` is the name of the wq and
|
||||
also used as the name of the rescuer thread if there is one.
|
||||
|
||||
A wq no longer manages execution resources but serves as a domain for
|
||||
forward progress guarantee, flush and work item attributes. @flags
|
||||
and @max_active control how work items are assigned execution
|
||||
forward progress guarantee, flush and work item attributes. ``@flags``
|
||||
and ``@max_active`` control how work items are assigned execution
|
||||
resources, scheduled and executed.
|
||||
|
||||
@flags:
|
||||
|
||||
WQ_UNBOUND
|
||||
``flags``
|
||||
---------
|
||||
|
||||
Work items queued to an unbound wq are served by the special
|
||||
worker-pools which host workers which are not bound to any
|
||||
specific CPU. This makes the wq behave as a simple execution
|
||||
context provider without concurrency management. The unbound
|
||||
worker-pools try to start execution of work items as soon as
|
||||
possible. Unbound wq sacrifices locality but is useful for
|
||||
the following cases.
|
||||
``WQ_UNBOUND``
|
||||
Work items queued to an unbound wq are served by the special
|
||||
worker-pools which host workers which are not bound to any
|
||||
specific CPU. This makes the wq behave as a simple execution
|
||||
context provider without concurrency management. The unbound
|
||||
worker-pools try to start execution of work items as soon as
|
||||
possible. Unbound wq sacrifices locality but is useful for
|
||||
the following cases.
|
||||
|
||||
* Wide fluctuation in the concurrency level requirement is
|
||||
expected and using bound wq may end up creating large number
|
||||
of mostly unused workers across different CPUs as the issuer
|
||||
hops through different CPUs.
|
||||
* Wide fluctuation in the concurrency level requirement is
|
||||
expected and using bound wq may end up creating large number
|
||||
of mostly unused workers across different CPUs as the issuer
|
||||
hops through different CPUs.
|
||||
|
||||
* Long running CPU intensive workloads which can be better
|
||||
managed by the system scheduler.
|
||||
* Long running CPU intensive workloads which can be better
|
||||
managed by the system scheduler.
|
||||
|
||||
WQ_FREEZABLE
|
||||
``WQ_FREEZABLE``
|
||||
A freezable wq participates in the freeze phase of the system
|
||||
suspend operations. Work items on the wq are drained and no
|
||||
new work item starts execution until thawed.
|
||||
|
||||
A freezable wq participates in the freeze phase of the system
|
||||
suspend operations. Work items on the wq are drained and no
|
||||
new work item starts execution until thawed.
|
||||
``WQ_MEM_RECLAIM``
|
||||
All wq which might be used in the memory reclaim paths **MUST**
|
||||
have this flag set. The wq is guaranteed to have at least one
|
||||
execution context regardless of memory pressure.
|
||||
|
||||
WQ_MEM_RECLAIM
|
||||
``WQ_HIGHPRI``
|
||||
Work items of a highpri wq are queued to the highpri
|
||||
worker-pool of the target cpu. Highpri worker-pools are
|
||||
served by worker threads with elevated nice level.
|
||||
|
||||
All wq which might be used in the memory reclaim paths _MUST_
|
||||
have this flag set. The wq is guaranteed to have at least one
|
||||
execution context regardless of memory pressure.
|
||||
Note that normal and highpri worker-pools don't interact with
|
||||
each other. Each maintain its separate pool of workers and
|
||||
implements concurrency management among its workers.
|
||||
|
||||
WQ_HIGHPRI
|
||||
``WQ_CPU_INTENSIVE``
|
||||
Work items of a CPU intensive wq do not contribute to the
|
||||
concurrency level. In other words, runnable CPU intensive
|
||||
work items will not prevent other work items in the same
|
||||
worker-pool from starting execution. This is useful for bound
|
||||
work items which are expected to hog CPU cycles so that their
|
||||
execution is regulated by the system scheduler.
|
||||
|
||||
Work items of a highpri wq are queued to the highpri
|
||||
worker-pool of the target cpu. Highpri worker-pools are
|
||||
served by worker threads with elevated nice level.
|
||||
Although CPU intensive work items don't contribute to the
|
||||
concurrency level, start of their executions is still
|
||||
regulated by the concurrency management and runnable
|
||||
non-CPU-intensive work items can delay execution of CPU
|
||||
intensive work items.
|
||||
|
||||
Note that normal and highpri worker-pools don't interact with
|
||||
each other. Each maintain its separate pool of workers and
|
||||
implements concurrency management among its workers.
|
||||
This flag is meaningless for unbound wq.
|
||||
|
||||
WQ_CPU_INTENSIVE
|
||||
Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
|
||||
workqueues are now non-reentrant - any work item is guaranteed to be
|
||||
executed by at most one worker system-wide at any given time.
|
||||
|
||||
Work items of a CPU intensive wq do not contribute to the
|
||||
concurrency level. In other words, runnable CPU intensive
|
||||
work items will not prevent other work items in the same
|
||||
worker-pool from starting execution. This is useful for bound
|
||||
work items which are expected to hog CPU cycles so that their
|
||||
execution is regulated by the system scheduler.
|
||||
|
||||
Although CPU intensive work items don't contribute to the
|
||||
concurrency level, start of their executions is still
|
||||
regulated by the concurrency management and runnable
|
||||
non-CPU-intensive work items can delay execution of CPU
|
||||
intensive work items.
|
||||
``max_active``
|
||||
--------------
|
||||
|
||||
This flag is meaningless for unbound wq.
|
||||
|
||||
Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues
|
||||
are now non-reentrant - any work item is guaranteed to be executed by
|
||||
at most one worker system-wide at any given time.
|
||||
|
||||
@max_active:
|
||||
|
||||
@max_active determines the maximum number of execution contexts per
|
||||
CPU which can be assigned to the work items of a wq. For example,
|
||||
with @max_active of 16, at most 16 work items of the wq can be
|
||||
``@max_active`` determines the maximum number of execution contexts
|
||||
per CPU which can be assigned to the work items of a wq. For example,
|
||||
with ``@max_active`` of 16, at most 16 work items of the wq can be
|
||||
executing at the same time per CPU.
|
||||
|
||||
Currently, for a bound wq, the maximum limit for @max_active is 512
|
||||
and the default value used when 0 is specified is 256. For an unbound
|
||||
wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
|
||||
values are chosen sufficiently high such that they are not the
|
||||
limiting factor while providing protection in runaway cases.
|
||||
Currently, for a bound wq, the maximum limit for ``@max_active`` is
|
||||
512 and the default value used when 0 is specified is 256. For an
|
||||
unbound wq, the limit is higher of 512 and 4 *
|
||||
``num_possible_cpus()``. These values are chosen sufficiently high
|
||||
such that they are not the limiting factor while providing protection
|
||||
in runaway cases.
|
||||
|
||||
The number of active work items of a wq is usually regulated by the
|
||||
users of the wq, more specifically, by how many work items the users
|
||||
@ -247,13 +243,14 @@ throttling the number of active work items, specifying '0' is
|
||||
recommended.
|
||||
|
||||
Some users depend on the strict execution ordering of ST wq. The
|
||||
combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
|
||||
behavior. Work items on such wq are always queued to the unbound
|
||||
worker-pools and only one work item can be active at any given time thus
|
||||
achieving the same ordering property as ST wq.
|
||||
combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` is used to
|
||||
achieve this behavior. Work items on such wq are always queued to the
|
||||
unbound worker-pools and only one work item can be active at any given
|
||||
time thus achieving the same ordering property as ST wq.
|
||||
|
||||
|
||||
5. Example Execution Scenarios
|
||||
Example Execution Scenarios
|
||||
===========================
|
||||
|
||||
The following example execution scenarios try to illustrate how cmwq
|
||||
behave under different configurations.
|
||||
@ -265,7 +262,7 @@ behave under different configurations.
|
||||
|
||||
Ignoring all other tasks, works and processing overhead, and assuming
|
||||
simple FIFO scheduling, the following is one highly simplified version
|
||||
of possible sequences of events with the original wq.
|
||||
of possible sequences of events with the original wq. ::
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
@ -279,7 +276,7 @@ of possible sequences of events with the original wq.
|
||||
40 w2 sleeps
|
||||
50 w2 wakes up and finishes
|
||||
|
||||
And with cmwq with @max_active >= 3,
|
||||
And with cmwq with ``@max_active`` >= 3, ::
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
@ -293,7 +290,7 @@ And with cmwq with @max_active >= 3,
|
||||
20 w1 wakes up and finishes
|
||||
25 w2 wakes up and finishes
|
||||
|
||||
If @max_active == 2,
|
||||
If ``@max_active`` == 2, ::
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
@ -308,7 +305,7 @@ If @max_active == 2,
|
||||
35 w2 wakes up and finishes
|
||||
|
||||
Now, let's assume w1 and w2 are queued to a different wq q1 which has
|
||||
WQ_CPU_INTENSIVE set,
|
||||
``WQ_CPU_INTENSIVE`` set, ::
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
@ -322,13 +319,15 @@ WQ_CPU_INTENSIVE set,
|
||||
25 w2 wakes up and finishes
|
||||
|
||||
|
||||
6. Guidelines
|
||||
Guidelines
|
||||
==========
|
||||
|
||||
* Do not forget to use WQ_MEM_RECLAIM if a wq may process work items
|
||||
which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM
|
||||
set has an execution context reserved for it. If there is
|
||||
dependency among multiple work items used during memory reclaim,
|
||||
they should be queued to separate wq each with WQ_MEM_RECLAIM.
|
||||
* Do not forget to use ``WQ_MEM_RECLAIM`` if a wq may process work
|
||||
items which are used during memory reclaim. Each wq with
|
||||
``WQ_MEM_RECLAIM`` set has an execution context reserved for it. If
|
||||
there is dependency among multiple work items used during memory
|
||||
reclaim, they should be queued to separate wq each with
|
||||
``WQ_MEM_RECLAIM``.
|
||||
|
||||
* Unless strict ordering is required, there is no need to use ST wq.
|
||||
|
||||
@ -337,30 +336,31 @@ WQ_CPU_INTENSIVE set,
|
||||
well under the default limit.
|
||||
|
||||
* A wq serves as a domain for forward progress guarantee
|
||||
(WQ_MEM_RECLAIM, flush and work item attributes. Work items which
|
||||
are not involved in memory reclaim and don't need to be flushed as a
|
||||
part of a group of work items, and don't require any special
|
||||
attribute, can use one of the system wq. There is no difference in
|
||||
execution characteristics between using a dedicated wq and a system
|
||||
wq.
|
||||
(``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
|
||||
which are not involved in memory reclaim and don't need to be
|
||||
flushed as a part of a group of work items, and don't require any
|
||||
special attribute, can use one of the system wq. There is no
|
||||
difference in execution characteristics between using a dedicated wq
|
||||
and a system wq.
|
||||
|
||||
* Unless work items are expected to consume a huge amount of CPU
|
||||
cycles, using a bound wq is usually beneficial due to the increased
|
||||
level of locality in wq operations and work item execution.
|
||||
|
||||
|
||||
7. Debugging
|
||||
Debugging
|
||||
=========
|
||||
|
||||
Because the work functions are executed by generic worker threads
|
||||
there are a few tricks needed to shed some light on misbehaving
|
||||
workqueue users.
|
||||
|
||||
Worker threads show up in the process list as:
|
||||
Worker threads show up in the process list as: ::
|
||||
|
||||
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
|
||||
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
|
||||
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
|
||||
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
|
||||
root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
|
||||
root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
|
||||
root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
|
||||
root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
|
||||
|
||||
If kworkers are going crazy (using too much cpu), there are two types
|
||||
of possible problems:
|
||||
@ -368,7 +368,7 @@ of possible problems:
|
||||
1. Something being scheduled in rapid succession
|
||||
2. A single work item that consumes lots of cpu cycles
|
||||
|
||||
The first one can be tracked using tracing:
|
||||
The first one can be tracked using tracing: ::
|
||||
|
||||
$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
|
||||
@ -380,9 +380,15 @@ the output and the offender can be determined with the work item
|
||||
function.
|
||||
|
||||
For the second type of problems it should be possible to just check
|
||||
the stack trace of the offending worker thread.
|
||||
the stack trace of the offending worker thread. ::
|
||||
|
||||
$ cat /proc/THE_OFFENDING_KWORKER/stack
|
||||
|
||||
The work item's function should be trivially visible in the stack
|
||||
trace.
|
||||
|
||||
|
||||
Kernel Inline Documentations Reference
|
||||
======================================
|
||||
|
||||
.. kernel-doc:: include/linux/workqueue.h
|
@ -84,9 +84,9 @@ are added or removed anytime. Trimming it accurately for your system needs
|
||||
upfront can save some boot time memory. See below for how we use heuristics
|
||||
in x86_64 case to keep this under check.
|
||||
|
||||
cpu_online_mask: Bitmap of all CPUs currently online. Its set in __cpu_up()
|
||||
after a cpu is available for kernel scheduling and ready to receive
|
||||
interrupts from devices. Its cleared when a cpu is brought down using
|
||||
cpu_online_mask: Bitmap of all CPUs currently online. It's set in __cpu_up()
|
||||
after a CPU is available for kernel scheduling and ready to receive
|
||||
interrupts from devices. It's cleared when a CPU is brought down using
|
||||
__cpu_disable(), before which all OS services including interrupts are
|
||||
migrated to another target CPU.
|
||||
|
||||
@ -181,7 +181,7 @@ To support physical addition/removal, one would need some BIOS hooks and
|
||||
the platform should have something like an attention button in PCI hotplug.
|
||||
CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
|
||||
|
||||
Q: How do i logically offline a CPU?
|
||||
Q: How do I logically offline a CPU?
|
||||
A: Do the following.
|
||||
|
||||
#echo 0 > /sys/devices/system/cpu/cpuX/online
|
||||
@ -191,15 +191,15 @@ Once the logical offline is successful, check
|
||||
#cat /proc/interrupts
|
||||
|
||||
You should now not see the CPU that you removed. Also online file will report
|
||||
the state as 0 when a cpu if offline and 1 when its online.
|
||||
the state as 0 when a CPU is offline and 1 when it's online.
|
||||
|
||||
#To display the current cpu state.
|
||||
#cat /sys/devices/system/cpu/cpuX/online
|
||||
|
||||
Q: Why can't i remove CPU0 on some systems?
|
||||
Q: Why can't I remove CPU0 on some systems?
|
||||
A: Some architectures may have some special dependency on a certain CPU.
|
||||
|
||||
For e.g in IA64 platforms we have ability to sent platform interrupts to the
|
||||
For e.g in IA64 platforms we have ability to send platform interrupts to the
|
||||
OS. a.k.a Corrected Platform Error Interrupts (CPEI). In current ACPI
|
||||
specifications, we didn't have a way to change the target CPU. Hence if the
|
||||
current ACPI version doesn't support such re-direction, we disable that CPU
|
||||
@ -231,7 +231,7 @@ either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug.
|
||||
|
||||
--Fenghua Yu <fenghua.yu@intel.com>
|
||||
|
||||
Q: How do i find out if a particular CPU is not removable?
|
||||
Q: How do I find out if a particular CPU is not removable?
|
||||
A: Depending on the implementation, some architectures may show this by the
|
||||
absence of the "online" file. This is done if it can be determined ahead of
|
||||
time that this CPU cannot be removed.
|
||||
@ -250,7 +250,7 @@ A: The following happen, listed in no particular order :-)
|
||||
- All processes are migrated away from this outgoing CPU to new CPUs.
|
||||
The new CPU is chosen from each process' current cpuset, which may be
|
||||
a subset of all online CPUs.
|
||||
- All interrupts targeted to this CPU is migrated to a new CPU
|
||||
- All interrupts targeted to this CPU are migrated to a new CPU
|
||||
- timers/bottom half/task lets are also migrated to a new CPU
|
||||
- Once all services are migrated, kernel calls an arch specific routine
|
||||
__cpu_disable() to perform arch specific cleanup.
|
||||
@ -259,10 +259,10 @@ A: The following happen, listed in no particular order :-)
|
||||
CPU is being offlined).
|
||||
|
||||
"It is expected that each service cleans up when the CPU_DOWN_PREPARE
|
||||
notifier is called, when CPU_DEAD is called its expected there is nothing
|
||||
notifier is called, when CPU_DEAD is called it's expected there is nothing
|
||||
running on behalf of this CPU that was offlined"
|
||||
|
||||
Q: If i have some kernel code that needs to be aware of CPU arrival and
|
||||
Q: If I have some kernel code that needs to be aware of CPU arrival and
|
||||
departure, how to i arrange for proper notification?
|
||||
A: This is what you would need in your kernel code to receive notifications.
|
||||
|
||||
@ -311,7 +311,7 @@ things will happen if a notifier in path sent a BAD notify code.
|
||||
|
||||
Q: I don't see my action being called for all CPUs already up and running?
|
||||
A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
|
||||
If you need to perform some action for each cpu already in the system, then
|
||||
If you need to perform some action for each CPU already in the system, then
|
||||
do this:
|
||||
|
||||
for_each_online_cpu(i) {
|
||||
@ -363,8 +363,8 @@ A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
|
||||
callbacks as well as initialize the already online CPUs.
|
||||
|
||||
|
||||
Q: If i would like to develop cpu hotplug support for a new architecture,
|
||||
what do i need at a minimum?
|
||||
Q: If I would like to develop CPU hotplug support for a new architecture,
|
||||
what do I need at a minimum?
|
||||
A: The following are what is required for CPU hotplug infrastructure to work
|
||||
correctly.
|
||||
|
||||
@ -382,8 +382,8 @@ A: The following are what is required for CPU hotplug infrastructure to work
|
||||
per_cpu state to be set, to ensure the processor
|
||||
dead routine is called to be sure positively.
|
||||
|
||||
Q: I need to ensure that a particular cpu is not removed when there is some
|
||||
work specific to this cpu is in progress.
|
||||
Q: I need to ensure that a particular CPU is not removed when there is some
|
||||
work specific to this CPU in progress.
|
||||
A: There are two ways. If your code can be run in interrupt context, use
|
||||
smp_call_function_single(), otherwise use work_on_cpu(). Note that
|
||||
work_on_cpu() is slow, and can fail due to out of memory:
|
||||
|
10
Documentation/dev-tools/conf.py
Normal file
10
Documentation/dev-tools/conf.py
Normal file
@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = "Development tools for the kernel"
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'dev-tools.tex', project,
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
@ -201,7 +201,9 @@ Appendix A: gather_on_build.sh
|
||||
------------------------------
|
||||
|
||||
Sample script to gather coverage meta files on the build machine
|
||||
(see 6a)::
|
||||
(see 6a):
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
@ -232,7 +234,9 @@ Appendix B: gather_on_test.sh
|
||||
-----------------------------
|
||||
|
||||
Sample script to gather coverage data files on the test machine
|
||||
(see 6b)::
|
||||
(see 6b):
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/bash -e
|
||||
|
||||
|
@ -23,3 +23,11 @@ whole; patches welcome!
|
||||
kmemleak
|
||||
kmemcheck
|
||||
gdb-kernel-debugging
|
||||
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@ -24,7 +24,9 @@ Profiling data will only become accessible once debugfs has been mounted::
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
The following program demonstrates kcov usage from within a test program::
|
||||
The following program demonstrates kcov usage from within a test program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stddef.h>
|
||||
|
@ -1,9 +0,0 @@
|
||||
Linux Kernel Development Documentation
|
||||
======================================
|
||||
|
||||
Contents:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
development-process
|
@ -17,7 +17,7 @@ The target is named "raid" and it accepts the following parameters:
|
||||
raid0 RAID0 striping (no resilience)
|
||||
raid1 RAID1 mirroring
|
||||
raid4 RAID4 with dedicated last parity disk
|
||||
raid5_n RAID5 with dedicated last parity disk suporting takeover
|
||||
raid5_n RAID5 with dedicated last parity disk supporting takeover
|
||||
Same as raid4
|
||||
-Transitory layout
|
||||
raid5_la RAID5 left asymmetric
|
||||
@ -36,7 +36,7 @@ The target is named "raid" and it accepts the following parameters:
|
||||
- rotating parity N (right-to-left) with data continuation
|
||||
raid6_n_6 RAID6 with dedicate parity disks
|
||||
- parity and Q-syndrome on the last 2 disks;
|
||||
laylout for takeover from/to raid4/raid5_n
|
||||
layout for takeover from/to raid4/raid5_n
|
||||
raid6_la_6 Same as "raid_la" plus dedicated last Q-syndrome disk
|
||||
- layout for takeover from raid5_la from/to raid6
|
||||
raid6_ra_6 Same as "raid5_ra" dedicated last Q-syndrome disk
|
||||
@ -137,8 +137,8 @@ The target is named "raid" and it accepts the following parameters:
|
||||
device removal (negative value) or device addition (positive
|
||||
value) to any reshape supporting raid levels 4/5/6 and 10.
|
||||
RAID levels 4/5/6 allow for addition of devices (metadata
|
||||
and data device tupel), raid10_near and raid10_offset only
|
||||
allow for device addtion. raid10_far does not support any
|
||||
and data device tuple), raid10_near and raid10_offset only
|
||||
allow for device addition. raid10_far does not support any
|
||||
reshaping at all.
|
||||
A minimum of devices have to be kept to enforce resilience,
|
||||
which is 3 devices for raid4/5 and 4 devices for raid6.
|
||||
|
@ -1,7 +1,7 @@
|
||||
* Maxim DS3231 Real Time Clock
|
||||
|
||||
Required properties:
|
||||
see: Documentation/devicetree/bindings/i2c/trivial-devices.txt
|
||||
see: Documentation/devicetree/bindings/i2c/trivial-admin-guide/devices.rst
|
||||
|
||||
Optional property:
|
||||
- #clock-cells: Should be 1.
|
||||
|
@ -3,7 +3,7 @@
|
||||
Philips PCF8563/Epson RTC8564 Real Time Clock
|
||||
|
||||
Required properties:
|
||||
see: Documentation/devicetree/bindings/i2c/trivial-devices.txt
|
||||
see: Documentation/devicetree/bindings/i2c/trivial-admin-guide/devices.rst
|
||||
|
||||
Optional property:
|
||||
- #clock-cells: Should be 0.
|
||||
|
@ -3,7 +3,7 @@
|
||||
|
||||
I. For patch submitters
|
||||
|
||||
0) Normal patch submission rules from Documentation/SubmittingPatches
|
||||
0) Normal patch submission rules from Documentation/process/submitting-patches.rst
|
||||
applies.
|
||||
|
||||
1) The Documentation/ portion of the patch should be a separate patch.
|
||||
|
10
Documentation/doc-guide/conf.py
Normal file
10
Documentation/doc-guide/conf.py
Normal file
@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = 'Linux Kernel Documentation Guide'
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
90
Documentation/doc-guide/docbook.rst
Normal file
90
Documentation/doc-guide/docbook.rst
Normal file
@ -0,0 +1,90 @@
|
||||
DocBook XML [DEPRECATED]
|
||||
========================
|
||||
|
||||
.. attention::
|
||||
|
||||
This section describes the deprecated DocBook XML toolchain. Please do not
|
||||
create new DocBook XML template files. Please consider converting existing
|
||||
DocBook XML templates files to Sphinx/reStructuredText.
|
||||
|
||||
Converting DocBook to Sphinx
|
||||
----------------------------
|
||||
|
||||
Over time, we expect all of the documents under ``Documentation/DocBook`` to be
|
||||
converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
|
||||
enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
|
||||
which uses ``pandoc`` under the hood. For example::
|
||||
|
||||
$ cd Documentation/sphinx
|
||||
$ ./tmplcvt ../DocBook/in.tmpl ../out.rst
|
||||
|
||||
Then edit the resulting rst files to fix any remaining issues, and add the
|
||||
document in the ``toctree`` in ``Documentation/index.rst``.
|
||||
|
||||
Components of the kernel-doc system
|
||||
-----------------------------------
|
||||
|
||||
Many places in the source tree have extractable documentation in the form of
|
||||
block comments above functions. The components of this system are:
|
||||
|
||||
- ``scripts/kernel-doc``
|
||||
|
||||
This is a perl script that hunts for the block comments and can mark them up
|
||||
directly into reStructuredText, DocBook, man, text, and HTML. (No, not
|
||||
texinfo.)
|
||||
|
||||
- ``Documentation/DocBook/*.tmpl``
|
||||
|
||||
These are XML template files, which are normal XML files with special
|
||||
place-holders for where the extracted documentation should go.
|
||||
|
||||
- ``scripts/docproc.c``
|
||||
|
||||
This is a program for converting XML template files into XML files. When a
|
||||
file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
|
||||
able to distinguish between internal and external functions.
|
||||
|
||||
It invokes kernel-doc, giving it the list of functions that are to be
|
||||
documented.
|
||||
|
||||
Additionally it is used to scan the XML template files to locate all the files
|
||||
referenced herein. This is used to generate dependency information as used by
|
||||
make.
|
||||
|
||||
- ``Makefile``
|
||||
|
||||
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
|
||||
DocBook XML files, PostScript files, PDF files, and html files in
|
||||
Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
|
||||
|
||||
- ``Documentation/DocBook/Makefile``
|
||||
|
||||
This is where C files are associated with SGML templates.
|
||||
|
||||
How to use kernel-doc comments in DocBook XML template files
|
||||
------------------------------------------------------------
|
||||
|
||||
DocBook XML template files (\*.tmpl) are like normal XML files, except that they
|
||||
can contain escape sequences where extracted documentation should be inserted.
|
||||
|
||||
``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
|
||||
functions that are exported using ``EXPORT_SYMBOL``: the function list is
|
||||
collected from files listed in ``Documentation/DocBook/Makefile``.
|
||||
|
||||
``!I<filename>`` is replaced by the documentation for functions that are **not**
|
||||
exported using ``EXPORT_SYMBOL``.
|
||||
|
||||
``!D<filename>`` is used to name additional files to search for functions
|
||||
exported using ``EXPORT_SYMBOL``.
|
||||
|
||||
``!F<filename> <function [functions...]>`` is replaced by the documentation, in
|
||||
``<filename>``, for the functions listed.
|
||||
|
||||
``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
|
||||
section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
|
||||
``<section title>``; do not quote the ``<section title>``.
|
||||
|
||||
``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
|
||||
sections and documented functions, symbols, etc. are used. This makes sense to
|
||||
use when you use ``!F`` or ``!P`` only and want to verify that all documentation
|
||||
is included.
|
20
Documentation/doc-guide/index.rst
Normal file
20
Documentation/doc-guide/index.rst
Normal file
@ -0,0 +1,20 @@
|
||||
.. _doc_guide:
|
||||
|
||||
=================================
|
||||
How to write kernel documentation
|
||||
=================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
sphinx.rst
|
||||
kernel-doc.rst
|
||||
parse-headers.rst
|
||||
docbook.rst
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@ -1,228 +1,3 @@
|
||||
==========================
|
||||
Linux Kernel Documentation
|
||||
==========================
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The Linux kernel uses `Sphinx`_ to generate pretty documentation from
|
||||
`reStructuredText`_ files under ``Documentation``. To build the documentation in
|
||||
HTML or PDF formats, use ``make htmldocs`` or ``make pdfdocs``. The generated
|
||||
documentation is placed in ``Documentation/output``.
|
||||
|
||||
.. _Sphinx: http://www.sphinx-doc.org/
|
||||
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
|
||||
|
||||
The reStructuredText files may contain directives to include structured
|
||||
documentation comments, or kernel-doc comments, from source files. Usually these
|
||||
are used to describe the functions and types and design of the code. The
|
||||
kernel-doc comments have some special structure and formatting, but beyond that
|
||||
they are also treated as reStructuredText.
|
||||
|
||||
There is also the deprecated DocBook toolchain to generate documentation from
|
||||
DocBook XML template files under ``Documentation/DocBook``. The DocBook files
|
||||
are to be converted to reStructuredText, and the toolchain is slated to be
|
||||
removed.
|
||||
|
||||
Finally, there are thousands of plain text documentation files scattered around
|
||||
``Documentation``. Some of these will likely be converted to reStructuredText
|
||||
over time, but the bulk of them will remain in plain text.
|
||||
|
||||
Sphinx Build
|
||||
============
|
||||
|
||||
The usual way to generate the documentation is to run ``make htmldocs`` or
|
||||
``make pdfdocs``. There are also other formats available, see the documentation
|
||||
section of ``make help``. The generated documentation is placed in
|
||||
format-specific subdirectories under ``Documentation/output``.
|
||||
|
||||
To generate documentation, Sphinx (``sphinx-build``) must obviously be
|
||||
installed. For prettier HTML output, the Read the Docs Sphinx theme
|
||||
(``sphinx_rtd_theme``) is used if available. For PDF output, ``rst2pdf`` is also
|
||||
needed. All of these are widely available and packaged in distributions.
|
||||
|
||||
To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
|
||||
variable. For example, use ``make SPHINXOPTS=-v htmldocs`` to get more verbose
|
||||
output.
|
||||
|
||||
To remove the generated documentation, run ``make cleandocs``.
|
||||
|
||||
Writing Documentation
|
||||
=====================
|
||||
|
||||
Adding new documentation can be as simple as:
|
||||
|
||||
1. Add a new ``.rst`` file somewhere under ``Documentation``.
|
||||
2. Refer to it from the Sphinx main `TOC tree`_ in ``Documentation/index.rst``.
|
||||
|
||||
.. _TOC tree: http://www.sphinx-doc.org/en/stable/markup/toctree.html
|
||||
|
||||
This is usually good enough for simple documentation (like the one you're
|
||||
reading right now), but for larger documents it may be advisable to create a
|
||||
subdirectory (or use an existing one). For example, the graphics subsystem
|
||||
documentation is under ``Documentation/gpu``, split to several ``.rst`` files,
|
||||
and has a separate ``index.rst`` (with a ``toctree`` of its own) referenced from
|
||||
the main index.
|
||||
|
||||
See the documentation for `Sphinx`_ and `reStructuredText`_ on what you can do
|
||||
with them. In particular, the Sphinx `reStructuredText Primer`_ is a good place
|
||||
to get started with reStructuredText. There are also some `Sphinx specific
|
||||
markup constructs`_.
|
||||
|
||||
.. _reStructuredText Primer: http://www.sphinx-doc.org/en/stable/rest.html
|
||||
.. _Sphinx specific markup constructs: http://www.sphinx-doc.org/en/stable/markup/index.html
|
||||
|
||||
Specific guidelines for the kernel documentation
|
||||
------------------------------------------------
|
||||
|
||||
Here are some specific guidelines for the kernel documentation:
|
||||
|
||||
* Please don't go overboard with reStructuredText markup. Keep it simple.
|
||||
|
||||
* Please stick to this order of heading adornments:
|
||||
|
||||
1. ``=`` with overline for document title::
|
||||
|
||||
==============
|
||||
Document title
|
||||
==============
|
||||
|
||||
2. ``=`` for chapters::
|
||||
|
||||
Chapters
|
||||
========
|
||||
|
||||
3. ``-`` for sections::
|
||||
|
||||
Section
|
||||
-------
|
||||
|
||||
4. ``~`` for subsections::
|
||||
|
||||
Subsection
|
||||
~~~~~~~~~~
|
||||
|
||||
Although RST doesn't mandate a specific order ("Rather than imposing a fixed
|
||||
number and order of section title adornment styles, the order enforced will be
|
||||
the order as encountered."), having the higher levels the same overall makes
|
||||
it easier to follow the documents.
|
||||
|
||||
|
||||
the C domain
|
||||
------------
|
||||
|
||||
The `Sphinx C Domain`_ (name c) is suited for documentation of C API. E.g. a
|
||||
function prototype:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. c:function:: int ioctl( int fd, int request )
|
||||
|
||||
The C domain of the kernel-doc has some additional features. E.g. you can
|
||||
*rename* the reference name of a function with a common name like ``open`` or
|
||||
``ioctl``:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. c:function:: int ioctl( int fd, int request )
|
||||
:name: VIDIOC_LOG_STATUS
|
||||
|
||||
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
|
||||
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
|
||||
changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
:c:func:`VIDIOC_LOG_STATUS`
|
||||
|
||||
|
||||
list tables
|
||||
-----------
|
||||
|
||||
We recommend the use of *list table* formats. The *list table* formats are
|
||||
double-stage lists. Compared to the ASCII-art they might not be as
|
||||
comfortable for
|
||||
readers of the text files. Their advantage is that they are easy to
|
||||
create or modify and that the diff of a modification is much more meaningful,
|
||||
because it is limited to the modified content.
|
||||
|
||||
The ``flat-table`` is a double-stage list similar to the ``list-table`` with
|
||||
some additional features:
|
||||
|
||||
* column-span: with the role ``cspan`` a cell can be extended through
|
||||
additional columns
|
||||
|
||||
* row-span: with the role ``rspan`` a cell can be extended through
|
||||
additional rows
|
||||
|
||||
* auto span rightmost cell of a table row over the missing cells on the right
|
||||
side of that table-row. With Option ``:fill-cells:`` this behavior can
|
||||
changed from *auto span* to *auto fill*, which automatically inserts (empty)
|
||||
cells instead of spanning the last cell.
|
||||
|
||||
options:
|
||||
|
||||
* ``:header-rows:`` [int] count of header rows
|
||||
* ``:stub-columns:`` [int] count of stub columns
|
||||
* ``:widths:`` [[int] [int] ... ] widths of columns
|
||||
* ``:fill-cells:`` instead of auto-spanning missing cells, insert missing cells
|
||||
|
||||
roles:
|
||||
|
||||
* ``:cspan:`` [int] additional columns (*morecols*)
|
||||
* ``:rspan:`` [int] additional rows (*morerows*)
|
||||
|
||||
The example below shows how to use this markup. The first level of the staged
|
||||
list is the *table-row*. In the *table-row* there is only one markup allowed,
|
||||
the list of the cells in this *table-row*. Exceptions are *comments* ( ``..`` )
|
||||
and *targets* (e.g. a ref to ``:ref:`last row <last row>``` / :ref:`last row
|
||||
<last row>`).
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. flat-table:: table title
|
||||
:widths: 2 1 1 3
|
||||
|
||||
* - head col 1
|
||||
- head col 2
|
||||
- head col 3
|
||||
- head col 4
|
||||
|
||||
* - column 1
|
||||
- field 1.1
|
||||
- field 1.2 with autospan
|
||||
|
||||
* - column 2
|
||||
- field 2.1
|
||||
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
|
||||
|
||||
* .. _`last row`:
|
||||
|
||||
- column 3
|
||||
|
||||
Rendered as:
|
||||
|
||||
.. flat-table:: table title
|
||||
:widths: 2 1 1 3
|
||||
|
||||
* - head col 1
|
||||
- head col 2
|
||||
- head col 3
|
||||
- head col 4
|
||||
|
||||
* - column 1
|
||||
- field 1.1
|
||||
- field 1.2 with autospan
|
||||
|
||||
* - column 2
|
||||
- field 2.1
|
||||
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
|
||||
|
||||
* .. _`last row`:
|
||||
|
||||
- column 3
|
||||
|
||||
|
||||
Including kernel-doc comments
|
||||
=============================
|
||||
|
||||
@ -484,7 +259,10 @@ span multiple lines. The continuation lines may contain indentation.
|
||||
In-line member documentation comments
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The structure members may also be documented in-line within the definition::
|
||||
The structure members may also be documented in-line within the definition.
|
||||
There are two styles, single-line comments where both the opening ``/**`` and
|
||||
closing ``*/`` are on the same line, and multi-line comments where they are each
|
||||
on a line of their own, like all other kernel-doc comments::
|
||||
|
||||
/**
|
||||
* struct foo - Brief description.
|
||||
@ -502,6 +280,8 @@ The structure members may also be documented in-line within the definition::
|
||||
* Here, the member description may contain several paragraphs.
|
||||
*/
|
||||
int baz;
|
||||
/** @foobar: Single line description. */
|
||||
int foobar;
|
||||
}
|
||||
|
||||
Private members
|
||||
@ -586,94 +366,3 @@ file.
|
||||
|
||||
Data structures visible in kernel include files should also be documented using
|
||||
kernel-doc formatted comments.
|
||||
|
||||
DocBook XML [DEPRECATED]
|
||||
========================
|
||||
|
||||
.. attention::
|
||||
|
||||
This section describes the deprecated DocBook XML toolchain. Please do not
|
||||
create new DocBook XML template files. Please consider converting existing
|
||||
DocBook XML templates files to Sphinx/reStructuredText.
|
||||
|
||||
Converting DocBook to Sphinx
|
||||
----------------------------
|
||||
|
||||
Over time, we expect all of the documents under ``Documentation/DocBook`` to be
|
||||
converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
|
||||
enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
|
||||
which uses ``pandoc`` under the hood. For example::
|
||||
|
||||
$ cd Documentation/sphinx
|
||||
$ ./tmplcvt ../DocBook/in.tmpl ../out.rst
|
||||
|
||||
Then edit the resulting rst files to fix any remaining issues, and add the
|
||||
document in the ``toctree`` in ``Documentation/index.rst``.
|
||||
|
||||
Components of the kernel-doc system
|
||||
-----------------------------------
|
||||
|
||||
Many places in the source tree have extractable documentation in the form of
|
||||
block comments above functions. The components of this system are:
|
||||
|
||||
- ``scripts/kernel-doc``
|
||||
|
||||
This is a perl script that hunts for the block comments and can mark them up
|
||||
directly into reStructuredText, DocBook, man, text, and HTML. (No, not
|
||||
texinfo.)
|
||||
|
||||
- ``Documentation/DocBook/*.tmpl``
|
||||
|
||||
These are XML template files, which are normal XML files with special
|
||||
place-holders for where the extracted documentation should go.
|
||||
|
||||
- ``scripts/docproc.c``
|
||||
|
||||
This is a program for converting XML template files into XML files. When a
|
||||
file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
|
||||
able to distinguish between internal and external functions.
|
||||
|
||||
It invokes kernel-doc, giving it the list of functions that are to be
|
||||
documented.
|
||||
|
||||
Additionally it is used to scan the XML template files to locate all the files
|
||||
referenced herein. This is used to generate dependency information as used by
|
||||
make.
|
||||
|
||||
- ``Makefile``
|
||||
|
||||
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
|
||||
DocBook XML files, PostScript files, PDF files, and html files in
|
||||
Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
|
||||
|
||||
- ``Documentation/DocBook/Makefile``
|
||||
|
||||
This is where C files are associated with SGML templates.
|
||||
|
||||
How to use kernel-doc comments in DocBook XML template files
|
||||
------------------------------------------------------------
|
||||
|
||||
DocBook XML template files (\*.tmpl) are like normal XML files, except that they
|
||||
can contain escape sequences where extracted documentation should be inserted.
|
||||
|
||||
``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
|
||||
functions that are exported using ``EXPORT_SYMBOL``: the function list is
|
||||
collected from files listed in ``Documentation/DocBook/Makefile``.
|
||||
|
||||
``!I<filename>`` is replaced by the documentation for functions that are **not**
|
||||
exported using ``EXPORT_SYMBOL``.
|
||||
|
||||
``!D<filename>`` is used to name additional files to search for functions
|
||||
exported using ``EXPORT_SYMBOL``.
|
||||
|
||||
``!F<filename> <function [functions...]>`` is replaced by the documentation, in
|
||||
``<filename>``, for the functions listed.
|
||||
|
||||
``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
|
||||
section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
|
||||
``<section title>``; do not quote the ``<section title>``.
|
||||
|
||||
``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
|
||||
sections and documented functions, symbols, etc. are used. This makes sense to
|
||||
use when you use ``!F`` or ``!P`` only and want to verify that all documentation
|
||||
is included.
|
192
Documentation/doc-guide/parse-headers.rst
Normal file
192
Documentation/doc-guide/parse-headers.rst
Normal file
@ -0,0 +1,192 @@
|
||||
===========================
|
||||
Including uAPI header files
|
||||
===========================
|
||||
|
||||
Sometimes, it is useful to include header files and C example codes in
|
||||
order to describe the userspace API and to generate cross-references
|
||||
between the code and the documentation. Adding cross-references for
|
||||
userspace API files has an additional vantage: Sphinx will generate warnings
|
||||
if a symbol is not found at the documentation. That helps to keep the
|
||||
uAPI documentation in sync with the Kernel changes.
|
||||
The :ref:`parse_headers.pl <parse_headers>` provide a way to generate such
|
||||
cross-references. It has to be called via Makefile, while building the
|
||||
documentation. Please see ``Documentation/media/Makefile`` for an example
|
||||
about how to use it inside the Kernel tree.
|
||||
|
||||
.. _parse_headers:
|
||||
|
||||
parse_headers.pl
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
NAME
|
||||
****
|
||||
|
||||
|
||||
parse_headers.pl - parse a C file, in order to identify functions, structs,
|
||||
enums and defines and create cross-references to a Sphinx book.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
********
|
||||
|
||||
|
||||
\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
|
||||
|
||||
Where <options> can be: --debug, --help or --man.
|
||||
|
||||
|
||||
OPTIONS
|
||||
*******
|
||||
|
||||
|
||||
|
||||
\ **--debug**\
|
||||
|
||||
Put the script in verbose mode, useful for debugging.
|
||||
|
||||
|
||||
|
||||
\ **--usage**\
|
||||
|
||||
Prints a brief help message and exits.
|
||||
|
||||
|
||||
|
||||
\ **--help**\
|
||||
|
||||
Prints a more detailed help message and exits.
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
***********
|
||||
|
||||
|
||||
Convert a C header or source file (C_FILE), into a ReStructured Text
|
||||
included via ..parsed-literal block with cross-references for the
|
||||
documentation files that describe the API. It accepts an optional
|
||||
EXCEPTIONS_FILE with describes what elements will be either ignored or
|
||||
be pointed to a non-default reference.
|
||||
|
||||
The output is written at the (OUT_FILE).
|
||||
|
||||
It is capable of identifying defines, functions, structs, typedefs,
|
||||
enums and enum symbols and create cross-references for all of them.
|
||||
It is also capable of distinguish #define used for specifying a Linux
|
||||
ioctl.
|
||||
|
||||
The EXCEPTIONS_FILE contain two types of statements: \ **ignore**\ or \ **replace**\ .
|
||||
|
||||
The syntax for the ignore tag is:
|
||||
|
||||
|
||||
ignore \ **type**\ \ **name**\
|
||||
|
||||
The \ **ignore**\ means that it won't generate cross references for a
|
||||
\ **name**\ symbol of type \ **type**\ .
|
||||
|
||||
The syntax for the replace tag is:
|
||||
|
||||
|
||||
replace \ **type**\ \ **name**\ \ **new_value**\
|
||||
|
||||
The \ **replace**\ means that it will generate cross references for a
|
||||
\ **name**\ symbol of type \ **type**\ , but, instead of using the default
|
||||
replacement rule, it will use \ **new_value**\ .
|
||||
|
||||
For both statements, \ **type**\ can be either one of the following:
|
||||
|
||||
|
||||
\ **ioctl**\
|
||||
|
||||
The ignore or replace statement will apply to ioctl definitions like:
|
||||
|
||||
#define VIDIOC_DBG_S_REGISTER _IOW('V', 79, struct v4l2_dbg_register)
|
||||
|
||||
|
||||
|
||||
\ **define**\
|
||||
|
||||
The ignore or replace statement will apply to any other #define found
|
||||
at C_FILE.
|
||||
|
||||
|
||||
|
||||
\ **typedef**\
|
||||
|
||||
The ignore or replace statement will apply to typedef statements at C_FILE.
|
||||
|
||||
|
||||
|
||||
\ **struct**\
|
||||
|
||||
The ignore or replace statement will apply to the name of struct statements
|
||||
at C_FILE.
|
||||
|
||||
|
||||
|
||||
\ **enum**\
|
||||
|
||||
The ignore or replace statement will apply to the name of enum statements
|
||||
at C_FILE.
|
||||
|
||||
|
||||
|
||||
\ **symbol**\
|
||||
|
||||
The ignore or replace statement will apply to the name of enum statements
|
||||
at C_FILE.
|
||||
|
||||
For replace statements, \ **new_value**\ will automatically use :c:type:
|
||||
references for \ **typedef**\ , \ **enum**\ and \ **struct**\ types. It will use :ref:
|
||||
for \ **ioctl**\ , \ **define**\ and \ **symbol**\ types. The type of reference can
|
||||
also be explicitly defined at the replace statement.
|
||||
|
||||
|
||||
|
||||
EXAMPLES
|
||||
********
|
||||
|
||||
|
||||
ignore define _VIDEODEV2_H
|
||||
|
||||
|
||||
Ignore a #define _VIDEODEV2_H at the C_FILE.
|
||||
|
||||
ignore symbol PRIVATE
|
||||
|
||||
|
||||
On a struct like:
|
||||
|
||||
enum foo { BAR1, BAR2, PRIVATE };
|
||||
|
||||
It won't generate cross-references for \ **PRIVATE**\ .
|
||||
|
||||
replace symbol BAR1 :c:type:\`foo\`
|
||||
replace symbol BAR2 :c:type:\`foo\`
|
||||
|
||||
|
||||
On a struct like:
|
||||
|
||||
enum foo { BAR1, BAR2, PRIVATE };
|
||||
|
||||
It will make the BAR1 and BAR2 enum symbols to cross reference the foo
|
||||
symbol at the C domain.
|
||||
|
||||
|
||||
BUGS
|
||||
****
|
||||
|
||||
|
||||
Report bugs to Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
||||
|
||||
|
||||
COPYRIGHT
|
||||
*********
|
||||
|
||||
|
||||
Copyright (c) 2016 by Mauro Carvalho Chehab <mchehab@s-opensource.com>.
|
||||
|
||||
License GPLv2: GNU GPL version 2 <http://gnu.org/licenses/gpl.html>.
|
||||
|
||||
This is free software: you are free to change and redistribute it.
|
||||
There is NO WARRANTY, to the extent permitted by law.
|
219
Documentation/doc-guide/sphinx.rst
Normal file
219
Documentation/doc-guide/sphinx.rst
Normal file
@ -0,0 +1,219 @@
|
||||
Introduction
|
||||
============
|
||||
|
||||
The Linux kernel uses `Sphinx`_ to generate pretty documentation from
|
||||
`reStructuredText`_ files under ``Documentation``. To build the documentation in
|
||||
HTML or PDF formats, use ``make htmldocs`` or ``make pdfdocs``. The generated
|
||||
documentation is placed in ``Documentation/output``.
|
||||
|
||||
.. _Sphinx: http://www.sphinx-doc.org/
|
||||
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
|
||||
|
||||
The reStructuredText files may contain directives to include structured
|
||||
documentation comments, or kernel-doc comments, from source files. Usually these
|
||||
are used to describe the functions and types and design of the code. The
|
||||
kernel-doc comments have some special structure and formatting, but beyond that
|
||||
they are also treated as reStructuredText.
|
||||
|
||||
There is also the deprecated DocBook toolchain to generate documentation from
|
||||
DocBook XML template files under ``Documentation/DocBook``. The DocBook files
|
||||
are to be converted to reStructuredText, and the toolchain is slated to be
|
||||
removed.
|
||||
|
||||
Finally, there are thousands of plain text documentation files scattered around
|
||||
``Documentation``. Some of these will likely be converted to reStructuredText
|
||||
over time, but the bulk of them will remain in plain text.
|
||||
|
||||
Sphinx Build
|
||||
============
|
||||
|
||||
The usual way to generate the documentation is to run ``make htmldocs`` or
|
||||
``make pdfdocs``. There are also other formats available, see the documentation
|
||||
section of ``make help``. The generated documentation is placed in
|
||||
format-specific subdirectories under ``Documentation/output``.
|
||||
|
||||
To generate documentation, Sphinx (``sphinx-build``) must obviously be
|
||||
installed. For prettier HTML output, the Read the Docs Sphinx theme
|
||||
(``sphinx_rtd_theme``) is used if available. For PDF output, ``rst2pdf`` is also
|
||||
needed. All of these are widely available and packaged in distributions.
|
||||
|
||||
To pass extra options to Sphinx, you can use the ``SPHINXOPTS`` make
|
||||
variable. For example, use ``make SPHINXOPTS=-v htmldocs`` to get more verbose
|
||||
output.
|
||||
|
||||
To remove the generated documentation, run ``make cleandocs``.
|
||||
|
||||
Writing Documentation
|
||||
=====================
|
||||
|
||||
Adding new documentation can be as simple as:
|
||||
|
||||
1. Add a new ``.rst`` file somewhere under ``Documentation``.
|
||||
2. Refer to it from the Sphinx main `TOC tree`_ in ``Documentation/index.rst``.
|
||||
|
||||
.. _TOC tree: http://www.sphinx-doc.org/en/stable/markup/toctree.html
|
||||
|
||||
This is usually good enough for simple documentation (like the one you're
|
||||
reading right now), but for larger documents it may be advisable to create a
|
||||
subdirectory (or use an existing one). For example, the graphics subsystem
|
||||
documentation is under ``Documentation/gpu``, split to several ``.rst`` files,
|
||||
and has a separate ``index.rst`` (with a ``toctree`` of its own) referenced from
|
||||
the main index.
|
||||
|
||||
See the documentation for `Sphinx`_ and `reStructuredText`_ on what you can do
|
||||
with them. In particular, the Sphinx `reStructuredText Primer`_ is a good place
|
||||
to get started with reStructuredText. There are also some `Sphinx specific
|
||||
markup constructs`_.
|
||||
|
||||
.. _reStructuredText Primer: http://www.sphinx-doc.org/en/stable/rest.html
|
||||
.. _Sphinx specific markup constructs: http://www.sphinx-doc.org/en/stable/markup/index.html
|
||||
|
||||
Specific guidelines for the kernel documentation
|
||||
------------------------------------------------
|
||||
|
||||
Here are some specific guidelines for the kernel documentation:
|
||||
|
||||
* Please don't go overboard with reStructuredText markup. Keep it simple.
|
||||
|
||||
* Please stick to this order of heading adornments:
|
||||
|
||||
1. ``=`` with overline for document title::
|
||||
|
||||
==============
|
||||
Document title
|
||||
==============
|
||||
|
||||
2. ``=`` for chapters::
|
||||
|
||||
Chapters
|
||||
========
|
||||
|
||||
3. ``-`` for sections::
|
||||
|
||||
Section
|
||||
-------
|
||||
|
||||
4. ``~`` for subsections::
|
||||
|
||||
Subsection
|
||||
~~~~~~~~~~
|
||||
|
||||
Although RST doesn't mandate a specific order ("Rather than imposing a fixed
|
||||
number and order of section title adornment styles, the order enforced will be
|
||||
the order as encountered."), having the higher levels the same overall makes
|
||||
it easier to follow the documents.
|
||||
|
||||
|
||||
the C domain
|
||||
------------
|
||||
|
||||
The `Sphinx C Domain`_ (name c) is suited for documentation of C API. E.g. a
|
||||
function prototype:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. c:function:: int ioctl( int fd, int request )
|
||||
|
||||
The C domain of the kernel-doc has some additional features. E.g. you can
|
||||
*rename* the reference name of a function with a common name like ``open`` or
|
||||
``ioctl``:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. c:function:: int ioctl( int fd, int request )
|
||||
:name: VIDIOC_LOG_STATUS
|
||||
|
||||
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
|
||||
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
|
||||
changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by:
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
:c:func:`VIDIOC_LOG_STATUS`
|
||||
|
||||
|
||||
list tables
|
||||
-----------
|
||||
|
||||
We recommend the use of *list table* formats. The *list table* formats are
|
||||
double-stage lists. Compared to the ASCII-art they might not be as
|
||||
comfortable for
|
||||
readers of the text files. Their advantage is that they are easy to
|
||||
create or modify and that the diff of a modification is much more meaningful,
|
||||
because it is limited to the modified content.
|
||||
|
||||
The ``flat-table`` is a double-stage list similar to the ``list-table`` with
|
||||
some additional features:
|
||||
|
||||
* column-span: with the role ``cspan`` a cell can be extended through
|
||||
additional columns
|
||||
|
||||
* row-span: with the role ``rspan`` a cell can be extended through
|
||||
additional rows
|
||||
|
||||
* auto span rightmost cell of a table row over the missing cells on the right
|
||||
side of that table-row. With Option ``:fill-cells:`` this behavior can
|
||||
changed from *auto span* to *auto fill*, which automatically inserts (empty)
|
||||
cells instead of spanning the last cell.
|
||||
|
||||
options:
|
||||
|
||||
* ``:header-rows:`` [int] count of header rows
|
||||
* ``:stub-columns:`` [int] count of stub columns
|
||||
* ``:widths:`` [[int] [int] ... ] widths of columns
|
||||
* ``:fill-cells:`` instead of auto-spanning missing cells, insert missing cells
|
||||
|
||||
roles:
|
||||
|
||||
* ``:cspan:`` [int] additional columns (*morecols*)
|
||||
* ``:rspan:`` [int] additional rows (*morerows*)
|
||||
|
||||
The example below shows how to use this markup. The first level of the staged
|
||||
list is the *table-row*. In the *table-row* there is only one markup allowed,
|
||||
the list of the cells in this *table-row*. Exceptions are *comments* ( ``..`` )
|
||||
and *targets* (e.g. a ref to ``:ref:`last row <last row>``` / :ref:`last row
|
||||
<last row>`).
|
||||
|
||||
.. code-block:: rst
|
||||
|
||||
.. flat-table:: table title
|
||||
:widths: 2 1 1 3
|
||||
|
||||
* - head col 1
|
||||
- head col 2
|
||||
- head col 3
|
||||
- head col 4
|
||||
|
||||
* - column 1
|
||||
- field 1.1
|
||||
- field 1.2 with autospan
|
||||
|
||||
* - column 2
|
||||
- field 2.1
|
||||
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
|
||||
|
||||
* .. _`last row`:
|
||||
|
||||
- column 3
|
||||
|
||||
Rendered as:
|
||||
|
||||
.. flat-table:: table title
|
||||
:widths: 2 1 1 3
|
||||
|
||||
* - head col 1
|
||||
- head col 2
|
||||
- head col 3
|
||||
- head col 4
|
||||
|
||||
* - column 1
|
||||
- field 1.1
|
||||
- field 1.2 with autospan
|
||||
|
||||
* - column 2
|
||||
- field 2.1
|
||||
- :rspan:`1` :cspan:`1` field 2.2 - 3.3
|
||||
|
||||
* .. _`last row`:
|
||||
|
||||
- column 3
|
@ -3,3 +3,8 @@
|
||||
project = "Linux 802.11 Driver Developer's Guide"
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', '80211.tex', project,
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
@ -9,7 +9,7 @@ Linux 802.11 Driver Developer's Guide
|
||||
mac80211
|
||||
mac80211-advanced
|
||||
|
||||
.. only:: subproject
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
10
Documentation/driver-api/conf.py
Normal file
10
Documentation/driver-api/conf.py
Normal file
@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = "The Linux driver implementer's API guide"
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'driver-api.tex', project,
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
279
Documentation/driver-api/device_link.rst
Normal file
279
Documentation/driver-api/device_link.rst
Normal file
@ -0,0 +1,279 @@
|
||||
============
|
||||
Device links
|
||||
============
|
||||
|
||||
By default, the driver core only enforces dependencies between devices
|
||||
that are borne out of a parent/child relationship within the device
|
||||
hierarchy: When suspending, resuming or shutting down the system, devices
|
||||
are ordered based on this relationship, i.e. children are always suspended
|
||||
before their parent, and the parent is always resumed before its children.
|
||||
|
||||
Sometimes there is a need to represent device dependencies beyond the
|
||||
mere parent/child relationship, e.g. between siblings, and have the
|
||||
driver core automatically take care of them.
|
||||
|
||||
Secondly, the driver core by default does not enforce any driver presence
|
||||
dependencies, i.e. that one device must be bound to a driver before
|
||||
another one can probe or function correctly.
|
||||
|
||||
Often these two dependency types come together, so a device depends on
|
||||
another one both with regards to driver presence *and* with regards to
|
||||
suspend/resume and shutdown ordering.
|
||||
|
||||
Device links allow representation of such dependencies in the driver core.
|
||||
|
||||
In its standard form, a device link combines *both* dependency types:
|
||||
It guarantees correct suspend/resume and shutdown ordering between a
|
||||
"supplier" device and its "consumer" devices, and it guarantees driver
|
||||
presence on the supplier. The consumer devices are not probed before the
|
||||
supplier is bound to a driver, and they're unbound before the supplier
|
||||
is unbound.
|
||||
|
||||
When driver presence on the supplier is irrelevant and only correct
|
||||
suspend/resume and shutdown ordering is needed, the device link may
|
||||
simply be set up with the ``DL_FLAG_STATELESS`` flag. In other words,
|
||||
enforcing driver presence on the supplier is optional.
|
||||
|
||||
Another optional feature is runtime PM integration: By setting the
|
||||
``DL_FLAG_PM_RUNTIME`` flag on addition of the device link, the PM core
|
||||
is instructed to runtime resume the supplier and keep it active
|
||||
whenever and for as long as the consumer is runtime resumed.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
The earliest point in time when device links can be added is after
|
||||
:c:func:`device_add()` has been called for the supplier and
|
||||
:c:func:`device_initialize()` has been called for the consumer.
|
||||
|
||||
It is legal to add them later, but care must be taken that the system
|
||||
remains in a consistent state: E.g. a device link cannot be added in
|
||||
the midst of a suspend/resume transition, so either commencement of
|
||||
such a transition needs to be prevented with :c:func:`lock_system_sleep()`,
|
||||
or the device link needs to be added from a function which is guaranteed
|
||||
not to run in parallel to a suspend/resume transition, such as from a
|
||||
device ``->probe`` callback or a boot-time PCI quirk.
|
||||
|
||||
Another example for an inconsistent state would be a device link that
|
||||
represents a driver presence dependency, yet is added from the consumer's
|
||||
``->probe`` callback while the supplier hasn't probed yet: Had the driver
|
||||
core known about the device link earlier, it wouldn't have probed the
|
||||
consumer in the first place. The onus is thus on the consumer to check
|
||||
presence of the supplier after adding the link, and defer probing on
|
||||
non-presence.
|
||||
|
||||
If a device link is added in the ``->probe`` callback of the supplier or
|
||||
consumer driver, it is typically deleted in its ``->remove`` callback for
|
||||
symmetry. That way, if the driver is compiled as a module, the device
|
||||
link is added on module load and orderly deleted on unload. The same
|
||||
restrictions that apply to device link addition (e.g. exclusion of a
|
||||
parallel suspend/resume transition) apply equally to deletion.
|
||||
|
||||
Several flags may be specified on device link addition, two of which
|
||||
have already been mentioned above: ``DL_FLAG_STATELESS`` to express that no
|
||||
driver presence dependency is needed (but only correct suspend/resume and
|
||||
shutdown ordering) and ``DL_FLAG_PM_RUNTIME`` to express that runtime PM
|
||||
integration is desired.
|
||||
|
||||
Two other flags are specifically targeted at use cases where the device
|
||||
link is added from the consumer's ``->probe`` callback: ``DL_FLAG_RPM_ACTIVE``
|
||||
can be specified to runtime resume the supplier upon addition of the
|
||||
device link. ``DL_FLAG_AUTOREMOVE`` causes the device link to be automatically
|
||||
purged when the consumer fails to probe or later unbinds. This obviates
|
||||
the need to explicitly delete the link in the ``->remove`` callback or in
|
||||
the error path of the ``->probe`` callback.
|
||||
|
||||
Limitations
|
||||
===========
|
||||
|
||||
Driver authors should be aware that a driver presence dependency (i.e. when
|
||||
``DL_FLAG_STATELESS`` is not specified on link addition) may cause probing of
|
||||
the consumer to be deferred indefinitely. This can become a problem if the
|
||||
consumer is required to probe before a certain initcall level is reached.
|
||||
Worse, if the supplier driver is blacklisted or missing, the consumer will
|
||||
never be probed.
|
||||
|
||||
Sometimes drivers depend on optional resources. They are able to operate
|
||||
in a degraded mode (reduced feature set or performance) when those resources
|
||||
are not present. An example is an SPI controller that can use a DMA engine
|
||||
or work in PIO mode. The controller can determine presence of the optional
|
||||
resources at probe time but on non-presence there is no way to know whether
|
||||
they will become available in the near future (due to a supplier driver
|
||||
probing) or never. Consequently it cannot be determined whether to defer
|
||||
probing or not. It would be possible to notify drivers when optional
|
||||
resources become available after probing, but it would come at a high cost
|
||||
for drivers as switching between modes of operation at runtime based on the
|
||||
availability of such resources would be much more complex than a mechanism
|
||||
based on probe deferral. In any case optional resources are beyond the
|
||||
scope of device links.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
* An MMU device exists alongside a busmaster device, both are in the same
|
||||
power domain. The MMU implements DMA address translation for the busmaster
|
||||
device and shall be runtime resumed and kept active whenever and as long
|
||||
as the busmaster device is active. The busmaster device's driver shall
|
||||
not bind before the MMU is bound. To achieve this, a device link with
|
||||
runtime PM integration is added from the busmaster device (consumer)
|
||||
to the MMU device (supplier). The effect with regards to runtime PM
|
||||
is the same as if the MMU was the parent of the master device.
|
||||
|
||||
The fact that both devices share the same power domain would normally
|
||||
suggest usage of a :c:type:`struct dev_pm_domain` or :c:type:`struct
|
||||
generic_pm_domain`, however these are not independent devices that
|
||||
happen to share a power switch, but rather the MMU device serves the
|
||||
busmaster device and is useless without it. A device link creates a
|
||||
synthetic hierarchical relationship between the devices and is thus
|
||||
more apt.
|
||||
|
||||
* A Thunderbolt host controller comprises a number of PCIe hotplug ports
|
||||
and an NHI device to manage the PCIe switch. On resume from system sleep,
|
||||
the NHI device needs to re-establish PCI tunnels to attached devices
|
||||
before the hotplug ports can resume. If the hotplug ports were children
|
||||
of the NHI, this resume order would automatically be enforced by the
|
||||
PM core, but unfortunately they're aunts. The solution is to add
|
||||
device links from the hotplug ports (consumers) to the NHI device
|
||||
(supplier). A driver presence dependency is not necessary for this
|
||||
use case.
|
||||
|
||||
* Discrete GPUs in hybrid graphics laptops often feature an HDA controller
|
||||
for HDMI/DP audio. In the device hierarchy the HDA controller is a sibling
|
||||
of the VGA device, yet both share the same power domain and the HDA
|
||||
controller is only ever needed when an HDMI/DP display is attached to the
|
||||
VGA device. A device link from the HDA controller (consumer) to the
|
||||
VGA device (supplier) aptly represents this relationship.
|
||||
|
||||
* ACPI allows definition of a device start order by way of _DEP objects.
|
||||
A classical example is when ACPI power management methods on one device
|
||||
are implemented in terms of I\ :sup:`2`\ C accesses and require a specific
|
||||
I\ :sup:`2`\ C controller to be present and functional for the power
|
||||
management of the device in question to work.
|
||||
|
||||
* In some SoCs a functional dependency exists from display, video codec and
|
||||
video processing IP cores on transparent memory access IP cores that handle
|
||||
burst access and compression/decompression.
|
||||
|
||||
Alternatives
|
||||
============
|
||||
|
||||
* A :c:type:`struct dev_pm_domain` can be used to override the bus,
|
||||
class or device type callbacks. It is intended for devices sharing
|
||||
a single on/off switch, however it does not guarantee a specific
|
||||
suspend/resume ordering, this needs to be implemented separately.
|
||||
It also does not by itself track the runtime PM status of the involved
|
||||
devices and turn off the power switch only when all of them are runtime
|
||||
suspended. Furthermore it cannot be used to enforce a specific shutdown
|
||||
ordering or a driver presence dependency.
|
||||
|
||||
* A :c:type:`struct generic_pm_domain` is a lot more heavyweight than a
|
||||
device link and does not allow for shutdown ordering or driver presence
|
||||
dependencies. It also cannot be used on ACPI systems.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
The device hierarchy, which -- as the name implies -- is a tree,
|
||||
becomes a directed acyclic graph once device links are added.
|
||||
|
||||
Ordering of these devices during suspend/resume is determined by the
|
||||
dpm_list. During shutdown it is determined by the devices_kset. With
|
||||
no device links present, the two lists are a flattened, one-dimensional
|
||||
representations of the device tree such that a device is placed behind
|
||||
all its ancestors. That is achieved by traversing the ACPI namespace
|
||||
or OpenFirmware device tree top-down and appending devices to the lists
|
||||
as they are discovered.
|
||||
|
||||
Once device links are added, the lists need to satisfy the additional
|
||||
constraint that a device is placed behind all its suppliers, recursively.
|
||||
To ensure this, upon addition of the device link the consumer and the
|
||||
entire sub-graph below it (all children and consumers of the consumer)
|
||||
are moved to the end of the list. (Call to :c:func:`device_reorder_to_tail()`
|
||||
from :c:func:`device_link_add()`.)
|
||||
|
||||
To prevent introduction of dependency loops into the graph, it is
|
||||
verified upon device link addition that the supplier is not dependent
|
||||
on the consumer or any children or consumers of the consumer.
|
||||
(Call to :c:func:`device_is_dependent()` from :c:func:`device_link_add()`.)
|
||||
If that constraint is violated, :c:func:`device_link_add()` will return
|
||||
``NULL`` and a ``WARNING`` will be logged.
|
||||
|
||||
Notably this also prevents the addition of a device link from a parent
|
||||
device to a child. However the converse is allowed, i.e. a device link
|
||||
from a child to a parent. Since the driver core already guarantees
|
||||
correct suspend/resume and shutdown ordering between parent and child,
|
||||
such a device link only makes sense if a driver presence dependency is
|
||||
needed on top of that. In this case driver authors should weigh
|
||||
carefully if a device link is at all the right tool for the purpose.
|
||||
A more suitable approach might be to simply use deferred probing or
|
||||
add a device flag causing the parent driver to be probed before the
|
||||
child one.
|
||||
|
||||
State machine
|
||||
=============
|
||||
|
||||
.. kernel-doc:: include/linux/device.h
|
||||
:functions: device_link_state
|
||||
|
||||
::
|
||||
|
||||
.=============================.
|
||||
| |
|
||||
v |
|
||||
DORMANT <=> AVAILABLE <=> CONSUMER_PROBE => ACTIVE
|
||||
^ |
|
||||
| |
|
||||
'============ SUPPLIER_UNBIND <============'
|
||||
|
||||
* The initial state of a device link is automatically determined by
|
||||
:c:func:`device_link_add()` based on the driver presence on the supplier
|
||||
and consumer. If the link is created before any devices are probed, it
|
||||
is set to ``DL_STATE_DORMANT``.
|
||||
|
||||
* When a supplier device is bound to a driver, links to its consumers
|
||||
progress to ``DL_STATE_AVAILABLE``.
|
||||
(Call to :c:func:`device_links_driver_bound()` from
|
||||
:c:func:`driver_bound()`.)
|
||||
|
||||
* Before a consumer device is probed, presence of supplier drivers is
|
||||
verified by checking that links to suppliers are in ``DL_STATE_AVAILABLE``
|
||||
state. The state of the links is updated to ``DL_STATE_CONSUMER_PROBE``.
|
||||
(Call to :c:func:`device_links_check_suppliers()` from
|
||||
:c:func:`really_probe()`.)
|
||||
This prevents the supplier from unbinding.
|
||||
(Call to :c:func:`wait_for_device_probe()` from
|
||||
:c:func:`device_links_unbind_consumers()`.)
|
||||
|
||||
* If the probe fails, links to suppliers revert back to ``DL_STATE_AVAILABLE``.
|
||||
(Call to :c:func:`device_links_no_driver()` from :c:func:`really_probe()`.)
|
||||
|
||||
* If the probe succeeds, links to suppliers progress to ``DL_STATE_ACTIVE``.
|
||||
(Call to :c:func:`device_links_driver_bound()` from :c:func:`driver_bound()`.)
|
||||
|
||||
* When the consumer's driver is later on removed, links to suppliers revert
|
||||
back to ``DL_STATE_AVAILABLE``.
|
||||
(Call to :c:func:`__device_links_no_driver()` from
|
||||
:c:func:`device_links_driver_cleanup()`, which in turn is called from
|
||||
:c:func:`__device_release_driver()`.)
|
||||
|
||||
* Before a supplier's driver is removed, links to consumers that are not
|
||||
bound to a driver are updated to ``DL_STATE_SUPPLIER_UNBIND``.
|
||||
(Call to :c:func:`device_links_busy()` from
|
||||
:c:func:`__device_release_driver()`.)
|
||||
This prevents the consumers from binding.
|
||||
(Call to :c:func:`device_links_check_suppliers()` from
|
||||
:c:func:`really_probe()`.)
|
||||
Consumers that are bound are freed from their driver; consumers that are
|
||||
probing are waited for until they are done.
|
||||
(Call to :c:func:`device_links_unbind_consumers()` from
|
||||
:c:func:`__device_release_driver()`.)
|
||||
Once all links to consumers are in ``DL_STATE_SUPPLIER_UNBIND`` state,
|
||||
the supplier driver is released and the links revert to ``DL_STATE_DORMANT``.
|
||||
(Call to :c:func:`device_links_driver_cleanup()` from
|
||||
:c:func:`__device_release_driver()`.)
|
||||
|
||||
API
|
||||
===
|
||||
|
||||
.. kernel-doc:: drivers/base/core.c
|
||||
:functions: device_link_add device_link_del
|
73
Documentation/driver-api/dma-buf.rst
Normal file
73
Documentation/driver-api/dma-buf.rst
Normal file
@ -0,0 +1,73 @@
|
||||
Buffer Sharing and Synchronization
|
||||
==================================
|
||||
|
||||
The dma-buf subsystem provides the framework for sharing buffers for
|
||||
hardware (DMA) access across multiple device drivers and subsystems, and
|
||||
for synchronizing asynchronous hardware access.
|
||||
|
||||
This is used, for example, by drm "prime" multi-GPU support, but is of
|
||||
course not limited to GPU use cases.
|
||||
|
||||
The three main components of this are: (1) dma-buf, representing a
|
||||
sg_table and exposed to userspace as a file descriptor to allow passing
|
||||
between devices, (2) fence, which provides a mechanism to signal when
|
||||
one device as finished access, and (3) reservation, which manages the
|
||||
shared or exclusive fence(s) associated with the buffer.
|
||||
|
||||
Shared DMA Buffers
|
||||
------------------
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-buf.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-buf.h
|
||||
:internal:
|
||||
|
||||
Reservation Objects
|
||||
-------------------
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/reservation.c
|
||||
:doc: Reservation Object Overview
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/reservation.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/reservation.h
|
||||
:internal:
|
||||
|
||||
DMA Fences
|
||||
----------
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-fence.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-fence.h
|
||||
:internal:
|
||||
|
||||
Seqno Hardware Fences
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/seqno-fence.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/seqno-fence.h
|
||||
:internal:
|
||||
|
||||
DMA Fence Array
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-fence-array.h
|
||||
:internal:
|
||||
|
||||
DMA Fence uABI/Sync File
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/sync_file.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/sync_file.h
|
||||
:internal:
|
||||
|
@ -16,11 +16,23 @@ available subsections can be seen below.
|
||||
|
||||
basics
|
||||
infrastructure
|
||||
dma-buf
|
||||
device_link
|
||||
message-based
|
||||
sound
|
||||
frame-buffer
|
||||
input
|
||||
usb
|
||||
spi
|
||||
i2c
|
||||
hsi
|
||||
miscellaneous
|
||||
vme
|
||||
80211/index
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
|
@ -46,76 +46,6 @@ Device Drivers Base
|
||||
.. kernel-doc:: drivers/base/bus.c
|
||||
:export:
|
||||
|
||||
Buffer Sharing and Synchronization
|
||||
----------------------------------
|
||||
|
||||
The dma-buf subsystem provides the framework for sharing buffers for
|
||||
hardware (DMA) access across multiple device drivers and subsystems, and
|
||||
for synchronizing asynchronous hardware access.
|
||||
|
||||
This is used, for example, by drm "prime" multi-GPU support, but is of
|
||||
course not limited to GPU use cases.
|
||||
|
||||
The three main components of this are: (1) dma-buf, representing a
|
||||
sg_table and exposed to userspace as a file descriptor to allow passing
|
||||
between devices, (2) fence, which provides a mechanism to signal when
|
||||
one device as finished access, and (3) reservation, which manages the
|
||||
shared or exclusive fence(s) associated with the buffer.
|
||||
|
||||
dma-buf
|
||||
~~~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-buf.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-buf.h
|
||||
:internal:
|
||||
|
||||
reservation
|
||||
~~~~~~~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/reservation.c
|
||||
:doc: Reservation Object Overview
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/reservation.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/reservation.h
|
||||
:internal:
|
||||
|
||||
fence
|
||||
~~~~~
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-fence.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-fence.h
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/seqno-fence.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/seqno-fence.h
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/dma-fence-array.h
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/reservation.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/reservation.h
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/dma-buf/sync_file.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: include/linux/sync_file.h
|
||||
:internal:
|
||||
|
||||
Device Drivers DMA Management
|
||||
-----------------------------
|
||||
|
||||
|
748
Documentation/driver-api/usb.rst
Normal file
748
Documentation/driver-api/usb.rst
Normal file
@ -0,0 +1,748 @@
|
||||
===========================
|
||||
The Linux-USB Host Side API
|
||||
===========================
|
||||
|
||||
Introduction to USB on Linux
|
||||
============================
|
||||
|
||||
A Universal Serial Bus (USB) is used to connect a host, such as a PC or
|
||||
workstation, to a number of peripheral devices. USB uses a tree
|
||||
structure, with the host as the root (the system's master), hubs as
|
||||
interior nodes, and peripherals as leaves (and slaves). Modern PCs
|
||||
support several such trees of USB devices, usually
|
||||
a few USB 3.0 (5 GBit/s) or USB 3.1 (10 GBit/s) and some legacy
|
||||
USB 2.0 (480 MBit/s) busses just in case.
|
||||
|
||||
That master/slave asymmetry was designed-in for a number of reasons, one
|
||||
being ease of use. It is not physically possible to mistake upstream and
|
||||
downstream or it does not matter with a type C plug (or they are built into the
|
||||
peripheral). Also, the host software doesn't need to deal with
|
||||
distributed auto-configuration since the pre-designated master node
|
||||
manages all that.
|
||||
|
||||
Kernel developers added USB support to Linux early in the 2.2 kernel
|
||||
series and have been developing it further since then. Besides support
|
||||
for each new generation of USB, various host controllers gained support,
|
||||
new drivers for peripherals have been added and advanced features for latency
|
||||
measurement and improved power management introduced.
|
||||
|
||||
Linux can run inside USB devices as well as on the hosts that control
|
||||
the devices. But USB device drivers running inside those peripherals
|
||||
don't do the same things as the ones running inside hosts, so they've
|
||||
been given a different name: *gadget drivers*. This document does not
|
||||
cover gadget drivers.
|
||||
|
||||
USB Host-Side API Model
|
||||
=======================
|
||||
|
||||
Host-side drivers for USB devices talk to the "usbcore" APIs. There are
|
||||
two. One is intended for *general-purpose* drivers (exposed through
|
||||
driver frameworks), and the other is for drivers that are *part of the
|
||||
core*. Such core drivers include the *hub* driver (which manages trees
|
||||
of USB devices) and several different kinds of *host controller
|
||||
drivers*, which control individual busses.
|
||||
|
||||
The device model seen by USB drivers is relatively complex.
|
||||
|
||||
- USB supports four kinds of data transfers (control, bulk, interrupt,
|
||||
and isochronous). Two of them (control and bulk) use bandwidth as
|
||||
it's available, while the other two (interrupt and isochronous) are
|
||||
scheduled to provide guaranteed bandwidth.
|
||||
|
||||
- The device description model includes one or more "configurations"
|
||||
per device, only one of which is active at a time. Devices are supposed
|
||||
to be capable of operating at lower than their top
|
||||
speeds and may provide a BOS descriptor showing the lowest speed they
|
||||
remain fully operational at.
|
||||
|
||||
- From USB 3.0 on configurations have one or more "functions", which
|
||||
provide a common functionality and are grouped together for purposes
|
||||
of power management.
|
||||
|
||||
- Configurations or functions have one or more "interfaces", each of which may have
|
||||
"alternate settings". Interfaces may be standardized by USB "Class"
|
||||
specifications, or may be specific to a vendor or device.
|
||||
|
||||
USB device drivers actually bind to interfaces, not devices. Think of
|
||||
them as "interface drivers", though you may not see many devices
|
||||
where the distinction is important. *Most USB devices are simple,
|
||||
with only one function, one configuration, one interface, and one alternate
|
||||
setting.*
|
||||
|
||||
- Interfaces have one or more "endpoints", each of which supports one
|
||||
type and direction of data transfer such as "bulk out" or "interrupt
|
||||
in". The entire configuration may have up to sixteen endpoints in
|
||||
each direction, allocated as needed among all the interfaces.
|
||||
|
||||
- Data transfer on USB is packetized; each endpoint has a maximum
|
||||
packet size. Drivers must often be aware of conventions such as
|
||||
flagging the end of bulk transfers using "short" (including zero
|
||||
length) packets.
|
||||
|
||||
- The Linux USB API supports synchronous calls for control and bulk
|
||||
messages. It also supports asynchronous calls for all kinds of data
|
||||
transfer, using request structures called "URBs" (USB Request
|
||||
Blocks).
|
||||
|
||||
Accordingly, the USB Core API exposed to device drivers covers quite a
|
||||
lot of territory. You'll probably need to consult the USB 3.0
|
||||
specification, available online from www.usb.org at no cost, as well as
|
||||
class or device specifications.
|
||||
|
||||
The only host-side drivers that actually touch hardware (reading/writing
|
||||
registers, handling IRQs, and so on) are the HCDs. In theory, all HCDs
|
||||
provide the same functionality through the same API. In practice, that's
|
||||
becoming more true, but there are still differences
|
||||
that crop up especially with fault handling on the less common controllers.
|
||||
Different controllers don't
|
||||
necessarily report the same aspects of failures, and recovery from
|
||||
faults (including software-induced ones like unlinking an URB) isn't yet
|
||||
fully consistent. Device driver authors should make a point of doing
|
||||
disconnect testing (while the device is active) with each different host
|
||||
controller driver, to make sure drivers don't have bugs of their own as
|
||||
well as to make sure they aren't relying on some HCD-specific behavior.
|
||||
|
||||
USB-Standard Types
|
||||
==================
|
||||
|
||||
In ``<linux/usb/ch9.h>`` you will find the USB data types defined in
|
||||
chapter 9 of the USB specification. These data types are used throughout
|
||||
USB, and in APIs including this host side API, gadget APIs, and usbfs.
|
||||
|
||||
.. kernel-doc:: include/linux/usb/ch9.h
|
||||
:internal:
|
||||
|
||||
Host-Side Data Types and Macros
|
||||
===============================
|
||||
|
||||
The host side API exposes several layers to drivers, some of which are
|
||||
more necessary than others. These support lifecycle models for host side
|
||||
drivers and devices, and support passing buffers through usbcore to some
|
||||
HCD that performs the I/O for the device driver.
|
||||
|
||||
.. kernel-doc:: include/linux/usb.h
|
||||
:internal:
|
||||
|
||||
USB Core APIs
|
||||
=============
|
||||
|
||||
There are two basic I/O models in the USB API. The most elemental one is
|
||||
asynchronous: drivers submit requests in the form of an URB, and the
|
||||
URB's completion callback handles the next step. All USB transfer types
|
||||
support that model, although there are special cases for control URBs
|
||||
(which always have setup and status stages, but may not have a data
|
||||
stage) and isochronous URBs (which allow large packets and include
|
||||
per-packet fault reports). Built on top of that is synchronous API
|
||||
support, where a driver calls a routine that allocates one or more URBs,
|
||||
submits them, and waits until they complete. There are synchronous
|
||||
wrappers for single-buffer control and bulk transfers (which are awkward
|
||||
to use in some driver disconnect scenarios), and for scatterlist based
|
||||
streaming i/o (bulk or interrupt).
|
||||
|
||||
USB drivers need to provide buffers that can be used for DMA, although
|
||||
they don't necessarily need to provide the DMA mapping themselves. There
|
||||
are APIs to use used when allocating DMA buffers, which can prevent use
|
||||
of bounce buffers on some systems. In some cases, drivers may be able to
|
||||
rely on 64bit DMA to eliminate another kind of bounce buffer.
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/urb.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/message.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/file.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/driver.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/usb.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/hub.c
|
||||
:export:
|
||||
|
||||
Host Controller APIs
|
||||
====================
|
||||
|
||||
These APIs are only for use by host controller drivers, most of which
|
||||
implement standard register interfaces such as XHCI, EHCI, OHCI, or UHCI. UHCI
|
||||
was one of the first interfaces, designed by Intel and also used by VIA;
|
||||
it doesn't do much in hardware. OHCI was designed later, to have the
|
||||
hardware do more work (bigger transfers, tracking protocol state, and so
|
||||
on). EHCI was designed with USB 2.0; its design has features that
|
||||
resemble OHCI (hardware does much more work) as well as UHCI (some parts
|
||||
of ISO support, TD list processing). XHCI was designed with USB 3.0. It
|
||||
continues to shift support for functionality into hardware.
|
||||
|
||||
There are host controllers other than the "big three", although most PCI
|
||||
based controllers (and a few non-PCI based ones) use one of those
|
||||
interfaces. Not all host controllers use DMA; some use PIO, and there is
|
||||
also a simulator and a virtual host controller to pipe USB over the network.
|
||||
|
||||
The same basic APIs are available to drivers for all those controllers.
|
||||
For historical reasons they are in two layers: :c:type:`struct
|
||||
usb_bus <usb_bus>` is a rather thin layer that became available
|
||||
in the 2.2 kernels, while :c:type:`struct usb_hcd <usb_hcd>`
|
||||
is a more featureful layer
|
||||
that lets HCDs share common code, to shrink driver size and
|
||||
significantly reduce hcd-specific behaviors.
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/hcd.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/hcd-pci.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/usb/core/buffer.c
|
||||
:internal:
|
||||
|
||||
The USB Filesystem (usbfs)
|
||||
==========================
|
||||
|
||||
This chapter presents the Linux *usbfs*. You may prefer to avoid writing
|
||||
new kernel code for your USB driver; that's the problem that usbfs set
|
||||
out to solve. User mode device drivers are usually packaged as
|
||||
applications or libraries, and may use usbfs through some programming
|
||||
library that wraps it. Such libraries include
|
||||
`libusb <http://libusb.sourceforge.net>`__ for C/C++, and
|
||||
`jUSB <http://jUSB.sourceforge.net>`__ for Java.
|
||||
|
||||
**Note**
|
||||
|
||||
This particular documentation is incomplete, especially with respect
|
||||
to the asynchronous mode. As of kernel 2.5.66 the code and this
|
||||
(new) documentation need to be cross-reviewed.
|
||||
|
||||
Configure usbfs into Linux kernels by enabling the *USB filesystem*
|
||||
option (CONFIG_USB_DEVICEFS), and you get basic support for user mode
|
||||
USB device drivers. Until relatively recently it was often (confusingly)
|
||||
called *usbdevfs* although it wasn't solving what *devfs* was. Every USB
|
||||
device will appear in usbfs, regardless of whether or not it has a
|
||||
kernel driver.
|
||||
|
||||
What files are in "usbfs"?
|
||||
--------------------------
|
||||
|
||||
Conventionally mounted at ``/proc/bus/usb``, usbfs features include:
|
||||
|
||||
- ``/proc/bus/usb/devices`` ... a text file showing each of the USB
|
||||
devices on known to the kernel, and their configuration descriptors.
|
||||
You can also poll() this to learn about new devices.
|
||||
|
||||
- ``/proc/bus/usb/BBB/DDD`` ... magic files exposing the each device's
|
||||
configuration descriptors, and supporting a series of ioctls for
|
||||
making device requests, including I/O to devices. (Purely for access
|
||||
by programs.)
|
||||
|
||||
Each bus is given a number (BBB) based on when it was enumerated; within
|
||||
each bus, each device is given a similar number (DDD). Those BBB/DDD
|
||||
paths are not "stable" identifiers; expect them to change even if you
|
||||
always leave the devices plugged in to the same hub port. *Don't even
|
||||
think of saving these in application configuration files.* Stable
|
||||
identifiers are available, for user mode applications that want to use
|
||||
them. HID and networking devices expose these stable IDs, so that for
|
||||
example you can be sure that you told the right UPS to power down its
|
||||
second server. "usbfs" doesn't (yet) expose those IDs.
|
||||
|
||||
Mounting and Access Control
|
||||
---------------------------
|
||||
|
||||
There are a number of mount options for usbfs, which will be of most
|
||||
interest to you if you need to override the default access control
|
||||
policy. That policy is that only root may read or write device files
|
||||
(``/proc/bus/BBB/DDD``) although anyone may read the ``devices`` or
|
||||
``drivers`` files. I/O requests to the device also need the
|
||||
CAP_SYS_RAWIO capability,
|
||||
|
||||
The significance of that is that by default, all user mode device
|
||||
drivers need super-user privileges. You can change modes or ownership in
|
||||
a driver setup when the device hotplugs, or maye just start the driver
|
||||
right then, as a privileged server (or some activity within one). That's
|
||||
the most secure approach for multi-user systems, but for single user
|
||||
systems ("trusted" by that user) it's more convenient just to grant
|
||||
everyone all access (using the *devmode=0666* option) so the driver can
|
||||
start whenever it's needed.
|
||||
|
||||
The mount options for usbfs, usable in /etc/fstab or in command line
|
||||
invocations of *mount*, are:
|
||||
|
||||
*busgid*\ =NNNNN
|
||||
Controls the GID used for the /proc/bus/usb/BBB directories.
|
||||
(Default: 0)
|
||||
|
||||
*busmode*\ =MMM
|
||||
Controls the file mode used for the /proc/bus/usb/BBB directories.
|
||||
(Default: 0555)
|
||||
|
||||
*busuid*\ =NNNNN
|
||||
Controls the UID used for the /proc/bus/usb/BBB directories.
|
||||
(Default: 0)
|
||||
|
||||
*devgid*\ =NNNNN
|
||||
Controls the GID used for the /proc/bus/usb/BBB/DDD files. (Default:
|
||||
0)
|
||||
|
||||
*devmode*\ =MMM
|
||||
Controls the file mode used for the /proc/bus/usb/BBB/DDD files.
|
||||
(Default: 0644)
|
||||
|
||||
*devuid*\ =NNNNN
|
||||
Controls the UID used for the /proc/bus/usb/BBB/DDD files. (Default:
|
||||
0)
|
||||
|
||||
*listgid*\ =NNNNN
|
||||
Controls the GID used for the /proc/bus/usb/devices and drivers
|
||||
files. (Default: 0)
|
||||
|
||||
*listmode*\ =MMM
|
||||
Controls the file mode used for the /proc/bus/usb/devices and
|
||||
drivers files. (Default: 0444)
|
||||
|
||||
*listuid*\ =NNNNN
|
||||
Controls the UID used for the /proc/bus/usb/devices and drivers
|
||||
files. (Default: 0)
|
||||
|
||||
Note that many Linux distributions hard-wire the mount options for usbfs
|
||||
in their init scripts, such as ``/etc/rc.d/rc.sysinit``, rather than
|
||||
making it easy to set this per-system policy in ``/etc/fstab``.
|
||||
|
||||
/proc/bus/usb/devices
|
||||
---------------------
|
||||
|
||||
This file is handy for status viewing tools in user mode, which can scan
|
||||
the text format and ignore most of it. More detailed device status
|
||||
(including class and vendor status) is available from device-specific
|
||||
files. For information about the current format of this file, see the
|
||||
``Documentation/usb/proc_usb_info.txt`` file in your Linux kernel
|
||||
sources.
|
||||
|
||||
This file, in combination with the poll() system call, can also be used
|
||||
to detect when devices are added or removed:
|
||||
|
||||
::
|
||||
|
||||
int fd;
|
||||
struct pollfd pfd;
|
||||
|
||||
fd = open("/proc/bus/usb/devices", O_RDONLY);
|
||||
pfd = { fd, POLLIN, 0 };
|
||||
for (;;) {
|
||||
/* The first time through, this call will return immediately. */
|
||||
poll(&pfd, 1, -1);
|
||||
|
||||
/* To see what's changed, compare the file's previous and current
|
||||
contents or scan the filesystem. (Scanning is more precise.) */
|
||||
}
|
||||
|
||||
Note that this behavior is intended to be used for informational and
|
||||
debug purposes. It would be more appropriate to use programs such as
|
||||
udev or HAL to initialize a device or start a user-mode helper program,
|
||||
for instance.
|
||||
|
||||
/proc/bus/usb/BBB/DDD
|
||||
---------------------
|
||||
|
||||
Use these files in one of these basic ways:
|
||||
|
||||
*They can be read,* producing first the device descriptor (18 bytes) and
|
||||
then the descriptors for the current configuration. See the USB 2.0 spec
|
||||
for details about those binary data formats. You'll need to convert most
|
||||
multibyte values from little endian format to your native host byte
|
||||
order, although a few of the fields in the device descriptor (both of
|
||||
the BCD-encoded fields, and the vendor and product IDs) will be
|
||||
byteswapped for you. Note that configuration descriptors include
|
||||
descriptors for interfaces, altsettings, endpoints, and maybe additional
|
||||
class descriptors.
|
||||
|
||||
*Perform USB operations* using *ioctl()* requests to make endpoint I/O
|
||||
requests (synchronously or asynchronously) or manage the device. These
|
||||
requests need the CAP_SYS_RAWIO capability, as well as filesystem
|
||||
access permissions. Only one ioctl request can be made on one of these
|
||||
device files at a time. This means that if you are synchronously reading
|
||||
an endpoint from one thread, you won't be able to write to a different
|
||||
endpoint from another thread until the read completes. This works for
|
||||
*half duplex* protocols, but otherwise you'd use asynchronous i/o
|
||||
requests.
|
||||
|
||||
Life Cycle of User Mode Drivers
|
||||
-------------------------------
|
||||
|
||||
Such a driver first needs to find a device file for a device it knows
|
||||
how to handle. Maybe it was told about it because a ``/sbin/hotplug``
|
||||
event handling agent chose that driver to handle the new device. Or
|
||||
maybe it's an application that scans all the /proc/bus/usb device files,
|
||||
and ignores most devices. In either case, it should :c:func:`read()`
|
||||
all the descriptors from the device file, and check them against what it
|
||||
knows how to handle. It might just reject everything except a particular
|
||||
vendor and product ID, or need a more complex policy.
|
||||
|
||||
Never assume there will only be one such device on the system at a time!
|
||||
If your code can't handle more than one device at a time, at least
|
||||
detect when there's more than one, and have your users choose which
|
||||
device to use.
|
||||
|
||||
Once your user mode driver knows what device to use, it interacts with
|
||||
it in either of two styles. The simple style is to make only control
|
||||
requests; some devices don't need more complex interactions than those.
|
||||
(An example might be software using vendor-specific control requests for
|
||||
some initialization or configuration tasks, with a kernel driver for the
|
||||
rest.)
|
||||
|
||||
More likely, you need a more complex style driver: one using non-control
|
||||
endpoints, reading or writing data and claiming exclusive use of an
|
||||
interface. *Bulk* transfers are easiest to use, but only their sibling
|
||||
*interrupt* transfers work with low speed devices. Both interrupt and
|
||||
*isochronous* transfers offer service guarantees because their bandwidth
|
||||
is reserved. Such "periodic" transfers are awkward to use through usbfs,
|
||||
unless you're using the asynchronous calls. However, interrupt transfers
|
||||
can also be used in a synchronous "one shot" style.
|
||||
|
||||
Your user-mode driver should never need to worry about cleaning up
|
||||
request state when the device is disconnected, although it should close
|
||||
its open file descriptors as soon as it starts seeing the ENODEV errors.
|
||||
|
||||
The ioctl() Requests
|
||||
--------------------
|
||||
|
||||
To use these ioctls, you need to include the following headers in your
|
||||
userspace program:
|
||||
|
||||
::
|
||||
|
||||
#include <linux/usb.h>
|
||||
#include <linux/usbdevice_fs.h>
|
||||
#include <asm/byteorder.h>
|
||||
|
||||
The standard USB device model requests, from "Chapter 9" of the USB 2.0
|
||||
specification, are automatically included from the ``<linux/usb/ch9.h>``
|
||||
header.
|
||||
|
||||
Unless noted otherwise, the ioctl requests described here will update
|
||||
the modification time on the usbfs file to which they are applied
|
||||
(unless they fail). A return of zero indicates success; otherwise, a
|
||||
standard USB error code is returned. (These are documented in
|
||||
``Documentation/usb/error-codes.txt`` in your kernel sources.)
|
||||
|
||||
Each of these files multiplexes access to several I/O streams, one per
|
||||
endpoint. Each device has one control endpoint (endpoint zero) which
|
||||
supports a limited RPC style RPC access. Devices are configured by
|
||||
hub_wq (in the kernel) setting a device-wide *configuration* that
|
||||
affects things like power consumption and basic functionality. The
|
||||
endpoints are part of USB *interfaces*, which may have *altsettings*
|
||||
affecting things like which endpoints are available. Many devices only
|
||||
have a single configuration and interface, so drivers for them will
|
||||
ignore configurations and altsettings.
|
||||
|
||||
Management/Status Requests
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
A number of usbfs requests don't deal very directly with device I/O.
|
||||
They mostly relate to device management and status. These are all
|
||||
synchronous requests.
|
||||
|
||||
USBDEVFS_CLAIMINTERFACE
|
||||
This is used to force usbfs to claim a specific interface, which has
|
||||
not previously been claimed by usbfs or any other kernel driver. The
|
||||
ioctl parameter is an integer holding the number of the interface
|
||||
(bInterfaceNumber from descriptor).
|
||||
|
||||
Note that if your driver doesn't claim an interface before trying to
|
||||
use one of its endpoints, and no other driver has bound to it, then
|
||||
the interface is automatically claimed by usbfs.
|
||||
|
||||
This claim will be released by a RELEASEINTERFACE ioctl, or by
|
||||
closing the file descriptor. File modification time is not updated
|
||||
by this request.
|
||||
|
||||
USBDEVFS_CONNECTINFO
|
||||
Says whether the device is lowspeed. The ioctl parameter points to a
|
||||
structure like this:
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_connectinfo {
|
||||
unsigned int devnum;
|
||||
unsigned char slow;
|
||||
};
|
||||
|
||||
File modification time is not updated by this request.
|
||||
|
||||
*You can't tell whether a "not slow" device is connected at high
|
||||
speed (480 MBit/sec) or just full speed (12 MBit/sec).* You should
|
||||
know the devnum value already, it's the DDD value of the device file
|
||||
name.
|
||||
|
||||
USBDEVFS_GETDRIVER
|
||||
Returns the name of the kernel driver bound to a given interface (a
|
||||
string). Parameter is a pointer to this structure, which is
|
||||
modified:
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_getdriver {
|
||||
unsigned int interface;
|
||||
char driver[USBDEVFS_MAXDRIVERNAME + 1];
|
||||
};
|
||||
|
||||
File modification time is not updated by this request.
|
||||
|
||||
USBDEVFS_IOCTL
|
||||
Passes a request from userspace through to a kernel driver that has
|
||||
an ioctl entry in the *struct usb_driver* it registered.
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_ioctl {
|
||||
int ifno;
|
||||
int ioctl_code;
|
||||
void *data;
|
||||
};
|
||||
|
||||
/* user mode call looks like this.
|
||||
* 'request' becomes the driver->ioctl() 'code' parameter.
|
||||
* the size of 'param' is encoded in 'request', and that data
|
||||
* is copied to or from the driver->ioctl() 'buf' parameter.
|
||||
*/
|
||||
static int
|
||||
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
|
||||
{
|
||||
struct usbdevfs_ioctl wrapper;
|
||||
|
||||
wrapper.ifno = ifno;
|
||||
wrapper.ioctl_code = request;
|
||||
wrapper.data = param;
|
||||
|
||||
return ioctl (fd, USBDEVFS_IOCTL, &wrapper);
|
||||
}
|
||||
|
||||
File modification time is not updated by this request.
|
||||
|
||||
This request lets kernel drivers talk to user mode code through
|
||||
filesystem operations even when they don't create a character or
|
||||
block special device. It's also been used to do things like ask
|
||||
devices what device special file should be used. Two pre-defined
|
||||
ioctls are used to disconnect and reconnect kernel drivers, so that
|
||||
user mode code can completely manage binding and configuration of
|
||||
devices.
|
||||
|
||||
USBDEVFS_RELEASEINTERFACE
|
||||
This is used to release the claim usbfs made on interface, either
|
||||
implicitly or because of a USBDEVFS_CLAIMINTERFACE call, before the
|
||||
file descriptor is closed. The ioctl parameter is an integer holding
|
||||
the number of the interface (bInterfaceNumber from descriptor); File
|
||||
modification time is not updated by this request.
|
||||
|
||||
**Warning**
|
||||
|
||||
*No security check is made to ensure that the task which made
|
||||
the claim is the one which is releasing it. This means that user
|
||||
mode driver may interfere other ones.*
|
||||
|
||||
USBDEVFS_RESETEP
|
||||
Resets the data toggle value for an endpoint (bulk or interrupt) to
|
||||
DATA0. The ioctl parameter is an integer endpoint number (1 to 15,
|
||||
as identified in the endpoint descriptor), with USB_DIR_IN added
|
||||
if the device's endpoint sends data to the host.
|
||||
|
||||
**Warning**
|
||||
|
||||
*Avoid using this request. It should probably be removed.* Using
|
||||
it typically means the device and driver will lose toggle
|
||||
synchronization. If you really lost synchronization, you likely
|
||||
need to completely handshake with the device, using a request
|
||||
like CLEAR_HALT or SET_INTERFACE.
|
||||
|
||||
USBDEVFS_DROP_PRIVILEGES
|
||||
This is used to relinquish the ability to do certain operations
|
||||
which are considered to be privileged on a usbfs file descriptor.
|
||||
This includes claiming arbitrary interfaces, resetting a device on
|
||||
which there are currently claimed interfaces from other users, and
|
||||
issuing USBDEVFS_IOCTL calls. The ioctl parameter is a 32 bit mask
|
||||
of interfaces the user is allowed to claim on this file descriptor.
|
||||
You may issue this ioctl more than one time to narrow said mask.
|
||||
|
||||
Synchronous I/O Support
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Synchronous requests involve the kernel blocking until the user mode
|
||||
request completes, either by finishing successfully or by reporting an
|
||||
error. In most cases this is the simplest way to use usbfs, although as
|
||||
noted above it does prevent performing I/O to more than one endpoint at
|
||||
a time.
|
||||
|
||||
USBDEVFS_BULK
|
||||
Issues a bulk read or write request to the device. The ioctl
|
||||
parameter is a pointer to this structure:
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_bulktransfer {
|
||||
unsigned int ep;
|
||||
unsigned int len;
|
||||
unsigned int timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};
|
||||
|
||||
The "ep" value identifies a bulk endpoint number (1 to 15, as
|
||||
identified in an endpoint descriptor), masked with USB_DIR_IN when
|
||||
referring to an endpoint which sends data to the host from the
|
||||
device. The length of the data buffer is identified by "len"; Recent
|
||||
kernels support requests up to about 128KBytes. *FIXME say how read
|
||||
length is returned, and how short reads are handled.*.
|
||||
|
||||
USBDEVFS_CLEAR_HALT
|
||||
Clears endpoint halt (stall) and resets the endpoint toggle. This is
|
||||
only meaningful for bulk or interrupt endpoints. The ioctl parameter
|
||||
is an integer endpoint number (1 to 15, as identified in an endpoint
|
||||
descriptor), masked with USB_DIR_IN when referring to an endpoint
|
||||
which sends data to the host from the device.
|
||||
|
||||
Use this on bulk or interrupt endpoints which have stalled,
|
||||
returning *-EPIPE* status to a data transfer request. Do not issue
|
||||
the control request directly, since that could invalidate the host's
|
||||
record of the data toggle.
|
||||
|
||||
USBDEVFS_CONTROL
|
||||
Issues a control request to the device. The ioctl parameter points
|
||||
to a structure like this:
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_ctrltransfer {
|
||||
__u8 bRequestType;
|
||||
__u8 bRequest;
|
||||
__u16 wValue;
|
||||
__u16 wIndex;
|
||||
__u16 wLength;
|
||||
__u32 timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};
|
||||
|
||||
The first eight bytes of this structure are the contents of the
|
||||
SETUP packet to be sent to the device; see the USB 2.0 specification
|
||||
for details. The bRequestType value is composed by combining a
|
||||
USB_TYPE_\* value, a USB_DIR_\* value, and a USB_RECIP_\*
|
||||
value (from *<linux/usb.h>*). If wLength is nonzero, it describes
|
||||
the length of the data buffer, which is either written to the device
|
||||
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
|
||||
|
||||
At this writing, you can't transfer more than 4 KBytes of data to or
|
||||
from a device; usbfs has a limit, and some host controller drivers
|
||||
have a limit. (That's not usually a problem.) *Also* there's no way
|
||||
to say it's not OK to get a short read back from the device.
|
||||
|
||||
USBDEVFS_RESET
|
||||
Does a USB level device reset. The ioctl parameter is ignored. After
|
||||
the reset, this rebinds all device interfaces. File modification
|
||||
time is not updated by this request.
|
||||
|
||||
**Warning**
|
||||
|
||||
*Avoid using this call* until some usbcore bugs get fixed, since
|
||||
it does not fully synchronize device, interface, and driver (not
|
||||
just usbfs) state.
|
||||
|
||||
USBDEVFS_SETINTERFACE
|
||||
Sets the alternate setting for an interface. The ioctl parameter is
|
||||
a pointer to a structure like this:
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_setinterface {
|
||||
unsigned int interface;
|
||||
unsigned int altsetting;
|
||||
};
|
||||
|
||||
File modification time is not updated by this request.
|
||||
|
||||
Those struct members are from some interface descriptor applying to
|
||||
the current configuration. The interface number is the
|
||||
bInterfaceNumber value, and the altsetting number is the
|
||||
bAlternateSetting value. (This resets each endpoint in the
|
||||
interface.)
|
||||
|
||||
USBDEVFS_SETCONFIGURATION
|
||||
Issues the :c:func:`usb_set_configuration()` call for the
|
||||
device. The parameter is an integer holding the number of a
|
||||
configuration (bConfigurationValue from descriptor). File
|
||||
modification time is not updated by this request.
|
||||
|
||||
**Warning**
|
||||
|
||||
*Avoid using this call* until some usbcore bugs get fixed, since
|
||||
it does not fully synchronize device, interface, and driver (not
|
||||
just usbfs) state.
|
||||
|
||||
Asynchronous I/O Support
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
As mentioned above, there are situations where it may be important to
|
||||
initiate concurrent operations from user mode code. This is particularly
|
||||
important for periodic transfers (interrupt and isochronous), but it can
|
||||
be used for other kinds of USB requests too. In such cases, the
|
||||
asynchronous requests described here are essential. Rather than
|
||||
submitting one request and having the kernel block until it completes,
|
||||
the blocking is separate.
|
||||
|
||||
These requests are packaged into a structure that resembles the URB used
|
||||
by kernel device drivers. (No POSIX Async I/O support here, sorry.) It
|
||||
identifies the endpoint type (USBDEVFS_URB_TYPE_\*), endpoint
|
||||
(number, masked with USB_DIR_IN as appropriate), buffer and length,
|
||||
and a user "context" value serving to uniquely identify each request.
|
||||
(It's usually a pointer to per-request data.) Flags can modify requests
|
||||
(not as many as supported for kernel drivers).
|
||||
|
||||
Each request can specify a realtime signal number (between SIGRTMIN and
|
||||
SIGRTMAX, inclusive) to request a signal be sent when the request
|
||||
completes.
|
||||
|
||||
When usbfs returns these urbs, the status value is updated, and the
|
||||
buffer may have been modified. Except for isochronous transfers, the
|
||||
actual_length is updated to say how many bytes were transferred; if the
|
||||
USBDEVFS_URB_DISABLE_SPD flag is set ("short packets are not OK"), if
|
||||
fewer bytes were read than were requested then you get an error report.
|
||||
|
||||
::
|
||||
|
||||
struct usbdevfs_iso_packet_desc {
|
||||
unsigned int length;
|
||||
unsigned int actual_length;
|
||||
unsigned int status;
|
||||
};
|
||||
|
||||
struct usbdevfs_urb {
|
||||
unsigned char type;
|
||||
unsigned char endpoint;
|
||||
int status;
|
||||
unsigned int flags;
|
||||
void *buffer;
|
||||
int buffer_length;
|
||||
int actual_length;
|
||||
int start_frame;
|
||||
int number_of_packets;
|
||||
int error_count;
|
||||
unsigned int signr;
|
||||
void *usercontext;
|
||||
struct usbdevfs_iso_packet_desc iso_frame_desc[];
|
||||
};
|
||||
|
||||
For these asynchronous requests, the file modification time reflects
|
||||
when the request was initiated. This contrasts with their use with the
|
||||
synchronous requests, where it reflects when requests complete.
|
||||
|
||||
USBDEVFS_DISCARDURB
|
||||
*TBS* File modification time is not updated by this request.
|
||||
|
||||
USBDEVFS_DISCSIGNAL
|
||||
*TBS* File modification time is not updated by this request.
|
||||
|
||||
USBDEVFS_REAPURB
|
||||
*TBS* File modification time is not updated by this request.
|
||||
|
||||
USBDEVFS_REAPURBNDELAY
|
||||
*TBS* File modification time is not updated by this request.
|
||||
|
||||
USBDEVFS_SUBMITURB
|
||||
*TBS*
|
@ -1,13 +1,15 @@
|
||||
VME Device Driver API
|
||||
=====================
|
||||
VME Device Drivers
|
||||
==================
|
||||
|
||||
Driver registration
|
||||
===================
|
||||
-------------------
|
||||
|
||||
As with other subsystems within the Linux kernel, VME device drivers register
|
||||
with the VME subsystem, typically called from the devices init routine. This is
|
||||
achieved via a call to the following function:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_register_driver (struct vme_driver *driver, unsigned int ndevs);
|
||||
|
||||
If driver registration is successful this function returns zero, if an error
|
||||
@ -17,6 +19,8 @@ A pointer to a structure of type 'vme_driver' must be provided to the
|
||||
registration function. Along with ndevs, which is the number of devices your
|
||||
driver is able to support. The structure is as follows:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_driver {
|
||||
struct list_head node;
|
||||
const char *name;
|
||||
@ -38,6 +42,8 @@ with the driver. The match function should return 1 if a device should be
|
||||
probed and 0 otherwise. This example match function (from vme_user.c) limits
|
||||
the number of devices probed to one:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#define USER_BUS_MAX 1
|
||||
...
|
||||
static int vme_user_match(struct vme_dev *vdev)
|
||||
@ -51,6 +57,8 @@ The '.probe' element should contain a pointer to the probe routine. The
|
||||
probe routine is passed a 'struct vme_dev' pointer as an argument. The
|
||||
'struct vme_dev' structure looks like the following:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_dev {
|
||||
int num;
|
||||
struct vme_bridge *bridge;
|
||||
@ -66,11 +74,13 @@ dev->bridge->num.
|
||||
A function is also provided to unregister the driver from the VME core and is
|
||||
usually called from the device driver's exit routine:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void vme_unregister_driver (struct vme_driver *driver);
|
||||
|
||||
|
||||
Resource management
|
||||
===================
|
||||
-------------------
|
||||
|
||||
Once a driver has registered with the VME core the provided match routine will
|
||||
be called the number of times specified during the registration. If a match
|
||||
@ -86,6 +96,8 @@ specific window or DMA channel (which may be used by a different driver) this
|
||||
driver allows a resource to be assigned based on the required attributes of the
|
||||
driver in question:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_resource * vme_master_request(struct vme_dev *dev,
|
||||
u32 aspace, u32 cycle, u32 width);
|
||||
|
||||
@ -112,6 +124,8 @@ Functions are also provided to free window allocations once they are no longer
|
||||
required. These functions should be passed the pointer to the resource provided
|
||||
during resource allocation:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void vme_master_free(struct vme_resource *res);
|
||||
|
||||
void vme_slave_free(struct vme_resource *res);
|
||||
@ -120,7 +134,7 @@ during resource allocation:
|
||||
|
||||
|
||||
Master windows
|
||||
==============
|
||||
--------------
|
||||
|
||||
Master windows provide access from the local processor[s] out onto the VME bus.
|
||||
The number of windows available and the available access modes is dependent on
|
||||
@ -128,11 +142,13 @@ the underlying chipset. A window must be configured before it can be used.
|
||||
|
||||
|
||||
Master window configuration
|
||||
---------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Once a master window has been assigned the following functions can be used to
|
||||
configure it and retrieve the current settings:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_master_set (struct vme_resource *res, int enabled,
|
||||
unsigned long long base, unsigned long long size, u32 aspace,
|
||||
u32 cycle, u32 width);
|
||||
@ -149,11 +165,13 @@ These functions return 0 on success or an error code should the call fail.
|
||||
|
||||
|
||||
Master window access
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following functions can be used to read from and write to configured master
|
||||
windows. These functions return the number of bytes copied:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
ssize_t vme_master_read(struct vme_resource *res, void *buf,
|
||||
size_t count, loff_t offset);
|
||||
|
||||
@ -164,6 +182,8 @@ In addition to simple reads and writes, a function is provided to do a
|
||||
read-modify-write transaction. This function returns the original value of the
|
||||
VME bus location :
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
unsigned int vme_master_rmw (struct vme_resource *res,
|
||||
unsigned int mask, unsigned int compare, unsigned int swap,
|
||||
loff_t offset);
|
||||
@ -175,12 +195,14 @@ the value of swap is written the specified offset.
|
||||
Parts of a VME window can be mapped into user space memory using the following
|
||||
function:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_master_mmap(struct vme_resource *resource,
|
||||
struct vm_area_struct *vma)
|
||||
|
||||
|
||||
Slave windows
|
||||
=============
|
||||
-------------
|
||||
|
||||
Slave windows provide devices on the VME bus access into mapped portions of the
|
||||
local memory. The number of windows available and the access modes that can be
|
||||
@ -189,11 +211,13 @@ it can be used.
|
||||
|
||||
|
||||
Slave window configuration
|
||||
--------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Once a slave window has been assigned the following functions can be used to
|
||||
configure it and retrieve the current settings:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_slave_set (struct vme_resource *res, int enabled,
|
||||
unsigned long long base, unsigned long long size,
|
||||
dma_addr_t mem, u32 aspace, u32 cycle);
|
||||
@ -210,13 +234,15 @@ These functions return 0 on success or an error code should the call fail.
|
||||
|
||||
|
||||
Slave window buffer allocation
|
||||
------------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Functions are provided to allow the user to allocate and free a contiguous
|
||||
buffers which will be accessible by the VME bridge. These functions do not have
|
||||
to be used, other methods can be used to allocate a buffer, though care must be
|
||||
taken to ensure that they are contiguous and accessible by the VME bridge:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void * vme_alloc_consistent(struct vme_resource *res, size_t size,
|
||||
dma_addr_t *mem);
|
||||
|
||||
@ -225,14 +251,14 @@ taken to ensure that they are contiguous and accessible by the VME bridge:
|
||||
|
||||
|
||||
Slave window access
|
||||
-------------------
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Slave windows map local memory onto the VME bus, the standard methods for
|
||||
accessing memory should be used.
|
||||
|
||||
|
||||
DMA channels
|
||||
============
|
||||
------------
|
||||
|
||||
The VME DMA transfer provides the ability to run link-list DMA transfers. The
|
||||
API introduces the concept of DMA lists. Each DMA list is a link-list which can
|
||||
@ -241,29 +267,35 @@ executed, reused and destroyed.
|
||||
|
||||
|
||||
List Management
|
||||
---------------
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The following functions are provided to create and destroy DMA lists. Execution
|
||||
of a list will not automatically destroy the list, thus enabling a list to be
|
||||
reused for repetitive tasks:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_dma_list *vme_new_dma_list(struct vme_resource *res);
|
||||
|
||||
int vme_dma_list_free(struct vme_dma_list *list);
|
||||
|
||||
|
||||
List Population
|
||||
---------------
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
An item can be added to a list using the following function ( the source and
|
||||
destination attributes need to be created before calling this function, this is
|
||||
covered under "Transfer Attributes"):
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_dma_list_add(struct vme_dma_list *list,
|
||||
struct vme_dma_attr *src, struct vme_dma_attr *dest,
|
||||
size_t count);
|
||||
|
||||
NOTE: The detailed attributes of the transfers source and destination
|
||||
.. note::
|
||||
|
||||
The detailed attributes of the transfers source and destination
|
||||
are not checked until an entry is added to a DMA list, the request
|
||||
for a DMA channel purely checks the directions in which the
|
||||
controller is expected to transfer data. As a result it is
|
||||
@ -271,7 +303,7 @@ NOTE: The detailed attributes of the transfers source and destination
|
||||
source or destination is in an unsupported VME address space.
|
||||
|
||||
Transfer Attributes
|
||||
-------------------
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The attributes for the source and destination are handled separately from adding
|
||||
an item to a list. This is due to the diverse attributes required for each type
|
||||
@ -280,33 +312,43 @@ and pattern sources and destinations (where appropriate):
|
||||
|
||||
Pattern source:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_dma_attr *vme_dma_pattern_attribute(u32 pattern, u32 type);
|
||||
|
||||
PCI source or destination:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_dma_attr *vme_dma_pci_attribute(dma_addr_t mem);
|
||||
|
||||
VME source or destination:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_dma_attr *vme_dma_vme_attribute(unsigned long long base,
|
||||
u32 aspace, u32 cycle, u32 width);
|
||||
|
||||
The following function should be used to free an attribute:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void vme_dma_free_attribute(struct vme_dma_attr *attr);
|
||||
|
||||
|
||||
List Execution
|
||||
--------------
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The following function queues a list for execution. The function will return
|
||||
once the list has been executed:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_dma_list_exec(struct vme_dma_list *list);
|
||||
|
||||
|
||||
Interrupts
|
||||
==========
|
||||
----------
|
||||
|
||||
The VME API provides functions to attach and detach callbacks to specific VME
|
||||
level and status ID combinations and for the generation of VME interrupts with
|
||||
@ -314,13 +356,15 @@ specific VME level and status IDs.
|
||||
|
||||
|
||||
Attaching Interrupt Handlers
|
||||
----------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following functions can be used to attach and free a specific VME level and
|
||||
status ID combination. Any given combination can only be assigned a single
|
||||
callback function. A void pointer parameter is provided, the value of which is
|
||||
passed to the callback function, the use of this pointer is user undefined:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_irq_request(struct vme_dev *dev, int level, int statid,
|
||||
void (*callback)(int, int, void *), void *priv);
|
||||
|
||||
@ -329,31 +373,37 @@ passed to the callback function, the use of this pointer is user undefined:
|
||||
The callback parameters are as follows. Care must be taken in writing a callback
|
||||
function, callback functions run in interrupt context:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void callback(int level, int statid, void *priv);
|
||||
|
||||
|
||||
Interrupt Generation
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following function can be used to generate a VME interrupt at a given VME
|
||||
level and VME status ID:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_irq_generate(struct vme_dev *dev, int level, int statid);
|
||||
|
||||
|
||||
Location monitors
|
||||
=================
|
||||
-----------------
|
||||
|
||||
The VME API provides the following functionality to configure the location
|
||||
monitor.
|
||||
|
||||
|
||||
Location Monitor Management
|
||||
---------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following functions are provided to request the use of a block of location
|
||||
monitors and to free them after they are no longer required:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct vme_resource * vme_lm_request(struct vme_dev *dev);
|
||||
|
||||
void vme_lm_free(struct vme_resource * res);
|
||||
@ -362,15 +412,19 @@ Each block may provide a number of location monitors, monitoring adjacent
|
||||
locations. The following function can be used to determine how many locations
|
||||
are provided:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_lm_count(struct vme_resource * res);
|
||||
|
||||
|
||||
Location Monitor Configuration
|
||||
------------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Once a bank of location monitors has been allocated, the following functions
|
||||
are provided to configure the location and mode of the location monitor:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_lm_set(struct vme_resource *res, unsigned long long base,
|
||||
u32 aspace, u32 cycle);
|
||||
|
||||
@ -379,12 +433,14 @@ are provided to configure the location and mode of the location monitor:
|
||||
|
||||
|
||||
Location Monitor Use
|
||||
--------------------
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The following functions allow a callback to be attached and detached from each
|
||||
location monitor location. Each location monitor can monitor a number of
|
||||
adjacent locations:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_lm_attach(struct vme_resource *res, int num,
|
||||
void (*callback)(void *));
|
||||
|
||||
@ -392,22 +448,27 @@ adjacent locations:
|
||||
|
||||
The callback function is declared as follows.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void callback(void *data);
|
||||
|
||||
|
||||
Slot Detection
|
||||
==============
|
||||
--------------
|
||||
|
||||
This function returns the slot ID of the provided bridge.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_slot_num(struct vme_dev *dev);
|
||||
|
||||
|
||||
Bus Detection
|
||||
=============
|
||||
-------------
|
||||
|
||||
This function returns the bus ID of the provided bridge.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int vme_bus_num(struct vme_dev *dev);
|
||||
|
||||
|
@ -1,340 +0,0 @@
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes how to use the dynamic debug (dyndbg) feature.
|
||||
|
||||
Dynamic debug is designed to allow you to dynamically enable/disable
|
||||
kernel code to obtain additional kernel information. Currently, if
|
||||
CONFIG_DYNAMIC_DEBUG is set, then all pr_debug()/dev_dbg() and
|
||||
print_hex_dump_debug()/print_hex_dump_bytes() calls can be dynamically
|
||||
enabled per-callsite.
|
||||
|
||||
If CONFIG_DYNAMIC_DEBUG is not set, print_hex_dump_debug() is just
|
||||
shortcut for print_hex_dump(KERN_DEBUG).
|
||||
|
||||
For print_hex_dump_debug()/print_hex_dump_bytes(), format string is
|
||||
its 'prefix_str' argument, if it is constant string; or "hexdump"
|
||||
in case 'prefix_str' is build dynamically.
|
||||
|
||||
Dynamic debug has even more useful features:
|
||||
|
||||
* Simple query language allows turning on and off debugging
|
||||
statements by matching any combination of 0 or 1 of:
|
||||
|
||||
- source filename
|
||||
- function name
|
||||
- line number (including ranges of line numbers)
|
||||
- module name
|
||||
- format string
|
||||
|
||||
* Provides a debugfs control file: <debugfs>/dynamic_debug/control
|
||||
which can be read to display the complete list of known debug
|
||||
statements, to help guide you
|
||||
|
||||
Controlling dynamic debug Behaviour
|
||||
===================================
|
||||
|
||||
The behaviour of pr_debug()/dev_dbg()s are controlled via writing to a
|
||||
control file in the 'debugfs' filesystem. Thus, you must first mount
|
||||
the debugfs filesystem, in order to make use of this feature.
|
||||
Subsequently, we refer to the control file as:
|
||||
<debugfs>/dynamic_debug/control. For example, if you want to enable
|
||||
printing from source file 'svcsock.c', line 1603 you simply do:
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
If you make a mistake with the syntax, the write will fail thus:
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
-bash: echo: write error: Invalid argument
|
||||
|
||||
Viewing Dynamic Debug Behaviour
|
||||
===========================
|
||||
|
||||
You can view the currently configured behaviour of all the debug
|
||||
statements via:
|
||||
|
||||
nullarbor:~ # cat <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
|
||||
...
|
||||
|
||||
|
||||
You can also apply standard Unix text manipulation filters to this
|
||||
data, e.g.
|
||||
|
||||
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
|
||||
62
|
||||
|
||||
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
|
||||
42
|
||||
|
||||
The third column shows the currently enabled flags for each debug
|
||||
statement callsite (see below for definitions of the flags). The
|
||||
default value, with no flags enabled, is "=_". So you can view all
|
||||
the debug statement callsites with any non-default flags:
|
||||
|
||||
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
|
||||
|
||||
|
||||
Command Language Reference
|
||||
==========================
|
||||
|
||||
At the lexical level, a command comprises a sequence of words separated
|
||||
by spaces or tabs. So these are all equivalent:
|
||||
|
||||
nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
Command submissions are bounded by a write() system call.
|
||||
Multiple commands can be written together, separated by ';' or '\n'.
|
||||
|
||||
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
|
||||
> <debugfs>/dynamic_debug/control
|
||||
|
||||
If your query set is big, you can batch them too:
|
||||
|
||||
~# cat query-batch-file > <debugfs>/dynamic_debug/control
|
||||
|
||||
A another way is to use wildcard. The match rule support '*' (matches
|
||||
zero or more characters) and '?' (matches exactly one character).For
|
||||
example, you can match all usb drivers:
|
||||
|
||||
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
At the syntactical level, a command comprises a sequence of match
|
||||
specifications, followed by a flags change specification.
|
||||
|
||||
command ::= match-spec* flags-spec
|
||||
|
||||
The match-spec's are used to choose a subset of the known pr_debug()
|
||||
callsites to which to apply the flags-spec. Think of them as a query
|
||||
with implicit ANDs between each pair. Note that an empty list of
|
||||
match-specs will select all debug statement callsites.
|
||||
|
||||
A match specification comprises a keyword, which controls the
|
||||
attribute of the callsite to be compared, and a value to compare
|
||||
against. Possible keywords are:
|
||||
|
||||
match-spec ::= 'func' string |
|
||||
'file' string |
|
||||
'module' string |
|
||||
'format' string |
|
||||
'line' line-range
|
||||
|
||||
line-range ::= lineno |
|
||||
'-'lineno |
|
||||
lineno'-' |
|
||||
lineno'-'lineno
|
||||
// Note: line-range cannot contain space, e.g.
|
||||
// "1-30" is valid range but "1 - 30" is not.
|
||||
|
||||
lineno ::= unsigned-int
|
||||
|
||||
The meanings of each keyword are:
|
||||
|
||||
func
|
||||
The given string is compared against the function name
|
||||
of each callsite. Example:
|
||||
|
||||
func svc_tcp_accept
|
||||
|
||||
file
|
||||
The given string is compared against either the full pathname, the
|
||||
src-root relative pathname, or the basename of the source file of
|
||||
each callsite. Examples:
|
||||
|
||||
file svcsock.c
|
||||
file kernel/freezer.c
|
||||
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
|
||||
|
||||
module
|
||||
The given string is compared against the module name
|
||||
of each callsite. The module name is the string as
|
||||
seen in "lsmod", i.e. without the directory or the .ko
|
||||
suffix and with '-' changed to '_'. Examples:
|
||||
|
||||
module sunrpc
|
||||
module nfsd
|
||||
|
||||
format
|
||||
The given string is searched for in the dynamic debug format
|
||||
string. Note that the string does not need to match the
|
||||
entire format, only some part. Whitespace and other
|
||||
special characters can be escaped using C octal character
|
||||
escape \ooo notation, e.g. the space character is \040.
|
||||
Alternatively, the string can be enclosed in double quote
|
||||
characters (") or single quote characters (').
|
||||
Examples:
|
||||
|
||||
format svcrdma: // many of the NFS/RDMA server pr_debugs
|
||||
format readahead // some pr_debugs in the readahead cache
|
||||
format nfsd:\040SETATTR // one way to match a format with whitespace
|
||||
format "nfsd: SETATTR" // a neater way to match a format with whitespace
|
||||
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
|
||||
|
||||
line
|
||||
The given line number or range of line numbers is compared
|
||||
against the line number of each pr_debug() callsite. A single
|
||||
line number matches the callsite line number exactly. A
|
||||
range of line numbers matches any callsite between the first
|
||||
and last line number inclusive. An empty first number means
|
||||
the first line in the file, an empty line number means the
|
||||
last number in the file. Examples:
|
||||
|
||||
line 1603 // exactly line 1603
|
||||
line 1600-1605 // the six lines from line 1600 to line 1605
|
||||
line -1605 // the 1605 lines from line 1 to line 1605
|
||||
line 1600- // all lines from line 1600 to the end of the file
|
||||
|
||||
The flags specification comprises a change operation followed
|
||||
by one or more flag characters. The change operation is one
|
||||
of the characters:
|
||||
|
||||
- remove the given flags
|
||||
+ add the given flags
|
||||
= set the flags to the given flags
|
||||
|
||||
The flags are:
|
||||
|
||||
p enables the pr_debug() callsite.
|
||||
f Include the function name in the printed message
|
||||
l Include line number in the printed message
|
||||
m Include module name in the printed message
|
||||
t Include thread ID in messages not generated from interrupt context
|
||||
_ No flags are set. (Or'd with others on input)
|
||||
|
||||
For print_hex_dump_debug() and print_hex_dump_bytes(), only 'p' flag
|
||||
have meaning, other flags ignored.
|
||||
|
||||
For display, the flags are preceded by '='
|
||||
(mnemonic: what the flags are currently equal to).
|
||||
|
||||
Note the regexp ^[-+=][flmpt_]+$ matches a flags specification.
|
||||
To clear all flags at once, use "=_" or "-flmpt".
|
||||
|
||||
|
||||
Debug messages during Boot Process
|
||||
==================================
|
||||
|
||||
To activate debug messages for core code and built-in modules during
|
||||
the boot process, even before userspace and debugfs exists, use
|
||||
dyndbg="QUERY", module.dyndbg="QUERY", or ddebug_query="QUERY"
|
||||
(ddebug_query is obsoleted by dyndbg, and deprecated). QUERY follows
|
||||
the syntax described above, but must not exceed 1023 characters. Your
|
||||
bootloader may impose lower limits.
|
||||
|
||||
These dyndbg params are processed just after the ddebug tables are
|
||||
processed, as part of the arch_initcall. Thus you can enable debug
|
||||
messages in all code run after this arch_initcall via this boot
|
||||
parameter.
|
||||
|
||||
On an x86 system for example ACPI enablement is a subsys_initcall and
|
||||
dyndbg="file ec.c +p"
|
||||
will show early Embedded Controller transactions during ACPI setup if
|
||||
your machine (typically a laptop) has an Embedded Controller.
|
||||
PCI (or other devices) initialization also is a hot candidate for using
|
||||
this boot parameter for debugging purposes.
|
||||
|
||||
If foo module is not built-in, foo.dyndbg will still be processed at
|
||||
boot time, without effect, but will be reprocessed when module is
|
||||
loaded later. dyndbg_query= and bare dyndbg= are only processed at
|
||||
boot.
|
||||
|
||||
|
||||
Debug Messages at Module Initialization Time
|
||||
============================================
|
||||
|
||||
When "modprobe foo" is called, modprobe scans /proc/cmdline for
|
||||
foo.params, strips "foo.", and passes them to the kernel along with
|
||||
params given in modprobe args or /etc/modprob.d/*.conf files,
|
||||
in the following order:
|
||||
|
||||
1. # parameters given via /etc/modprobe.d/*.conf
|
||||
options foo dyndbg=+pt
|
||||
options foo dyndbg # defaults to +p
|
||||
|
||||
2. # foo.dyndbg as given in boot args, "foo." is stripped and passed
|
||||
foo.dyndbg=" func bar +p; func buz +mp"
|
||||
|
||||
3. # args to modprobe
|
||||
modprobe foo dyndbg==pmf # override previous settings
|
||||
|
||||
These dyndbg queries are applied in order, with last having final say.
|
||||
This allows boot args to override or modify those from /etc/modprobe.d
|
||||
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
|
||||
modprobe args to override both.
|
||||
|
||||
In the foo.dyndbg="QUERY" form, the query must exclude "module foo".
|
||||
"foo" is extracted from the param-name, and applied to each query in
|
||||
"QUERY", and only 1 match-spec of each type is allowed.
|
||||
|
||||
The dyndbg option is a "fake" module parameter, which means:
|
||||
|
||||
- modules do not need to define it explicitly
|
||||
- every module gets it tacitly, whether they use pr_debug or not
|
||||
- it doesn't appear in /sys/module/$module/parameters/
|
||||
To see it, grep the control file, or inspect /proc/cmdline.
|
||||
|
||||
For CONFIG_DYNAMIC_DEBUG kernels, any settings given at boot-time (or
|
||||
enabled by -DDEBUG flag during compilation) can be disabled later via
|
||||
the sysfs interface if the debug messages are no longer needed:
|
||||
|
||||
echo "module module_name -p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
// enable the message at line 1603 of file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in the NFS server module
|
||||
nullarbor:~ # echo -n 'module nfsd +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// disable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process -p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
|
||||
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages in files of which the paths include string "usb"
|
||||
nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all messages
|
||||
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// add module, function to all enabled messages
|
||||
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// boot-args example, with newlines and comments for readability
|
||||
Kernel command line: ...
|
||||
// see whats going on in dyndbg=value processing
|
||||
dynamic_debug.verbose=1
|
||||
// enable pr_debugs in 2 builtins, #cmt is stripped
|
||||
dyndbg="module params +p #cmt ; module sys +p"
|
||||
// enable pr_debugs in 2 functions in a module loaded later
|
||||
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
|
@ -19,7 +19,7 @@ forever.
|
||||
|
||||
This should not cause problems for anybody, since everybody using a
|
||||
2.1.x kernel should have updated their C library to a suitable version
|
||||
anyway (see the file "Documentation/Changes".)
|
||||
anyway (see the file "Documentation/process/changes.rst".)
|
||||
|
||||
1.2 Allow Mixed Locks Again
|
||||
---------------------------
|
||||
|
@ -11,7 +11,7 @@ Updated 2006 by Horms <horms@verge.net.au>
|
||||
In order to use a diskless system, such as an X-terminal or printer server
|
||||
for example, it is necessary for the root filesystem to be present on a
|
||||
non-disk device. This may be an initramfs (see Documentation/filesystems/
|
||||
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
|
||||
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a
|
||||
filesystem mounted via NFS. The following text describes on how to use NFS
|
||||
for the root filesystem. For the rest of this text 'client' means the
|
||||
diskless system, and 'server' means the NFS server.
|
||||
@ -284,7 +284,7 @@ They depend on various facilities being available:
|
||||
"kernel <relative-path-below /tftpboot>". The nfsroot parameters
|
||||
are passed to the kernel by adding them to the "append" line.
|
||||
It is common to use serial console in conjunction with pxeliunx,
|
||||
see Documentation/serial-console.txt for more information.
|
||||
see Documentation/admin-guide/serial-console.rst for more information.
|
||||
|
||||
For more information on isolinux, including how to create bootdisks
|
||||
for prebuilt kernels, see http://syslinux.zytor.com/
|
||||
|
@ -1305,7 +1305,16 @@ second). The meanings of the columns are as follows, from left to right:
|
||||
- nice: niced processes executing in user mode
|
||||
- system: processes executing in kernel mode
|
||||
- idle: twiddling thumbs
|
||||
- iowait: waiting for I/O to complete
|
||||
- iowait: In a word, iowait stands for waiting for I/O to complete. But there
|
||||
are several problems:
|
||||
1. Cpu will not wait for I/O to complete, iowait is the time that a task is
|
||||
waiting for I/O to complete. When cpu goes into idle state for
|
||||
outstanding task io, another task will be scheduled on this CPU.
|
||||
2. In a multi-core CPU, the task waiting for I/O to complete is not running
|
||||
on any CPU, so the iowait of each CPU is difficult to calculate.
|
||||
3. The value of iowait field in /proc/stat will decrease in certain
|
||||
conditions.
|
||||
So, the iowait is not reliable by reading from /proc/stat.
|
||||
- irq: servicing interrupts
|
||||
- softirq: servicing softirqs
|
||||
- steal: involuntary wait
|
||||
|
@ -119,7 +119,7 @@ separated by spaces:
|
||||
253:0 Device with major 253 and minor 0
|
||||
|
||||
Authoritative information can be found in
|
||||
"Documentation/kernel-parameters.txt".
|
||||
"Documentation/admin-guide/kernel-parameters.rst".
|
||||
|
||||
(*) rw
|
||||
|
||||
|
@ -3,3 +3,8 @@
|
||||
project = "Linux GPU Driver Developer's Guide"
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'gpu.tex', project,
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
||||
|
@ -188,7 +188,7 @@ Connectors state change detection must be cleanup up with a call to
|
||||
Output discovery and initialization example
|
||||
-------------------------------------------
|
||||
|
||||
::
|
||||
.. code-block:: c
|
||||
|
||||
void intel_crt_init(struct drm_device *dev)
|
||||
{
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user