.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================

This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License, v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.

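The idea of cross-referencing can be illustrated with a small, self-contained
sketch.
It is not XFS code: the structures ``file_extent`` and ``rmap_record`` and both
functions are hypothetical stand-ins for primary metadata (a file's space
mappings) and secondary metadata (reverse mappings).
The check simply asks whether every file extent is confirmed by some reverse
mapping record naming the same owner.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primary metadata: one extent of a file's data. */
struct file_extent {
	uint64_t owner;		/* inode number that claims the blocks */
	uint64_t start;		/* first physical block */
	uint64_t len;		/* number of blocks */
};

/* Hypothetical secondary metadata: who owns a range of physical space. */
struct rmap_record {
	uint64_t start;
	uint64_t len;
	uint64_t owner;
};

/* Return 1 if some reverse mapping fully covers the extent for its owner. */
int rmap_confirms(const struct rmap_record *rmaps, size_t nr_rmaps,
		  const struct file_extent *ext)
{
	for (size_t i = 0; i < nr_rmaps; i++) {
		const struct rmap_record *r = &rmaps[i];

		if (r->owner == ext->owner &&
		    r->start <= ext->start &&
		    ext->start + ext->len <= r->start + r->len)
			return 1;
	}
	return 0;
}

/* Count the file extents that no reverse mapping record confirms. */
size_t count_xref_errors(const struct file_extent *exts, size_t nr_exts,
			 const struct rmap_record *rmaps, size_t nr_rmaps)
{
	size_t errors = 0;

	for (size_t i = 0; i < nr_exts; i++)
		if (!rmap_confirms(rmaps, nr_rmaps, &exts[i]))
			errors++;
	return errors;
}
```

A nonzero return from ``count_xref_errors`` means the two record types
disagree; deciding which side is wrong, and how to fix it, is the harder
problem that a repair tool has to solve.
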
Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which meant that fsck could only respond to errors by erasing files
until errors were no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both goals, restoring metadata consistency and
recovering data.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+

TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.

Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce IO waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.

Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+

The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While a targeted check or repair is underway, other parts of the filesystem
continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.

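How sharding permits targeted checks can be sketched with a toy model.
The ``alloc_group`` structure and both functions below are assumptions made
for illustration and do not mirror the XFS ondisk format.
Each group carries a summary counter of free blocks plus the free extents
actually observed, and each group is checked without consulting any other
group.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy allocation group: one shard of the filesystem's physical space. */
struct alloc_group {
	uint64_t summary_free;		/* summary metadata: cached free blocks */
	const uint64_t *free_lens;	/* lengths of the free extents found */
	size_t nr_free;			/* number of entries in free_lens */
};

/* Check one group in isolation; 0 if the counter matches the extents. */
int check_one_ag(const struct alloc_group *ag)
{
	uint64_t total = 0;

	for (size_t i = 0; i < ag->nr_free; i++)
		total += ag->free_lens[i];
	return total == ag->summary_free ? 0 : -1;
}

/*
 * Check every group independently and return a bitmask of damaged groups.
 * Only the first 32 groups are examined, which is enough for a sketch.
 */
uint32_t find_damaged_ags(const struct alloc_group *ags, unsigned int nr_ags)
{
	uint32_t damaged = 0;

	for (unsigned int agno = 0; agno < nr_ags && agno < 32; agno++)
		if (check_one_ag(&ags[agno]) != 0)
			damaged |= UINT32_C(1) << agno;
	return damaged;
}
```

Because ``check_one_ag`` touches only the group it is given, a bad counter in
one group is reported for that group alone, and a repair could be limited to
the groups whose bits are set in the returned mask.
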
In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.