Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Google-Bug-Id: 64109868
Test: e2fsck -E unshare_blocks does a full scan
Change-Id: Idc36ceba3bf24e1fb1487feedefe9a68f9acc7f3
From AOSP commit: 7c180d6598363722de6195d142d7677bbc2b0161
If -E unshare_blocks is used with -n, it will normally fail since the
filesystem is read-only. For Android's "adb remount" it is more useful
to report whether or not the unshare operation would succeed, were the
filesystem writable. We do that here by ignoring certain write
operations if -E unshare_blocks is specified with -n. It is not perfect,
since the actual unshare operation could still fail (for example if
new extents need to consume additional blocks).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Google-Bug-Id: 64109868
Test: e2fsck -f -n -E unshare_blocks on deduplicated image
Change-Id: Ia50ceb7b3745fdf8766cff06c697818f07411635
From AOSP commit: 9e76dc0f65d8a8dec27f57b9020e81cbbbe12faf
Add an -E unshare_blocks flag for unsharing blocks that were created for
a filesystem with block sharing enabled. If the filesystem does not have
this feature enabled, the flag has no effect. If the filesystem does not
have enough free space, e2fsck will report an error.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Google-Bug-Id: 64109868
Test: f_unshare_blocks_no_space, f_unshare_blocks_ok
Change-Id: I8821353e9e6200c6c0c71dd22f4f43d796fc720c
From AOSP commit: 8ba190e3135d61501d3a694b6960c2fbee98e7a6
For a file system with a large number of hard links, e2fsck can
take a long time in pass 2 because inode numbers are looked up by
binary search. This implements a memory-for-speed trade-off (storing
2 bytes in memory for each inode to hold the inode counts).
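
The trade-off can be pictured with a minimal sketch. It is not the
icount code itself: the class names are hypothetical, and the sparse
variant is only assumed to be a sorted list searched by bisection, as
the text describes.

```python
from array import array
from bisect import bisect_left


class SparseCounts:
    """Sorted inode numbers with parallel counts; lookups use binary search."""

    def __init__(self):
        self.inodes = []  # sorted inode numbers
        self.counts = []  # counts, parallel to self.inodes

    def increment(self, ino):
        i = bisect_left(self.inodes, ino)
        if i < len(self.inodes) and self.inodes[i] == ino:
            self.counts[i] += 1
        else:
            self.inodes.insert(i, ino)
            self.counts.insert(i, 1)

    def fetch(self, ino):
        i = bisect_left(self.inodes, ino)
        if i < len(self.inodes) and self.inodes[i] == ino:
            return self.counts[i]
        return 0


class FullMapCounts:
    """One 16-bit counter per possible inode; lookups are a direct index."""

    def __init__(self, num_inodes):
        # Inode numbers start at 1, so slot 0 is simply left unused.
        self.counts = array("H", [0]) * (num_inodes + 1)

    def increment(self, ino):
        self.counts[ino] += 1

    def fetch(self, ino):
        return self.counts[ino]
```

The full map pays for every possible inode up front, while the sparse
form pays only for inodes that are actually recorded but needs a search
on every lookup.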
For a 40TB filesystem with 2.8bn inodes this map alone requires 5.7GB
of RAM. For this reason, we don't enable this optimization by
default. It can be enabled using either an extended option to e2fsck
or via a setting in e2fsck.conf.
Even when the fullmap optimization is enabled, we don't use this for
the icount structure in pass 1. This is because the CPU gain is
nearly nil for that pass and does not justify the increase in RAM.
(It could be that during pass 1, if more than 17% of possible inodes
have link_count>1 (466m inodes in the 40TB with 2.8bn possible inodes
case) then it becomes more memory efficient to use the full map
implementation. However, this is extremely unlikely given that most
file systems are heavily over-provisioned in terms of the number of
inodes in the system.)
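
The figures above can be cross-checked with a little arithmetic. This
is only a sketch using the rounded numbers quoted in the text; the
"implied bytes per sparse entry" is derived from those numbers and is
an assumption, not a value taken from the icount code.

```python
BYTES_PER_INODE_FULLMAP = 2      # from the text: 2 bytes per inode
TOTAL_INODES = 2_800_000_000     # "2.8bn inodes" (rounded)
BREAK_EVEN_INODES = 466_000_000  # "466m inodes" (rounded)

# Size of the full map for the whole file system.
fullmap_bytes = BYTES_PER_INODE_FULLMAP * TOTAL_INODES

# Fraction of inodes at which the sparse structure stops being smaller.
break_even_fraction = BREAK_EVEN_INODES / TOTAL_INODES

# Per-entry cost of the sparse structure implied by that break-even point.
implied_entry_bytes = fullmap_bytes / BREAK_EVEN_INODES

print(f"full map size: {fullmap_bytes / 1e9:.1f} GB")
print(f"break-even fraction: {break_even_fraction:.1%}")
print(f"implied bytes per sparse entry: {implied_entry_bytes:.1f}")
```

With these rounded inputs the full map comes out at about 5.6 GB,
slightly under the quoted 5.7GB, presumably because the inode count is
rounded down. The break-even fraction is about 16.6%, matching the 17%
in the text.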
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add an extended option, -E no_optimize_extents, as well as an
e2fsck.conf profile option, to disable extent tree optimization.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Teach e2fsck to (re)construct extent trees. This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert an ext3-style block-mapped file to an
extent file. The reconstruction is performed during pass 1E or 3A,
as detailed below.
For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines either
(1) that a whole level of extent tree will fit into a higher level of
the tree; (2) that the size of any level can be reduced by at least
one ETB block; or (3) the extent tree is unnecessarily deep. It will
not run at all if errors are found and the user declines to fix the
errors.
The option "-E bmap2extent" can be used to force e2fsck to convert all
block map files to extent trees, and to rebuild all extent files'
extent trees. After conversion, files larger than 12 blocks should be
defragmented to eliminate empty holes where a block lives.
The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
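
The first step, building the list of leaf extents with adjacent
extents collapsed, might look like the sketch below. The Extent type
and build_leaf_extents() are hypothetical names for illustration; the
sketch ignores the on-disk limit on extent length and does not model
freeing blocks or installing the new root.

```python
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    lblk: int    # first logical block covered
    pblk: int    # first physical block covered
    length: int  # number of blocks


def build_leaf_extents(block_map: Dict[int, int]) -> List[Extent]:
    """Turn a logical->physical block map into a list of merged extents."""
    extents: List[Extent] = []
    for lblk in sorted(block_map):
        pblk = block_map[lblk]
        if extents:
            last = extents[-1]
            # Merge only when both the logical and physical ranges continue.
            if (last.lblk + last.length == lblk
                    and last.pblk + last.length == pblk):
                extents[-1] = last._replace(length=last.length + 1)
                continue
        extents.append(Extent(lblk, pblk, 1))
    return extents
```

For example, a file whose blocks 0-2 map to 100-102 and whose blocks
5-6 map to 200-201 collapses to two extents.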
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before it is
needed. We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we wait
until the last inode table block of the last group has been read before
fetching the next batch.
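
The batching described above could be sketched as follows. This is an
illustration only: readahead_batches() and its tuple layout are
hypothetical, and the real code works on block group descriptors rather
than plain tuples.

```python
from typing import Iterable, Iterator, List, Tuple


def readahead_batches(
    groups: Iterable[Tuple[int, int, bool]], buffer_kb: int
) -> Iterator[List[int]]:
    """Group block groups into batches that fit the readahead buffer.

    Each item in `groups` is (group number, inode table size in KiB,
    whether the inode table needs reading at all).
    """
    batch: List[int] = []
    used_kb = 0
    for group, table_kb, needs_read in groups:
        if not needs_read:
            continue  # nothing to prefetch for this group
        if batch and used_kb + table_kb > buffer_kb:
            yield batch  # buffer is full: issue this batch first
            batch, used_kb = [], 0
        batch.append(group)
        used_kb += table_kb
    if batch:
        yield batch
```

Each yielded batch stands for one readahead request; the caller would
finish consuming one batch before asking for the next.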
pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1. We use the
"iterate a subset of a dblist" function to avoid copying the dblist. Directory
blocks are fetched incrementally as we walk through the directory
block list. In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it. Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.
In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs. Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost. (Single-issue USB mass storage disks seem to suffer badly.)
By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS). The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
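
As a worked example of the default size, the 2MiB figure follows from
the inode table geometry. The values of 8192 inodes per group and
256-byte inodes are assumptions here (typical for an ext4 file system
with 4KiB blocks); a file system made with other parameters will give
a different result.

```python
INODES_PER_GROUP = 8192  # assumed: typical for 4KiB-block ext4
INODE_SIZE = 256         # assumed: typical ext4 inode size in bytes

# Size of one block group's inode table.
inode_table_bytes = INODES_PER_GROUP * INODE_SIZE

# The same size expressed in KiB, the unit used by readahead_kb.
default_readahead_kb = inode_table_bytes // 1024

print(f"{inode_table_bytes} bytes = {default_readahead_kb} KiB")
```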
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide a mechanism for a user to switch fsck into '-y' mode if they
start an interactive session and then get tired of pressing 'y' in
response to numerous prompts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As recently discussed on linux-ext4@vger.kernel.org, add an option to
e2fsck to replay the journal only. That allows scripts, such as
pacemaker's 'Filesystem' RA, to first replay the journal and, if the
replay sets an error state, to check for that error
(dumpe2fs -h | grep "Filesystem state:") and refuse to mount if it
shows an error. It also allows automatic e2fsck scripts to replay the
journal first and then, on a second run that performs the real pass1 to
passX checks, to test the return code.
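
The state check a script would perform after the journal replay could
be sketched like this. It only parses text that has already been
captured from dumpe2fs -h; the function names are hypothetical and the
sample header lines in the usage note are illustrative, not real
output.

```python
import re


def filesystem_state(dumpe2fs_header: str) -> str:
    """Return the value of the 'Filesystem state:' line."""
    match = re.search(
        r"^Filesystem state:\s*(.+)$", dumpe2fs_header, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Filesystem state:' line found")
    return match.group(1).strip()


def safe_to_mount(dumpe2fs_header: str) -> bool:
    """Refuse to mount when the recorded state mentions an error."""
    return "error" not in filesystem_state(dumpe2fs_header)
```

Given a header containing "Filesystem state: clean with errors",
safe_to_mount() returns False and the script would refuse to mount.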
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In Pass 5, when we are checking block and inode bitmaps, we have a great
opportunity to discard free space and unused inodes on the device,
because the bitmaps have just been verified as valid. This commit takes
advantage of this opportunity and discards both all free space and
unused inodes.
I have added a new set of options, 'nodiscard' and 'discard'. When the
underlying device does not support discard, or discard ends with an
error, or when any kind of error occurs on the filesystem, no further
discard attempt will be made and e2fsck will behave as it would
with the nodiscard option provided.
In addition, when there is any not-yet-zeroed inode table and
discard zeroes data, the inode table is marked as zeroed.
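
Finding the ranges to discard from a verified bitmap amounts to
collecting runs of free entries. The sketch below shows that idea on a
plain Python list of booleans; free_extents() is a hypothetical helper
and no discard request is actually issued.

```python
def free_extents(bitmap):
    """Yield (start, length) for each run of free (False) entries."""
    start = None
    for i, in_use in enumerate(bitmap):
        if not in_use and start is None:
            start = i  # a free run begins here
        elif in_use and start is not None:
            yield (start, i - start)  # the run ended just before i
            start = None
    if start is not None:
        # The bitmap ended while still inside a free run.
        yield (start, len(bitmap) - start)
```

Each (start, length) pair corresponds to one range that could be
handed to the device as a discard.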
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
A user was surprised when -n -D caused the file system to be opened
read/write, and then outsmarted himself when e2fsck asked the question:
WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.
Do you really want to continue (y/n)?
This is partially our fault for not documenting the fact that -D
overrode -n and opened the file system read-write. But the bottom line
is that it is much safer if -n *always* opens the file system read-only,
so there can be no confusion. This means that we have to disable certain
combinations of options, such as "-n -c", "-n -l", "-n -L", and
"-n -D", but the utility of these combinations is pretty low, and
is more than offset by making e2fsck idiot-proof.
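
The rule can be expressed as a small check on the option list. This is
a sketch of the policy only, with a hypothetical helper name; it does
not reproduce e2fsck's argument parsing or its error messages.

```python
# Options named in the text as incompatible with -n.
READ_ONLY_CONFLICTS = ("-c", "-l", "-L", "-D")


def conflicting_options(args):
    """Return the options in `args` that cannot be combined with -n."""
    if "-n" not in args:
        return []
    return [opt for opt in READ_ONLY_CONFLICTS if opt in args]
```

A caller would reject the command line whenever the returned list is
not empty, so that -n never results in a read-write open.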
Addresses-Launchpad-Bug: #537483
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The e2fsprogs programs have historically just said that they operate
on ext2 and ext3 file systems in their man pages. Update them to say
that they also operate on ext4 file systems.
Addresses-Launchpad-bug: #381854
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Also added support for "e2fsck -E fragcheck" which issues a
comprehensive report of discontiguous file extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a negative progress argument is given to -C, initially suppress the
progress information. It can be enabled later by sending the e2fsck
process a SIGUSR1 signal.
Addresses-Launchpad-Bug: #203323
Addresses-Sourceforge-Bug: #1926023
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Document in the e2fsck man page that e2fsck finds duplicate filenames
only when the -D option is passed to e2fsck.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add an explanation of how e2fsck might decide to optimize a few
directories even without the -D option being specified.
Addresses-Debian-Bug: #441872
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The need for fixing byte-swapped filesystems is long-gone, and this is
getting in the way of cleaning up e2fsprogs's bitmaps code. So let's
get rid of it; modern kernels haven't been able to deal with a
byte-swapped filesystem in about 9 years.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
A user was confused about whether or not e2fsck -c performed a destructive
test on the filesystem, since it stated that -cc resulted in a non-destructive
read/write test. Clarify that -c does a read-only test.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
make sure we gracefully clean up and only exit at safe points.
For fsck, we pass the SIGINT/SIGTERM signal to the child processes,
so they can do their own cleanup.
a read/write test on the disk. Update the man pages to encourage
using the -c option, and to discourage running badblocks separately,
since users tend to forget to set the blocksize when running
badblocks.