The f_badcluster output format depends on how libreadline formats
and outputs the commands read from stdin. Instead of trying to
handle these differences, use an input command file, so the test
output does not depend on the behavior of external components.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Quiet warnings about signed vs. unsigned character mismatch.
Use __u8 for storing UUIDs instead of char to match the superblock
s_uuid field.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This bug was introduced by commit 7dfefaf413 ("tune2fs: update
journal super block when changing UUID for fs").
Fixes-Coverity-Bug: 1229243
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using the -U option you can change the UUID of a file system, but this
does not work for an external journal device, since it has a copy of this
UUID inside the jsb (i.e., the journal superblock). So copy the new UUID
into that block when it is changed.
Here is the initial thread:
http://comments.gmane.org/gmane.comp.file-systems.ext4/44532
You can reproduce this by executing the following commands:
$ fallocate -l100M /tmp/dev
$ fallocate -l100M /tmp/journal
$ sudo /sbin/losetup /dev/loop1 /tmp/dev
$ sudo /sbin/losetup /dev/loop0 /tmp/journal
$ mke2fs -O journal_dev /tmp/journal
$ tune2fs -U da1f2ed0-60f6-aaaa-92fd-738701418523 /tmp/journal
$ sudo mke2fs -t ext4 -J device=/dev/loop0 /dev/loop1
$ dumpe2fs -h /tmp/dev | fgrep UUID
dumpe2fs 1.43-WIP (18-May-2014)
Filesystem UUID: 8a776be9-12eb-411f-8e88-b873575ecfb6
Journal UUID: e3d02151-e776-4865-af25-aecb7291e8e5
$ sudo e2fsck /dev/loop1
e2fsck 1.43-WIP (18-May-2014)
External journal does not support this filesystem
/dev/loop1: ********** WARNING: Filesystem still has errors **********
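For illustration, a minimal sketch of the idea behind the fix (this is
not the exact tune2fs code; the buffer handling, the hypothetical helper
name, and the write-back step are assumptions):

    /* propagate the (new) superblock UUID into the journal superblock
     * image that was read into buf; the caller then writes buf back */
    static void update_jsb_uuid(ext2_filsys fs, char *buf)
    {
            journal_superblock_t *jsb = (journal_superblock_t *) buf;

            memcpy(jsb->s_uuid, fs->super->s_uuid, sizeof(jsb->s_uuid));
    }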
Reported-by: Chin Tzung Cheng <chintzung@gmail.com>
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use EXT2_MIN_BLOCK_SIZE, JFS_MIN_JOURNAL_BLOCKS, SUPERBLOCK_SIZE, and
SUPERBLOCK_OFFSET instead of a hardcoded 1024 where it is okay, and also
add a helper, ext2fs_journal_sb_start(), that returns the start of the
journal superblock, with a special case for file systems with a 1k
block size.
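The reasoning behind the special case, as a sketch (the real helper
lives in the library and may differ in details): with a 1k block size
the ext2 superblock occupies block 1, so the journal superblock cannot
start before block 2; for larger block sizes it starts at block 1.

    /* sketch only -- see the library for the real helper */
    static blk64_t journal_sb_start(int blocksize)
    {
            if (blocksize == EXT2_MIN_BLOCK_SIZE)   /* 1024 */
                    return 2;
            return 1;
    }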
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This should have been part of commit 9a1d614df2 ("e2fsck: fix
rule-violating lblk->pblk mappings on bigalloc filesystems") but it
accidentally got dropped when the patch was applied.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ioctl(FIGETBSZ) was used to get the block size earlier, but 2508eaa7
(filefrag: improvements to filefrag FIEMAP handling) moved to the
fstatfs f_bsize, which doesn't work well for many file systems.
The block size returned by fstatfs isn't the filesystem block size but
the "optimal transfer block size", as per the man page. Even stat's
st_blksize is only the "preferred I/O block size", and on many file
systems it may even vary from file to file (POSIX). This patch changes
filefrag to use FIGETBSZ preferentially over f_bsize.
[ Modified by tytso to add the fallback to f_bsize if FIGETBSZ fails
for some reason ]
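A minimal sketch of the resulting preference order (the function name
and error handling here are illustrative, not filefrag's actual code):

    #include <sys/ioctl.h>
    #include <sys/vfs.h>
    #include <linux/fs.h>           /* FIGETBSZ */

    static int get_blocksize(int fd)
    {
            struct statfs fsinfo;
            int blksize;

            if (ioctl(fd, FIGETBSZ, &blksize) == 0)
                    return blksize;                /* real fs block size */
            if (fstatfs(fd, &fsinfo) == 0)
                    return (int) fsinfo.f_bsize;   /* "optimal transfer block size" */
            return -1;
    }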
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
29758d2 broke the -B option, which is useful for filesystems that do
not support FIEMAP. Also, fix the extent calculation for -B, which has
been broken since 2508eaa7.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're forced to delete a crosslinked file, only call
ext2fs_block_alloc_stats2() on cluster boundaries, since the block
bitmaps are all cluster bitmaps at this point. It's safe to do this
only once per cluster since we know all the blocks are going away.
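A sketch of that rule (the helper name is hypothetical, and
*last_cluster is assumed to carry state across the block iteration):

    /* decrement the bitmap only when we cross into a new cluster */
    static void release_block(ext2_filsys fs, blk64_t blk,
                              blk64_t *last_cluster)
    {
            blk64_t cur_cluster = EXT2FS_B2C(fs, blk);

            if (cur_cluster != *last_cluster) {
                    ext2fs_block_alloc_stats2(fs, blk, -1);
                    *last_cluster = cur_cluster;
            }
    }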
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As far as I can tell, logical block mappings on a bigalloc filesystem are
supposed to follow a few constraints:
* The logical cluster offset must match the physical cluster offset.
* A logical cluster may not map to multiple physical clusters.
Since the multiply-claimed block recovery code can be used to fix these
problems, teach e2fsck to find these transgressions and fix them.
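Expressed as a sketch using the cluster macros from ext2fs.h (this is
not e2fsck's actual check; lblk is the logical block and pblk the
physical block it maps to):

    /* rule 1: the offset within the cluster must match */
    static int cluster_offset_ok(ext2_filsys fs, blk64_t lblk, blk64_t pblk)
    {
            blk64_t ratio = EXT2FS_CLUSTER_RATIO(fs);

            return (lblk % ratio) == (pblk % ratio);
    }

    /* rule 2 then says that EXT2FS_B2C(fs, pblk) must be the same for
     * every lblk that falls within one logical cluster */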
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're filling a directory hole, we need to perform an implied
cluster allocation to satisfy the bigalloc rule of mapping only one
pblk to a logical cluster.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.
If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found. If either
of those directories is missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.
(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)
[ Fixed up an obvious C trap: const char * and const char [] are not
the same thing when you are taking the size of the parameter.
People, run your regression tests! Like spinach, it's good for you. :-)
-- tytso ]
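For reference, the trap in isolation: a parameter declared as an array
decays to a pointer, so sizeof() no longer measures the buffer.

    static size_t wrong(const char name[16])
    {
            return sizeof(name);    /* size of a pointer (4 or 8), not 16 */
    }

    static size_t right(void)
    {
            const char name[16] = "lost+found";

            return sizeof(name);    /* 16: here name really is an array */
    }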
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is where the rest
of these sorts of functions live, and renamed it to
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in blk_num.c -- tytso ]
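A sketch of how a caller might use the new helper (the surrounding
inode read/write and error handling are assumptions on my part, not
part of the patch):

    static errcode_t set_size(ext2_filsys fs, ext2_ino_t ino,
                              ext2_off64_t new_size)
    {
            struct ext2_inode inode;
            errcode_t err;

            err = ext2fs_read_inode(fs, ino, &inode);
            if (!err)
                    err = ext2fs_inode_size_set(fs, &inode, new_size);
            if (!err)
                    err = ext2fs_write_inode(fs, ino, &inode);
            return err;
    }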
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We rely on a nasty hack to adjust the free block count, where we pass
a signed value into ext2fs_free_blocks_count_add(), which takes a
64-bit unsigned value, and rely on overflow and C's signed->unsigned
semantics to do the subtraction. This works, so long as a 64-bit
signed value is used.
Unfortunately, in ext2fs_block_alloc_stats2() and
ext2fs_block_alloc_stats_range() this is not true, so on a 64-bit
file system, the free blocks accounting can get screwed up.
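For illustration (not the actual library code), the trap looks like
this:

    void demo(ext2_filsys fs)
    {
            blk64_t num64 = 10;
            blk_t   num32 = 10;     /* 32-bit unsigned */

            /* ok: -num64 wraps to 2^64 - 10, so adding it subtracts 10 */
            ext2fs_free_blocks_count_add(fs->super, -num64);

            /* broken: -num32 is the 32-bit value 0xfffffff6, which is
             * zero-extended to 64 bits and adds roughly 4 billion
             * instead of subtracting 10 */
            ext2fs_free_blocks_count_add(fs->super, -num32);
    }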
A simple way to demonstrate the problem is:
mke2fs -F -t ext4 -O 64bit /tmp/foo.img 1M
e2fsck -fy /tmp/foo.img
... which will result in the following e2fsck complaint:
Pass 5: Checking group summary information
Free blocks count wrong (4294968278, counted=982).
Fix? yes
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There are a number of places where we need to convert groups to blocks
or clusters by multiplying the group count by blocks/clusters per
group. Unfortunately, both quantities are 32-bit, but the result needs
to be 64-bit, and very often the cast to 64-bit gets lost.
Fix this by adding new macros, EXT2_GROUPS_TO_BLOCKS() and
EXT2_GROUPS_TO_CLUSTERS().
This should fix a bug where resizing a 64bit file system can result in
calculate_minimum_resize_size() looping forever.
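The essence of the new macros is simply to perform the cast before the
multiplication; see ext2_fs.h for the exact definitions, but they boil
down to something like:

    #define EXT2_GROUPS_TO_BLOCKS(s, g)   ((blk64_t) EXT2_BLOCKS_PER_GROUP(s) * (g))
    #define EXT2_GROUPS_TO_CLUSTERS(s, g) ((blk64_t) EXT2_CLUSTERS_PER_GROUP(s) * (g))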
Addresses-Launchpad-Bug: #1321958
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using C99 initializers makes the code a bit more readable, and it
avoids some gcc -Wall warnings regarding missing initializers.
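A generic illustration of the style (not one of the structures touched
by this patch):

    struct fs_opts {
            const char *name;
            int         flags;
            int         verbose;
    };

    /* positional: omitting trailing members draws a missing-initializer
     * warning from gcc */
    struct fs_opts a = { "ext4", 0 };

    /* C99 designated: more readable, and unnamed members are implicitly
     * zeroed without complaint */
    struct fs_opts b = { .name = "ext4", .verbose = 1 };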
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When resizing an empty 21T file system to 28T, resize2fs was using
this much CPU time and memory:
216.98user 19.77system 4:02.92elapsed 97%CPU (0avgtext+0avgdata 4485664maxresident)k
8inputs+1068680outputs (0major+800745minor)pagefaults 0swaps
After this one-line change:
222.29user 0.49system 3:48.79elapsed 97%CPU (0avgtext+0avgdata 30080maxresident)k
8inputs+1068552outputs (0major+2497minor)pagefaults 0swaps
So this reduces the max memory utilized from 4.2GB to 29MB!
For future work, the two primary places where we are spending the most
CPU time (from resize2fs -d 16) are:
blocks_to_move: Memory used: 2508k/25096k (1903k/606k), time: 91.42/91.53/ 0.00
and
calculate_summary_stats: Memory used: 2508k/25612k (1908k/601k), time: 95.33/95.45/ 0.00
The calculate_summary_stats pass can be sped up by using
ext2fs_find_first_{zero,set}_block_bitmap2(), instead of iterating
over the entire block bitmap one bit at a time.
The blocks_to_move pass can be sped up by using a bitmap to store the
location of fs metadata blocks, to avoid an O(N**2) algorithm where N
is the number of groups in the file system.
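A sketch of the suggested calculate_summary_stats speedup (not the
current resize2fs code, and the function name is illustrative): skip
directly from one used block to the next instead of testing every bit.

    static void count_used_blocks(ext2_filsys fs)
    {
            blk64_t blk = fs->super->s_first_data_block;
            blk64_t end = ext2fs_blocks_count(fs->super) - 1;
            blk64_t next;

            while (blk <= end &&
                   !ext2fs_find_first_set_block_bitmap2(fs->block_map, blk,
                                                        end, &next)) {
                    /* ... account for the used block "next" ... */
                    blk = next + 1;
            }
    }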
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The bits between end and real_end are set as a safety measure for the
kernel when it uses the bit scan instructions. We need to take this
into account when shrinking or growing the block allocation bitmap,
before we can safely use rbtree bitmaps in resize2fs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a few warnings about unused and uninitialized variables.
Also fix util/subst.c to include <sys/time.h> to avoid using
undeclared functions gettimeofday() and futimes().
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation. The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block. The code that
handles extent based directories also does not add toobig blocks to
the dblist.
Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Always iterate logical block 0 in a directory, even if no physical
block has been allocated. Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.
Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2. This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent. This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a user crafts a carefully constructed filesystem containing a
single directory entry block with an invalid checksum and fewer than
two entries, and then runs e2fsck to fix the filesystem, fsck will
crash when it tries to "compress" the short dir and passes a negative
dirent array length to qsort. Therefore, don't allow directory
"compression" in this situation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The third argument to strncat is the maximum number of characters to
copy out of the second argument; it is not the maximum length of the
first argument.
Therefore, code in a check just in case we ever find a /sys/block/X
path long enough to hit the end of the buffer. FWIW the longest path
I could find on my machine was 133 bytes.
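For reference, the general pattern for a bounded strncat (illustration
only; "path" and "devname" are placeholders): the limit must describe
the space left in the destination, not the length of the source.

    char path[PATH_MAX];        /* needs <limits.h> */

    strcpy(path, "/sys/block/");
    strncat(path, devname, sizeof(path) - strlen(path) - 1);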
Fixes-Coverity-Bug: 1252003
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range. Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.
Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks. This shortcut is needed to
terminate the loop: with the broken test, b could get large enough to
equal finish, which would end the while loop, but with the corrected
bound b wraps around earlier and might never reach finish.
The attached testcase shows that with the off by one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.
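The arithmetic behind the fix, in isolation (a sketch, not the full
patch): a range of num blocks starting at b ends at block b + num - 1,
and valid block numbers run from s_first_data_block to
ext2fs_blocks_count() - 1, so the wrap-around test must be:

    if (b + num - 1 >= ext2fs_blocks_count(fs->super))
            b = fs->super->s_first_data_block;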
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count. Fix the bounds
check to reflect that.
This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of e2fsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we call ext2fs_close_free at the end of main(), we need to supply
the address of ctx->fs, because the subsequent e2fsck_free_context
call will try to access ctx->fs (which is now set to a freed block) to
see if it should free the directory block list. This is clearly not
desirable, so fix the problem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're about to iterate the blocks of a block-map file, we need to
write the inode out to disk if it's dirty because block_iterate3()
will re-read the inode from disk. (In practice this won't happen
because nothing dirties block-mapped inodes before the iterate call,
but we can program defensively).
More importantly, we need to re-read the inode after the iterate()
operation because it's possible that mappings were changed (or erased)
during the iteration. If we then dirty or clear the inode, we'll
mistakenly write the old inode values back out to disk!
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the bitmaps are known to be unreadable, don't bother clearing them;
just mark fsck to restart itself after pass 5, by which time the
bitmaps should be fixed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck knows the bitmaps are bad at the exit (probably because they
were bad at the start and have not been fixed), don't offer to
recreate the journal because doing so causes e2fsck to abort a second
time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there's a problem with the inode scan during pass 1b, report the
inode that we were trying to examine when the error happened, not the
inode that just went through the checker.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Allow set_inode_field's bmap command in debugfs to allocate blocks,
which enables us to allocate blocks for indirect blocks and internal
extent tree blocks. True, we could do this manually, but it seems like
unnecessary bookkeeping activity for humans.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a command that will dump an entire inode's space in hex.
[ Modified by tytso to add a description to the man page, and to add
the more formal command name, inode_dump, in addition to short
command name of "idump". ]
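A usage example (the image name is just a placeholder), dumping the
root inode of a test image non-interactively:

$ debugfs -R "inode_dump <2>" /tmp/foo.img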
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, e4defrag avoids increasing file fragmentation by comparing
the number of runs of physical extents of both the original and the
donor files. Unfortunately, there is a bug in the routine that counts
physical extents, since it doesn't look at the logical block offsets
of the extents. Therefore, a file whose blocks were allocated in
reverse order will be seen as only having one big physical extent, and
therefore will not be defragmented.
Fix the counting routine to consider logical extent offset so that we
defragment backwards-allocated files. This could be problematic if we
ever gain the ability to lay out logically sparse extents in a
physically contiguous manner, but presumably one wouldn't call defrag
on such a file.
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It's only necessary to build tst_libext2fs when running "make check".
Also make sure the links of the tst_* programs are done with
$(ALL_LDFLAGS).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Most systems have a backwards compatibility symlink in
/usr/include/syscall.h to /usr/include/sys/syscall.h, but
sys/syscall.h is the documented location of the header file. Fix two
locations where we were using <syscall.h> instead of <sys/syscall.h>.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There were other protections which would prevent a buffer overflow
from happening, but we should fix this nevertheless.
Addresses-Coverity-Bug: #1225003
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The single quote character must not be the first character on a line,
or else it can get mistaken for a macro call.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As an object lesson in why autoreconf is fundamentally unsafe, the
newer version of nls.m4 no longer handles @MKINSTALLDIRS@. So add
this back, since our Makefiles depend on it.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>