Fix journal_bmap() and sync_blockdev() to use the kernel error
convetions (e.g., -EIO instead of EIO) since they are called by
reovery.c, which is shared userspace / kernel code.
Without this, e2fsck might print an error message like this:
/sbin/e2fsck: Unknown code ____ 251 while recovering journal of /dev/mapper/thin-vol
instead of what it should have printed which was this:
/sbin/e2fsck: Input/output error while recovering journal of /dev/mapper/thin-vol
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e2fsck would exit with a status code of 1 even though the
only changes that it made to the file system were various
optimziations and not fixing file system corruption. Since the man
page states that an exit status of 1 means "file system errors
corrupted", fix e2fsck to return an exit status of 0.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When journal is released, s_sequence is set to j_tail_sequence.
But, currently, even if the recovery process is successfully completed,
the j_tail_sequence and, finally, s_sequence are never changed. By this,
when we repeat doing power-off the device suddenly and executing e2fsck
without full scan before mount, the s_sequence number will never change
and, in a very rare case, newly generated journal logs will be
surprisingly grafted to the old journal logs. In this case, out-of-date
metadata log can be replayed on the filesystem area and the filesystem
can be crashed unintentionally by journal recovery process. Therefore,
we need to update j_tail_sequence after recovery process is successfully
completed in e2fsck.
Youngjin had repeated this test and found the problem. With our test,
the filesystem crash occurred within 4 hours.
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the journal superblock is corrupt and the user declines to fix it
(or runs e2fsck -n), make sure the error messages are clear and
explain that e2fsck cannot (safely) proceed.
Addresses-Debian-Bug: #768162
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we recreate the journal, don't say that the FS "is now ext3
again", since we could be fixing a damaged ext4 FS journal, which does
not magically convert the FS back to ext3.
[ Use "journaled" instead of "journalled", and also fix the message we
print when deleting the journal --Ted ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The journal superblock's s_sequence field seems to track the tid of
the tail (oldest) transaction in the log. Therefore, when we release
the journal, set the s_sequence to the tail_sequence, because setting
it to the transaction_sequence means that we're setting the tid to
that of the head of the log. Granted, for replay these two are
usually the same (and s_start == 0 anyway) so thus far we've gotten
lucky and nobody noticed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're removing the internal journal (broken journal, turning it
off, or adding an external journal), zero s_jnl_blocks so that they
can't be picked up by accident later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Verify the (ext4) superblock checksum of an external journal device
and prompt to correct the checksum if nothing else is wrong with the
superblock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags. These errors regrettably lead to the journal
corruption reported by Mr. Reardon.
Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.
Add a few function helpers so we don't have to open-code quite so
many pieces.
Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Synchronize e2fsck's copy of recovery.c with the kernel's copy in
fs/jbd2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use EXT2_MIN_BLOCK_SIZE, JFS_MIN_JOURNAL_BLOCKS, SUPERBLOCK_SIZE, and
SUPERBLOCK_OFFSET instead of hardcoded 1024 when it is okay, and also
add a helper ext2fs_journal_sb_start() that will return start of
journal sb with special case for fs with 1k block size.
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After a journal replay, we close and reopen the file system so that
any changes in the superblock can get reflected in the libext2fs's
internal data structures. We need to save the flags passed to
ext2fs_open() that we used when we originally opened the file system.
Otherwise we will end up not be able to repair a file system which
requires a journal replay and which has bigalloc enabled or which has
more than 2**32 blocks; e2fsck will abort with the error message:
fsck.ext4: Filesystem too large to use legacy bitmaps while trying to re-open
Addresses-Debian-Bug: 744953
Cc: Андрей Василишин <a.vasilishin@kpi.ua>
Cc: Jon Severinsson <jon@severinsson.net>
Cc: 744953@bugs.debian.org
It can be made simpler because there is no need to differentiate between
having an internal journal inode and having an external journal device.
Signed-off-by: Benno Schulenberg <bensberg@justemail.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Compiling with LLVM generates a large number of warnings due
to the use of _() for wrapping strings for i18n:
warning: format string is not a string literal
(potentially insecure) [-Wformat-security]
./nls-enable.h:4:14: note: expanded from macro '_'
#define _(a) (gettext (a))
^~~~~~~~~~~~
These warnings are fixed by using "%s" as the format string,
and then _() is used as the string argument.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the external journal device has exactly 1 << 32 blocks,
journal->j_maxlen would get set to zero, which would cause e2fsck to
declare the journal to be invalid.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to store some error codes using an int to keep recovery.c as
close as possible to the recovery.c source file in the kernel.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix all the places where we should be using a blk64_t instead of a
blk_t. These fixes are more severe because 64bit values could be
truncated silently.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Perhaps the most serious fix up is a type-punning warning which could
result in miscompilation with overly enthusiastic compilers.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When opening the external journal, use the same logic to decide
whether or not to open the file system with EXT2_FLAG_EXCLUSIVE found
in main().
Otherwise, it's not posible to use e2fsck when the root file system is
using an external journal.
Reported-by: Calvin Owens <jcalvinowens@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Ensure that the journal superblock passes checksum before recovering the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the file system is mounted read-only after a file system error has
been detected, the fact that an error occurred is written to the
journal. This is important because while the journal is getting
replayed, the error indication in the superblock may very well get
overwritten.
Unfortunately, the code to propagate the error indication from the
journal to superblock was broken because this was being done before
the old file system handle is thrown away and the file system is
re-opened to ensure that no stale data is in the file system handle.
As a result, the error indication in the superblock was never written
out.
To fix this, we need to move the check if the journal's error
indicator has been set after the file system has been freed and
re-open.
Reported-by: Ken Sumrall <ksumrall@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a file system was remounted read-only after a file system
corruption is detected, and then that file system is mounted and
unmounted by the kernel, the journal would have been recovered, but
the kernel currently leaves the s_errno field still set. This is
arguably a bug, since it has already propgated the non-zero s_errno
field to the file system superblock, where it will be retained until
e2fsck has been run.
However, e2fsck should handle this case for existing kernel by
checking the journal superblock's s_errno field even if journal
recovery is not required.
Without this commit, e2fsck would not notice anything wrong with the
file system, but a subsequent mount of the file system by the kernel
would mark the file system's superblock as needing checking (since the
journal's s_errno field would still be set), resulting an full e2fsck
run at the next reboot, which would find nothing wrong --- and then
when the file system was mounted, the whole cycle would repeat again.
I had seen reports of this in the past, but it wasn't until recently
that I realized exactly how this had come about, since normally e2fsck
would be run automatically before the file system is mounted again,
thus avoiding this problem. However, a user using a rescue CD who
didn't run e2fsck before mounting the a file system in this condition
could trigger this situation, and unfortunately, with previous
versions of e2fsprogs and the kernel, there would be no way out no
matter what the user tried to do.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
64-bit journal support was broken; we weren't using the high bits from
the journal descriptor blocks! We were also using "unsigned long" for
the journal block numbers, which would be a problem on 32-bit systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Multi-mount protection is feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The write_journal_inode() code is only setting the low 32-bit i_size
for the journal size, even though it is possible to specify a journal
up to 10M blocks in size. Trying to create a journal larger than 2GB
will succeed, but an immediate e2fsck would fail. Store i_size_high
for the journal inode when creating it, and load it upon access.
Use s_jnl_blocks[15] to store the journal i_size_high backup. This
field is currently unused, as EXT2_N_BLOCKS is 15, so it is using
s_jnl_blocks[0..14], and i_size is in s_jnl_blocks[16].
Rename the "size" argument "num_blocks" for the journal creation functions
to clarify this parameter is in units of filesystem blocks and not bytes.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This prevents accidentally replaying and resetting the journal while
it is mounted, due to an accidental attempt to run e2fsck on an LVM
snapshot of a file system with an external journal.
Addresses-Debian-Bug: #587531
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For a filesystem that fails with:
journal_bmap: journal block not found at offset 7334 on loop0
JBD: bad block at offset 7334
e2fsck won't actually fix this; it will mark the fs as clean,
so it will mount, but it does not fix that block, and when the
journal reaches this point again it will fail again.
The following simple change to process_journal_block() might be
a little drastic; it will clear & recreate the journal inode if
it's sparse - i.e. if it gets block 0.
I suppose we could be more complicated and try to replay the journal
up to the error, but I'm not sure it's worth it since we're fscking
it anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a regression in e2fsprogs 1.41.5 which would undo updates to the
block group descriptors after a journal replay, caused by commit
b7c5b403. We now use ext2fs_free() instead of ext2fs_close() to make
sure we the library will never try to write out superblock or block
group descriptors.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
E2fsck was using a fixed-size 8k buffer for replaying blocks from the
journal. So attempts to replay a journal on filesystems greater than
8k would cause e2fsck to crash with a segfault.
Thanks to Miao Xie <miaox@cn.fujitsu.com> for reporting this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>