If the file system is mounted read-only after a file system error has
been detected, the fact that an error occurred is written to the
journal. This is important because while the journal is getting
replayed, the error indication in the superblock may very well get
overwritten.
Unfortunately, the code to propagate the error indication from the
journal to superblock was broken because this was being done before
the old file system handle is thrown away and the file system is
re-opened to ensure that no stale data is in the file system handle.
As a result, the error indication in the superblock was never written
out.
To fix this, we need to move the check if the journal's error
indicator has been set after the file system has been freed and
re-open.
Reported-by: Ken Sumrall <ksumrall@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a file system was remounted read-only after a file system
corruption is detected, and then that file system is mounted and
unmounted by the kernel, the journal would have been recovered, but
the kernel currently leaves the s_errno field still set. This is
arguably a bug, since it has already propgated the non-zero s_errno
field to the file system superblock, where it will be retained until
e2fsck has been run.
However, e2fsck should handle this case for existing kernel by
checking the journal superblock's s_errno field even if journal
recovery is not required.
Without this commit, e2fsck would not notice anything wrong with the
file system, but a subsequent mount of the file system by the kernel
would mark the file system's superblock as needing checking (since the
journal's s_errno field would still be set), resulting an full e2fsck
run at the next reboot, which would find nothing wrong --- and then
when the file system was mounted, the whole cycle would repeat again.
I had seen reports of this in the past, but it wasn't until recently
that I realized exactly how this had come about, since normally e2fsck
would be run automatically before the file system is mounted again,
thus avoiding this problem. However, a user using a rescue CD who
didn't run e2fsck before mounting the a file system in this condition
could trigger this situation, and unfortunately, with previous
versions of e2fsprogs and the kernel, there would be no way out no
matter what the user tried to do.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
64-bit journal support was broken; we weren't using the high bits from
the journal descriptor blocks! We were also using "unsigned long" for
the journal block numbers, which would be a problem on 32-bit systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Multi-mount protection is feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The write_journal_inode() code is only setting the low 32-bit i_size
for the journal size, even though it is possible to specify a journal
up to 10M blocks in size. Trying to create a journal larger than 2GB
will succeed, but an immediate e2fsck would fail. Store i_size_high
for the journal inode when creating it, and load it upon access.
Use s_jnl_blocks[15] to store the journal i_size_high backup. This
field is currently unused, as EXT2_N_BLOCKS is 15, so it is using
s_jnl_blocks[0..14], and i_size is in s_jnl_blocks[16].
Rename the "size" argument "num_blocks" for the journal creation functions
to clarify this parameter is in units of filesystem blocks and not bytes.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This prevents accidentally replaying and resetting the journal while
it is mounted, due to an accidental attempt to run e2fsck on an LVM
snapshot of a file system with an external journal.
Addresses-Debian-Bug: #587531
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For a filesystem that fails with:
journal_bmap: journal block not found at offset 7334 on loop0
JBD: bad block at offset 7334
e2fsck won't actually fix this; it will mark the fs as clean,
so it will mount, but it does not fix that block, and when the
journal reaches this point again it will fail again.
The following simple change to process_journal_block() might be
a little drastic; it will clear & recreate the journal inode if
it's sparse - i.e. if it gets block 0.
I suppose we could be more complicated and try to replay the journal
up to the error, but I'm not sure it's worth it since we're fscking
it anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a regression in e2fsprogs 1.41.5 which would undo updates to the
block group descriptors after a journal replay, caused by commit
b7c5b403. We now use ext2fs_free() instead of ext2fs_close() to make
sure we the library will never try to write out superblock or block
group descriptors.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
E2fsck was using a fixed-size 8k buffer for replaying blocks from the
journal. So attempts to replay a journal on filesystems greater than
8k would cause e2fsck to crash with a segfault.
Thanks to Miao Xie <miaox@cn.fujitsu.com> for reporting this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch has all the necesary pieces to open and fix filesystems created
with the uninit block group feature.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If e2fsck adds or deletes any of the feature bitmasks, clear
EXT2_FLAG_MASTER_SB_ONLY so the backup superblocks are updated when
e2fsck finishes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The original code only checked the direct blocks to make sure the
journal inode was sane. Unfortunately, if some or all of the indirect
or doubly indirect blocks were corrupted, this would not be caught.
Thanks to Andreas Dilger and Kalpak Shah for noticing this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the journal inode was corrected from s_jnl_blocks, write the fixed
journal inode back to disk.
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
If the journal had been removed because it was corrupt, the
E2F_FLAG_JOURNAL_INODE flag will be set. If this flag is set, then
recreate the filesystem after checking the filesystem.
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
This patch changes ext2fs_open() to set EXT2_FLAG_MASTER_SB_ONLY by
default. This avoids some problems in e2fsck (reported by Jim Garlick)
where a corrupt journal can end up writing the bad superblock to the
backups. In general, only e2fsck (after the filesystem is clean),
tune2fs, and resize2fs should change the backup superblocks by default.
Most callers of ext2fs_open() should not be touching anything where the
backups should be touched. So let's change the defaults to avoid
potential problems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add missing brelse() calls to avoid memory leaks in error paths. (Thanks
to Michael C. Thompson for pointing these out; they were originally
found using Coverity.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Check to see if the superblock hint for the external journal needs to
be updated, and if so, offer to update it. (Addresses Debian Bug:
#355644)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Detect if the superblock's last mount field or last write field is in
the future, and offer to fix if so. (Addresses Debian Bug #327580)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
superblock. E2fsck will automatically save the journal information
in the superblock if it is not there already, and will use it if the
journal inode appears to be corrupted. ext2fs_add_journal_inode()
will also save the backup information, so that new filesystems
created by mke2fs and filesystems that have journals added via
tune2fs will also have journal location written to the superblock as
well. Debugfs's logdump command has been enhanced so that it can
use the journal information in the superblock.
The debugfs man page has been improved to more fully describe the
logdump command.
Added two new functions, ext2fs_file_open2() and
ext2fs_inode_io_intern2() which take a pointer to an inode structure;
this is needed so that e2fsck and debugfs can synthesize a
fake journal inode and use it to access the journal.