When we're turning on metadata checksumming /and/ resizing the inode
at the same time, disable checksum verification during the
resize_inode() call because the subroutines it calls will try to
verify the checksums (which have not yet been set), causing the
operation to fail unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The behavior of the r_fixup_lastbg_big test varies depending on
whether or not ext4.ko is loaded and supports lazy_itable_init. This
makes checking the bg flags after resize2fs hard to predict, so put in
a way to force resize2fs to zero the inode tables, and compare the
output based on lazy_itable_init == 0.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Strengthen the i_extra_isize checks to look for obviously too-small
values before trying to operate on inode EAs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_extra_isize is zero when we try to write extended attributes,
we'll end up writing the EA magic into the i_extra_isize field, which
causes a subsequent crash on big endian systems (when we try to write
0xEA02 bytes past the inode!). Therefore when the field is zero, set
i_extra_isize to the desired extra_isize size, zero those bytes, and
write the EAs after the end of the extended inode.
v2: Don't bother if we have 128b inodes, and ensure that the value
is 32b-aligned so that the EA magic starts on a 32b boundary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some simple tests to check that flex_bg and meta_bg filesystems
can be converted between 32 and 64bit layouts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk. Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
While it may be true that playing games with old_fs' block count
during a grow operation shuts up a bunch of warnings, resize2fs
doesn't actually expand the group descriptor array to match the size
we're artificially stuffing into old_fs, which means that if we
actually need to allocate a block out of the larger fs (i.e. we're in
desperation mode), ext2fs_block_alloc_stats2() scribbles on the heap,
leading to crashes if you're lucky and FS corruption if not.
So, rip that piece out and turn off com_err warnings properly and add
a test case to deal with growing a nearly full filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Recalculate the unused inode count and the block/inode uninit flags
when resizing a filesystem. This can speed up future e2fsck runs
considerably and will reduce mount times.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a directory block lacks space for a checksum and the user directs
e2fsck to fix the directory block (by rehashing it), don't complain a
second time about the checksum verification failure when we get to the
end of the directory block.
Also, don't complain about broken HTREE directories if we're already
planning to rebuild the HTREE directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't display unused inodes twice, and make it clear that we're
printing a descriptor checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're splitting an extent node, try to allocate the new interior
tree block just prior to the first extent in the block we're trying to
split. The previous logic only set a goal block if we had to split
both the current node and its parent, which is somewhat infrequent.
When that would happen, the goal would start at zero, leading to poor
locality.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Try to be a little smarter about where we go to allocate blocks for a
inode. For a given inode and logical offset, set the goal as if the
file were physically continuous. If it's bmapped, just start looking
at wherever lblk 0 is. If that's not possible (the file has no
lblk>pblk mappings, inline data, etc.) then start looking in the
inode's block group.
[ Fixed memory leak --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't say the physical block number is invalid if an extent block
fails only the checksum. It passes checks, so it's not invalid.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The file IO routines do not handle uninit blocks at all. The read
method should check for the uninit flag and return a buffer of zeroes,
and the write routine should convert unwritten extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Regression test for a problem inadvertently fixed by the patchset
"e2fsprogs/tune2fs: fix memory leak in inode_scan_and_fix()" by Xiaoguang Wang.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When looking for the start of the hugefile range, the 'next' variable
is incorrectly decremented. If we happened to find a single free
block, the effect of this decrement is that blk == next, which means
that we never modify the loop control variable, so get_start_block
never returns.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix various tests that break on non-Linux systems; in particular,
sed -i doesn't work the same on all platforms; and try to keep the
GNU getopt-isms out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The metadata_csum feature (really, the journal checksum disk format)
didn't stabilize until the 3.18 kernel, at which point the companion
journal_csum feature was turned on by default if metadata_csum was
enabled. Therefore, warn the user if they try to create such a
filesystem on a pre-3.18 kernel.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we get as far as calling libmagic, return the correct error code so
that mkfs asks for confirmation if libmagic finds something and
doesn't ask if nothing is found.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're using check_plausibility() to try to identify something that
obviously isn't an ext* filesystem and libblkid doesn't know what it
is, try libmagic instead.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If any of these utilities detect a bad superblock magic, call
check_plausibility to see if blkid can identify the passed-in argument
as something else (xfs, partition, etc.) in the hopes of catching a
user error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Spit out just the group descriptor data in a machine readable format.
This is most useful for testing and scripting purposes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we find a hole in a directory on a bigalloc filesystem, we need to
obey the cluster alignment rules when collapsing the gap to avoid
later complaints.
Specifically, the calculation of the new logical cluster number was
incorrect, and we need to ensure that the logical cluster alignment
respects the physical cluster alignment, since we've concluded that
the extent's logical block number is wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If in the course of iterating extents we find that an otherwise
valid-seeming second extent maps the same logical blocks as a
previously examined first extent, offer to clear the duplicate
mapping.
The test for this is already in f_extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When reading extended attributes, check e_value_offs to make sure that
it starts in the value area and not the name area. The attached test
case image will crash the kernel if it is mounted and you append more
than 4096 bytes of data to /a, due to insufficient validation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there isn't space in the root directory to add the lost+found
entry, try expanding the root directory before failing the fsck.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the badblocks list says that the badblocks inode is bad, it's quite
likely that badblocks is broken. Worse yet, if the root inode is in
the same block as the badblocks inode (likely since they're adjacent),
the filesystem becomes unfixable because pass3 notices the bad root
inode and exits.
So... if we encounter this case, just kill the badblocks inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Test e2fsck' ability to deal with (a) corrupt descriptor block
checksum; (b) obviously bad journal block tid; and (c) corrupt journal
blocks. These should exercise the journal recovery infinite loop
bugfix earlier in this patchset.
This test also ensures that (with metadata_csum and journal_csum_v3)
journal replay continues past a corrupt journal block.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a couple of tests to verify that writing to and recovering from
an external journal work properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Test e2fsck' recovery of commit blocks with (a) only a corrupt
checksum and (b) an obviously incorrect tid.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Test e2fsck' ability to deal with (a) revoke blocks with a bad
checksum and (b) revoke blocks with an obviously bad block number.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Test e2fsck' ability to deal with corrupt journal superblock checksum
and a bad magic.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add tests to ensure that we know how to recover journals with the
csum_v2 feature set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Test that we can write and replay transactions with the old journal
checksum algorithm.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Simple tests for the 32bit journal transaction creation code when
journal and metadata_csum are enabled. We test the following:
(a) writing and replaying transactions with multiple
descriptor blocks
(b) same, but with multiple revoke blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Simple tests for the 64bit journal transaction creation code when
journal and metadata_csum are enabled. We test writing (bad) block
bitmaps out through the journal and replaying them via fsck, with a
few twists:
(a) All bitmaps are committed (fs errors reported)
(b) All the bitmap blocks are revoked (no errors)
(c) The transaction is never committed (no errors)
(d) Same as (a), but debugfs gets to do the replay.
We also test:
(a) writing and replaying transactions with multiple
descriptor blocks
(b) same, but with multiple revoke blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Simple tests for the journal transaction creation code. We test
writing (bad) block bitmaps out through the journal and replaying them
via fsck, with a few twists:
(a) All bitmaps are committed (fs errors reported)
(b) All the bitmap blocks are revoked (no errors)
(c) The transaction is never committed (no errors)
(d) Same as (a), but debugfs gets to do the replay.
We also test:
(a) writing and replaying transactions with multiple
descriptor blocks
(b) same, but with multiple revoke blocks.
(c) adding the 64bit flag to a journal
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Display the feature flags of an external journal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Spit out a more specific error if someone tries to modify an
external journal device.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Verify the (ext4) superblock checksum of an external journal device
and prompt to correct the checksum if nothing else is wrong with the
superblock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Enable mke2fs to create an external journal device with a superblock
checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When creating a journal inode, check the return value from
block_iterate3() because otherwise we fail to capture errors such as
being unable to allocate an extent tree block, which leads to e2fsck
creating broken journals.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Also add the convenience macro $CLEAN_OUTPUT in test_config which can
be used to run the "sed -e $cmd_dir/filter.sed" command to clean up
e2fsprogs command output before comparing with the expected golden
output.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add to ext2fs_symlink the ability to create inline data symlinks.
[ Modified by tytso to add more logging to the test script ]
Suggested-by: Pu Hou <houpu.hp@alibaba-inc.com>
Cc: Pu Hou <houpu.hp@alibaba-inc.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The following tests were using md5sum: i_e2image, u_mke2fs, and
u_tune2fs. Convert them to use crcsum for better portability (not all
environments have md5sum; some might have sha1sum instead :-)
For our purposes crcsum is quite sufficient.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext2fs_flush2() unconditionally writes the block group descriptors to
disk even if the underlying FS isn't marked dirty. This causes the
following error message on a fsck -n run:
e2fsck 1.43-WIP (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing file system info: Attempt to write block to filesystem resulted in short write
Since ext2fs_close2() only calls flush if the dirty flag is set,
modify e2fsck to exhibit the same behavior so that we don't spit out
write errors for a read only check.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a regression test to ensure that previous patches' fixes to e2fsck
do not revert.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we come across an inode with the inline data and extents inode flag
set, try to figure out the correct flag settings from the contents of
i_block and i_size. If i_blocks looks like an extent tree head, we'll
make it an extent inode; if it's small enough for inline data, set it
to that. This leaves the weird gray area where there's no extent
tree but it's too big for the inode -- if /could/ be a block map,
change it to that; otherwise, just clear the inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fifo, socket, and device inodes cannot have inline data or
extents, strip off these flags if we find them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ensure that the various blobs in the in-inode EA region do not overlap.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix up the rest of the inline data code not to complain if there's no
EA, since it's possible that there's no EA because we're in the
process of creating an inline data file. Also, don't return an error
code when removing a nonexistent EA, because there's no reason to.
Furthermore, if we write less than 60 bytes of inline data, remove the
EA to avoid wasting space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In pass 3, convert the "delete files and re-run e2fsck" message to a
proper error code for more consistent error reporting and to make
translation easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This test checks to make sure resize2fs can properly handle a file
system which started life as a normal ext4 file system and then was
grown to a size where meta_bg was enabled, and then shrunk back below
the point where the meta_bg format is still needed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The test verifies that e2fsck can properly fix a file system where the
value of s_first_meta_bg in the superblock is larger than the number
of block group descriptors in the file system. E2fsck will fix this
by clearing the meta_bg feature.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Place the allocation bitmaps and inode table blocks so they are
adjacent, even in the last flexbg.
Previously, after running "mke2fs -t ext4 DEV 286720", the layout of
the last few block groups would look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262161 (+16)
Inode table at 262177-262432 (+32)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262162 (bg #32 + 17)
Inode table at 262433-262688 (bg #32 + 288)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262163 (bg #32 + 18)
Inode table at 262689-262944 (bg #32 + 544)
Now, they look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262148 (+3)
Inode table at 262151-262406 (+6)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262149 (bg #32 + 4)
Inode table at 262407-262662 (bg #32 + 262)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262150 (bg #32 + 5)
Inode table at 262663-262918 (bg #32 + 518)
This reduces the free space fragmentation in a freshly created file
system. It also allows the following mke2fs command to succeed:
mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483
(Note that while this allows people to run mke2fs with insanely large
flexbg sizes, this is not a recommended practice, as the kernel may
refuse to resize such a file system while mounted, since it currently
tries to allocate an in-memory data structure based on the size of the
flexbg, and so a file system with a very large flexbg size will cause
the memory allocation to fail. This will hopefully be fixed in a
future kernel release, but if the goal is to force all of the metadata
blocks to be at the beginning of the file system, it's better to use
the packed_meta_blocks configuration parameter in mke2fs.conf.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to e2fsck to examine how it deals with inode
table blocks which (a) have been zero'd; (b) have been one'd; (c) have
corrupt inodes with obvious problems; and (d) have inodes with
non-obvious problems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add tests to examine how e2fsck deals with (a) the block bitmap being
corrupt; (b) the inode bitmap being corrupt; (c) the bitmap checksums
being incorrect (but the bitmaps are fine); and (d) the group
descriptor checksum itself is incorrect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to examine how e2fsck deals with random
superblock corruption such as obviously wrong fields and the checksum
itself being incorrect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to examine how e2fsck deals with MMP blocks with
(a) a bad magic number; and (b) an incorrect checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some regression tests to examine how e2fsck handles directory
entry blocks and htree blocks with (a) malformed directory entries;
(b) incorrect checksums; or (c) obviously garbage entries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some regression tests to examine how e2fsck deals with (a) extent
blocks with only a bad checksum; (b) extent blocks with a bad magic
number; and (c) extent entries with corruption.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests for e2fsck dealing with (a) EA block with a bad
checksum; (b) EA block with a bad magic number; and (c) EA block with
damage that isn't otherwise noticeable.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an inode fails checksum verification during pass 1 and the user
doesn't fix or clear the inode as part of the regular inode checks,
ensure that e2fsck remembers to ask the user if he simply wants to
correct the checksum.
We weren't capturing all the ways out of an interation of the inode
scanning loop, which means that not all errors were caught. Also,
we might as well clear the 'failed csum' flag if we write the inode
directly from the inode scanning loop.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an inode fails checksum verification, don't stuff a copy of it in
the inode cache, because this can cause the library to fail to return
the "corrupt inode" error code.
In general, this happens if ext2fs_read_inode_full() is called twice
on an inode with an incorrect checksum. If fs->flags has
EXT2_FLAG_IGNORE_CSUM_ERRORS set during the first call and *unset*
during the second call, the cache hit during the second call fails to
return EXT2_ET_INODE_CSUM_INVALID as you'd expect. This happens
during fsck because the first read_inode call happens as part of
check_blocks and the second call happens during inode checksum
revalidation. A file system with a slightly corrupt non-extent inode
will trigger this.
While we're at it, make the inode read function consistent with the
rest of libext2fs -- copy the metadata object into the caller's buffer
even if it fails checksum verification. This will help e2fsck avoid a
double re-read later on down the line.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're totally unable to allocate a lost+found directory, ask the
user if he would like to dump orphaned files in the root directory.
Hopefully this enables the user to delete enough files so that a
subsequent run of e2fsck will make more progress. Better to cram lost
files in the rootdir than the current behavior, which is to fail at
linking them in, thereby leaving them as lost files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The f_badcluster output format depends on how libreadline formats
and outputs the commands read from stdin. Instead of trying to
handle these differences, use an input command file, which does
not depend on external components to be consistent.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This should have been part of commit 9a1d614df2 ("e2fsck: fix
rule-violating lblk->pblk mappings on bigalloc filesystems") but it
accidentally got dropped when the patch was applied.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix the routine that adds dirent checksum structures to the directory
block to handle oddball situations a bit more robustly.
First, when we're walking the entry array, we might encounter an
entry that ends exactly one byte before where the checksum entry needs
to start, i.e. there's space for the tail entry, but it needs to be
reinitialized. When that happens, we should proceed until d points to
that space so that the tail entry can be initialized.
Second, it's possible that we've been fed a directory block where the
entries end just short of the end of the block. In this case, we need
to adjust the size of the last entry to point exactly to where the
dirent tail starts. The current code requires that entries end
exactly on the block boundary, but this is not always the case with
damaged filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we trash the root directory block, e2fsck will find inode 11 (the
old lost+found) and try to attach it to l+f. The lost+found checker
also fails to find l+f and tries to add one to the root dir. The root
dir is not found but is recreated with incorrect checksums, so linking
in the l+f dir fails and the l+f '..' entry isn't set. Since both
dirs now fail checksum verification, they're both referred to rehash
to have that fixed, but because l+f doesn't have a '..' entry, rehash
crashes because l+f has < 2 entries.
On a checksumming filesystem, the routines in e2fsck that recreate
/lost+found and / must write the new directory block *after* the inode
has been written to disk because the checksum depends on i_generation.
Add a regression test while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.
If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found. If either
of those directories are missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.
(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)
[ Fixed up an obvious C trap: const char * and const char [] are not
the same thing when you are taking the size of the parameter.
People, run your regression tests! Like spinach, it's good for you. :-)
-- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is the rest of
these sorts of functions live, and renamed it to be
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in in blk_num.c -- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We rely on a nasty hack to adjust the free block count where we pass
signed value into ext2fs_free_blocks_count_add(), which takes an
64-bit unsigned value, and relies on overflow and C's signed->unsigned
semantics to do the subtraction. This works, so long as a 64-bit
signed value is used.
Unfortunately, ext2fs_block_alloc_stats2() and
ext2fs_block_alloc_stats_range(), this is not true, so on a 64-bit
file system, the free blocks accounting can get screwed up.
A simple way to demonstrate the problem is:
mke2fs -F -t ext4 -O 64bit /tmp/foo.img 1M
e2fsck -fy /tmp/foo.img
... which will result in the following e2fsck complaint:
Pass 5: Checking group summary information
Free blocks count wrong (4294968278, counted=982).
Fix? yes
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Port tune2fs' -e flag to mke2fs so that we can set error behavior at
format time, and introduce the equivalent errors= setting into
mke2fs.conf.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation. The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block. The code that
handles extent based directories also does not add toobig blocks to
the dblist.
Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Always iterate logical block 0 in a directory, even if no physical
block has been allocated. Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.
Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2. This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent. This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range. Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.
Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks. This is needed to terminate the
loop because the broken test meant that b could get large enough to
equal finish, which would end the while loop.
The attached testcase shows that with the off by one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter an inode with IND/DIND/TIND blocks or internal extent
tree blocks that point into critical FS metadata such as the
superblock, the group descriptors, the bitmaps, or the inode table,
it's quite possible that the validation code for those blocks is not
going to like what it finds, and it'll ask to try to fix the block.
Unfortunately, this happens before duplicate block processing (pass
1b), which means that we can end up doing stupid things like writing
extent blocks into the inode table, which multiplies e2fsck'
destructive effect and can render a filesystem unfixable.
To solve this, create a bitmap of all the critical FS metadata. If
before pass1b runs (basically check_blocks) we find a metadata block
that points into these critical regions, continue processing that
block, but avoid making any modifications, because we could be
misinterpreting inodes as block maps. Pass 1b will find the
multiply-owned blocks and fix that situation, which means that we can
then restart e2fsck from the beginning and actually fix whatever
problems we find.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're dumping a fast symlink inode, we print some odd things to
stdout. To clean this up, first don't print inline data EA, since the
inode dump doesn't display file and directory contents. Then, teach
the inode dump function how to print out either an inline data fast
symlink or a non-inline data fast symlink.
(This is a follow-up to the earlier patch "debugfs: Only print the
first 60 bytes from i_block on a fast symlink")
[ Modified by tytso so that the d_inline_dump test works when build
directory is different from the source directory --- i.e., when
doing a VPATH build. ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>