If we find a hole in a directory on a bigalloc filesystem, we need to
obey the cluster alignment rules when collapsing the gap to avoid
later complaints.
Specifically, the calculation of the new logical cluster number was
incorrect, and we need to ensure that the logical cluster alignment
respects the physical cluster alignment, since we've concluded that
the extent's logical block number is wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If in the course of iterating extents we find that an otherwise
valid-seeming second extent maps the same logical blocks as a
previously examined first extent, offer to clear the duplicate
mapping.
The test for this is already in f_extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the badblocks list says that the badblocks inode is bad, it's quite
likely that badblocks is broken. Worse yet, if the root inode is in
the same block as the badblocks inode (likely since they're adjacent),
the filesystem becomes unfixable because pass3 notices the bad root
inode and exits.
So... if we encounter this case, just kill the badblocks inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a file is marked inline_data but its i_size isn't a multiple of
four, it probably isn't an inline directory, because directory entries
have sizes that are multiples of four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter a directory whose i_size != the inline data size, just
set i_size to the size of the inline data. The pb.last_block
calculation is wrong since pb.last_block == -1, which results in
i_size being set to zero, which corrupts the directory.
Clear the inline_data inode flag if we actually /are/ setting i_size
to zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we come across an inode with the inline data and extents inode flag
set, try to figure out the correct flag settings from the contents of
i_block and i_size. If i_blocks looks like an extent tree head, we'll
make it an extent inode; if it's small enough for inline data, set it
to that. This leaves the weird gray area where there's no extent
tree but it's too big for the inode -- if /could/ be a block map,
change it to that; otherwise, just clear the inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fifo, socket, and device inodes cannot have inline data or
extents, strip off these flags if we find them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since the inline data flag will cause the extent/block map iteration
code to abort fsck early, move the test for the inode flag and the
actual block check call further forward in check_blocks. This
eliminates an e2fsck abort on an inline data symlink when the file ACL
block is set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perform some basic checks on inline-data symlinks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_size indicates that an inode requires a system.data extended
attribute to hold overflow from i_blocks but the EA cannot be found,
offer to truncate the file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ensure that the various blobs in the in-inode EA region do not overlap.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes. This can happen if we encounter a specially crafted
FS image.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an inode fails checksum verification during pass 1 and the user
doesn't fix or clear the inode as part of the regular inode checks,
ensure that e2fsck remembers to ask the user if he simply wants to
correct the checksum.
We weren't capturing all the ways out of an interation of the inode
scanning loop, which means that not all errors were caught. Also,
we might as well clear the 'failed csum' flag if we write the inode
directly from the inode scanning loop.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Selectively disable checksum verification in a couple more places:
In check_blocks, disable checksum verification when iterating a block
map because the block map iterator function (re)reads the inode, which
could be unchanged since the scan found that the checksum fails. We
don't want to abort here; we want to keep evaluating the inode, and we
already know if the inode checksum doesn't match.
Further down in check_blocks when we're trying to see if i_size
matches the amount of data stored in the inode, don't allow checksum
errors when we go looking for the size of inline data. If the
required attribute is at all find-able in the EA block, we'll fix any
other problems with the EA block later. In the meantime, we don't
want to be truncating files unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a new behavior flag to the inode scan functions; when specified,
this flag will do some simple sanity checking of entire inode table
blocks. If all the checksums are ok, we can skip checksum
verification on individual inodes later on. If more than half of the
inodes look "insane" (bad extent tree root or checksum failure) then
ext2fs_get_next_inode_full() can return a special status code
indicating that what's in the buffer is probably garbage.
When e2fsck' inode scan encounters the 'inode is garbage' return code
it'll offer to zap the inode straightaway instead of trying to recover
anything. This replaces the previous behavior of asking to zap
anything with a checksum error (strict_csum).
Signed-off-by: Darrick J. Wong <darrick.wong@orale.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove the code that would zap an extent block immediately if the
checksum failed (i.e. strict_csums). Instead, we'll only do that if
the extent block header shows obvious structural problems; if the
header checks out, then we'll iterate the block and see if we can
recover some extents.
Requires a minor modification to ext2fs_extent_get such that the
extent block will be returned in the buffer even if the return code
indicates a checksum error. This brings its behavior in line with
the rest of libext2fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When reading an EA block in from disk, do a quick sanity check of the
block header, and return an error if we think we have garbage. Teach
e2fsck to ignore the new error code in favor of doing its own
checking, and remove the strict_csums bits while we're at it.
(Also document some assumptions in the new ext_attr code.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't allow critical metadata blocks to be marked free in the block
found map. This can theoretically happen on an FS where a first
inode's ETB/indirect map block is in the inode table, the first inode
is itself unclonable (and thus gets deleted) and there are enough
crosslinked files before and after the first inode to use up all the
free blocks during pass 1b.
(I do actually have a test FS image but it's 256M and it proved very
difficult to craft a bite-sized test case that actually hit this bug.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an extent fails checksum and the sanity checks, and the user elects
to fix the extents, don't bother asking (the second time) if the user
would like to fix the checksum. Refactor some redundant code to make
what's going on a little cleaner.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the badblocks inode fails checksum verification, just clear the
inode and move on. If we don't do this, we can end up importing a lot
of garbage into the badblocks list, which will then cause fsck to try
to regenerate anything that was sitting atop the supposedly damaged
blocks. Given that most hardware will remap bad sectors transparently
from ext4, the number of people this could affect adversely is pretty
low.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As far as I can tell, logical block mappings on a bigalloc filesystem are
supposed to follow a few constraints:
* The logical cluster offset must match the physical cluster offset.
* A logical cluster may not map to multiple physical clusters.
Since the multiply-claimed block recovery code can be used to fix these
problems, teach e2fsck to find these transgressions and fix them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the original patch (against -next), the hunk to fix uninit dirs was
just prior to the hunk labelled "Corrupt but passes checks?". The
hunks are ordered this way so that if e2fsck obtains permission to fix
a failed-csum extent (which in turn fixes the checksum), it will not
subsequently ask to (re)fix the checksum.
Due to a merge error the hunk moved to the wrong place, so put it
back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.
If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found. If either
of those directories are missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.
(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)
[ Fixed up an obvious C trap: const char * and const char [] are not
the same thing when you are taking the size of the parameter.
People, run your regression tests! Like spinach, it's good for you. :-)
-- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is the rest of
these sorts of functions live, and renamed it to be
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in in blk_num.c -- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation. The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block. The code that
handles extent based directories also does not add toobig blocks to
the dblist.
Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Always iterate logical block 0 in a directory, even if no physical
block has been allocated. Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.
Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2. This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent. This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count. Fix the bounds
check to reflect that.
This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of efsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could to allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter an inode with IND/DIND/TIND blocks or internal extent
tree blocks that point into critical FS metadata such as the
superblock, the group descriptors, the bitmaps, or the inode table,
it's quite possible that the validation code for those blocks is not
going to like what it finds, and it'll ask to try to fix the block.
Unfortunately, this happens before duplicate block processing (pass
1b), which means that we can end up doing stupid things like writing
extent blocks into the inode table, which multiplies e2fsck'
destructive effect and can render a filesystem unfixable.
To solve this, create a bitmap of all the critical FS metadata. If
before pass1b runs (basically check_blocks) we find a metadata block
that points into these critical regions, continue processing that
block, but avoid making any modifications, because we could be
misinterpreting inodes as block maps. Pass 1b will find the
multiply-owned blocks and fix that situation, which means that we can
then restart e2fsck from the beginning and actually fix whatever
problems we find.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're about to iterate the blocks of a block-map file, we need to
write the inode out to disk if it's dirty because block_iterate3()
will re-read the inode from disk. (In practice this won't happen
because nothing dirties block-mapped inodes before the iterate call,
but we can program defensively).
More importantly, we need to re-read the inode after the iterate()
operation because it's possible that mappings were changed (or erased)
during the iteration. If we then dirty or clear the inode, we'll
mistakenly write the old inode values back out to disk!
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the bitmaps are known to be unreadable, don't bother clearing them;
just mark fsck to restart itself after pass 5, by which time the
bitmaps should be fixed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After fixing a bug on kernel side for handling symlink properly with
inline data, it will break the assumption in e2fsck because in original
patch set of inline data it doesn't support symlink with inline data
feature. This commit makes e2fsck handle symlink properly with inline
data.
After applied this patch, the inline data feature has ability to store
the symlink. We also need to add this ability for symlink commmand in
debugfs.
Cc: Ian Nartowicz <claws@nartowicz.co.uk>
Cc: Tao Ma <tm@tao.ma>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix various small resource leaks and error code handling issues that
Coverity pointed out.
Fixes-Coverity-Bugs: 1215250, 1193379, 119194[2-4], 1049160
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We already skip the low dtime check if the number of inods is greater
than the last mount or last written time. However, if a very large
file system is resized sufficiently large that the number of inodes is
greater than when the file system was original created, we can end up
running afoul of the low dtime check. This results in a large number
of false positives which e2fsck can fix up without causing any
problems, but it can induce a large amount of anxiety for the system
administrator.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Patrik Horník <patrik@hornik.sk>
Fix a number of things that cppcheck complains about. Most of these
are minor resource leaks and forgotten declarations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When checking inline data blocks, always zero pctx->errcode because
otherwise a previous error condition could leak through and "cause" a
fatal block iteration failure. I found this by corrupting an xattr
block on an inline_data inode and fsck aborted when I tried to repair
it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>