Now that the directory salvaging operation is fed the block size,
teach pass 2 that it should use the size of the inline data if the
directory is inline_data. Without this, it'll "fix" inline
directories by setting the rec_len to something approaching the FS
blocksize, which is clearly wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter a directory whose i_size != the inline data size, just
set i_size to the size of the inline data. The pb.last_block
calculation is wrong since pb.last_block == -1, which results in
i_size being set to zero, which corrupts the directory.
Clear the inline_data inode flag if we actually /are/ setting i_size
to zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we come across an inode with the inline data and extents inode flag
set, try to figure out the correct flag settings from the contents of
i_block and i_size. If i_blocks looks like an extent tree head, we'll
make it an extent inode; if it's small enough for inline data, set it
to that. This leaves the weird gray area where there's no extent
tree but it's too big for the inode -- if /could/ be a block map,
change it to that; otherwise, just clear the inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fifo, socket, and device inodes cannot have inline data or
extents, strip off these flags if we find them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Inodes with inline_data set do not have iterable blocks, so don't try
to iterate the blocks, because that will just fail, causing e2fsck to
abort.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since the inline data flag will cause the extent/block map iteration
code to abort fsck early, move the test for the inode flag and the
actual block check call further forward in check_blocks. This
eliminates an e2fsck abort on an inline data symlink when the file ACL
block is set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perform some basic checks on inline-data symlinks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_size indicates that an inode requires a system.data extended
attribute to hold overflow from i_blocks but the EA cannot be found,
offer to truncate the file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ensure that the various blobs in the in-inode EA region do not overlap.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The xattr_get method returns to us a pointer to a buffer containing
the EA value. If for some reason we decide to fail out of iterating
the EA part of an inline-data directory, we must free the buffer that
xattr_get passed to us (via inline_data_ea_get).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix up the rest of the inline data code not to complain if there's no
EA, since it's possible that there's no EA because we're in the
process of creating an inline data file. Also, don't return an error
code when removing a nonexistent EA, because there's no reason to.
Furthermore, if we write less than 60 bytes of inline data, remove the
EA to avoid wasting space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're doing a strict overwrite (same data size) of data in an
inline data file, we should be able to skip the size check. If the
in-core EA representation is fine but the on-disk EA is slightly
corrupt (this happens when fixing minor errors in an inline dir), the
ext2fs_xattr_inode_max_size() call, which reads the disk EA, can lead
us to think that there's no space when in reality there is no issue
with doing a strict overwrite.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes. This can happen if we encounter a specially crafted
FS image.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perform a little more sanity checking of EA value offsets so that we
don't crash while trying to load things from the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In pass 3, convert the "delete files and re-run e2fsck" message to a
proper error code for more consistent error reporting and to make
translation easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix clang warnings about forgotten header files, dead code, and pwrite
support on OS X. The unistd.h inclusion also fixes a parameter
truncation bug on i386.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
A user who sees the message
***** REBOOT LINUX *****
or
***** FILE SYSTEM WAS MODIFIED *****
might think that e2fsck was complete even though we haven't finished
writing out the superblock or bitmap blocks, and then either forcibly
reboot or power cycle the box, or yank the USB key out while the
storage device is still being written (before e2fsck exits).
So rearrange the exit path of e2fsck so that we flush out the dirty
superblock/bg descriptors/bitmaps before we print the final message.
Also clean up this code so that the flow of control is easier to
understand, and add error checking to catch any errors (normally
caused by I/O errors writing to the disk) for these final writebacks.
Addresses-Debian-Bugs: #757543, #757544
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Dan Jacobson <jidanni@jidanni.org>
This test checks to make sure resize2fs can properly handle a file
system which started life as a normal ext4 file system and then was
grown to a size where meta_bg was enabled, and then shrunk back below
the point where the meta_bg format is still needed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The test verifies that e2fsck can properly fix a file system where the
value of s_first_meta_bg in the superblock is larger than the number
of block group descriptors in the file system. E2fsck will fix this
by clearing the meta_bg feature.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When shrinking a file system, if the number block groups drops below
the point where we started using the meta_bg layout, disable the
meta_bg feature and set s_first_meta_bg to zero. This is necessary to
avoid creating an invalid/corrupted file system after the shrink.
Addresses-Debian-Bug: #756922
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Marcin Wolcendorf <antymat+debian@chelmska.waw.pl>
Tested-by: Marcin Wolcendorf <antymat+debian@chelmska.waw.pl>
If s_first_meta_bg is greater than the of number block group
descriptor blocks, then reading or writing the block group descriptors
will end up overruning the memory buffer allocated for the
descriptors. Fix this by limiting first_meta_bg to no more than
fs->desc_blocks. This doesn't correct the bad s_first_meta_bg value,
but it avoids causing the e2fsprogs userspace programs from
potentially crashing.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Commit baa3544609 ("libext2fs: have UNIX IO manager use
pread/pwrite) causes a breakage on 32-bit systems where off_t is
32-bits for file systems larger than 4GB. Fix this by using
pread64/pwrite64 if possible, and if pread64/pwrite64 is not present,
using pread/pwrite only if the size of off_t is at least as big as
ext2_loff_t.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
No behaviour changes. This will simplify the next commit.
Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, both of these usages called dump_file() with a true value as
the "preserve" argument, which caused it to in turn call fix_perms() to
make the permissions on the locally-dumped file match those found on the
ext2 filesystem. fix_perms() then attempted to close(2) the file descriptor
(if any) before returning (though it didn't attempt to report on any errors
found while doing so).
However, in both of these situations, the local file being dumped had been
opened by the caller of dump_file(), which also closes it (and reports on
any errors detected when closing). This meant that both "rdump" and "dump
-p" would then emit a spurious EBADF message when trying to re-close the
local file descriptor.
Deleting the spurious close(2) call in fix_perms() fixes the problem in both
commands.
Signed-off-by: Aaron Crane <arc@aaroncrane.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Place the allocation bitmaps and inode table blocks so they are
adjacent, even in the last flexbg.
Previously, after running "mke2fs -t ext4 DEV 286720", the layout of
the last few block groups would look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262161 (+16)
Inode table at 262177-262432 (+32)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262162 (bg #32 + 17)
Inode table at 262433-262688 (bg #32 + 288)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262163 (bg #32 + 18)
Inode table at 262689-262944 (bg #32 + 544)
Now, they look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262148 (+3)
Inode table at 262151-262406 (+6)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262149 (bg #32 + 4)
Inode table at 262407-262662 (bg #32 + 262)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262150 (bg #32 + 5)
Inode table at 262663-262918 (bg #32 + 518)
This reduces the free space fragmentation in a freshly created file
system. It also allows the following mke2fs command to succeed:
mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483
(Note that while this allows people to run mke2fs with insanely large
flexbg sizes, this is not a recommended practice, as the kernel may
refuse to resize such a file system while mounted, since it currently
tries to allocate an in-memory data structure based on the size of the
flexbg, and so a file system with a very large flexbg size will cause
the memory allocation to fail. This will hopefully be fixed in a
future kernel release, but if the goal is to force all of the metadata
blocks to be at the beginning of the file system, it's better to use
the packed_meta_blocks configuration parameter in mke2fs.conf.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This reverts commit d988201ef9.
The problem with this commit is that causes common small file system
configurations to fail. For example:
mke2fs -O flex_bg -b 4096 -I 1024 -F /tmp/tt 79106
mke2fs 1.42.11 (09-Jul-2014)
/tmp/tt: Invalid argument passed to ext2 library while setting
up superblock
This check in ext2fs_initialize() was added to prevent the metadata
from being allocated beyond the end of the filesystem, but it is
also causing a wide range of failures for small filesystems.
We'll address this in a different way, by using a smarter algorithm
for deciding the layout of metadata blocks for the last flex block
group.
Reported-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to e2fsck to examine how it deals with inode
table blocks which (a) have been zero'd; (b) have been one'd; (c) have
corrupt inodes with obvious problems; and (d) have inodes with
non-obvious problems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add tests to examine how e2fsck deals with (a) the block bitmap being
corrupt; (b) the inode bitmap being corrupt; (c) the bitmap checksums
being incorrect (but the bitmaps are fine); and (d) the group
descriptor checksum itself is incorrect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to examine how e2fsck deals with random
superblock corruption such as obviously wrong fields and the checksum
itself being incorrect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests to examine how e2fsck deals with MMP blocks with
(a) a bad magic number; and (b) an incorrect checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some regression tests to examine how e2fsck handles directory
entry blocks and htree blocks with (a) malformed directory entries;
(b) incorrect checksums; or (c) obviously garbage entries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some regression tests to examine how e2fsck deals with (a) extent
blocks with only a bad checksum; (b) extent blocks with a bad magic
number; and (c) extent entries with corruption.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add regression tests for e2fsck dealing with (a) EA block with a bad
checksum; (b) EA block with a bad magic number; and (c) EA block with
damage that isn't otherwise noticeable.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an inode fails checksum verification during pass 1 and the user
doesn't fix or clear the inode as part of the regular inode checks,
ensure that e2fsck remembers to ask the user if he simply wants to
correct the checksum.
We weren't capturing all the ways out of an interation of the inode
scanning loop, which means that not all errors were caught. Also,
we might as well clear the 'failed csum' flag if we write the inode
directly from the inode scanning loop.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Selectively disable checksum verification in a couple more places:
In check_blocks, disable checksum verification when iterating a block
map because the block map iterator function (re)reads the inode, which
could be unchanged since the scan found that the checksum fails. We
don't want to abort here; we want to keep evaluating the inode, and we
already know if the inode checksum doesn't match.
Further down in check_blocks when we're trying to see if i_size
matches the amount of data stored in the inode, don't allow checksum
errors when we go looking for the size of inline data. If the
required attribute is at all find-able in the EA block, we'll fix any
other problems with the EA block later. In the meantime, we don't
want to be truncating files unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an inode fails checksum verification, don't stuff a copy of it in
the inode cache, because this can cause the library to fail to return
the "corrupt inode" error code.
In general, this happens if ext2fs_read_inode_full() is called twice
on an inode with an incorrect checksum. If fs->flags has
EXT2_FLAG_IGNORE_CSUM_ERRORS set during the first call and *unset*
during the second call, the cache hit during the second call fails to
return EXT2_ET_INODE_CSUM_INVALID as you'd expect. This happens
during fsck because the first read_inode call happens as part of
check_blocks and the second call happens during inode checksum
revalidation. A file system with a slightly corrupt non-extent inode
will trigger this.
While we're at it, make the inode read function consistent with the
rest of libext2fs -- copy the metadata object into the caller's buffer
even if it fails checksum verification. This will help e2fsck avoid a
double re-read later on down the line.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed. Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit. Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.
Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage. The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.
The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder. If that fails, it
will schedule the directory for a pass 3A reconstruction. Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.
The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a new behavior flag to the inode scan functions; when specified,
this flag will do some simple sanity checking of entire inode table
blocks. If all the checksums are ok, we can skip checksum
verification on individual inodes later on. If more than half of the
inodes look "insane" (bad extent tree root or checksum failure) then
ext2fs_get_next_inode_full() can return a special status code
indicating that what's in the buffer is probably garbage.
When e2fsck' inode scan encounters the 'inode is garbage' return code
it'll offer to zap the inode straightaway instead of trying to recover
anything. This replaces the previous behavior of asking to zap
anything with a checksum error (strict_csum).
Signed-off-by: Darrick J. Wong <darrick.wong@orale.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove the code that would prompt the user to zap directory entry
blocks with bad checksums (i.e. strict_csums). Instead, we'll run the
directory entries through the usual repair routines in an attempt to
save whatever we can. At the same time, refactor the code that
schedules the repair of missing dirblock checksum entries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove the code that would zap an extent block immediately if the
checksum failed (i.e. strict_csums). Instead, we'll only do that if
the extent block header shows obvious structural problems; if the
header checks out, then we'll iterate the block and see if we can
recover some extents.
Requires a minor modification to ext2fs_extent_get such that the
extent block will be returned in the buffer even if the return code
indicates a checksum error. This brings its behavior in line with
the rest of libext2fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When reading an EA block in from disk, do a quick sanity check of the
block header, and return an error if we think we have garbage. Teach
e2fsck to ignore the new error code in favor of doing its own
checking, and remove the strict_csums bits while we're at it.
(Also document some assumptions in the new ext_attr code.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Warn the user to run e2fsck if the superblock or bitmaps fails
checksum verification.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're totally unable to allocate a lost+found directory, ask the
user if he would like to dump orphaned files in the root directory.
Hopefully this enables the user to delete enough files so that a
subsequent run of e2fsck will make more progress. Better to cram lost
files in the rootdir than the current behavior, which is to fail at
linking them in, thereby leaving them as lost files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>