The scratch_files feature is not really needed except on 32-bit
platforms, since tdb's performance is pretty awful given how we are
using it. Maybe SQLite would be faster, but for 64-bit platforms,
enabling swap works fairly well, especially using the rbtree for the
bitmap abstraction.
We leave tdb for Android since it's unlikely that someone will be
trying to connect petabyte+ sized file systems to a mobile handset.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There is a bug in how e2fsck handles being interrupted by CTRL-C.
If CTRL-C is pressed to kill e2fsck rather than e.g. kill -9, then
the interrupt handler sets E2F_FLAG_CANCEL in the context but doesn't
actually kill the process. Instead, e2fsck_pass1() checks this flag
before processing the next inode.
If a filesystem is running in fix mode (e2fsck -fy) is interrupted,
and the quota feature is enabled, then the quota file will still be
written to disk even though the inode scan was not complete and the
quota information is totally inaccurate. Even worse, if the Pass 1
inode and block scan was not finished, then the in-memory block
bitmaps (which are used for block allocation during e2fsck) are also
invalid, so any blocks allocated to the quota files may corrupt other
files if those blocks were actually used.
e2fsck 1.42.13.wc3 (28-Aug-2015)
Pass 1: Checking inodes, blocks, and sizes
^C[QUOTA WARNING] Usage inconsistent for ID 0:
actual (6455296, 168) != expected (8568832, 231)
[QUOTA WARNING] Usage inconsistent for ID 695:
actual (614932320256, 63981) != expected (2102405386240, 176432)
Update quota info for quota type 0? yes
[QUOTA WARNING] Usage inconsistent for ID 0:
actual (6455296, 168) != expected (8568832, 231)
[QUOTA WARNING] Usage inconsistent for ID 538:
actual (614932320256, 63981) != expected (2102405386240, 176432)
Update quota info for quota type 1? yes
myth-OST0001: e2fsck canceled.
myth-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
There may be a desire to flush out modified inodes and such that have
been repaired, so that restarting an interrupted e2fsck will make
progress, but the quota file update is plain wrong unless at least
pass1 has finished, and the journal recreation is also dangerous if
the block bitmaps have not been fully updated.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We weren't verifying the checksum of an htree leaf block due to a
coding error that marked all htree blocks as not having checksums.
While we're at it, fix the error message that gets displayed so that
it doesn't print a meaningless block offset.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there are directory entries with file names which are less than 16
bytes, it turns out that passing less than the crypto block size to
the kernel's crypto layer will cause the kernel to crash.
However, since there never should be encrypted directory entries where
the file name is less than 16 bytes (the AES block size), change
e2fsck to offer to address this corruption by deleting the directory
entry.
(We need to checks for this condition into the kernel as well.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The quota code required that we included dict.o in libsupport.a, so we
might as well just move dict.c and dict.h to lib/support, and then
have e2fsck use the version of dict.c in libsupport.a. This
simplifies the build system and eliminates having two identical copies
of dict.o floating around in the build tree.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The presence of --disable-htree is very much a legacy thing. Remove
it since supporting the lack of htree support is pretty silly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The directory hash is now calculated using the on-disk encrypted
filename, and we no longer use the digest encoding or the SHA-256
encoding, so remove them from the ext2fs library until there is some
reason we need them.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before it is
needed. We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we sit
and wait until the last inode table block read of the last group to
start fetching the next bunch.
pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1. We use the
"iterate a subset of a dblist" and avoid copying the dblist. Directory
blocks are fetched incrementally as we walk through the directory
block list. In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it. Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.
In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs. Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost. (Single-issue USB mass storage disks seem to suffer badly.)
By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS). The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The compression patches were an out-of-kernel patch set that was (a)
only available for ext2, (b) something that was never could be
stablized due to file system corruption, and (c) the most recent
patches were for 3.1, last updated in 2011.
The history of the compression patches has been a bit checkered.
There is a long history here at http://e2compr.sourceforge.net which
lists the perspective of the people working on it from the e2compr
side.
From the ext2/3/4 mainline developers' perspective, initial
compression support was added to e2fsprogs in 2000 (in the Linux 2.2
era), but due to stability concerns the kernel patches were never
merged into the mainline kernel. While there were some sporadic
efforts to try to get the ext2 compression patches working in the 2.4
and 2.6 era, by that time mainline work had moved on to ext4, and the
e2compr approach could only work with 32-bit block numbers and
indirect mapped files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use memcmp() instead of strncmp() since encrypted directory names can
contain NUL characters. For non-encrypted directories, we've already
checked for the case of NUL characters in file names, so it's safe to
use memcmp() here in all cases.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the directory processing code ends up pointing to a directory entry
that's so close to the end of the block that there's not even space
for a rec_len/name_len, just substitute dummy values that will force
e2fsck to extend the previous entry to cover the remaining space. We
can't use the helper methods to extract rec_len because that's reading
off the end of the buffer.
This isn't an issue with non-inline directories because the directory
check buffer is zero-extended so that fsck won't blow up.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The design of inline directories (apparently) calls for the i_block[]
region and the EA regions to be treated as if they were two separate
blocks of dirents. Effectively this means that it is impossible for a
directory entry to straddle both areas. e2fsck doesn't enforce this,
so teach it to do so. e2fslib already knows to do this....
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e2fsck accessed the field osd2.linux2.l_i_file_acl_high
field without checking that the filesystem is indeed created for
Linux. This lead to e2fsck constantly complaining about certain
nodes:
i_file_acl_hi for inode XXX (/dev/console) is 32, should be zero.
By "correcting" this problem, e2fsck would clobber the field
osd2.hurd2.h_i_mode_high.
Properly guard access to the OS dependent fields.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a directory block lacks space for a checksum and the user directs
e2fsck to fix the directory block (by rehashing it), don't complain a
second time about the checksum verification failure when we get to the
end of the directory block.
Also, don't complain about broken HTREE directories if we're already
planning to rebuild the HTREE directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Try to be a little smarter about where we go to allocate blocks for a
inode. For a given inode and logical offset, set the goal as if the
file were physically continuous. If it's bmapped, just start looking
at wherever lblk 0 is. If that's not possible (the file has no
lblk>pblk mappings, inline data, etc.) then start looking in the
inode's block group.
[ Fixed memory leak --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Sami Liedes reports that e2fsck fails to report the correct directory
inode number during a pass2 check for unexpected HTREE blocks.
Provide the inode number in the problem report.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
On big-endian systems, if the dirent swap routine finds a rec_len that
it doesn't like, it continues processing the block as if rec_len == 8.
This means that the name field gets byte swapped, which means that
salvage will not detect the correct name length (unless the name has a
length that's an exact multiple of four bytes), and it'll discard the
entry (unnecessarily) and the rest of the dirent block. Therefore,
swap the rest of the block back to disk order, run salvage, and
re-swap anything after the salvaged dirent.
The test case for this is f_inlinedata_repair if you run it on a BE
system.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In an inline directory, the '..' entry is compacted down to just the
inode number; there is no full '..' entry. Therefore, it makes no
sense to assign 'prev' to the fake dotdot entry we put on the stack,
as this could confuse a salvage_directory call on a corrupted next
entry into modifying stack contents (the fake dotdot entry).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directory entries must have a size that's a multiple of 4; therefore
the inline directory structure must also have a size that is a muliple
of 4. Since e2fsck doesn't check this, we should check that now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now that the directory salvaging operation is fed the block size,
teach pass 2 that it should use the size of the inline data if the
directory is inline_data. Without this, it'll "fix" inline
directories by setting the rec_len to something approaching the FS
blocksize, which is clearly wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Inodes with inline_data set do not have iterable blocks, so don't try
to iterate the blocks, because that will just fail, causing e2fsck to
abort.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed. Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit. Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.
Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage. The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.
The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder. If that fails, it
will schedule the directory for a pass 3A reconstruction. Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.
The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove the code that would prompt the user to zap directory entry
blocks with bad checksums (i.e. strict_csums). Instead, we'll run the
directory entries through the usual repair routines in an attempt to
save whatever we can. At the same time, refactor the code that
schedules the repair of missing dirblock checksum entries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix the routine that adds dirent checksum structures to the directory
block to handle oddball situations a bit more robustly.
First, when we're walking the entry array, we might encounter an
entry that ends exactly one byte before where the checksum entry needs
to start, i.e. there's space for the tail entry, but it needs to be
reinitialized. When that happens, we should proceed until d points to
that space so that the tail entry can be initialized.
Second, it's possible that we've been fed a directory block where the
entries end just short of the end of the block. In this case, we need
to adjust the size of the last entry to point exactly to where the
dirent tail starts. The current code requires that entries end
exactly on the block boundary, but this is not always the case with
damaged filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're salvaging a directory, leave room at the end of the block
for the checksum entry so that e2fsck can write the checksummed dir
block out later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck is writing a block of directory entries to disk, it should
adjust the dirents to add the dirent tail if one is missing. It's not
a big deal if there's no space to do this since rehash (pass 3A) will
reconstruct directories for us. However, we may as well avoid
unnecessary work.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're forced to delete a crosslinked file, only call
ext2fs_block_alloc_stats2() on cluster boundaries, since the block
bitmaps are all cluster bitmaps at this point. It's safe to do this
only once per cluster since we know all the blocks are going away.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're filling a directory hole, we need to perform an implied
cluster allocation to satisfy the bigalloc rule of mapping only one
pblk to a logical cluster.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is the rest of
these sorts of functions live, and renamed it to be
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in in blk_num.c -- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of efsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could to allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a number of things that cppcheck complains about. Most of these
are minor resource leaks and forgotten declarations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mostly by adding static and removing excess extern qualifiers. Also
convert a few remaining non-ANSI function declarations to ANSI.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For each site where we test for a large file (> 2GB) and set the
LARGE_FILE feature, use a helper function to make the size test
consistent with the test that's in e2fsck. This fixes the fsck
complaints when we try to create a 2GB journal (not so hard with 64k
block size) and fixes the incorrect test in fileio.c.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>