The profile functions started as something specific to e2fsck. It's
now used by mke2fs and e2fsck, so it's better to move it into
libsupport.a.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We will be using libsupport.a for e2fsprogs's internal support
functions. It will contain the quota support functions, but we will
also be moving code such as profile.c and plausible.c to libsupport.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The presence of --disable-htree is very much a legacy thing. Remove
it since supporting the lack of htree support is pretty silly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There were some generated files that weren't getting removed by the
clean and distclean targets; fix this.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When a file system with journal checksums is mounted, the journal
checksum is only updated when the journal superblock is actually
written to disk. But when a root file system is mounted read-only,
e2fsck will get the in-memory version of the journal superblock, and
the checksum is not necessarily going to be correct. Since we only
allow the root file system to be checked while it is mounted
read-only, and we won't be trying to replay the journal anyway. So we
can skip the checking the journal superblock fields for mounted file
systems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a new get_alloc_blocks hook and a block_alloc_stats_range hook so
that e2fsck can capture allocation requests spanning more than a
block to its block_found_map.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Having multiple versions of jfs_user.h was confusing the Android
build. Clean up things by removing the lib/ext2fs/jfs_user.h and
misc/jfs_user.h and simplifying how we emulate the kernel
infrastructure needed by journal replay code and removing the
kernel-specific lines from kernel-jbd.h.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck/dirinfo.c and misc/e4crypt.c use functions from libext2fs, so
we need to include its header file or clang will complain.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As of v4.0, the Linux kernel won't add blocks to a block-mapped file
on a bigalloc filesystem. Therefore, convert any such files or
directories we find, to prevent fs errors later on.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Check the value of r_count to ensure that we never try to read revoke
records past the end of the revoke block. It turns out that the
journal writing code in debugfs was also playing fast and loose with
the r_count, so fix that as well.
The Coverity bug was 1297508.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix Coverity bugs 1297094-1297101 by fixing all the mutations in the
*_setup_tdb() functions, fixing buffer overflows, and checking
return values.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The directory hash is now calculated using the on-disk encrypted
filename, and we no longer use the digest encoding or the SHA-256
encoding, so remove them from the ext2fs library until there is some
reason we need them.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Teach e2fsck to (re)construct extent trees. This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert a ext3-style block mapped file to an
extent file. The reconstruction is performed during pass 1E or 3A,
as detailed below.
For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines either
(1) that a whole level of extent tree will fit into a higher level of
the tree; (2) that the size of any level can be reduced by at least
one ETB block; or (3) the extent tree is unnecessarily deep. It will
not run at all if errors are found and the user declines to fix the
errors.
The option "-E bmap2extent" can be used to force e2fsck to convert all
block map files to extent trees, and to rebuild all extent files'
extent trees. After conversion, files larger than 12 blocks should be
defragmented to eliminate empty holes where a block lives.
The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before it is
needed. We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we sit
and wait until the last inode table block read of the last group to
start fetching the next bunch.
pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1. We use the
"iterate a subset of a dblist" and avoid copying the dblist. Directory
blocks are fetched incrementally as we walk through the directory
block list. In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it. Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.
In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs. Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost. (Single-issue USB mass storage disks seem to suffer badly.)
By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS). The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs. There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).
These new e2fsck routines require the addition of a dblist API to
allow us to iterate a subset of a dblist. This will enable
incremental directory block readahead in e2fsck pass 2.
There's also a function to estimate the readahead given a FS.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When there's a problem accessing the EA part of an inline data symlink
and we want to truncate the symlink back to 60 characters (hoping the
user can re-establish the link later on, apparently) be sure to turn
off the inline data flag to convert the symlink back to a regular fast
symlink.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The compression patches were an out-of-kernel patch set that was (a)
only available for ext2, (b) something that was never could be
stablized due to file system corruption, and (c) the most recent
patches were for 3.1, last updated in 2011.
The history of the compression patches has been a bit checkered.
There is a long history here at http://e2compr.sourceforge.net which
lists the perspective of the people working on it from the e2compr
side.
From the ext2/3/4 mainline developers' perspective, initial
compression support was added to e2fsprogs in 2000 (in the Linux 2.2
era), but due to stability concerns the kernel patches were never
merged into the mainline kernel. While there were some sporadic
efforts to try to get the ext2 compression patches working in the 2.4
and 2.6 era, by that time mainline work had moved on to ext4, and the
e2compr approach could only work with 32-bit block numbers and
indirect mapped files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add missing new lib/ext2fs source files that were added for encryption
support. Also move configuration #define's from individual Android.mk
to the android_config.h file, since we've moved away from specifying
configuration #define's on the command-line upstream.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fix_problem() returning 1 means to fix the fs error, so do that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If ext2fs_new_block2() is called without a specific block map, we
should call the alloc_block hook before checking fs->block_map. This
helps us to avoid a bug in e2fsck where we need to allocate a block
but instead of consulting block_found_map, we use the FS bitmaps,
which (prior to pass 5) could be wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This allows us to print a message warning the user that there is
something funny going on with their hardware clock (probably time zone
issues caused by trying to be compatible with legacy OS's such as
Windows), without triggering a full file system check.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use memcmp() instead of strncmp() since encrypted directory names can
contain NUL characters. For non-encrypted directories, we've already
checked for the case of NUL characters in file names, so it's safe to
use memcmp() here in all cases.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The Android.mk files were taken from the Android AOSP sources, and
updated for the 1.43 next branch. The intention is that this will
allow the repository which is currently located in external/e2fsprogs
with one which is based off of the upstream e2fsprogs. Right now
external/e2fsprogs was not created using "git clone", so it means that
git merges don't work. After the external/e2fsprogs Android
repository is replaced, with one based off the upstream repository,
Android will be able to synchronize with the upstream repository by
pulling and merging from upstream, and then running the script
"./util/gen-android-files" to update any generated files. (This is
necessary because in the Android build system, the Android.mk files
are rather stylized and don't make it easy to run arbitrary shell
scripts during the build phase.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The C preprocessing symbol NDEBUG is also defined (differently) by
Android's build files, and this was causing compilation failures. So
change assert() to dict_assert() and manually define it instead of
relying on the NDEBUG and <assert.h> semantics.
Also make sure the necessary debugging functions are available is
DICT_NODEBUG is not defined, so that dict.c will correctly build with
and without DICT_NODEBUG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Provide a mechanism for a user to switch fsck into '-y' mode if they
start an interactive session and then get tired of pressing 'y' in
response to numerous prompts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the directory processing code ends up pointing to a directory entry
that's so close to the end of the block that there's not even space
for a rec_len/name_len, just substitute dummy values that will force
e2fsck to extend the previous entry to cover the remaining space. We
can't use the helper methods to extract rec_len because that's reading
off the end of the buffer.
This isn't an issue with non-inline directories because the directory
check buffer is zero-extended so that fsck won't blow up.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Strengthen the checks that guess if the inode we're looking at is an
inline directory. The current check sweeps up any inline inode if
its length is a multiple of four; now we'll at least try to see if
there's the beginning of a valid directory entry.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The design of inline directories (apparently) calls for the i_block[]
region and the EA regions to be treated as if they were two separate
blocks of dirents. Effectively this means that it is impossible for a
directory entry to straddle both areas. e2fsck doesn't enforce this,
so teach it to do so. e2fslib already knows to do this....
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Decrement the bad count *after* we've shown that (a) we can allocate a
replacement block and (b) remap the file block. Unfortunately,
the only way to tell if the remapping succeeded is to wait until the
next clone_file_block() call or block_iterate3() returns.
Otherwise, there's a corruption error: we decrease the badcount once in
preparation to remap, then the remap fails (either we can't find a
replacement block or we have to split the extent tree and can't find a
new extent block), so we delete the file, which decreases the badcount
on the block a second time. Later on e2fsck will think that it's
straightened out all the duplicate blocks, which isn't true.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
An earlier patch tried to detect indirect blocks that conflicted with
critical FS metadata for the purpose of preventing corrections being
made to those indirect blocks. Unfortunately, that patch cannot
handle more than one conflicting *ind block per file; therefore, use
the ref_block parameter to test the metadata block map to decide if
we need to avoid fixing the *ind block when we're iterating the
block's entries. (We have to iterate the block to capture any blocks
that the block points to, as they could be in use.)
As a side note, in 1B we'll reallocate all those conflicting *ind
blocks and restart fsck, so the contents will be checked eventually.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we recreate the journal, don't say that the FS "is now ext3
again", since we could be fixing a damaged ext4 FS journal, which does
not magically convert the FS back to ext3.
[ Use "journaled" instead of "journalled", and also fix the message we
print when deleting the journal --Ted ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck encounters a read error on a block past the end of the
filesystem, don't bother trying to "rewrite" the block. We might
still want to re-try the read to capture FS data marooned past the end
of the filesystem, but in that case e2fsck ought to move the block
back inside the filesystem.
This enables e2fuzz to detect writes past the end of the FS due to
software bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we decide to clear a special inode because of bad mappings, we need
to zero the i_block array. The clearing routine depends on setting
i_links_count to zero to keep us from re-checking the block maps,
but that field isn't checked for special inodes. Therefore, if we
haven't erased the mappings, check_blocks will restart fsck and fsck
will try to check the blocks again, leading to an infinite loop.
(This seems easy to trigger if the bootloader inode extent map is
corrupted.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e2fsck accessed the field osd2.linux2.l_i_file_acl_high
field without checking that the filesystem is indeed created for
Linux. This lead to e2fsck constantly complaining about certain
nodes:
i_file_acl_hi for inode XXX (/dev/console) is 32, should be zero.
By "correcting" this problem, e2fsck would clobber the field
osd2.hurd2.h_i_mode_high.
Properly guard access to the OS dependent fields.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck.conf's logging feature is enabled, and e2fsck is being run
via systemd-fsck, there will be a deadlock since systemd-fsck is
waiting for progress_fd pipe to be closed, instead of waiting for the
fsck process to exit --- and so the logfile child process won't exit
until it can write out the logfile, and systemd won't continue the
boot process so that the file system can be remounted read-write.
Oops.
Addresses-Debian-Bug: #775234
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't complain about checksum failures on the root dir when we're
trying to find l+f if the root dir is going to be rehashed anyway.
The test case for this is t_enable_mcsum in the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a directory block lacks space for a checksum and the user directs
e2fsck to fix the directory block (by rehashing it), don't complain a
second time about the checksum verification failure when we get to the
end of the directory block.
Also, don't complain about broken HTREE directories if we're already
planning to rebuild the HTREE directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Try to be a little smarter about where we go to allocate blocks for a
inode. For a given inode and logical offset, set the goal as if the
file were physically continuous. If it's bmapped, just start looking
at wherever lblk 0 is. If that's not possible (the file has no
lblk>pblk mappings, inline data, etc.) then start looking in the
inode's block group.
[ Fixed memory leak --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're rechecking an inode checksum failure, we need to force the
inode to be re-read from disk so that the verification routine runs,
so drop the stashed inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't say the physical block number is invalid if an extent block
fails only the checksum. It passes checks, so it's not invalid.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix some gcc-4.8 warnings and other problems that broke the build.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck uses an array to store directory usage information during pass
3; the usage context also contains a pointer to the last directory
looked up. When expanding the dir_info array, this cache pointer
needs to be cleared if the array resize changed the pointer location,
or else we'll later walk off the end of this dead pointer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Sami Liedes reports that e2fsck fails to report the correct directory
inode number during a pass2 check for unexpected HTREE blocks.
Provide the inode number in the problem report.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This lays the groundwork for sparse-checking e2fsprogs for
endianness; defines bitwise types, and fixes up the ext2fs_*
swapping routines to do the proper casts.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Convert all call sites that write zero blocks to disk to use
ext2fs_zero_blocks2() since it can use Linux's zero out feature to do
the writes more quickly. Reclaim the zero buffer at freefs time and
make the write-zeroes fallback use a larger buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're using check_plausibility() to try to identify something that
obviously isn't an ext* filesystem and libblkid doesn't know what it
is, try libmagic instead.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If any of these utilities detect a bad superblock magic, call
check_plausibility to see if blkid can identify the passed-in argument
as something else (xfs, partition, etc.) in the hopes of catching a
user error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There is no reason to request a aligned buffer in
check_{inode,block}_bitmap, and this will cause failures for dietlibc,
which doesn't have support for posix_memalign() or any other way to
request an aligned memory allocation. Fortunately, this is only
needed in very few places where direct I/O is required.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The asm_types.h file needs to include stdio.h and stdlib.h in order to
get integer types included. So add those includes into jfs_user.h to
avoid a build faliure under dietlibc.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Free the buffer head if the journal descriptor block fails checksum
verification. This has been patched before (see "e2fsck: free bh on
csum verify error in do_one_pass") but apparently the patch was never
committed to jbd2 in the kernel, so when we resync'd the recovery code
with 3.16, the bug came back. Sigh.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
If we find a hole in a directory on a bigalloc filesystem, we need to
obey the cluster alignment rules when collapsing the gap to avoid
later complaints.
Specifically, the calculation of the new logical cluster number was
incorrect, and we need to ensure that the logical cluster alignment
respects the physical cluster alignment, since we've concluded that
the extent's logical block number is wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If in the course of iterating extents we find that an otherwise
valid-seeming second extent maps the same logical blocks as a
previously examined first extent, offer to clear the duplicate
mapping.
The test for this is already in f_extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there isn't space in the root directory to add the lost+found
entry, try expanding the root directory before failing the fsck.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the badblocks list says that the badblocks inode is bad, it's quite
likely that badblocks is broken. Worse yet, if the root inode is in
the same block as the badblocks inode (likely since they're adjacent),
the filesystem becomes unfixable because pass3 notices the bad root
inode and exits.
So... if we encounter this case, just kill the badblocks inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The journal superblock's s_sequence field seems to track the tid of
the tail (oldest) transaction in the log. Therefore, when we release
the journal, set the s_sequence to the tail_sequence, because setting
it to the transaction_sequence means that we're setting the tid to
that of the head of the log. Granted, for replay these two are
usually the same (and s_start == 0 anyway) so thus far we've gotten
lucky and nobody noticed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a journal.c with routines adapted from e2fsck/journal.c to
handle opening and closing the journal, and setting up the
descriptors, and all that. Unlike e2fsck's versions which try to
identify and fix problems, the routines here have no way to repair
anything.
[ Modified by tytso to fold debugfs/jfs_user.h into e2fsck/jfs_user.h,
so we don't have to copy recovery.c and revoke.c into debugfs. --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're removing the internal journal (broken journal, turning it
off, or adding an external journal), zero s_jnl_blocks so that they
can't be picked up by accident later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Verify the (ext4) superblock checksum of an external journal device
and prompt to correct the checksum if nothing else is wrong with the
superblock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags. These errors regrettably lead to the journal
corruption reported by Mr. Reardon.
Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.
Add a few function helpers so we don't have to open-code quite so
many pieces.
Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the reallocation of dir_info fails, we will eventually cause e2fsck
to fail with an internal error. So if the realloc fails, print a
message and bail out with a fatal error early when at the time of the
reallocation failure.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When recovering the journal, don't fall into an infinite loop if we
encounter a corrupt journal block. Instead, just skip the block and
proceed with the full filesystem fsck.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Synchronize e2fsck's copy of revoke.c with the kernel's copy in
fs/jbd2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Synchronize e2fsck's copy of recovery.c with the kernel's copy in
fs/jbd2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
On big-endian systems, if the dirent swap routine finds a rec_len that
it doesn't like, it continues processing the block as if rec_len == 8.
This means that the name field gets byte swapped, which means that
salvage will not detect the correct name length (unless the name has a
length that's an exact multiple of four bytes), and it'll discard the
entry (unnecessarily) and the rest of the dirent block. Therefore,
swap the rest of the block back to disk order, run salvage, and
re-swap anything after the salvaged dirent.
The test case for this is f_inlinedata_repair if you run it on a BE
system.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext2fs_flush2() unconditionally writes the block group descriptors to
disk even if the underlying FS isn't marked dirty. This causes the
following error message on a fsck -n run:
e2fsck 1.43-WIP (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing file system info: Attempt to write block to filesystem resulted in short write
Since ext2fs_close2() only calls flush if the dirty flag is set,
modify e2fsck to exhibit the same behavior so that we don't spit out
write errors for a read only check.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In an inline directory, the '..' entry is compacted down to just the
inode number; there is no full '..' entry. Therefore, it makes no
sense to assign 'prev' to the fake dotdot entry we put on the stack,
as this could confuse a salvage_directory call on a corrupted next
entry into modifying stack contents (the fake dotdot entry).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a file is marked inline_data but its i_size isn't a multiple of
four, it probably isn't an inline directory, because directory entries
have sizes that are multiples of four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directory entries must have a size that's a multiple of 4; therefore
the inline directory structure must also have a size that is a muliple
of 4. Since e2fsck doesn't check this, we should check that now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now that the directory salvaging operation is fed the block size,
teach pass 2 that it should use the size of the inline data if the
directory is inline_data. Without this, it'll "fix" inline
directories by setting the rec_len to something approaching the FS
blocksize, which is clearly wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter a directory whose i_size != the inline data size, just
set i_size to the size of the inline data. The pb.last_block
calculation is wrong since pb.last_block == -1, which results in
i_size being set to zero, which corrupts the directory.
Clear the inline_data inode flag if we actually /are/ setting i_size
to zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we come across an inode with the inline data and extents inode flag
set, try to figure out the correct flag settings from the contents of
i_block and i_size. If i_blocks looks like an extent tree head, we'll
make it an extent inode; if it's small enough for inline data, set it
to that. This leaves the weird gray area where there's no extent
tree but it's too big for the inode -- if /could/ be a block map,
change it to that; otherwise, just clear the inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fifo, socket, and device inodes cannot have inline data or
extents, strip off these flags if we find them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Inodes with inline_data set do not have iterable blocks, so don't try
to iterate the blocks, because that will just fail, causing e2fsck to
abort.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since the inline data flag will cause the extent/block map iteration
code to abort fsck early, move the test for the inode flag and the
actual block check call further forward in check_blocks. This
eliminates an e2fsck abort on an inline data symlink when the file ACL
block is set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perform some basic checks on inline-data symlinks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_size indicates that an inode requires a system.data extended
attribute to hold overflow from i_blocks but the EA cannot be found,
offer to truncate the file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ensure that the various blobs in the in-inode EA region do not overlap.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>