When an extent-mapped directory is compacted by "e2fsck -fD" and
frees enough leaf blocks that it loses an extent tree index block,
the old e2fsck_rehash_dir->ext2fs_block_iterate3->write_dir_block()
code would not free the extent block, which would result in the
extent tree becoming corrupted when it is written out.
Pass 1: Checking inodes, blocks, and sizes
Inode 17825800, end of extent exceeds allowed value
(logical block 710, physical block 570459684, len 1019)
This results in loss of a whole index block of directory leaf blocks
and maybe thousands or millions of files in lost+found.
Fix e2fsck_rehash_dir() to call ext2fs_punch() to free the blocks
at the end of the directory instead of trying to handle this itself
while writing out the directory. That properly handles all of the
cases of updating the extent tree as well as accounting for blocks
that are released (both leaf blocks and index blocks).
Add a test case for compacting the directory to be smaller than the
index block that originally caused the corruption.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The directory hash is now calculated using the on-disk encrypted
filename, and we no longer use the digest encoding or the SHA-256
encoding, so remove them from the ext2fs library until there is some
reason we need them.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Teach e2fsck to (re)construct extent trees. This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert a ext3-style block mapped file to an
extent file. The reconstruction is performed during pass 1E or 3A,
as detailed below.
For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines either
(1) that a whole level of extent tree will fit into a higher level of
the tree; (2) that the size of any level can be reduced by at least
one ETB block; or (3) the extent tree is unnecessarily deep. It will
not run at all if errors are found and the user declines to fix the
errors.
The option "-E bmap2extent" can be used to force e2fsck to convert all
block map files to extent trees, and to rebuild all extent files'
extent trees. After conversion, files larger than 12 blocks should be
defragmented to eliminate empty holes where a block lives.
The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The compression patches were an out-of-kernel patch set that was (a)
only available for ext2, (b) something that was never could be
stablized due to file system corruption, and (c) the most recent
patches were for 3.1, last updated in 2011.
The history of the compression patches has been a bit checkered.
There is a long history here at http://e2compr.sourceforge.net which
lists the perspective of the people working on it from the e2compr
side.
From the ext2/3/4 mainline developers' perspective, initial
compression support was added to e2fsprogs in 2000 (in the Linux 2.2
era), but due to stability concerns the kernel patches were never
merged into the mainline kernel. While there were some sporadic
efforts to try to get the ext2 compression patches working in the 2.4
and 2.6 era, by that time mainline work had moved on to ext4, and the
e2compr approach could only work with 32-bit block numbers and
indirect mapped files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed. Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit. Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.
Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage. The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.
The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder. If that fails, it
will schedule the directory for a pass 3A reconstruction. Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.
The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is the rest of
these sorts of functions live, and renamed it to be
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in in blk_num.c -- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a user crafts a carefully constructed filesystem containing a
single directory entry block with an invalid checksum and fewer than
two entries, and then runs e2fsck to fix the filesystem, fsck will
crash when it tries to "compress" the short dir and passes a negative
dirent array length to qsort. Therefore, don't allow directory
"compression" in this situation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of efsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could to allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a directory's contents are stored entirely inside the inode,
there's no index to rebuild and no dirblock checksum to recompute.
As far as I know these are the only two reasons to call dir rehash.
Therefore, we can move on to the next dir instead of what we do right
now, which is try to iterate the dir blocks (which of course fails due
to the inline_data iflag being set) and then flood stdout with useless
messages that aren't even failures.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In e2fsck_expand_directory() we don't handle a dir with inline data
because when this function is called the directory inode shouldn't
contains inline data.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When the rehash process is running on a bigalloc filesystem, it
compresses all the directory entries and hash structures into the
beginning of the directory file and then uses block_iterate3() to free
the blocks off the end of the file. It seems to call
ext2fs_block_alloc_stats2() for every block in a cluster, which is
unfortunate because this function allocates and frees entire clusters
(and updates the summary counts accordingly). In this case e2fsck
writes out incorrect summary counts.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to store some error codes using an int to keep recovery.c as
close as possible to the recovery.c source file in the kernel.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix all the places where we should be using a blk64_t instead of a
blk_t. These fixes are more severe because 64bit values could be
truncated silently.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Accessing name_len (and file_type) in ext4_dir_entry structure is
somewhat problematic because on big endian architecture we need to now
whether we are really dealing with ext4_dir_entry (which has u16
name_len which needs byte swapping) or ext4_dir_entry_2 (which has u8
name_len which must not be byte swapped).
Currently the code is somewhat surprising and name_len is always
treated as u16 and byte swapped (flag EXT2_DIRBLOCK_V2_STRUCT isn't
ever used) and then masking of name_len is used to access real
name_len or file_type. Doing things this way in applications using
libext2fs is unexpected to say the least (more natural is to type
struct ext4_dir_entry * to struct ext4_dir_entry_2 * but that gives
wrong results on big endian architectures. So provide helper functions
that give endian-safe access to these fields. Also convert users in
e2fsprogs to use these functions.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perhaps the most serious fix up is a type-punning warning which could
result in miscompilation with overly enthusiastic compilers.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 07307114de didn't correctly handle the lost+found directory
when it added support for metadata checksums. First of all,
e2fsck_get_lost_and_found() assumed that the inode_dir_map bitmap was
initialized, and it wasn't when it was called earlier by a change in
that commit. Secondly, it's important that lost+found dirctory is
processed in case its directory checksums are incorrect, but should
preserve any empty dirctory blocks so there space available for e2fsck
to reconnect any orphan inodes.
Fix these problems, to fix test failures: f_holedir2 and f_rehash_dir
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Introduce small structures for recording directory tree checksums, and
some API changes to support writing out directory blocks with
checksums.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Check htree internal node checksums. If broken, ask user to clear
the htree index and recreate it later.
[ Move the check for not rehashing the lost+found directory to pass1
so that we don't end up truncating lost+found when the metadata
checksum feature is enabled. -- TYT ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When checking to see whether or not a new name is unique, the code was
using the wrong length parameter, which could cause the anti-collision
loop for a long time trying to find what it thinks is a unique name.
Addresses-Sourceforge-Bug: #3540545
Reported-by: Vitaly Oratovsky <vmo@users.sourceforge.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix several types of compiler warnings (unused variables/labels),
uninitialized variables, etc that are hit with gcc -Wall.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
E2fsprogs 1.41.10 introduced a regression (in commit b71e018) where
e2fsck -fD can corrupt non-indexed directories when are exists one or
more file names which alphabetically sort before ".". This can happen
with ext2 filesystems or for small directories (take less than a
block) which contain filenames that begin with a space or some other
punctuation mark.
Fix this by making sure we never reorder the '.' or '..' entry in the
directory, since they must be first.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Duplicate directory entries were not necessarily getting found and
fixed for non-indexed directories, since we were sorting these
directories by inode number, and the duplicate entry code assumed the
entries were getting sorted by name or directory name hash.
Addresses-Sourceforge-Bug: #2862551
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previously e2fsprogs interpreted 0 for a rec_len of 65536 (which could
occur if the directory block is completely empty in 64k blocksize
filesystems), while the kernel interpreted 65535 to mean 65536. The
kernel will accept both to mean 65536, and encodes 65535 to be 65536.
This commit changes e2fsprogs to match.
We add the encoding agreed upon for 128k and 256k filesystems, but we
don't enable support for these larger block sizes, since they haven't
been fully tested.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
in the case of ! defined RESOURCE_TRACK, so that we can clean up #ifdef
throughout e2fsck source.
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the directory is packed with no slack space, as soon as any new
directory entries are added, leaf nodes end up getting split and
directory ends up getting very inefficient.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The rec_len field in the directory entry is 16 bits, so if the
filesystem is completely empty, rec_len of 0 is used to designate
65536, for the case where the directory entry takes the entire 64k
block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>