Project quota related fields are reserved in Linux kernel.
As a preparation for it, this patch cleans up quota codes
of e2fsprogs so as to make it easier to add new quota type(s).
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When an extent-mapped directory is compacted by "e2fsck -fD" and
frees enough leaf blocks that it loses an extent tree index block,
the old e2fsck_rehash_dir->ext2fs_block_iterate3->write_dir_block()
code would not free the extent block, which would result in the
extent tree becoming corrupted when it is written out.
Pass 1: Checking inodes, blocks, and sizes
Inode 17825800, end of extent exceeds allowed value
(logical block 710, physical block 570459684, len 1019)
This results in loss of a whole index block of directory leaf blocks
and maybe thousands or millions of files in lost+found.
Fix e2fsck_rehash_dir() to call ext2fs_punch() to free the blocks
at the end of the directory instead of trying to handle this itself
while writing out the directory. That properly handles all of the
cases of updating the extent tree as well as accounting for blocks
that are released (both leaf blocks and index blocks).
Add a test case for compacting the directory to be smaller than the
index block that originally caused the corruption.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The kernel requires all inodes with the extent flag set to have a
valid extent tree header in i_block. The ext2fs_extent_open2 prefers
to initialize the header if i_block is zeroed, but e2fsck never writes
the new header to disk. Since the kernel won't create inodes with the
flag and no header anyway, zap such files.
Reported-by: Bo Branten <bosse@acc.umu.se>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In check_inode_extra_space(), if we attempt to read an EA header at
the end of the extra space, in a corrupted filesystem it may result in
a read beyond the bounds of the inode. Add a check to prevent this.
Reproduced by running ./test_one --valgrind f_write_ea_toobig_extra_isize.
Signed-off-by: Artemiy Volkov <artemiyv@acm.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
There is a bug in how e2fsck handles being interrupted by CTRL-C.
If CTRL-C is pressed to kill e2fsck rather than e.g. kill -9, then
the interrupt handler sets E2F_FLAG_CANCEL in the context but doesn't
actually kill the process. Instead, e2fsck_pass1() checks this flag
before processing the next inode.
If a filesystem is running in fix mode (e2fsck -fy) is interrupted,
and the quota feature is enabled, then the quota file will still be
written to disk even though the inode scan was not complete and the
quota information is totally inaccurate. Even worse, if the Pass 1
inode and block scan was not finished, then the in-memory block
bitmaps (which are used for block allocation during e2fsck) are also
invalid, so any blocks allocated to the quota files may corrupt other
files if those blocks were actually used.
e2fsck 1.42.13.wc3 (28-Aug-2015)
Pass 1: Checking inodes, blocks, and sizes
^C[QUOTA WARNING] Usage inconsistent for ID 0:
actual (6455296, 168) != expected (8568832, 231)
[QUOTA WARNING] Usage inconsistent for ID 695:
actual (614932320256, 63981) != expected (2102405386240, 176432)
Update quota info for quota type 0? yes
[QUOTA WARNING] Usage inconsistent for ID 0:
actual (6455296, 168) != expected (8568832, 231)
[QUOTA WARNING] Usage inconsistent for ID 538:
actual (614932320256, 63981) != expected (2102405386240, 176432)
Update quota info for quota type 1? yes
myth-OST0001: e2fsck canceled.
myth-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
There may be a desire to flush out modified inodes and such that have
been repaired, so that restarting an interrupted e2fsck will make
progress, but the quota file update is plain wrong unless at least
pass1 has finished, and the journal recreation is also dangerous if
the block bitmaps have not been fully updated.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We weren't verifying the checksum of an htree leaf block due to a
coding error that marked all htree blocks as not having checksums.
While we're at it, fix the error message that gets displayed so that
it doesn't print a meaningless block offset.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there are directory entries with file names which are less than 16
bytes, it turns out that passing less than the crypto block size to
the kernel's crypto layer will cause the kernel to crash.
However, since there never should be encrypted directory entries where
the file name is less than 16 bytes (the AES block size), change
e2fsck to offer to address this corruption by deleting the directory
entry.
(We need to checks for this condition into the kernel as well.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The /lost+found directory must not be encrypted, since e2fsck won't
have any keys. If we find an encrypted lost+found directory, we
should delete the directory and recreate it.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The quota code required that we included dict.o in libsupport.a, so we
might as well just move dict.c and dict.h to lib/support, and then
have e2fsck use the version of dict.c in libsupport.a. This
simplifies the build system and eliminates having two identical copies
of dict.o floating around in the build tree.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The check_plausibility() function is now used all over the place, so
we should move the plausible.c file to lib/support and remove the
special case handling for that file that had been in the build system.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The profile functions started as something specific to e2fsck. It's
now used by mke2fs and e2fsck, so it's better to move it into
libsupport.a.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We will be using libsupport.a for e2fsprogs's internal support
functions. It will contain the quota support functions, but we will
also be moving code such as profile.c and plausible.c to libsupport.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The presence of --disable-htree is very much a legacy thing. Remove
it since supporting the lack of htree support is pretty silly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There were some generated files that weren't getting removed by the
clean and distclean targets; fix this.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When a file system with journal checksums is mounted, the journal
checksum is only updated when the journal superblock is actually
written to disk. But when a root file system is mounted read-only,
e2fsck will get the in-memory version of the journal superblock, and
the checksum is not necessarily going to be correct. Since we only
allow the root file system to be checked while it is mounted
read-only, and we won't be trying to replay the journal anyway. So we
can skip the checking the journal superblock fields for mounted file
systems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a new get_alloc_blocks hook and a block_alloc_stats_range hook so
that e2fsck can capture allocation requests spanning more than a
block to its block_found_map.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Having multiple versions of jfs_user.h was confusing the Android
build. Clean up things by removing the lib/ext2fs/jfs_user.h and
misc/jfs_user.h and simplifying how we emulate the kernel
infrastructure needed by journal replay code and removing the
kernel-specific lines from kernel-jbd.h.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck/dirinfo.c and misc/e4crypt.c use functions from libext2fs, so
we need to include its header file or clang will complain.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As of v4.0, the Linux kernel won't add blocks to a block-mapped file
on a bigalloc filesystem. Therefore, convert any such files or
directories we find, to prevent fs errors later on.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Check the value of r_count to ensure that we never try to read revoke
records past the end of the revoke block. It turns out that the
journal writing code in debugfs was also playing fast and loose with
the r_count, so fix that as well.
The Coverity bug was 1297508.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix Coverity bugs 1297094-1297101 by fixing all the mutations in the
*_setup_tdb() functions, fixing buffer overflows, and checking
return values.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The directory hash is now calculated using the on-disk encrypted
filename, and we no longer use the digest encoding or the SHA-256
encoding, so remove them from the ext2fs library until there is some
reason we need them.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Teach e2fsck to (re)construct extent trees. This enables us to do
either of the following: compress a highly sparse extent tree into
fewer ETB blocks; or convert a ext3-style block mapped file to an
extent file. The reconstruction is performed during pass 1E or 3A,
as detailed below.
For files that are already extent based, this algorithm will
automatically run (pending user approval) if pass1 determines either
(1) that a whole level of extent tree will fit into a higher level of
the tree; (2) that the size of any level can be reduced by at least
one ETB block; or (3) the extent tree is unnecessarily deep. It will
not run at all if errors are found and the user declines to fix the
errors.
The option "-E bmap2extent" can be used to force e2fsck to convert all
block map files to extent trees, and to rebuild all extent files'
extent trees. After conversion, files larger than 12 blocks should be
defragmented to eliminate empty holes where a block lives.
The extent tree constructor is pretty dumb -- it creates a list of
leaf extents (adjacent extents are collapsed), marks all indirect
blocks / ETB blocks free, installs a new extent tree root in the
inode, then loads the leaf extents into the tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before it is
needed. We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we sit
and wait until the last inode table block read of the last group to
start fetching the next bunch.
pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1. We use the
"iterate a subset of a dblist" and avoid copying the dblist. Directory
blocks are fetched incrementally as we walk through the directory
block list. In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it. Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.
pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.
In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs. Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost. (Single-issue USB mass storage disks seem to suffer badly.)
By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS). The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs. There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).
These new e2fsck routines require the addition of a dblist API to
allow us to iterate a subset of a dblist. This will enable
incremental directory block readahead in e2fsck pass 2.
There's also a function to estimate the readahead given a FS.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When there's a problem accessing the EA part of an inline data symlink
and we want to truncate the symlink back to 60 characters (hoping the
user can re-establish the link later on, apparently) be sure to turn
off the inline data flag to convert the symlink back to a regular fast
symlink.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The compression patches were an out-of-kernel patch set that was (a)
only available for ext2, (b) something that was never could be
stablized due to file system corruption, and (c) the most recent
patches were for 3.1, last updated in 2011.
The history of the compression patches has been a bit checkered.
There is a long history here at http://e2compr.sourceforge.net which
lists the perspective of the people working on it from the e2compr
side.
From the ext2/3/4 mainline developers' perspective, initial
compression support was added to e2fsprogs in 2000 (in the Linux 2.2
era), but due to stability concerns the kernel patches were never
merged into the mainline kernel. While there were some sporadic
efforts to try to get the ext2 compression patches working in the 2.4
and 2.6 era, by that time mainline work had moved on to ext4, and the
e2compr approach could only work with 32-bit block numbers and
indirect mapped files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add missing new lib/ext2fs source files that were added for encryption
support. Also move configuration #define's from individual Android.mk
to the android_config.h file, since we've moved away from specifying
configuration #define's on the command-line upstream.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fix_problem() returning 1 means to fix the fs error, so do that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If ext2fs_new_block2() is called without a specific block map, we
should call the alloc_block hook before checking fs->block_map. This
helps us to avoid a bug in e2fsck where we need to allocate a block
but instead of consulting block_found_map, we use the FS bitmaps,
which (prior to pass 5) could be wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This allows us to print a message warning the user that there is
something funny going on with their hardware clock (probably time zone
issues caused by trying to be compatible with legacy OS's such as
Windows), without triggering a full file system check.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use memcmp() instead of strncmp() since encrypted directory names can
contain NUL characters. For non-encrypted directories, we've already
checked for the case of NUL characters in file names, so it's safe to
use memcmp() here in all cases.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The Android.mk files were taken from the Android AOSP sources, and
updated for the 1.43 next branch. The intention is that this will
allow the repository which is currently located in external/e2fsprogs
with one which is based off of the upstream e2fsprogs. Right now
external/e2fsprogs was not created using "git clone", so it means that
git merges don't work. After the external/e2fsprogs Android
repository is replaced, with one based off the upstream repository,
Android will be able to synchronize with the upstream repository by
pulling and merging from upstream, and then running the script
"./util/gen-android-files" to update any generated files. (This is
necessary because in the Android build system, the Android.mk files
are rather stylized and don't make it easy to run arbitrary shell
scripts during the build phase.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The C preprocessing symbol NDEBUG is also defined (differently) by
Android's build files, and this was causing compilation failures. So
change assert() to dict_assert() and manually define it instead of
relying on the NDEBUG and <assert.h> semantics.
Also make sure the necessary debugging functions are available is
DICT_NODEBUG is not defined, so that dict.c will correctly build with
and without DICT_NODEBUG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Provide a mechanism for a user to switch fsck into '-y' mode if they
start an interactive session and then get tired of pressing 'y' in
response to numerous prompts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the directory processing code ends up pointing to a directory entry
that's so close to the end of the block that there's not even space
for a rec_len/name_len, just substitute dummy values that will force
e2fsck to extend the previous entry to cover the remaining space. We
can't use the helper methods to extract rec_len because that's reading
off the end of the buffer.
This isn't an issue with non-inline directories because the directory
check buffer is zero-extended so that fsck won't blow up.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>