This bug was introduced by commit 7dfefaf413 ("tune2fs: update
journal super block when changing UUID for fs").
Fixes-Coverity-Bug: 1229243
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using the -U option you can change the UUID of the fs; however, this does
not work for a journal device, since it has a copy of this UUID inside the
jsb (i.e. journal super block). So copy the UUID into that block on change.
Here is the initial thread:
http://comments.gmane.org/gmane.comp.file-systems.ext4/44532
You can reproduce this by executing the following commands:
$ fallocate -l100M /tmp/dev
$ fallocate -l100M /tmp/journal
$ sudo /sbin/losetup /dev/loop1 /tmp/dev
$ sudo /sbin/losetup /dev/loop0 /tmp/journal
$ mke2fs -O journal_dev /tmp/journal
$ tune2fs -U da1f2ed0-60f6-aaaa-92fd-738701418523 /tmp/journal
$ sudo mke2fs -t ext4 -J device=/dev/loop0 /dev/loop1
$ dumpe2fs -h /tmp/dev | fgrep UUID
dumpe2fs 1.43-WIP (18-May-2014)
Filesystem UUID: 8a776be9-12eb-411f-8e88-b873575ecfb6
Journal UUID: e3d02151-e776-4865-af25-aecb7291e8e5
$ sudo e2fsck /dev/vdc
e2fsck 1.43-WIP (18-May-2014)
External journal does not support this filesystem
/dev/loop1: ********** WARNING: Filesystem still has errors **********
Reported-by: Chin Tzung Cheng <chintzung@gmail.com>
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use EXT2_MIN_BLOCK_SIZE, JFS_MIN_JOURNAL_BLOCKS, SUPERBLOCK_SIZE, and
SUPERBLOCK_OFFSET instead of hardcoded 1024 when it is okay, and also
add a helper ext2fs_journal_sb_start() that will return start of
journal sb with special case for fs with 1k block size.
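A minimal sketch of what such a helper might look like. The names
below are stand-ins, and the returned block numbers (2 for 1k blocks,
1 otherwise) are an assumption based on the ext2 superblock living at
byte offset 1024 of the journal device, not a copy of the patch:

```c
#include <stdint.h>

/* Hypothetical stand-in for EXT2_MIN_BLOCK_SIZE. */
#define MIN_BLOCK_SIZE 1024

/*
 * Assumed layout: with 1k blocks the ext2 superblock fills block 1, so
 * the journal superblock follows in block 2.  With larger blocks the
 * ext2 superblock fits inside block 0, so the journal superblock is
 * block 1.
 */
static uint64_t journal_sb_start(unsigned int blocksize)
{
	if (blocksize == MIN_BLOCK_SIZE)
		return 2;
	return 1;
}
```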
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This should have been part of commit 9a1d614df2 ("e2fsck: fix
rule-violating lblk->pblk mappings on bigalloc filesystems") but it
accidentally got dropped when the patch was applied.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When creating a file system using a source directory, also copy any extended
attributes that have been set.
[ Add configure tests for Linux-specific xattr syscalls and add fallback
when compiling on non-Linux systems. --tytso ]
Signed-off-by: Ross Burton <ross.burton@intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ioctl(FIGETBSZ) was used to get block size earlier but 2508eaa7
(filefrag: improvements to filefrag FIEMAP handling) moved to fstatfs
f_bsize which doesn't work well for many file systems.
Block size returned using fstatfs isn't block size but "optimal
transfer block size" as per man page. Even stat st_blksize is
"preferred I/O block size" and in many file systems it may even vary
from file to file (POSIX). This patch changes filefrag to use
FIGETBSZ preferentially over f_bsize.
[ Modified by tytso to add the fallback to f_bsize if FIGETBSZ fails
for some reason ]
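The preference order described above can be sketched roughly as
follows. query_block_size() is a hypothetical name, and this is an
illustration of the fallback idea rather than the code in filefrag:

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/vfs.h>
#include <linux/fs.h>	/* FIGETBSZ */

/*
 * Hypothetical helper: ask the kernel for the file system block size
 * with FIGETBSZ first, and only fall back to the fstatfs() f_bsize
 * value if the ioctl fails.  Returns -1 if both calls fail.
 */
static long query_block_size(int fd)
{
	int blksize = 0;
	struct statfs fsinfo;

	if (ioctl(fd, FIGETBSZ, &blksize) == 0 && blksize > 0)
		return blksize;
	if (fstatfs(fd, &fsinfo) == 0 && fsinfo.f_bsize > 0)
		return (long) fsinfo.f_bsize;
	return -1;
}
```

A caller would open the file, pass the descriptor in, and treat a
negative return as "block size unknown".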
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
29758d2 broke the -B option, which is useful for filesystems not
supporting FIEMAP. Also, fix the extents calculation for -B, which has
been broken since 2508eaa7.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If an extent fails checksum and the sanity checks, and the user elects
to fix the extents, don't bother asking (the second time) if the user
would like to fix the checksum. Refactor some redundant code to make
what's going on a little cleaner.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix the routine that adds dirent checksum structures to the directory
block to handle oddball situations a bit more robustly.
First, when we're walking the entry array, we might encounter an
entry that ends exactly one byte before where the checksum entry needs
to start, i.e. there's space for the tail entry, but it needs to be
reinitialized. When that happens, we should proceed until d points to
that space so that the tail entry can be initialized.
Second, it's possible that we've been fed a directory block where the
entries end just short of the end of the block. In this case, we need
to adjust the size of the last entry to point exactly to where the
dirent tail starts. The current code requires that entries end
exactly on the block boundary, but this is not always the case with
damaged filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're salvaging a directory, leave room at the end of the block
for the checksum entry so that e2fsck can write the checksummed dir
block out later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the badblocks inode fails checksum verification, just clear the
inode and move on. If we don't do this, we can end up importing a lot
of garbage into the badblocks list, which will then cause fsck to try
to regenerate anything that was sitting atop the supposedly damaged
blocks. Given that most hardware will remap bad sectors transparently
from ext4, the number of people this could affect adversely is pretty
low.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we trash the root directory block, e2fsck will find inode 11 (the
old lost+found) and try to attach it to l+f. The lost+found checker
also fails to find l+f and tries to add one to the root dir. The root
dir is not found but is recreated with incorrect checksums, so linking
in the l+f dir fails and the l+f '..' entry isn't set. Since both
dirs now fail checksum verification, they're both referred to rehash
to have that fixed, but because l+f doesn't have a '..' entry, rehash
crashes because l+f has < 2 entries.
On a checksumming filesystem, the routines in e2fsck that recreate
/lost+found and / must write the new directory block *after* the inode
has been written to disk because the checksum depends on i_generation.
Add a regression test while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck is writing a block of directory entries to disk, it should
adjust the dirents to add the dirent tail if one is missing. It's not
a big deal if there's no space to do this since rehash (pass 3A) will
reconstruct directories for us. However, we may as well avoid
unnecessary work.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Make the "EA block passes checks but fails checksum" message less
strange, and make the other checksum error messages actually print a
period at the end of the sentence.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're forced to delete a crosslinked file, only call
ext2fs_block_alloc_stats2() on cluster boundaries, since the block
bitmaps are all cluster bitmaps at this point. It's safe to do this
only once per cluster since we know all the blocks are going away.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As far as I can tell, logical block mappings on a bigalloc filesystem are
supposed to follow a few constraints:
* The logical cluster offset must match the physical cluster offset.
* A logical cluster may not map to multiple physical clusters.
Since the multiply-claimed block recovery code can be used to fix these
problems, teach e2fsck to find these transgressions and fix them.
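As a rough illustration of the two constraints, the checks could be
written as below. The function names are hypothetical, and the sketch
assumes the cluster ratio (blocks per cluster) is a power of two; it is
not the code e2fsck uses:

```c
#include <stdint.h>

/*
 * Rule 1: the offset of the block inside its cluster must be the same
 * on the logical and the physical side.
 */
static int cluster_offsets_match(uint64_t lblk, uint64_t pblk,
				 unsigned int cluster_ratio)
{
	uint64_t mask = (uint64_t) cluster_ratio - 1;

	return (lblk & mask) == (pblk & mask);
}

/*
 * Rule 2: two mappings that fall into the same logical cluster must
 * also fall into the same physical cluster.
 */
static int same_cluster_mapping(uint64_t lblk_a, uint64_t pblk_a,
				uint64_t lblk_b, uint64_t pblk_b,
				unsigned int cluster_ratio)
{
	if (lblk_a / cluster_ratio != lblk_b / cluster_ratio)
		return 1;	/* different logical clusters: no conflict */
	return pblk_a / cluster_ratio == pblk_b / cluster_ratio;
}
```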
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're filling a directory hole, we need to perform an implied
cluster allocation to satisfy the bigalloc rule of mapping only one
pblk to a logical cluster.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the original patch (against -next), the hunk to fix uninit dirs was
just prior to the hunk labelled "Corrupt but passes checks?". The
hunks are ordered this way so that if e2fsck obtains permission to fix
a failed-csum extent (which in turn fixes the checksum), it will not
subsequently ask to (re)fix the checksum.
Due to a merge error the hunk moved to the wrong place, so put it
back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.
If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found. If either
of those directories is missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.
(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)
[ Fixed up an obvious C trap: const char * and const char [] are not
the same thing when you are taking the size of the parameter.
People, run your regression tests! Like spinach, it's good for you. :-)
-- tytso ]
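For readers who have not hit this trap, a small illustration. The
variable names are made up for the example and are not taken from the
patch:

```c
#include <stddef.h>

/* An array: sizeof yields the string length plus the trailing NUL. */
static const char name_array[] = "lost+found";

/* A pointer: sizeof yields the size of the pointer itself. */
static const char *name_ptr = "lost+found";

static size_t array_size(void)
{
	return sizeof(name_array);	/* 10 characters + NUL */
}

static size_t pointer_size(void)
{
	return sizeof(name_ptr);	/* same as sizeof(char *) */
}
```

The same applies to a function parameter declared with [] since it is
adjusted to a pointer; strlen() is the safe choice there.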
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide an API to set i_size in an inode and take care of all required
feature flag modifications. Refactor the code to use this new
function.
[ Moved the function to lib/ext2fs/blk_num.c, which is where the rest
of these sorts of functions live, and renamed it to be
ext2fs_inode_size_set() instead of ext2fs_inode_set_size() to be
consistent with the other functions in blk_num.c -- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We rely on a nasty hack to adjust the free block count where we pass a
signed value into ext2fs_free_blocks_count_add(), which takes a
64-bit unsigned value, and rely on overflow and C's signed->unsigned
semantics to do the subtraction. This works, so long as a 64-bit
signed value is used.
Unfortunately, in ext2fs_block_alloc_stats2() and
ext2fs_block_alloc_stats_range(), this is not true, so on a 64-bit
file system, the free blocks accounting can get screwed up.
A simple way to demonstrate the problem is:
mke2fs -F -t ext4 -O 64bit /tmp/foo.img 1M
e2fsck -fy /tmp/foo.img
... which will result in the following e2fsck complaint:
Pass 5: Checking group summary information
Free blocks count wrong (4294968278, counted=982).
Fix? yes
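The arithmetic behind that bogus count can be reproduced in isolation.
The sketch below is a model, not the library code: the function names
are hypothetical, the starting count of 1000 and delta of -18 used in
the note afterwards are invented, and the assumption is that the
negative delta passes through a 32-bit unsigned type before reaching
the 64-bit add:

```c
#include <stdint.h>

/* Stand-in for an accumulator that takes an unsigned 64-bit delta. */
static uint64_t count_add(uint64_t count, uint64_t delta)
{
	return count + delta;	/* unsigned arithmetic wraps modulo 2^64 */
}

/* Works: -n converts to 2^64 - n, so the wrapping add subtracts n. */
static uint64_t adjust_with_int64(uint64_t count, int64_t delta)
{
	return count_add(count, (uint64_t) delta);
}

/* Breaks: -n narrowed to 32 bits becomes 2^32 - n before widening. */
static uint64_t adjust_with_uint32(uint64_t count, int delta)
{
	uint32_t narrowed = (uint32_t) delta;

	return count_add(count, narrowed);
}
```

With those invented inputs, adjust_with_int64(1000, -18) gives 982,
while adjust_with_uint32(1000, -18) gives 4294968278, i.e. the right
answer plus 2^32.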
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There are a number of places where we need to convert groups to blocks
or clusters by multiplying the groups by blocks/clusters per group.
Unfortunately, both quantities are 32-bit, but the result needs to be
64-bit, and very often the cast to 64-bit gets lost.
Fix this by adding new macros, EXT2_GROUPS_TO_BLOCKS() and
EXT2_GROUPS_TO_CLUSTERS().
This should fix a bug where resizing a 64bit file system can result in
calculate_minimum_resize_size() looping forever.
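To show why the cast has to happen before the multiply, here is a
sketch with a hypothetical GROUPS_TO_BLOCKS() macro. It takes the
per-group count directly instead of a superblock pointer, so it only
approximates the shape of the new macros:

```c
#include <stdint.h>

/* Hypothetical stand-in: widen one operand, then multiply in 64 bits. */
#define GROUPS_TO_BLOCKS(blocks_per_group, groups) \
	((uint64_t) (blocks_per_group) * (groups))

static uint64_t blocks_widened(uint32_t blocks_per_group, uint32_t groups)
{
	return GROUPS_TO_BLOCKS(blocks_per_group, groups);
}

static uint64_t blocks_truncated(uint32_t blocks_per_group, uint32_t groups)
{
	/* 32-bit multiply wraps first; widening the result is too late. */
	return blocks_per_group * groups;
}
```

For 32768 blocks per group and 200000 groups the widened version
yields 6553600000, while the truncated one yields 2258632704.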
Addresses-Launchpad-Bug: #1321958
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using C99 initializers makes the code a bit more readable, and it
avoids some gcc -Wall warnings regarding missing initializers.
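A small example of the style, using a made-up table rather than any
structure from the tree:

```c
/* Hypothetical table entry, for illustration only. */
struct policy_desc {
	const char *name;
	int value;
	int is_default;
};

/*
 * Designated initializers name each field, so the order is free and
 * any field left out is zero-initialized.
 */
static const struct policy_desc policies[] = {
	{ .name = "first",  .value = 1, .is_default = 1 },
	{ .name = "second", .value = 2 },
	{ .value = 3, .name = "third" },
};
```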
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When resizing an empty 21T file system to 28T, resize2fs was using
this much CPU time and memory:
216.98user 19.77system 4:02.92elapsed 97%CPU (0avgtext+0avgdata 4485664maxresident)k
8inputs+1068680outputs (0major+800745minor)pagefaults 0swaps
After this one-line change:
222.29user 0.49system 3:48.79elapsed 97%CPU (0avgtext+0avgdata 30080maxresident)k
8inputs+1068552outputs (0major+2497minor)pagefaults 0swaps
So this reduces the max memory utilized from 4.2GB to 29MB!
For future work, the two places where we spend the most CPU time
(from resize2fs -d 16) are:
blocks_to_move: Memory used: 2508k/25096k (1903k/606k), time: 91.42/91.53/ 0.00
and
calculate_summary_stats: Memory used: 2508k/25612k (1908k/601k), time: 95.33/95.45/ 0.00
The calculate_summary_stats pass can be sped up by using
ext2fs_find_first_{zero,set}_block_bitmap2(), instead of iterating
over the entire block bitmap one bit at a time.
The blocks_to_move pass can be sped up by using a bitmap to store the
location of fs metadata blocks, to avoid an O(N**2) algorithm where N
is the number of groups in the file system.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The bits between end and real_end are set as a safety measure for the
kernel when it uses the bit scan instructions. We need to take this
into account when shrinking or growing the block allocation bitmap,
before we can safely use rbtree bitmaps in resize2fs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a few warnings about unused and uninitialized variables.
Also fix util/subst.c to include <sys/time.h> to avoid using
undeclared functions gettimeofday() and futimes().
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Creates a program that fuzzes only the metadata blocks (or optionally
all in-use blocks) of an ext* filesystem. There's also a script to
automate fuzz testing of the kernel and e2fsck in a loop.
[ Modified by tytso to add e2fuzz to the clean makefile rule ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Port tune2fs' -e flag to mke2fs so that we can set error behavior at
format time, and introduce the equivalent errors= setting into
mke2fs.conf.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Directories can't have uninitialized extents, so offer to clear the
uninit flag when we find this situation. The actual directory blocks
will be checked in pass 2 and 3 regardless of the uninit flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, directories cannot be fallocated, which means that the only
way they get bigger is for the kernel to append blocks one by one.
Therefore, if we encounter a logical block offset that is too big, we
needn't bother adding it to the dblist for pass2 processing, because
it's unlikely to contain a valid directory block. The code that
handles extent based directories also does not add toobig blocks to
the dblist.
Note that we can easily cause e2fsck to fail with ENOMEM if we start
feeding it really large logical block offsets, as the dblist
implementation will try to realloc() an array big enough to hold it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Always iterate logical block 0 in a directory, even if no physical
block has been allocated. Pass 2 will notice the lack of mapping and
offer to allocate a new directory block; this enables us to link the
directory into lost+found.
Previously, if there were no logical blocks mapped, we would fail to
pick up even block 0 of the directory for processing in pass 2. This
meant that e2fsck never allocated a block 0 and therefore wouldn't fix
the missing . and .. entries for the directory; subsequent e2fsck runs
would complain about (yet never fix) the problem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we notice a hole in the block map of an extent-based directory,
offer to collapse the hole by decreasing the logical block # of the
extent. This saves us from pass 3's inefficient strategy, which fills
the holes by mapping in a lot of empty directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a user crafts a carefully constructed filesystem containing a
single directory entry block with an invalid checksum and fewer than
two entries, and then runs e2fsck to fix the filesystem, fsck will
crash when it tries to "compress" the short dir and passes a negative
dirent array length to qsort. Therefore, don't allow directory
"compression" in this situation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The third argument to strncat is the maximum number of characters to
copy out of the second argument; it is not the maximum length of the
first argument.
Therefore, code in a check just in case we ever find a /sys/block/X
path long enough to hit the end of the buffer. FWIW the longest path
I could find on my machine was 133 bytes.
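One way to write such a check, as a sketch. safe_append() is a
hypothetical helper and not the function changed by this patch:

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst, where dst_size is
 * the total size of the dst buffer.  Returns 0 on success, or -1
 * (leaving dst untouched) if the result would not fit.
 */
static int safe_append(char *dst, size_t dst_size, const char *src)
{
	size_t used = strlen(dst);
	size_t need = strlen(src);

	if (used + need + 1 > dst_size)
		return -1;
	/* The limit is the space left in dst, not the size of dst. */
	strncat(dst, src, dst_size - used - 1);
	return 0;
}
```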
Fixes-Coverity-Bug: 1252003
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range. Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.
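The off-by-one can be seen in a simplified model where blocks are
numbered 0 to blocks_count - 1 and num is at least 1. The function
names are hypothetical and the real check lives inside the loop:

```c
#include <stdint.h>

/* Correct: the last block of the range must be below the block count. */
static int range_fits(uint64_t b, uint64_t num, uint64_t blocks_count)
{
	uint64_t last = b + num - 1;	/* number of the last block */

	return last < blocks_count;
}

/* Off by one: compares a block number against a count with <=. */
static int range_fits_off_by_one(uint64_t b, uint64_t num,
				 uint64_t blocks_count)
{
	return b + num - 1 <= blocks_count;
}
```

With 100 blocks, a 10-block range starting at 91 ends at block 100,
which does not exist; only the first version rejects it.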
Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks. This is needed to terminate the
loop because the broken test meant that b could get large enough to
equal finish, which would end the while loop.
The attached testcase shows that with the off by one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count. Fix the bounds
check to reflect that.
This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of e2fsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we call ext2fs_close_free at the end of main(), we need to supply
the address of ctx->fs, because the subsequent e2fsck_free_context
call will try to access ctx->fs (which is now set to a freed block) to
see if it should free the directory block list. This is clearly not
desirable, so fix the problem.
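The pattern at work here is "free through a pointer to the pointer, so
the owner's copy can be cleared". A sketch with hypothetical types and
names, not the library's definitions:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the filesystem handle and fsck context. */
struct filsys {
	int unused;
};

struct context {
	struct filsys *fs;
};

/* Frees the handle and clears the caller's pointer to it. */
static void filsys_close_free(struct filsys **fs_ptr)
{
	free(*fs_ptr);
	*fs_ptr = NULL;
}

/* Later cleanup code can now test the pointer safely. */
static int context_has_fs(const struct context *ctx)
{
	return ctx->fs != NULL;
}
```

Passing a copy of the pointer instead would leave ctx->fs dangling, and
any later check of it would read freed memory.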
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>