If e2fsck encounters a read error on a block past the end of the
filesystem, don't bother trying to "rewrite" the block. We might
still want to re-try the read to capture FS data marooned past the end
of the filesystem, but in that case e2fsck ought to move the block
back inside the filesystem.
This enables e2fuzz to detect writes past the end of the FS due to
software bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we decide to clear a special inode because of bad mappings, we need
to zero the i_block array. The clearing routine depends on setting
i_links_count to zero to keep us from re-checking the block maps,
but that field isn't checked for special inodes. Therefore, if we
haven't erased the mappings, check_blocks will restart fsck and fsck
will try to check the blocks again, leading to an infinite loop.
(This seems easy to trigger if the bootloader inode extent map is
corrupted.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the user tries to enable or disable the 64bit feature via tune2fs,
tell them how to use resize2fs to effect the conversion.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Earlier, I tried to make tune2fs abort if the user tried to enable or
disable metadata_csum on a mounted FS, but forgot the exit() call.
Supply it now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're turning on metadata checksumming /and/ resizing the inode
at the same time, disable checksum verification during the
resize_inode() call because the subroutines it calls will try to
verify the checksums (which have not yet been set), causing the
operation to fail unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The behavior of the r_fixup_lastbg_big test varies depending on
whether or not ext4.ko is loaded and supports lazy_itable_init. This
makes checking the bg flags after resize2fs hard to predict, so put in
a way to force resize2fs to zero the inode tables, and compare the
output based on lazy_itable_init == 0.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When undoing an expansion of an mmap'd database while cancelling a
transaction, the tdb code prematurely decreases the variable that
tracks the file size, which leads to a region leak during the
subsequent unmap. Fix this by maintaining a separate counter for the
region size.
(This is probably unnecessary since e2undo was the only user of tdb
transactions, but I suppose we could be proactive.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Strengthen the i_extra_isize checks to look for obviously too-small
values before trying to operate on inode EAs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use qsort to move the inlinedata attribute to the front of the list
and the empty entries to the end. Then we can use handle->count to
decide if we're done writing xattrs, which helps us to avoid the
situation where we're midway through the attribute list, so we
allocate an EA block to store more, but have no idea that there's
actually nothing left in the list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_extra_isize is zero when we try to write extended attributes,
we'll end up writing the EA magic into the i_extra_isize field, which
causes a subsequent crash on big endian systems (when we try to write
0xEA02 bytes past the inode!). Therefore when the field is zero, set
i_extra_isize to the desired extra_isize size, zero those bytes, and
write the EAs after the end of the extended inode.
v2: Don't bother if we have 128b inodes, and ensure that the value
is 32b-aligned so that the EA magic starts on a 32b boundary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix an incorrect check in ea_set that would crash debugfs if someone
runs 'ea_set / foo.bar' (i.e. with no value argument)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Document the new journal and xattr commands in the debugfs manpage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
'chmod -w' is not portable and can break the build:
| chmod: chmod: ss_err.h: new permissions are r--rw-r--, not r--r--r--
| ss_err.h: new permissions are r--rw-r--, not r--r--r--
| chmod: ss_err.c: new permissions are r--rw-r--, not r--r--r--
| make[2]: *** [ss_err.h] Error 1
This happens because 'chmod -w' is affected by umask. Issue can be
reproduced e.g. by
$ mkdir /tmp/foo
$ setfacl -m dⓂ️rwx /tmp/foo
$ umask 022
$ touch /tmp/foo/x
$ chmod -w /tmp/foo/x
chmod: /tmp/foo/x: new permissions are r--rw-r--, not r--r--r--
Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e2fsck accessed the field osd2.linux2.l_i_file_acl_high
field without checking that the filesystem is indeed created for
Linux. This lead to e2fsck constantly complaining about certain
nodes:
i_file_acl_hi for inode XXX (/dev/console) is 32, should be zero.
By "correcting" this problem, e2fsck would clobber the field
osd2.hurd2.h_i_mode_high.
Properly guard access to the OS dependent fields.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck.conf's logging feature is enabled, and e2fsck is being run
via systemd-fsck, there will be a deadlock since systemd-fsck is
waiting for progress_fd pipe to be closed, instead of waiting for the
fsck process to exit --- and so the logfile child process won't exit
until it can write out the logfile, and systemd won't continue the
boot process so that the file system can be remounted read-write.
Oops.
Addresses-Debian-Bug: #775234
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the number of unused inodes is greater than number of inodes a
block group, this can cause an e2fsck -n run of the file system to
crash.
We should add more checks to e2fsck to detect this case directly, but
this will at least protect progams (tune2fs, dump, etc.) which use the
inode_scan abstraction from crashing on an invalid file system.
Addresses-Debian-Bug: #773795
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some simple tests to check that flex_bg and meta_bg filesystems
can be converted between 32 and 64bit layouts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk. Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The maximum extent tree depth really only depends on the filesystem
block size, so cache the last result if possible.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
While it may be true that playing games with old_fs' block count
during a grow operation shuts up a bunch of warnings, resize2fs
doesn't actually expand the group descriptor array to match the size
we're artificially stuffing into old_fs, which means that if we
actually need to allocate a block out of the larger fs (i.e. we're in
desperation mode), ext2fs_block_alloc_stats2() scribbles on the heap,
leading to crashes if you're lucky and FS corruption if not.
So, rip that piece out and turn off com_err warnings properly and add
a test case to deal with growing a nearly full filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Recalculate the unused inode count and the block/inode uninit flags
when resizing a filesystem. This can speed up future e2fsck runs
considerably and will reduce mount times.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
resize2fs tries to infer the RAID stride by observing differences
between the locations of adjacent block groups' block and inode
bitmaps within the block group. If the two block groups being
compared belong to different flexbgs, however, it'll be fooled by the
large offset into thinking that the FS has an abnormally large RAID
stride.
Therefore, teach it not to get confused by crossing a flexbg.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When shrinking a filesystem, resize2fs wants to free per-bg metadata
blocks that are no longer needed. This behavior is gated on whether
there's a superblock in the group as told by new_fs. The check really
should be against old_fs, since we're effectively freeing blocks out
of old_fs in the transition to new_fs, but prior to sparse_super2 this
didn't matter since superblocks didn't move, so it didn't matter.
Under sparse_super2, however, there's a superblock in the last group,
so now we need to change the test to use old_fs as it should.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently maximum number of bad blocks is not limited in any way.
However our code can really handle at most INT_MAX/2 bad blocks (for
larger numbers binary search indexes start overflowing). So report
number of bad blocks is just too big instead of plain segfaulting.
It won't be too hard to raise the limit but I don't think there's any
real use for disks with over 1 billion of bad blocks...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
My previous change ended up requiring that the filesystem
be fsck'd after the last mount, even if we are only querying
the minimum size. This is a bit draconian, and it burned
the Fedora installer, which wants to calculate minimum size
for every filesystem in the box at install time, which in turn
requires a full fsck of every filesystem.
Try this one more time, and separate out the tests to make things
a bit more clear. If we're only printing the min size, don't
require the fsck, as this is a bit less dangerous/critical.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're moving an inode on a metadata_csum filesystem, we need to
rewrite the checksum of all interior nodes of the extent tree. The
current code does this inefficiently via set_bmap, but we can do this
more efficiently through direct iteration of the extent tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're shrinking a sparse_super2 filesystem to a single block group,
the superblock will be in block 0. This is perfectly valid (for block
group 0 with a blocksize > 1024) so don't exit.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
At mke2fs time, if we discard the device and discard zeroes data,
don't bother zeroing the inode table blocks a second time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're disabling metadata_csum and the user doesn't provide explicit
instructions to enable or disable uninit_bg, assume that they want
uninit_bg to be turned on by default. Otherwise, we lose all block
group flags and unused inode count, which is a big hit to performance.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Warn the user if we're trying to enable metadata_csum on a FS that
doesn't support extents (since block maps cannot contain checksums).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't complain about checksum failures on the root dir when we're
trying to find l+f if the root dir is going to be rehashed anyway.
The test case for this is t_enable_mcsum in the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a directory block lacks space for a checksum and the user directs
e2fsck to fix the directory block (by rehashing it), don't complain a
second time about the checksum verification failure when we get to the
end of the directory block.
Also, don't complain about broken HTREE directories if we're already
planning to rebuild the HTREE directory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't display unused inodes twice, and make it clear that we're
printing a descriptor checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The current mk_hugefile code in mke2fs doesn't support creating
non-extent files, so disable the functionality when we're mkfs'ing
without extent support.
The fallocate patches further on will eliminate the need for this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add an API so that client programs can discover a reasonable maximum
extent tree depth. This will eventually be used by e2fsck as one of
the criteria to decide if an extent-based file should have its extent
tree rebuilt.
Turn some related magic numbers into constants while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're splitting an extent node, try to allocate the new interior
tree block just prior to the first extent in the block we're trying to
split. The previous logic only set a goal block if we had to split
both the current node and its parent, which is somewhat infrequent.
When that would happen, the goal would start at zero, leading to poor
locality.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Try to be a little smarter about where we go to allocate blocks for a
inode. For a given inode and logical offset, set the goal as if the
file were physically continuous. If it's bmapped, just start looking
at wherever lblk 0 is. If that's not possible (the file has no
lblk>pblk mappings, inline data, etc.) then start looking in the
inode's block group.
[ Fixed memory leak --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the caller supplies a buffer to ext2fs_alloc_block2(), use it
instead of calling ext2fs_zero_blocks2().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Dynamically grow the block zeroing buffer to a maximum of 4MB, and
allow callers to provide their own zeroed buffer in
ext2fs_zero_blocks2().
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
bmap_rb_extent is defined as __u64:blk __u64:count. So count can
exceed INT_MAX on populated filesystems.
TESTCASE: xfstest ext4/004
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>