Use the new fallocate API for creating the journal and the mk_hugefile
feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Having multiple versions of jfs_user.h was confusing the Android
build. Clean up things by removing the lib/ext2fs/jfs_user.h and
misc/jfs_user.h and simplifying how we emulate the kernel
infrastructure needed by journal replay code and removing the
kernel-specific lines from kernel-jbd.h.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck/dirinfo.c and misc/e4crypt.c use functions from libext2fs, so
we need to include its header file or clang will complain.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We need to use lgetxattr(2) instead of getxattr(2) or attempts to
create file systems with extended attributes will fail:
set_inode_xattr: No data available while reading attribute "trusted.link" of "link"
__populate_fs: No data available while setting xattrs for "link"
mke2fs: No data available while populating file system
Reported-by: Jack_Fewx@Dell.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This started with the fm_ext being uninitialized, but upon closer
analysis I discovered that forcing extent emulation in FIBMAP mode
was reporting an extent for every block in the file. Fix both
problems.
The Coverity bug was 1297512.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix Coverity bugs 1297094-1297101 by fixing all the mutations in the
*_setup_tdb() functions, fixing buffer overflows, and checking
return values.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add some simple tests for mke2fs -d (create image from dir) and make
the manpage options appear in alphabetic order.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Save errno (in retval) before doing anything else, because the
"anything else" (usually com_err()) can call library functions, which
will reset errno.
Fix the error messages to use the message catalog, and don't _ever_
print an error without providing context.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Rewrite the file copy-in algorithm to detect smaller holes in the
files we're copying in. Use SEEK_DATA/SEEK_HOLE/FIEMAP when available
to skip known empty parts. This fixes the particular bug where zeroed
blocks on a system with 64k pages are needlessly copied into a
4k-block filesystem. It also saves time by skipping parts we know to
be zeroed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're creating hard links via ext2fs_link, the (misnamed?) flags
argument specifies the filetype for the directory entry. This is
*derived* from i_mode, so provide a translator. Otherwise, fsck will
complain about unset file types.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed tuning operation. Previously, one would be
created if force_undo was set in the configuration file and a bunch of
(undocumented) conditions were met.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed tuning operation. Previously, one would be
created for inode resize if a bunch of (undocumented) conditions were
met.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The existing undo file format (which is based on tdb) has many
problems. First, its comparison of superblock fields is ineffective,
since the last mount time is only written by the kernel, not the tools
(which means that undo files can be applied out of order, thus
corrupting the filesystem); block numbers are written in CPU byte
order, which will cause silent failures if an undo file is moved from
one type of system to another; using the tdb database costs us an
enormous amount of CPU overhead to maintain the key data structure,
and finally, the tdb database is unable to deal with databases larger
than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a 2TB FS
to 64bit,metadata_csum easily produces 2.9GB of undo files, so we
might as well move off of tdb now.)
The last problem is fatal if you want to use tune2fs to turn on
metadata checksumming, since that rewrites every block on the
filesystem, which can easily produce a many-gigabyte undo file, which
of course is unreadable and therefore the operation cannot be undone.
Therefore, rip all of that out in favor of writing to a flat file.
Old blocks are appended to a file and the index is written to the end
when we're done. This implementation is much faster than wasting a
considerable amount of time trying to maintain a hash index, which
drops the runtime overhead of tune2fs -O metadata_csum from ~45min
to ~20 seconds on a 2TB filesystem.
I have a few reasons that factored in my decision not to repurpose the
jbd2 file format for undo files. First, undo files are limited to
2^32 blocks (16TB) which some day might not serve us well. Second,
the journal block size is tied to the file system block size, but
mke2fs wants to be able to back up big chunks of old device contents.
This would require large changes to the e2fsck journal replay code,
which itself is derived from the kernel jbd2 driver, which I'd rather
not destabilize. Third, I want to require undo files to store the FS
superblock at the end of undo file creation so that e2undo can be
reasonably sure that an undo file is supposed to apply against the
given block device, and doing so would require changes to the jbd2
format. Fourth, it didn't seem like a good idea that external
journals should resemble undo files so closely.
v2: Provide a state bit that is only set when the undo channel is
closed correctly so we can warn the user about potentially incomplete
undo files. Straighten out the superblock handling so that undo files
won't be confused for real ext* FS images. Record multi-block runs in
each block key to reduce overhead even further. Support reopening an
undo file so that we can combine multiple FS operations into one
(overall smaller) transaction file, which will be easier to manage.
Flush the undo index data if the program should terminate
unexpectedly. Update the ext4 superblock bits if errors or -f is
found to encourage fsck to do a full run the next time it's invoked.
Enable undoing the undo.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix memory leaks and improve the error messages to make it easier
to figure out why e2undo went wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The directory hash is now calculated using the on-disk encrypted
filename, and we no longer use the digest encoding or the SHA-256
encoding, so remove them from the ext2fs library until there is some
reason we need them.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Once we've "fixed" the filesystem, try mounting and modifying it to see
if we can break the kernel.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously we were using a weird hybrid CBC/CTS. Switch things so we
are using straight CTS; this corresponds to changes made in the latest
ext4 encryption patches.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add missing new lib/ext2fs source files that were added for encryption
support. Also move configuration #define's from individual Android.mk
to the android_config.h file, since we've moved away from specifying
configuration #define's on the command-line upstream.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e4crypt required the user to manually specify the salt
used for their passphrase. This was user unfriendly to say the least.
The e4crypt program can now request the salt using an ioctl, which
will automatically generate the salt if necessary, and keep it in the
ext4 superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch adds new e4crypt tool for encryption management in the ext4
filesystem.
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The Android.mk files were taken from the Android AOSP sources, and
updated for the 1.43 next branch. The intention is that this will
allow the repository which is currently located in external/e2fsprogs
with one which is based off of the upstream e2fsprogs. Right now
external/e2fsprogs was not created using "git clone", so it means that
git merges don't work. After the external/e2fsprogs Android
repository is replaced, with one based off the upstream repository,
Android will be able to synchronize with the upstream repository by
pulling and merging from upstream, and then running the script
"./util/gen-android-files" to update any generated files. (This is
necessary because in the Android build system, the Android.mk files
are rather stylized and don't make it easy to run arbitrary shell
scripts during the build phase.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If e2fsck encounters a read error on a block past the end of the
filesystem, don't bother trying to "rewrite" the block. We might
still want to re-try the read to capture FS data marooned past the end
of the filesystem, but in that case e2fsck ought to move the block
back inside the filesystem.
This enables e2fuzz to detect writes past the end of the FS due to
software bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the user tries to enable or disable the 64bit feature via tune2fs,
tell them how to use resize2fs to effect the conversion.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Earlier, I tried to make tune2fs abort if the user tried to enable or
disable metadata_csum on a mounted FS, but forgot the exit() call.
Supply it now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're turning on metadata checksumming /and/ resizing the inode
at the same time, disable checksum verification during the
resize_inode() call because the subroutines it calls will try to
verify the checksums (which have not yet been set), causing the
operation to fail unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
resize2fs does its magic by loading a filesystem, duplicating the
in-memory image of that fs, moving relevant blocks out of the way of
whatever new metadata get created, and finally writing everything back
out to disk. Enabling 64bit mode enlarges the group descriptors,
which makes resize2fs a reasonable vehicle for taking care of the rest
of the bookkeeping requirements, so add to resize2fs the ability to
convert a filesystem to 64bit mode and back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently maximum number of bad blocks is not limited in any way.
However our code can really handle at most INT_MAX/2 bad blocks (for
larger numbers binary search indexes start overflowing). So report
number of bad blocks is just too big instead of plain segfaulting.
It won't be too hard to raise the limit but I don't think there's any
real use for disks with over 1 billion of bad blocks...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
At mke2fs time, if we discard the device and discard zeroes data,
don't bother zeroing the inode table blocks a second time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're disabling metadata_csum and the user doesn't provide explicit
instructions to enable or disable uninit_bg, assume that they want
uninit_bg to be turned on by default. Otherwise, we lose all block
group flags and unused inode count, which is a big hit to performance.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Warn the user if we're trying to enable metadata_csum on a FS that
doesn't support extents (since block maps cannot contain checksums).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't display unused inodes twice, and make it clear that we're
printing a descriptor checksum.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The current mk_hugefile code in mke2fs doesn't support creating
non-extent files, so disable the functionality when we're mkfs'ing
without extent support.
The fallocate patches further on will eliminate the need for this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't open-code the creation of the extent tree header, since
ext2fs_extent_open2() knows how to take care of this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we apply this patch 'e2fsprogs/tune2fs: rewrite metadata checksums
when resizing inode size', we will trigger a segfault, this is because
of the inode cache issues.
Firstly we should notice that in expand_inode_table(), we have change
the super block's s_inode_size to new inode size(for example, 256).
Then we re-compute metadata checksums, see below code flow:
|-->rewrite_metadata_checksums
|----->rewrite_inodes
|-------->ext2fs_write_inode_full
In ext2fs_write_inode_full(), if an inode cache is hit, the below code will be executed:
/* Check to see if the inode cache needs to be updated */
if (fs->icache) {
for (i=0; i < fs->icache->cache_size; i++) {
if (fs->icache->cache[i].ino == ino) {
memcpy(fs->icache->cache[i].inode, inode,
(bufsize > length) ? length : bufsize);
break;
}
}
}
Before executing rewrite_inodes(), actually the inode in inode cache
is allocated by old inode size(for example, 128), but here the memcpy
will obviously write overflow, '(bufsize > length) ? length : bufsize'
here will return 256(new inode size), so this is wrong, we need to fix
this. I think we should call ext2fs_free_inode_cache() in
expand_inode_table(), to drop the inode cache, because inode size has
changed, if necessary, we will re-create this inode cache.
Steps to reproduce this bug (apply 'tune2fs: rewrite metadata checksums
when resizing inode size' first):
dd if=/dev/zero of=file.img bs=1M count=128
device_name=$(/sbin/losetup -f)
/sbin/losetup -f file.img
mkfs.ext4 -I 128 -O ^flex_bg $device_name
tune2fs -I 256 $device_name
Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we use tune2fs -I new_ino_size to change inode size, if
everything is OK, the corresponding ext4_group_desc.bg_free_blocks_count
will be decreased, so obviously, we need to re-compute the group
descriptor checksums, and the inode 's size has also changed, we also
need to recompute the checksums of inodes for metadata_csum
filesystem, so here we choose to call a rewrite_metadata_checksums(),
this will fix checksum issues.
Meanwhile, the patch will trigger an existing memory write overflow,
which will casue segfault, please see the next patch.
Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When looking for the start of the hugefile range, the 'next' variable
is incorrectly decremented. If we happened to find a single free
block, the effect of this decrement is that blk == next, which means
that we never modify the loop control variable, so get_start_block
never returns.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we use ext2fs_open_inode_scan() to iterate inodes and finish
jobs, we also need a ext2fs_close_inode_scan(scan) operation, but in
inode_scan_and_fix(), we forgot to call it, fix this error.
Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>