This patch adds to e2fsck the ability to pre-fetch metadata into the
page cache in the hopes of speeding up fsck runs. There are two new
functions -- the first allows a caller to readahead a list of blocks,
and the second is a helper function that uses that first mechanism to
load group data (bitmaps, inode tables).
These new e2fsck routines require the addition of a dblist API to
allow us to iterate a subset of a dblist. This will enable
incremental directory block readahead in e2fsck pass 2.
There's also a function to estimate the readahead given a FS.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The s_lpf_ino field is intended to store the location of the lost and
found directory if the root directory becomes encrypted (which is not
yet supported). The s_encryption_level field is designed to allow
support for future changes in the on-disk ext4 encryption format while
this feature under development, without having to burn a large number
of bits in the incompat feature flag.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The compression patches were an out-of-kernel patch set that was (a)
only available for ext2, (b) something that was never could be
stablized due to file system corruption, and (c) the most recent
patches were for 3.1, last updated in 2011.
The history of the compression patches has been a bit checkered.
There is a long history here at http://e2compr.sourceforge.net which
lists the perspective of the people working on it from the e2compr
side.
From the ext2/3/4 mainline developers' perspective, initial
compression support was added to e2fsprogs in 2000 (in the Linux 2.2
era), but due to stability concerns the kernel patches were never
merged into the mainline kernel. While there were some sporadic
efforts to try to get the ext2 compression patches working in the 2.4
and 2.6 era, by that time mainline work had moved on to ext4, and the
e2compr approach could only work with 32-bit block numbers and
indirect mapped files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously we were using a weird hybrid CBC/CTS. Switch things so we
are using straight CTS; this corresponds to changes made in the latest
ext4 encryption patches.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The ext2fs_digest_encode() function was broken for any input which was
a multiple of 3. Previously we never hit that case, so we never
noticed it was busted. Also fix up the unit test so future problems
like this get noticed quickly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add missing new lib/ext2fs source files that were added for encryption
support. Also move configuration #define's from individual Android.mk
to the android_config.h file, since we've moved away from specifying
configuration #define's on the command-line upstream.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The kernel never updates the extended attribute hash value for
attributes stored in the inode. However, fsck has always checked this
value (if it's nonzero) and will complain if the hash doesn't match
the xattr. Therefore, always zero the hash value when writing to
in-ibody xattrs to avoid creating "corrupt" attribute errors
downstream.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If ext2fs_new_block2() is called without a specific block map, we
should call the alloc_block hook before checking fs->block_map. This
helps us to avoid a bug in e2fsck where we need to allocate a block
but instead of consulting block_found_map, we use the FS bitmaps,
which (prior to pass 5) could be wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Plumb a new call into the IO manager to support translating
ext2fs_zero_blocks calls into the equivalent FALLOC_FL_ZERO_RANGE
fallocate flag primitive when possible. This patch provides _only_
support for file-based images.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, e4crypt required the user to manually specify the salt
used for their passphrase. This was user unfriendly to say the least.
The e4crypt program can now request the salt using an ioctl, which
will automatically generate the salt if necessary, and keep it in the
ext4 superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch adds new e4crypt tool for encryption management in the ext4
filesystem.
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add const annotation to the input pointers; also run the tst_sha256
and tst_sha512 unit tests on a "make check".
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Teach ext2fs_inodes_has_valid_blocks2() that encrypted symlinks always
use an external block (i.e., we never try to store the symlink in the
i_blocks[] array if it is encrypted).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The Android.mk files were taken from the Android AOSP sources, and
updated for the 1.43 next branch. The intention is that this will
allow the repository which is currently located in external/e2fsprogs
with one which is based off of the upstream e2fsprogs. Right now
external/e2fsprogs was not created using "git clone", so it means that
git merges don't work. After the external/e2fsprogs Android
repository is replaced, with one based off the upstream repository,
Android will be able to synchronize with the upstream repository by
pulling and merging from upstream, and then running the script
"./util/gen-android-files" to update any generated files. (This is
necessary because in the Android build system, the Android.mk files
are rather stylized and don't make it easy to run arbitrary shell
scripts during the build phase.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The bug fix in f66e6ce4446: "libext2fs: avoid buffer overflow if
s_first_meta_bg is too big" had a typo in the fix for
ext2fs_closefs(). In practice most of the security exposure was from
the openfs path, since this meant if there was a carefully crafted
file system, buffer overrun would be triggered when the file system was
opened.
However, if corrupted file system didn't trip over some corruption
check, and then the file system was modified via tune2fs or debugfs,
such that the superblock was marked dirty and then written out via the
closefs() path, it's possible that the buffer overrun could be
triggered when the file system is closed.
Also clear up a signed vs unsigned warning while we're at it.
Thanks to Nick Kralevich <nnk@google.com> for asking me to look at
compiler warning in the code in question, which led me to notice the
bug in f66e6ce444.
Addresses: CVE-2015-1572
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the directory processing code ends up pointing to a directory entry
that's so close to the end of the block that there's not even space
for a rec_len/name_len, just substitute dummy values that will force
e2fsck to extend the previous entry to cover the remaining space. We
can't use the helper methods to extract rec_len because that's reading
off the end of the buffer.
This isn't an issue with non-inline directories because the directory
check buffer is zero-extended so that fsck won't blow up.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When undoing an expansion of an mmap'd database while cancelling a
transaction, the tdb code prematurely decreases the variable that
tracks the file size, which leads to a region leak during the
subsequent unmap. Fix this by maintaining a separate counter for the
region size.
(This is probably unnecessary since e2undo was the only user of tdb
transactions, but I suppose we could be proactive.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Strengthen the i_extra_isize checks to look for obviously too-small
values before trying to operate on inode EAs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use qsort to move the inlinedata attribute to the front of the list
and the empty entries to the end. Then we can use handle->count to
decide if we're done writing xattrs, which helps us to avoid the
situation where we're midway through the attribute list, so we
allocate an EA block to store more, but have no idea that there's
actually nothing left in the list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If i_extra_isize is zero when we try to write extended attributes,
we'll end up writing the EA magic into the i_extra_isize field, which
causes a subsequent crash on big endian systems (when we try to write
0xEA02 bytes past the inode!). Therefore when the field is zero, set
i_extra_isize to the desired extra_isize size, zero those bytes, and
write the EAs after the end of the extended inode.
v2: Don't bother if we have 128b inodes, and ensure that the value
is 32b-aligned so that the EA magic starts on a 32b boundary.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
'chmod -w' is not portable and can break the build:
| chmod: chmod: ss_err.h: new permissions are r--rw-r--, not r--r--r--
| ss_err.h: new permissions are r--rw-r--, not r--r--r--
| chmod: ss_err.c: new permissions are r--rw-r--, not r--r--r--
| make[2]: *** [ss_err.h] Error 1
This happens because 'chmod -w' is affected by umask. Issue can be
reproduced e.g. by
$ mkdir /tmp/foo
$ setfacl -m dⓂ️rwx /tmp/foo
$ umask 022
$ touch /tmp/foo/x
$ chmod -w /tmp/foo/x
chmod: /tmp/foo/x: new permissions are r--rw-r--, not r--r--r--
Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the number of unused inodes is greater than number of inodes a
block group, this can cause an e2fsck -n run of the file system to
crash.
We should add more checks to e2fsck to detect this case directly, but
this will at least protect progams (tune2fs, dump, etc.) which use the
inode_scan abstraction from crashing on an invalid file system.
Addresses-Debian-Bug: #773795
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The maximum extent tree depth really only depends on the filesystem
block size, so cache the last result if possible.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add an API so that client programs can discover a reasonable maximum
extent tree depth. This will eventually be used by e2fsck as one of
the criteria to decide if an extent-based file should have its extent
tree rebuilt.
Turn some related magic numbers into constants while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're splitting an extent node, try to allocate the new interior
tree block just prior to the first extent in the block we're trying to
split. The previous logic only set a goal block if we had to split
both the current node and its parent, which is somewhat infrequent.
When that would happen, the goal would start at zero, leading to poor
locality.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Try to be a little smarter about where we go to allocate blocks for a
inode. For a given inode and logical offset, set the goal as if the
file were physically continuous. If it's bmapped, just start looking
at wherever lblk 0 is. If that's not possible (the file has no
lblk>pblk mappings, inline data, etc.) then start looking in the
inode's block group.
[ Fixed memory leak --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the caller supplies a buffer to ext2fs_alloc_block2(), use it
instead of calling ext2fs_zero_blocks2().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Dynamically grow the block zeroing buffer to a maximum of 4MB, and
allow callers to provide their own zeroed buffer in
ext2fs_zero_blocks2().
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
bmap_rb_extent is defined as __u64:blk __u64:count. So count can
exceed INT_MAX on populated filesystems.
TESTCASE: xfstest ext4/004
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The file IO routines do not handle uninit blocks at all. The read
method should check for the uninit flag and return a buffer of zeroes,
and the write routine should convert unwritten extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't open-code the creation of the extent tree header, since
ext2fs_extent_open2() knows how to take care of this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the inode size is large enough that there are fewer than two inodes
per block, don't report an inode checksum failure as a garbage inode
during the scan because the "more than half are broken" criteria that
we use to decide if a block of inodes is garbage doesn't really apply.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't allow callers to feed bad block/inode numbers to
ext2fs_*_alloc_stats2, because evil callers (<cough>resize2fs<cough>)
can corrupt library state this way, leading to a crash.
(There will be a subsequent patch to resize2fs to fix its bad
behavior.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Set BLOCK_UNINIT in any group whose blocks are all unused, so long as
it isn't the last group. This helps us speed up future e2fsck runs
and mounts because we don't need to read or checksum block bitmaps for
these groups.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're going to read the "nr - 1" entry in an indirect block for use
as a "goal" input to the block allocator, we need to byteswap the
entry. While we're at it, if we're allocating blocks for the zeroth
entry in the indirect block, we might as well use the indirect block
as the starting point to try to reduce fragmentation.
(d_fallocate_blkmap will test this...)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix some gcc-4.8 warnings and other problems that broke the build.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>