Commit 3e683eef93 ("define bitwise types and annotate conversion
routines") broke the build on various platforms. Turns out that
crossing our fingers wasn't such a good idea, so just define it
separately.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When writing an extended attribute (EA) block, it's quite possible
that the EA formatting code will not write the entire buffer.
Therefore, we must zero the buffer beforehand to avoid writing random
heap contents to disk.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Sami Liedes found a scenario where we could memcpy incorrectly:
If a block read fails during an e2fsck run, the UNIX IO manager will
call the io->read_error routine with a pointer to the internal block
cache. The e2fsck read error handler immediately tries to write the
buffer back out to disk(!), at which point the block write code will
try to copy the buffer contents back into the block cache. Normally
this is fine, but not when the write buffer is the cache itself!
So, plumb in a trivial check for this condition. A more thorough
solution would pass a duplicated buffer to the IO error handlers, but
I don't know if that happens frequently enough to be worth the extra
point of failure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're iterating a directory, the loop control code reads the
length of the next directory record, failing to account for the fact
that there must be at least 8 bytes (the minimum size of a directory
entry) left in the buffer to read the next directory record. Fix the
loop conditional so that we don't read off the end of the buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The sparse checker treats 0 assignments as special, but
doesn't catch a = b = 0; separate them to make it quieter.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This annotates most on-disk structures for endianness;
however it does not annotate some, like the superblock, inodes,
mmp, etc, as these are swapped in-place at this point. This is
a little inconsistent, but should help catch some endian mistakes,
at least.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This lays the groundwork for sparse-checking e2fsprogs for
endianness; defines bitwise types, and fixes up the ext2fs_*
swapping routines to do the proper casts.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This turned up when trying to resize a filesystem containing
a file with many extents on PPC64.
Fix all locations where ext3_extent_header members aren't
handled in an endian-safe manner.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Make subst more portable so it can deal with such oler systems that do
not have utimes(). Note that it is important that subst build
correctly without an autoconf-generated config.h (since that is what
happens on a cross-compile), as well as using whatever features are
available as determined by autoconf when doing a native build. We
currently assume the presence of utime(), but not utimes() or
futimes().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Convert all call sites that write zero blocks to disk to use
ext2fs_zero_blocks2() since it can use Linux's zero out feature to do
the writes more quickly. Reclaim the zero buffer at freefs time and
make the write-zeroes fallback use a larger buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're using check_plausibility() to try to identify something that
obviously isn't an ext* filesystem and libblkid doesn't know what it
is, try libmagic instead.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If any of these utilities detect a bad superblock magic, call
check_plausibility to see if blkid can identify the passed-in argument
as something else (xfs, partition, etc.) in the hopes of catching a
user error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Move check_plausibility() into a separate file so that various
programs can use it without having to declare useless global variables
that the util.c functions seem to require.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a readahead method for prefetching ranges of disk blocks. This is
useful for inode table scanning, and other large contiguous ranges of
blocks, and may also prove useful for random block prefetch, since it
will allow reordering of the IO without waiting synchronously for the
reads to complete.
It is currently using the posix_fadvise(POSIX_FADV_WILLNEED)
interface, as this proved most efficient during our testing.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Erase s_jnl_blocks when removing an external journal, or adding an
internal journal online. We can't add the backup for the internal
journal because we have no good way to get the indirect block or ETB
addresses, so the best we can do is hope that the user runs e2fsck,
which will correct that. We are motivated to erase during external
journal removal to state emphatically that there's no journal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: thomas_reardon@hotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When building in the source tree, the order of the includes caused the
compiling of debugfs/journal.c while in the lib/ext2fs directory to
find the version in lib/ext2fs instead of the desired version in
e2fsck/jfs_user.h.
We need to eventually get rid of this whole mess and have only one
jfs_user.h and build the journal-related functions once in an internal
library which is used only by e2fsprogs progams.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
When reading extended attributes, check e_value_offs to make sure that
it starts in the value area and not the name area. The attached test
case image will crash the kernel if it is mounted and you append more
than 4096 bytes of data to /a, due to insufficient validation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Extend debugfs with the ability to create transactions and replay the
journal. This will eventually be used to test kernel recovery and
metadata_csum recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a journal.c with routines adapted from e2fsck/journal.c to
handle opening and closing the journal, and setting up the
descriptors, and all that. Unlike e2fsck's versions which try to
identify and fix problems, the routines here have no way to repair
anything.
[ Modified by tytso to fold debugfs/jfs_user.h into e2fsck/jfs_user.h,
so we don't have to copy recovery.c and revoke.c into debugfs. --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're removing the internal journal (broken journal, turning it
off, or adding an external journal), zero s_jnl_blocks so that they
can't be picked up by accident later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When creating a journal inode, check the return value from
block_iterate3() because otherwise we fail to capture errors such as
being unable to allocate an extent tree block, which leads to e2fsck
creating broken journals.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We don't want ext2fs_open2() to report bad sb checksum on something
that's not even an ext* superblock. This apparently happens pretty
easily if we try to open an XFS filesystem. Thus, make it so that a
bad magic number code always trumps the sb checksum error code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags. These errors regrettably lead to the journal
corruption reported by Mr. Reardon.
Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.
Add a few function helpers so we don't have to open-code quite so
many pieces.
Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Synchronize e2fsck's copy of revoke.c with the kernel's copy in
fs/jbd2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Synchronize e2fsck's copy of recovery.c with the kernel's copy in
fs/jbd2.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix more problems that I found when testing on ppc64:
- Inode swap cut and paste error leads to immutable inodes being
detected as inlinedata inodes, leading to e2fsck incorrectly barfing
on i_block[] contents.
- Superblock csum/verify must be aware of the fs->super byte order
when checking for metadata_csum feature flag. (Hint: in _openfs(),
fs->super is in LE order for the first csum verification)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
On BE platforms, we need to swap the inode bytes after doing the
checksum verification but before looking at i_blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add to ext2fs_symlink the ability to create inline data symlinks.
[ Modified by tytso to add more logging to the test script ]
Suggested-by: Pu Hou <houpu.hp@alibaba-inc.com>
Cc: Pu Hou <houpu.hp@alibaba-inc.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The xattr_get method returns to us a pointer to a buffer containing
the EA value. If for some reason we decide to fail out of iterating
the EA part of an inline-data directory, we must free the buffer that
xattr_get passed to us (via inline_data_ea_get).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix up the rest of the inline data code not to complain if there's no
EA, since it's possible that there's no EA because we're in the
process of creating an inline data file. Also, don't return an error
code when removing a nonexistent EA, because there's no reason to.
Furthermore, if we write less than 60 bytes of inline data, remove the
EA to avoid wasting space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we're doing a strict overwrite (same data size) of data in an
inline data file, we should be able to skip the size check. If the
in-core EA representation is fine but the on-disk EA is slightly
corrupt (this happens when fixing minor errors in an inline dir), the
ext2fs_xattr_inode_max_size() call, which reads the disk EA, can lead
us to think that there's no space when in reality there is no issue
with doing a strict overwrite.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The inline data code fails to perform endianness conversions correctly
or at all in a number of places, so fix this so that big-endian
machines function properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes. This can happen if we encounter a specially crafted
FS image.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Perform a little more sanity checking of EA value offsets so that we
don't crash while trying to load things from the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If s_first_meta_bg is greater than the of number block group
descriptor blocks, then reading or writing the block group descriptors
will end up overruning the memory buffer allocated for the
descriptors. Fix this by limiting first_meta_bg to no more than
fs->desc_blocks. This doesn't correct the bad s_first_meta_bg value,
but it avoids causing the e2fsprogs userspace programs from
potentially crashing.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Commit baa3544609 ("libext2fs: have UNIX IO manager use
pread/pwrite) causes a breakage on 32-bit systems where off_t is
32-bits for file systems larger than 4GB. Fix this by using
pread64/pwrite64 if possible, and if pread64/pwrite64 is not present,
using pread/pwrite only if the size of off_t is at least as big as
ext2_loff_t.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>