Two new e2undo issues exist in the latest release on big endian
machines.
From sparse check:
undo_io.c:157:26: warning: invalid assignment: |=
undo_io.c:157:26: left side has type restricted __le32
undo_io.c:157:26: right side has type int
undo_io.c:161:26: warning: invalid assignment: &=
undo_io.c:161:26: left side has type restricted __le32
undo_io.c:161:26: right side has type int
e2undo.c:211:16: warning: cast to restricted __le64
e2undo.c:211:16: warning: cast from restricted blk64_t
e2undo.c:212:16: warning: cast to restricted __le64
e2undo.c:212:16: warning: cast from restricted blk64_t
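The first kind of warning appears when a CPU-order constant is OR-ed or
AND-ed directly into a field that is stored little-endian. A minimal sketch
of the usual remedy follows: convert the constant to storage order before
combining it with the field. Every name here (sketch_hdr, SKETCH_FLAG_DIRTY,
the sketch_* helpers) is hypothetical and is not the actual undo_io.c code.

```c
#include <stdint.h>

/* Hypothetical flag bit, expressed in CPU byte order. */
#define SKETCH_FLAG_DIRTY 0x1u

/* Hypothetical header with one field stored little-endian on disk. */
struct sketch_hdr {
	uint32_t f_state;	/* little-endian storage order */
};

/* Convert a CPU-order value to little-endian storage order. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	union { uint32_t u; unsigned char b[4]; } x;

	x.b[0] = (unsigned char)(v & 0xff);
	x.b[1] = (unsigned char)((v >> 8) & 0xff);
	x.b[2] = (unsigned char)((v >> 16) & 0xff);
	x.b[3] = (unsigned char)((v >> 24) & 0xff);
	return x.u;
}

/* Convert a little-endian stored value back to CPU order. */
static uint32_t sketch_le32_to_cpu(uint32_t v)
{
	union { uint32_t u; unsigned char b[4]; } x;

	x.u = v;
	return (uint32_t)x.b[0] | ((uint32_t)x.b[1] << 8) |
	       ((uint32_t)x.b[2] << 16) | ((uint32_t)x.b[3] << 24);
}

/* Set the flag: the constant is converted first, then OR-ed in. */
static void sketch_set_dirty(struct sketch_hdr *h)
{
	h->f_state |= sketch_cpu_to_le32(SKETCH_FLAG_DIRTY);
}

/* Clear the flag: the mask is built in storage order as well. */
static void sketch_clear_dirty(struct sketch_hdr *h)
{
	h->f_state &= ~sketch_cpu_to_le32(SKETCH_FLAG_DIRTY);
}

/* Test the flag after converting the field back to CPU order. */
static int sketch_is_dirty(const struct sketch_hdr *h)
{
	return (sketch_le32_to_cpu(h->f_state) & SKETCH_FLAG_DIRTY) != 0;
}
```

On a little-endian machine the conversions are no-ops, which is why this
class of bug only shows up on big endian machines.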
Addresses-RedHat-Bugzilla: 1344636
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The fs_offset entry stores the filesystem offset. This allows for an
easy undo, because one does not have to remember/specify the
filesystem offset manually.
The fs_offset entry is implemented as a compatible feature.
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Support key extension if the tdb_data_size is an arbitrary integer
multiple of the channel's block size. Before, key extension was only
possible if the tdb_data_size and the channel's block size were
equal.
Note: a key whose data is the result of a short read will still be
extended if the tdb_data_size and the channel's block size are equal
(that's what the old code did). If tdb_data_size is an integer
multiple (> 1) of the channel's block size, the key might be extended
as well, depending on the key size.
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The old code has some issues, for example, when backing up fs block 0
(can be reproduced via "mke2fs -z undo -b 1024 -E offset=1024 out 1024"):
* backing_blk_num is set to ULLONG_MAX instead of 0
* data is read from the beginning of the file instead of offset 1024
* data_ptr is set to read_ptr - 1024 ("invalid" address)
Hence, the wrong fs block is associated with the wrong data.
For details, see also commit 76da764639cbfcc998f13c263a11a4601bcb9961.
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Remove all flushes of the undo file except for the one that happens just
prior to the file being closed; it seems that the arbitrary flushes
aren't sufficiently useful.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Write out the undo file's index block after writing a block to the
undo file. This ensures that we always have a consistent undo file
in the page cache, even if the program crashes. When we fill up a
key block in the undo file, we'll call fsync to force the whole
thing to storage; this should happen about every 256 blocks given
the usual 4K block size.
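The "about every 256 blocks" figure can be reproduced with a small
calculation. The sketch below assumes that one index key occupies 16 bytes
and ignores any header in the key block; neither detail is stated above,
so treat both as assumptions.

```c
#include <stddef.h>

/* Assumed size of one index key in bytes (not stated in this message). */
#define SKETCH_KEY_SIZE 16

/* Number of keys that fit in one key block, ignoring any block header. */
static size_t sketch_keys_per_block(size_t block_size)
{
	return block_size / SKETCH_KEY_SIZE;
}
```

With a 4096-byte block this gives 256 keys, so a key block fills up, and
fsync is called, roughly once per 256 saved blocks.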
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide the user with an option to create an undo file so that they
can roll back a failed repair operation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use the atexit() function to provide a means for the library to clean
itself up on program exit. This will be used by the undo IO manager
to flush the undo file state to disk if the program should terminate
without closing the io channel, since most e2fsprogs clients will
simply exit() when they hit errors.
This won't help for signal termination; client programs must set
up signal handlers.
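As a rough illustration of the mechanism, the sketch below registers a
flush routine with atexit(). The sketch_* names and the use of a stdio
stream are hypothetical stand-ins; the library's real cleanup hook is not
shown here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the undo file state. */
static FILE *sketch_undo_file;
static int sketch_flush_count;

/* Remember which stream should be flushed at exit. */
static void sketch_set_undo_file(FILE *f)
{
	sketch_undo_file = f;
}

/*
 * Runs when the program returns from main() or calls exit().
 * It does not run if the process is killed by a signal.
 */
static void sketch_flush_at_exit(void)
{
	if (sketch_undo_file) {
		fflush(sketch_undo_file);
		sketch_flush_count++;
	}
}

/* Register the handler once; returns 0 on success, like atexit(). */
static int sketch_register_cleanup(void)
{
	static int registered;
	int ret;

	if (registered)
		return 0;
	ret = atexit(sketch_flush_at_exit);
	if (ret == 0)
		registered = 1;
	return ret;
}
```

A client that also wants coverage for signal termination would install
its own signal handler and exit normally from there.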
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The existing undo file format (which is based on tdb) has many
problems. First, its comparison of superblock fields is ineffective,
since the last mount time is only written by the kernel, not the tools
(which means that undo files can be applied out of order, thus
corrupting the filesystem). Second, block numbers are written in CPU
byte order, which will cause silent failures if an undo file is moved
from one type of system to another. Third, using the tdb database
costs us an enormous amount of CPU overhead to maintain the key data
structure. Finally, the tdb database is unable to deal with databases
larger than 2GB. (Upstream tdb 1.2.12 can handle 4GB, but upgrading a
2TB FS to 64bit,metadata_csum easily produces 2.9GB of undo files, so
we might as well move off of tdb now.)
The last problem is fatal if you want to use tune2fs to turn on
metadata checksumming, since that rewrites every block on the
filesystem, which can easily produce a many-gigabyte undo file, which
of course is unreadable and therefore the operation cannot be undone.
Therefore, rip all of that out in favor of writing to a flat file.
Old blocks are appended to a file and the index is written to the end
when we're done. This is much faster than spending a considerable
amount of time maintaining a hash index: the runtime overhead of
tune2fs -O metadata_csum drops from ~45min to ~20 seconds on a 2TB
filesystem.
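One way to avoid the CPU byte order problem described above is to
serialize block numbers byte by byte in a fixed order. The helpers below
are a hypothetical sketch of that idea using little-endian order; they
are not the functions used by the undo code.

```c
#include <stdint.h>

/* Store a 64-bit block number as 8 bytes, least significant byte first. */
static void sketch_put_le64(unsigned char out[8], uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* Read back a block number stored by sketch_put_le64(). */
static uint64_t sketch_get_le64(const unsigned char in[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

Because the byte layout no longer depends on the host, a file written on
one type of system decodes to the same block numbers on another.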
Several reasons factored into my decision not to repurpose the
jbd2 file format for undo files. First, undo files would be limited
to 2^32 blocks (16TB), which some day might not serve us well. Second,
the journal block size is tied to the file system block size, but
mke2fs wants to be able to back up big chunks of old device contents.
This would require large changes to the e2fsck journal replay code,
which itself is derived from the kernel jbd2 driver, which I'd rather
not destabilize. Third, I want to require undo files to store the FS
superblock at the end of undo file creation so that e2undo can be
reasonably sure that an undo file is supposed to apply against the
given block device, and doing so would require changes to the jbd2
format. Fourth, it didn't seem like a good idea that external
journals should resemble undo files so closely.
v2: Provide a state bit that is only set when the undo channel is
closed correctly so we can warn the user about potentially incomplete
undo files. Straighten out the superblock handling so that undo files
won't be confused for real ext* FS images. Record multi-block runs in
each block key to reduce overhead even further. Support reopening an
undo file so that we can combine multiple FS operations into one
(overall smaller) transaction file, which will be easier to manage.
Flush the undo index data if the program should terminate
unexpectedly. Update the ext4 superblock bits if errors or -f is
found to encourage fsck to do a full run the next time it's invoked.
Enable undoing the undo.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It's really inefficient to (ab)use the TDB key store as a bitmap to
find out if we've already written a block to the undo file, because
the tdb code reads the database key btree disk blocks for *every*
query. Changing that logic to a bitmap reduces overhead by a large
margin -- the overhead of using undo_io while converting a 2TB FS to
metadata_csum is reduced from 55 minutes to 45.
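To show why a bitmap is so much cheaper here, the sketch below answers
"has this block already been saved?" with a single in-memory bit test.
The sketch_bitmap type and its helpers are hypothetical and use plain
calloc() rather than the library's own bitmap code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory bitmap: one bit per filesystem block. */
struct sketch_bitmap {
	unsigned char *bits;
	uint64_t nbits;
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 on failure. */
static int sketch_bitmap_init(struct sketch_bitmap *bm, uint64_t nbits)
{
	size_t bytes = (size_t)((nbits + 7) / 8);

	bm->bits = calloc(bytes ? bytes : 1, 1);
	bm->nbits = nbits;
	return bm->bits ? 0 : -1;
}

/*
 * Mark a block as saved. Returns 1 if it was already marked (nothing
 * more to write to the undo file), 0 if this is the first time.
 */
static int sketch_test_and_set(struct sketch_bitmap *bm, uint64_t blk)
{
	unsigned char mask = (unsigned char)(1u << (blk % 8));
	int was_set;

	assert(blk < bm->nbits);
	was_set = (bm->bits[blk / 8] & mask) != 0;
	bm->bits[blk / 8] |= mask;
	return was_set;
}

/* Release the bitmap storage. */
static void sketch_bitmap_free(struct sketch_bitmap *bm)
{
	free(bm->bits);
	bm->bits = NULL;
	bm->nbits = 0;
}
```

Each query touches one byte of memory instead of reading key blocks from
the database.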
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Most of the e2fsprogs utilities set the IO block size multiple times
(once to 1k to read the superblock, then again to set the real block
size if we find a real superblock). Unfortunately, the undo IO
manager only lets the block size be set once. For the non-mke2fs
utilities we'd rather catch the real block size and use that. mke2fs
of course wants to use a really large block size since it's probably
writing a lot of data.
Therefore, if we haven't written any blocks to the undo file, it's
perfectly fine to allow block size changes. For mke2fs, we'll modify
the IO channel option that lets us set the huge size to lock that
in place. This greatly reduces index overhead for undo files for
e2fsck/tune2fs/resize2fs while continuing the practice of reducing
it even more for mke2fs.
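The rule can be sketched roughly as follows. This is an illustration
under assumptions, not the actual implementation: the structure, its
field names, and the two helpers are all hypothetical, and the real
option handling is not shown.

```c
/* Hypothetical state for an undo channel. */
struct sketch_undo {
	unsigned int channel_blksize;	/* block size used for I/O requests */
	unsigned int undo_blksize;	/* granularity of blocks saved for undo */
	int undo_written;		/* nonzero once any block has been saved */
	int undo_locked;		/* nonzero once an option pinned the size */
};

/*
 * Ordinary block size change: the undo granularity follows along only
 * while nothing has been written and nothing has pinned it.
 */
static void sketch_set_blksize(struct sketch_undo *u, unsigned int blksize)
{
	u->channel_blksize = blksize;
	if (!u->undo_written && !u->undo_locked)
		u->undo_blksize = blksize;
}

/*
 * Option-style request (the mke2fs case): choose a large undo
 * granularity and lock it against later block size changes.
 */
static void sketch_pin_undo_blksize(struct sketch_undo *u,
				    unsigned int blksize)
{
	if (u->undo_written)
		return;
	u->undo_blksize = blksize;
	u->undo_locked = 1;
}
```

A tool that first probes with 1k and then switches to 4k ends up with 4k
undo blocks, while a caller that pins a large size keeps it.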
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Implement pass-through calls for discard, zero-out, and readahead in
the IO manager so that we can take advantage of any underlying
support.
Furthermore, improve tdb write-out speed by disabling locking and only
fsyncing at the end -- we don't care about locking because having
multiple writers to the undo file will produce an undo database full
of garbage blocks; and we only need to fsync at the end because if we
fail before the end, our undo file will lack the necessary superblock
data that e2undo requires to do replay safely. Without this, we call
fsync four times per tdb update(!). This reduces the overhead of using
undo_io while converting a 2TB FS to metadata_csum from 3+ hours to 55
minutes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using C99 initializers makes the code a bit more readable, and it
avoids some gcc -Wall warnings regarding missing initializers.
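For readers unfamiliar with the syntax, here is a small sketch of a C99
designated initializer applied to a table of function pointers. The
structure and functions are hypothetical and much smaller than the real
I/O manager structure.

```c
#include <stddef.h>

/* Hypothetical, cut-down operations table. */
struct sketch_io_manager {
	const char *name;
	int (*open)(const char *path);
	int (*close)(void);
	int (*flush)(void);
};

static int sketch_mgr_open(const char *path)
{
	return path ? 0 : -1;
}

static int sketch_mgr_close(void)
{
	return 0;
}

/*
 * Each member is named explicitly, so the order does not matter and
 * the reader can see which function fills which slot. Members that
 * are not named (here .flush) are initialized to zero, i.e. NULL.
 */
static struct sketch_io_manager sketch_manager = {
	.name	= "sketch I/O manager",
	.open	= sketch_mgr_open,
	.close	= sketch_mgr_close,
};
```

With a positional initializer the same table would need a placeholder
for every unused slot, in the right order.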
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Declare struct_io_manager at the end of unix_io.c, undo_io.c, and
test_io.c files so that there isn't a need to forward declare every
member of this structure. That avoids a lot of redundant code
at the start of every one of these files.
Move the test_flush() function above test_abort() to avoid the need
for a forward declaration.
Fix a few instances of space before tab in these files.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since the individual commands run by make were
lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames in lib/dirpaths.h.
In addition, clean up some vestigial defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In inode_open(), if the allocation of &io fails, we go to cleanup
and dereference io to test io->name, which is a bug.
Similarly in undo_open() if allocation of &data fails, we
go to cleanup and dereference data to test data->real.
In the test_open() case we explicitly set retval to the only
possible error return from ext2fs_get_mem(), so remove that
for tidiness.
The other changes just make earlier returns go through
the error goto for consistency.
In many cases we returned directly from the first error, but
"goto cleanup" etc for every subsequent error. In some
cases this leads to "impossible" tests such as:
	if (ptr)
		ext2fs_free_mem(&ptr);
on paths where ptr cannot be null because we would have
returned directly earlier, and Coverity flags this.
This isn't really indicative of an error in most cases, but
I think it can be clearer to always exit through the error goto
if it's used later in the function.
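The pattern argued for above can be sketched like this. The function and
structure are hypothetical and use plain malloc()/free() instead of the
library allocators; the point is only that every failure leaves through
one label, and that the label tests the pointer before dereferencing it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical channel object. */
struct sketch_chan {
	char *name;
};

/* Returns 0 and stores the new channel in *out, or -1 on any failure. */
static int sketch_chan_open(const char *name, struct sketch_chan **out)
{
	struct sketch_chan *ch = NULL;
	int ret = -1;

	if (!name)
		goto cleanup;		/* even the first error uses the goto */

	ch = calloc(1, sizeof(*ch));
	if (!ch)
		goto cleanup;

	ch->name = malloc(strlen(name) + 1);
	if (!ch->name)
		goto cleanup;
	strcpy(ch->name, name);

	*out = ch;
	return 0;

cleanup:
	/* ch can be NULL here, so it must be tested before ch->name is used. */
	if (ch) {
		free(ch->name);
		free(ch);
	}
	return ret;
}
```

Because the first failure also reaches the label, the NULL test in the
cleanup code is a real check rather than an "impossible" one.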
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the !undo_io_backing_manager case, undo_err_handler_init
will be passed a null data->real, which will be dereferenced.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2. The files were incorrectly labelled.
Alex Thomas/Lustre has been consulted regarding the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This I/O manager saves the contents of the location being overwritten
to a tdb database. This helps in undoing the changes done to the
file system.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>