Remove the code that would zap an extent block immediately if the
checksum failed (i.e. strict_csums). Instead, we'll only do that if
the extent block header shows obvious structural problems; if the
header checks out, then we'll iterate the block and see if we can
recover some extents.
Requires a minor modification to ext2fs_extent_get such that the
extent block will be returned in the buffer even if the return code
indicates a checksum error. This brings its behavior in line with
the rest of libext2fs.
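A minimal sketch of the policy described above, not the actual e2fsck code: the structure, the constants, and the function names are hypothetical stand-ins. A block is discarded only when its header fails basic structural checks; a block with a sane header but a bad checksum is still walked so that its extents can be salvaged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an on-disk extent block header. */
struct model_extent_header {
    uint16_t magic;    /* expected to equal MODEL_EXT_MAGIC */
    uint16_t entries;  /* entries currently in use */
    uint16_t max;      /* capacity of the block */
    uint16_t depth;    /* 0 for a leaf block */
};

#define MODEL_EXT_MAGIC 0xF30A  /* assumed magic value for this model */
#define MODEL_MAX_DEPTH 5       /* assumed upper bound on tree depth */

enum block_action { BLOCK_USE, BLOCK_SALVAGE, BLOCK_ZAP };

/* Return 1 if the header shows no obvious structural problems. */
static int header_looks_sane(const struct model_extent_header *h)
{
    return h->magic == MODEL_EXT_MAGIC &&
           h->max != 0 &&
           h->entries <= h->max &&
           h->depth <= MODEL_MAX_DEPTH;
}

/* Decide what to do with a block given its header and checksum status. */
static enum block_action classify_block(const struct model_extent_header *h,
                                        int csum_ok)
{
    if (!header_looks_sane(h))
        return BLOCK_ZAP;                        /* structurally broken */
    return csum_ok ? BLOCK_USE : BLOCK_SALVAGE;  /* bad csum: try to recover */
}
```

The salvage case relies on the caller still receiving the block contents when the checksum fails, which is the behavior change to ext2fs_extent_get described above.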
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're appending an extent to the end of a file and the index
block is full, don't split the index block into two half-full index
blocks because this leaves us with underutilized index blocks, at
least in the fallocate case. Instead, copy the last extent from the
full block into the new block. This isn't perfect utilization, but
there's a lot of work involved in teaching extent.c to be able to go to
a nonexistent node in a newly allocated (and empty) extent block.
This patch does not fix the general problem of keeping the extent tree
balanced.
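To illustrate the difference between the two split strategies, here is a toy model with a hypothetical four-entry node; none of these names come from extent.c. The even split leaves both nodes half full, while the append split moves only the last entry and leaves the old node nearly full.

```c
#include <assert.h>

#define NODE_CAP 4  /* hypothetical tiny capacity, for illustration only */

struct model_node {
    int count;            /* entries in use */
    int start[NODE_CAP];  /* logical start block of each entry */
};

/* Even split: move the upper half of src into the (empty) dst. */
static void split_even(struct model_node *src, struct model_node *dst)
{
    int keep = src->count / 2;
    int i;

    dst->count = 0;
    for (i = keep; i < src->count; i++)
        dst->start[dst->count++] = src->start[i];
    src->count = keep;
}

/* Append split: move only the last entry; src must hold at least one. */
static void split_for_append(struct model_node *src, struct model_node *dst)
{
    dst->count = 1;
    dst->start[0] = src->start[src->count - 1];
    src->count--;
}
```

With a full node of four entries, the append split leaves three entries behind and one in the new node, which can then take the appended extents.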
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix various small resource leaks and error code handling issues that
Coverity pointed out.
Fixes-Coverity-Bugs: 11919{39-45}, 1174118, 1049160, 1049144
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
when altering either end of an extent so that the parent nodes reflect
the added mapping.
There's a slight complication to using fix_parents: if there are two
mappings to an lblk in the tree, the value of handle->path->curr can
point to either extent afterwards. This is documented in a comment.
Some additional color commentary from Darrick:
In the _set_bmap() case, I noticed that the "remapping last block in
extent" case would produce symptoms if we are trying to remap a
block from "extent" to "next_extent", and the two extents are
pointed to by different index nodes. _extent_replace(...,
next_extent) updates e_lblk in the leaf extent, but because there's
no _extent_fix_parents() call, the index nodes never get updated.
In the _punch_extent() case, we conclude that we need to split an
extent into two pieces since we're punching out the middle. If the
extent is the last extent in the block, the second extent will be
inserted into a new leaf node block. Without _fix_parents(), the
index node doesn't seem to get updated.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext2fs_extent_free(), h(andle)->max_depth is used as a loop
conditional variable to free all the h->path[].buf pointers. However,
ext2fs_extent_delete() sets max_depth = 0 if we've removed everything
from the extent tree, which causes a subsequent _free() to leak some
buf pointers. max_depth can be re-incremented when splitting extent
nodes, but there's no guarantee that it'll reach the old value before
the free.
Therefore, remember the size of h->path[] separately, and use that
when freeing the extent handle.
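The idea can be sketched with a simplified handle; the structure and the max_paths field name below are assumptions made for illustration, not a copy of the real ext2_extent_handle. The free routine loops over the recorded array size, so resetting max_depth to 0 no longer causes buffers to be skipped.

```c
#include <assert.h>
#include <stdlib.h>

struct model_path { void *buf; };

struct model_handle {
    int max_depth;            /* may be reset to 0 by a delete */
    int max_paths;            /* slots in path[], fixed when opened */
    struct model_path *path;
};

/* Allocate a handle with depth + 1 path slots, each with a buffer. */
static struct model_handle *model_open(int depth, size_t bufsize)
{
    struct model_handle *h = calloc(1, sizeof(*h));
    int i;

    if (!h)
        return NULL;
    h->max_depth = depth;
    h->max_paths = depth + 1;
    h->path = calloc((size_t) h->max_paths, sizeof(*h->path));
    if (!h->path) {
        free(h);
        return NULL;
    }
    for (i = 0; i < h->max_paths; i++)
        h->path[i].buf = malloc(bufsize);
    return h;
}

/* Free every slot using max_paths; return how many buffers were freed. */
static int model_free(struct model_handle *h)
{
    int i, freed = 0;

    for (i = 0; i < h->max_paths; i++) {
        if (h->path[i].buf) {
            free(h->path[i].buf);
            freed++;
        }
    }
    free(h->path);
    free(h);
    return freed;
}
```

Looping on max_depth instead would free only the first slot after a delete had emptied the tree, leaking the rest.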
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a client asks us to remap a block in the middle of an extent, we
potentially have to allocate a fair number of blocks to handle extent
tree splits. A failure in either of the ext2fs_extent_insert calls
leaves us with an extent tree that no longer maps the logical block in
question and everything that came after it! Therefore, try to roll
back the extent tree changes before returning an error code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When inserting the first extent into an empty inode, the
ext2fs_extent_insert() leaves path->left set to 1 instead of 0. Since
path->curr is pointing at the last (only) extent in the file,
path->left should be 0.
This is mostly harmless, and gets corrected fairly quickly if the
calling application jumps to a different part of the extent tree,
for example by calling ext2fs_extent_goto(), or by calling
ext2fs_extent_get() with the flags argument set to EXT2_EXTENT_ROOT,
which is why we hadn't noticed this problem until now.
However, if you insert four extents using ext2fs_extent_insert, the
fourth insert will end up copying too many bytes in the i_block[]
array, since path->left is one larger than it should be. This results
in the inode fields i_generation, i_file_acl, and i_size_high getting
zeroed out.
This problem can be replicated as follows:
% cp /dev/null /tmp/foo.img
% mke2fs -F -t ext4 /tmp/foo.img 100
% debugfs -w /tmp/foo.img
debugfs: write /dev/null foo
debugfs: set_inode_field foo i_size_high 1
debugfs: stat foo
<----- note that the inode's size is 4294967296
debugfs: extent_open foo
debugfs (extent ino 12): insert --after 0 1 100
debugfs (extent ino 12): insert --after 1 1 101
debugfs (extent ino 12): insert --after 2 1 102
debugfs (extent ino 12): insert --after 3 1 103
debugfs (extent ino 12): extent_close
debugfs: stat foo
<----- note that the inode's size is now 0
debugfs: quit
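The arithmetic behind the overrun can be checked with a small model. The structure below is a simplified assumption that keeps only i_block[] and the three fields named above, and shift_end_offset() is a hypothetical helper rather than code from extent.c. With a 12-byte header and 12-byte extents, the fourth extent ends exactly at the end of i_block[]; shifting one extra entry writes 12 more bytes, which land on i_generation, i_file_acl, and i_size_high.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: only i_block[] and the three fields that follow it. */
struct model_inode_tail {
    uint32_t i_block[15];   /* 60 bytes: header plus four extents */
    uint32_t i_generation;
    uint32_t i_file_acl;
    uint32_t i_size_high;
};

#define MODEL_HDR_BYTES 12u  /* size of the extent header */
#define MODEL_EXT_BYTES 12u  /* size of one extent entry */

/*
 * Byte offset, from the start of i_block[], just past the highest slot
 * touched by an insert: the new entry goes into `slot`, and `nshift`
 * existing entries are shifted into the slots above it.
 */
static size_t shift_end_offset(size_t slot, size_t nshift)
{
    return MODEL_HDR_BYTES + (slot + 1 + nshift) * MODEL_EXT_BYTES;
}
```

For the fourth insert (slot 3), shifting zero entries ends at byte 60, the end of i_block[]; shifting one entry too many ends at byte 72, just past i_size_high.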
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mostly by adding static and removing excess extern qualifiers. Also
convert a few remaining non-ANSI function declarations to ANSI.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we have an extent tree like this (from debugfs's "ex" command):
Level Entries Logical Physical Length Flags
...
2/ 2 60/ 63 13096 - 13117 650024 - 650045 22
2/ 2 61/ 63 13134 - 13142 650062 - 650070 9
2/ 2 62/ 63 13193 - 13194 650121 - 650122 2
2/ 2 63/ 63 13227 - 13227 650155 - 650155 1 A)
1/ 2 4/ 14 13228 - 17108 655367 3881 B)
2/ 2 1/117 13228 - 13251 650156 - 650179 24 C)
2/ 2 2/117 13275 - 13287 650203 - 650215 13
2/ 2 3/117 13348 - 13353 650276 - 650281 6
...
and we resize the fs in such a way that all of those blocks must
be moved down, we do them one at a time. Eventually we move 1-block
extent A) to a lower block, and then follow it with the other
blocks in the next logical offsets from extent C) in the next
interior node B).
The userspace extent code tries to merge, so when it finds that
logical 13228 can be merged with logical 13227 into a single extent,
it does. And so on, all through extent C), up to block 13250 (why
not 13251? [1]), and eventually move the node block as well.
So we end up with this when all the blocks are moved post-resize:
Level Entries Logical Physical Length Flags
...
2/ 2 120/122 13193 - 13193 33220 - 33220 1
2/ 2 121/122 13194 - 13194 33221 - 33221 1
2/ 2 122/122 13227 - 13250 33222 - 33245 24 D)
1/ 2 5/ 19 13228 - 17108 34676 3881 E) ***
2/ 2 1/222 13251 - 13251 33246 - 33246 1 F)
2/ 2 2/222 13275 - 13286 33247 - 33258 12
...
All those adjacent blocks got moved into extent D), which is nice -
but the next interior node E) was never updated to reflect its new
starting point - it says the leaf extents beneath it start at 13228,
when in fact they start at 13251.
So as we move blocks one by one out of original extent C) above, we
need to keep updating C)'s parent node B) for a proper starting point.
fix_parents() does this.
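The invariant can be modelled in a few lines. This is a two-level toy with hypothetical types, not the real ext2fs_extent_fix_parents(): each time a block is moved out of the front of the first leaf extent, the parent index's start has to follow the leaf's new first logical block.

```c
#include <assert.h>
#include <stdint.h>

struct model_leaf_extent {
    uint32_t lblk;  /* first logical block */
    uint32_t len;   /* number of blocks */
};

struct model_index {
    uint32_t start;                               /* claimed logical start */
    const struct model_leaf_extent *first_child;  /* first extent below */
};

/* Return 1 if the index start matches its first child's start. */
static int index_is_consistent(const struct model_index *ix)
{
    return ix->start == ix->first_child->lblk;
}

/* Model of the fix-up: copy the child's start into the parent. */
static void model_fix_parent(struct model_index *ix)
{
    ix->start = ix->first_child->lblk;
}

/* Move the first block of `from` onto the end of the adjacent `to`. */
static void move_first_block(struct model_leaf_extent *to,
                             struct model_leaf_extent *from)
{
    to->len++;
    from->lblk++;
    from->len--;
}
```

Using the numbers from the dump above: moving 23 blocks from the extent starting at 13228 into the one starting at 13227 leaves the leaf starting at 13251, while the parent still claims 13228 until the fix-up runs.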
Once the tree is corrupted like this, more corruption can
ensue post-resize, because we traverse the tree by interior nodes,
relying on their start block to know where we are in the tree.
If it gets off, we'll end up inserting blocks into the wrong part
of the tree, etc.
I have a testcase using fsx to create a complex extent tree which
is then moved during resize; it hit this corruption quite easily,
and with this fix, it succeeds.
Note the first hunk in the commit is for going the other way,
moving the last block of an extent to the extent after it; this
needs the same sort of fix-up, although I haven't seen it in
practice.
[1] We leave the last block because a single-block extent is its
own case, and there is no merging code in that case. \o/
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are a number of places where we multiply a dgrp_t with
s_blocks_per_group expecting that we will get a blk64_t. This
requires a cast, or using the convenience function
ext2fs_group_first_block2().
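A short sketch of why the cast matters, using stand-in typedefs for dgrp_t and blk64_t and assuming a 32-bit unsigned int. The helper names are hypothetical, and the model ignores any first-data-block offset that a real helper may add.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t model_dgrp_t;   /* stand-in for dgrp_t */
typedef uint64_t model_blk64_t;  /* stand-in for blk64_t */

/* Multiply happens in 32 bits and wraps before being widened. */
static model_blk64_t first_block_truncated(model_dgrp_t group,
                                           uint32_t blocks_per_group)
{
    return group * blocks_per_group;
}

/* Casting one operand first makes the multiply happen in 64 bits. */
static model_blk64_t first_block_wide(model_dgrp_t group,
                                      uint32_t blocks_per_group)
{
    return (model_blk64_t) group * blocks_per_group;
}
```

For group 200000 with 32768 blocks per group, the wide version yields 6553600000, while the truncated one wraps modulo 2^32 to 2258632704.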
This audit was suggested by Eric Sandeen.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
This commit adds the functionality which had previously only been in
the tst_extents command to debugfs. The debugfs command extent_open
will open the extent tree of a particular inode, and enable a series of
commands which will allow the user to interact with the extent tree
directly. Once the extent tree is closed via extent_close, these
additional commands will be disabled again.
This commit exports two new functions from lib/ext2fs/extent.c which
had previously been statically defined: ext2fs_extent_node_split() and
ext2fs_extent_goto2().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previously, ext2fs_extent_fix_parents() would only avoid modifying the
cursor location associated with the extent handle if the cursor was
pointed at a leaf node in the extent tree. This is because it saved
the starting logical block number of the current extent, but not the
"level" of the extent (where level 0 is the leaf node, level 1 is the
interior node which points at blocks containing leaf nodes, etc.)
Fix ext2fs_extent_fix_parents() so it is guaranteed to not change the
current extent in the handle even if the current extent is not at the
bottom of the tree.
Also add a fix_extent command to the tst_extents program to make it
easier to test this function.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
An index node's logical start (ei_block) should
match the logical start of the first node (index
or leaf) below it. If we find a node whose start
does not match its parent, fix all of its parents
accordingly.
If e2fsck finds such a problem, we'll see:
Pass 1: Checking inodes, blocks, and sizes
Interior extent node level 0 of inode 274258:
Logical start 3666 does not match logical start 4093 at next level. Fix<y>?
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Verify and calculate extent tree block checksums when processing
filesystems.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When libext2fs allocates/deletes an extent leaf, the i_blocks
value is incremented/decremented by fs->blocksize / 512. This
is incorrect in case of bigalloc. The correct way here is to
use cluster_size / 512.
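As a rough illustration, the per-allocation i_blocks delta can be computed as below. The structure and field names are hypothetical; the cluster size is modelled as the block size shifted by a cluster ratio, with a ratio of 0 bits standing for a non-bigalloc filesystem.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified filesystem parameters. */
struct model_fs {
    uint32_t blocksize;           /* bytes per block */
    uint32_t cluster_ratio_bits;  /* log2(blocks per cluster); 0 = no bigalloc */
};

/* Number of 512-byte sectors to add to i_blocks per allocated unit. */
static uint32_t sectors_per_alloc(const struct model_fs *fs)
{
    uint32_t cluster_size = fs->blocksize << fs->cluster_ratio_bits;

    return cluster_size / 512;
}
```

With 4096-byte blocks, a non-bigalloc filesystem accounts 8 sectors per extent block, while one with 16 blocks per cluster accounts 128.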
The problem is seen if we try to create a large inode using
libext2fs (say using ext2fs_block_iterate3()) on a bigalloc
filesystem. fsck catches this and complains.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to follow the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigial defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Using the /* fallthrough */ comment lets Coverity (and humans)
know that we really do want to fall through in these case statements.
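For readers unfamiliar with the convention, a generic example (the enum and function are made up for illustration and are not taken from the patched code):

```c
#include <assert.h>

enum model_op { OP_RESET, OP_INIT, OP_NONE };

/* Count how many steps an operation performs. */
static int steps_for(enum model_op op)
{
    int steps = 0;

    switch (op) {
    case OP_RESET:
        steps++;  /* work done only for a reset */
        /* fallthrough */
    case OP_INIT:
        steps++;  /* work shared by reset and init */
        break;
    case OP_NONE:
        break;
    }
    return steps;
}
```

The comment marks the missing break after OP_RESET as deliberate, so a reset performs both steps.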
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use the EXT2_I_SIZE() macro consistently to access the inode size.
The i_size/i_size_high combination is open coded in several places.
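A simplified model of what such a macro computes is sketched here. MODEL_I_SIZE and the structure are hypothetical stand-ins; the real EXT2_I_SIZE() may also take the inode's file type into account, which this sketch does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inode fragment holding the two halves of the size. */
struct model_inode {
    uint32_t i_size;       /* low 32 bits of the size */
    uint32_t i_size_high;  /* high 32 bits of the size */
};

/* Combine both halves into one 64-bit size. */
#define MODEL_I_SIZE(i) \
    ((uint64_t) (i)->i_size | ((uint64_t) (i)->i_size_high << 32))
```

Routing every access through one macro avoids call sites that forget the cast and lose the high bits in the shift.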
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, ext2fs_extent_open2() copied the passed-in inode structure
into the extent handle, and the extent functions modified the copy of
the inode structure if necessary due to extent splits, etc. Change
ext2fs_extent_open2() so that the extent functions use the inode
structure passed into ext2fs_extent_open2(). Otherwise the passed-in
inode structure could become out of date due to changes made by the
extent functions.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2. The files were incorrectly labelled.
Alex Thomas/Lustre has been consulted with respect to the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Avoid inserting a new extent if it is possible to merge the new
block to the beginning or the end of the previous or next extent.
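The merge test can be sketched as follows, with hypothetical types and helper names. A real implementation would also need to check things this model leaves out, such as matching extent flags and the maximum extent length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_extent {
    uint64_t lblk;  /* first logical block */
    uint64_t pblk;  /* first physical block */
    uint32_t len;   /* number of blocks */
};

/* The new block directly follows prev, logically and physically. */
static int can_append(const struct model_extent *prev,
                      uint64_t lblk, uint64_t pblk)
{
    return prev->lblk + prev->len == lblk && prev->pblk + prev->len == pblk;
}

/* The new block directly precedes next, logically and physically. */
static int can_prepend(const struct model_extent *next,
                       uint64_t lblk, uint64_t pblk)
{
    return lblk + 1 == next->lblk && pblk + 1 == next->pblk;
}

/* Return 1 if the block was merged, 0 if a new extent is needed. */
static int map_block(struct model_extent *prev, struct model_extent *next,
                     uint64_t lblk, uint64_t pblk)
{
    if (prev != NULL && can_append(prev, lblk, pblk)) {
        prev->len++;
        return 1;
    }
    if (next != NULL && can_prepend(next, lblk, pblk)) {
        next->lblk = lblk;
        next->pblk = pblk;
        next->len++;
        return 1;
    }
    return 0;
}
```

Only when both checks fail does the caller fall back to inserting a separate one-block extent.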
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Comment out less common debugging printf's, and fix some type
warnings. Add high-level debugging printf's for ext2fs_extent_goto(),
ext2fs_extent_insert(), ext2fs_extent_delete(), and ext2fs_extent_replace().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 0dc291611 introduced a regression when unmapping the first
block in an extent. This caused e2fsck -fD to corrupt large
directories if the directory has to shrink by more than one block.
The problem was that set_bmap should only go to the next leaf when
setting the first block in an extent, and not when it is unmapping the
first block in an extent.
Addresses-Debian-Bug: #537510
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In the case where ext2fs_extent_set_bmap() is replacing the block
mapping at the beginning of an already-existing extent, insert a new
extent if necessary before shrinking an existing extent, to avoid data
loss if the disk is full.
This mostly addresses the problem described in Red Hat Bugzilla
#510379: the statistics are still wrong, but at least the files on the
filesystem
are not corrupted. If there is a failure during the
inode_scan_and_fix pass, the simplest thing to do may be to tell the
user to run e2fsck -fy.
Addresses-Red-Hat-Bug: #510379
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 53422e moved the new extent insertion in
ext2fs_extent_set_bmap() prior to the modification of the original
extent, but the insert function left the handle pointing to the new
extent. This left us modifying the -new- extent, instead of the
original one, and winding up with a corrupt extent tree something
like:
BLOCKS:
(0-1):588791-588792, (0):588791
We need to move back to the previous extent prior
to modification, if we inserted a new one.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The ext2_extent_handle only has a struct ext2_inode allocated on
it, and the same amount copied into it in that same function,
but in update_path() we're possibly writing out more than that -
for example 256 bytes, from that address. This causes uninitialized
memory to get written to disk, overwriting the parts of the
inode past the osd2 member (the end of the smaller structure).
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Each time an extent handle is opened and closed, if the inode has an
extent tree which does not fit in the inode's i_block structure, a
filesystem block buffer was not getting released. Since e2fsck opens
an extent handle for every inode using extents, this can translate to
a very large amount of memory getting lost.
Thanks to Henrik 'Mauritz' Johnson for discovering and pointing out
this leak, which he ran into while running the "rdump" command in
debugfs.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The patch below adds a function, ext2fs_extent_open2(), that behaves
as ext2fs_extent_open(), but will use the user-supplied inode
structure when opening an extent instead of reading the inode from
disk. It also changes several of the calls to extent_open() to use
this enhancement.
Signed-off-by: Nic Case <number9652@yahoo.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
http://people.redhat.com/esandeen/livecd-creator-imagefile.bz2
contains an image (for now) which, when resized to 578639, corrupts
the filesystem.
This is a bit crazy, I guess, because the fs currently has only
1 free block, but still, we should be graceful about the failure.
Perhaps it would make sense to check the requested value against
the minimum value resize2fs would compute for "-P" and fail (at
least without a force).
But in any case, this exposed 2 bugs when moving that one block
required an extent split, which is what hit the ENOSPC.
For starters, ext2fs_extent_set_bmap() in the "(re/un)mapping last
block in extent" case was replacing the old extent before the
new one was created; when the new extent creation failed, it
left us in an inconsistent state. Simply changing the order of
the two should fix this problem.
Next, ext2fs_extent_insert was calling ext2fs_extent_delete()
on *any* error, including one caused by failure to allocate a new
block to split the node to hold that extent ... the handle was left
unchanged, and we deleted the -original- extent.
As a quick fix for this, just don't do the delete if we fail the split,
though this may need to be smarter. I don't think we have terribly
consistent behavior about where a handle is left on various errors.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
A corrupted interior node in an extent tree would cause e2fsck to
crash with the error message:
Error1: Corrupt extent header on inode 107192
Aborted (core dumped)
Handle this and related failures when scanning an inode's extent tree
more robustly.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When resize2fs moves blocks belonging to an inode, it will call
ext2fs_extent_set_bmap() for logical blocks 0, 1, 2, 3, ...
Optimize for this calling pattern so we don't end up creating a
separate extent for each block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When replacing a single block extent, make sure we set or clear the
uninitialized extent flag as requested by the caller.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When setting a logical block which is before the first extent in the
extent tree, make sure the new extent goes in front, at the very
beginning of the extent tree. This fixes a bug where previously the
new extent would be inserted out of order in this case.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Memory allocated for the ext2_extent_handle is not getting freed from
all the return paths in case of error. The patch below fixes it.
Signed-off-by: "Manish Katiyar" <mkatiyar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>