Commit Graph

84 Commits (9603da15cb95c4635de6eac6f2ec7dd36a6972f3)

Author SHA1 Message Date
Darrick J. Wong a5abfe0382 e2fsck: read-ahead metadata during passes 1, 2, and 4
e2fsck pass1 is modified to use the block group data prefetch function
to try to fetch the inode tables into the pagecache before it is
needed.  We iterate through the blockgroups until we have enough inode
tables that need reading such that we can issue readahead; then we sit
and wait until the last inode table block read of the last group to
start fetching the next bunch.

pass2 is modified to use the dirblock prefetching function to prefetch
the list of directory blocks that are assembled in pass1.  We use the
"iterate a subset of a dblist" and avoid copying the dblist.  Directory
blocks are fetched incrementally as we walk through the directory
block list.  In previous iterations of this patch we would free the
directory blocks after processing, but the performance hit to e2fsck
itself wasn't worth it.  Furthermore, it is anticipated that most
users will then mount the FS and start using the directories, so they
may as well remain in the page cache.

pass4 is modified to prefetch the block and inode bitmaps in
anticipation of pass 5, because pass4 is entirely CPU bound.

In general, these mechanisms can decrease fsck time by 10-40%, if the
host system has sufficient memory and the storage system can provide a
lot of IOPs.  Pretty much any storage system capable of handling
multiple IOs in-flight at any time will see a fairly large performance
boost.  (Single-issue USB mass storage disks seem to suffer badly.)

By default, the readahead buffer size will be set to the size of a block
group's inode table (which is 2MiB for a regular ext4 FS).  The -E
readahead_kb= option can be given to specify the amount of memory to
use for readahead or zero to disable it entirely; or an option can be
given in e2fsck.conf.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-21 10:40:21 -04:00
Theodore Ts'o 9a32411732 Merge branch 'maint' into next
Conflicts:
	lib/ext2fs/inode.c
2014-12-25 23:43:10 -05:00
Theodore Ts'o 13f450addb libext2fs: add sanity check for an invalid itable_used value in inode scan code
If the number of unused inodes is greater than number of inodes a
block group, this can cause an e2fsck -n run of the file system to
crash.

We should add more checks to e2fsck to detect this case directly, but
this will at least protect progams (tune2fs, dump, etc.) which use the
inode_scan abstraction from crashing on an invalid file system.

Addresses-Debian-Bug: #773795

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-12-25 23:29:19 -05:00
Darrick J. Wong 54f6faf7f2 libext2fs: don't report garbage inodes with really large inodes
If the inode size is large enough that there are fewer than two inodes
per block, don't report an inode checksum failure as a garbage inode
during the scan because the "more than half are broken" criteria that
we use to decide if a block of inodes is garbage doesn't really apply.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-12-02 22:17:10 -05:00
Darrick J. Wong 18b234b121 libext2fs: byteswap inode when performing the sanity scan
On BE platforms, we need to swap the inode bytes after doing the
checksum verification but before looking at i_blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-24 22:00:56 -04:00
Darrick J. Wong b9f95911e9 libext2fs: don't cache inodes that fail checksum verification
If an inode fails checksum verification, don't stuff a copy of it in
the inode cache, because this can cause the library to fail to return
the "corrupt inode" error code.

In general, this happens if ext2fs_read_inode_full() is called twice
on an inode with an incorrect checksum.  If fs->flags has
EXT2_FLAG_IGNORE_CSUM_ERRORS set during the first call and *unset*
during the second call, the cache hit during the second call fails to
return EXT2_ET_INODE_CSUM_INVALID as you'd expect.  This happens
during fsck because the first read_inode call happens as part of
check_blocks and the second call happens during inode checksum
revalidation.  A file system with a slightly corrupt non-extent inode
will trigger this.

While we're at it, make the inode read function consistent with the
rest of libext2fs -- copy the metadata object into the caller's buffer
even if it fails checksum verification.  This will help e2fsck avoid a
double re-read later on down the line.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-02 22:49:23 -04:00
Darrick J. Wong 2e9d839156 e2fsck: correctly preserve fs flags when modifying ignore-csum-error flag
When we need to modify the "ignore checksum error" behavior flag to
get us past a library call, it's possible that the library call can
result in other flag bits being changed.  Therefore, it is not correct
to restore unconditionally the previous flags value, since this will
have unintended side effects on the other fs->flags; nor is it correct
to assume that we can unconditionally set (or clear) the "ignore csum
error" flag bit.  Therefore, we must merge the previous value of the
"ignore csum error" flag with the value of flags after the call.

Note that we want to leave checksum verification on as much as
possible because doing so exposes e2fsck bugs where two metadata
blocks are "sharing" the same disk block, and attempting to fix one
before relocating the other causes major filesystem damage.  The
damage is much more obvious when a previously checked piece of
metadata suddenly fails in a subsequent pass.

The modifications to the pass 2, 3, and 3A code are justified as
follows: When e2fsck encounters a block of directory entries and
cannot find the placeholder entry at the end that contains the
checksum, it will try to insert the placeholder.  If that fails, it
will schedule the directory for a pass 3A reconstruction.  Until that
happens, we don't want directory block writing (pass 2), block
iteration (pass 3), or block reading (pass 3A) to fail due to checksum
errors, because failing to find the placeholder is itself a checksum
verification error, which causes e2fsck to abort without fixing
anything.

The e2fsck call to ext2fs_read_bitmaps must never fail due to a
checksum error because e2fsck subsequently (a) verifies the bitmaps
itself; or (b) decides that they don't match what has been observed,
and rewrites them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-02 22:48:21 -04:00
Darrick J. Wong 68d70624e3 e2fsck: offer to clear inode table blocks that are insane
Add a new behavior flag to the inode scan functions; when specified,
this flag will do some simple sanity checking of entire inode table
blocks.  If all the checksums are ok, we can skip checksum
verification on individual inodes later on.  If more than half of the
inodes look "insane" (bad extent tree root or checksum failure) then
ext2fs_get_next_inode_full() can return a special status code
indicating that what's in the buffer is probably garbage.

When e2fsck' inode scan encounters the 'inode is garbage' return code
it'll offer to zap the inode straightaway instead of trying to recover
anything.  This replaces the previous behavior of asking to zap
anything with a checksum error (strict_csum).

Signed-off-by: Darrick J. Wong <darrick.wong@orale.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-02 22:46:16 -04:00
Zheng Liu be31a8de5a libext2fs: export inode cache creation function
Currently we have already exported inode cache flush and free functions
for users.  This commit exports inode cache creation function.  Later
we will use this function to initialize inode cache and do some unit
tests for inline data.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-04 08:46:15 -05:00
Theodore Ts'o e337e7fad8 Merge branch 'maint' into next
Conflicts:
	e2fsck/problem.c
	e2fsck/rehash.c
	e2fsck/super.c
2013-10-12 22:26:28 -04:00
Darrick J. Wong 4dbfd79d14 e2fsprogs: fix blk_t <- blk64_t assignment mismatches
Fix all the places where we should be using a blk64_t instead of a
blk_t.  These fixes are more severe because 64bit values could be
truncated silently.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-07 09:51:48 -04:00
Theodore Ts'o 07bcd90f3d Merge branch 'maint' into next 2013-04-22 00:07:08 -04:00
Theodore Ts'o 572ef60b89 libext2fs: only use override function when reading an 128 byte inode
The ext2fs_read_inode_full() function should not use fs->read_inode()
if the caller has requested more than the base 128 byte inode
structure and the inode size is greater than 128 bytes.  Otherwise the
caller won't get all of the bytes that they were asking for, since
there's no way for the fs->read_inode override function can know what
the size of the buffer passed to ext2fs_read_inode_full().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-21 23:53:26 -04:00
Andreas Dilger 1b8c4c1b45 build: quiet build warnings for "gcc -Wall"
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-27 22:31:03 -05:00
Theodore Ts'o 603e5ebc8b libext2fs: allocate separate memory regions for each inode in the cache
The changes to support metadata checksum allocated a single large
array for all of the inodes in the inode cache.  This is slightly more
efficient, but given that the inode cache is small (only 4 inodes) it
doesn't really have that much benefit.  The problem with doing things
this way is that the memory overruns, such as the one fixed in commit
43c4910371, do not get detected by valgrind.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-29 20:40:21 -05:00
Eric Whitney 43c4910371 libext2fs: fix inode cache overruns
An inode cache slot will be overrun if a caller to ext2fs_read_inode_full()
or ext2fs_write_inode_full() attempts to read or write a full sized 156
byte inode when the target filesystem contains 128 byte inodes.  Limit the
copied inode to the smaller of the target filesystem's or the caller's
requested inode size.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-11-29 19:59:41 -05:00
Darrick J. Wong 5b58dc2304 libext2fs: block group checksum should use metadata_csum algorithm
Change the block group algorithm to use the same algorithm as the rest
of the metadata_csum.  This mostly involves providing a helper
function to tell if group descriptors should have checksums set or
verified, and modifying the gdt checksum code to use the correct
algorithm.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-08-02 20:47:45 -04:00
Darrick J. Wong 37d82b6a95 libext2fs: add inode checksum support
This patch adds the ability for the libext2fs functions to read and
write the inode checksum.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-07-30 18:46:04 -04:00
Darrick J. Wong 91db7e206d libext2fs: read and write full size inodes
Change libext2fs to read and write full-size inodes in preparation for
the metadata checksumming patchset, which will require this.  Due to
ABI compatibility requirements, this change must be hidden from client
programs.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-07-30 17:42:15 -04:00
Theodore Ts'o fd1c5a0622 libext2fs: factor out I/O buffer allocation
Create a new function, io_channel_alloc_buf() which allocates I/O
buffers with appropriate alignment if we are using direct I/O.  The
original code was sometimes using a larger alignment factor than
necessary, and would always request an aligned memory buffer even when
it was not necessary since the block device was not opened with
O_DIRECT.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-07 14:41:49 -04:00
Theodore Ts'o d1154eb460 Shorten compile commands run by the build system
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn.  It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.

So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.

In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-18 17:34:37 -04:00
Eric Sandeen 624e8ebe30 e2fsprogs: Fix some error cleanup path bugs
In inode_open(), if the allocation of &io fails, we go to cleanup
and dereference io to test io->name, which is a bug.

Similarly in undo_open()  if allocation of &data fails, we
go to cleanup and dereference data to test data->real.

In the test_open() case we explicitly set retval to the only
possible error return from ext2fs_get_mem(), so remove that
for tidiness.

The other changes just make make earlier returns go through
the error goto for consistency.

In many cases we returned directly from the first error, but
"goto cleanup" etc for every subsequent error.  In some
cases this leads to "impossible" tests such as:

	if (ptr)
		ext2fs_free_mem(&ptr)

on paths where ptr cannot be null because we would have
returned directly earlier, and Coverity flags this.

This isn't really indicative of an error in most cases, but
I think it can be clearer to always exit through the error goto
if it's used later in the function.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2011-09-16 18:43:05 -04:00
Theodore Ts'o 41e102a4d1 libext2fs: fix 64-bit support in ext2fs_{read,write}_inode_full()
This fixes a problem where reading or writing inodes located after the
4GB boundary would fail.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-05 20:02:27 -04:00
Theodore Ts'o 9d92a201de Merge branch 'maint' into next
Conflicts:
	configure
	configure.in
	lib/ext2fs/ext2fs.h
	misc/mke2fs.c
2010-09-24 22:40:21 -04:00
Theodore Ts'o 00f0b14118 ext2fs: Optimize for Direct I/O
Allocate various memory structures to be properly aligned to avoid
needing to use a bounce buffer when doing direct I/O read/writes.
This should also help on FreeBSD systems which require aligned buffers
unconditionally.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-09-24 10:06:45 -04:00
Theodore Ts'o 97d26ce9e3 Merge branch 'maint' into next
Conflicts:
	e2fsck/journal.c
	e2fsck/pass1.c
	e2fsck/pass2.c
	misc/mke2fs.c
2010-06-07 12:42:40 -04:00
Theodore Ts'o 543547a52a libe2p, libext2fs: Update file copyright permission states to match COPYING
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2.  The files were incorrectly labelled.
Alex Thomas/Luster has been consulted wrt to the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-17 23:04:39 -04:00
Valerie Aurora Henson d7cca6b06f Convert to use block group accessor functions
Convert direct accesses to use the following block group accessor
functions: ext2fs_block_bitmap_loc(), ext2fs_inode_bitmap_loc(),
ext2fs_inode_table_loc(), ext2fs_bg_itable_unused(),
ext2fs_block_bitmap_loc_set(), ext2fs_inode_bitmap_loc_set(),
ext2fs_inode_table_loc_set(), ext2fs_bg_free_inodes_count(),
ext2fs_ext2fs_bg_used_dirs_count(), ext2fs_bg_free_inodes_count_set(),
ext2fs_bg_free_blocks_count_set(), ext2fs_bg_used_dirs_count_set()

Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: Nick Dokos <nicholas.dokos@hp.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-10-25 21:43:47 -04:00
Theodore Ts'o cd65a24e75 libext2fs: Convert ext2fs_bg_flag_test() to ext2fs_bg_flags_test()
After cleaning up ext2fs_bg_flag_set() and ext2fs_bg_flag_clear(),
we're left with ext2fs_bg_flag_test().  Convert it to
ext2fs_bg_flags_test().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-10-25 21:42:12 -04:00
Theodore Ts'o 732c8cd58f Use accessor functions fields for bg_flags in the block group descriptors
Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-07 21:15:12 -04:00
Valerie Aurora Henson 24a117abd0 Convert to use io_channel_read_blk64() and io_channel_write_blk64()
Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-07 21:14:24 -04:00
Theodore Ts'o a7843581f5 ext2fs_read_inode_full: Add safety check to avoid SEGV's on corrupted fs's
Thanks to Thiemo Nagel for suggesting this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-19 23:09:37 -05:00
Theodore Ts'o 03fa6f8ae2 Fix various signed/unsigned gcc warnings
Some of these could affect filesystems between 2^31 and 2^32-1 blocks.

Thanks to Valerie Aurora Henson for pointing out the problems in
lib/ext2fs/alloc_tables.c, which led me to do a "make gcc-wall" scan
over the source tree.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-16 10:06:59 -05:00
Theodore Ts'o efc6f628e1 Remove trailing whitespace for the entire source tree
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-08-27 23:07:54 -04:00
Theodore Ts'o 3bcc6276a0 libext2fs: Initialize unset inode timestamps when writing a new inode
As Li Zefan <lizf@cn.fujitsu.com> reported, the creation timestamp was
not getting set on the lost+found inode.  This patch makes sure all of
the timestamps are appropriately set.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-08-14 14:44:15 -04:00
Manish Katiyar 8eb3b8a0a0 ext2fs_read_inode: Check the validity of the inode number earlier
It looks like the right place to check for ino=0 in
ext2fs_read_inode_full() is before creating the inode cache, otherwise
since we set icache[i].ino = 0 in create_icache(), it will match the
loop below and thus we return a wrong value.

Signed-off-by: "Manish Katiyar" <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-15 11:10:19 -04:00
Theodore Ts'o d11736c6dd ext2fs_open_inode_scan: Handle an non-zero bg_itable_used in block group 0
Previously, the portion of the inode table for block group 0 was
always completely zero'ed out, so the ext2fs_open_inode_scan() didn't
handle a non-zero bg_itable_used value for the first block group.  Fix
this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-22 23:22:17 -04:00
Theodore Ts'o 16b851cdae Remove LAZY_BG feature
This simplifies the code, and using the uninit_bg with the inode table
lazily initialized is just as good.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-20 23:33:34 -04:00
Andreas Dilger 6f19f44a4c libext2fs: Micro-optimization in inode scan code
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-31 14:28:37 -04:00
Jose R. Santos d4f34d41be Add uninit block group support to various libext2fs functions
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-20 15:33:12 -04:00
Theodore Ts'o e6a4571eec Merge branch 'maint' into next
Conflicts:

	lib/ext2fs/closefs.c
2007-12-09 17:03:01 -05:00
Theodore Ts'o ee01079a17 libext2fs: Add checks to prevent integer overflows passed to malloc()
This addresses a potential security vulnerability where an untrusted
filesystem can be corrupted in such a way that a program using
libext2fs will allocate a buffer which is far too small.  This can
lead to either a crash or potentially a heap-based buffer overflow
crash.  No known exploits exist, but main concern is where an
untrusted user who possesses privileged access in a guest Xen
environment could corrupt a filesystem which is then accessed by the
pygrub program, running as root in the dom0 host environment, thus
allowing the untrusted user to gain privileged access in the host OS.

Thanks to the McAfee AVERT Research group for reporting this issue.

Addresses CVE-2007-5497.

Signed-off-by: Rafal Wojtczuk <rafal_wojtczuk@mcafee.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-12-05 21:01:35 -05:00
Theodore Ts'o 126a291c76 Clean up libext2fs by byte swapping iff WORDS_BIGENDIAN
We don't need byte swapping to be a run-time option; it can just be a
compile-time option instead.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-08-11 01:59:13 -04:00
Kalpak Shah 1ed49d2c2a Fix byte swapping bug in get_next_inode_full()
On big-endian systems, while swapping, ext2fs_swap_inode_full() swaps
only 128+extra_isize bytes and the EAs if they are present. Now if inode
N has EAs, (and this is the inode in the "scratch inode") then inode N+1
also carries seems to have them since the "scratch inode" was never
zeroed.

Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-06-29 21:40:19 -04:00
Kalpak Shah 915a2669ef Fix ext2fs_read_inode_full() so that the whole inode is byte-swapped
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-06-22 22:32:43 -04:00
Jim Garlick cc37e0d3ae Fix memory leak in ext2fs_write_new_inode()
The following patch addresses a memory leak in libext2fs
that occurs when using ext2fs_write_new_inode() on a file system
configured with large inodes.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-04-06 08:50:15 -04:00
Brian Behlendorf e649be9daa [COVERITY] Fix (error case) memory leak in libext2fs (ext2fs_write_inode_full)
Need to free w_inode on early exit if w_inode != &temp_inode.

Coverity ID: 22: Resource Leak

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2007-03-21 17:38:47 -04:00
Theodore Ts'o f5fa20078b Add support for EXT2_FEATURE_COMPAT_LAZY_BG
This feature is initially intended for testing purposes; it allows an
ext2/ext3 developer to create very large filesystems using sparse files
where most of the block groups are not initialized and so do not require
much disk space.  Eventually it could be used as a way of speeding up
mke2fs and e2fsck for large filesystem, but that would be best done by 
adding an RO_COMPAT extension to the filesystem to allow the inode table
to be lazily initialized on a per-block basis, instead of being entirely initialized
or entirely unused on a per-blockgroup basis.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2006-05-08 20:17:26 -04:00
Theodore Ts'o b1ae119729 Add missing return values in error return cases in the ext2fs library.
(Otherwise we return garbage instead of the error code.)
2005-04-09 01:21:21 -04:00
Theodore Ts'o e27b45639a Fix mke2fs so that it writes the root directory
using ext2fs_write_new_inode(), and fix ext2fs_write_new_inode() 
so that it initializes i_extra_isize properly.
2005-03-21 01:02:53 -05:00