This patch includes the changes required to e2fsck to understand the
nlink count changes made in the kernel.
In e2fsck pass 4, when we fetch the actual link count, if it is
exceeds 65,000 we set the link count to 1. We silently fix the
situation where the nlink count of the directory is 1, and there are
fewer than 65,000 subdirectories, since since that can happen
naturally.
Patch originally from CFS, significantly rewritten by Theodore Ts'o.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In order to more accurately count the number of directories, which
with the DIR_NLINKS feature can now be greater than 65,000, change
icount to use a 32-bit counter. This doesn't cost us anything extra
when the icount data structures are stored in memory, since due to
padding for alignment reasons.
If the actual count is greater than 65,500, we return 65,500. This is
because e2fsck doesn't actually need to know the exact count; it only
needs to know if the number of subdirectories is greater than 65,000.
In the future if someone really needs to know the exact number, we
could add a 32-bit interface. One isn't needed now, though.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
No application will ever use the ORPHAN_FS flag, since it only shows
up in kernel memory, but it's been pointed out it was first used in
ext3, and so it should be renamed for accuracy.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
After the fix for resize2fs's inode mover losing in-inode
extended attributes, the regression test I wrote caught
that the attrs were still getting lost on powerpc.
Looks like the problem is that ext2fs_swap_inode_full()
isn't paying attention to whether or not the EA magic is
in hostorder, so it's not recognized (and not swapped)
on BE machines. Patch below seems to fix it.
Yay for regression tests. ;)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext2fs_extent_insert() was copying n-1 of the existing extents when
moving things down to make room for the new extent.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When deleting the last entry in a node, back up the current pointer so
it is always pointing at a valid entry.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add 64-bit block capable routines to inode IO manager. Since fileio.c
does not yet have 64bit support, these routines will not handle 64bit
block numbers correctly yet.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In order to provide 64-bit block support for IO managers an maintain
ABI compatibility with the old API, some new functions need to be
added to struct_io_manger. Luckily, strcut_io_manager has some
reserved space that we can use to add these new functions.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add two new functions which allows the caller to examine the last
directory block entry added to the list, and to drop if it necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a flag which returns the partially completed filesystem object so
e2fsck can print more intelligent error messages.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a block buffer was not supplied and ext2fs_alloc_block() returned
with no errors, it would leak a temporary block buffer.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This is useful for mballoc to align block allocation on the RAID
stripe boundaries.
Signed-off-by: Rupesh Thakare <rupesh@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add vertificaton of the in-inode EA information, and allow in-inode
EA's to have a checksum.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Responsibility for byte swapping the extents information rests with
the low-level extent code, which translates the on-disk extents
information to the abstract extent format. The on-disk format will
eventually get more complicated, in order to add support for 64-bit
block numbers, bit-compressed extents, etc. So to avoid needing to
expose all of that complexity in swapfs.c, the in-memory contents of
i_blocks will not be byte-swapped and will be identical to the on-disk
format.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The major changes were:
* Fix realloc() leak on failure case from Jim Meyering
* Fixed various problems in transaction lock code
* Made transaction_brlock() static
* Added more fine-grained locking features
Moved from svn revision #22080 to #23590
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
DJGPP lacks sys/select.h and sys/un.h; add header checks to be more
portable.
Signed-off-by: Christophe Grenier <grenier@cgsecurity.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This flag allows the caller to promise that it will not try to modify
the block numbers returned by the iterator.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previously we used a hard-coded test where for the Alpha and the IA64,
we used lseek instead of llseek(). Generalize this to whenver
sizeof(long) is the same as sizeof(long long).
It turns out this fixes a FTBFS problem on the x86_64 for Debian,
since dietlibc doesn't provide llseek() on that architecture.
Addresses-Debian-Bug: #459614
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The test_fs flag is an "ok to be used with test kernel code" flag. It
makes it easier for us to determine whether a filesystem should be
mounted using ext4 or not.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
inode_uid() and inode_gid() weren't getting defined on systems that
were not Linux, Hurd, or Masix.
Addresses-Sourceforge-Bug: #1859778
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add some additional checks, primarily in resize2fs and in the rarely
used (and soon to-be-deprecated) e2fsck byte-swap filesystem function.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that e2fsck tries to backup the primary superblock to the backups
when the feature sets ar different, it's important when tune2fs writes
out a changed superblock, that we filter out the
EXT3_FEATURE_INCOMPAT_RECOVER feature to the backup superblocks, since
it will be removed from the primary superblock either when the
filesystem is mounted uncleanly or when journal is replayed.
Addresses-Debian-Bug: #454926
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This addresses a potential security vulnerability where an untrusted
filesystem can be corrupted in such a way that a program using
libext2fs will allocate a buffer which is far too small. This can
lead to either a crash or potentially a heap-based buffer overflow
crash. No known exploits exist, but main concern is where an
untrusted user who possesses privileged access in a guest Xen
environment could corrupt a filesystem which is then accessed by the
pygrub program, running as root in the dom0 host environment, thus
allowing the untrusted user to gain privileged access in the host OS.
Thanks to the McAfee AVERT Research group for reporting this issue.
Addresses CVE-2007-5497.
Signed-off-by: Rafal Wojtczuk <rafal_wojtczuk@mcafee.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We cannot merge a removed directory entry to just arbitrary previous
directory entry. The previous entry must be in the same block. So
really bad things can happen when are deleting the first directory
entry in a block where the last directory entry in the previous
directory block is not in use. We fix this bug by checking to see if
the current entry is not the first one in the block before trying to
merge it to the previous entry.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
On my FC8 install, ismounted.c fails to build because open(O_CREAT) is
used without passing a mode. The following trivial patch fixes it.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add FLEX_BG as a supported feature bit.
Add support to mke2fs to create filesystems with FLEX_BG.
Add support to tune2fs to add (and remove, if it won't break
filesystem consistency) the FLEX_BG feature.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
lib/e2p/feature.c | 2 ++
lib/ext2fs/ext2fs.h | 6 ++++--
misc/mke2fs.c | 7 ++++++-
3 files changed, 12 insertions(+), 3 deletions(-)
The FLEX_BG feature allows the inode table, block bitmap, and inode
bitmaps to be located anywhere in the filesystem. Update e2fsck and
libext2fs's checking code to recognize this.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
e2fsck/super.c | 14 ++++++++++++--
lib/ext2fs/check_desc.c | 15 +++++++++++++--
2 files changed, 25 insertions(+), 4 deletions(-)
All files under $(OBJS) and $(SRCS) should be in alphabetical order
but this is not always the case. Let fix some some of these before
applying new files to the list of $(SRCS).
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
lib/ext2fs/Makefile.in | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
When fgets() function fails, contents of the buffer is undefined. That
is, fgets() return value needs to be checked, to avoid undefined behavior.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add macros to support variable-length group descriptors for ext4.
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Use ext2fs_group_first_block() instead of the open-coded equivalent in
ext2fs_super_and_bgd_loc() and ext2fs_descriptor_block_loc().
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext2fs_dblist_dir_iterate() calls ext2fs_dblist_iterate(), which calls
ext2fs_process_dir_block(), which in turn calls the helper function
db_dir_proc() which calls callback function passed into
ext2fs_dblist_dir_iterate(). At each stage the conventions for
signalling requests to abort the iteration or to signal errors
changes, db_dir_proc() was not properly mapping the abort request back
to ext2fs_dblist_iterate().
Currently db_dir_proc() is ignoring errors (i/o errors or directory
block corrupt errors) from ext2fs_process_dir_block(), since the main
user of ext2fs_dblist_dir_iterate() is e2fsck, for which this is the
correct behavior. In the future ext2fs_dblist_dir_iterate() could
take a flag which would cause it to abort if
ext2fs_process_dir_block() returns an error; however, it's not clear
how useful this would be since we don't have a way of signalling the
exact nature of which block had the error, and the caller wouldn't
have a good way of knowing what percentage of the directory block list
had been processed. Ultimately this may not be the best interface for
applications that need that level of error reporting.
Thanks to Vladimir V. Saveliev <vs@clusterfs.com> for pointing out
this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Any attempt to open a filesystem with s_inode_size set to zero causes
a floating point exception. This is true for e2fsck, dumpe2fs,
e2image, etc. Fix ext2fs_open2() so that it returns the error code
EXT2_ET_CORRUPT_SUPERBLOCK instead of crashing.
Thanks to Dean Bender for reporting this bug.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch instruments the libext2fs unix I/O manager and adds bytes
read/written and data rate to e2fsck -tt pass/overall timing output.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Create new functions ext2fs_{set,get}_{inode,block}_bitmap_range()
which allow programs like e2fsck, dumpe2fs, etc. to get and set chunks
of the bitmap at a time.
Move the representation details of the 32-bit old-style bitmaps into
gen_bitmap.c.
Change calls in dumpe2fs, mke2s, et. al to use the new abstractions.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move the 32-bit specific bitmap code into gen_bitmap.c, and the
high-level interfaces into bitmaps.c. Eventually we'll move the
new-style bitmap code into gen_bitmap64.c, but first we need to
isolate the code with knowledge of the bitmap internals in one place
first.
In this patch we move allocation, free, copy, clear, set_padding, and
fudge_end function into gen_bitmap.c, and make sure that the bitmaps.c
and bitops.c no longer have any knowledge of the bitmap internals.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This changes ext2fs_fast_{mark,unmark,test}_{inode,block}_bitmap() to
be inline functions which calls ext2fs_{mark,unmark,test}_generic_bitmap().
This is part of the preparation to support the new-style bitmaps.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The test in ext2fs_check_desc() is off by one; if the inode table
goes all the way to the last block of the block group, it will
falsely assert that it has extended past it. The last block
of a range is start + len -1, not start + len.
You can create (valid) filesystems that will cause e2fsck to complain
via one of the following mkfs commands:
mkfs.ext3 -F -b 1024 /dev/sdb1 2046000000
mke2fs -j -F -b 4096 -m 0 -N 5217280 /mnt/test/fsfile2 327680
mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024
Addresses-Red-Hat-Bugzilla: #214765
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For some odd geometries*, mkfs will try to allocate inode tables off
the end of the block group and fail, rather than warning that too
many inodes have been requested.
This is because when ext2fs_initialize calculates metadata overhead,
it is only adding in group descriptor blocks and the superblock
if the *last* bg contains them - but the first bg also has all of
the various metadata bits taking up space.
We need to calculate the overhead both for the first block group and
the last block groups separately, since the two different tests need
to know what the overheads are for those two cases, which may be
different.
*for example "mke2fs -b 1024 -m 0 -g 256 -N 3745 fsfile 1024"
(Note, the test here is a little funky; the expected output is
actually a mkfs failure - but a proper failure instead of the
allocator catching the problem at the last minute)
Addresses-Red-Hat-Bugzilla: #241767
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to set t->i_file_acl before we test it in
ext2fs_inode_data_blocks()
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that we are moving to x.y.z version number scheme for maintenance
releases, we ned to change ext2fs_parse_version_string and
blkid_parse_version_string to ignore the second period so we don't
have maintenance releases with a substantially bigger verison number
than the initial x.y release.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
On big-endian systems, while swapping, ext2fs_swap_inode_full() swaps
only 128+extra_isize bytes and the EAs if they are present. Now if inode
N has EAs, (and this is the inode in the "scratch inode") then inode N+1
also carries seems to have them since the "scratch inode" was never
zeroed.
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch changes ext2fs_open() to set EXT2_FLAG_MASTER_SB_ONLY by
default. This avoids some problems in e2fsck (reported by Jim Garlick)
where a corrupt journal can end up writing the bad superblock to the
backups. In general, only e2fsck (after the filesystem is clean),
tune2fs, and resize2fs should change the backup superblocks by default.
Most callers of ext2fs_open() should not be touching anything where the
backups should be touched. So let's change the defaults to avoid
potential problems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There have been reported instances of a filesystem having been mounted
at 2 places at the same time causing a lot of damage to the
filesystem. This patch reserves superblock fields and an INCOMPAT flag
for adding multiple mount protection(MMP) support within the ext4
filesystem itself. The superblock will have a block number
(s_mmp_block) which will hold a MMP structure which has a sequence
number which will be periodically updated every 5 seconds by a mounted
filesystem. Whenever a filesystem will be mounted it will wait for
s_mmp_interval seconds to make sure that the MMP sequence does not
change. To further make sure, we write a random sequence number into
the MMP block and wait for another s_mmp_interval secs. If the
sequence no. doesn't change then the mount will succeed. In case of
failure, the nodename, bdevname and the time at which the MMP block
was last updated will be displayed. tune2fs can be used to set
s_mmp_interval as desired.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Store the RAID stride value when a filesystem is created with a requested
RAID stride, and then use it automatically in resize2fs.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mke2fs is supposed to set the uid/gid ownership of the root directory when
a non-rooot user creates the filesystem. This wasn't working correctly
if the uid/gid was > 16 bits. In additional, debugfs wasn't displaying
large uid/gid's correctly. This patch fixes these two programs.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The l_i_version field is now defined from the old l_i_reserved1 field in
the ext2 inode. This field will be used to store high 32 bits of the
64-bit inode version number.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix a problem byte-swapping fast symlinks inodes that contain extended
attributes.
Addresses Red Hat Bugzilla: #232663
Addresses LTC Bugzilla: #27634
Signed-off-by: "Bryn M. Reeves" <breeves@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add support for using TDB to store the icount data, so we don't run out
of memory when checking really large filesystems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The following patch addresses a memory leak in libext2fs
that occurs when using ext2fs_write_new_inode() on a file system
configured with large inodes.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Prevent floating point precision errors on really big filesystems from
causing the search interpolation algorithm in the icount abstraction
from looping forever.
Addresses Debian Bug: #411838
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>