In the loop in ext2fs_get_free_blocks2, we ask the bitmap if there's a
range of free blocks starting at "b" and ending at "b + num - 1".
That quantity is the number of the last block in the range. Since
ext2fs_blocks_count() returns the number of blocks and not the number
of the last block in the filesystem, the check is incorrect.
Put in a shortcut to exit the loop if finish > start, because in that
case it's obvious that we don't need to reset to the beginning of the
FS to continue the search for blocks. This is needed to terminate the
loop because the broken test meant that b could get large enough to
equal finish, which would end the while loop.
The attached testcase shows that with the off-by-one error, it is
possible to throw e2fsck into an infinite loop while it tries to
find space for the inode table even though there's no space for one.
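
The difference between the two comparisons can be sketched as below.
This is a minimal illustration, not the libext2fs code: the helper
names are hypothetical, and blocks_count stands in for the value that
ext2fs_blocks_count() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not part of libext2fs.
 * Valid block numbers run from 0 to blocks_count - 1. */
bool range_past_end_buggy(uint64_t b, uint64_t num, uint64_t blocks_count)
{
	assert(num > 0);
	/* Compares the last block number against a count: off by one. */
	return b + num - 1 > blocks_count;
}

bool range_past_end_fixed(uint64_t b, uint64_t num, uint64_t blocks_count)
{
	assert(num > 0);
	/* The last valid block number is blocks_count - 1. */
	return b + num - 1 >= blocks_count;
}
```

With 100 blocks, a 4-block range starting at block 97 ends at block
100, which does not exist; only the second helper reports it as past
the end.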
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since fs->group_desc_count is the number of block groups, the number
of the last group is always one less than this count. Fix the bounds
check to reflect that.
This flaw shouldn't have any user-visible side effects, since the
block bitmap test based on last_grp later on can handle overbig block
numbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of e2fsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
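
The interception idea can be sketched with a toy model. Everything
below is an assumption made for illustration (the toy_ names, the hook
signature, the 8-entry maps); it is not the e2fsck or libext2fs
implementation.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs;
/* Hook signature is an assumption made for this sketch. */
typedef int (*toy_alloc_hook)(struct toy_fs *fs, size_t *ret);

struct toy_fs {
	unsigned char fs_map[8];     /* map read from disk, may be wrong */
	unsigned char found_map[8];  /* map built from what the checker saw */
	toy_alloc_hook hook;         /* NULL means "use fs_map" */
};

/* Marks and returns the first clear entry of map; -1 when full. */
static int toy_take_first_free(unsigned char *map, size_t n, size_t *ret)
{
	for (size_t i = 0; i < n; i++) {
		if (!map[i]) {
			map[i] = 1;
			*ret = i;
			return 0;
		}
	}
	return -1;
}

/* Checker-side hook: allocate from the found map only. */
int toy_alloc_from_found_map(struct toy_fs *fs, size_t *ret)
{
	return toy_take_first_free(fs->found_map, sizeof(fs->found_map), ret);
}

/* Library-side entry point: every allocation funnels through here,
 * so an installed hook sees direct and indirect requests alike. */
int toy_new_block(struct toy_fs *fs, size_t *ret)
{
	assert(fs != NULL && ret != NULL);
	if (fs->hook)
		return fs->hook(fs, ret);
	return toy_take_first_free(fs->fs_map, sizeof(fs->fs_map), ret);
}
```

Because the hook sits at the single allocation entry point, no bitmap
pointers have to be swapped around the individual call sites.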
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we call ext2fs_close_free at the end of main(), we need to supply
the address of ctx->fs, because the subsequent e2fsck_free_context
call will try to access ctx->fs (which is now set to a freed block) to
see if it should free the directory block list. This is clearly not
desirable, so fix the problem.
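
A minimal sketch of why the address matters, using stand-in types
rather than the real ext2_filsys and e2fsck context. The toy_ names
are hypothetical; only the pointer-to-pointer shape is taken from
ext2fs_close_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in types for illustration only; the real ext2_filsys and
 * e2fsck context are far larger. */
struct toy_fs { int dummy; };
struct toy_ctx { struct toy_fs *fs; };

/* Frees *fs_ptr and clears the caller's pointer, mirroring the
 * pointer-to-pointer shape of ext2fs_close_free(). */
void toy_close_free(struct toy_fs **fs_ptr)
{
	assert(fs_ptr != NULL);
	free(*fs_ptr);
	*fs_ptr = NULL;
}

/* Later cleanup can now test ctx->fs safely. */
int toy_ctx_has_fs(const struct toy_ctx *ctx)
{
	return ctx->fs != NULL;
}
```

Passing the address of a local copy instead would leave ctx->fs
dangling, which is the situation described above.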
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're about to iterate the blocks of a block-map file, we need to
write the inode out to disk if it's dirty because block_iterate3()
will re-read the inode from disk. (In practice this won't happen
because nothing dirties block-mapped inodes before the iterate call,
but we can program defensively).
More importantly, we need to re-read the inode after the iterate()
operation because it's possible that mappings were changed (or erased)
during the iteration. If we then dirty or clear the inode, we'll
mistakenly write the old inode values back out to disk!
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the bitmaps are known to be unreadable, don't bother clearing them;
just mark fsck to restart itself after pass 5, by which time the
bitmaps should be fixed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If e2fsck knows the bitmaps are bad at the exit (probably because they
were bad at the start and have not been fixed), don't offer to
recreate the journal because doing so causes e2fsck to abort a second
time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there's a problem with the inode scan during pass 1b, report the
inode that we were trying to examine when the error happened, not the
inode that just went through the checker.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Allow set_inode_field's bmap command in debugfs to allocate blocks,
which enables us to allocate blocks for indirect blocks and internal
extent tree blocks. True, we could do this manually, but that seems
like unnecessary bookkeeping activity for humans.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a command that will dump an entire inode's space in hex.
[ Modified by tytso to add a description to the man page, and to add
the more formal command name, inode_dump, in addition to short
command name of "idump". ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, e4defrag avoids increasing file fragmentation by comparing
the number of runs of physical extents of both the original and the
donor files. Unfortunately, there is a bug in the routine that counts
physical extents, since it doesn't look at the logical block offsets
of the extents. Therefore, a file whose blocks were allocated in
reverse order will be seen as only having one big physical extent, and
therefore will not be defragmented.
Fix the counting routine to consider logical extent offset so that we
defragment backwards-allocated files. This could be problematic if we
ever gain the ability to lay out logically sparse extents in a
physically contiguous manner, but presumably one wouldn't call defrag
on such a file.
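
The counting rule can be sketched as follows. This is an illustration
under assumptions, not the e4defrag routine: the toy_extent struct is
hypothetical, and the list is assumed to be given in physical block
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record, listed in physical block order. */
struct toy_extent {
	uint64_t logical;   /* first logical block */
	uint64_t physical;  /* first physical block */
	uint64_t len;       /* length in blocks */
};

/* Old behaviour: neighbours merge whenever they touch physically. */
size_t toy_count_runs_physical_only(const struct toy_extent *ext, size_t n)
{
	size_t runs = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (i == 0 ||
		    ext[i - 1].physical + ext[i - 1].len != ext[i].physical)
			runs++;
	return runs;
}

/* Fixed behaviour: neighbours merge only when they are contiguous
 * both physically and logically. */
size_t toy_count_runs(const struct toy_extent *ext, size_t n)
{
	size_t runs = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i > 0 &&
		    ext[i - 1].physical + ext[i - 1].len == ext[i].physical &&
		    ext[i - 1].logical + ext[i - 1].len == ext[i].logical)
			continue;  /* extends the current run */
		runs++;
	}
	return runs;
}
```

For a three-block file allocated in reverse order, the first counter
sees a single run while the second sees three, so the file becomes a
candidate for defragmentation.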
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It's only necessary to build tst_libext2fs when running "make check".
Also make sure the links of the tst_* programs are done with
$(ALL_LDFLAGS).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Most systems have a backwards compatibility symlink in
/usr/include/syscall.h to /usr/include/sys/syscall.h, but
sys/syscall.h is the documented location of the header file. Fix two
locations where we were using <syscall.h> instead of <sys/syscall.h>.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There were other protections which would prevent a buffer overflow
from happening, but we should fix this nevertheless.
Addresses-Coverity-Bug: #1225003
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The single quote character must not be the first character on a
line, or else it can be mistaken for a macro call.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As an object lesson in why autoreconf is fundamentally unsafe, the
newer version of nls.m4 no longer handles @MKINSTALLDIRS@. So add
this back, since our Makefiles depend on it.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add the mke2fs.conf configuration option which causes the hugefiles to
be aligned to the beginning of the disk. This is important if the
reason for aligning the hugefiles is to support hard-drive specific
features such as Shingled Magnetic Recording (SMR).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After a journal replay, we close and reopen the file system so that
any changes in the superblock can get reflected in the libext2fs's
internal data structures. We need to save the flags passed to
ext2fs_open() that we used when we originally opened the file system.
Otherwise we will end up not being able to repair a file system which
requires a journal replay and which has bigalloc enabled or which has
more than 2**32 blocks; e2fsck will abort with the error message:
fsck.ext4: Filesystem too large to use legacy bitmaps while trying to re-open
Addresses-Debian-Bug: 744953
Cc: Андрей Василишин <a.vasilishin@kpi.ua>
Cc: Jon Severinsson <jon@severinsson.net>
Cc: 744953@bugs.debian.org
In mke2fs, if the flex_bg count is too large relative to the filesystem
block count, an unmountable ext4 filesystem is created whose metadata
lies at a block offset outside the filesystem (Case1). Moreover, such a
large flex_bg count causes an unintended metadata layout
(bmap-imap-itable-bmap-imap-itable .. in the block group) (Case2).
To fix these issues and keep a healthy flex_bg layout, disallow creating
ext4 with a flex_bg count that is obviously too large for the filesystem
block count.
Steps to reproduce:
(Case1)
1.
# mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483
2.
# mount -t ext4 DEV MP
mount: wrong fs type, bad option, bad superblock on /dev/sdb4,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
3.
# dumpe2fs DEV
...
Block count: 2130483
...
Flex block group size: 1048576
...
Group 65: (Blocks 2129920-2130482) [INODE_UNINIT]
Checksum 0x4cb3, unused inodes 8080
Block bitmap at 67 (bg #0 + 67), Inode bitmap at 1048643 (bg #32 + 67)
Inode table at 2129979-2130483 (+59)
^^^^^^^ 2130483 is out of FS!
65535 free blocks, 8080 free inodes, 0 directories, 8080 unused inodes
Free blocks:
Free inodes: 525201-533280
(Case2)
1.
# mke2fs -t ext4 -G 2147483648 DEV 3145728
2.
# debugfs -R stats DEV
...
Block count: 786432
...
Flex block group size: 2147483648
...
Group 0: block bitmap at 193, inode bitmap at 194, inode table at 195
...
Group 1: block bitmap at 707, inode bitmap at 708, inode table at 709
...
Group 2: block bitmap at 1221, inode bitmap at 1222, inode table at 1223
...
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
With a profile, we can set the flex_bg count only up to 2^30, because
get_int_from_profile can handle values only up to 2^31-1.
Add get_uint_from_profile to read an unsigned int value so that mke2fs
with a profile can handle a flex_bg count of up to 2^31, the same as
the -G option.
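
The arithmetic behind those two limits can be checked with a small
helper. It is hypothetical (not part of mke2fs) and assumes the common
case of a 32-bit int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: largest power of two that is <= max. */
unsigned long long largest_pow2_at_most(unsigned long long max)
{
	unsigned long long p = 1;

	assert(max >= 1);
	while (p <= max / 2)
		p *= 2;
	return p;
}
```

Since the flex_bg count must be a power of 2, a signed int tops out at
largest_pow2_at_most(INT_MAX), while an unsigned int reaches
largest_pow2_at_most(UINT_MAX), one power of two higher.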
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The mke2fs -G option allows the root user to set the flex_bg count (a
power of 2). However, ext4 gets a bad metadata layout if we pass a value
greater than or equal to 2^32 to mke2fs -G, because of the 32-bit shift
operation in ext2fs_allocate_group_table().
Also, the maximum block group count of ext4 is 2^32 - 1 (ext4_group_t
s_groups_count), so disallow a flex_bg count of 2^32 or more.
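
A check of this kind could look like the sketch below. The function
names are hypothetical and this is not the mke2fs code; it only encodes
the two conditions stated above (a power of 2, and below 2^32).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical validator, not the mke2fs code: accept a flex_bg
 * count only if it is a power of two below 2^32. */
bool toy_flex_bg_count_ok(unsigned long long count)
{
	bool is_pow2 = count != 0 && (count & (count - 1)) == 0;

	return is_pow2 && count < (1ULL << 32);
}

/* Base-2 logarithm of an accepted count. */
unsigned int toy_flex_bg_log2(unsigned long long count)
{
	unsigned int log = 0;

	assert(toy_flex_bg_count_ok(count));
	while (count > 1) {
		count >>= 1;
		log++;
	}
	return log;
}
```

Under these two conditions the largest accepted count is 2^31, and the
value 4294967296 used in the reproduction steps is rejected.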
Steps to reproduce:
# mke2fs -t ext4 -G 4294967296 DEV
# dumpe2fs DEV
...
Flex block group size: 1 <----- flex_bg is 1!
...
Group 0: (Blocks 0-32767)
Checksum 0x4afd, unused inodes 7541
Primary superblock at 0, Group descriptors at 1-1
Reserved GDT blocks at 2-59
Block bitmap at 60 (+60), Inode bitmap at 61 (+61)
Inode table at 62-533 (+62)
32228 free blocks, 7541 free inodes, 2 directories, 7541 unused inodes
Free blocks: 540-32767
Free inodes: 12-7552
Group 1: (Blocks 32768-65535) [INODE_UNINIT]
Checksum 0xc890, unused inodes 7552
Backup superblock at 32768, Group descriptors at 32769-32769
Reserved GDT blocks at 32770-32827
Block bitmap at 32828 (+60), Inode bitmap at 32829 (+61)
Inode table at 32830-33301 (+62)
32234 free blocks, 7552 free inodes, 0 directories, 7552 unused inodes
Free blocks: 33302-65535
Free inodes: 7553-15104
...
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: "Darrick J. Wong" <darrick.wong@oracle.com>
If a large flex_bg factor was specified and the block allocator, while
laying out block or inode bitmaps or inode tables, collided with
previously allocated metadata (for example the backup superblock or
group descriptors), it would reset the allocator back to the beginning
of the flex_bg instead of continuing past the obstruction.
For example, with "-G 131072" the inode table will hit the backup
descriptors in groups 1, 3, 5, 7, 9 and start interleaving with the
block and inode bitmaps. That results in poorly allocated bitmaps
and inode tables that are interleaved and not contiguous as was
intended for flex_bg:
Group 0: (Blocks 0-32767)
Primary superblock at 0, Group descriptors at 1-2048
Block bitmap 2049 (+2049), Inode bitmap at 133121 (bg #4+2049)
Inode table 264193-264200 (bg #8+2049)
:
:
Group 3838: (Blocks 125763584-125796351) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5887 (bg #0+5887), Inode bitmap 136959 (bg #4+5887)
Inode table 294897-294904 (bg #8 + 32753)
Group 3839: (Blocks 125796352-125829119) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5888 (bg #0+5888), Inode bitmap 136960 (bg #4+5888)
Inode table 5889-5896 (bg #0 + 5889)
Group 3840: (Blocks 125829120-125861887) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5897 (bg #0+5897), Inode bitmap 136961 (bg #4+5889)
Inode table 5898-5905 (bg #0 + 5898)
:
:
Instead, skip the intervening blocks if there aren't too many of them.
That mostly keeps the flex_bg allocations from colliding, though it is
still not perfect because there is still some overlap with the backups.
This patch addresses the majority of the problem, allowing about 124k
groups to be laid out perfectly, instead of fewer than 4k groups with
the previous code.
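
The "continue past the obstruction" idea can be sketched with a toy
first-fit search. This is not ext2fs_allocate_group_table(): the
function name and the byte-per-block map are assumptions, and unlike
the patch it always skips the obstruction, however large.

```c
#include <assert.h>
#include <stddef.h>

/* Toy allocator: used[i] is nonzero when block i is taken.
 * Returns the first index >= start where len consecutive blocks
 * are free, or nblocks if there is no such run. */
size_t toy_find_free_run(const unsigned char *used, size_t nblocks,
			 size_t start, size_t len)
{
	size_t b = start;

	assert(used != NULL && len > 0);
	while (b < nblocks && len <= nblocks - b) {
		size_t i = 0;

		while (i < len && !used[b + i])
			i++;
		if (i == len)
			return b;   /* found a clean run */
		b += i + 1;         /* resume just past the obstruction */
	}
	return nblocks;
}
```

The search position only ever moves forward, so a collision never
sends the allocator back to the start of the flex_bg.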
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently, if the user did not specify the lazy_itable_init option, we
rely on information from the ext4 module exported via the sysfs
interface. However, if the ext4 module is not loaded, the option will
not be enabled even though the kernel might support it.
With this commit we set the default according to the kernel version;
however, we still allow it to be set manually via an extended option or
to be enabled when the ext4 module advertises that it supports this
feature.
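
The decision can be sketched as below. The helper names are
hypothetical, and the minimum version is left as a parameter because
this message does not state which kernel release is required; this is
not the mke2fs code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compares a "major.minor.patch" release string
 * (as found in uname's release field) against a minimum version.
 * Missing trailing fields count as 0; an unparsable string fails. */
int toy_kernel_at_least(const char *release, int maj, int min, int patch)
{
	int r_maj = 0, r_min = 0, r_patch = 0;

	assert(release != NULL);
	if (sscanf(release, "%d.%d.%d", &r_maj, &r_min, &r_patch) < 1)
		return 0;
	if (r_maj != maj)
		return r_maj > maj;
	if (r_min != min)
		return r_min > min;
	return r_patch >= patch;
}

/* Default is on when the kernel is new enough or when the loaded
 * ext4 module advertises the feature. */
int toy_lazy_itable_default(const char *release, int maj, int min,
			    int patch, int module_advertises)
{
	return module_advertises ||
	       toy_kernel_at_least(release, maj, min, patch);
}
```

An explicit extended option from the user would still override
whatever default this returns.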
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>