Place the allocation bitmaps and inode table blocks so they are
adjacent, even in the last flexbg.
Previously, after running "mke2fs -t ext4 DEV 286720", the layout of
the last few block groups would look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262161 (+16)
Inode table at 262177-262432 (+32)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262162 (bg #32 + 17)
Inode table at 262433-262688 (bg #32 + 288)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262163 (bg #32 + 18)
Inode table at 262689-262944 (bg #32 + 544)
Now, they look like this:
Group 32: (Blocks 262145-270336) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262145 (+0), Inode bitmap at 262148 (+3)
Inode table at 262151-262406 (+6)
Group 33: (Blocks 270337-278528) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
Block bitmap at 262146 (bg #32 + 1), Inode bitmap at 262149 (bg #32 + 4)
Inode table at 262407-262662 (bg #32 + 262)
Group 34: (Blocks 278529-286719) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 262147 (bg #32 + 2), Inode bitmap at 262150 (bg #32 + 5)
Inode table at 262663-262918 (bg #32 + 518)
This reduces the free space fragmentation in a freshly created file
system. It also allows the following mke2fs command to succeed:
mke2fs -t ext4 -b 4096 -O ^resize_inode -G $((2**20)) DEV 2130483
(Note that while this allows people to run mke2fs with insanely large
flexbg sizes, this is not a recommended practice, as the kernel may
refuse to resize such a file system while mounted, since it currently
tries to allocate an in-memory data structure based on the size of the
flexbg, and so a file system with a very large flexbg size will cause
the memory allocation to fail. This will hopefully be fixed in a
future kernel release, but if the goal is to force all of the metadata
blocks to be at the beginning of the file system, it's better to use
the packed_meta_blocks configuration parameter in mke2fs.conf.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If a large flex_bg factor is specified and the block allocator was
laying out block or inode bitmaps or inode tables, and collides with
previously allocated metadata (for example the backup superblock or
group descriptors) it would reset the allocator back to the beginning
of the flex_bg instead of continuing past the obstruction.
For example, with "-G 131072" the inode table will hit the backup
descriptors in groups 1, 3, 5, 7, 9 and start interleaving with the
block and inode bitmaps. That results in poorly allocated bitmaps
and inode tables that are interleaved and not contiguous as was
intended for flex_bg:
Group 0: (Blocks 0-32767)
Primary superblock at 0, Group descriptors at 1-2048
Block bitmap 2049 (+2049), Inode bitmap at 133121 (bg #4+2049)
Inode table 264193-264200 (bg #8+2049)
:
:
Group 3838: (Blocks 125763584-125796351) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5887 (bg #0+5887), Inode bitmap 136959 (bg #4+5887)
Inode table 294897-294904 (bg #8 + 32753)
Group 3839: (Blocks 125796352-125829119) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5888 (bg #0+5888), Inode bitmap 136960 (bg #4+5888)
Inode table 5889-5896 (bg #0 + 5889)
Group 3840: (Blocks 125829120-125861887) [INODE_UNINIT, BLOCK_UNINIT]
Block bitmap 5897 (bg #0+5897), Inode bitmap 136961 (bg #4+5889)
Inode table 5898-5905 (bg #0 + 5898)
:
:
Instead, skip the intervening blocks if there aren't too many of them.
That mostly keeps the flex_bg allocations from colliding, though still
not perfect because there is still some overlap with the backups.
This patch addresses the majority of the problem, allowing about 124k
groups to be layed out perfectly, instead of less than 4k groups with
the previous code.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the previous block group's inode table ends at the very end of file
system, wrap around to the beginning of the flex_bg.
This fixes a bug was tickled by:
mke2fs.conf:
frontload = {
features = extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
hash_alg = half_md4
num_backup_sb = 0
packed_meta_blocks = 1
inode_ratio = 4194304
flex_bg_size = 262144
}
mke2fs -T frontload /tmp/foo.img 2T
resize2fs -M /tmp/foo.img
resize2fs -M /tmp/foo.img
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
By using ext2fs_mark_block_bitmap_range2 and/or
ext2fs_block_alloc_stats_range(), we can significantly speed up the
time needed by mke2fs to allocate the inode table.
For example, the CPU time needed to run the command "mke2fs -t ext4
/tmp/foo.img 32T" (where tmpfs was mounted on /tmp) was decreased from
21.7 CPU seconds down to under 1.7 seconds.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Instead of calling ext2fs_numeric_progress_*() directly from closefs.c
and alloc_tables.c, call it via a operations structure which is only
initialized by the one program (mke2fs) which needs it.
This reduces the number of C library symbols needed by boot loader
systems such as yaboot.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 25567a7b0f accidentally removed the initialization for flexbg
and flexbg_size, which affected ext2fs_allocate_group_table() and
ext2fs_allocate_tables(). Replace them.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Filesystems with a blocksize of 1024 have the superblock starting at
block #1. However, the first data block in the superblock is 0 to
simplify the cluster calculations. So we must compensate for this in
a number of places, mostly in the ext2fs library, but also in e2fsck.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This is needed to enable 64-bit mke2fs to work correctly.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: Nick Dokos <nicholas.dokos@hp.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2. The files were incorrectly labelled.
Alex Thomas/Luster has been consulted wrt to the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When trying to find the best place for the inode table in the last
flex block group, use the true size for the flex_bg's portion of the
inode table instead of the worst case required size of the inode table
fragment if the file system is resized. This fixes a corner case
where if the size of the filesystem is just big enough that there is
only room for a single block group in the last flex_bg, and that
partial block group is too small for the full portion of the inode
table, the inode table is placed in the very first block group:
Group 64: (Blocks 2097152-2099199) [INODE_UNINIT, ITABLE_ZEROED]
Checksum 0xd305, unused inodes 8080
Block bitmap at 2097152 (+0), Inode bitmap at 2097168 (+16)
Inode table at 8626-9130 (+4292878770)
^^^^^^^^^
Thanks to Vyacheslav Dubeyko for pointing this out.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext2fs_bg_flag* functions were confusing.
Currently we have this:
void ext2fs_bg_flags_set(ext2_filsys fs, dgrp_t group, __u16 bg_flags);
void ext2fs_bg_flags_clear(ext2_filsys fs, dgrp_t group,__u16 bg_flags);
(_set (unused) sets exactly bg_flags; _clear clears all and ignores bg_flags)
and these, which can twiddle individual bits in bg_flags:
void ext2fs_bg_flag_set(ext2_filsys fs, dgrp_t group, __u16 bg_flag);
void ext2fs_bg_flag_clear(ext2_filsys fs, dgrp_t group, __u16 bg_flag);
A better interface, after the patch below, is just:
ext2fs_bg_flags_zap(fs, group) /* zeros bg_flags */
ext2fs_bg_flags_set(fs, group, flags) /* adds flags to bg_flags */
ext2fs_bg_flags_clear(fs, group, flags) /* clears flags in bg_flags */
and remove the original ext2fs_bg_flags_set / ext2fs_bg_flags_clear.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
With 64-bit file systems, mke2fs can take a long time to do things
other than write inode tables. I exported the mke2fs numeric progress
meter and used it for allocating group tables and the final file
system flush.
Signed-off-by: Valerie Aurora (Henson) <vaurora@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We are using a signed int to store a block number in
ext2fs_allocate_group_table. We don't actually do any computation or
comparisons using it, so it shouldn't cause any bugs, but it's
technically incorrect, and it's possible an overly clever compiler
might do something wrong with it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Some of these could affect filesystems between 2^31 and 2^32-1 blocks.
Thanks to Valerie Aurora Henson for pointing out the problems in
lib/ext2fs/alloc_tables.c, which led me to do a "make gcc-wall" scan
over the source tree.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Disordered inode tables may appear when inode_blocks_per_group is lesser
or equal to the number of groups in a flex group.
This bug can be reproduced with:
mkfs.ext4 -t ext4dev -G512 70G
In that case, you can see with dump2fs that inode tables for groups 510
and 511 are placed just after group 51's inode table instead of being
placed after group 509's inode table.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The offset for both inode bitmaps and inode tables is overshot by one
block causing a hole between the group of bitmaps and inode tables
when initializing a filesystem using mke2fs.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Change the way we allocate bitmaps and inode tables if the FLEX_BG
feature is used at mke2fs time. It places calculates a new offset for
bitmaps and inode table base on the number of groups that the user
wishes to pack together using the new "-G" option. Creating a
filesystem with 64 block groups in a flex group can be done by:
mke2fs -j -I 256 -O flex_bg -G 32 /dev/sdX
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create new ext2fs library inline functions in order to calculate
the starting and ending blocks in a block group.
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
When allocating space for the RAID filesystems with the stride parameter,
place each portion of the group's inode table right up after the superblock
(if present) in order to minimize fragmentation of the freespace.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
rw_bitmaps.c: Fixed signed/unsigned warnings.
fileio.c (ext2fs_file_set_size): Remove unneeded extern from the
function declaration.
dblist.c (make_dblist): Add safety check in case the dblist pointer
passed in is null (in which case, assign it to fs->dblist). Fixed
some signed/unsigned warnings.
bmap.c: Make addr_per_block be of type blk_t to avoid signed/unsigned
warnings.
namei.c (ext2fs_follow_link): Remove uneeded extern from the function
declaration.
get_pathname.c (get_pathname_proc): Use return value from
ext2fs_get_mem, instead of checking if &gp->name is NULL.
dir_iterate.c (ext2fs_process_dir_block):
dblist_dir.c (ext2fs_dblist_dir_iterate): Remove uneeded extern from
the function declaration.
block.c (ext2fs_block_iterate2): If the read_inode call fails, return
the error directly instead of jumping to the cleanup routine, since we
don't need to do any cleanup.
alloc_table.c (ext2fs_allocate_group_table): Make this function take a
dgrp_t for its group argument.
ext2fs.h: Make dgrp_t an __u32 type, and make fs->desc_group_count be
of type dgrp_t.
alloc_tables.c (ext2fs_allocate_group_table): Fix bug so that if the
stride length hits a bad value, we retry the block allocation starting
at the beginning of the block group.
ext2fs.h, bb_inode.c, block.c, bmove.c, dir_iterate.c, expanddir.c,
ext2fsP.h, read_bb.c: Change blkcnt_t to be e2_blkcnt_t to avoid
collision with LFS API.
inode.c (ext2fs_open_inode_scan): Initialize the group variables
so that we don't need to call get_next_blockgroup() the first
time around. Saves a bit of time, and prevents us from
needing to assign -1 to current_group (which is an unsigned
value).
icount.c (insert_icount_el): Cast the estimated number of inodes
from a float to an ino_t.
alloc.c, alloc_tables.c, badlbocks.c, bb_compat.c, bb_inode.c,
bitmaps.c, bitops.c, block.c, bmap.c, bmove.c, brel_ma.c,
check_desc.c, closefs.c, cmp_bitmaps.c, dblist.c,
dblist_dir.c, dir_iterate.c, dirblock.c, dupfs.c, expanddir.c,
ext2fs.h, fileio.c, freefs.c, get_pathname.c, getsize.c,
icount.c, initialize.c, inline.c, inode.c, irel_ma.c,
ismounted.c, link.c, lookup.c, mkdir.c, namei.c, native.c,
newdir.c, openfs.c, read_bb.c, read_bb_file.c, rs_bitmap.c,
rw_bitmaps.c, swapfs.c, test_io.c, tst_badblocks.c,
tst_getsize.c, tst_iscan.c, unix_io.c, unlink.c, valid_blk.c,
version.c: If EXT2_FLAT_INCLUDES is defined, then assume all
of the ext2-specific header files are in a flat directory.
block.c, bmove.c, dirblock.c, fileio.c: Explicitly cast
all assignments from void * to be compatible with C++.
closefs.c (ext2fs_flush): Add a call to io_channel_flush() to
make sure the contents of the disk are flushed to disk.
dblist.c (ext2fs_add_dir_block): Change new to be new_entry to
avoid C++ namespace clash.
bitmaps.c (ext2fs_copy_bitmap): Change new to be new_map to
avoid C++ namespace clash.
ext2fs.h, bb_inode.c, block.c, bmove.c, brel.h, brel_ma.c,
irel.h, irel_ma.c, dblist.c, dblist_dir.c, dir_iterate.c,
ext2fsP.h, expanddir.c, get_pathname.c, inode.c, link.c,
unlink.c: Change private to be priv_data (to avoid C++
namespace clash)
dblist.c (ext2fs_get_num_dirs): Make ext2fs_get_num_dirs more paranoid
about validating the directory counts from the block group
information.
all files: Don't include stdlib.h anymore; include it in ext2_fs.h,
since that file requires stdlib.h
ChangeLog, Makefile.in, dirinfo.c:
dirinfo.c (e2fsck_add_dir_info): Use ext2fs_get_num_dirs instead of
e2fsck_get_num_dirs, which has been removed.
Makefile.in (PROGS): Remove @EXTRA_PROGS@, since we don't want to
compile and install flushb.
ChangeLog, configure.in:
Remove @EXTRA_PROGS@, since we aren't using it in 2fsck/Makefile.in anymore
ChangeLog, Makefile.in:
Install debugfs in /sbin, instead of /usr/sbin.
libext2fs.texinfo:
Update version string to be 1.12
Makefile.in:
Fix bug in find script which made the exclusion list, where a '-' was
missing from an -name option.
alloc.c (ext2fs_alloc_block): New function which allocates a
block and updates the filesystem accounting records
appropriately.
ext2_err.et.in: Added new error codes: EXT2_NO_MEMORY,
EXT2_INVALID_ARGUMENT, EXT2_BLOCK_ALLOC_FAIL, EXT2_INODE_ALLOC_FAIL,
EXT2_NOT_DIRECTORY
Change various library files to use these functions instead of EINVAL,
ENOENT, etc.
ChangeLog, pass1.c, pass3.c:
pass3.c (get_lost_and_found): Check error return of
EXT2_FILE_NOT_FOUND instead of ENOTDIR
pass1.c (pass1_check_directory): Return EXT2_NO_DIRECTORY instead of
ENOTDIR
expect.icount:
Change expected error string to be "Invalid argument passed to ext2 library"
instead of just "Invalid argument"
block.c (ext2fs_block_iterate2): Use retval which is a errcode_t type.
bitmaps.c (make_bitmap): Use size_t instead of int where appropriate.
bb_inode.c (set_bad_block_proc): Add #pragma argsused for Turbo C.
alloc.c (ext2fs_new_inode): Use ino_t instead of int for the group number.
get_pathname.c: Use ino_t instead of int where appropriate.
ext2fs.h: Make the magic structure element be errcode_t instead of int.
alloc.c alloc_tables.c badblocks.c bb_compat.c bb_inode.c
bitmaps.c block.c bmove.c brel_ma.c check_desc.c closefs.c
cmp_bitmaps.c dblist.c dblist_dir.c dir_iterate.c dirblock.c
dupfs.c expanddir.c freefs.c get_pathname.c icount.c
initialize.c inline.c inode.c irel_ma.c link.c llseek.c
lookup.c mkdir.c namei.c newdir.c read_bb.c read_bb_file.c
rs_bitmap.c rw_bitmaps.c swapfs.c test_io.c tst_badblocks.c
tst_iscan.c unix_io.c unlink.c valid_blk.c version.c: Add an
#ifdef for HAVE_UNISTD_H
bmove.c (ext2fs_move_blocks): New function which takes a bitmap of
blocks which need to be moved, and moves those blocks to another
location in the filesystem.
rs_bitmap.c (ext2fs_resize_generic_bitmap): When expanding a bitmap,
make sure all of the new parts of the bitmap are zero.
bitmaps.c (ext2fs_copy_bitmap): Fix bug; the destination bitmap wasn't
being returned to the caller.
alloc_tables.c (ext2fs_allocate_group_table): Add new function
ext2fs_allocate_group_table() which sets the group tables for a
particular block group. The relevant code was factored out of
ext2fs_allocate_tables().
dblist.c (make_dblist): Adjust the initial size of the directory block
list to be a bit more realize (ten plus twice the number of
directories in the filesystem).
Check in interim work.