The crc32c implementation in the kernel has been refactored a bit to
reduce the amount of code that needs to be maintained, and to speed up
tune2fs/e2fsck on PowerPC by 5-10%. Port the crc32c changes over, and
provide a crc32_be so that we can remove the duplicate functionality
from e2fsck. Also drop crc32c_be and crc32_le since neither got used.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add metadata checksumming to the list of supported features.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Change the block group algorithm to use the same algorithm as the rest
of the metadata_csum. This mostly involves providing a helper
function to tell if group descriptors should have checksums set or
verified, and modifying the gdt checksum code to use the correct
algorithm.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Record the type of checksum algorithm we're using for metadata in the
superblock, in case we ever want/need to change the algorithm.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Calculate and verify the superblock checksums. Each copy of the
superblock records the number of the group it's in and the FS UUID, so
we can simply checksum the whole block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Calculate and verify the checksum for separate (i.e. not in the inode)
extended attribute blocks; the checksum lives in the header.
[ Merged in change from Tao so that we always use the fs checksum seed
for the xattr blocks. ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Introduce small structures for recording directory tree checksums, and
some API changes to support writing out directory blocks with
checksums.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Verify and calculate checksums of htree internal node blocks.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Verify and calculate extent tree block checksums when processing
filesystems.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Provide a field in the block group descriptor to store inode bitmap
checksum, and some helper functions to calculate and verify it.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch adds the ability for the libext2fs functions to read and
write the inode checksum.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Precompute the FS UUID checksum seed that is used for all metadata
checksumming operations and store it in ext2_filsys.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Define flags and extend ext4 structure definitions to support metadata
checksumming. Ted Ts'o covered many of these fields in an earlier
patch, but there are more required changes to the disk layout.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Instead of calling ext2fs_numeric_progress_*() directly from closefs.c
and alloc_tables.c, call it via a operations structure which is only
initialized by the one program (mke2fs) which needs it.
This reduces the number of C library symbols needed by boot loader
systems such as yaboot.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add --{en,dis}able-mmp options for configure, default to enabled.
Also make tools fail gracefully in the event of encoutering a filesystem
with MMP enabled when the tools were compiled with --disable-mmp
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Currently EXT2_LIB_FEATURE_INCOMPAT_SUPP is #defined twice once with
EXT2_FEATURE_INCOMPAT_COMPRESSION and once without depending on the
state of ENABLE_COMPRESSION
Change this to use an intermediate symbol so that the definition of
EXT2_LIB_FEATURE_INCOMPAT_SUPP doesn't change as other optional fetures
are added.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The creation of inline wrappers ext2fs_open_file() and ext2fs_stat()
in commit c859cb1de0 in ext2fs.h caused
difficulties with the use of headers, since the headers for open64()
and stat64() may already be included (and skip the declaration of the
64-bit variants) before ext2fs.h is ever read. There is no real way
to solve the missing prototypes and resulting compiler warnings inside
ext2fs.h.
Since ext2fs_open_file() and ext2fs_stat() are not performance
critical operations, they do not need to be inline functions at all,
and the needed function headers can be handled properly in one file.
Similarly, posix_memalloc() was having difficulties with headers, and
was being defined in ext2fs.h, but it is now only being used by a
single file, so move the required header there.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a new function, ext2fs_get_dio_alignment(), which returns the
alignment requirements for direct I/O. This way we can factor out the
code from MMP and the Unix I/O manager. The two modules weren't
consistently calculating the alignment factors, and in particular MMP
would sometimes use a larger alignment factor than was strictly
necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The lack of 32-bit support was causing febootstrap to crash since it
wasn't passing EXT2_FLAG_64BITS when opening the file system, so we
were still using the legacy bitmaps.
Also add support for bigalloc bitmap into the ffz functions.
Addresses-Red-Hat-Bugzilla: #808421
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext2fs_add_journal_inode() function calls
ext2fs_check_mount_point(), which can fail if /etc/mtab is missing.
This causes mke2fs to fail in the middle of the file system format
process; mke2fs calls ext2fs_check_mount_point() already (and has
appropriate fallbacks that calls fails), so add a flag so that mke2fs
can request ext2fs_add_journal_inode() to skip trying to call
e2fsck_check_mount_point().
Addresses-Sourceforge-Bug: #3509398
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The replica is a feature which stores multiple copies of the key
metadata blocks so a single block failure in failure-prone media
(read: certain types of flash storage) doesn't take out the entire
file system.
Discussion on the upstream list proved not to be very positive on this
feature; the arguments were that it added complexity that wasn't
warrented, since common practice in industry is to insist on reliable
media, and if media is unreliable, you're kind of toast anyway (unless
the file system is being used as the back-end store of a cluster file
system where checksuming and data replication is happening above the
local disk file system level). So, this feature is being developed
out of tree.
We reserve the code points so that other people won't accidentally
step on them. Since it's not upstream, it's a soft reservation, but
it's not like we have any shortage of RO_COMPAT features. We are a
bit more tight on reserved inodes, but EXT2_BOOT_LOADER_INO and
EXT2_UNDEL_DIR_INO are not currently used anywhere, and
EXT2_EXCLUDE_INO is a reservation for another out-of-tree feature.
There are no features currently being discussed which require a
reserved inode, but if a need were to arise, we can claw back code
point reservations that were never used or not in tree, as those will
always be considered lower priority than in-tree features.
Cc: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a function to return the inode number of an open file.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This feature is especially useful for better understanding how e2fsprogs
tools (mainly e2fsck) treats bitmaps and what bitmap backend can be most
suitable for particular bitmap. Backend itself (if implemented) can
provide statistics of its own as well.
[ Changed to provide basic statistics when enabled with the
E2FSPROGS_BITMAPS_STATS environment variable -- tytso]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that we have multiple backend implementations of the bitmap code,
this commit teaches e2fsck to use either the most appropriate backend
for each use case.
Since we don't know for sure if we will get it all right, the default
choices can be overridden via e2fsck.conf. The various definitions
are shown here, with the current defaults (which may change as we add
more bitmap implementations and as learn what works better).
; EXT2FS_BAMP64_BITARRAY is 1
; EXT2FS_BMAP64_RBTREE is 2
; EXT2FS_BMAP64_AUTODIR is 3
[bitmaps]
inode_used_map = 2 ; pass1
inode_dir_map = 3 ; pass1
inode_reg_map = 2 ; pass1
block_found_map = 2 ; pass1
inode_bad_map = 2 ; pass1
inode_imagic_map = 2 ; pass1
block_dup_map = 2 ; pass1
block_ea_map = 2 ; pass1
inode_link_info = 2 ; pass1
inode_dup_map = 2 ; pass1b
inode_done_map = 3 ; pass3
inode_loop_detect = 3 ; pass3
fs_bitmaps = 2
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This backend type will automatically switch between the bitarray and
the rbtree backend based on the number of directories in the file
system.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For a long time we had a bitarray backend for storing filesystem
metadata bitmaps, however today this approach might hit its limits with
todays huge data storage devices, because of its memory utilization.
Bitarrays stores bitmaps as ..well, as bitmaps. But this is in most
cases highly unefficient because we need to allocate memory even for the
big parts of bitmaps we will never use, resulting in high memory
utilization especially for huge filesystem, when bitmaps might occupy
gigabytes of space.
This commit adds another backend to store bitmaps. It is based on
rbtrees and it stores just used extents of bitmaps. It means that it can
be more memory efficient in most cases.
I have done some limited benchmarking and it shows that rbtree backend
consumes approx 65% less memory that bitarray on 312GB filesystem aged
with Impression (default config). This number may grow significantly
with the filesystem size, but also it may be a lot lower (even negative)
if the inodes are very fragmented (need more benchmarking).
This commit itself does not enable the use of rbtree backend.
[ Simplified the code by avoiding unneeded memory allocation and
deallocation of del_ext. In addition, fixed a bug discovered by the
tst_bitmaps tests: rb_unamrk_bmap() must return true if the bit was
previously set in bitmap, and zero otherwise -- tytso ]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This allows a program to control the bitmap backend implementation
that will get used without needing to change the current library API.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Quota support can be enabled using --enable-quota. There are still
some buglets that we need to fix up before it can be considered 100%
supported, so let's disable it for the 1.42 release.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 6b56f3d92d introduced the use of HAVE_STAT64 without arranging
that it be defined in configure.in. Previously ext4.h used
HAVE_OPEN64, but apparently there are (broken) platforms that have
open64() but not stat64(). Go figure.
We do need to consistently use a single test for ext2fs_stat(),
ext2fs_fstat(), and struct ext2fs_struct_stat, or we could end up
passing a struct stat64 to a fstat() system call, or some such. I've
elected to use HAVE_FSTAT64 because: (a) it's already defined in the
configure script, and (b) if we ever come across a really broken
platform that defines fstat64() but not stat64(), we can always
emulate stat64() using open64() followed by a fstat64().
This commit fixed a bug whose symptoms were that mke2fs would not work
if given a file > 2GB on 32-bit platforms.
Addresses-Debian-Bug: #647245
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext2fs_file_acl_block() and ext2fs_set_file_acl_block() needs to
only check i_file_acl_high if the 64-bit flag is set. This is needed
because otherwise we will run into problems on Hurd systems which
actually use that field for h_i_mode_high.
This involves an ABI change since we need to pass ext2_filsys to these
functions. Fortunately these functions were first included in the
1.42-WIP series, so it's OK for us to change them now. (This is why
we have 1.42-WIP releases. :-)
Addresses-Sourceforge-Bug: #3379227
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Remove unused variables, places where 'return' was used with no value
in a non-void function, missing function declarations, etc. Don't
assume that all systems have quotactl(), and use <sys/quota.h> if it
exists to define the quotactl interfaces.
One of the unused variables also got rid of a non-portable use of
PATH_MAX.
Cc: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext2fs.h now calls open() so it should include the headers needed
for this system call as well.
Addresses-Red-Hat-Bugzilla: #742147
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix several minor errors in structure definitions, the byteswap code,
and Makefiles that result from merging the crc32c and initial parts of
the metadata checksumming patchset.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Multi-mount protection is feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Several compiler errors are quieted:
- zero-length gnu_printf format string
- unused variable
- uninitalized variable (though it isn't actually used for anything)
- fixed a bug in ext2fs_stat() if stat64() does not exist
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This adds new APIs: ext2fs_flush2 and ext2fs_close2 which take an
extra 'int flags' parameter.
This allows us to pass in an EXT2_FLAG_FLUSH_NO_SYNC flag which avoids
fsync'ing the filesystem when closing it. For the case we have in
mind where we are just constructing a throwaway ext2 filesystem in a
file in order to boot a VM, this saves over 5 seconds during the boot
process and avoids many unnecessary disk writes.
Existing code using ext2fs_flush and ext2fs_close remains unaffected
by this change.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Code to count the number of blocks in the last partial
group is cut and pasted around the e2fsprogs codebase
a few times.
Making this a helper function should improve matters.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In many places we are using #ifdef HAVE_OPEN64 to determine if we can
use open64() but that's ugly. This commit creates two new helpers
ext2fs_open_file() for open() and ext2fs_stat() for stat(). Also we need
new typedef ext2fs_struct_stat for struct stat.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add a slicing-by-8 CRC32c implementation for metadata checksumming.
Adapted from Bob Pearson's kernel patch.
Also added a self-test mechanism so we can verify that the crc32c
implementation is working correctly.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The dump program relies on fs->frag_size and the
EXT2_FRAGS_PER_BLOCK() macro. Kind of silly for it to do so, but it's
part of the kludgy way the dump program (which was originally written
for the BSD FFS was ported over to support ext2/3.) Given how it
makes assumptions about the ext2/3/4 file system being similar to the
BSD FFS, it's a bit of a miracle it works for ext4 --- or at least
appears to work...
Addresses-Debian-Bug: #636418
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>