Convert all call sites that write zero blocks to disk to use
ext2fs_zero_blocks2() since it can use Linux's zero out feature to do
the writes more quickly. Reclaim the zero buffer at freefs time and
make the write-zeroes fallback use a larger buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If in the course of iterating extents we find that an otherwise
valid-seeming second extent maps the same logical blocks as a
previously examined first extent, offer to clear the duplicate
mapping.
The test for this is already in f_extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we think we're going to need to repair either the root directory or
the lost+found directory, reserve a block at the end of pass 1 to
reduce the likelihood of an e2fsck abort while reconstructing
root/lost+found during pass 3.
If / and/or /lost+found are corrupt and duplicate processing in pass
1b allocates all the free blocks in the FS, fsck aborts with an
unusable FS since pass 3 can't recreate / or /lost+found. If either
of those directories are missing, an admin can't easily mount the FS
and access the directory tree to move files off the injured FS and
free up space; this in turn prevents subsequent runs of e2fsck from
being able to continue repairs of the FS.
(One could migrate files manually with debugfs without the help of
path names, but it seems easier if users can simply mount the FS and
use regular FS management tools.)
[ Fixed up an obvious C trap: const char * and const char [] are not
the same thing when you are taking the size of the parameter.
People, run your regression tests! Like spinach, it's good for you. :-)
-- tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
During the later passes of efsck, we sometimes need to allocate and
map blocks into a file. This can happen either by fsck directly
calling new_block() or indirectly by the library calling new_block
because it needs to allocate a block for lower level metadata (bmap2()
with BMAP_SET; block_iterate3() with BLOCK_CHANGED).
We need to force new_block to allocate blocks from the found block
map, because the FS block map could be inaccurate for various reasons:
the map is wrong, there are missing blocks, the checksum failed, etc.
Therefore, any time fsck does something that could to allocate blocks,
we need to intercept allocation requests so that they're sourced from
the found block map. Remove the previous code that swapped bitmap
pointers as this is now unneeded.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If we encounter an inode with IND/DIND/TIND blocks or internal extent
tree blocks that point into critical FS metadata such as the
superblock, the group descriptors, the bitmaps, or the inode table,
it's quite possible that the validation code for those blocks is not
going to like what it finds, and it'll ask to try to fix the block.
Unfortunately, this happens before duplicate block processing (pass
1b), which means that we can end up doing stupid things like writing
extent blocks into the inode table, which multiplies e2fsck'
destructive effect and can render a filesystem unfixable.
To solve this, create a bitmap of all the critical FS metadata. If
before pass1b runs (basically check_blocks) we find a metadata block
that points into these critical regions, continue processing that
block, but avoid making any modifications, because we could be
misinterpreting inodes as block maps. Pass 1b will find the
multiply-owned blocks and fix that situation, which means that we can
then restart e2fsck from the beginning and actually fix whatever
problems we find.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After a journal replay, we close and reopen the file system so that
any changes in the superblock can get reflected in the libext2fs's
internal data structures. We need to save the flags passed to
ext2fs_open() that we used when we originally opened the file system.
Otherwise we will end up not be able to repair a file system which
requires a journal replay and which has bigalloc enabled or which has
more than 2**32 blocks; e2fsck will abort with the error message:
fsck.ext4: Filesystem too large to use legacy bitmaps while trying to re-open
Addresses-Debian-Bug: 744953
Cc: Андрей Василишин <a.vasilishin@kpi.ua>
Cc: Jon Severinsson <jon@severinsson.net>
Cc: 744953@bugs.debian.org
There are interfaces that are used by mke2fs.c and tune2fs.c which are
in quotaio.h, and some future changes will be much simpler if we can
combine the two header files together. Also the guard #ifdef for
mkquota.h was incorrect, which caused problems when both header files
needed to be included.
Also remove quota.pc and installation rules for libquota, since this
library is never going to be something that we can export externally
anyway. Eventually we'll want to clean up the interfaces and move the
external publishable interfaces to the libext2fs library, and then
rename what's left from libquota.a to libsupport.a for internal use
only.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Aditya Kali <adityakali@google.com>
If there are any PREEN_OK problems fixed in check_super_block(), don't
skip checking the full file system.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we have a 64-bit file system with extended attribute blocks, e2fsck
would not correctly handle EA blocks that were located beyond the
32-bit block number boundary. Fix this by teaching
e2fsck/ea_refcount.c to use 64-bit block numbers.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to store some error codes using an int to keep recovery.c as
close as possible to the recovery.c source file in the kernel.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix all the places where we should be using a blk64_t instead of a
blk_t. These fixes are more severe because 64bit values could be
truncated silently.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Remove crc32_be in favor of the implementation in libext2fs.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Check htree internal node checksums. If broken, ask user to clear
the htree index and recreate it later.
[ Move the check for not rehashing the lost+found directory to pass1
so that we don't end up truncating lost+found when the metadata
checksum feature is enabled. -- TYT ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Since "bool" is a valid C type, declarations of the form "int bool"
will cause compiler errors if <stdbool.h> is included. Rename these
variables to avoid this name clash.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add the ability to log messages about a file system to a specified
directory, using a file name templace that can be specified in
/etc/e2fsck.conf. This allows us to suppress the output of overly
verbose e2fsck outputs while still allowing the full logging output to
go to an appropriate file.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that we have multiple backend implementations of the bitmap code,
this commit teaches e2fsck to use either the most appropriate backend
for each use case.
Since we don't know for sure if we will get it all right, the default
choices can be overridden via e2fsck.conf. The various definitions
are shown here, with the current defaults (which may change as we add
more bitmap implementations and as learn what works better).
; EXT2FS_BAMP64_BITARRAY is 1
; EXT2FS_BMAP64_RBTREE is 2
; EXT2FS_BMAP64_AUTODIR is 3
[bitmaps]
inode_used_map = 2 ; pass1
inode_dir_map = 3 ; pass1
inode_reg_map = 2 ; pass1
block_found_map = 2 ; pass1
inode_bad_map = 2 ; pass1
inode_imagic_map = 2 ; pass1
block_dup_map = 2 ; pass1
block_ea_map = 2 ; pass1
inode_link_info = 2 ; pass1
inode_dup_map = 2 ; pass1b
inode_done_map = 3 ; pass3
inode_loop_detect = 3 ; pass3
fs_bitmaps = 2
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Multi-mount protection is feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds support for doing quota accounting during full
e2fsck scan if the 'quota' feature was set on the superblock.
If user-visible quota inodes are in use, they will be hidden
and converted to the reserved quota inodes.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Commit 2a77a784a3 (firest released in e2fsprogs 1.33) compared
superblock summary free blocks and inode counts with the allocation
bitmap counts before starting the file system check proper, and if
they differed, set the superblock and marked it as dirty. If no other
file systme changes were required, this would cause a "*** FILE SYSTEM
WAS MODIFIED ***" message without any explanation of what e2fsck had
changed.
We fix this by only setting the superblock summary free block/inodes
counts if we are skipping a full check, and in non-preen mode, e2fsck
will now print an explicit message stating how the superblock had been
updated.
In a full check, any updates to the superblock free blocks/inodes
fields will be noted in pass5.
This change requires changing a few test results (essentially
reversing the changes made in commit 2a77a784a3).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
As recently discussed on linux-ext4@vger.kernel.org add an option to e2fsck
to allow to replay the journal only. That will allow scripts, such as
pacemakers 'Filesystem' RA to first replay the journal and if that sets
an error state from the journal replay, further check for that error
(dumpe2fh -h | grep "Filesystem state:") and if that shows and error
to refuse to mount. It also allows automatic e2fsck scripts to first
replay the journal and on a second run after the real pass1 to passX checks
to test for the return code.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In Pass 5 when we are checking block and inode bitmaps we have great
opportunity to discard free space and unused inodes on the device,
because bitmaps has just been verified as valid. This commit takes
advantage of this opportunity and discards both, all free space and
unused inodes.
I have added new set of options, 'nodiscard' and 'discard'. When the
underlying devices does not support discard, or discard ends with an
error, or when any kind of error occurs on the filesystem, no further
discard attempt will be made and the e2fsck will behave as it would
with nodiscard option provided.
As an addition, when there is any not-yet-zeroed inode table and
discard zeroes data, then inode table is marked as zeroed.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There are broken embedded devices that have system clocks that always
reset to January 1, 1970 whenever they boot (even if no power is
lost). There are also systems that have super cheap clock crystals
that can be very inaccurate. So if the option broken_system_clock is
given, disable all time based checks. E2fsck will also try to detect
incorrect system clock times, and automatically mark the system clock
as insane.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the user grows a partition bigger than 2**32 blocks, e2fsprogs
1.41.x is not going to be able to support resizing the filesystem,
since it doesn't have > 2**32 block support. However, e2fsck should
still work, so the system administrator doesn't get stuck.
Addresses-Launchpad-Bug: #521648
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
in the case of ! defined RESOURCE_TRACK, so that we can clean up #ifdef
throughout e2fsck source.
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Restart e2fsck only once in case of multiple inodes in uninit range.
Display correct inode number during BG_INO_UNINIT and INOREF_IN_USED errors.
Signed-off-by: Kalpak Shah <kalpak.shah@sun.com>
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Move check_resize_inode() out of check_super_block(), since we only
need to test the resize_inode for correctness only if the filesystem
requires checking. This change avoids a lot of I/O operations which
slows down a 1 second boot.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fixed a potential bug where by partial returns from the write(2)
system call could lost characters to be sent to external progress bar
display program.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the directory is packed with no slack space, as soon as any new
directory entries are added, leaf nodes end up getting split and
directory ends up getting very inefficient.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Track the number of non-contiguous files and directories so we can
give more detailed information in verbose mode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If a block device is read-only, e2fsck -p gets into an infinite loop
trying to preenhalt, closing and flushing the fs, which tries to flush
the cache, which gets a write error and calls preenhalt which tries to
close and flush the fs ... ad infinitum.
Per Ted's suggestion just flag the ctx as "exiting" and short-circuit
the infinite loop.
Tested by running e2fsck -p on a block device set read-only by BLKROSET.
Thanks to Vlado Potisk for reporting this.
Addresses-Red-Hat-Bugzilla: #465679
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Also added support for "e2fsck -E fragcheck" which issues a
comprehensive report of discontiguous file extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch has all the necesary pieces to open and fix filesystems created
with the uninit block group feature.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add vertificaton of the in-inode EA information, and allow in-inode
EA's to have a checksum.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>