Fix more problems that I found when testing on ppc64:
- Inode swap cut and paste error leads to immutable inodes being
detected as inlinedata inodes, leading to e2fsck incorrectly barfing
on i_block[] contents.
- Superblock csum/verify must be aware of the fs->super byte order
when checking for metadata_csum feature flag. (Hint: in _openfs(),
fs->super is in LE order for the first csum verification)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When we're (a) reading EAs into a buffer; (b) byte-swapping EA
entries; or (c) checking EA data, be careful not to run off the end of
the memory buffer, because this causes invalid memory accesses and
e2fsck crashes. This can happen if we encounter a specially crafted
FS image.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix a typo that we didn't notice because all the world's an x86. :-)
Reported-by: jon ernst <jonernst07@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Inline_data is handled in dir iterator because a lot of commands use
this function to traverse directory entries in debugfs. We need to
handle inline_data individually because inline_data is saved in two
places. One is in i_block, and another is in ibody extended attribute.
After applied this commit, the following commands in debugfs can
support the inline_data feature:
- cd
- chroot
- link*
- ls
- ncheck
- pwd
- unlink
* TODO: Inline_data doesn't expand to ibody extended attribute because
link command doesn't handle DIR_NO_SPACE error until now. But if we
have already expanded inline data to ibody ea area, link command can
occupy this space.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Later we will use ext2fs_dirent_swab_in/out to handle big-endian problem
for inline data. Now interfaces assume that it handles a block, but it
is not true after adding inline data. So this commit defines a new
interface for inline data.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In practice, it is **extremely** rare for users to try to use more
than the first backup superblock located at the beginning of block
group #1. (i.e., at block number 32768 for file systems with a 4k
block size). This new compat feature restricts the backup superblock
to block group #1 and the last block group in the file system.
Aside from reducing the overhead of the file system by a small number
of blocks, by eliminating the rest of the backup superblocks, it
allows us to have a much more flexible metadata layout. For example,
we can force all of the allocation bitmaps and inode table blocks to
the beginning of the disk, which allows most of the disk to be
exclusively used for contiguous data blocks.
This simplifies taking advantage of certain HDD specific features,
such as Shingled Magnetic Recording (aka Shingled Drives), and the
TCG's OPAL Storage Specification where having a simple mapping between
LBA block ranges and the data blocks used by the file system can make
life much simpler.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The s_desc_size in the superblock specifies the group descriptor
size in bytes, but in various places the EXT4_FEATURE_INCOMPAT_64BIT
flag implies that the descriptor size is EXT2_MIN_DESC_SIZE_64BIT
(64 bytes) instead of checking the actual size. In other places,
the s_desc_size field is used without checking for INCOMPAT_64BIT.
In the case of ext2fs_group_desc() the s_desc_size was being ignored,
and assumed to be sizeof(struct ext4_group_desc), which would result
in garbage for any but the first group descriptor. Similarly, in
ext2fs_group_desc_csum() and print_csum() they assumed that the
maximum group descriptor size was sizeof(struct ext4_group_desc).
Fix these functions to use the actual superblock s_desc_size if
INCOMPAT_64BIT.
Conversely, in ext2fs_swap_group_desc2() s_desc_size was used
without checking for INCOMPAT_64BIT being set.
The e2fsprogs behaviour is different than that of the kernel,
which always checks INCOMPAT_64BIT, and only uses s_desc_size to
determine the offset of group descriptors and what range of bytes
to checksum.
Allow specifying the s_desc_size field at mke2fs time with the
"-E desc_size=NNN" option. Allow a power-of-two s_desc_size
value up to s_blocksize if INCOMPAT_64BIT is specified. This
is not expected to be used by regular users at this time, so it
is not currently documented in the mke2fs usage or man page.
Add m_desc_size_128, f_desc_size_128, and f_desc_bad test cases to
verify mke2fs and e2fsck handling of larger group descriptor sizes.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Calculate and verify the checksum for separate (i.e. not in the inode)
extended attribute blocks; the checksum lives in the header.
[ Merged in change from Tao so that we always use the fs checksum seed
for the xattr blocks. ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Introduce small structures for recording directory tree checksums, and
some API changes to support writing out directory blocks with
checksums.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fix several minor errors in structure definitions, the byteswap code,
and Makefiles that result from merging the crc32c and initial parts of
the metadata checksumming patchset.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Multi-mount protection is feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since trying to following the individual commands
run by make was lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigal defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Also cleaned up ext2_fs.h, and improved the byte swapping code so the
extra fields in the large inode are properly byte swapped.
Addresses-Debian-Bug: #641838
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reserve EXT4_FEATURE_RO_COMPAT_METADATA_CSUM and
EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP. Also reserve fields in the
superblock and the inode for the checksums. In the block group
descriptor, reserve the exclude bitmap field for the snapshot feature,
and checksums for the inode and block allocation bitmaps.
With this commit, the metadata checksum and exclude bitmap features
should have reserved all of the fields they need in ext4's on-disk
format.
This commit also fixes an a missing byte swap for s_overhead_blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Amir Goldstein <amir73il@gmail.com>
The write_journal_inode() code is only setting the low 32-bit i_size
for the journal size, even though it is possible to specify a journal
up to 10M blocks in size. Trying to create a journal larger than 2GB
will succeed, but an immediate e2fsck would fail. Store i_size_high
for the journal inode when creating it, and load it upon access.
Use s_jnl_blocks[15] to store the journal i_size_high backup. This
field is currently unused, as EXT2_N_BLOCKS is 15, so it is using
s_jnl_blocks[0..14], and i_size is in s_jnl_blocks[16].
Rename the "size" argument "num_blocks" for the journal creation functions
to clarify this parameter is in units of filesystem blocks and not bytes.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This adds the superblock fields needed so that dumpe2fs works and the
code points and renames the superblock fields from describing
fragments to clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds support for detecting the new 'quota' feature in ext4.
The patch reserves code points for usr and group quota inodes and also
for the feature flag EXT4_FEATURE_RO_COMPAT_QUOTA.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
These patches fix obvious bone-headed mistakes, so e2fsprogs will now
build and mostly work on powerpc. The m_meta_bg, u_mke2fs, and
u_tune2fs tests are still failing, however, so there's still work to do...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We also support for byte-swapping the Next3 fields, although the
current Next3 implementation doesn't support big-endian systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2. The files were incorrectly labelled.
Alex Thomas/Luster has been consulted wrt to the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The f_illitable_flexbg test was failing on ppc, because
e2fsck_move_ext3_journal is doing a direct memcmp of i_block with
s_jnl_blocks, and failing.
This is because we don't swap extent data on read from disk; rather
we do it when we access the extents. However, ext2fs_swap_super
was swapping s_jnl_blocks unconditionally, so these didn't match.
Looks like we need to treat s_jnl_blocks the same as i_block, and
swap it on access, not on read. Except for the last i_size bit...
Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This field tracks the lifetime amount of writes to the filesystem. It
will be updated by the kernel as well as by e2fsprogs programs which
write to the filesystem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext2fs_swap_inode_full() was incorrectly swapping the i_block array
for extents.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
After the fix for resize2fs's inode mover losing in-inode
extended attributes, the regression test I wrote caught
that the attrs were still getting lost on powerpc.
Looks like the problem is that ext2fs_swap_inode_full()
isn't paying attention to whether or not the EA magic is
in hostorder, so it's not recognized (and not swapped)
on BE machines. Patch below seems to fix it.
Yay for regression tests. ;)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add vertificaton of the in-inode EA information, and allow in-inode
EA's to have a checksum.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Responsibility for byte swapping the extents information rests with
the low-level extent code, which translates the on-disk extents
information to the abstract extent format. The on-disk format will
eventually get more complicated, in order to add support for 64-bit
block numbers, bit-compressed extents, etc. So to avoid needing to
expose all of that complexity in swapfs.c, the in-memory contents of
i_blocks will not be byte-swapped and will be identical to the on-disk
format.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to set t->i_file_acl before we test it in
ext2fs_inode_data_blocks()
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The l_i_version field is now defined from the old l_i_reserved1 field in
the ext2 inode. This field will be used to store high 32 bits of the
64-bit inode version number.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix a problem byte-swapping fast symlinks inodes that contain extended
attributes.
Addresses Red Hat Bugzilla: #232663
Addresses LTC Bugzilla: #27634
Signed-off-by: "Bryn M. Reeves" <breeves@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The e2fsprogs and kernel implementation of directory hash tree has a
bug which causes the implementation to be dependent on whether
characters are signed or unsigned. Platforms such as the PowerPC,
Arm, and S/390 have signed characters by default, which means that
hash directories on those systems are incompatible with hash
directories on other systems, such as the x86.
To fix this we add a new flags field to the superblock, and define two
new bits in that field to indicate whether or not the directory should
be signed or unsigned. If the bits are not set, e2fsck and fixed
kernels will set them to the signed/unsigned value of the currently
running platform, and then respect those bits when calculating the
directory hash. This allows compatibility with current filesystems,
as well as allowing cross-architectural compatibility.
Addresses Debian Bug: #389772
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE (0x0040?) - add s_min_extra_isize and
s_want_extra_isize fields to superblock, which allow specifying
the minimum and desired i_extra_isize fields in large inodes
(for nsec+epoch timestamps, potential other uses). Needs RO_COMPAT
flag handling, needs e2fsck support, patch complete, little testing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- EXT4_FEATURE_INCOMPAT_64BIT (0x0080) - support for 64-bit block count
fields in the superblock (s_blocks_count_hi, s_free_blocks_count_hi),
large group descriptors (s_desc_size), extents with high 16 bits
(ee_start_hi, ei_leaf_hi), inode ACL (i_file_acl_hi). May also grow
to encompass the previously proposed BIG_BG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- EXT4_FEATURE_RO_COMPAT_GDT_CSUM (0x0010?) - store a crc16 checksum in
the group descriptor (s_uuid[16] | __u32 group | ext3_group_desc
(excluding gd_checksum itself)). This allows the kernel to more safely
manage UNINIT groups.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- EXT4_FEATURE_RO_COMPAT_HUGE_FILE (0x0008) - change i_blocks to be
in units of s_blocksize units instead of 512-byte sectors, use
l_i_frag and l_i_fsize as i_blocks_hi (could also be part of 64BIT).
E2fsck and debugfs changed to support i_blocks_hi instead of l_i_frag and
l_i_fsize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This feature is initially intended for testing purposes; it allows an
ext2/ext3 developer to create very large filesystems using sparse files
where most of the block groups are not initialized and so do not require
much disk space. Eventually it could be used as a way of speeding up
mke2fs and e2fsck for large filesystem, but that would be best done by
adding an RO_COMPAT extension to the filesystem to allow the inode table
to be lazily initialized on a per-block basis, instead of being entirely initialized
or entirely unused on a per-blockgroup basis.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>