96 Commits (95480c64ffcaba8a6c84a235b5230e7364881332)

Author SHA1 Message Date
Theodore Ts'o 65c6c3e06f Add support for new compat feature "sparse_super2"
In practice, it is **extremely** rare for users to try to use more
than the first backup superblock located at the beginning of block
group #1 (i.e., at block number 32768 for file systems with a 4k
block size).  This new compat feature restricts the backup superblock
to block group #1 and the last block group in the file system.
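
The block number quoted above follows from simple group arithmetic.  A
minimal sketch, assuming the usual default of 8 * block_size blocks per
group and a first data block of 0 for 4k file systems (neither is stated
in this message); group_first_block() and has_backup_sparse_super2() are
hypothetical names for illustration, not libext2fs functions:

```c
#include <stdint.h>

/* Hypothetical helper: first block of a block group, assuming every
 * group holds blocks_per_group blocks starting at first_data_block. */
static uint64_t group_first_block(uint64_t first_data_block,
                                  uint32_t blocks_per_group,
                                  uint32_t group)
{
    return first_data_block + (uint64_t)group * blocks_per_group;
}

/* Hypothetical predicate for the policy described above: backup
 * superblocks live only in block group #1 and the last block group. */
static int has_backup_sparse_super2(uint32_t group, uint32_t group_count)
{
    if (group_count < 2)
        return 0;   /* no group other than #0 exists */
    return group == 1 || group == group_count - 1;
}
```

With a 4k block size, group_first_block(0, 8 * 4096, 1) gives the block
number mentioned in the text.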

Aside from reducing the overhead of the file system by a small number
of blocks, by eliminating the rest of the backup superblocks, it
allows us to have a much more flexible metadata layout.  For example,
we can force all of the allocation bitmaps and inode table blocks to
the beginning of the disk, which allows most of the disk to be
exclusively used for contiguous data blocks.

This simplifies taking advantage of certain HDD specific features,
such as Shingled Magnetic Recording (aka Shingled Drives), and the
TCG's OPAL Storage Specification where having a simple mapping between
LBA block ranges and the data blocks used by the file system can make
life much simpler.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-01-30 13:58:18 -05:00
Andreas Dilger 11d1116a7c e2fsck: verify s_desc_size is power-of-two value
Add a LOG2_CHECK mode for check_super_value() so that it is easy
to verify values that are supposed to be power-of-two values
(s_desc_size and s_inode_size so far).  In ext2fs_check_desc()
also check for a power-of-two s_desc_size.
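
The kind of test a power-of-two check performs can be sketched as
follows.  This is only an illustration of the idea; is_power_of_two() is
a hypothetical name and is not the LOG2_CHECK code added by this commit:

```c
#include <stdbool.h>
#include <stdint.h>

/* A non-zero value is a power of two exactly when it has a single bit
 * set, i.e. clearing its lowest set bit leaves zero. */
static bool is_power_of_two(uint32_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}
```

Under this test a descriptor size of 32 or 64 passes, while 48 or 0
does not.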

Print out s_desc_size in debugfs "stats" and dumpe2fs output, if
it is non-zero.

It turns out that the s_desc_size validation in check_super_block()
is not currently used by e2fsck, because the group descriptors are
verified earlier by ext2fs_check_desc(), and even without an
explicit check of s_desc_size the group descriptors fail to align
correctly on disk.  It makes sense to keep the check_super_block()
check regardless, in case the code changes at some point in the future.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-12-23 16:03:46 -05:00
Theodore Ts'o 0796e66085 lsattr, chattr: add support for btrfs's No_COW flag
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-12 17:30:10 -04:00
Theodore Ts'o 660b4c3b3f Reserve the codepoints for the INCOMPAT features LARGEDATA and INLINEDATA
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-11 18:39:27 -04:00
Lukas Czerner 010dc7b90d e2fsck: remove EXT4_EOFBLOCKS_FL flag handling
We've decided to remove EOFBLOCKS_FL from the ext4 file system entirely,
because it is not actually very useful and it is causing more problems
than it solves. We're going to remove it from e2fsprogs first and then
after the new e2fsprogs version is common enough we can remove the
kernel part as well.

This commit changes e2fsck to not check for EOFBLOCKS_FL. Instead we
simply search for initialized extents past the i_size as this should not
happen. Uninitialized extents can be past the i_size as we can do
fallocate with KEEP_SIZE flag.
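
The rule described above can be sketched roughly as below.  The struct
and function names are hypothetical simplifications for illustration
(the real e2fsck code works on libext2fs extent handles), and the
sketch only considers extents that begin at or after the first block
past EOF:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one extent. */
struct extent_info {
    uint64_t first_lblk;   /* first logical block covered */
    uint32_t len;          /* number of blocks */
    bool     initialized;  /* false for fallocate'd (unwritten) extents */
};

/* Number of blocks needed to hold i_size bytes, rounded up. */
static uint64_t blocks_for_size(uint64_t i_size, uint32_t block_size)
{
    return i_size / block_size + (i_size % block_size != 0);
}

/* True if the extent is initialized and starts at or beyond the first
 * block past EOF.  Uninitialized extents are allowed there, since
 * fallocate with KEEP_SIZE can create them. */
static bool extent_is_suspicious(const struct extent_info *ex,
                                 uint64_t i_size, uint32_t block_size)
{
    return ex->initialized &&
           ex->first_lblk >= blocks_for_size(i_size, block_size);
}
```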

Also remove the EXT4_EOFBLOCKS_FL from lib/ext2fs/ext2_fs.h since it is
no longer needed.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-03-22 19:42:11 -04:00
Theodore Ts'o 991211f676 libext2fs, libe2p: Reserve RO_COMPAT_REPLICA feature
The replica is a feature which stores multiple copies of the key
metadata blocks so a single block failure in failure-prone media
(read: certain types of flash storage) doesn't take out the entire
file system.

Discussion on the upstream list proved not to be very positive on this
feature; the arguments were that it added complexity that wasn't
warranted, since common practice in industry is to insist on reliable
media, and if media is unreliable, you're kind of toast anyway (unless
the file system is being used as the back-end store of a cluster file
system where checksumming and data replication is happening above the
local disk file system level).  So, this feature is being developed
out of tree.

We reserve the code points so that other people won't accidentally
step on them.  Since it's not upstream, it's a soft reservation, but
it's not like we have any shortage of RO_COMPAT features.  We are a
bit more tight on reserved inodes, but EXT2_BOOT_LOADER_INO and
EXT2_UNDEL_DIR_INO are not currently used anywhere, and
EXT2_EXCLUDE_INO is a reservation for another out-of-tree feature.
There are no features currently being discussed which require a
reserved inode, but if a need were to arise, we can claw back code
point reservations that were never used or not in tree, as those will
always be considered lower priority than in-tree features.

Cc: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-17 15:28:21 -05:00
Andreas Dilger 0f5eba7501 ext2fs: add multi-mount protection (INCOMPAT_MMP)
Multi-mount protection is a feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem.  For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.

This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs.  MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.

There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.

For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.

Originally-by: Kalpak Shah <kalpak@clusterfs.com>

Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2011-09-25 01:55:23 -04:00
Theodore Ts'o ae96c678e1 libext2fs: fix swapfs.c so it builds on big endian systems
Also cleaned up ext2_fs.h, and improved the byte swapping code so the
extra fields in the large inode are properly byte swapped.

Addresses-Debian-Bug: #641838

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-16 17:51:03 -04:00
Theodore Ts'o 16c581d0e8 debugfs: add 64-bit support to the set_field commands
The set_fields commands (set_super_value, set_inode_field,
set_block_group) now handle values that are stored in split fields in
ext4's on-disk format.  For example, the superblock fields
s_blocks_count and s_blocks_count_hi.
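
How a 64-bit value maps onto a low/high pair of 32-bit fields can be
sketched as follows.  The helper names are hypothetical and are not the
debugfs implementation; they only show the arithmetic involved:

```c
#include <stdint.h>

/* Combine the low and high 32-bit halves into one 64-bit value. */
static uint64_t join_blocks_count(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value back into its low and high 32-bit halves. */
static void split_blocks_count(uint64_t value, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)(value & 0xFFFFFFFFu);
    *hi = (uint32_t)(value >> 32);
}
```

Setting only one half leaves the other untouched; setting the combined
value writes both halves.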

The user can either set the low or high part of the field via
"blocks_count_lo" or "blocks_count_hi", or both parts can be set via
"blocks_count".

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-16 10:32:02 -04:00
Theodore Ts'o 89efc88e65 libext2fs: add metadata checksum and snapshot feature flags
Reserve EXT4_FEATURE_RO_COMPAT_METADATA_CSUM and
EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP.  Also reserve fields in the
superblock and the inode for the checksums.  In the block group
descriptor, reserve the exclude bitmap field for the snapshot feature,
and checksums for the inode and block allocation bitmaps.

With this commit, the metadata checksum and exclude bitmap features
should have reserved all of the fields they need in ext4's on-disk
format.

This commit also fixes a missing byte swap for s_overhead_blocks.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Amir Goldstein <amir73il@gmail.com>
2011-09-16 10:24:09 -04:00
Theodore Ts'o 75405ffde6 Merge branch 'maint' into next
2011-09-16 00:00:04 -04:00
Theodore Ts'o 3fbfad558e libext2fs: fix binary and source compatibility with the dump program
The dump program relies on fs->frag_size and the
EXT2_FRAGS_PER_BLOCK() macro.  Kind of silly for it to do so, but it's
part of the kludgy way the dump program (which was originally written
for the BSD FFS) was ported over to support ext2/3.  Given how it
makes assumptions about the ext2/3/4 file system being similar to the
BSD FFS, it's a bit of a miracle it works for ext4 --- or at least
appears to work...

Addresses-Debian-Bug: #636418

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-15 15:44:56 -04:00
Yongqiang Yang 9f6ba888f0 resize2fs: add support for new in-kernel online resize ioctl
This is needed to support online resizing for > 32-bit file systems

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-14 13:26:57 -04:00
Theodore Ts'o d32c915abf libext2fs: Fix gcc -Wall warnings
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-07-07 13:50:22 -04:00
Theodore Ts'o 6a6337c3df Merge branch 'maint' into next
Conflicts:
	lib/ext2fs/bitmaps.c
	lib/ext2fs/rw_bitmaps.c
	misc/dumpe2fs.c
2011-06-04 20:24:36 -04:00
Theodore Ts'o ae9e37cd11 libext2fs: change EXT2_MAX_BLOCKS_PER_GROUP() to be cluster size aware
Change the EXT2_MAX_BLOCKS_PER_GROUP so that it takes the cluster size
into account.  This way we can open bigalloc file systems without
ext2fs_open() thinking that they are corrupt.
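
A rough sketch of what "cluster size aware" means here.  The per-group
cap of (1 << 16) - 8 allocation units is an assumption made for this
example and should be checked against lib/ext2fs/ext2_fs.h; the names
are hypothetical:

```c
#include <stdint.h>

/* Assumed cap on allocation units (clusters) per group. */
#define MAX_CLUSTERS_PER_GROUP (((uint32_t)1 << 16) - 8)

/* With bigalloc each bitmap bit covers a whole cluster, so the limit
 * expressed in blocks scales with the cluster/block size ratio. */
static uint32_t max_blocks_per_group(uint32_t block_size,
                                     uint32_t cluster_size)
{
    return MAX_CLUSTERS_PER_GROUP * (cluster_size / block_size);
}
```

Without bigalloc the cluster size equals the block size and the limit
is unchanged; with 16 blocks per cluster it is 16 times larger.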

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-06-04 16:40:26 -04:00
Theodore Ts'o 5af9eeaa7d Merge branch 'maint' into next
Conflicts:
	lib/e2p/ls.c
2011-03-18 16:44:37 -04:00
Theodore Ts'o 4df1618250 add new superblock field: s_overhead_blocks
It turns out that it's very hard to calculate overheads in the face of
clustered allocation (bigalloc).  This is because multiple metadata
blocks from different block groups can end up in the same allocation
cluster.  Calculating the exact overhead requires O(all block bitmaps)
in memory, or O(number of block groups**2) in time.  So we will
calculate this at mkfs time and stash it in the superblock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-03-18 14:47:15 -04:00
Theodore Ts'o 829d999488 Merge branch 'maint' into next
Conflicts:
	lib/ext2fs/initialize.c
2011-02-27 19:47:44 -05:00
Theodore Ts'o 412376efff Add basic BIGALLOC support for cluster-based allocation
This adds the code points and the superblock fields needed so that
dumpe2fs works, and renames the superblock fields from describing
fragments to clusters.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-25 21:43:54 -05:00
Aditya Kali 0edcc27021 e2fsprogs: reserving code points for new ext4 quota feature
This patch adds support for detecting the new 'quota' feature in ext4.
The patch reserves code points for user and group quota inodes and also
for the feature flag EXT4_FEATURE_RO_COMPAT_QUOTA.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2011-02-25 18:31:52 -05:00
Theodore Ts'o 9d92a201de Merge branch 'maint' into next
Conflicts:
	configure
	configure.in
	lib/ext2fs/ext2fs.h
	misc/mke2fs.c
2010-09-24 22:40:21 -04:00
Theodore Ts'o 9345f02671 tune2fs, debugfs, libext2fs: Add support for ext4 default mount options
Add support for 2.6.35's new default mount options which can be
specified in the superblock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-09-18 19:38:22 -04:00
Theodore Ts'o db0bdb49f4 Merge branch 'maint' into next
Conflicts:
	resize/extent.c
2010-07-19 02:37:41 -04:00
Theodore Ts'o 993988f655 Add superblock fields which track first and most recent fs errors
Add superblock fields which track where and when the first and most
recent file system errors occurred.  These fields are displayed by
dumpe2fs and cleared by e2fsck.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-05 14:45:55 -04:00
Theodore Ts'o 97d26ce9e3 Merge branch 'maint' into next
Conflicts:
	e2fsck/journal.c
	e2fsck/pass1.c
	e2fsck/pass2.c
	misc/mke2fs.c
2010-06-07 12:42:40 -04:00
Theodore Ts'o 6d0ed67802 Reserve feature flags and fields needed for the Next3 snapshot feature
The documentation is not (as of this writing) fully complete, but
there is some documentation here:

http://sourceforge.net/apps/mediawiki/next3/index.php?title=Code_documentation
http://sourceforge.net/apps/mediawiki/next3/index.php?title=On-disk_format
http://sourceforge.net/projects/next3/files/Next3_Snapshots.pdf/download

... which will hopefully be updated soon to be fully up to date with
these assignments and more details about how things work.

For now, the assignments should avoid collisions with other new work
that people might want to do on ext3/4.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-06-02 10:42:16 -04:00
Andreas Dilger 30a7610edf Reserve the EXT4_FEATURE_INCOMPAT_DIRDATA feature flag
Reserve the EXT4_FEATURE_INCOMPAT_DIRDATA feature flag for adding
extra file data in ext2_dir_entry_2 entries.

This changes the on-disk layout in the following way.

Firstly, the ext2_dir_entry_2 file_type field now has a mask that
limits the "filetype" information to the low 4 bits of this field.
Since these values are sequentially assigned, this allows for up to 7
more filetypes to be assigned.  When reading the "filetype" field, the
high 4 bits should be masked off when converting to DT_* filetypes for
userspace.

The high 4 bits of "filetype" are used as a bitmask to register up to
4 different "extended" directory entry fields.  Extended data fields
are packed without alignment into the directory entry after the "name"
field in order of increasing bitmask value, for each field whose bit
is set.  In order to avoid the need to "understand" each of the
extended fields, the first byte of each extended data field holds the
size of that data field (including the size itself), so they can be
skipped if not understood.  For fields that change the semantics of
the filesystem it is expected that a separate ROCOMPAT or INCOMPAT
field is registered.
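
The layout described above can be walked without understanding any of
the extended fields.  A minimal sketch, assuming the caller already has
a pointer to the first byte after the "name" field; the macro and
function names are hypothetical and are not part of libext2fs:

```c
#include <stddef.h>
#include <stdint.h>

#define FT_MASK        0x0F  /* low 4 bits: the file type proper */
#define FT_EXTRA_FIRST 0x10  /* lowest of the 4 extended-field bits */
#define FT_EXTRA_LAST  0x80  /* highest of the 4 extended-field bits */

/* Mask off the high 4 bits before converting to a DT_* file type. */
static uint8_t dirent_file_type(uint8_t raw_file_type)
{
    return raw_file_type & FT_MASK;
}

/* Total length in bytes of the extended data fields that follow the
 * name, or -1 if the data is malformed or runs past 'avail' bytes.
 * Each present field starts with a size byte that counts itself. */
static int dirent_extra_len(uint8_t raw_file_type,
                            const uint8_t *extra, size_t avail)
{
    size_t used = 0;
    unsigned int bit;

    for (bit = FT_EXTRA_FIRST; bit <= FT_EXTRA_LAST; bit <<= 1) {
        uint8_t size;

        if (!(raw_file_type & bit))
            continue;               /* this field is not present */
        if (used >= avail)
            return -1;              /* no room for the size byte */
        size = extra[used];
        if (size == 0 || size > avail - used)
            return -1;              /* size must include itself */
        used += size;               /* skip the field unparsed */
    }
    return (int)used;
}
```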

There is a single dirent data type defined currently, for Lustre,
which holds a 128-bit file identifier.  It is expected that if there
are 64-bit inode values that this will be assigned the 0x20 value.

Should a need ever arise to use all 4 of the extended dirent data
fields, it would be possible to keep the last bit (0x80) for use as a
multiplexor that stores a 1-byte aggregate data size, then a series of
"<u8_size><u8_type><data>" records in the last extended data record.
It is not expected that this will actually be needed in the lifetime
of ext4.

Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-06-02 10:41:54 -04:00
Andreas Dilger cff9690f4e Reserve the EXT4_INCOMPAT_EA_INODE feature flag
Reserve the EXT4_INCOMPAT_EA_INODE feature flag for use with
large extended attributes that are stored in a separate inode.
This changes the on-disk format in several ways:

First, replace the e_value_block field with e_value_inum, so that
an xattr entry can reference an external inode.  This field is
currently unused, as all of the entries live in the same block.

struct ext2_ext_attr_entry {
 	__u8	e_name_len;	/* length of name */
 	__u8	e_name_index;	/* attribute name index */
 	__le16	e_value_offs;	/* offset in disk block of value */
>	__le32	e_value_inum;	/* inode in which the value is stored */
 	__le32	e_value_size;	/* size of attribute value */
 	__le32	e_hash;		/* hash value of name and value */
 	char	e_name[0];	/* attribute name */
}

Second, add a flag to the inode that indicates it is using a large
(external) extended attribute.  This is needed so that when unlinking
an inode the xattrs will be scanned to unlink the xattr inodes
referenced by the main inode.

Third, for inodes that have a number of xattrs that are larger than
a single block, but not large enough to justify an external inode
(less than 64kB total xattr size, due to e_value_offs limitation)
the ext2_ext_attr_header->h_blocks field can grow beyond a single
block to represent a contiguous allocation of blocks for the xattr.

Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-06-02 10:41:37 -04:00
Theodore Ts'o c5b23f6c0e Merge branch 'maint' into next
2010-03-15 18:53:45 -04:00
Eric Sandeen 4ffafee26c e2fsck: don't complain about i_size for known blocks past EOF
This is the userspace side of Jiaying's EOFBLOCKS patch.  With
Aneesh's patches for .33, Jiaying's patch, and this one, xfstests
013/fsstress (even with direct IO enabled) has held up through many
runs.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-02-24 11:24:37 -05:00
Valerie Aurora Henson a63745e81c Use ext2fs_file_acl_block() instead of using .i_file_acl directly
This provides support for 48-bit file acl blocks.

Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-07 22:29:45 -04:00
Valerie Aurora Henson 3c4d4d7459 libext2fs: Define bg_itable_unused_hi in the ext4_group_desc structure
Signed-off-by: Valerie Aurora Henson <vaurora@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-06 15:08:44 -04:00
Theodore Ts'o b7c5b40308 Add support for a new superblock field: s_kbytes_written
This field tracks the lifetime amount of writes to the filesystem.  It
will be updated by the kernel as well as by e2fsprogs programs which
write to the filesystem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-06 01:59:23 -05:00
Theodore Ts'o efc6f628e1 Remove trailing whitespace for the entire source tree
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-08-27 23:07:54 -04:00
Theodore Ts'o 5c4f8d6748 resize2fs: Add support to use the ext4 online resize ioctl's
First try the ext3 ioctl, but if we get an error, try using the ext4
ioctl.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-06-07 13:13:16 -04:00
Theodore Ts'o bb767b2fc3 ext2fs.h: Add l_i_file_acl_high and l_version_hi to on-disk inode structure
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-15 22:14:52 -04:00
Theodore Ts'o 494a1daad3 Basic flexible block group support
Add superblock definition, and dumpe2fs and debugfs support.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-04-22 23:32:15 -04:00
Theodore Ts'o 1ca1059fd0 Add support for the HUGE_FILE feature
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 16:38:13 -04:00
Jose R. Santos ca2634a46a Add initial checksum support for the gdt_checksum/uninit_group feature
- Add support for computing CRC-16 value.
- Add call to check/verify/set csum on block_groups.
- Add a test program to verify csum operations.
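
For readers unfamiliar with CRC-16, a bitwise version can be sketched
as below.  It assumes the reflected form (0xA001) of the polynomial
0x8005; it is an illustration only, not the table-driven code added by
this patch, and crc16_update() is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Feed 'len' bytes from 'buf' into a running CRC-16 value.  Bits are
 * processed least-significant first (reflected polynomial 0xA001). */
static uint16_t crc16_update(uint16_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        int i;

        crc ^= *p++;
        for (i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return crc;
}
```

Because the function takes the running value as its first argument, a
checksum over several separate buffers can be built up by chaining
calls.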

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-20 15:32:11 -04:00
Andreas Dilger a7c9cb7d0d Add support for the DIR_NLINK feature.
This patch includes the changes required to e2fsck to understand the
nlink count changes made in the kernel.

In e2fsck pass 4, when we fetch the actual link count, if it
exceeds 65,000 we set the link count to 1.  We silently fix the
situation where the nlink count of the directory is 1, and there are
fewer than 65,000 subdirectories, since that can happen
naturally.
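
The pass 4 rule described above amounts to the following.  The
threshold is the figure quoted in this message; the macro and function
names are hypothetical and this is not the e2fsck code itself:

```c
#include <stdint.h>

#define LINK_MAX 65000u  /* threshold quoted in the text above */

/* Link count to expect for a directory: once the counted number of
 * links exceeds the threshold, the count is pinned to 1. */
static uint32_t dir_nlink_to_store(uint32_t counted_links)
{
    return counted_links > LINK_MAX ? 1 : counted_links;
}
```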

Patch originally from CFS, significantly rewritten by Theodore Ts'o.

Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-15 01:39:19 -04:00
Theodore Ts'o 6567e5a684 Merge branch 'maint'
2008-03-15 01:27:08 -04:00
Theodore Ts'o a80f3694a7 ext2_fs.h: Rename EXT4_ORPHAN_FS to be EXT3_ORPHAN_FS
No application will ever use the ORPHAN_FS flag, since it only shows
up in kernel memory, but it's been pointed out it was first used in
ext3, and so it should be renamed for accuracy.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-15 01:25:51 -04:00
Theodore Ts'o 428f6b32a9 Merge branch 'maint' into next
Conflicts:

	configure
	lib/ext2fs/ext2_fs.h
	misc/e2image.c
2008-01-27 20:09:05 -05:00
Theodore Ts'o 6cb27404f5 Add support for the test_fs flag
The test_fs flag is an "ok to be used with test kernel code" flag.  It
makes it easier for us to determine whether a filesystem should be
mounted using ext4 or not.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-26 21:47:40 -05:00
Theodore Ts'o 153439222e Define helper functions ext2fs_set_i_{u,g}id_high() for MacOS compatibility
This is needed for all non-Linux/Hurd/Masix systems...

Addresses-Sourceforge-Bug: #1863819

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-21 09:46:05 -05:00
Theodore Ts'o fef2b38d8e Merge branch 'maint' into next
Conflicts:

	configure
	debian/rules
	e2fsck/swapfs.c
	lib/ext2fs/ext2_fs.h
2008-01-01 12:41:35 -05:00
Theodore Ts'o 7132d48d83 Fix build failure on non-Linux/non-Hurd/non-Masix systems
The previous fix didn't quite work, but this one should!

Addresses-Sourceforge-Bug: #1861633

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-01 12:25:36 -05:00
Theodore Ts'o 3306861158 Fix build failure on non-Linux/non-Hurd/non-Masix systems
inode_uid() and inode_gid() weren't getting defined on systems that
were not Linux, Hurd, or Masix.

Addresses-Sourceforge-Bug: #1859778

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-12-31 22:33:56 -05:00
Theodore Ts'o 3166c58dc0 Add #define needed for Hurd ioctl definitions
Addresses-Debian-Bug: #437720

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-12-17 23:03:53 -05:00