October 10, 2003

Optimizing Linux filesystems

Author: Roderick W. Smith

Last time we introduced a few common Linux filesystems and examined their features. If you've already installed Linux, your partitions are already set up and configured with particular filesystems, but you may decide you want to modify this configuration. What's the best way to begin?

This article is excerpted from the recently published book Linux Power Tools.

Some changes are tedious to implement. For instance, changing from one filesystem to another requires you to back up, create a new filesystem, and restore your files. One exception to this rule is changing from ext2fs to ext3fs. If you switch filesystems, you may be able to use filesystem-creation options to improve the performance of the new filesystem. Other changes can also be done relatively painlessly. These include defragmenting a disk (that is, repositioning file contents so that they're not spread out over the entire partition) and resizing partitions to give you space where you need it.

Creating a filesystem for optimal performance

Most filesystems support a variety of options that may impact performance. For instance, large allocation blocks can improve performance by reducing fragmentation and the number of operations needed to retrieve an entire file. Some of these options can be set only at filesystem creation time, but some can be changed after the fact. Not all of these features are available in all filesystems. Across all Linux filesystems, important and popular performance-enhancing (or performance-degrading) options include:

Allocation block size

As noted in the earlier section, "Minimizing Space Consumption," small allocation blocks can facilitate more efficient use of disk space, but the cost is a small degradation in disk-access speed. Therefore, to improve performance slightly, you can increase your block size. This option is not easily changed after creating a filesystem. With ext2fs or ext3fs, you can use the -b block-size option to mke2fs; with XFS, the -b size=block-size option to mkfs.xfs does the job. For ext2fs and ext3fs, block-size must be 1024, 2048, or 4096; with XFS, the block size can theoretically be any power-of-two multiple of 512 bytes up to 64KB (65536 bytes), although in practice you can only mount a filesystem with block sizes up to 4KB or 8KB using common CPUs. ReiserFS and Linux's version of JFS do not yet support adjusting this feature.

Journaling options

All the journaling filesystems support various journal options. One common option is the location of the journal. By placing the journal on a separate physical disk from the main filesystem, you can improve performance (provided the target disk isn't too sluggish itself). You can use the -J device=journal-device option in mke2fs or the -j journal-device option in mkreiserfs or mkfs.jfs to set this feature. Ext3fs also supports setting the journal size with the -J size=journal-size option, where journal-size is specified in megabytes and must be between 1,024 and 102,400 filesystem blocks. Specifying a too-small journal may degrade performance, but setting one too large may rob you of too much disk space. If in doubt, let mke2fs decide on the journal size itself.

Reserved blocks

Ext2fs and ext3fs reserve a number of blocks for use by the superuser (or some other user you specify). The default value of 5 percent reserved space may be overkill on large partitions or on less critical partitions (such as /home). You can gain a bit more space by using the -m reserved-percentage option to mke2fs. Changing this percentage won't affect actual disk performance, but it may gain you just a bit more available disk space. You can change this option after you create a filesystem by passing the same parameter that mke2fs accepts to the tune2fs program, as in tune2fs -m 1 /dev/hda4 to set the reserved blocks percentage to 1.

Check interval

Ext2fs and ext3fs force a filesystem check after a specified number of mounts or a specified amount of time between mounts. The idea is to catch errors that might creep onto the filesystem due to random disk write errors or filesystem driver bugs. You can change these intervals by using the -c max-mount-counts and -i interval-between-checks options to tune2fs. For the latter option, you specify an interval in days, weeks, or months by providing a number followed by a d, w, or m, respectively. Altering the check interval won't modify day-to-day performance, but it will change how frequently the computer performs a full disk check on startup. This disk check can be quite lengthy, even for ext3fs; it doesn't restrict itself to recent transactions as recorded in the journal, as a forced check after a system crash does.

Directory hash

ReiserFS uses a sorted directory structure to speed directory lookups, and mkreiserfs provides several options for the hash (a type of lookup algorithm) used for this purpose. You set this option with the -h hash option to mkreiserfs, where hash can be r5, rupasov, or tea. Some hashes may yield improved or degraded performance for specific applications. The Squid Web proxy documentation suggests using the rupasov hash, whereas the qmail documentation recommends r5, for instance. One problem with the r5 and rupasov hashes is that they can greatly slow file creation in directories with very many (a million or so) files. In fact, rupasov is very prone to such problems, and so should be avoided on most systems. The tea hash is much less subject to this problem, but it is also much slower than r5 for directories with more typical numbers of files. In general, you should use the default r5 hash unless you know you'll be creating many files or the disk will be used by one performance-critical application, in which case checking the application's documentation or doing a web search for advice may be worthwhile.

Inode options

XFS enables you to set the inode size at filesystem creation time using the -i size=value option to mkfs.xfs. The minimum and default size is 256 bytes; the maximum is 2,048 bytes. (The inode size can't exceed half the allocation block size, though.) One impact of the inode size option relates to small file access times; because XFS tries to store small files within the inode whenever possible, specifying a large inode enables storing larger files within the inode. Doing so will speed access to these files. Therefore, if a partition will store many small files (under 2KB), you may want to increase the inode size. Depending on the exact mix of file sizes, the result may save or waste disk space. If few files will be smaller than 2KB, there's little point to increasing the inode size.

The default filesystem creation options usually yield acceptable performance. Modifying these options can help in some unusual cases, such as filesystems storing huge numbers of files or a computer that's restarted frequently. I don't recommend trying random changes to these options unless you intend to run tests to discover what works best for your purposes.

Converting ext2fs to ext3fs

One of the advantages of ext3fs over the other journaling filesystems is that it's easy to turn an existing ext2 filesystem into an ext3 filesystem. You can do this using the tune2fs program and its -j option:

# tune2fs -j /dev/hda4

If the filesystem to which you add a journal is mounted when you make this change, tune2fs creates the journal as a regular file, called .journal, in the filesystem's root directory. If the filesystem is unmounted when you run this command, the journal file doesn't appear as a regular file. In either case, the filesystem is now an ext3 filesystem, and it can be used just as if you created it as an ext3 filesystem initially. If necessary, you may be able to access the filesystem as ext2fs (say, using a kernel that has no ext3fs support); however, some older kernels and non-Linux utilities may refuse to access it in this way, or they may provide merely read-only access.

On rare occasion, an ext3 filesystem's journal may become so corrupted that it interferes with disk recovery operations. In such cases, you can convert the filesystem back into an ext2 filesystem using the debugfs tool:

# debugfs -w /dev/sda4
debugfs 1.32 (09-Nov-2002)
debugfs:  features -needs_recovery -has_journal
Filesystem features: dir_index filetype sparse_super
debugfs:  quit

After performing this operation, you should be able to use fsck.ext2 with its -f option, as described in the upcoming section, "Filesystem Check Options," to recover the filesystem. The newly deactivated journal will cause fsck.ext2 to report errors even if the filesystem did not previously have them. If you like, you can then add the journal back by using tune2fs, as just described. (Don't try to remove the journal from a mounted filesystem.)

Next time we'll talk about other filesystem operations, such as resizing filesystems and defragmenting a disk.


  • Linux
Click Here!