Linux.com

Feature: Linux

Optimizing Linux filesystems

By Roderick W. Smith on October 10, 2003 (8:00:00 AM)


Last time we introduced a few common Linux filesystems and examined their features. If you've already installed Linux, your partitions are already set up and configured with particular filesystems, but you may decide you want to modify this configuration. What's the best way to begin?

This article is excerpted from the recently published book Linux Power Tools.

Some changes are tedious to implement. For instance, changing from one filesystem to another requires you to back up, create a new filesystem, and restore your files. One exception to this rule is changing from ext2fs to ext3fs. If you switch filesystems, you may be able to use filesystem-creation options to improve the performance of the new filesystem. Other changes can also be done relatively painlessly. These include defragmenting a disk (that is, repositioning file contents so that they're not spread out over the entire partition) and resizing partitions to give you space where you need it.

Creating a filesystem for optimal performance

Most filesystems support a variety of options that may impact performance. For instance, large allocation blocks can improve performance by reducing fragmentation and the number of operations needed to retrieve an entire file. Some of these options can be set only at filesystem creation time, but some can be changed after the fact. Not all of these features are available in all filesystems. Across all Linux filesystems, important and popular performance-enhancing (or performance-degrading) options include:

Allocation block size

As noted in the earlier section, "Minimizing Space Consumption," small allocation blocks can facilitate more efficient use of disk space, but the cost is a small degradation in disk-access speed. Therefore, to improve performance slightly, you can increase your block size. This option is not easily changed after creating a filesystem. With ext2fs or ext3fs, you can use the -b block-size option to mke2fs; with XFS, the -b size=block-size option to mkfs.xfs does the job. For ext2fs and ext3fs, block-size must be 1024, 2048, or 4096; with XFS, the block size can theoretically be any power-of-two multiple of 512 bytes up to 64KB (65536 bytes), although in practice you can only mount a filesystem with block sizes up to 4KB or 8KB using common CPUs. ReiserFS and Linux's version of JFS do not yet support adjusting this feature.
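As a rough sketch of the ext2fs/ext3fs case, the shell function below checks a candidate block size against the three values listed above and prints, rather than runs, the matching mke2fs command. The function name mke2fs_block_cmd and the device /dev/hda4 are placeholders I've chosen for illustration; nothing is written to disk.

```shell
# Hypothetical helper: print the mke2fs command for a given block size.
# It only echoes the command; running mke2fs for real destroys existing data.
mke2fs_block_cmd() {
    device=$1
    block_size=$2
    case "$block_size" in
        1024|2048|4096)
            echo "mke2fs -b $block_size $device"
            ;;
        *)
            echo "unsupported ext2fs/ext3fs block size: $block_size" >&2
            return 1
            ;;
    esac
}

mke2fs_block_cmd /dev/hda4 4096    # prints: mke2fs -b 4096 /dev/hda4
```

The XFS counterpart of the printed command would be mkfs.xfs -b size=4096 /dev/hda4, using the option described above.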

Journaling options

All the journaling filesystems support various journal options. One common option is the location of the journal. By placing the journal on a separate physical disk from the main filesystem, you can improve performance (provided the target disk isn't too sluggish itself). You can use the -J device=journal-device option in mke2fs or the -j journal-device option in mkreiserfs or mkfs.jfs to set this feature. Ext3fs also supports setting the journal size with the -J size=journal-size option, where journal-size is specified in megabytes; the resulting journal must occupy between 1,024 and 102,400 filesystem blocks. Specifying too small a journal may degrade performance, but setting one too large may rob you of too much disk space. If in doubt, let mke2fs decide on the journal size itself.
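Because the journal size is given in megabytes but the limits are expressed in filesystem blocks, a little arithmetic helps. The following sketch converts one to the other and checks the 1,024 to 102,400 block range mentioned above. The function names and the /dev/hda4 device are illustrative assumptions, and the mke2fs command is only printed.

```shell
# Convert a journal size in megabytes to filesystem blocks.
journal_blocks() {
    size_mb=$1
    block_size=$2   # bytes per filesystem block: 1024, 2048, or 4096
    echo $(( size_mb * 1024 * 1024 / block_size ))
}

# Succeed only if the journal falls within 1,024 to 102,400 blocks.
journal_size_ok() {
    blocks=$(journal_blocks "$1" "$2")
    [ "$blocks" -ge 1024 ] && [ "$blocks" -le 102400 ]
}

journal_blocks 32 4096    # 32MB journal with 4KB blocks; prints 8192
journal_size_ok 32 4096 && echo "mke2fs -J size=32 /dev/hda4"
```

With 4KB blocks, for example, a 1MB journal works out to only 256 blocks and fails the check, while a 512MB journal (131,072 blocks) exceeds the upper limit.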

Reserved blocks

Ext2fs and ext3fs reserve a number of blocks for use by the superuser (or some other user you specify). The default value of 5 percent reserved space may be overkill on large partitions or on less critical partitions (such as /home). You can gain a bit more space by using the -m reserved-percentage option to mke2fs. Changing this percentage won't affect actual disk performance, but it may gain you just a bit more available disk space. You can change this option after you create a filesystem by passing the same parameter that mke2fs accepts to the tune2fs program, as in tune2fs -m 1 /dev/hda4 to set the reserved blocks percentage to 1.
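To get a feel for how much space is at stake, the sketch below estimates the reserved space for a given partition size and percentage. The 80GB partition size is a made-up example, and the result is rounded down to whole megabytes.

```shell
# Estimate the space set aside by the reserved-blocks percentage.
# Arguments: partition size in megabytes, reserved percentage.
reserved_mb() {
    echo $(( $1 * $2 / 100 ))
}

partition_mb=81920               # hypothetical 80GB /home partition
reserved_mb "$partition_mb" 5    # default 5 percent; prints 4096
reserved_mb "$partition_mb" 1    # after tune2fs -m 1; prints 819
```

In this example, dropping from 5 percent to 1 percent frees a little over 3GB for ordinary users.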

Check interval

Ext2fs and ext3fs force a filesystem check after a specified number of mounts or a specified amount of time between mounts. The idea is to catch errors that might creep onto the filesystem due to random disk write errors or filesystem driver bugs. You can change these intervals by using the -c max-mount-counts and -i interval-between-checks options to tune2fs. For the latter option, you specify an interval in days, weeks, or months by providing a number followed by a d, w, or m, respectively. Altering the check interval won't modify day-to-day performance, but it will change how frequently the computer performs a full disk check on startup. This disk check can be quite lengthy, even for ext3fs; it doesn't restrict itself to recent transactions as recorded in the journal, as a forced check after a system crash does.
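As a minimal sketch of combining the two options, the function below assembles a tune2fs command from a maximum mount count and an interval, rejecting intervals that lack the d, w, or m suffix described above. The function name, the device, and the sample values (50 mounts, 2 weeks) are assumptions for illustration; the command is printed, not executed.

```shell
# Hypothetical helper: build a tune2fs command from a mount count and interval.
check_interval_cmd() {
    device=$1
    max_mounts=$2
    interval=$3
    number=${interval%[dwm]}
    # If nothing was stripped, the interval has no d/w/m suffix.
    if [ "$number" = "$interval" ]; then
        echo "interval must end in d, w, or m: $interval" >&2
        return 1
    fi
    case "$number" in
        ''|*[!0-9]*)
            echo "interval must start with a number: $interval" >&2
            return 1
            ;;
    esac
    echo "tune2fs -c $max_mounts -i $interval $device"
}

check_interval_cmd /dev/hda4 50 2w    # prints: tune2fs -c 50 -i 2w /dev/hda4
```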

Directory hash

ReiserFS uses a sorted directory structure to speed directory lookups, and mkreiserfs provides several options for the hash (a type of lookup algorithm) used for this purpose. You set this option with the -h hash option to mkreiserfs, where hash can be r5, rupasov, or tea. Some hashes may yield improved or degraded performance for specific applications. The Squid Web proxy documentation suggests using the rupasov hash, whereas the qmail documentation recommends r5, for instance. One problem with the r5 and rupasov hashes is that they can greatly slow file creation in directories with very many (a million or so) files. In fact, rupasov is very prone to such problems, and so should be avoided on most systems. The tea hash is much less subject to this problem, but it is also much slower than r5 for directories with more typical numbers of files. In general, you should use the default r5 hash unless you know you'll be creating many files or the disk will be used by one performance-critical application, in which case checking the application's documentation or doing a web search for advice may be worthwhile.

Inode options

XFS enables you to set the inode size at filesystem creation time using the -i size=value option to mkfs.xfs. The minimum and default size is 256 bytes; the maximum is 2,048 bytes. (The inode size can't exceed half the allocation block size, though.) One impact of the inode size option relates to small file access times; because XFS tries to store small files within the inode whenever possible, specifying a large inode enables storing larger files within the inode. Doing so will speed access to these files. Therefore, if a partition will store many small files (under 2KB), you may want to increase the inode size. Depending on the exact mix of file sizes, the result may save or waste disk space. If few files will be smaller than 2KB, there's little point to increasing the inode size.
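The interaction between inode size and block size is easy to get wrong, so here is a small sketch that checks a proposed inode size before you build the filesystem. It assumes the inode size must be a power of two between 256 and 2,048 bytes (the power-of-two restriction is my understanding of mkfs.xfs rather than something stated above) and no more than half the block size. The function name and device are placeholders.

```shell
# Check an XFS inode size against the limits described above.
xfs_inode_size_ok() {
    inode_size=$1
    block_size=$2
    case "$inode_size" in
        256|512|1024|2048) ;;      # assumed: powers of two from 256 to 2048
        *) return 1 ;;
    esac
    # The inode size can't exceed half the allocation block size.
    [ $(( inode_size * 2 )) -le "$block_size" ]
}

xfs_inode_size_ok 2048 4096 && echo "mkfs.xfs -b size=4096 -i size=2048 /dev/hda4"
xfs_inode_size_ok 2048 2048 || echo "inode size 2048 is too large for 2048-byte blocks"
```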

The default filesystem creation options usually yield acceptable performance. Modifying these options can help in some unusual cases, such as filesystems storing huge numbers of files or a computer that's restarted frequently. I don't recommend trying random changes to these options unless you intend to run tests to discover what works best for your purposes.

Converting ext2fs to ext3fs

One of the advantages of ext3fs over the other journaling filesystems is that it's easy to turn an existing ext2 filesystem into an ext3 filesystem. You can do this using the tune2fs program and its -j option:

# tune2fs -j /dev/hda4

If the filesystem to which you add a journal is mounted when you make this change, tune2fs creates the journal as a regular file, called .journal, in the filesystem's root directory. If the filesystem is unmounted when you run this command, the journal file doesn't appear as a regular file. In either case, the filesystem is now an ext3 filesystem, and it can be used just as if you created it as an ext3 filesystem initially. If necessary, you may be able to access the filesystem as ext2fs (say, using a kernel that has no ext3fs support); however, some older kernels and non-Linux utilities may refuse to access it in this way, or they may provide merely read-only access.
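If you want to confirm that the conversion took effect, one approach is to look for the has_journal feature in the superblock listing that tune2fs -l prints. The helper below is a sketch of that check; the function name is my own, and the sample feature lines are illustrative rather than captured from a real system.

```shell
# Read "tune2fs -l" output on standard input and report whether the
# has_journal feature is listed on the "Filesystem features:" line.
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Typical use on a real system (needs read access to the device):
#   tune2fs -l /dev/hda4 | has_journal && echo "journal present"

# Self-contained demonstration with sample feature lines:
echo "Filesystem features: has_journal dir_index filetype sparse_super" | has_journal \
    && echo "journal present"
echo "Filesystem features: dir_index filetype sparse_super" | has_journal \
    || echo "no journal"
```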

On rare occasion, an ext3 filesystem's journal may become so corrupted that it interferes with disk recovery operations. In such cases, you can convert the filesystem back into an ext2 filesystem using the debugfs tool:

# debugfs -w /dev/sda4
debugfs 1.32 (09-Nov-2002)
debugfs:  features -needs_recovery -has_journal
Filesystem features: dir_index filetype sparse_super
debugfs:  quit

After performing this operation, you should be able to use fsck.ext2 with its -f option, as described in the upcoming section, "Filesystem Check Options," to recover the filesystem. The newly deactivated journal will cause fsck.ext2 to report errors even if the filesystem did not previously have them. If you like, you can then add the journal back by using tune2fs, as just described. (Don't try to remove the journal from a mounted filesystem.)

Next time we'll talk about other filesystem operations, such as resizing filesystems and defragmenting a disk.


Comments

on Optimizing Linux filesystems

Note: Comments are owned by the poster. We are not responsible for their content.

What's this? NewsForge Infomercials?

Posted by: Anonymous Coward on October 10, 2003 05:04 PM
This is the second article in two days. Is NewsForge book mongering for Barnes and Noble now? Doesn't this kind of stuff belong in newsvac instead of in the "homeboys" feature articles section?

#

Re:What's this? NewsForge Infomercials?

Posted by: smitty45 on October 10, 2003 10:42 PM
Come on... it's an excerpt, and a very technical one at that. Do you pay for NewsForge? No, you don't... If it's by plugging books that are about Linux and open source, then more power to them.

#

Defragmenting?

Posted by: Anonymous Coward on October 10, 2003 05:15 PM
Defragmenting a linux partition?

In every forum where a newbie asks this question, they get the reply that it is not needed.

Can you actually do this with tools that may come with a distro?

#

Re:Defragmenting?

Posted by: Anonymous Coward on October 10, 2003 09:24 PM
A common means of defragging is to copy the files to a new location, delete the originals, and copy them back. Chances are, the FS will attempt to allocate a contiguous area on the disk when you copy them back. The caveat is that you need a fair amount of free space available on the FS before you attempt this. If you do not have a lot of free space already, the fix is to copy large groups (directories, hint, hint) of files at one time, delete, then copy back.

** Make sure you preserve permissions and ownership when you do all this copying and moving! **

#

Re:Defragmenting?

Posted by: Anonymous Coward on October 14, 2003 03:35 PM
XFS includes defrag support; the utility is xfs_fsr.

#

Re:YES, Defragmenting

Posted by: Anonymous Coward on October 11, 2003 03:35 AM
Look, from the moment that not all files have the exact same size, defrag is NEEDED. The claim that Linux doesn't need it is a MYTH.

#

Re:YES, Defragmenting

Posted by: Rob Park on October 11, 2003 05:26 AM
The way I understand defragmenting, is that basically the disk-access algorithms used in linux filesystems are more efficient than those used in Windows filesystems; the end result is that you have to constantly defrag your fat32 partitions or they'll slow right down, whereas defragging on linux doesn't benefit you as much (sure, having the file defragged would speed it up, but the algorithm is designed so that fragmentation doesn't slow it down as much, so the benefit of defragging is minimal).

#

Re:YES, Defragmenting

Posted by: Anonymous Coward on October 11, 2003 06:44 PM
The new version of Reiserfs (v4) is supposed to do automatic periodic defrags, IIRC. Hopefully it will be a tunable option.

#

Re:YES, Defragmenting

Posted by: Anonymous Coward on October 11, 2003 05:54 AM
Look, from the moment that not all files have the exact same size, defrag is NEEDED.

This shows a complete lack of understanding of what disk fragmentation is. The files can be all *sorts* of different sizes with zero fragmentation.

#

Re:YES, Defragmenting

Posted by: Anonymous Coward on October 15, 2003 01:22 AM
Actually, it shows your own lack.

If all files are the same size, any new file can fit evenly into the hole left by a previous file's deletion, without fragmentation. Thus, if all files are always the same size (a fantasy, obviously) then fragmentation never occurs.

Think, ya parrot.

#

Re:YES, Defragmenting

Posted by: Anonymous Coward on October 15, 2003 07:10 PM
Confusion point: IIRC the MS Windows defrag does 2 jobs:
1 - it ensures that all parts of a file are stored contiguously
2 - it packs the files together, leaving no (or little) space between them.
I think some people on this thread are talking about 1, others about 2.
(Personally, I would say "Defragmenting" means only 1.)

Regards, Simon

#

Sure, use tar or cpio.

Posted by: Anonymous Coward on October 15, 2003 01:26 AM
Do a file-by-file backup of everything on your disk to some other media using tar, wipe the disk clean (preferably by reformatting), restore.

All files will be restored contiguously. This is exactly what we all did until around 1980 or so when on-line defraggers became available.

I've done it too many times to count on PDP-11 and old Unix systems. Takes forever, but it works!

#

Try it

Posted by: peterdaly on October 10, 2003 07:27 PM
You would not believe how much a little fs can get you in performance for specific applications. Try it.

-Pete

#

block sizes in reiser

Posted by: Anonymous Coward on October 10, 2003 08:03 PM
You don't want to change the ReiserFS block size (unless you really know what you're doing). ReiserFS employs a technique called tail packing where the last block of a file is put in another file's slack space, thereby using disk space much more efficiently. As far as I know, no other file system is boasting this (but please correct me if I'm wrong).

#

available in ext3?

Posted by: gus3 on October 10, 2003 08:32 PM
IIRC, a project was on the way to get "tail merging" into ext3fs.

#

Re:block sizes in reiser

Posted by: Anonymous Coward on October 12, 2003 10:34 AM
Novell has been doing this very thing with slack space on the file system for years with their standard NetWare file system.

#

Re:block sizes in reiser

Posted by: Anonymous Coward on November 20, 2003 12:45 AM
Sun's UFS has been using this technique for many years already, like Novell has.

#

What about turning off the Last Access time?

Posted by: Anonymous Coward on October 10, 2003 11:23 PM
Although it doesn't apply to me (which is why I don't remember how to do it), I know that, in some circumstances, disk I/O performance can be substantially improved by turning off the recording of the Last Access time for each file.

This is especially true if your application has a large number of small, randomly-accessed, read-only files. If the Last Access recording is on (which is the default), then every time a file is read, it also requires a write to update the time.

Note that this (leaving the Last Access time recording on) is one of the tricks that Microsoft pulls when running their head-to-head benchmarks, in order to give Windows an advantage over Linux.

Maybe someone who knows more about it could remind us how to change the setting, as well as correct anything I've gotten wrong.

#

Re:What about turning off the Last Access time?

Posted by: Anonymous Coward on October 11, 2003 12:26 AM
The mount option "noatime" turns off access time modification.

#

Thanks (n/t)

Posted by: Anonymous Coward on October 11, 2003 03:39 PM
n/t

#

Re:What about turning off the Last Access time?

Posted by: ptuck on October 11, 2003 08:59 PM
In /etc/fstab, set the noatime and nodiratime options.

Phil.

#

QoS delivery on the filesystem

Posted by: Anonymous Coward on October 11, 2003 02:17 AM
Allegedly this is possible on XFS, but I've not seen a HOW-TO.

It would be sweet if I could guarantee disk rates for different applications... I could save a bundle by consolidating a few servers.

Network QoS doesn't do the job I want... I want to give higher priority to CIFS streams by Marketing (streaming video over CIFS), but lower CIFS performance for say Administrative folks who just copy files around. You can't do that per-user with Network QoS... (not really anyways).

#

Ext3 to optimize???

Posted by: Anonymous Coward on October 11, 2003 03:06 PM
Geez, the only way to optimize ext2 is to use another fs altogether, such as reiserfs, jfs or xfs.

Switching to ext3 will slow your system down tremendously and can hardly be called an optimization!

#

for what definition of optimize?

Posted by: Anonymous Coward on October 15, 2003 01:31 AM
For me, optimizing any computer system means arranging it to maximize my paycheck.

For you, it seems to be sacrificing reliability and high availability in favor of raw speed.

If you want to run a system that takes ten hours to fsck after a power supply blows, you don't need ext3, certainly. That would pessimize my paycheck, though, since I'd end up unemployed.

Do an fsck on a 150GB database after a hardware crash and tell me again that it's not worth my while to sacrifice raw speed for solid journaling.

#

hdparm?

Posted by: Anonymous Coward on October 11, 2003 06:58 PM
Can you not do a lot of these things with hdparm for the whole disc rather than individual partitions? And maybe even more?

Can someone give hdparm equivalents to the optimisations he suggests?

I use this, from the Gentoo docs, which is a fairly safe optimisation for any disc:

hdparm -d1 -A1 -m16 -u1 -a64 /dev/hda

#

Re:hdparm?

Posted by: Anonymous Coward on October 11, 2003 10:33 PM
Hdparm, while a useful tool, is meant for something totally different. Hdparm is for setting disk parameters.
In the command you give above:
-d1 enables dma (direct memory access)
-A1 enables read-lookahead
-m16 sets the multiple sector I/O to 16 (as in read 16 sectors at a time)
-u1 sets interrupt-unmask
-a64 sets filesystem read-ahead (essentially buffering in main memory)

These options are all related to hardware control. This article talked about filesystem performance, which is a separate issue. You can have the fastest hard drive in the world and still get bad performance because you are using a filesystem that is configured all the wrong way. Usually any modern filesystem defaults to good all-round settings, which means they compromise and tweaking is possible.

#

Start with c:

Posted by: noshellswill on October 12, 2003 05:19 AM
Optimize *nix file_systems for the casual user - that is for 99.975% of the computer using yeomary? That's simple, pad'res. Start with c:\ .... then c:\program files .....
What with ~nixVB and ~Street_C( imagine visual_Fortran ) we'll really be getting somewhere useful. Tack on WebMin & that's heaven.

#

Swap

Posted by: Anonymous Coward on October 13, 2003 07:18 AM
If you put your swap space somewhere in the middle of the drive, the heads will not have to move as far from anywhere else on the hard disk.

I used this on my old 486sx and it did increase performance.

Tim.

#

Re:Swap

Posted by: Anonymous Coward on October 13, 2003 11:17 AM
Uhh, yes, while most people still do that - try to put swap in the middle, it is no longer clear where exactly the 'middle' is, due to multiple platter disks and buffering. The only thing that is clear is that putting it on either end is probably the worst position, but in between these two extremes, will be many similar equally bad positions. So the best position for swap is a matter of luck...

#

Re(1):Swap

Posted by: Anonymous [ip: 217.8.207.197] on October 16, 2007 10:52 AM
definitely, the best position for swap is on a separate drive..

#

reserved blocks

Posted by: Anonymous Coward on October 14, 2003 04:35 PM
The reserved blocks are there for a reason. The reason is avoiding fragmentation. Ext2/ext3 don't fragment much if they do not get full. The reserved percentage keeps normal users from making the filesystem full so it helps against fragmentation. So you don't need any defragmentation at all in practice (in 95%+ cases).

#

An idea

Posted by: Anonymous Coward on October 14, 2003 04:56 PM
How could I make a script to reveal all the files that are touched during the boot process? Then all those files could be copied together sequentially, the old copies deleted, and thus the boot process could be accelerated somehow. Do you think it would be worth it?

#

Use the noatime option in /etc/fstab

Posted by: Anonymous Coward on October 14, 2003 09:57 PM
Every file read access in Linux does a disk write to store the last access time. This can be disabled per partition with the noatime mount option. Excellent for busy web/e-mail/file servers. You should check, though, whether some of your software depends on it (rare).

#

This story has been archived. Comments can no longer be posted.



 