September 14, 2007

30 days with JFS

Author: Keith Winston

The Journaled File System (JFS) is a little-known filesystem open sourced by IBM in 1999 and available in the Linux kernel sources since 2002. It originated inside IBM as the standard filesystem on the AIX line of Unix servers, and was later ported to OS/2. Despite its pedigree, JFS has not received the publicity or widespread usage of Linux filesystems like ext2/3 and ReiserFS. To learn more about JFS, I installed it as my root filesystem. I found it to be a worthy alternative to the bigger names.

To give JFS a try, I installed Slackware 12 on a laptop, choosing JFS as the filesystem during installation. I performed no special partitioning and created one JFS filesystem to hold everything. Installation was uneventful and the system booted normally from GRUB. Not all distributions offer JFS as an install option, and some may not have JFS compiled into their default kernels. While Fedora and SUSE users can use JFS, they both default to ext3. Slackware, Debian, Ubuntu, and their derivatives are good choices for anyone who wants to try JFS.

One of the first things I noticed about my new system was the absence of a lost+found directory, which is a relic of lesser filesystems.

JFS is a fully 64-bit filesystem. With a default block size of 4KB, it supports a maximum filesystem size of 4 petabytes (less if you use smaller block sizes). The minimum filesystem size supported is 16MB. The JFS transaction log has a default size of 0.4% of the aggregate size, rounded up to a megabyte boundary. The maximum size of the log is 32MB. One interesting aspect of the layout on disk is the fsck working space, a small area allocated within the filesystem for keeping track of block allocation if there is not enough RAM to track a large filesystem at boot time.

JFS dynamically allocates space for disk inodes, freeing the space when it is no longer required. This eliminates the possibility of running out of inodes due to a large number of small files. As far as I can tell, JFS is the only filesystem in the kernel with this feature. For performance and efficiency, the contents of small directories are stored within the directory's inode. Up to eight entries are stored in-line within the inode, excluding the self (.) and parent (..) entries. Larger directories use a B+ tree keyed on name for faster retrieval. Internally, JFS uses extents to allocate blocks to files, leading to efficient use of space even as files grow in size. This is also available in XFS, and is a major new feature in ext4.

JFS supports both sparse and dense files. Sparse files allow data to be written to random locations within a file without writing intervening file blocks. JFS reports the file size as the largest used block, while only allocating actually used blocks. Sparse files are useful for applications that require a large logical space but use only a portion of the space. With dense files, blocks are allocated to fill the entire file size, whether data is written to them or not.

In addition to the standard permissions, JFS supports basic extended attributes, such as the immutable (i) and append-only (a) attributes. I was able to successfully set and test them with the lsattr and chattr programs. I could not find definitive information on JFS access control list support under Linux.

Logging

The main design goal of JFS was to provide fast crash recovery for large filesystems, avoiding the long filesystem check (fsck) times of older Unix filesystems. That was also the primary goal of filesystems like ext3 and ReiserFS. Unlike ext3, journaling was not an add-on to JFS, but baked into the design from the start. For high-performance applications, the JFS transaction log file can be created on an external volume if one is specified when the filesystem is first created.

JFS only logs operations on meta-data, maintaining the consistency of the filesystem structure, but not necessarily the data. A crash might result in stale data, but the files should remain consistent and usable.

Here is a list of the filesystem operations logged by JFS:

  • File creation (create)
  • Linking (link)
  • Making directory (mkdir)
  • Making node (mknod)
  • Removing file (unlink)
  • Rename (rename)
  • Removing directory (rmdir)
  • Symbolic link (symlink)
  • Truncating regular file

Utilities

JFS provides a suite of utilities to manage its filesystems. You must be the root user to use them.

Utility Description
jfs_debugfs Shell-based JFS filesystem editor. Allows changes to the ACL, uid/gid, mode, time, etc. You can also alter data on disk, but only by entering hex strings -- not the most efficient way to edit a file.
jfs_fsck Replay the JFS transaction log, check and repair a JFS device. Should be run only on an unmounted or read-only filesystem. Run automatically at boot.
jfs_fscklog Extract a JFS fsck service log into a file. jfs_fscklog -e /dev/hda6 extracts the binary log to file fscklog.new. To view, use jfs_fscklog -d fscklog.new.
jfs_logdump Dump the journal log to a plain text file that shows data on each transaction in the log file.
jfs_mkfs Create a JFS formatted partition. Use the -j journal_device option to create an external journal (1.0.18 or later).
jfs_tune Adjust tunable filesystem parameters on JFS. I didn't find options that looked like they might improve performance. The -l option lists the superblock info.

Here is what a dump of the superblock information looks like:

root@slackt41:~# jfs_tune -l /dev/hda6
jfs_tune version 1.1.11, 05-Jun-2006

JFS filesystem superblock:

JFS magic number:       'JFS1'
JFS version:            1
JFS state:              mounted
JFS flags:              JFS_LINUX  JFS_COMMIT  JFS_GROUPCOMMIT  JFS_INLINELOG 
Aggregate block size:   4096 bytes
Aggregate size:         12239720 blocks
Physical block size:    512 bytes
Allocation group size:  16384 aggregate blocks
Log device number:      0x306
Filesystem creation:    Wed Jul 11 01:52:42 2007
Volume label:           ''

Crash testing

White papers and man pages are no substitute for the harsh reality of a server room. To test the recovery capabilities of JFS, I started crashing my system (forced power off) with increasing workloads. I repeated each crash twice to see if my results were consistent.

Crash workload Recovery
Console (no X) running text editor with one open file About 2 seconds to replay the journal log. Changes I had not saved in the editor were missing but the file was intact.
X window system with KDE, GIMP, Nvu, and text editor in xterm all with open files About 2 seconds to replay the journal log. All open files were intact, unsaved changes were missing.
X window system with KDE, GIMP, Nvu, and text editor all with open files, plus a shell script that inserted records into a MySQL (ISAM) table. The script I wrote was an infinite loop, and I let it run for a couple of minutes to make sure some records were flushed to disk. About 3 seconds to replay the journal log. All open files intact, database intact with a few thousand records inserted, but the timestamp on the table file had been rolled back one minute.

In all cases, these boot messages appeared:

**Phase 0 - Replay Journal Log
-|----- (spinner appeared for a couple of seconds, then went away)
Filesystem is clean

Throughout the crash testing, I saw no filesystem corruption, and the longest log replay time I experienced was about 3 seconds.

Conclusion

While my improvised crash tests were not a good simulation a busy server, JFS did hold up well, and recovery time was fast. All file-level applications I tested, such as tar and rsync, worked flawlessly, and lower-level programs like Truecrypt also worked as expected.

After 30 days of kicking and prodding, I have a high level of confidence in JFS, and I am content trusting my data to it. JFS may not have been marketed as effectively as other alternatives, but is a solid choice in the long list of quality Linux filesystems.

Categories:

  • Reviews
  • Linux