June 8, 2009, 2:46 pm
Let’s face it – we’re addicted! To files, that is. More importantly, we are addicted to the massively large and ever-increasing storage devices upon which we store those files. Make no mistake though: like any addiction, storing content comes at a cost, and usually those costs are paid at the filesystem level. We all want more space and we all want better performance when it comes to disk I/O, and a junkie’s wishlist never ends.
Fedora 11, when released tomorrow, will be the first distribution to boast the inclusion of ext4, the latest incarnation in the extended file system family, as default. Ext4 brings with it support for larger filesystems, larger single file sizes and improvements in almost every imaginable facet. Join me for an interview with Eric Sandeen, renowned file system hacker, Red Hat Engineer and Fedora Contributor, as he takes us on a little trip down Filesystem Alley and explains what filesystems are, where they came from, why we should care and why they, along with Fedora 11, are prepping to take over the WOOOOORLD!
1. Please give us a quick self-introduction and tell us how you got started working on Fedora/Red Hat and filesystems.
I was an electrical engineer by education & career years ago, but in the course of that work I started fiddling with Linux – Red Hat Linux (5?) was actually the first distro I ever used. I worked at SGI for about 6 years on the XFS filesystem, and then moved to Red Hat to work on ext3, ext4, XFS, and other filesystem related bits. I feel lucky to have been able to turn a fun hobby into a paid gig.
2. Practically, what is a filesystem and why should the average user care about what filesystem they are using?
A filesystem is the detailed format of how the operating system stores data on disk, and how it manages reading and writing of that data. The filesystem’s job, first and foremost, is to keep the user’s data intact and accessible, but beyond that, extra features and speed on certain workloads may influence a user’s choice of which filesystem to use.
3. Can you give us a brief history on filesystems in Linux? What have been the major milestones?
Linux started out with a very simple filesystem, the Minix filesystem. This was replaced with the “ext” filesystem around 1992. Ext2 showed up around 1993, and the later ext* filesystems have been developed from that basic lineage. Around 2000-2001, there was a bit of an explosion of new journaling filesystems for Linux, including ext3, xfs, jfs, and reiserfs. Of those, I’d say that ext3, ext4, and XFS have remained in most active development to this day.
Ext4 development was started about 3 years ago to address scalability & functional limitations of ext3, working on top of the ext3 codebase. Some of the basic features came from work that ClusterFS and Bull had done for Lustre, and other development has happened on top of that. It’s been a joint effort by several entities upstream, and we’ve all worked together to make a good filesystem.
4. In Fedora 11, the default filesystem will be ext4, making it the first distro to offer ext4 as the default FS. Why is that significant?
I think Fedora has always taken pride in helping to develop new features for Linux, and pushing them as part of the distribution to get these features out to a user base. It’s always a bit of a balancing act, because new software inherently has bugs, and users expect any distribution to work well, of course.
The open development process of Fedora has allowed early adopters to test & provide bug reports and feedback on ext4, and the end result, I think, is that we have a very solid ext4 codebase for F11. It was a little rough in the beginning but thanks to all the testers, and the hard work by all the upstream ext4 developers, I feel confident that we’re in good shape.
5. What limitations was ext4 developed to overcome and what benefits can we expect to see? There are also new features like the addition of extents and pre-allocation. These specific features are a big win over previous filesystems. Can you tell us more?
One of the primary limitations of ext3, and motivators for ext4, was the relatively small maximum file size (2T) and filesystem size (16T). The allocator in ext3 wasn’t particularly efficient either, and the direct/indirect block layout scheme caused some performance bottlenecks.
The ext4 on-disk format allows for up to 1EB filesystems with 4k blocks, although due to user space tool limitations we’re still at a 16T maximum filesystem size. Work is currently underway to address this.
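The 16T and 1EB figures above follow from simple arithmetic: the maximum filesystem size is the number of addressable blocks times the block size. The short sketch below reproduces them, assuming 32-bit block numbers for ext3 and 48-bit block numbers for ext4; those widths are background added here and are not stated in the interview.

```python
# Back-of-the-envelope size limits: max size = addressable blocks * block size.
BLOCK_SIZE = 4 * 1024  # 4k blocks, as mentioned in the interview

TIB = 2 ** 40
EIB = 2 ** 60


def max_fs_bytes(block_number_bits, block_size=BLOCK_SIZE):
    """Largest filesystem addressable with block numbers of the given width."""
    return (2 ** block_number_bits) * block_size


# Assumption (not from the interview): 32-bit block numbers for ext3,
# 48-bit block numbers for the ext4 on-disk format.
ext3_limit = max_fs_bytes(32)
ext4_limit = max_fs_bytes(48)

print(ext3_limit // TIB, "TiB")  # 16 TiB
print(ext4_limit // EIB, "EiB")  # 1 EiB
```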
Ext4 also has a new allocator, called “mballoc”, which can be much more efficient than ext3’s old block-at-a-time allocator.
One of the other real bottlenecks to scalability is how well a very large filesystem can be checked and repaired, and modifications to ext4’s metadata layout have yielded some very impressive speedups in e2fsck’s check times.
Features like extents and delayed allocation have honestly been around for a very long time on other Linux filesystems like XFS, and ext4 implemented these features in part based on that proven track record. Together these features can help give us very efficient allocation patterns.
One other thing that the extent format brings us is much faster deletion of large files compared to ext3 – something which anyone who has had to enable the “slow delete” feature of MythTV may appreciate.
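To see why extents help with large files, it can be useful to compare how much mapping information each scheme has to keep. The sketch below is a deliberately simplified model, not ext4's actual on-disk structures: it collapses a per-block list of physical block numbers into (start, length) pairs. The block numbers and the helper name `to_extents` are made up for illustration.

```python
def to_extents(blocks):
    """Collapse a sorted list of physical block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the previous run, so extend that run.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # A gap: start a new run of length 1.
            extents.append((block, 1))
    return extents


# Hypothetical file: 1000 contiguous blocks, then 500 more elsewhere on disk.
blocks = list(range(5000, 6000)) + list(range(9000, 9500))
extents = to_extents(blocks)

print(len(blocks), "per-block entries")  # 1500 per-block entries
print(len(extents), "extents")           # 2 extents
print(extents)                           # [(5000, 1000), (9000, 500)]
```

In this toy model, freeing the file means walking 2 records instead of 1500, which is the intuition behind the faster deletes described above.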
Extents also allow filesystems like ext4 to efficiently track preallocated disk space, allowing applications which use preallocation calls to get more efficient allocation. The transmission bittorrent client and the libvirt tools are a couple of packages in Fedora which make use of this.
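As a rough idea of what a preallocation call looks like from an application, here is a minimal sketch using Python's `os.posix_fallocate`, which wraps the POSIX `posix_fallocate` call. Which call any particular Fedora package actually uses is not stated in the interview; the file name and size below are hypothetical.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Create (or open) a file and ask the filesystem to reserve size_bytes for it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Reserve space for the range [0, size_bytes) before any data is written.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical example: reserve 16 MiB for a download before it starts.
    target = os.path.join(tmp, "download.part")
    preallocate(target, 16 * 1024 * 1024)
    size = os.path.getsize(target)
    print(size, "bytes reserved")
```

Reserving the whole range up front gives the allocator the chance to lay the file out in a few large extents, rather than growing it piecemeal as data trickles in.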
6. Fedora has been using LVM and other volume management layers for a while. In fact, Fedora helped pioneer technologies like LVM. How does ext4 play well with these? How does it facilitate use of those technologies?
To be honest, there’s a lot more work to be done in this area. One of the things which has just recently been addressed upstream is LVM’s ability to pass write barriers from the filesystem down to the underlying block device. Write barriers prevent write reordering by the drive. They have a bit of a performance hit, but they’re needed to ensure a journaling filesystem’s consistency whenever power is lost to a disk with a volatile write cache. Until very recently, LVM didn’t pass these barrier requests down at all; this now works upstream for simple LVM volumes, and work is ongoing in this area.
The other area where filesystems and volume managers really need to communicate is in the geometry of the aggregate block device – ideally the filesystem wants to know about the stripe unit and stripe width of a raid5 device, for example, so that it can do efficient, well-aligned allocation and IOs. The XFS userspace utilities are able to extract this information from software raid devices and use it at mkfs time, and honestly this is something that needs to be added to e2fsprogs as well. Again, there is more work going on upstream to address this issue.
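A small sketch of what “stripe unit” and “stripe width” mean in practice, and how an allocator might use them to align a starting offset. The 64 KiB chunk size, the 4-disk array and the helper names are assumptions chosen for illustration; this is not how any particular mkfs tool computes its values.

```python
def raid5_geometry(stripe_unit_bytes, n_disks):
    """Return (stripe_unit, stripe_width) in bytes for a RAID5 array.

    One disk's worth of each stripe holds parity, so a full stripe of data
    spans n_disks - 1 stripe units.
    """
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    data_disks = n_disks - 1
    return stripe_unit_bytes, stripe_unit_bytes * data_disks


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return ((offset + alignment - 1) // alignment) * alignment


# Hypothetical array: 4 disks with a 64 KiB stripe unit (chunk size).
unit, width = raid5_geometry(64 * 1024, 4)
aligned = align_up(100_000, width)

print(unit, width)  # 65536 196608
print(aligned)      # 196608
```

Starting large allocations on a stripe-width boundary lets a write cover whole stripes, so the array can compute parity from the new data alone instead of reading old data back first.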
7. What are your thoughts on the future of filesystems? What do you think are the features that we should be focusing on? Are we working on pioneering any of these efforts in Fedora?
One of the big pushes is for more active protection of the user’s data via checksumming at all levels, as well as management features, such as better ability to use multiple devices for a filesystem. In Linux, a lot of this type of work is being done in the new BtrFS filesystem.
Fedora 11 is a pretty exciting release for filesystems overall, because it also includes an early preview of BtrFS. Josef Bacik, one of our filesystem developers, has been putting a lot of effort into BtrFS upstream. Adventurous users who want to try out BtrFS can do so in F11, and even install the distro onto it by booting the installer with a “secret” boot argument – “icantbelieveitsnotbtr”. This is a very early preview, and isn’t yet suitable for more than testing for most users, but early testing and bug reporting will be very useful.
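The basic idea behind data checksumming can be sketched in a few lines: store a checksum alongside each block when it is written, and recompute it on every read so that silent corruption is detected rather than handed back to the application. The example below uses `zlib.crc32` from the Python standard library purely as a stand-in; the interview does not say which checksum any filesystem uses, and the function names are hypothetical.

```python
import zlib


def write_block(data):
    """Return the block together with the checksum that would be stored with it."""
    return data, zlib.crc32(data)


def read_block(data, stored_checksum):
    """Return the data only if it still matches the checksum recorded at write time."""
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data


block, checksum = write_block(b"important user data" * 100)

# A clean read passes verification.
clean = read_block(block, checksum)

# Simulate a single flipped bit on disk and check that the read is rejected.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]
try:
    read_block(corrupted, checksum)
    detected = False
except IOError:
    detected = True

print("corruption detected:", detected)
```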
8. Do you like any other filesystems that are being used/developed, such as ZFS, which seems to be a big fan favorite and others like BtrFS, Tux3?
ZFS has a lot of nice advertised features, but it’s not really available for Linux primarily due to license issues – and I’m not sure the userspace fuse implementation is optimal, but I may be biased as a kernel filesystem developer! BtrFS shows a lot of promise, and Chris Mason and his crew have been developing it at an amazing pace, in my opinion. BtrFS is a pretty fundamental re-thinking of what a Linux filesystem should be.
I can only keep so much in my brain at once, and so have not really kept up with Tux3. The other filesystem that I still think is interesting is XFS, because it has the scalability and feature set that ext4 is striving for in a mature, well-tested (though pretty complex…) code base. Of course, like anything else, it has its strengths and weaknesses. It’s also a pretty different beast administratively compared to what people are used to with ext2 and ext3.
9. What does your day-to-day development cycle look like? Surely, work continues on ext4, but what else are you working on? What do you spend your free time doing?
I do a fair amount of work on ext4 and XFS on a daily basis, and a lot of my time is taken addressing various Fedora and Red Hat Enterprise Linux user & customer bugs. I maintain a few other filesystem-related tools for Fedora and RHEL as well, including e2fsprogs, xfsprogs, xfsdump, blktrace, fio, ffsb, fs_mark, seekwatcher… this keeps me plenty busy!
I’ve recently been working on making the xfs regression test suite filesystem-agnostic so that other filesystems can use this basic framework for regression testing; it’s been hugely useful for XFS development. We have about 30 tests running on other filesystems now.
There are many other bits and pieces that compete for attention every day, so there’s a lot of juggling of priorities. Any filesystem corruption bugs or oopses usually rise to the top.
Free time? I have a family and 2 kids, so there’s not a lot of that! I bike and swim when I can, and to be honest some of my free time is spent… hacking filesystems. I guess it’s in my blood.
10. How are you planning to celebrate the Fedora 11 release tomorrow?
Hm, I’ll probably be working on what needs to be done for F12.