Why filesystems matter
The filesystem mediates between the operating system and the storage device, mapping what the operating system understands as directories and files onto what the device understands, such as tracks and sectors. This seems like an essential but mundane function -- not one that has a major bearing on IT decision-making. However, anyone who has ever had to defragment a Windows disk, or watched fsck grind through a long recovery on a Linux ext2 disk partition, can appreciate how important the filesystem can suddenly become.
Filesystems have a major impact on how secure and reliable your data is, as well as how flexible your applications can be in interacting with that data. This latter point may not be obvious. Think, however, about Windows's expectation that files of a certain type must have a certain extension; absent that extension, applications are often at a loss for how to handle the file. Of course there are application-level workarounds for these problems, but they point to a clear tension in application design: how much should the filesystem be doing to facilitate application execution, and how much should the application be compensating for functionality not in the filesystem?
When it comes to fault tolerance and data integrity, the industry consensus is that filesystems should do the heavy lifting. This was a major challenge for Linux in the transition from the 2.2 to the 2.4 series of kernels.
By late 2000 Linux had become popular for low-end server systems. The low cost of the operating system, the low cost of the hardware needed to run it, and the relatively high performance made Linux compelling. What you did not find in late 2000, however, were significant Linux deployments for mission-critical applications. One of the biggest limitations was the filesystem.
Proprietary versions of Unix all offered journaling filesystems: filesystems that not only mediated between operating system and storage media, but kept a log of mediation activity for rapid recovery in the event of a system crash. Notable among these were IBM's JFS and SGI's XFS. Linux still relied on ext2, which achieved high performance in terms of speed but suffered from a slow and sometimes painful recovery process.
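To make the idea of a journal concrete, here is a minimal Python sketch of logging a write before applying it, using an in-memory dictionary as a stand-in for disk blocks. It is an illustration under heavy simplifying assumptions, not how JFS, XFS, or any real filesystem lays out its log; the names `ToyJournaledStore`, `log`, `checkpoint`, and `recover` are hypothetical.

```python
class ToyJournaledStore:
    """Toy model: a dict stands in for disk blocks, a list for the journal."""

    def __init__(self):
        self.blocks = {}    # block number -> data (the "disk")
        self.journal = []   # append-only record of intended writes

    def log(self, block, data):
        """Step 1: record the intended write in the journal."""
        self.journal.append((block, data))

    def checkpoint(self):
        """Step 2: apply logged writes in order, then empty the journal."""
        replayed = len(self.journal)
        for block, data in self.journal:
            self.blocks[block] = data
        self.journal.clear()
        return replayed

    def write(self, block, data):
        """A normal write is simply log-then-apply."""
        self.log(block, data)
        self.checkpoint()

    def recover(self):
        """After a crash, replay only what the journal holds."""
        return self.checkpoint()


store = ToyJournaledStore()
store.write(1, "directory entry")
store.log(2, "new file data")   # simulated crash: logged but never applied
replayed = store.recover()      # recovery touches one journal entry only
```

The point of the sketch is that recovery work is proportional to the length of the journal rather than to the size of the disk, which is the contrast with a full fsck pass on ext2.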
Ext2 made Linux a great choice as a Web server. Having one of many mirrored Web heads behind a site go down and come back up slowly was an acceptable cost if the upside was very fast performance in serving flat-file Web pages. But ext2 severely limited Linux's suitability for mission-critical environments.
The release of the 2.4 kernel in January 2001 was a watershed moment for Linux. Linux gained a native journaling filesystem in ext3, and both JFS and XFS were supported options for 2.4 kernels as well. With other improvements in fault tolerance and scalability, Linux could take on an ever larger server role in the enterprise.
Still, issues of data integrity, recovery, and fault tolerance remained. These are the very same issues that arise in the world of databases and database application development. The parallel shouldn't be surprising. One can argue that a filesystem is nothing but another kind of database.
Linux filesystems today
In the enterprise, Linux is viewed primarily as a server operating system. Not surprisingly, then, filesystem innovation has been driven by server needs. The performance and fault tolerance that come with a journaling filesystem were the earliest need. There's a good technical comparison of journaling filesystems in Linux Gazette.
Work has progressed more slowly on incorporating attributes into filesystems. Attributes are short name-value pairs that are associated with each file; familiar stuff to anyone from the database world. "Phone number," "email," and "mime type" are examples of entities that could be attributes. Attributes help a filesystem present its structure to the operating system in a rich and meaningful way.
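As a rough illustration of what attributes make possible, the Python sketch below models files that carry name-value pairs and runs a database-style query over them. The `AttributedFile` type, the `find` helper, and the sample paths and values are all made up for illustration; an attributed filesystem would store and index these pairs itself rather than leave it to application code.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedFile:
    path: str
    attrs: dict = field(default_factory=dict)  # attribute name -> value


def find(files, **wanted):
    """Return paths of files whose attributes match every requested pair."""
    return [
        f.path
        for f in files
        if all(f.attrs.get(name) == value for name, value in wanted.items())
    ]


# Hypothetical sample data.
files = [
    AttributedFile("contacts/alice.vcf",
                   {"mime_type": "text/vcard", "email": "alice@example.com"}),
    AttributedFile("music/track01.mp3",
                   {"mime_type": "audio/mpeg", "artist": "Example Band"}),
    AttributedFile("music/track02.mp3",
                   {"mime_type": "audio/mpeg", "artist": "Other Band"}),
]

audio = find(files, mime_type="audio/mpeg")
by_artist = find(files, mime_type="audio/mpeg", artist="Other Band")
```

On Linux, Python also exposes real extended attributes through `os.setxattr` and `os.getxattr`, where the underlying filesystem and mount options support them; the in-memory model above avoids that dependency.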
| The search wars |
|---|
| The early days of the Web saw fierce competition between search engines: Alta Vista, Lycos, Magellan, and Inktomi all strove to dominate the search market. Seemingly out of nowhere, Google emerged as the clear winner. Google's ascendance signaled not the end of the "search wars", but rather the beginning. |
| Microsoft realizes that it has so far lost the search war, much as it lost the early stages of the browser war. To make a comeback, it must find and dominate an area of search technology where Google is not already entrenched. It has chosen the desktop, an area about which many users are asking these days, "Why is it easier to search the Internet than my hard drive?" Microsoft will try to leverage its ability to tweak the operating system (hence WinFS) to become a leader in desktop search capability. It can then couple that with its online presence to offer unified search. Google already dominates online search, and will have to find an application-level solution to extend its capabilities to the desktop. |
| This competition has broad implications for the enterprise. In any large company the thousands of enterprise desktops house valuable data. Any software that figures out how to make that data easy to retrieve will be a compelling choice for the enterprise desktop. |
| There are implications beyond the desktop, however. Think about why Oracle urges deploying its database configured for raw disk I/O. Such a configuration increases performance because it enables Oracle to have its database function as a filesystem. If Microsoft can enable its filesystem to function as a database, then at least on small to midsized applications SQL Server may be able to compete with Oracle as never before. |
| In an interview with New Scientist, IDC operating system analyst Dan Kusnetzky says, "A number of people have started to say we need to use the technology developed for databases and Web searching and use them for the filesystem." |
| Where does Linux stand in all of this? If Linux is really to compete on the desktop, and if Linux is to advance its hold in the server space, then it must enter the search wars, and do so at the filesystem level. |
The first serious effort to incorporate attributes with Linux came from the application side, not the filesystem side, and not surprisingly it came from someone with a long history at Apple: Andy Hertzfeld. Hertzfeld's Eazel brought us the Nautilus file browser, an elegant addition to the Linux desktop interface that attempted to deliver many of the benefits of an attributed filesystem: automatic viewing/previewing of file contents, attribute-based rather than hierarchical folders, intelligent recognition of file type for application handling.
Alas, Eazel was ahead of its time, and suffered the fate of many dot-coms. Nautilus, however, lives on as part of the GNOME Desktop.
The real future for Linux, though, depends on filesystem innovations that enable Linux to keep up or lead in the race with Longhorn and Tiger.
Longhorn, Microsoft's next-generation operating system, expected in 2006, will include WinFS, a filesystem built on an object-relational database structure. This is meant to improve speed and stability, and also to greatly facilitate search capability.
In Tiger, expected in the first half of next year, Apple will debut a new search technology called Spotlight. Not only will Spotlight speed searching, but it will return richer data about files it searches "by indexing the descriptive informational items already saved within your files and documents called metadata."
The next-generation Linux filesystem should facilitate comparable functionality, rather than requiring applications to compensate for capabilities the filesystem lacks. There's some genuine awareness and discussion of this on the GNOME Desktop mailing list. The GNOME developers realize that they need attribute functionality in the filesystem, and that they need it on a timetable that puts them ahead of the WinFS release in Longhorn.
Linux already has a viable next-generation filesystem candidate in ReiserFS. ReiserFS is not just a journaling filesystem, but one that uses an innovative database structure (so-called "dancing trees"). While ReiserFS does not have a native concept of attributes per se, its ability to handle lots of small files with negligible performance hits means that all the metadata functionality we associate with attributes can be built in.
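One way to picture attributes built out of small files is to store each attribute as its own tiny file next to the data it describes. The Python sketch below does this with ordinary directories. The `.attrs` sidecar naming and the `set_attr` and `get_attrs` helpers are hypothetical conventions chosen for illustration, not anything ReiserFS itself defines.

```python
import tempfile
from pathlib import Path


def attr_dir(path: Path) -> Path:
    # Hypothetical convention: attributes for "song.mp3" live in "song.mp3.attrs/".
    return path.with_name(path.name + ".attrs")


def set_attr(path: Path, name: str, value: str) -> None:
    """Store one attribute as one small file."""
    d = attr_dir(path)
    d.mkdir(exist_ok=True)
    (d / name).write_text(value, encoding="utf-8")


def get_attrs(path: Path) -> dict:
    """Read every attribute file back into a name -> value mapping."""
    d = attr_dir(path)
    if not d.is_dir():
        return {}
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(d.iterdir())}


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        song = Path(tmp) / "song.mp3"
        song.write_bytes(b"")  # placeholder contents
        set_attr(song, "artist", "Example Band")
        set_attr(song, "mime_type", "audio/mpeg")
        return get_attrs(song)
```

A scheme like this creates many files of only a few bytes each, which is why the cost of small files matters so much to whether it is practical.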
For now the emphasis is on "can be." This is a clear direction in which Linux is moving, but we're not there yet.
Looking to the future
All indications are that Linux, Windows, and Mac OS are moving in a common direction with filesystem innovation. Linux's continued success depends on who gets there first, and how the market reacts to the Linux approach.
Much also depends on what happens competitively within the Linux market. Right now, more real innovation seems to be coming from Novell/SUSE than from Red Hat. Novell's Miguel de Icaza and Nat Friedman have been very clear about the competitive challenge presented by Longhorn. SUSE already ships with ReiserFS as the default filesystem (Red Hat defaults to ext3).
Linux is a ways yet from having a fully attributed, database-driven, journaling filesystem. The direction of future development looks promising, though. Linux will certainly compete as the search wars come to the desktop. Linux's value to the enterprise depends on it.
Note: Comments are owned by the poster. We are not responsible for their content.
Adding attributes to the filesystem is absolutely required, and as a former BeOS user I can tell you that they make the user experience orders of magnitude better -- it is totally painful managing MP3s on any other OS, but on BeOS it's trivial. My Tracker windows in Be were more powerful than iTunes.
But adding those attributes is not innovation, it's catch-up. Linux is a good server OS, but most of its success has been through commoditization of established practices. Linux has a lot of work to do before it can begin actually innovating, because it is pretty far behind even defunct operating systems like BeOS and MacOS 9 and earlier, at least in terms of functionality that the user sees.
I'd love to see live queries, built-in support for attributes in file browsers, file translators (I still don't see anyone trying to copy this real innovation from BeOS, but it was one of their best ideas), global API-level support for MIME-based filetyping, global registration of which types each app can handle, global and per-user registration of which apps are the default handlers for each super- and sub-type, etc.
Like I said, Linux has a heckuva lot of work to do before anything it does is an innovation. The announcement of Spotlight is not an innovation, it's a copy of what Be did, and it's even done by the guy who did it at Be. Longhorn's DBFS stuff is also not an innovation -- Be did that before their later BFS but got rid of it because of performance.
We don't need innovation, we need people actually providing what are established as great ideas. I just hope that the Linux community is interested in truly great Desktop support; I am not, unfortunately, convinced that it is.
One thing though, even BeOS did borrow features from those that came before it. For example, the translators you mention actually evolved as a result of some of us ex-Amiga programmers recognising the pioneering work done on the Amiga datatypes libraries.
In regard to paragraph three:
The filesystem as database doesn't sound much different from existing file systems, other than possibly adding the ability to add extra data fields (attributes) to the file entries. As stated, everything must eventually come down to what is physically on the disk and clever software can compensate for at least some of the deficiencies. For example, fragmentation may be defeated with filesystem software that automatically defragments each file as it is closed and/or when it reaches a certain fragmentation level -- why should that be done outside the filesystem itself anyway?
However, the idea that the filesystem should somehow know (and possibly enforce) what applications can open what files would create a whole new set of problems. Unless we have 100% reliability in the hardware, a file somewhere will eventually be damaged. The damage may cause the filesystem to use the wrong application to access the file. The damage may make the file unreadable to the associated application, but the filesystem enforcement of which apps can open which files makes it difficult to repair. Unfortunately, if it isn't difficult to repair, then it probably isn't very secure, either.
Currently, under Windows or Linux at least, it is relatively easy to open a file with some other application, for example, to run an image through a number of different tools. There is a danger that an overly smart filesystem would inhibit that capability. Or, from the security point of view, there is an opportunity for a smart enough filesystem to prevent access to files by 'unauthorised' applications.
As for portability, I'm thinking about all of the times when a physical disk drive may need to be transferred from one computer to another; upgrading and replacing are both good reasons for that. However, embedding too much intelligence on the disk itself will make that more and more difficult. There's also the possibility that, without a global (and I mean all of Earth) application registry to ensure that all machines and systems use the same app to access the same files, the smart database filesystem may get in the way.
Of course, what I'm really saying is that whatever the filesystem, we will still be required to make application (and maybe system) level tools to work around it in order to do the dirty work of keeping the work running. At least, we will be doing that until computers are thinking for themselves and writing all software on demand...
The open source community, at least of late in the GUI area (think KDE and Gnome), has just been playing catch-up, duplicating a lot of the elements from Windows and Apple. I haven't seen a whole hell of a lot of innovation in either KDE or Gnome, the two most popular environments. We can't let the same happen with file systems.
If a database-based filesystem were to be developed for Linux right now, I'd be willing to bet that it would be a totally mundane "relational" system.
We need to do better than that -- innovate! I think the neuralNexus guy's really got it right (http://www.neuralnexus.com/)... OK... Fine. I'm the neuralNexus guy. I think this stuff would make a really good file system. It's perhaps the most searchable and easy-to-manage system, and breaks down really well into easily manageable and optimizable parts. It would blow mere attributes outta the water. Imagine being able to tell your computer (from a command line, specialized application, or generic "file" browser) to get all emails related to a specific topic, person, etc... I know I'm just self-promoting my project, but I really think that this is the way to go with filesystems. Comments?
XML in a filesystem? I find it hard to believe that any filesystem architect would ever design the overhead of an XML layer into a filesystem. While XML is human-readable, it is hardly efficient.
Also your reference to Linux desktops copying elements from Windows is laughable. Apple maybe, but Microsoft?! I find it amazing that anyone would believe that Microsoft has been innovative in any area other than marketing over the last 20 years. KDE is a far superior desktop to Windows XP (Multiple Desktops, gotta love em). The only issue I have is that you can't run Fruity Loops on it (Yet, http://www.codeweavers.com/site/products/cxoffice/).
How can I transfer the metadata from one metadata-aware-filesystem intact with my file to another metadata-aware-filesystem (Longhorn to Tiger, for example?)
The answer is - you probably can't. If you've ever worked with Synchronize tools for your PDA, phone, Outlook, Yahoo and whatnot, you will realize that nobody agrees on what metadata to keep or how it should be organized, much less presented. (It may even be considered a business advantage for there not to be agreement - consider the lock-in possibilities of a metadata filesystem that is not compatible with other systems!!)
What I think is a significantly better approach (especially for an open system like Linux) would be to create and support an open standard that wraps metadata and filedata into a single file - something like a mime-multipart package.
This allows filesystems and applications alike to inspect a file. (Advanced metadata filesystems can utilize this info if they want, but the canonical source is still the file).
In addition, since the file data is always stored in a well-defined manner, even if the application cannot understand the metadata, it can still try to do something with the filedata.
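A rough sketch of that idea, using Python's standard `email` package to build a MIME multipart container with one part for metadata and one for the raw file data. The part names `metadata.json` and `filedata`, the choice of JSON for the metadata, and the helper names are assumptions made for illustration; no such open standard is implied to exist.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def pack(metadata: dict, filedata: bytes) -> bytes:
    """Wrap metadata and file data into one multipart/mixed blob."""
    msg = EmailMessage()
    msg.add_attachment(json.dumps(metadata).encode("utf-8"),
                       maintype="application", subtype="json",
                       filename="metadata.json")
    msg.add_attachment(filedata,
                       maintype="application", subtype="octet-stream",
                       filename="filedata")
    return msg.as_bytes()


def unpack(blob: bytes):
    """Return (metadata, filedata) for a reader that understands both parts."""
    msg = message_from_bytes(blob, policy=policy.default)
    parts = {p.get_filename(): p.get_content() for p in msg.iter_parts()}
    return json.loads(parts["metadata.json"]), parts["filedata"]


def filedata_only(blob: bytes) -> bytes:
    """A reader that ignores the metadata can still reach the file data."""
    msg = message_from_bytes(blob, policy=policy.default)
    for part in msg.iter_parts():
        if part.get_filename() == "filedata":
            return part.get_content()
    raise ValueError("no filedata part found")


blob = pack({"artist": "Example Band", "mime_type": "audio/mpeg"},
            b"\x00\x01raw bytes\xff")
metadata, data = unpack(blob)
```

Because the container is an ordinary file, it can be copied between filesystems without either one needing to understand the metadata; the cost is that every application has to unwrap the container to get at the data.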
A File System's Job
Posted by: Anonymous Coward on April 02, 2005 06:11 PM