Why NFS sucks
Olaf Kirch gave his talk entitled "Why NFS sucks", following a pattern of talks entitled "Why _ sucks" at this year's OLS, on the topic of NFS and its many less successful rivals.
He started by commenting that it was really a talk about NFS and what a wonderful filesystem it is. He meant it just as seriously as he meant the original title of the talk.
Everybody complains about NFS, Kirch stated. To prove his point, he asked the audience if anyone thinks NFS is good. Three people raised their hands in an audience of more than a hundred. The SUSE Linux distribution's bugzilla had "NFS sucks" as a catch-all bug for gripes for a while, he commented, though it was recently removed.
In the early 1980s, Kirch stated, getting a little more serious as he began discussing the history of NFS, Sun had a limited network filesystem called RFS. In 1985, Sun released NFS version 2 along with SunOS 2, with no sign of an NFS version 1. In 1986, Carnegie Mellon University and IBM created AFS. 1988 saw the creation of Spritely NFS, which was NFS version 2 with cache consistency. It was another six years before the next major development on the timeline. In 1994, crash recovery was introduced for Spritely NFS, and that same year Rick Macklem released Not Quite NFS (NQNFS) along with 4.4BSD. In 1995, NFS version 3 was released as what Kirch called general wart removal. In 1997, Sun released WebNFS, intended to be as big as HTTP, but it didn't even fizzle. In 2002, NFS version 4, the 'Internet filesystem', was released.
Kirch went on to explain the basics of NFS version 2. NFSv2 is a stateless protocol. This allows either party to carry on as if nothing happened after a crash and reboot or restart. If an NFS server crashes, the client just has to wait until the server comes back up, and then it can continue as it was. If it were stateful, every client would need a state recorded and tracked by the server. A stateless protocol scales better.
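The statelessness Kirch described can be illustrated with a toy model. This is a minimal sketch and not the NFS wire protocol: the class `StatelessServer`, the function `client_read` and the `on_retry` hook are hypothetical names invented for the illustration. The point it shows is that every request carries its own file handle, offset and length, so a rebooted server can answer a repeated request without remembering anything about the client.

```python
class StatelessServer:
    """Toy server: it holds file data but remembers nothing about clients."""

    def __init__(self, disk):
        self.disk = disk      # file handle -> bytes, stands in for the disk
        self.up = True

    def crash(self):
        self.up = False

    def reboot(self):
        # There is no per-client state to rebuild after a restart.
        self.up = True

    def read(self, handle, offset, count):
        # Each request names the handle, offset and length explicitly,
        # so no open-file table or per-client cursor is needed.
        if not self.up:
            raise ConnectionError("server not responding")
        return self.disk[handle][offset:offset + count]


def client_read(server, handle, offset, count, retries=5, on_retry=None):
    """Repeat the identical request until the server answers."""
    for _ in range(retries):
        try:
            return server.read(handle, offset, count)
        except ConnectionError:
            if on_retry:
                on_retry(server)   # stands in for "wait for the reboot"
    raise TimeoutError("server still down")


server = StatelessServer({"fh1": b"hello, stateless world"})
first = client_read(server, "fh1", 0, 5)
server.crash()
# The client simply retries the same request once the server is back.
second = client_read(server, "fh1", 7, 9, on_retry=lambda s: s.reboot())
```

The client never re-opens or re-negotiates anything after the crash; it only resends the request it already had.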
NFS can export almost any filesystem as a network filesystem. It is an important strength of NFS. It is not filesystem specific.
Files need a file handle that is valid for the entire life of the file, Kirch stated. This works well with inode tables, but new filesystems are more complicated. Directories can reconstruct a chain of entries using the parent directory (..) entries. Files are pointers to inodes and directories. With NFS, these ids can change.
NFS listens on port 2049. It needs to talk to mountd to get the file handle to mount a directory, portmap to get a port to connect to, another protocol to perform file locking, another to recover from a failure in a stateless state, another to recover locks after crashes... Kirch expressed some exasperation with an old NFS attitude from versions prior to four that each new feature requires its own protocol. Version four, he noted, mostly gets it right.
NFS version 2, Kirch commented, is notorious for having its implementation details passed on primarily by oral tradition rather than meaningful specs. He described attribute problems that can result in client/server confusion because of different common implementations.
Renaming or deleting an open file should allow continued writing of that file. Over NFS versions 2 and 3, removing or renaming a file can have, as Kirch put it, interesting results. In NFS version 4, this is solved with "silly rename" which turns the removed file into a dot-file (.nfs.xxxxxx), though this file can also be deleted. The dot-file is then only removed once nothing has it open any more.
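The silly-rename mechanism can be sketched on a local filesystem. This is only an illustration under stated assumptions: `Handle` and `SillyRenameClient` are hypothetical names, the bookkeeping is far simpler than a real NFS client's, and the dot-file name merely follows the `.nfs.xxxxxx` pattern mentioned above.

```python
import os
import tempfile
import uuid


class Handle:
    """Toy open-file handle that follows the file if it is renamed."""

    def __init__(self, path):
        self.path = path


class SillyRenameClient:
    """Toy client that defers deletion of files that are still open."""

    def __init__(self):
        self.handles = []   # currently open handles
        self.silly = set()  # dot-files waiting for their last close

    def open(self, path):
        handle = Handle(path)
        self.handles.append(handle)
        return handle

    def remove(self, path):
        holders = [h for h in self.handles if h.path == path]
        if not holders:
            os.remove(path)          # nobody has it open: really delete
            return None
        # Still open somewhere: hide it under a dot-file name instead.
        silly = os.path.join(os.path.dirname(path),
                             ".nfs." + uuid.uuid4().hex[:6])
        os.rename(path, silly)
        for h in holders:
            h.path = silly
        self.silly.add(silly)
        return silly

    def close(self, handle):
        self.handles.remove(handle)
        still_open = any(h.path == handle.path for h in self.handles)
        if handle.path in self.silly and not still_open:
            os.remove(handle.path)   # last user gone: delete the dot-file
            self.silly.discard(handle.path)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.txt")
    with open(target, "w") as f:
        f.write("x")
    client = SillyRenameClient()
    h = client.open(target)
    silly = client.remove(target)
    gone_by_name = not os.path.exists(target)
    still_on_disk = os.path.exists(silly)
    client.close(h)
    cleaned_up = not os.path.exists(silly)
```

Because the dot-file is an ordinary file, anything else with access to the directory can delete it early, which is the weakness noted above.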
NFS versions 2 and 3 cannot handle simultaneous access to a file properly, he cautioned. The results can be garbled. NFS version 4 also has the problem, but will give an error message warning that there could be trouble.
Another problem inherent in NFS is the lack of file security. The client machine tells the server the user and group ids of the user trying to access a file on the server, and the server agreeably goes along with the information, trusting the client fully. A number of workarounds have been proposed and implemented over time, but none have really caught on.
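The trust problem is easy to see in a toy model. The sketch below is not NFS code; `FileEntry` and `server_read` are hypothetical names, and the permission check is reduced to the owner and world read bits. It only shows that a server which believes whatever user id the client sends cannot tell an honest client from a lying one.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    owner_uid: int
    mode: int       # Unix permission bits, e.g. 0o600
    data: bytes


def server_read(files, name, claimed_uid):
    """Toy server-side check that trusts the uid supplied by the client."""
    entry = files[name]
    # The server cannot verify claimed_uid; it simply believes it.
    if claimed_uid == entry.owner_uid and entry.mode & 0o400:
        return entry.data
    if entry.mode & 0o004:
        return entry.data
    raise PermissionError(name)


files = {"secret": FileEntry(owner_uid=1000, mode=0o600, data=b"private notes")}

# An honest client reporting its real uid (1001) is refused.
try:
    server_read(files, "secret", claimed_uid=1001)
    denied = False
except PermissionError:
    denied = True

# A client that merely claims to be uid 1000 gets the file.
leaked = server_read(files, "secret", claimed_uid=1000)
```

Anyone with root on a client machine can make that claim, so the file permissions protect nothing against a hostile client.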
NFS also has the nasty habit of saturating networks. Prior to version 4, NFS ran entirely over the User Datagram Protocol (UDP). UDP is a lossy protocol that can overwhelm a network if it gets too busy. Some kind of congestion avoidance was needed, Kirch concluded. It needs to be smarter about retransmission. The solution he offered is TCP, which NFS version 4 now uses exclusively. TCP is a stateful network protocol that ensures packets reach their destination and retransmits only if the packets were lost.
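One ingredient of being "smarter about retransmission" is backing off instead of resending on a fixed timer. The sketch below shows only that idea; it is not TCP's actual algorithm, which also estimates round-trip times and limits the amount of data in flight. The function name `retransmit_times` and its parameters are hypothetical.

```python
def retransmit_times(initial_timeout, attempts, backoff=1.0):
    """Return the times (in seconds) at which a request is sent or resent.

    backoff=1.0 models a fixed retransmission timer; backoff=2.0 doubles
    the wait after every unanswered attempt.
    """
    times = []
    now = 0.0
    timeout = initial_timeout
    for _ in range(attempts):
        times.append(now)
        now += timeout
        timeout *= backoff
    return times


fixed = retransmit_times(1.0, 5)                 # resend every second
backed_off = retransmit_times(1.0, 5, backoff=2.0)
```

With a fixed timer, every client keeps sending one packet per second into an already congested network; with doubling, the same five attempts are spread over fifteen seconds, giving the network time to drain.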
Kirch noted that there are a variety of alternatives to NFS, and summarized the choice as picking your poison. He listed a number of the alternatives, along with brief descriptions of them, and then gave a more detailed list with their strengths and their flaws:
- IBM open sourced AFS as an end-of-life solution rather than continuing to maintain it.
- DFS came from the Open Group and is either dying or is altogether dead.
- CIFS is a surprisingly healthy network file system.
- Intermezzo was nicely designed, but went away.
- Coda was written by Peter Braam, who subsequently moved on to another project. It's also kind of dead.
- Cluster filesystems exist, Kirch noted, but generally live on top of either NFS or CIFS.
- NFS with extensions, called pNFS, stores files and meta-data on separate servers.
Kirch, having listed them, got a little more in depth about a few of them.
AFS he called "Antiques For Sale" and said the filesystem is in maintenance mode. It relies on Kerberos 4 for security. The code itself is difficult to read, being a mass of #ifdef statements used to make it portable across multiple platforms. It is not interoperable, and cannot function on 64-bit platforms.
CIFS he called the "Cannot Interoperate File System". It is a stateful, connection based network file system. He described the protocol as a jungle, saying he couldn't speak about it any further because it is just "horrible". Its biggest problem, he noted, is it is controlled by Microsoft and that is its main barrier to adoption. Users want to know that it will still be there tomorrow, he added.
NFS version 4 he described as "Now Fully Satisfactory?" It's an Internet-oriented filesystem that has got a lot of things right. It interoperates with Windows, is on a single, firewall-friendly port (2049), and a flaw in callback code that opened another port has even been fixed in version 4.1. It is entirely TCP, with UDP now a thing of the past.
Basics on reverse engineering a USB device
I attended one tutorial session: Reverse engineering USB drivers for compatibility, by F/OSS consultant Eric Preston.
He began with a standard disclaimer -- "This is for educational purposes only."
The premise is simple: USB devices often lack vendor support. The vendors don't care about Linux, and their excuses range from "nobody uses Linux" to "USB IDs are intellectual property" to "who cares about USB, anyway, Linux isn't on the desktop."
What do we do? Preston asked. We can wait for support from the device vendors or the community at large, or we can do it ourselves.
The mission, therefore, Preston stated, is to figure out how existing drivers work in order to write drivers ourselves. The goal is to support cool hardware, get more people involved in writing userspace drivers, and remove barriers for less experienced developers; make driver writing fun and less tedious.
The tools needed to reverse engineer a USB device, Preston explained, are, primarily, usbsnoopy and Windows. Using Windows, where most drivers are, and usbsnoopy, it is possible to see the interaction of packets between the USB device and the device driver in the operating system. It creates a log which can then be decoded into the functions.
To figure out what is what, simple tasks can be performed in Windows on the USB device and the interaction monitored and logged. Then the USB specification can be consulted and the log can be manually decoded, eventually, after months of work, resulting in some idea of what is happening.
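As a small taste of the manual decoding step, the sketch below unpacks the 8-byte setup packet that starts every USB control transfer, using the field layout from the USB specification. The input format is an assumption: it takes the raw bytes as a hex string and does not attempt to parse an actual usbsnoopy log. The function name `decode_setup` and the example byte strings are illustrative only.

```python
import struct

# Standard request codes from the USB specification (chapter 9).
STANDARD_REQUESTS = {
    0x00: "GET_STATUS", 0x01: "CLEAR_FEATURE", 0x03: "SET_FEATURE",
    0x05: "SET_ADDRESS", 0x06: "GET_DESCRIPTOR", 0x07: "SET_DESCRIPTOR",
    0x08: "GET_CONFIGURATION", 0x09: "SET_CONFIGURATION",
}
REQUEST_TYPES = ("standard", "class", "vendor", "reserved")
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}


def decode_setup(hex_bytes):
    """Decode the 8-byte setup packet of a USB control transfer."""
    raw = bytes.fromhex(hex_bytes)
    # bmRequestType, bRequest, then three little-endian 16-bit fields.
    bm_request_type, b_request, w_value, w_index, w_length = struct.unpack(
        "<BBHHH", raw)
    req_type = REQUEST_TYPES[(bm_request_type >> 5) & 0x3]
    if req_type == "standard":
        request = STANDARD_REQUESTS.get(b_request, hex(b_request))
    else:
        request = hex(b_request)   # class/vendor codes are device specific
    return {
        "direction": "device-to-host" if bm_request_type & 0x80
                     else "host-to-device",
        "type": req_type,
        "recipient": RECIPIENTS.get(bm_request_type & 0x1F, "reserved"),
        "request": request,
        "value": w_value,
        "index": w_index,
        "length": w_length,
    }


# The host asking for the 18-byte device descriptor.
standard = decode_setup("80 06 00 01 00 00 12 00")
# A made-up vendor-specific command, the kind a driver writer must puzzle out.
vendor = decode_setup("40 0c 01 00 00 00 00 00")
```

Standard requests decode themselves; the vendor-specific ones are where the months of comparing logs against observed device behaviour go.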
With the help of VMware or other virtualization programs, the painfully frequent reboots involved in the process can be avoided, and Linux tools can be used in place of usbsnoopy. One such tool combines a Linux program called usbmon with a dissector for the Linux network snooper ethereal, so that USB device traffic can be monitored in the ethereal interface. Preston is writing it, but warned that the code is very messy and is not yet ready to be shared.
The drivers themselves can be written with the help of libusb entirely in userspace. With the advent of libusb, it is no longer necessary to write kernel drivers to run USB devices. Preston did not actually write a driver in the tutorial, but did show attendees in the beyond-capacity packed room the path to do so.
AppArmor vs SELinux
Among the interesting BOF sessions of the evening was one called The State of Linux Security, led by Doc Shankar of IBM.
He invited several security experts to give brief updates on their security projects, largely concentrated around SELinux, whose esoteric nature is completely over my head. But one brief presentation particularly caught my attention.
Crispin Cowan of Novell presented AppArmor, the Linux security suite from recent Novell acquisition Immunix, which appears to be an alternative to SELinux.
Its simplicity and logic led me to wonder why I had never heard of it before. The long and the short of it is that it is a security tool that restricts services and applications to only the privileges, including specific root privileges, and files they need to perform their duties -- and it's capable of learning what those are without being explicitly told, by watching the programs to be defended perform their tasks and logging what they do. Cowan did a brief demonstration, showing how Apache could be tied down with AppArmor in just a couple of minutes, preventing a root hole in a sample web page from being exploitable by virtue of not allowing the resources needed to exploit it.
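The learn-then-enforce idea can be sketched in a few lines. This is a toy model only: it does not use AppArmor's profile syntax or tools, and `ToyProfile`, its methods and the file paths are hypothetical names chosen for the illustration.

```python
class ToyProfile:
    """Toy learn-then-enforce profile for a single program."""

    def __init__(self):
        self.allowed = set()    # (path, mode) pairs seen while learning
        self.learning = True

    def access(self, path, mode):
        rule = (path, mode)
        if self.learning:
            # Learning: log what the program does and allow it.
            self.allowed.add(rule)
            return True
        # Enforcing: anything not seen during learning is refused.
        return rule in self.allowed

    def enforce(self):
        self.learning = False


profile = ToyProfile()
# Watch the web server do its normal job.
profile.access("/var/www/index.html", "r")
profile.access("/var/log/httpd/access.log", "w")
profile.enforce()

normal_ok = profile.access("/var/www/index.html", "r")
# An exploited server reaching for something it never needed is blocked.
exploit_ok = profile.access("/etc/shadow", "r")
```

The exploit in the demonstration failed for the same reason the last call does: the resource it needed was never part of the program's observed behaviour.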
How can you beat that?
Update on the Linux Standard Base
The last session I attended on the third day was the obligatory annual Linux Standard Base update, presented as a BOF by Mats Wichmann.
Since the last OLS, Wichmann said, the Linux Standard Base version 3.1 has been released -- in two parts. The first part, the LSB core, was released in November of 2005, with the second part, the modules, being released in April of 2006. It was split into two to allow it to meet International Organization for Standardization (ISO) deadlines to become an ISO specification.
As a result of the ISO involvement, there are now two LSB streams. One is a relatively frequently updated version administered by the Linux Standard Base project itself, the other is the ISO specification. The two specifications are essentially identical.
The ISO specification exists mainly to allow governments to require compliance with an ISO standard when releasing contract tenders for technology, which lets them make the Linux Standard Base a requirement. ISO/IEC 23360 provides this.
The Linux Standard Base documentation is released under the Free Documentation License, but for the ISO, it is effectively dual-licensed documentation to allow the ISO to retain it as an official standard under their direction.
Asked how hard it is to keep the ISO version of the LSB standard up to date, Wichmann replied that it is a concern. The specifics of the specification cannot be changed all the time, even though the LSB project itself is evolving. The ISO specification can be kept up to date with occasional errata report filings, but the update cycle with the ISO is approximately 18 months. As a result, the ISO spec will inevitably lag behind the LSB specifications.
The next question asked who gets certified with the LSB. Wichmann answered that any company that has an economic interest in certifying its distribution or software package will do it, if there is a return. In theory, anyone can get any software certified, he noted, and there is no reason that companies cannot keep their software compliant even if they don't go through the process of actually being certified.
Questions were asked on how conformance is verified and how long it takes. It's a self test, Wichmann admitted. Labs are too expensive, but tools are available for anyone to download and run against the software they would like to check for compliance. If there are no errors, the tests can easily be completed in a single day. If there are errors, naturally it will take longer. To become certified, the logs of the tests need to be submitted.
It was noted during the session that the Linux Standard Base's role is more or less passive. It does not mandate standards that are not generally already the norm. Its mandate is to document, not to push, even if better systems exist than the ones that are in use.
The last day of the conference promises to be exciting, with Greg Kroah-Hartman's keynote address. Stay tuned!