Author: David "cdlu" Graham
Day 3 of this year’s Ottawa Linux Symposium featured a number of sessions, most notably a keynote address by Ubuntu founder and space tourist Mark Shuttleworth, who called for the greater Linux community to start thinking about discussing synchronicity, his term for having major software releases synchronised. The conference wrapped up on Saturday with some final interesting sessions and statistics.
Shuttleworth was, per OLS tradition, introduced by 2007 keynote speaker James Bottomley, who showed a graph of Shuttleworth’s Linux kernel-related mailing list contributions over the years, noting three years in which nothing happened: the first in which he received half a billion dollars, the second in which he was “not on planet Earth,” and the third in which he was busy founding the Ubuntu Linux distribution.
Shuttleworth’s talk was called “The Joy of Synchronicity.” It was a visionary statement about how to grow the Linux market for everyone and reduce software development waste. As the world changes, so too must we, he said.
Development has to be driven by three major factors, Shuttleworth said: Cadence, Collaboration, and Customers.
Cadence is the pace of release of any given project. It is a regular, predictable time at which the next version will be released, a release cycle tied to the calendar. For example, Ubuntu is targeting a six-month release cycle for point releases with a predictable two-year release for major releases, he said. GNOME is good at this, though its initial attempts were met with some difficulties. It is now on a six-month cycle, and KDE is beginning to explore the idea.
Synchronicity, for Shuttleworth, is all about coordinating the cadences of several projects for the benefit of the customers. If the Linux kernel, gcc, KDE, and GNOME, to start, were always at the same version in each co-released Linux distribution, Shuttleworth argued, it would reduce code waste and help grow the Linux market for everyone.
The point is simple. In Shuttleworth’s vision, distributions would all be on the same versions of major software, but would always retain their other traits, differentiating them from each other and keeping the diversity of Linux distributions as lively as it is today. The predictability of releases would help all around. Kernel developers, he argued, would have an easier time developing if they knew exactly which versions of the kernel would be used when by what distributions. The same would apply to all aspects of the open source community.
Shuttleworth expressed hope that such a predictable, marketing-friendly setup would grow the total Linux market for every distribution and market.
Sustainable Student Development in Open Source
Earlier in the day I attended an interesting talk by Chris Tyler of Seneca College who discussed a strategy the school has developed to educate students in open source technology and development.
Many students get involved in open source software and the community on their own, but do it outside of their coursework. Seneca College, from Tyler’s explanation, has been looking to incorporate open source development directly into the curriculum. Under their system, senior year college students at Seneca are offered a list of open source projects seeking help, from which they choose one to contribute to as their class project.
Most of the efforts so far have been within the Mozilla project. One thing Tyler noted is that students are not used to large projects. Thousands or tens of thousands of lines of code is something that students can grok, and understand right through. But once they start dealing with larger projects, like Mozilla, which are in the millions or tens of millions of lines, there is too much code for any one person to know right through.
The other side to that is that faculty can also be overwhelmed. It is critical, Tyler noted, that faculty involved in this program be both familiar with the academic environment, as professors necessarily are, and integrated with and active in the open source community. Without that intrinsic understanding of the community, faculty members cannot be expected to do well. To that end, Tyler commented that other institutions have contacted his department about using the curriculum, but they are advised that it is not the curriculum that makes the project a success, but that integration between the faculty and the open source community.
A significant difference that Tyler noted between open source projects and normal assignments for students is that in a typical assignment, the student is responsible for the complete coding project, from design to implementation. In an open source project, they can be using code that already exists as part of a larger project and is as much as 20 years old.
While Tyler indicated that open source was clearly not for all students, some of whom are not happy working on group projects in that way, he said the successes far exceeded the failures. He cited a number of examples, one of which was of a student who took on the challenge of documenting a previously undocumented API. This led to the question of how such assignments are graded. Tyler explained that the marking is done as an assessment of the contribution to the open source project and the accomplishment of the student’s stated goals. Thus it does not necessarily have to be a coding project to be a successful project.
Another example he cited was a student who developed an animated Portable Network Graphics implementation which he called apng. It was less cumbersome than MNG, the PNG project’s implementation of the same task, and has been merged into Firefox, Opera, and soon Microsoft Internet Explorer, although it was rejected by the PNG group itself.
The course requires real contribution to real, existing open source projects as a normal new contributor. A critical component of the course is to encourage the developers and the students to interact on an ongoing basis, preferably actually meeting in person at some point, during the course. As one example, he noted that students and developers of Mozilla interact on an ongoing basis in the #seneca channel on irc.mozilla.org.
Tyler said that the course works within an open source philosophy in its own right. The course notes and outline are posted on wikis, the projects are developed with them, and coursework is submitted through a developer blog aggregator, with each student required to create an aggregated blog to cover his progress. This setup also allows other members of the projects involved to keep up the accuracy of the course and project information.
More information about his project can be found on Tyler’s own blog.
Peace, love, and rockets
Worth brief mention is Bdale Garbee’s talk on using open source and open hardware to build a useful telemetry system for model rockets. Garbee spent some time outlining the model rocket hobby and explaining the shortcomings of altimeters and accelerometers currently available, namely that they are not easily hackable. He said he has been told that his main hobby is turning his other hobbies into open source projects.
The fourth and final day of Ottawa Linux Symposium started for me with an entertaining trip down memory lane by D. Hugh Redelmeier in a talk entitled “Red Hat Linux 5.1 vs. CentOS 5.1: Ten Years of Change.” Redelmeier took Red Hat Linux 5.1, released in June 1998, and compared it on the same computer to CentOS 5.1, a free version of Red Hat Enterprise Linux 5.1 released late last year. He chose the two systems because of the time separation, direct lineage, and coincidentally numbered versions of the two operating systems. He compared them by dual booting them on a 1999-built Compaq Deskpro EN SFX desktop machine with 320MB of RAM, upgraded from the original 64MB, and a 120GB hard drive, upgraded from the original 6.4GB drive that came with the machine.
Redelmeier described installing two versions of a Linux distribution nearly a decade apart in age on the same hardware as a “bit of a trick.” For example, he said, Red Hat 5.1 only understood hard drive geometry as CHS — Cylinders, Heads, Sectors. How many people remember CHS, he asked? The standard bootloader at that time, LILO, had to be installed on a cylinder below 1024. On a 120GB drive, that meant ensuring that /boot showed up in the first 8.5GB of the drive. Except that Red Hat 5.1 had not yet introduced the concept of /boot as a separate partition — that did not come until 5.2 — and so the root partition needed to be in the first 8.5GB of the drive — a relic of old AT BIOSes.
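The 8.5GB figure follows from the addressing limits of the old BIOS disk interface. As a rough sketch of the arithmetic, assuming the commonly cited limits of 1024 cylinders, 63 sectors per track, 512-byte sectors, and either 255 or 256 heads (the talk report gives only the cylinder limit; the other values and the function name are assumptions for illustration):

```python
# Illustrative arithmetic only: the geometry limits below are the commonly
# cited BIOS CHS limits, not figures quoted in the talk.
SECTOR_BYTES = 512        # bytes per sector
MAX_CYLINDERS = 1024      # LILO had to live below cylinder 1024
SECTORS_PER_TRACK = 63    # largest sector count the interface can address


def chs_limit_bytes(heads: int) -> int:
    """Return the number of bytes addressable below the cylinder limit."""
    return MAX_CYLINDERS * heads * SECTORS_PER_TRACK * SECTOR_BYTES


if __name__ == "__main__":
    for heads in (255, 256):
        limit = chs_limit_bytes(heads)
        print(f"{heads} heads: {limit} bytes, about {limit / 1e9:.2f} GB")
```

Either head count lands between 8.4 and 8.5 billion bytes, which is why the root partition had to sit entirely within that first slice of the 120GB drive.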
Among his other surprises were that CentOS 5.1 and Red Hat 5.1 could not share a swap partition. Red Hat 5.1 could not read the CentOS swap partition without running mkswap on boot, which is not a normal boot procedure. Red Hat 5.1, he noted, was limited to a 127MB swap partition anyway. That version of the distribution could be installed in 16MB of RAM, so 127MB of swap seemed like an awful lot at the time.
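The report does not say where the 127MB ceiling comes from. One likely explanation, offered here as an assumption rather than something Redelmeier stated, is the original Linux swap format, which tracked usable pages in a one-page bitmap whose last 10 bytes held a signature. With 4096-byte pages, as on i386, the arithmetic works out as follows:

```python
# Assumed layout of the old swap format: one header page used as a bitmap,
# with the final 10 bytes reserved for a signature string.
PAGE_SIZE = 4096       # i386 page size in bytes (assumption)
SIGNATURE_BYTES = 10   # bytes of the header page not available as bitmap


def old_swap_limit_bytes(page_size: int = PAGE_SIZE) -> int:
    """Upper bound on swap size: one bitmap bit per swap page."""
    trackable_pages = (page_size - SIGNATURE_BYTES) * 8
    return trackable_pages * page_size


if __name__ == "__main__":
    limit = old_swap_limit_bytes()
    print(f"{limit} bytes, about {limit / 2**20:.1f} MiB")
```

Under these assumptions the ceiling is a little under 128 MiB, consistent with the 127MB figure quoted in the talk.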
The computer Redelmeier chose did not have an optical drive, and so he was forced to install CentOS 5.1 using PXE boot. CentOS also requires a yum update once installed, which he described as very slow on that machine.
His observations from the process include noting that GRUB is generally better than LILO, as he had an opportunity to re-experience such entertainment as “LILILILILI…” as a LILO boot error.
Redelmeier indicated that he has been using Unix in some form or another since 1975. Considering that, he said, the Red Hat 5.1 Unix environment is “pretty solid.” There were a “few stupidities” he said, “like colour ‘ls’.” Looking at it now, he noted, FVWM, the window manager in Red Hat 5.1, had an old feel to it. Another age-old piece of software, xterm, he said, was still mostly the same, except that in Red Hat 5.1, xterm had been improved slightly to use termcaps, which broke it when you tried to use it remotely from, for example, SunOS.
Red Hat 5.1 did not come with SSH; at the time it still had to be downloaded from ssh.fi. To log into the machine, he used rlogin with Kerberos. OpenSSH requires OpenSSL, and a newer version of zlib than was available for Red Hat 5.1, something he was not inclined to backport. Redelmeier warned of “cascading backports” when trying to use newer software on such old installs.
Security, too, is quite bad in the original Red Hat 5.1, he commented, but the obscurity factor largely made up for it.
Another lesson he learned in the process of comparing the installs is about “bitrot.” Redelmeier commented that the original pressed CDs that came in the box still worked fine, but his burned update CD had bonded to the CD case and was no longer usable. Avoid bitrot, he cautioned, by actively maintaining stuff you care about.
Issues in Linux mirroring
John Hawley, admin for the kernel.org mirrors, spoke in the afternoon about “problems us mirror admins have to scream about.”
Not every mirror has 5.5 terabytes of space to offer the various distributions that need mirroring, Hawley said. Some mirrors only have as little as one terabyte to offer. Yet in spite of this, many distributions leave hundreds of gigs of archival material on mirrors. Hawley asked that distributions make it optional to mirror admins whether or not they take these archives. Fedora and Mandriva alone, he noted, use up fully half of his mirror space, while Debian, at a paltry half terabyte, has cleaned up its act on request. He warned that if other distributions don’t start reducing their mirror footprint, mirrors will no longer be able to carry them.
Disk cache is a major constraint on mirrors, Hawley warned. Disk I/O is the most significant part of any mirror operation. No mirror can keep up, he noted, with 2,000 users downloading distributions at the same time if the servers are not able to cache the data being sent out. Cache runs out, I/O use goes up, disk thrashing begins, the load goes up, and it is nearly impossible to get it back under control without restarting the HTTP daemon.
Keep working sets as small as possible, Hawley asked. His servers have 24GB of RAM, yet a distribution today can be 50GB. To be able to distribute the whole distribution means that some of it necessarily has to be read from disk at any given time, since only half of it can fit in RAM. Add multiple releases at the same time, and pretty soon mirrors are no longer able to keep up.
Hawley asked that distributions coordinate not to release at the same time. “I don’t care what Mark said, it’s bad!” Hawley exclaimed, in reference to Mark Shuttleworth’s keynote. Last year, Hawley noted as an example, Fedora, openSUSE, and CentOS all released within three days of each other, swamping mirrors. When that happens, he said, “we are dead in the water.” Please, he said, when doing releases, coordinate with other distros so as not to release the same week.
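A back-of-the-envelope version of that argument, using the 24GB and 50GB figures from the talk, might look like the sketch below. It makes the optimistic assumptions that all RAM is available as disk cache and that requests are spread evenly over the files; the function name is made up for illustration.

```python
def cached_fraction(ram_gb: float, working_set_gb: float) -> float:
    """Best-case share of the working set that can be served from RAM."""
    if working_set_gb <= 0:
        raise ValueError("working set must be positive")
    return min(1.0, ram_gb / working_set_gb)


if __name__ == "__main__":
    ram_gb = 24.0      # RAM per mirror server, from the talk
    release_gb = 50.0  # size of one distribution release, from the talk
    for releases in (1, 2, 3):
        share = cached_fraction(ram_gb, releases * release_gb)
        print(f"{releases} release(s): at most {share:.0%} served from cache")
```

Even in this best case, each additional simultaneous release shrinks the cached share further, and everything outside it has to come from disk.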
Hawley strongly suggested that distributions need to learn to keep mirror operators in the loop on release plans. Sometimes, he said, the only way he knows that one of the distributions he is mirroring has released a new version is by the spike in traffic on his mirrors. When a distribution is preparing to release, he suggested sending repeated loud, clear emails to mirror admins to warn them of this fact.
And then Hawley really got started. Hawley said he does not know of many admins who like BitTorrent. Users think it’s the best thing since sliced bread, and distributions and mirror admins are answering that demand. But Hawley would rather that users be informed as to what is wrong with BitTorrent.
So why is BitTorrent considered harmful?
The original idea, Hawley said, is to allow multiple users to download from the other people downloading. This, he said, is great for projects with large datasets but small numbers of downloads. But once the volume rises, BitTorrent “falls flat on its face.” Every client needs to talk to the tracker to get the source of its next segment and check the checksums of what it has. The tracker itself becomes a single point of failure, and a bottleneck to the download. There’s no concept in BitTorrent of mirrors versus downloaders, as everyone takes on both roles. This also means that any user of BitTorrent sinks to the lowest common denominator. If, for example, in your cloud of downloaders, there is a 56K modem user, that user can slow down the rest of the users’ downloads considerably as they wait to get chunks out of that modem.
BitTorrent, Hawley said, is complex for everyone. It adds manual labour to set it up to work on the mirrors, it is slow to download, and he explained that BitTorrent as a whole cannot even keep up with a single major mirror.
With graphs to back him up, Hawley showed that in the first week of Fedora 8’s release, the total number of downloads by BitTorrent of the release across all sources was roughly equivalent to the total number of downloads from only the kernel.org mirror for BitTorrent, yet some 25% of all bits traded in BitTorrent for Fedora 8 still came from the kernel.org mirrors.
Among its problems, BitTorrent is a largely manual process to set up for mirror admins. BitTorrent does not inherently have a way to automatically detect and join existing torrents, nor does it have an easy way to create a torrent from existing data. Aside from that, its chunk approach to data distribution causes disk thrashing on the servers. Per download, he said, BitTorrent is 400 times more intensive than a single direct download from a mirror, largely on the client side, because of its weird disk seeks.
With a Web server, the server can simply use a kernel function called sendfile() to pick up a file and send it. With BitTorrent, a file is divided into small chunks that have to be seeked for constantly and distributed. If BitTorrent continues to thrash mirrors, he warned, mirrors will no longer participate.
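For readers who have not met sendfile(), the sketch below shows the idea using Python's `socket.sendfile()`, which uses the operating system's sendfile call where one is available and falls back to ordinary sends otherwise. The helper names, the temporary file, and the local socket pair are stand-ins for illustration, not anything from Hawley's setup.

```python
import os
import socket
import tempfile


def send_whole_file(sock: socket.socket, path: str) -> int:
    """Send one file sequentially over a connected socket; return bytes sent."""
    with open(path, "rb") as f:
        # Uses os.sendfile where available, so the kernel copies the data
        # without the program reading it into its own buffers first.
        return sock.sendfile(f)


def demo() -> tuple[int, bool]:
    """Send a small temporary file through a local socket pair."""
    payload = b"x" * 4096  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    server, client = socket.socketpair()
    try:
        sent = send_whole_file(server, path)
        server.close()  # signal end of stream to the reader
        received = b""
        while chunk := client.recv(65536):
            received += chunk
    finally:
        server.close()
        client.close()
        os.unlink(path)
    return sent, received == payload


if __name__ == "__main__":
    print(demo())
```

The file is read from start to finish in one pass, which is the access pattern disks and caches handle best. Serving many small chunks on request, as a BitTorrent seed does, turns the same data into scattered reads.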
Peer-to-peer distribution for Linux distribution releases has a role, he said, but BitTorrent is not the answer.
This marks the tenth consecutive year of the Ottawa Linux Symposium. Organisers say that 600 people attended this year, in spite of the weak US economy and the scheduling conflict with OSCON, which scheduled itself for the same week as OLS’s traditional time slot — and has again for next year. Some attendees at OLS, including keynote speaker Mark Shuttleworth, attended part of each conference to reconcile this conflict.
The Ottawa Congress Centre, where OLS has taken place for the past 10 years, is being torn down and rebuilt over the next three years. As a result, OLS is “going on the road” and will take place in Montreal at the Centre Mont-Royal next year, with dates to be determined.
As per tradition, Craig Ross, one of OLS’s two key organisers along with Andrew Hutton, gave the closing announcements and statistics at the end of the last day. In 10 years, there have been approximately 5,000 attendees, 850 talks, 23 calls from embassies, 11 calls from authorities, 2 attendees found asleep in the fountain at the closing reception (alcohol is provided, in case you were wondering), and some 50,000 beverages consumed.
And of course, Ross had to post a slide showing T-shirt sizes issued through the conference — slide photographed by Yani Ioannou — showing the, ahem, enlarging Linux community.