June 11, 2005

Decline and fall of the version number

Author: Nathan Willis

As a software version-numbering aficionado, I have recently concluded that the FOSS world has gone mad and is hurling itself -- users and developers alike -- into a black hole of confusion and long-winded explanations.

This realization first struck me weeks ago while I was reading a thorough review of GCC 4.0 by Scott Robert Ladd. The review itself was precise, documented, and well-thought-out. I liked it. But it concluded with this harrowing assessment:

... no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC.

Yikes. Historically, the point-oh release signified a finished product. That was why release teams resorted to tacking on Greek-letter suffixes and precious-metal monikers as caveats while they sweated over swatting those final few bugs. When did all that change? I wondered. Is it a point-oh release whether it's ready or not?

Then I remembered the Linux kernel. When I first started running Linux, the venerated odd/even dual-branch numbering scheme was The Law: 2.1 for development, 2.0 for general consumption. That's gone, too. Before 2.6.0 was unleashed, we witnessed a parade of "pre"-this and "pre"-that-"testX" kernels before the ball dropped.

And since then there hasn't even been an odd-numbered kernel series. Development takes place in branches maintained by individual branch maintainers, a change that has two important effects. First off, it has ruined the formerly crisp-looking homepage at kernel.org. Whereas in the past, two links were sufficient to completely describe the state of the kernel, it now takes 11 kernels and four pages of definitions. But FAQs and wikis still advertise the odd/even scheme, several years out of date.

Secondly, Linux enthusiasts live in fear now; fear that any day (perhaps today) a non-Linux programmer you know is going to walk up to you at a street corner and ask, "When is there going to be a new version of Linux again?" Your first instinct will be to blurt out, "Well, you see, odd-numbered kernels are the unstable series where they do the development, even-numbered kernels are the stable branch. Cheers." But you'll be wrong, and will have to either launch into a long and drawn-out explanation of the BitKeeper fiasco or quickly concoct a web of lies.

Reflecting on this, I also lamented the countless man-hours GNOME developers spend explaining that, to them, 2.10 comes after 2.8; the mistrust on the faces of Ubuntu neophytes upon hearing that Ubuntu 5.04 is a finished "point-oh" release; and the headaches brought on by debating whether to install WINE 20041019 or WINE 20050111. Then, just when I thought it could get no worse, the Anjuta IDE project simultaneously announced a stable release numbered 1.2.3 and a development release numbered 2.0.0. We had finally come full circle.

All of the laws of version numbering are now but charred ashes; not only was this x.0.0 release not to be trusted blindly, it wasn't even meant to be run by users at all.

So what?

Of course, I am using a fair bit of false naivete in describing these incidents; I'm familiar with all of the aforementioned projects and I am not really confused by their version numbers. But I am able to navigate between the incompatible meanings of those numbers only because of my personal familiarity with the projects over the course of time. A new user would be genuinely confused.

That confusion illustrates how a non-technical decision as unsubstantial as the number tacked on after a release can have unintended and unfortunate side effects. Picking a bizarre or unpleasant project name can distance new users. Making a major release the same day the new Xbox hits the shelves can cost press coverage. Abandoning your established version-numbering scheme can call down a scourge of mail messages all asking the same questions over and over again.

The big question is whether open source software is more vulnerable to this type of malady than closed-source, commercial software. I'm going to take the foolhardy step of openly soliciting comments on that question.

My gut feeling is that open projects are susceptible to this kind of non-technical pitfall precisely because they are founded and driven by technical thinkers and workers. As hard as it is to attract coders to your project, it is far harder to get volunteers for the non-technical tasks (bug triage, documentation, aesthetic stuff), simply because the community is, by definition, populated with programmers. In contrast, commercial products are cooked up by front-office suits, marketing surveys, and knee-jerk reactions to popular trends, and their non-technical decisions (and, regrettably, technical ones) are made by non-techies -- up to and including the name of the product and its release schedule.

But there is evidence to the contrary as well. Namely, the change in kernel version numbering seems correlated with corporate support of its development. The demise of the odd-numbered development branch was most publicly associated with Linus's adoption of BitKeeper, but the numbering change actually began earlier. Specifically, the more corporate capital has been funding kernel development and paying salaried developers, the more -testX and pre-releases have preceded each new x.0.0 release.

That in itself makes sense. Given that closed-source vendors have the luxury of not inadvertently selling a beta-quality product, no open source vendor wants to jump the gun and ship its package with a critical bug. Thus they squeeze extra QA cycles out of the development team, which we see in the form of those pre- and -testX releases.

Based on some Googling, it also seems that commercial Linux distributions have become slower and slower to adopt x.0.0 kernels in their boxed-and-shelved products. This is, of course, at odds with the extended test cycle that produces each x.0.0 release -- if you take longer testing it, you should be more certain of it once it is ready.

Fight the man?

So, corporate interference is screwing up the release-numbering of the Linux kernel? Possibly. But it's the ripple effect of this interference on those unguarded, non-technical factors of other open source projects that bothers me.

I've already said that conflicting numbering schemes lead to confusion for the user; now you have to have firsthand experience with a software project in order to interpret its new releases. Version numbering is the one thing that distinguishes one release of an application or library from another. If people are confused by it, or if it generates more support-forum questions than your actual software does, you have a problem.

Luckily, I am here to present to you what I'm humbly calling Willis's Three Laws of Release Numerics. Adhere to these rules and happy users, accolades, and peer respect will beat a path to your door.

The First Law: Pick a numbering scheme, then don't change it. Ever. When you get right down to it, the only purpose a version number serves is to denote the relative supremacy of one of those versions compared to another. That's why even though commercial software vendors periodically decide to rename their products with letters (see RealPlayer G2 for a historical example, or Adobe's Photoshop CS for a will-be-a-historical-example-shortly example), they always have to slink back to the good old real number system when their customer service reaches the 90% threshold on that one question.

This doesn't mean that you can't decide that your big rewrite merits a major-number increment instead of a minor-number increment. But it does mean don't switch from the venerated even/odd scheme to an odd/even scheme or to a new scheme tied directly to the numbering scheme of some toolkit you depend on. Don't make a point-oh release mean "unstable" when the previous point-oh release meant "stable."

The Second Law: Don't mess with math. In other words, don't employ a scheme that violates established rules of numbers. Higher numbers indicate more recent versions. Numbers sort; your FTP archive needs to sort in some meaningful way, or you are asking for a world of hurt. Tacking on -testX, -alpha, or -beta will give you releases that may or may not sort correctly, so avoid them.
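To make the sorting complaint concrete, here is a minimal Python sketch. The version strings are illustrative, modeled on the 2.6.0 pre-release series mentioned earlier, and release_key is a hypothetical helper, not any project's actual tooling. It assumes a version is a run of dot-separated integers optionally followed by a single "-<word><number>" suffix, and it ignores the word itself, so it would not know how to order -alpha against -beta.

```python
import re

# Illustrative version strings in the style of the 2.6.0 pre-release series.
versions = ["2.6.0-test2", "2.6.0", "2.6.0-test11", "2.6.0-test1"]

# Plain string sorting compares character by character, which is roughly
# what a naive FTP directory listing does.
print(sorted(versions))
# ['2.6.0', '2.6.0-test1', '2.6.0-test11', '2.6.0-test2']

# Assumed format: dotted integers, optionally followed by "-<word><number>".
VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:-([a-z]+)(\d+))?$")


def release_key(version: str) -> tuple:
    """Hypothetical sort key: numeric components first, then every
    pre-release ahead of the final release. The suffix word is ignored."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    if match.group(2) is None:
        return (numbers, 1, 0)  # final release sorts after its pre-releases
    return (numbers, 0, int(match.group(3)))  # pre-release, ordered by number


print(sorted(versions, key=release_key))
# ['2.6.0-test1', '2.6.0-test2', '2.6.0-test11', '2.6.0']
```

Plain string comparison puts the final release first and test11 ahead of test2. Getting the right order requires a key function that knows the project's suffix convention, which is exactly the kind of insider knowledge users should not need.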

Furthermore: if you use decimal points as your major/minor delimiter, your users are going to ask you why .10 comes after .8, every single time. And the misunderstanding is not their fault, it is yours; the decimal point has a very precise meaning and has had it for nearly 400 years. You've chosen to overload it by declaring that in this one special context, it means something different. John Napier popularized it in 1619; lacking his permission, don't try to assign new definitions and behavior to it.

Brrr. I can feel the "it's not a decimal point" police straightening up the chips on their shoulders and reaching for their email clients. Nevertheless, we trudge on. Your choices are: (a) get comfortable with re-explaining your decision every time, (b) roll over to the next major number instead of reaching .10, as OpenBSD does so effortlessly (2.9 was followed by 3.0), or (c) pick one of the plentiful other ASCII characters as a delimiter. There's no shortage.
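The ".10 versus .8" argument can be checked the same way. In this Python sketch, the two series are illustrative and as_decimal and as_components are hypothetical helpers: each version string is read once as a decimal number and once as dot-separated integers, and the script reports whether the two readings put the series in the same order.

```python
def as_decimal(version: str) -> float:
    # Reads "2.10" the way the decimal point suggests: two and one tenth.
    return float(version)


def as_components(version: str) -> tuple[int, ...]:
    # Reads "2.10" the way many projects intend: major 2, minor 10.
    return tuple(int(part) for part in version.split("."))


gnome_style = ["2.6", "2.8", "2.10"]    # minor number keeps climbing past 9
openbsd_style = ["2.8", "2.9", "3.0"]   # rolls over to a new major number

for series in (gnome_style, openbsd_style):
    by_decimal = sorted(series, key=as_decimal)
    by_components = sorted(series, key=as_components)
    # Do the two readings agree on which release is newest?
    print(series, by_decimal == by_components)

# ['2.6', '2.8', '2.10'] False
# ['2.8', '2.9', '3.0'] True
```

Read as a decimal, 2.10 is 2.1 and lands before 2.6, while read as components it lands last. Rolling over from 2.9 to 3.0 keeps both readings in agreement, which is why that option sidesteps the argument entirely.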

The Third Law: Make friends with infinity. In other words, don't be afraid to increment. You're not going to run out of numbers; we have mathematicians working around the clock to find new and bigger ones for common usage all the time. Reluctance to increment version numbers leads programmers to commit heinous sins like tacking additional ".minor.minor" decimals onto subsequent releases, leading to a syndrome called decimal bloat, wherein the version number changes more and more slowly, approaching some finite limit, while the length of the version number increases without bound. The logical end of this behavior is a program whose version number consumes all of the system resources and thus cannot be run.

In the real world, though, remember that Sun famously decided to leap forward from Solaris 2.6 to Solaris 7 when it realized that everyone else in the software industry was incrementing their numbers faster, and that Solaris suddenly looked as if it had lots of ground to make up. I have seen a lot of open source projects fall victim to the same "is this different enough to increment the minor number?" paralysis. It doesn't matter if there are fewer changes between 2.4 and 2.6 than there were between 2.2 and 2.4. All that matters is that the number communicates which one is the most recent. Decelerating release numbers make outsiders think that development has slowed, and that misconception will hurt your user base.

A brave new tomorrow?

The world used to make sense. As the first rays of dawn broke over the horizon, farmers strode nobly into their fields of grain to reap the harvest of an honest season's living, while across the country programmers put to rest another night's coding and packaged a well-honed x.0 release, wistfully watching it bound off into the Internet to replicate blissfully on the mirror servers. People everywhere were happy.

Then our version numbers collapsed. There was chaos. Some numbers got longer and longer. Some turned into letters and words. Some became dates. Nobody knew what the numbers meant anymore. People were afraid to ask what the numbers meant, and they became afraid of numbers themselves.

What happens next remains to be seen. Will there be riots in the streets? Blockades? Protesters spray-painting numbers on the walls of company headquarters? No. How about the gradual awakening of an enlightened, numerically harmonious world consciousness? I wouldn't bet on that either. But maybe a generation from now, when those post-apocalyptic programmers rebuild the software industry from its ashes, they'll do it right from the ground up. Here's to you, amigos.