February 24, 2010

Built to Last

It has now been almost exactly five years since the kernel development community tentatively started using the git source code management system with the 2.6.12-rc2 commit. That was an uncertain time; nobody really knew how long it would take the development process to get back up to speed after an abrupt core-tool change. As it turned out, git was almost immediately useful, and has only become more so since. Making the development process work is git’s main claim to fame, but, as a side benefit, git also makes it possible to learn a lot about how our kernel is developed. And that, as it turns out, includes taking a look at the code which has not changed.

The speed of the development process is impressive; the soon-to-be-released 2.6.33 kernel is the product of almost 11,000 individual changes affecting nearly a million lines of code (look here for more 2.6.33 statistics). Those numbers are boringly normal for a three-month development cycle; things are always moving that fast.

Given that, one might think that, by now, very little of that 2.6.12-rc2 kernel which was first committed to git would remain. After all, over 500,000 lines were deleted in this development cycle alone. I got curious, and decided to look a bit deeper. The result was the creation of some brutally hackish Python scripts, the expenditure of about a week of solid CPU time, and some statistics on the age of the kernel code base.

It turns out that, of the approximately 12 million lines of code and documentation that make up the 2.6.33 kernel, about 31% dates back to that 2.6.12-rc2 commit. Nearly a third of our current kernel has not been touched in the last five years.
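The scripts behind these numbers are not shown here. As a rough illustration only, a measurement of this kind could be sketched with `git blame --line-porcelain`, which attributes every line of a file to the commit that last touched it. The function names below (`blame_file`, `count_lines_by_commit`, `fraction_from`) are hypothetical, and the commit ID to compare against is left as a parameter; this is a minimal sketch under those assumptions, not the method actually used.

```python
import subprocess
from collections import Counter


def blame_file(repo, path, rev="HEAD"):
    """Return `git blame --line-porcelain` output for one file (needs a git checkout)."""
    result = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", rev, "--", path],
        check=True,
        capture_output=True,
    )
    # Source files are not guaranteed to be valid UTF-8; replace what cannot be decoded.
    return result.stdout.decode("utf-8", errors="replace")


def count_lines_by_commit(porcelain):
    """Count how many lines each commit is blamed for.

    In line-porcelain output every source line is reported as a header line
    ("<sha> <orig-line> <final-line> ..."), some commit metadata lines, and
    finally the line's content prefixed with a tab.
    """
    counts = Counter()
    expect_header = True
    # Split on "\n" only, so odd characters inside source lines do not confuse the parser.
    for line in porcelain.split("\n"):
        if expect_header:
            if line:  # skip the empty string left after the final newline
                counts[line.split()[0]] += 1
                expect_header = False
        elif line.startswith("\t"):
            # Tab-prefixed content line: the next line starts a new entry.
            expect_header = True
    return counts


def fraction_from(counts, commit):
    """Fraction of all counted lines that are attributed to `commit`."""
    total = sum(counts.values())
    return counts[commit] / total if total else 0.0
```

To get a tree-wide figure, one would call `blame_file()` for every file in the release, add up the resulting counters, and pass the full ID of the 2.6.12-rc2 commit to `fraction_from()`. Running blame over millions of lines is slow, which is consistent with the CPU time mentioned above.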

Some parts of the kernel (including the network stack, the filesystem layer, and, alas, the documentation directory) have higher-than-average amounts of old code. Over 40% of our documentation is at least five years old. Some of that documentation covers things which haven’t changed - how to configure old hardware, for example, and Klingon language support - but much of the rest is just … old.

The newest code can generally be found in the core kernel, which is much more aggressively improved and updated. But, even there, 25% of the memory management layer dates from 2.6.12-rc2, as does about 13% of the “kernel” directory.

Does this mean that we have a lot of old and unmaintained code sitting around? In places, that will certainly be true; no body of code this large can be without the occasional cobweb-filled corner. But I also think these numbers show that we have built this kernel to last. The development community’s focus on code quality and maintainability means that, even in a rapidly-changing kernel with contributions from thousands of developers, nearly a third of our code works so well that it has not even needed a dusting-off in the last five years.