Jon Corbet Q&A: Upstream Contributions Influence Direction of Linux Kernel

Jon Corbet is a highly recognized contributor to the Linux kernel community. He is a developer and the executive editor of Linux Weekly News (LWN). He is also The Linux Foundation’s chief meteorologist, a role in which he translates kernel-level milestones into an industry outlook for Linux. Corbet has also written extensively on how to work within the Linux kernel development community and has moderated a variety of panels on the topic. Today, he gives us an update on the Linux “weather forecast,” why sharing your code upstream is critical, and the state of virtualization in the kernel.

You’ve been the “chief meteorologist” for the Linux Weather Forecast for a while now. What’s the general forecast for Linux today?

Corbet: Bright and sunny with occasional thunderstorms.

That’s a broad question; I’ll restrict myself to the kernel level where I can at least pretend to know what I’m talking about. Things are going quite well, for the most part. The development process is humming along with very few glitches, and we have more developers involved with every release. Things look very healthy.

The 2.6.34 kernel will hit the net sometime this month; it seems to be stabilizing well. It’s full of improvements to our power management and tracing infrastructures, both of which have been slower to mature than we might have liked over the last few years. There are two new filesystems: LogFS is aimed at solid-state devices, while Ceph is meant for high-reliability network-distributed storage. Things have been improved across the board; it should be a good release.

You’ve also written a lot about how to participate in the Linux development community and have moderated a number of panels on the topic. What is the most common question you get and how do you address it?

Corbet: The most common question I get must certainly be: “how do I get started working on the kernel?” It is true that getting into the community can be an intimidating prospect: the code is large and complex; the mailing list gets 400 messages per day; and the community does not have a reputation for always being overly friendly to those who are just beginning to feel their way around.

That said, it’s not as bad as it seems. Most community discussions are quite polite and professional these days, and people do try to guide newcomers in the right direction. The quality of the kernel’s code base has increased over the years; it has also always been a highly modular system, so it’s easy to just focus on one little piece. And the documentation for newcomers has gone from nonexistent to significant.
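To give a sense of how small that “one little piece” can be, here is a minimal sketch of the classic hello-world module many newcomers build before touching anything real. The names and messages are illustrative only, not from any actual driver:

```c
/*
 * hello.c: a minimal, self-contained kernel module.  Everything here
 * is standard module boilerplate; the names are illustrative only.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");

/* Called when the module is loaded (insmod). */
static int __init hello_init(void)
{
	printk(KERN_INFO "hello: module loaded\n");
	return 0;	/* 0 means successful initialization */
}

/* Called when the module is removed (rmmod). */
static void __exit hello_exit(void)
{
	printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
```

A module like this can be built out of tree with a short kbuild Makefile (an `obj-m += hello.o` line plus a `make -C` invocation against the installed kernel headers) and loaded with insmod; the point is that one can experiment with a tiny, isolated piece without understanding the rest of the kernel first.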

Aspiring kernel developers still often wonder where to get started. Too many of them try to make their entrance with things like coding-style patches or spelling fixes. That’s understandable; it must be tempting to start learning the ropes with a patch that, with any luck at all, cannot break anything. But that kind of work is not hugely helpful, either to the developer or to the kernel as a whole. That’s why I tend to pass on Andrew Morton’s advice: the first thing any kernel developer should do is make the kernel work flawlessly on every system they have access to. Fixing bugs is a far more valuable exercise and a great way to start building a reputation within the community.

The other thing I like to suggest is reviewing patches. That can be hard for a newcomer, who may well feel out of place asking questions about code posted by established developers. But code review is always in short supply, and there’s no better way to learn than to look at a lot of code and figure out how it works. Review comments which are expressed as questions (rather than as criticism) will always get a polite reply – and often thanks.

With an increase in the number of companies using Linux, especially in the mobile/embedded space, is the dynamic of the Linux development process changing? If so, how?

Corbet: There have been many companies using Linux for years, and most kernel developers have been employed to do that work for just as long; so, in one way, things haven’t changed much. We just have some new companies showing up, and they are all welcome.

That said, the embedded world is different. Embedded developers are bringing some good things, including a stronger focus on power efficiency and code size, and support for a wide variety of new hardware. On the other hand, embedded developers often work in an environment of strict secrecy and tight deadlines that can make it very hard for them to work effectively with the community. We are still working on ways to help these developers get their code upstream. Progress has been made, but the problem is not fully solved by any means.

Can you tell us a little about the virtualization work happening at the kernel level and what still needs to be done?

Corbet: I’ve been saying for a while that, at the kernel level, the virtualization problem has been solved. We have three virtualization mechanisms in the kernel (Lguest, KVM, and Xen), two of which are being widely used commercially.

The large number of developers working on Linux virtualization would be amused to hear me say that there’s nothing left for them to do, though. Most of the activity at this point is in the performance area. The intersection of memory management and virtualization appears to be especially tricky, so there is a lot of work being done to make it function more efficiently.

Some people question the importance of “mainlining” their code in the Linux kernel. Can you talk about the benefits and the payoff of getting your code accepted?

Corbet: Well, that’s a long list. Some of the things I routinely tell people include:

* Code that is in the mainline kernel is invariably better code. No code submission has ever come out from behind a corporate firewall – or even from a community project site – without needing major improvements. Those improvements will happen, either as part of the merging process or afterward. Anybody who cares about the quality of their code will want to get it into the kernel, where it can be reviewed and improved.

* The maintenance of out-of-tree code can be an exercise in pain; quite a bit of work is required just to keep up with mainline kernel changes. Once the code is merged, that work simply vanishes. When a kernel API changes, all in-tree users are fixed as part of the change; out-of-tree code is out of luck (see the sketch after this list). Merging code into the mainline allows the contributor to forget about much of the maintenance work, freeing them to go build new things.

* Code in the mainline kernel gets to users automatically; they do not have to go digging through other repositories to find it. Distributors will enable it. That makes life easier for your users, and they will appreciate it.

* That’s simply how our community works.  We would not have the kernel we have now if all those contributors did not do the work to get their changes upstream.

* It should also be noted that the contribution of code is the best way to influence the direction of the kernel going into the future. The kernel belongs, literally, to all who have contributed to it; each contributor has helped to build a kernel that meets their needs a little better. A code contribution is like a vote that helps to decide what kernel we’ll be running tomorrow.
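As a concrete illustration of the out-of-tree maintenance point above, consider what happens when an in-kernel interface changes. The sketch below is entirely hypothetical: foo_register(), struct foo_driver, and enum foo_bus are invented names standing in for any real kernel API that gains a parameter between releases.

```c
/*
 * Hypothetical illustration only: foo_register(), struct foo_driver,
 * and enum foo_bus are invented for this sketch; they are not real
 * kernel interfaces.
 */
struct foo_driver { const char *name; };
enum foo_bus { FOO_BUS_PCI, FOO_BUS_USB };

/*
 * Old interface:  int foo_register(struct foo_driver *drv);
 * New interface, after an imagined API change that requires callers
 * to name the bus they bind to:
 */
int foo_register(struct foo_driver *drv, enum foo_bus bus);

static struct foo_driver mydrv_driver = { .name = "mydrv" };

static int mydrv_init(void)
{
	/*
	 * The commit that changed foo_register() also converted every
	 * in-tree caller to the two-argument form, so mainline code
	 * keeps compiling.  An out-of-tree module still calling
	 * foo_register(&mydrv_driver) simply stops building against
	 * the new kernel, and its maintainer must track down and adapt
	 * to the change on their own, release after release.
	 */
	return foo_register(&mydrv_driver, FOO_BUS_PCI);
}
```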