2.6.38: Making Things Just Work

Linus Torvalds announced the release of the 2.6.38 kernel on March 14. Like its predecessors, 2.6.38 incorporates a lot of work – over 9,500 patches from over 1,100 developers. There are a number of useful changes, including some important scalability improvements, but, in my mind, the most interesting theme behind this kernel is that of making advanced features Just Work.

“Pages” are the units of memory as understood by the processor’s memory management unit (MMU). Since the beginning, Linux has used 4096-byte pages on most architectures – the smallest size that the MMU understands. Contemporary processors can handle a number of page sizes simultaneously, though, with 2MB or 4MB often being the next largest size available. Larger pages require less overhead to manage, but the real value of using them is that they greatly increase the amount of memory which can be covered by the processor’s translation lookaside buffer (TLB). The TLB, which caches virtual-to-physical address translations, is a severely limited resource on most systems. But it is important; a TLB miss can cost many hundreds of processor cycles even if the destination page is fully resident in memory. A 2MB page requires one TLB entry; the same memory, in 4096-byte pages, needs 512 TLB entries. So using huge pages can avoid a lot of TLB misses, leading to significant performance increases, especially in virtualized situations.
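The arithmetic behind those numbers is easy to check. What follows is a minimal sketch in Python; the 64-entry TLB size is an assumed figure chosen purely for illustration, since real TLB sizes vary from one processor to the next.

```python
# Sketch: TLB entries needed for a region, and memory covered by a full TLB.
# TLB_ENTRIES is an assumed, illustrative value; real TLB sizes vary by processor.

KIB = 1024
MIB = 1024 * KIB

SMALL_PAGE = 4 * KIB   # 4096-byte base page
HUGE_PAGE = 2 * MIB    # 2MB huge page
TLB_ENTRIES = 64       # hypothetical TLB size


def entries_needed(region_bytes: int, page_bytes: int) -> int:
    """TLB entries needed to map a region, rounding up to whole pages."""
    return -(-region_bytes // page_bytes)  # ceiling division


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory covered when every TLB entry maps one page."""
    return entries * page_bytes


print(entries_needed(2 * MIB, SMALL_PAGE))        # 512
print(entries_needed(2 * MIB, HUGE_PAGE))         # 1
print(tlb_reach(TLB_ENTRIES, SMALL_PAGE) // KIB)  # 256 (KiB)
print(tlb_reach(TLB_ENTRIES, HUGE_PAGE) // MIB)   # 128 (MiB)
```

Under that assumed TLB size, switching from 4096-byte pages to 2MB pages raises the memory reachable without a miss from 256KB to 128MB.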

Linux has supported the use of huge pages for years, but they have been painful to use. The pages must be explicitly set aside by the system administrator at system boot, and applications have to jump through special hoops to use them; see this series of articles for details on how it all works. The key point is that the fiddly nature of huge pages meant that there were few users; use of this feature was mostly limited to large database installations.

2.6.38 adds the transparent huge page (THP) feature. Rather than requiring administrator setup and application changes, THP simply substitutes huge pages for the regular kind whenever they are available and it appears to make sense. The result is that Linux systems take advantage of this hardware feature and run faster on almost all workloads – with no additional effort from administrators, developers, or users.
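For the curious, the current THP policy can be read from sysfs. The sketch below assumes the kernel exposes `/sys/kernel/mm/transparent_hugepage/enabled` and that the file marks the active choice with square brackets (for example `[always] madvise never`); the helper names are made up for this example, and on kernels built without THP the script simply reports that the setting is unavailable.

```python
# Sketch: report the transparent huge page mode, if the kernel exposes one.
# Assumes the sysfs file lists the choices with the active one in brackets.
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_selected(text: str) -> Optional[str]:
    """Return the bracketed word from a line such as '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        return parse_selected(path.read_text())
    except OSError:
        return None


print("THP mode:", thp_mode() or "unavailable")
```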

Group scheduling is another longstanding (since 2.6.24) Linux feature that has seen relatively little use. Group scheduling allows the processes running on a system to be partitioned into groups. Each group then contends for CPU time as a single unit; for each slice of CPU time granted to a control group, all of the processes within that group then contend to run. This feature creates a degree of isolation between groups of tasks, limiting how much they can interfere with each other.
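A toy model shows why this matters. The sketch below is not the kernel's scheduler; it assumes a single CPU, equally weighted groups, and processes that are always runnable, and the group names and process counts are invented for the example.

```python
# Toy model of CPU shares with and without group scheduling.
# Assumptions: one CPU, equal group weights, every process always runnable.
from fractions import Fraction
from typing import Dict


def flat_shares(groups: Dict[str, int]) -> Dict[str, Fraction]:
    """Per-process share when all processes compete as equals, ignoring groups."""
    total = sum(groups.values())
    return {name: Fraction(1, total) for name in groups}


def grouped_shares(groups: Dict[str, int]) -> Dict[str, Fraction]:
    """Per-process share when each group gets an equal slice, split among its members."""
    if any(count <= 0 for count in groups.values()):
        raise ValueError("every group needs at least one process")
    return {name: Fraction(1, len(groups) * count) for name, count in groups.items()}


# Hypothetical workload: 24 busy processes in one group, 1 process in another.
workload = {"busy": 24, "login": 1}

print(flat_shares(workload)["login"])     # 1/25
print(grouped_shares(workload)["login"])  # 1/2
print(grouped_shares(workload)["busy"])   # 1/48
```

In this model the lone process goes from one twenty-fifth of the CPU to one half, while the crowd of busy processes only squeezes itself.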

Once again, this feature was not heavily used because it required a fair amount of administrator setup. With 2.6.38, we now have automatic per-session group scheduling; this is the famous “200-line kernel” patch which got so much attention late last year. This patch (more like 700 lines by the time it was merged) automatically partitions processes into groups based on their session ID, essentially creating one group for each logged-in session. That separates users from each other and from system tasks without anybody even having to think about it.
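Whether automatic grouping is active can be checked from `/proc`. This sketch assumes a kernel built with autogroup support, which exposes the switch as `/proc/sys/kernel/sched_autogroup_enabled` and each process's group as a line like `/autogroup-1 nice 0` in `/proc/<pid>/autogroup`; the helper names are made up for this example, and the script degrades quietly where those files are absent.

```python
# Sketch: inspect automatic per-session group scheduling via /proc.
# Assumes the autogroup files exist; reports "not available" otherwise.
from pathlib import Path
from typing import Optional, Tuple

AUTOGROUP_SWITCH = Path("/proc/sys/kernel/sched_autogroup_enabled")
SELF_AUTOGROUP = Path("/proc/self/autogroup")


def read_text_or_none(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def parse_autogroup(line: str) -> Optional[Tuple[str, int]]:
    """Parse '/autogroup-1 nice 0' into ('/autogroup-1', 0)."""
    parts = line.split()
    if len(parts) == 3 and parts[1] == "nice":
        try:
            return parts[0], int(parts[2])
        except ValueError:
            return None
    return None


switch = read_text_or_none(AUTOGROUP_SWITCH)
if switch is None:
    print("autogroup: not available on this kernel")
else:
    print("autogroup enabled:", switch == "1")

mine = read_text_or_none(SELF_AUTOGROUP)
if mine is not None:
    print("this process:", parse_autogroup(mine))
```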

I recently encountered the benefits of this feature when two dozen processes simultaneously ran amok on my desktop system. See this article for a description of what happened then. In short, all of those processes destroyed system response – for themselves. Other processes, including those I created when logging in over the network to investigate the problem, were able to run normally, barely noticing the riot which was under way in a different control group. Within a few moments, I was convinced of the value of this feature.

I think we will see more of this kind of change in the future. Linux has a lot of advanced features which can make the system run better and faster, but Linux-specific features are often underused. Users may simply not know about them and developers will be reluctant to add Linux-specific code to their programs. But if these features simply work without the need for specific action from anybody, they will benefit the Linux user base as a whole.