"To me, what we see now on the virtualization front is a natural evolution, rather than a revolution," says Kir Kolyshkin, project manager for OpenVZ, an open source virtualization effort supported by SWsoft. "We went from single-user to multi-user, from single task to multitasking. Now we go from single OS to multiple OSes, from single instance to multiple isolated VMs or VEs."
Virtualization has many applications, including testbeds and virtual server hosting. "The core motivation for most users is that a single machine now has more power than they can use for a single application," says Eric W. Biederman, a kernel hacker who works on system software and clusters for the high-performance supercomputing market. "Therefore it makes sense to consolidate servers on a single machine."
"Newer machines are 95% idle when used for 'typical' setups," agrees Herbert Poetzl, project leader of the Linux-VServer project. According to Poetzl, virtualization is also important for security. "Isolating services without adding unnecessary overhead and/or blocking out wanted communications is a big advantage," he says. "Lightweight isolation allows you to save a lot of resources by sharing, and increases security for service isolation."
Biederman says that virtualization also opens up the possibility for migration of running programs. "[I]f you can take everything with you that is global [such as a system's IP addresses and PIDs], and use those same global identifiers after you migrate, the problem becomes tractable without application modifications."
Two types of virtualization
Biederman explains that there are actually two separate, non-conflicting approaches to virtualization being pursued for the Linux kernel. One is to use a hypervisor, which intercepts calls to hardware, allowing a system to run multiple (and different) operating systems at the same time in virtual servers; with paravirtualization (as with Xen), the kernel or OS is modified so that it knows it's running on a virtual machine.
The other approach is to give the kernel the ability to run multiple instances of itself in separate, secure "containers" on the host system by virtualizing the namespace of the kernel.
"The core idea is that several objects that the kernel manages have global identifiers, and by making those identifiers non-global and doing a context-sensitive lookup, you can have the feel of multiple UNIX instances without the overhead," says Biederman. "Essentially this is building a super chroot facility."
A team of many players
OS-level virtualization is a major feature; implementing it in the Linux kernel is a large undertaking and isn't going to happen casually. In fact, several parties in the Linux virtualization arena, including Biederman, the Linux-VServer and OpenVZ projects, and IBM, are contributing to the effort. IBM's contribution should be particularly useful, since the company was a pioneer in the development of virtualization, and has more than 40 years of experience with the technology. IBM says that its recent innovations being integrated into the Linux kernel "represent some of the most significant development initiatives underway at IBM's Linux Technology Center."
Biederman says that the first thing everybody had to do was learn how to work together. What helped, he says, was working on first implementing "generally non-controversial" features. "Plus, this cemented the incremental approach," he adds, noting that both Linux-VServer and OpenVZ have already announced that they're using these new features in their software.
To help things get going, some of the key developers met at the Linux Kernel Developers Summit in Ottawa, Canada this past summer.
"Face-to-face meetings are always helpful, and I think we need more," says Kolyshkin, who suggests a "virtualization track" at next year's Kernel Summit. He says that what they've accomplished so far is only the beginning.
"I think it's too early to say we have achieved something big," he says. "In fact, there are just a few building blocks, a few bricks that came in."
The first few bricks
All of these first virtualization building blocks were proposed, discussed and worked on throughout 2006, and were in Linux kernel developer Andrew Morton's merge tree for a while; now, having made it to a stable 2.6 kernel (2.6.19, released November 29), they're officially a part of mainstream Linux.
"Strictly speaking, [these patchsets] do not change the current behavior of the Linux kernel," says Poetzl, "but they pave the road to OS-level virtualization."
The IPC virtualization patch virtualizes Linux Inter-Process Communication (IPC), which is the ability of processes to communicate and share data with each other. With this patch, processes can only see and communicate with those processes in the same virtual container.
"Traditionally, there is a single set of IPC objects (shared memory segments, message queues, and semaphores) per a running kernel," explains Kolyshkin. "Since we want to create multiple isolated environments -- containers -- on top of a single kernel, we do not want those containers to see each other's IPC objects. Thus the need to virtualize IPC -- or, in other words, create IPC namespaces."
PID (Process ID) virtualization gives each container its own PID space, so that processes inside a container cannot see any processes outside it.
On a UNIX system, the init process always has a PID of 1, explains Kolyshkin. "Since with multiple containers there are multiple inits, all those inits also should have PID=1. Thus the need for PID namespaces, or PID virtualization."
Kolyshkin adds that this isn't the only time PID virtualization can be handy. "Consider the live migration scenario when you migrate a set of processes from one machine to another," he says. "Since there is no way to change the PID of the task, you have to have all the same PIDs on a destination machine. Without separate PID namespaces, you cannot guarantee that."
The UTS namespace patch virtualizes the utsname structure, which gives basic information about the operating system and hardware, as well as the hostname; with this patch, utsname is local to each container -- a necessity for virtualization.
"We cannot live with the hostname being the same for all containers," says Kolyshkin.
According to IBM, the purpose of this patchset isn't merely the functionality itself, but that it helps lay the groundwork for virtualizing other system resources in the same way. Towards that end, the patchset introduced the necessary structures and code and provided an example of how to use them.
"IBM contributed it to 'get the ball rolling' on development for application containers, which are needed both for virtual server and application migration functionality," the company said in a statement. "In developing the patch, IBM also helped to get all the parties who were working to develop such functionality privately, together. This both accelerates development of the features and increases the code quality by ensuring that all parties get the right to object to bad design or bad code."
But on the paravirtualization side, things have been advancing, too. "The truth is that the paravirtualization support has been trickling in for a long time," says Biederman.
Patches that begin implementing paravirtualization have been merged into the development kernels. Another patch merged into the development tree is the Kernel-based Virtual Machine (KVM), which creates a /dev/kvm device that allows the system to run virtual machines.
Morton says that he expects both KVM and the base paravirtualization to be included in Linux 2.6.20. "OS-level virtualization continues to trickle in," says Morton. "I don't know when [or] if it will be complete."
Meanwhile, all of the developers are steadily working on making it happen. "My personal plan is to finish up the PID namespace before I move forward with network namespace," says Biederman. "In some senses that is the critical namespace, because once you start separating out processes it really begins to feel like a separate system."
IBM is continuing to develop full-featured Linux virtualization at three levels: full OS, virtual servers, and lightweight application sets. Kolyshkin predicts that the next two big advances will be more resource management work, and a checkpointing and live migration capability. While it will be some time before all these pieces are in mainstream Linux, these careful steps show that OS-level virtualization is steadily materializing.
"As for the general idea of how to achieve it -- bit by bit, patch by patch, with peer review, suggestions and improvements from everybody," says Kolyshkin. "And this is how we do that. Slowly, but moving forward."